random: Speed of random sampling from distribution in R

samedi 30 janvier 2021

Speed of random sampling from distribution in R

I was trying to investigate a probability distribution whose moments were the Catalan numbers, and came up with

qcatmo <- function(p, k=4){ (qbeta(p/2+1/2, 3/2, 3/2)*2 - 1)^2 * k } 
colMeans(outer(qcatmo(ppoints(10^6)), 0:10, "^"))
#      1     1     2     5    14    42   132   429  1430  4862 16796

which worked out nicely. But then I tried to generate random values from this distribution, and found three possible approaches (A using the quantile function I already knew worked applied to runif, B slightly more direct using the built-in rbeta function, and C using a form of rejection sampling with runif) with noticeably different speeds when used on large samples:

rcatmoA <- function(n, k=4){ qcatmo(runif(n), k) }
rcatmoB <- function(n, k=4){ (rbeta(n, 3/2, 3/2)*2 - 1)^2 * k }
rcatmoC <- function(n, k=4){
             n0 <- ceiling(n*4/pi + 7*sqrt(n) + 35)
             x0 <- runif(n0)^2 
             y0 <- runif(n0)^2 
             x0[x0 + y0 < 1][1:n] * k
             }

which when benchmarked gave

library(microbenchmark)
n <- 10^4
microbenchmark(
  rcatmoA(n,4),
  rcatmoB(n,4),
  rcatmoC(n,4)
  ) 
#Unit: microseconds
#          expr     min       lq      mean   median       uq     max neval cld
# rcatmoA(n, 4) 22817.2 23014.95 23259.688 23186.95 23322.80 25128.9   100   c
# rcatmoB(n, 4)  1526.5  1534.40  1615.255  1541.30  1607.15  4952.1   100  b 
# rcatmoC(n, 4)   781.5   788.70   884.339   795.00   813.80  7266.2   100 a

My questions are:

Why is the B version so much faster than the A version?
If the B version is faster because it avoids applying a function to runif data, why is the C version even faster?
Is there any general advice on how best to make random samples in this sort of situation?

random

samedi 30 janvier 2021

Speed of random sampling from distribution in R

Aucun commentaire:

Enregistrer un commentaire