I was trying to investigate a probability distribution whose moments were the Catalan numbers, and came up with
qcatmo <- function(p, k=4){ (qbeta(p/2+1/2, 3/2, 3/2)*2 - 1)^2 * k }
colMeans(outer(qcatmo(ppoints(10^6)), 0:10, "^"))
# 1 1 2 5 14 42 132 429 1430 4862 16796
which worked out nicely. But then I tried to generate random values from this distribution, and found three possible approaches (A using the quantile function I already knew worked applied to runif
, B slightly more direct using the built-in rbeta
function, and C using a form of rejection sampling with runif
) with noticeably different speeds when used on large samples:
rcatmoA <- function(n, k=4){ qcatmo(runif(n), k) }
rcatmoB <- function(n, k=4){ (rbeta(n, 3/2, 3/2)*2 - 1)^2 * k }
rcatmoC <- function(n, k=4){
n0 <- ceiling(n*4/pi + 7*sqrt(n) + 35)
x0 <- runif(n0)^2
y0 <- runif(n0)^2
x0[x0 + y0 < 1][1:n] * k
}
which when benchmarked gave
library(microbenchmark)
n <- 10^4
microbenchmark(
rcatmoA(n,4),
rcatmoB(n,4),
rcatmoC(n,4)
)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# rcatmoA(n, 4) 22817.2 23014.95 23259.688 23186.95 23322.80 25128.9 100 c
# rcatmoB(n, 4) 1526.5 1534.40 1615.255 1541.30 1607.15 4952.1 100 b
# rcatmoC(n, 4) 781.5 788.70 884.339 795.00 813.80 7266.2 100 a
My questions are:
- Why is the B version so much faster than the A version?
- If the B version is faster because it avoids applying a function to
runif
data, why is the C version even faster? - Is there any general advice on how best to make random samples in this sort of situation?
Aucun commentaire:
Enregistrer un commentaire