Tuesday, March 3, 2020

Benchmarking the "sample" function in R

I was benchmarking the sample function in R against igraph::sample_seq and ran into a strange result.

When I run something like:

library(microbenchmark)
library(igraph)
set.seed(1234)
N <- 55^4   # 9,150,625
M <- 500
(mbm <- microbenchmark(v1 = {sample(N,M)}, 
                       v2 = {igraph::sample_seq(1,N,M)}, times=50))

I get a result like this:

Unit: microseconds
 expr       min        lq        mean     median        uq       max neval
   v1 21551.475 22655.996 26966.22166 23748.2555 28340.974 47566.237    50
   v2    32.873    37.952    82.85238    81.7675    96.141   358.277    50

But when I run, for example,

set.seed(1234)
N <- 100^4  # 1e8
M <- 500
(mbm <- microbenchmark(v1 = {sample(N,M)}, 
                       v2 = {igraph::sample_seq(1,N,M)}, times=50))

I get a much faster result for sample:

Unit: microseconds
 expr    min     lq     mean  median     uq     max neval
   v1 52.165 55.636 64.70412 58.2395 78.636  88.120    50
   v2 39.174 43.504 62.09600 53.5715 73.253 176.419    50

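It seems that when N is a power of 10 (or some other special number?), sample is much faster than it is for smaller N that are not powers of 10: the median time for sample drops from about 23,700 microseconds at N = 55^4 to about 58 microseconds at N = 100^4, while igraph::sample_seq stays in the same range for both. Is this expected behavior, or am I missing something?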
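
One way to probe whether the speed-up really tracks powers of 10 or simply the size of N is sketched below. It relies on the default of the useHash argument shown in the Usage section of ?sample.int (n > 1e7 and size <= n/2 when sampling without replacement and without prob), and it assumes an R version whose sample.int has that argument. The helper uses_hash_by_default is a hypothetical name introduced only for this illustration, and base system.time is used instead of microbenchmark to keep the sketch dependency-free. Treat it as a rough check, not a definitive explanation.

```r
set.seed(1234)
M <- 500

# Hypothetical helper: mirrors the documented default of `useHash`
# for replace = FALSE and prob = NULL (see ?sample.int).
uses_hash_by_default <- function(n, size) n > 1e7 && size <= n / 2

# Which of these N values would take the hash-based path by default?
for (N in c(55^4, 1e7, 1e7 + 1, 100^4)) {
  cat(sprintf("N = %.0f  hash by default: %s\n", N, uses_hash_by_default(N, M)))
}

# For the "slow" N from the question, compare the default path with an
# explicit request for the hash version. Timings vary by machine.
N <- 55^4
t_default <- system.time(for (i in 1:50) sample.int(N, M))["elapsed"]
t_hashed  <- system.time(for (i in 1:50) sample.int(N, M, useHash = TRUE))["elapsed"]
print(c(default = unname(t_default), hashed = unname(t_hashed)))
```

If the hashed call at N = 55^4 turns out to be about as fast as the default call at N = 100^4, that would point to the 1e7 size threshold rather than to powers of 10 as such.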



