Wednesday, June 24, 2015

R dplyr: sample_frac using a seed stored in the data

I have a grouped data frame in which the grouping variable is a seed (GENESEED in the example below). For each group, I want to set the random seed to that group's seed value and then shuffle the group's rows using dplyr::sample_frac. However, I cannot replicate my results, which suggests that the seed isn't being set correctly.

To do this in a dplyr-ish way, I wrote the following function:

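library(dplyr)

ss_sampleseed <- function(df, seed.){
  set.seed(df$seed.)      # intended: seed the RNG from the grouping column
  sample_frac(df, 1)      # all rows in random order (within each group)
}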
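It may help to check what `df$seed.` actually evaluates to inside the function. The base-R sketch below uses a hypothetical three-row data frame and does not need dplyr. `$` treats `seed.` as a literal column name, not as the column passed in the `seed.` argument. Because that argument is never used, R never evaluates it. The lookup returns `NULL`. According to the R documentation, `set.seed(NULL)` re-initializes the generator as if no seed had been set.

```r
# Hypothetical reproduction using base R only (no dplyr needed).
df <- data.frame(Gene = c("CAMK1", "ARPC4", "CIDEC"),
                 GENESEED = c(1, 1, 1),
                 stringsAsFactors = FALSE)

# `df$seed.` asks for a column literally named "seed."; none exists.
looked_up <- df$seed.
print(is.null(looked_up))   # TRUE

# set.seed(NULL) re-initializes the RNG as if no seed had been set,
# so these two shuffles are not guaranteed to match.
set.seed(looked_up); print(sample(df$Gene))
set.seed(looked_up); print(sample(df$Gene))

# Seeding with an actual value makes the shuffle repeatable.
set.seed(1); a <- sample(df$Gene)
set.seed(1); b <- sample(df$Gene)
print(identical(a, b))      # TRUE
```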

I then use this function on my data:

dg <- structure(list(Gene = c("CAMK1", "ARPC4", "CIDEC", "CAMK1", "ARPC4",
"CIDEC"), GENESEED = c(1, 1, 1, 2, 2, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L), .Names = c("Gene",
"GENESEED"))

dg2 <- dg %>%
  group_by(GENESEED) %>%
  ss_sampleseed(GENESEED)

> dg2
Source: local data frame [6 x 2]
Groups: GENESEED

   Gene GENESEED
1 ARPC4        1
2 CIDEC        1
3 CAMK1        1
4 CIDEC        2
5 ARPC4        2
6 CAMK1        2

However, when I run the same code again, the rows come out in a different order:

> dg2
Source: local data frame [6 x 2]
Groups: GENESEED

   Gene GENESEED
1 ARPC4        1
2 CAMK1        1
3 CIDEC        1
4 CAMK1        2
5 ARPC4        2
6 CIDEC        2

Any help is appreciated. Thanks.
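For comparison, here is a minimal base-R sketch of the procedure described at the top: one seed per group, then a shuffle within that group. The helper `shuffle_by_seed()` is hypothetical and is not a dplyr function. It splits the data by the seed column and calls `set.seed()` once per group with that group's own value. Repeated runs should therefore return the same order.

```r
# Hypothetical helper: shuffle rows within each group, seeding the RNG
# with that group's own seed value so every group is reproducible.
shuffle_by_seed <- function(df, seed_col) {
  groups <- split(df, df[[seed_col]])          # one data frame per seed value
  shuffled <- lapply(groups, function(g) {
    set.seed(g[[seed_col]][1])                 # seed from this group's value
    g[sample(nrow(g)), , drop = FALSE]         # random permutation of its rows
  })
  out <- do.call(rbind, shuffled)
  rownames(out) <- NULL                        # drop "1.2"-style row names
  out
}

dg <- data.frame(Gene = c("CAMK1", "ARPC4", "CIDEC", "CAMK1", "ARPC4", "CIDEC"),
                 GENESEED = c(1, 1, 1, 2, 2, 2),
                 stringsAsFactors = FALSE)

run1 <- shuffle_by_seed(dg, "GENESEED")
run2 <- shuffle_by_seed(dg, "GENESEED")
print(identical(run1, run2))  # TRUE
```

Note that calling `set.seed()` inside the helper also changes the global RNG state. A dplyr equivalent would likewise need to apply the seed-then-permute step to each group separately, rather than seeding once for the whole grouped table.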
