random: Randomly sampling groups, followed by sampling within these sampled groups

Saturday, 15 February 2020

Randomly sampling groups, followed by sampling within these sampled groups

My dataset contains several groups and each group can have a different number of unique observations. I carry out some calculations by group (simplified in the code below), resulting in a summary value for each group. Next, for the purpose of a bootstrap, I want to:

1. Randomly sample the groups with replacement (the number of sampled groups equals the number of distinct groups in the original dataset).
2. Within each sampled group, randomly sample observations with replacement (the number of sampled observations per group equals the number of unique observations in that group in the original dataset).

A simplified version of my data setup (data1):

id    group  y
1001  1      10
1002  1      15
1003  1      3
3002  2      24
3003  2      15
3005  2      37
3006  2      32
3007  2      11
4001  3      12
4002  3      15
5006  4      7
5007  4      9
5009  4      22
5010  4      19
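
To make the example reproducible, the table above can be entered as a data frame. This is a minimal sketch; the numeric column types are an assumption, since the question only shows the printed values.

```r
# Toy data copied from the table above (column types assumed numeric)
data1 <- data.frame(
  id    = c(1001, 1002, 1003, 3002, 3003, 3005, 3006, 3007,
            4001, 4002, 5006, 5007, 5009, 5010),
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  y     = c(10, 15, 3, 24, 15, 37, 32, 11, 12, 15, 7, 9, 22, 19)
)

# Number of observations per group
group.sizes <- table(data1$group)
```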

I have some code working for each of these steps separately, but I cannot seem to combine them:

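library(dplyr)  # provides %>%, group_by(), sample_frac(), ungroup()

# Calculate group value
y.group <- tapply(data1$y, data1$group, mean)

# Step 1. Sample groups (by position among the unique group labels), with replacement:
sampled.group <- sample(seq_along(unique(data1$group)), replace = TRUE)

# Step 2. Sample within groups, with replacement
data2 <- data1 %>%
   group_by(group) %>%                  # for each group
   sample_frac(1, replace = TRUE) %>%   # resample as many rows as the group has
   ungroup() %>%
   as.data.frame()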

Taken together, the code above does not do what I want: step 2 ignores the groups sampled in step 1, because it groups by the original group variable. I have tried to use the result of step 1 to build a new data frame containing only the observations of the sampled groups (with duplicates if a group was sampled more than once, which is likely to happen), and then apply step 2 to that new data frame, but I cannot get this to work.

I think I am just on the wrong path or overthinking things. Hopefully you can give me some advice on how to proceed.
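
The two-stage approach described above could be sketched as follows. This is only one possible way to combine the steps, written in base R rather than dplyr; the helper `two_stage_resample` and the `draw_id` column are hypothetical names introduced for illustration, and the seed is arbitrary.

```r
# Toy data copied from the table in the question
data1 <- data.frame(
  id    = c(1001, 1002, 1003, 3002, 3003, 3005, 3006, 3007,
            4001, 4002, 5006, 5007, 5009, 5010),
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  y     = c(10, 15, 3, 24, 15, 37, 32, 11, 12, 15, 7, 9, 22, 19)
)

# Hypothetical helper (not from the question): one two-stage bootstrap replicate
two_stage_resample <- function(data) {
  groups <- unique(data$group)
  # Stage 1: draw group labels with replacement; sampling positions with
  # sample.int() avoids sample()'s special handling of a single number
  drawn <- groups[sample.int(length(groups), replace = TRUE)]
  # Stage 2: for each draw, resample that group's rows with replacement,
  # keeping the original group size
  pieces <- lapply(seq_along(drawn), function(i) {
    rows   <- which(data$group == drawn[i])
    picked <- rows[sample.int(length(rows), replace = TRUE)]
    # draw_id keeps repeated draws of the same group apart
    cbind(draw_id = i, data[picked, , drop = FALSE])
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
boot1 <- two_stage_resample(data1)

# Group summary per draw, analogous to y.group in the question
y.group.boot <- tapply(boot1$y, boot1$draw_id, mean)
```

Summarising by `draw_id` rather than by `group` treats a group that was drawn twice as two separate resampled groups, which seems to be what the bootstrap in the question calls for. Repeating the call, for example with `replicate()`, would give one set of group summaries per bootstrap replicate.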
