My dataset contains several groups and each group can have a different number of unique observations. I carry out some calculations by group (simplified in the code below), resulting in a summary value for each group. Next, for the purpose of a bootstrap, I want to:
- Randomly sample the groups with replacement (the number of sampled groups equals the number of distinct groups in the original dataset)
- Within these sampled groups, randomly sample observations with replacement (the number of sampled observations per group equals the number of unique observations in that group in the original dataset)
A simplified version of my data setup (data1):
id group y
1001 1 10
1002 1 15
1003 1 3
3002 2 24
3003 2 15
3005 2 37
3006 2 32
3007 2 11
4001 3 12
4002 3 15
5006 4 7
5007 4 9
5009 4 22
5010 4 19
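For reproducibility, the table above could be built as a data frame along these lines. This is only a convenience sketch; the values are copied from the table and the column types (numeric) are an assumption.

```r
# Rebuild the example data shown in the table above
data1 <- data.frame(
  id    = c(1001, 1002, 1003, 3002, 3003, 3005, 3006, 3007,
            4001, 4002, 5006, 5007, 5009, 5010),
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  y     = c(10, 15, 3, 24, 15, 37, 32, 11, 12, 15, 7, 9, 22, 19)
)
```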
For the above, I have some code working for these steps separately, but I cannot seem to combine them:
library(dplyr)

# Calculate group value
y.group <- tapply(data1$y, data1$group, mean)

# Step 1. Sample groups, with replacement (draws group labels, not positions):
sampled.group <- sample(unique(data1$group), replace = TRUE)

# Step 2. Sample within groups, with replacement
data2 <- data1 %>%
  group_by(group) %>%                 # for each group
  sample_frac(1, replace = TRUE) %>%  # resample all rows of that group
  ungroup() %>%
  as.data.frame()
Obviously, the code above as a whole does not do what I want: step 2 ignores the groups sampled in step 1, because it simply uses the original group variable. I have tried to solve this by using the result of step 1 to build a new data frame containing only the sampled groups' observations (with duplicates if a group was sampled more than once, which is likely to happen), and then applying step 2 to that new data frame, but I cannot get this to work.
I think I am just on the wrong path or overthinking things. Hopefully you can give me some advice on how to proceed.
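To make the intended procedure concrete, here is a minimal base R sketch of the two-stage resampling described above. It is an illustration under assumptions, not a verified solution: the `draw` column and the names `boot.list` and `y.group.boot` are hypothetical helpers introduced here, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Example data, copied from the table above
data1 <- data.frame(
  id    = c(1001, 1002, 1003, 3002, 3003, 3005, 3006, 3007,
            4001, 4002, 5006, 5007, 5009, 5010),
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  y     = c(10, 15, 3, 24, 15, 37, 32, 11, 12, 15, 7, 9, 22, 19)
)

groups <- unique(data1$group)

# Step 1: draw as many group labels as there are groups, with replacement
sampled.group <- groups[sample.int(length(groups), replace = TRUE)]

# Step 2: for every drawn group (repeats included), resample its rows
# with replacement, keeping the original group size
boot.list <- lapply(seq_along(sampled.group), function(i) {
  rows   <- which(data1$group == sampled.group[i])
  picked <- rows[sample.int(length(rows), replace = TRUE)]
  out    <- data1[picked, ]
  out$draw <- i  # hypothetical column: tells repeated draws of one group apart
  out
})
data2 <- do.call(rbind, boot.list)

# Group value per draw, so a group sampled twice contributes two values
y.group.boot <- tapply(data2$y, data2$draw, mean)
```

Indexing by the draw number rather than by the original group label is what lets a group that was sampled more than once be resampled independently each time.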