Tuesday, 22 December 2020

How to create a dataframe with repeated columns created from randomly sampling another dataframe?

I am trying to repeatedly add columns to a dataframe using random sampling from another dataframe.

My first dataframe, which holds the actual data to be sampled from, looks like this:

df <- data.frame(cat = c("a", "b", "c","a", "b", "c"),
                 x = c(6,23,675,1,78,543))

I have another dataframe like this:

df2 <- data.frame(obs =c(1,2,3,4,5,6,7,8,9,10),
                  cat=c("a", "a", "a", "b", "b", "b", "c","c","c", "c"))

I want to add 1000 new columns to df2, each filled by randomly sampling x from df within the matching cat group. I figured out a (probably very amateurish) way of doing this once: I use slice_sample() to build a new dataframe, sample1, holding a random sample of df, and then merge sample1 with df2.

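library(dplyr)

df <- df %>%
  group_by(cat)

df2 <- df2 %>%
  group_by(cat)

# Draw 3 values per cat, with replacement (9 rows in total).
# slice_sample() has no `preserve` argument, so it is dropped here.
sample1 <- slice_sample(df, n = 3, replace = TRUE)
sample1 <- sample1 %>%
  ungroup() %>%
  mutate(obs = 1:9) %>%   # only obs 1 to 9 get a value, so the merge drops obs 10
  select(-cat)

df3 <- merge(df2, sample1, by = "obs")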

Now I want to find a way to repeat this 1000 times, so that df3 ends up with 1000 sampled columns (x1, x2, x3, etc.).

I have looked into repeat loops, but haven't been able to figure out how to make the above code work inside the loop.
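
For reference, here is a minimal sketch of one way the repetition might be done without an explicit loop, using base R only. It assumes that every row of df2 should receive one value of x drawn with replacement from the rows of df that share its cat, so all 10 observations are kept. The helper sample_once(), the names pool and n_rep, and the fixed seed are hypothetical choices made for illustration; they are not part of the code above.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

df <- data.frame(cat = c("a", "b", "c", "a", "b", "c"),
                 x = c(6, 23, 675, 1, 78, 543))
df2 <- data.frame(obs = 1:10,
                  cat = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"))

n_rep <- 1000  # number of sampled columns to add

# Values of x available in each category, as a named list
pool <- split(df$x, df$cat)

# Hypothetical helper: one draw per row of df2, taken from that row's category.
# Indexing with sample.int() avoids the surprise sample() gives on length-1 vectors.
sample_once <- function() {
  vapply(as.character(df2$cat),
         function(k) pool[[k]][sample.int(length(pool[[k]]), 1)],
         numeric(1), USE.NAMES = FALSE)
}

# replicate() calls sample_once() n_rep times and binds the results
# into a matrix with one row per observation and one column per repetition
samples <- replicate(n_rep, sample_once())
colnames(samples) <- paste0("x", seq_len(n_rep))

df3 <- cbind(df2, as.data.frame(samples))
```

Here df3 keeps obs and cat and gains the columns x1 to x1000. If the grouped slice_sample() approach is preferred, the same replicate() pattern should work around it, provided each repetition returns its values in the row order of df2.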



