Thursday, March 18, 2021

Using sample() to create a new variable based on levels of other variables

Consider this data frame (the one I'm working with is much, much bigger):

library(tibble)
library(dplyr)

set.seed(13)
test <- tibble(A = as.factor(1:10),
               B = as.factor(sample(c("Apple", "Banana"), 10, replace = TRUE)),
               C = as.factor(sample(c("Cut", "Mashed"), 10, replace = TRUE)),
               D = as.factor(sample(1:3, 10, replace = TRUE)))

I need to create another numeric variable, E, whose value must be identical for all rows that share the same levels of the other variables. Let me illustrate.

When I try this, or any other approach I could find:

test %>%
  group_by(B,C,D) %>%
  mutate(E = sample(seq(0.01:100, 0.01), 10, replace = T))

I get an error message.

The result I'm after is the following. I need to use sample() or another random-number function:

   A     B      C      D     E
   <fct> <fct>  <fct>  <fct> <dbl>
 1 1     Banana Mashed 3     0.2
 2 2     Apple  Cut    1     4
 3 3     Banana Mashed 1     5
 4 4     Apple  Mashed 2     3
 5 5     Banana Cut    1     1.3
 6 6     Apple  Cut    3     4.7
 7 7     Banana Mashed 1     5
 8 8     Banana Mashed 1     5
 9 9     Banana Cut    3     3.2
10 10    Banana Cut    3     3.2

So rows 9 and 10 need the exact same value of E, and so do rows 3, 7 and 8, because within each set the levels of B, C and D are the same.

Any idea how to do this?
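
One possible approach, offered as a minimal sketch rather than a definitive answer: draw a single value per group instead of 10. Inside mutate(), dplyr expects a result of length 1 or of the group's size, so asking sample() for 10 values in every group is likely part of the problem. A length-1 result is recycled to every row of the group, which gives the "same value where B, C and D match" behaviour. The name `values` and the range 0.01 to 100 in steps of 0.01 are assumptions about what the original seq() call was meant to produce.

```r
library(dplyr)
library(tibble)

set.seed(13)
test <- tibble(A = as.factor(1:10),
               B = as.factor(sample(c("Apple", "Banana"), 10, replace = TRUE)),
               C = as.factor(sample(c("Cut", "Mashed"), 10, replace = TRUE)),
               D = as.factor(sample(1:3, 10, replace = TRUE)))

# Assumed candidate values: 0.01, 0.02, ..., 100
values <- seq(0.01, 100, by = 0.01)

result <- test %>%
  group_by(B, C, D) %>%
  # One random draw per group; the single value is recycled to all rows in the group
  mutate(E = sample(values, 1)) %>%
  ungroup()

result
```

Different groups can still receive the same value by chance, since each group draws independently from `values`.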



