Thursday, 19 July 2018

Create a random group variable in long format

I'm working in R with a long-format dataset. I have one grouping variable (City), and every row is an individual with age, sex and some more information. I now want to create a variable that is either 0 or 1, and I want the split between 0 and 1 to be close to 50/50.

My data looks similar to this. The cities do not necessarily have the same number of rows.

   Sample    City    Age   Sex
   1        City_a   15     M
   2        City_a   27     F
   3        City_a   25     M
   4        City_b   20     M

And I want to get something like this:

   Sample    City    Age   Sex   Random_g
   1        City_a   15     M      0 
   2        City_a   27     F      0
   3        City_a   25     M      1
   4        City_b   20     M      1

I started with the following, but it didn't work because I set size to 1, and I don't know what else to set it to:

library(tidyverse)
# Attempt 1: one draw per city, so every row in a city gets the same value
df %>%
  group_by(City) %>%
  mutate(Random_g = sample(c(0, 1), replace = TRUE, size = 1))
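This minimal sketch shows why the first attempt fails and one possible fix. It is written in base R so that it runs without the tidyverse. Inside `group_by(City)`, `sample(..., size = 1)` returns a single value per city, and `mutate()` recycles that value to every row of the city. Setting `size = n()` would give one draw per row, but independent draws still do not guarantee a 50/50 split. The sketch instead shuffles a vector that is already balanced. The helper `balanced_01()` is hypothetical and not part of any package. `ave()` stands in for `group_by()` + `mutate()`. With dplyr loaded, the equivalent step would presumably be `mutate(Random_g = balanced_01(n()))` after `group_by(City)`.

```r
# Hypothetical helper: a shuffled 0/1 vector of length n whose counts differ by at most 1
balanced_01 <- function(n) {
  labels <- sample(c(0, 1))          # random label order, so an odd n gets a random extra label
  v <- rep(labels, length.out = n)   # alternating labels, balanced by construction
  v[sample.int(n)]                   # shuffle; sample.int() avoids sample()'s length-1 special case
}

# The four example rows from the question
df <- data.frame(
  Sample = 1:4,
  City   = c("City_a", "City_a", "City_a", "City_b"),
  Age    = c(15, 27, 25, 20),
  Sex    = c("M", "F", "M", "M")
)

# Balanced within each city: ave() applies the function to each City subset
# and puts the results back in the original row order
df$Random_g <- ave(numeric(nrow(df)), df$City,
                   FUN = function(z) balanced_01(length(z)))
print(df)
```

Each city is balanced on its own, which also keeps the overall split close to 50/50. A city with an odd number of rows gets one extra 0 or 1, chosen at random.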

I also tried the following, which runs, but then the 0/1 split can be anything:

# Attempt 2: an independent 0/1 draw for every row, with no control over the overall split
df %>%
  rowwise() %>%
  mutate(Random_g = sample(c(0, 1), replace = TRUE, size = 1))
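The second attempt flips an independent fair coin for every row. As a result, the number of 1s follows a Binomial(n, 0.5) distribution and can land far from half in a small dataset. `rowwise()` is also unnecessary here, because `sample(c(0, 1), size = nrow(df), replace = TRUE)` is statistically equivalent and vectorised. The following is a minimal sketch of a balanced alternative over the whole dataset, ignoring City. It assumes the same hypothetical helper `balanced_01()` (not from any package) and the four example rows from the question.

```r
# Hypothetical helper: a shuffled 0/1 vector of length n whose counts differ by at most 1
balanced_01 <- function(n) {
  labels <- sample(c(0, 1))          # random label order, so an odd n gets a random extra label
  v <- rep(labels, length.out = n)   # alternating labels, balanced by construction
  v[sample.int(n)]                   # shuffle; sample.int() avoids sample()'s length-1 special case
}

df <- data.frame(
  Sample = 1:4,
  City   = c("City_a", "City_a", "City_a", "City_b"),
  Age    = c(15, 27, 25, 20),
  Sex    = c("M", "F", "M", "M")
)

# What the rowwise() attempt does, vectorised: independent fair coin flips,
# so the number of 1s is random (anywhere from 0 to 4 here)
df$Random_flip <- sample(c(0, 1), size = nrow(df), replace = TRUE)

# Balanced over the whole dataset: counts of 0 and 1 differ by at most 1
df$Random_g <- balanced_01(nrow(df))

table(df$Random_flip)
table(df$Random_g)
```

If each city should also be split roughly evenly, apply the same helper within each City group instead of to the whole data frame.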
