Monday, 1 June 2020

Create a random binary variable for a subset of observations assigning 1 to a specific proportion of rows

I have a data frame:

library(tibble)
library(dplyr)

df <- tibble(
  id = 1:10,
  family = c("a", "a", "b", "b", "c", "d", "e", "f", "g", "h")
)

Each family has at most 2 members, so every row belongs either to an individual or to a pair.
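One way to tell individuals from pairs is to count rows per family. The sketch below assumes dplyr 1.0 or later; the column names `n_members` and `single` and the object `sizes` are hypothetical helpers, not part of the original data.

```r
library(tibble)
library(dplyr)

df <- tibble(
  id = 1:10,
  family = c("a", "a", "b", "b", "c", "d", "e", "f", "g", "h")
)

# add_count() appends the number of rows in each family (hypothetical
# column name n_members); size 1 means an individual, size 2 a pair
sizes <- df %>%
  add_count(family, name = "n_members") %>%
  mutate(single = n_members == 1)

sizes
```

Here `single` is TRUE exactly for ids 5 to 10.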

For individuals (families with only one row, i.e. id 5 to 10), I want to create a column called 'random' that randomly sets 50% of these rows to 1 and the rest to 0. All other rows (those belonging to two-member families) should be 0 as well.
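A possible sketch, assuming dplyr 1.0 or later: pool all one-member families into a single group, then shuffle a balanced vector of 1s and 0s across that group so that exactly half of the individuals get 1. The helper columns `n_members` and `single` are hypothetical and are dropped at the end; `set.seed()` is optional and only makes the draw reproducible.

```r
library(tibble)
library(dplyr)

df <- tibble(
  id = 1:10,
  family = c("a", "a", "b", "b", "c", "d", "e", "f", "g", "h")
)

set.seed(42)  # optional: only makes the random draw reproducible

out <- df %>%
  add_count(family, name = "n_members") %>%
  # one group holding every individual, another holding every pair member
  group_by(single = n_members == 1) %>%
  mutate(
    # individuals: permute a balanced 1/0 vector, so exactly half get 1
    # (rep() starts with 1, so an odd count rounds up);
    # pairs: a scalar 0, recycled to the group size
    random = if (first(single)) sample(rep(c(1, 0), length.out = n())) else 0
  ) %>%
  ungroup() %>%
  select(-n_members, -single)

out
```

With the six individuals here, three of ids 5 to 10 get 1. If an odd number of individuals should round down instead, `rep(c(0, 1), length.out = n())` gives `floor(n / 2)` ones.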

The result should look something like this (which rows get 1 will vary from run to run):

df <- tibble(
  id = 1:10,
  family = c("a", "a", "b", "b", "c", "d", "e", "f", "g", "h"),
  random = c(0, 0, 0, 0, 1, 0, 1, 1, 0, 0)
)

I am mostly using Tidyverse and would like to include it within a pipe.

I am currently trying something along the lines of...

df %>%
   group_by(family) %>% 
   mutate(random = if(n() == 1) *not sure what goes here* else 0)
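Within `group_by(family)`, each individual is its own one-row group, so that branch can only make an independent draw per row. A minimal sketch of that per-group version, assuming dplyr 1.0 or later (the object name `coin` is hypothetical), uses a fair coin flip via base R's `rbinom()`:

```r
library(tibble)
library(dplyr)

df <- tibble(
  id = 1:10,
  family = c("a", "a", "b", "b", "c", "d", "e", "f", "g", "h")
)

coin <- df %>%
  group_by(family) %>%
  # one-member families: an independent draw that is 1 with probability 0.5;
  # two-member families: 0L (integer, matching rbinom()'s integer output)
  mutate(random = if (n() == 1) rbinom(1, size = 1, prob = 0.5) else 0L) %>%
  ungroup()

coin
```

This gives 50% only on average: the number of 1s among the six individuals follows a Binomial(6, 0.5) distribution, so it can be anything from 0 to 6. Getting exactly half requires drawing across all individuals at once rather than one family at a time.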
