Monday, June 22, 2020

Assign a random integer to each group in a data frame

Given a data frame with three different observations for each individual, I'm trying to assign a unique random integer to each unique individual.

df <- data.frame(sample = 1:15, ID = rep(1:5, times = 3))

     sample ID
1       1  1
2       2  2
3       3  3
4       4  4
5       5  5
6       6  1
7       7  2
8       8  3
9       9  4
10     10  5
11     11  1
12     12  2
13     13  3
14     14  4
15     15  5

In this case, I want each ID to have a random value 'newvar' between 1 and 5.

I've tried

df %>% group_by(ID) %>% mutate(newvar = sample(5, n(), replace = FALSE))

which doesn't keep the new variable the same within each ID, and

df %>% group_by(ID) %>% mutate(newvar = sample.int(n()))

which gives a random number between 1 and 3 within each group, and

df %>% group_by(ID) %>% mutate(newvar = sample(5, replace = FALSE))

which doesn't work as it wants newvar to be size 1 or 3, not 5.

I've also tried using the levels of ID:

levels(df$ID) <- sample(length(levels(df$ID)))
df$newvar <- levels(df$ID)

This randomized the ID column itself and copied it to newvar on the test df:

    sample ID newvar
1       1  5      5
2       2  2      2
3       3  1      1
4       4  4      4
5       5  3      3
6       6  5      5
7       7  2      2
8       8  1      1
9       9  4      4
10     10  3      3
11     11  5      5
12     12  2      2
13     13  1      1
14     14  4      4
15     15  3      3

and on my full data set (918 observations of 306 individuals) it throws an error:

Error: Assigned data `value` must be compatible with existing data.
x Existing data has 918 rows.
x Assigned data has 306 rows.
ℹ Only vectors of size 1 are recycled.

Is there a way to get sample() to happen within the group_by command, or to get each level of ID assigned correctly to a random integer?
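
One possible approach, offered as a minimal sketch rather than a definitive answer: draw the random integers once per unique ID, outside of any grouping, and then map them back onto the rows with match(). The names ids and lookup are hypothetical helpers introduced for illustration, and the set.seed() call is only there so the sketch is reproducible.

```r
# Example data from the question: 5 individuals, 3 observations each
df <- data.frame(sample = 1:15, ID = rep(1:5, times = 3))

set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible

# One entry per individual (hypothetical helper name)
ids <- unique(df$ID)

# A random permutation of 1..(number of individuals), so every
# individual gets a distinct integer (hypothetical helper name)
lookup <- sample.int(length(ids))

# match() returns, for each row, the position of its ID in `ids`;
# indexing `lookup` with that position repeats the same random
# integer on every row belonging to the same individual
df$newvar <- lookup[match(df$ID, ids)]

print(df)
```

Because the lookup vector has one entry per individual and match() expands it to one entry per row, the assigned column always has as many values as the data frame has rows, whether that is 15 or 918. With dplyr loaded, the same lookup should also work inside an ungrouped mutate(newvar = lookup[match(ID, ids)]).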



