Given a data frame with three different observations for each individual, I'm trying to assign a unique random integer to each unique individual.
df <- data.frame(sample = 1:15, ID = rep(1:5, times = 3))
sample ID
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 1
7 7 2
8 8 3
9 9 4
10 10 5
11 11 1
12 12 2
13 13 3
14 14 4
15 15 5
In this case, I want every row of a given ID to share the same random value 'newvar' between 1 and 5, with no two IDs getting the same value.
I've tried
df %>% group_by(ID) %>% mutate(newvar = sample(5, n(), replace = FALSE))
which doesn't keep the new variable the same within each ID, and
df %>% group_by(ID) %>% mutate(newvar = sample.int(n()))
which gives a random number between 1 and 3 within each group, and
df %>% group_by(ID) %>% mutate(newvar = sample(5, replace = FALSE))
which fails because mutate() expects newvar to have size 1 or 3 (the group size), not 5.
I've also tried using the levels of ID:
levels(df$ID) <- sample(length(levels(df$ID)))
df$newvar <- levels(df$ID)
On the test df, this randomized the ID column itself and wrote it to newvar:
sample ID newvar
1 1 5 5
2 2 2 2
3 3 1 1
4 4 4 4
5 5 3 3
6 6 5 5
7 7 2 2
8 8 1 1
9 9 4 4
10 10 3 3
11 11 5 5
12 12 2 2
13 13 1 1
14 14 4 4
15 15 3 3
and on my full data set (918 observations of 306 individuals) it throws an error:
Error: Assigned data `value` must be compatible with existing data.
x Existing data has 918 rows.
x Assigned data has 306 rows.
ℹ Only vectors of size 1 are recycled.
Is there a way to get sample() to happen within the group_by command, or to get each level of ID assigned correctly to a random integer?
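One possible approach, sketched here in base R rather than inside group_by(), is to draw a single permutation with one value per unique ID and then map it back onto the rows with match(). The names `ids` and `lookup` are made up for this illustration, and set.seed() is only there so the sketch is reproducible.

```r
# Toy data from the question: 5 individuals, 3 observations each
df <- data.frame(sample = 1:15, ID = rep(1:5, times = 3))

set.seed(42)  # assumption: only for reproducibility; drop it for a fresh draw

# One random integer per unique ID, drawn without replacement
ids <- unique(df$ID)            # hypothetical helper name
lookup <- sample(length(ids))   # hypothetical helper name; a permutation of 1..5 here

# match() returns each row's position in `ids`, which then indexes `lookup`,
# so all rows with the same ID receive the same value
df$newvar <- lookup[match(df$ID, ids)]
```

Because the random draw happens once, outside any grouping, the result always has one value per row, which should also avoid the size mismatch (918 vs. 306) on the full data set. In a dplyr pipeline the same lookup could presumably be applied without group_by(), e.g. mutate(newvar = lookup[match(ID, ids)]).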