I have data like this, where Average is the average of X, Y, and Z.
head(df)
ID X Y Z Average
A 2 2 5 3
A 4 3 2 3
A 4 3 2 3
B 5 3 1 3
B 3 4 2 3
B 1 5 3 3
C 5 3 1 3
C 2 3 4 3
C 5 3 1 3
D 2 3 4 3
D 3 2 4 3
D 3 2 4 3
E 5 3 1 3
E 4 3 2 3
E 3 4 2 3
To reproduce this, we can use
df <- data.frame(
  ID = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "E"),
  X = c(2L, 4L, 4L, 5L, 3L, 1L, 5L, 2L, 5L, 2L, 3L, 3L, 5L, 4L, 3L),
  Y = c(2L, 3L, 3L, 3L, 4L, 5L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 4L),
  Z = c(5L, 2L, 2L, 1L, 2L, 3L, 1L, 4L, 1L, 4L, 4L, 4L, 1L, 2L, 2L),
  Average = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)
)
From this, I want to extract one observation per ID so that, as far as possible, the selected rows do not repeat the same combination of X, Y, and Z. I tried:
library(dplyr)
df %>% sample_n(size = nrow(.), replace = FALSE) %>% distinct(ID, .keep_all = TRUE)
But on a larger dataset, I see too many repetitions of the same combination of X, Y, and Z. To the extent possible, I need output in which each combination of X, Y, and Z is represented equally, or close to equally, like this:
ID X Y Z Average
A 2 2 5 3
B 5 3 1 3
C 2 3 4 3
D 3 2 4 3
E 4 3 2 3
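To make the goal concrete, here is a minimal base R sketch of one possible greedy approach. It is only an illustration of what I am after, not a known solution. `pick_balanced()` and its arguments are hypothetical names I made up. It visits each ID in turn and picks a row whose X/Y/Z combination has been chosen least often so far, breaking ties at random. Being greedy, it does not guarantee the most even split possible.

```r
# Rebuild the example data so the sketch runs on its own.
df <- data.frame(
  ID = rep(c("A", "B", "C", "D", "E"), each = 3),
  X = c(2L, 4L, 4L, 5L, 3L, 1L, 5L, 2L, 5L, 2L, 3L, 3L, 5L, 4L, 3L),
  Y = c(2L, 3L, 3L, 3L, 4L, 5L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 4L),
  Z = c(5L, 2L, 2L, 1L, 2L, 3L, 1L, 4L, 1L, 4L, 4L, 4L, 1L, 2L, 2L),
  Average = 3L
)

# pick_balanced() is a hypothetical helper, not part of dplyr or base R.
pick_balanced <- function(data, id_col = "ID", value_cols = c("X", "Y", "Z")) {
  # Label each row with its combination of values, e.g. "2-2-5".
  combo <- do.call(paste, c(data[value_cols], sep = "-"))
  # Running count of how often each combination has been selected.
  used <- setNames(integer(length(unique(combo))), unique(combo))

  id_vec <- as.character(data[[id_col]])
  ids <- unique(id_vec)
  # Number of distinct combinations available to each ID.
  n_options <- vapply(
    ids,
    function(id) length(unique(combo[id_vec == id])),
    integer(1)
  )

  chosen <- integer(0)
  # Handle the most constrained IDs (fewest distinct options) first.
  for (id in ids[order(n_options)]) {
    rows <- which(id_vec == id)
    counts <- used[combo[rows]]
    # Candidate rows whose combination has been used least so far.
    best <- rows[counts == min(counts)]
    pick <- best[sample.int(length(best), 1)]
    used[combo[pick]] <- used[combo[pick]] + 1L
    chosen <- c(chosen, pick)
  }
  # Return the selected rows in their original order.
  data[sort(chosen), ]
}

set.seed(1)  # only to make the random tie-breaking repeatable
result <- pick_balanced(df)
result
```

On a larger dataset, `table(do.call(paste, c(result[c("X", "Y", "Z")], sep = "-")))` shows how evenly the combinations ended up being represented.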