Saturday, July 10, 2021

Randomly deleting duplicate observations

This refers to a data set of 2346 observations with household IDs and individual data. In 340 households more than one individual is included (varying from 2 up to 5). As individuals from the same household are more similar than individuals from different households, I need to take out the duplicate observations within households, to generate a data set with one observation per household, i.e. 1931 observations from unique households (the 340 multi-person households contribute 2346 − 1931 = 415 surplus rows).
I have applied:

```{r}
# Sort by household id, person id and DDS
DDW_2020_test <- with(DDW_2020, DDW_2020[order(hhid, pid, DDS), ])
# Keep only the first observation of each household
DDW_2020_test[!duplicated(DDW_2020_test$hhid), ]
```
This indeed gives me a new data set with 1931 observations from unique households; however, the duplicates are not removed randomly: the first observation of each household is kept and the subsequent ones are deleted.
How can I take out the duplicate observations from each household in a random way, so that the retained observation is chosen at random?
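A minimal sketch of one possible approach (not part of the original question), assuming the `DDW_2020` data frame and the `hhid` column used above: shuffle the rows before deduplicating, so that `!duplicated()` keeps a randomly chosen member of each household instead of the first one.

```{r}
# Sketch: shuffle the rows, then keep the first (now random) row per household.
# set.seed() only makes the random draw reproducible; the seed value is arbitrary.
set.seed(123)
DDW_2020_shuffled <- DDW_2020[sample(nrow(DDW_2020)), ]
DDW_2020_random <- DDW_2020_shuffled[!duplicated(DDW_2020_shuffled$hhid), ]
nrow(DDW_2020_random)  # expected: 1931, one row per household
```

With dplyr, `DDW_2020 %>% group_by(hhid) %>% slice_sample(n = 1) %>% ungroup()` should give an equivalent result.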
 


