Thursday, 12 September 2019

Replace NA with sample() by group

I would like to gap-fill missing data by using sample() to draw values from the same group that each missing value belongs to.
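Below is a minimal base-R sketch of the general idea. It uses a hypothetical toy data frame (`toy`, with columns `grp` and `val`) rather than the question's data. `ave()` applies a function separately within each group. Here that function replaces the group's NAs with draws, with replacement, from the group's own observed values. The sketch draws indices with `sample.int()` instead of calling `sample()` on the values directly, because `sample(x)` on a single number `x` samples from `1:x`. That would silently give wrong values in a group with only one observation.

```r
set.seed(1)

# Hypothetical toy data: two groups, each with some missing values
toy <- data.frame(grp = c("a", "a", "a", "a", "b", "b", "b"),
                  val = c(1, 2, NA, NA, 100, NA, 300))

# ave() calls FUN once per level of grp and reassembles the results in row order
toy$filled <- ave(toy$val, toy$grp, FUN = function(v) {
  pool <- v[!is.na(v)]
  if (length(pool) == 0) return(v)  # nothing observed in this group: leave NAs
  # one independent draw per missing value, taken only from this group's pool
  v[is.na(v)] <- pool[sample.int(length(pool), sum(is.na(v)), replace = TRUE)]
  v
})

toy
```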

Here's what I've tried so far:

Sample data

> dput(droplevels((example)))
structure(list(LENGTH = c(0, 7193.48815617057, 1571.95459212405, 
18191.381972185, 20366.2132412031, 10014.987524596, 1403.72511829297, 
5651.17842991513, 6848.03271105711, 8043.32937011393, 8926.65133418451, 
5808.44456603825, 2208.14264175252, 1797.4936747033, 5325.76651327694, 
2660.66730207955, 5844.07912541444, 3956.40473896271, 959.873314407621, 
3294.01472360025, 5221.94864001864, 3781.51913857335, 7811.83819953768, 
3387.20323328623, 5514.92099458441, 5792.54371531706, 5643.98385143961, 
15478.916809379, 8401.66533205217, 7046.25074819247, 2734.73639821402, 
10562.0938581209, 62332.3343404513, 0), NUMPOINTS = c(0, 2, 0, 
9, 0, 0, 0, 3, 1, 0, 6, 1, 1, 0, 0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 
0, 1, 0, 0, 0, 1, 0, 4, 10, 0), CTRY_ = structure(c(1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L), .Label = c("WCY_____ES", 
"WCY_____FR", "WCY_____IT"), class = "factor"), Outlet = structure(c(2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("DSO0045543", "ESO0244476"), class = "factor")), row.names = c(NA, 
-34L), class = "data.frame") 

Sample code

library(dplyr)

example %>% 
  # blank out LENGTH and NUMPOINTS for every country other than FR
  mutate(NUMPOINTS = if_else(CTRY_ != 'WCY_____FR', NA_real_, NUMPOINTS),
         LENGTH = if_else(CTRY_ != 'WCY_____FR', NA_real_, LENGTH)) %>%
  # keep FR values; otherwise try to draw a replacement from the same Outlet
  mutate(nuLENGTH = if_else(CTRY_ == 'WCY_____FR', LENGTH, sample(LENGTH[!is.na(LENGTH) & Outlet == Outlet], 1, TRUE)),
         nuNUMPOINTS = if_else(CTRY_ == 'WCY_____FR', NUMPOINTS, sample(NUMPOINTS[!is.na(NUMPOINTS) & Outlet == Outlet], 1, TRUE))) 

Within each group, I expect the values of nuLENGTH and nuNUMPOINTS in rows where CTRY_ != 'WCY_____FR' to differ from one another rather than all being identical. So far, I can't even get the nuLENGTH or nuNUMPOINTS values to differ between groups.
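Two things in the attempt above appear to explain the identical values. First, `sample(..., 1, TRUE)` returns a single value, and `if_else()` recycles a length-one `false` argument to every row, so all filled rows get the same draw. Second, `Outlet == Outlet` compares each element with itself, so it is `TRUE` on every row and does not restrict the pool to the current row's group. The sketch below shows one possible fix under stated assumptions. It uses dplyr, groups by `Outlet` so that each group's draws come only from that group, and draws one value per row. `fill_na_by_sample()` and `example_toy` are hypothetical names, and the data frame is a small made-up stand-in that reuses the question's column names.

```r
library(dplyr)

# Hypothetical helper: fill NAs in x with draws (with replacement) from the
# non-missing values of x; returns x unchanged if there is nothing to draw from.
fill_na_by_sample <- function(x) {
  pool <- x[!is.na(x)]
  if (length(pool) == 0) return(x)
  # one draw per element; sample.int() avoids sample()'s 1:x behaviour on length-1 input
  draws <- pool[sample.int(length(pool), length(x), replace = TRUE)]
  if_else(is.na(x), draws, x)
}

# Made-up stand-in data with the question's column names
example_toy <- data.frame(
  LENGTH    = c(0, 10, 20, 30, 40, 50),
  NUMPOINTS = c(0, 1, 2, 3, 4, 5),
  CTRY_     = c("WCY_____ES", "WCY_____FR", "WCY_____FR", "WCY_____FR",
                "WCY_____IT", "WCY_____IT"),
  Outlet    = c("ESO0244476", rep("DSO0045543", 5)),
  stringsAsFactors = FALSE
)

set.seed(42)
filled <- example_toy %>%
  # blank out every non-FR value, as in the original attempt
  mutate(NUMPOINTS = if_else(CTRY_ != "WCY_____FR", NA_real_, NUMPOINTS),
         LENGTH    = if_else(CTRY_ != "WCY_____FR", NA_real_, LENGTH)) %>%
  # the next mutate() is evaluated separately for each Outlet
  group_by(Outlet) %>%
  mutate(nuLENGTH    = fill_na_by_sample(LENGTH),
         nuNUMPOINTS = fill_na_by_sample(NUMPOINTS)) %>%
  ungroup()

filled
```

In this toy frame, the single `ESO0244476` row has no observed value left once the non-FR values are blanked, so the helper leaves it as `NA`. The same applies to that outlet in the question's data, where it also has only one row.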
