Tuesday, July 2, 2019

Why does a tbl_df remain grouped after filling NAs with na.locf?

library(dplyr)
library(zoo)

df_a <- iris %>%
    group_by(Species) %>%
    summarise(mean_petal_length = mean(Petal.Length))
sample_n(df_a, 2)

This returns 2 random rows of the summarised iris data, as expected, even though there is only one row per Species group.

However, the example below behaves differently.
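One way to see why this call works is to check whether df_a is still grouped. The snippet below is a minimal sketch using dplyr's is_grouped_df() and group_vars(); it relies on summarise() dropping the last grouping variable, which with a single grouping variable leaves an ungrouped result.

```r
library(dplyr)

df_a <- iris %>%
    group_by(Species) %>%
    summarise(mean_petal_length = mean(Petal.Length))

# summarise() peels off the last grouping level; with only Species
# as a grouping variable, nothing is left to group by.
is_grouped_df(df_a)
group_vars(df_a)

# sample_n() therefore draws from all 3 rows at once.
nrow(sample_n(df_a, 2))
```

If the result were still grouped by Species, each group would hold a single row and asking for 2 rows would fail.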

df_b <- iris %>%
    group_by(Species) %>%
    mutate(Petal.Length = na.locf(Petal.Length))
# Now df_b has the same data contents as iris,
# since there are no missing values in Petal.Length
sample_n(df_b, 60)

I expected to get 60 random rows of df_b, but this gives me an error message: "size must be less or equal than 50 (size of data), set replace = TRUE to use sampling with replacement."

I can see that this is because there are only 50 rows per Species group, and that I have to ungroup after mutate to get the output I expected. Still, I don't understand why the two cases differ.
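The ungroup workaround mentioned above can be sketched as follows. This is only an illustration of the behaviour described in the question, not a definitive explanation; df_b_flat is a hypothetical name introduced here for the ungrouped copy.

```r
library(dplyr)
library(zoo)

df_b <- iris %>%
    group_by(Species) %>%
    mutate(Petal.Length = na.locf(Petal.Length))

# mutate() keeps the grouping it was given.
group_vars(df_b)

# On a grouped data frame, sample_n() samples within each group,
# so size = 20 yields 20 rows for each of the 3 species.
nrow(sample_n(df_b, 20))

# df_b_flat is a hypothetical name for the ungrouped copy;
# without groups, 60 rows can be drawn from all 150.
df_b_flat <- ungroup(df_b)
nrow(sample_n(df_b_flat, 60))
```

Calling group_vars() after each step of a pipeline is a quick way to see which verbs keep the grouping and which drop it.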
