library(dplyr)
library(zoo)
df_a <- iris %>%
  group_by(Species) %>%
  summarise(mean_petal_length = mean(Petal.Length))
sample_n(df_a, 2)
This returns 2 random rows of the summarized iris data, as expected, even though there is only one row per group (Species).
However, the example below seems to behave differently.
df_b <- iris %>%
  group_by(Species) %>%
  mutate(Petal.Length = na.locf(Petal.Length))
# Now df_b has the same data contents as iris,
# since there are no missing values in Petal.Length
sample_n(df_b, 60)
I expected to get 60 random rows of df_b, but this gives me an error message: "size must be less or equal than 50 (size of data), set replace = TRUE to use sampling with replacement."
I can see this is because there are only 50 rows per Species group, and that I have to ungroup after mutate in this case to get my expected output. Still, I don't understand why the two cases behave differently.
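For reference, here is a minimal sketch of the ungroup workaround mentioned above. It rebuilds df_a and df_b exactly as in the examples; the is_grouped_df() and group_vars() checks are my own addition, included only to inspect which of the two results still carries grouping. The variable name sampled is just illustrative.

```r
library(dplyr)
library(zoo)

# Rebuild the two data frames from the examples above
df_a <- iris %>%
  group_by(Species) %>%
  summarise(mean_petal_length = mean(Petal.Length))

df_b <- iris %>%
  group_by(Species) %>%
  mutate(Petal.Length = na.locf(Petal.Length))

# Inspect whether each result is still grouped
is_grouped_df(df_a)  # FALSE: summarise() dropped the single grouping level
is_grouped_df(df_b)  # TRUE: mutate() kept the Species grouping
group_vars(df_b)     # "Species"

# Workaround: drop the grouping first, so sample_n() draws from all 150 rows
sampled <- df_b %>%
  ungroup() %>%
  sample_n(60)

nrow(sampled)        # 60
```

If these checks are right, sample_n(df_a, 2) samples from an ungrouped 3-row table, while sample_n(df_b, 60) tries to draw 60 rows from each 50-row Species group, which would account for the error.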