The problem: assigning each of the 100 rows in one data frame (A) an IntHours value sampled from a different data frame (B), without using a loop.
I have a summary data frame which is:
C <- data.frame(IntHours = c(1, 2, 3), HoursCount = c(274, 50, 46))
The IntHours values come from B, which has IntHours values up to 8. I only need the values 1 through 3, and I do not require the other columns from B. C is the actual filtered, grouped, and summarised data from B.
How do I take a random sample of 100 values of 1, 2, and 3, without replacement, from C? HoursCount gives the number of underlying rows in B for each of the values 1, 2, and 3.
I know how to sample from C using a loop with vectors and an index, and how to expand C into 370 rows and randomly sample from that, treating IntHours as a grouped variable.
But how can I directly sample 100 IntHours values without doing any expansion? The HoursCount value is treated strictly as a weight, not as a count of replicates. So slice_sample() in dplyr will only return the three rows, in descending order of HoursCount. The base R sample() fails, as expected, with an error saying there are not enough rows to provide a sample of 100 when sampling without replacement.
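To make the two behaviours described above concrete, here is a minimal sketch using only base R. It assumes the weights are passed through the prob argument of sample(); the names res and w are introduced purely for illustration.

```r
C <- data.frame(IntHours = c(1, 2, 3), HoursCount = c(274, 50, 46))

# C has only three values, so asking for 100 without replacement errors.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(
  sample(C$IntHours, size = 100, replace = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)

# With prob =, HoursCount acts as a weight only: at most three values
# can be drawn without replacement, one per row of C.
w <- sample(C$IntHours, size = 3, replace = FALSE, prob = C$HoursCount)
print(w)
```

In both cases the population that sample() sees is the three rows of C, not the 370 underlying rows.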
Desired outcome: construct a 1-column data frame of 100 rows, consisting of the sampled IntHours values, without using a loop and using sampling without replacement. I will then use bind_cols() to attach it to the 100-row data frame for which I need these values.
I'm still writing my package (!) and I'm trying to keep the code as short as possible. This includes removing all non-essential loops but also using code that is easy to read.
Is there a direct way of doing this? I've searched with the [R] and [sample] tags, and I can't find anyone who wanted to sample from a summary table/data frame without expanding the summary data first. A Google search returned only pandas answers.
Edited: this is one approach. Expand the data and then use slice_sample() on it.
library(dplyr)
# Expand C to one row per underlying observation (370 rows)
D <- data.frame(IntHours = rep(c(1, 2, 3), times = c(274, 50, 46)))
# Draw 100 rows without replacement
E <- D %>%
  slice_sample(n = 100, replace = FALSE)
This gives the random sample of 100. But is there a way of doing this directly from C?
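One possible direction, offered as a sketch rather than a definitive answer: instead of materialising the 370 rows, draw 100 distinct row positions between 1 and 370 and map each position back to its IntHours group through the cumulative counts. This uses only base R (sample.int(), cumsum(), findInterval()); the names upper, pos, and grp are made up for this illustration, and set.seed() is there only to make the example reproducible.

```r
C <- data.frame(IntHours = c(1, 2, 3), HoursCount = c(274, 50, 46))

set.seed(1)  # illustration only

# Last row position of each group if the rows were laid out in order:
# 274, 324, 370
upper <- cumsum(C$HoursCount)

# 100 distinct positions out of 370, i.e. sampling without replacement
pos <- sample.int(sum(C$HoursCount), size = 100)

# Map positions to groups: 1..274 -> 1, 275..324 -> 2, 325..370 -> 3
grp <- findInterval(pos, upper, left.open = TRUE) + 1

# One-column data frame of 100 sampled IntHours values
E <- data.frame(IntHours = C$IntHours[grp])
```

Because the positions are distinct, no group can be drawn more often than its HoursCount, which is the same constraint the expanded version enforces. E could then be passed to bind_cols() alongside the 100-row data frame.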