Thursday, 3 December 2020

How do I randomly select rows from a data frame and delete each row as it is selected?

I'm randomly sampling without replacement from a data frame that consists of a single column. This column contains duplicated numeric values.

I'm using dplyr to do this. My data from which I need to sample looks like:

testSO <- data.frame(ToSample = round(runif(100, min = 1, max = 3), 0))

I use the code below to randomly sample 15 rows:

library(dplyr)
MyRandomSample <- testSO %>%
  slice_sample(n = 15, replace = FALSE)

Is there a direct way to remove these 15 sampled rows from testSO as they are selected? slice_sample must be picking row indices under the hood, but I can't find a way to get a list of those indices out of it. If I could, I would simply delete the rows of testSO that match them.
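To make the idea concrete, here is a minimal sketch of the "keep the row indices" approach. It assumes it is acceptable to add a temporary id column before sampling. The names row_id, tagged, sampled, remaining and testSO_left are hypothetical, and the seed is there only to make the sketch reproducible.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
testSO <- data.frame(ToSample = round(runif(100, min = 1, max = 3), 0))

# Tag every row with a temporary id, so that rows holding the same
# (duplicated) value can still be told apart after sampling.
tagged <- testSO %>% mutate(row_id = row_number())

# Draw the sample from the tagged data; the ids of the drawn rows travel along.
sampled <- tagged %>% slice_sample(n = 15, replace = FALSE)

# Keep only the rows whose id was not drawn.
remaining <- tagged %>% anti_join(sampled, by = "row_id")

# Drop the helper column once both pieces exist.
MyRandomSample <- sampled %>% select(-row_id)
testSO_left <- remaining %>% select(-row_id)
```

Joining on the id rather than on ToSample matters here: because the values are duplicated, an anti_join on ToSample itself would remove every row that shares a value with any sampled row.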

The real testSO data has some ordering effects, which is why I am using slice_sample instead of slice_head.

I could reorder testSO randomly and then use slice_head. But is there a method for drawing a sample and deleting the sampled rows at the same time? I found a base R method using -sample that deletes rows from the data frame, but it doesn't pass the deleted rows to another object.
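For comparison, a base R sketch of what "sample and delete in one go" might look like if the drawn indices are stored first instead of being negated inline. The name idx is hypothetical, and the seed is an assumption for reproducibility only.

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
testSO <- data.frame(ToSample = round(runif(100, min = 1, max = 3), 0))

# Draw 15 distinct row positions (sample() is without replacement by default).
idx <- sample(nrow(testSO), 15)

# Use the same index vector twice: once to extract, once to delete.
# drop = FALSE keeps the result a data frame even though there is one column.
MyRandomSample <- testSO[idx, , drop = FALSE]
testSO <- testSO[-idx, , drop = FALSE]
```

Run again, the last three statements would draw a further 15 rows from the 85 that are left, so no row can be sampled twice.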



