Saturday, 18 April 2020

Draw a random sample without replacement based on a strict range in R

I'm trying to draw a random sample of rows without replacement from a dataset such that the sum of one column in the sample is strictly within a given range. For the example dataset mtcars, the sum of mpg in the random sample should be strictly between 90 and 100.

A reproducible example:

library(dplyr)  # provides %>%, sample_n(), summarise() and pull()
data("mtcars")

random_sample <- function(dataset){
  final_mpg = 0
  while (final_mpg < 100) {
    basic_dat <- dataset %>%
      sample_n(1) %>%
      ungroup()
    total_mpg <- basic_dat %>%
      summarise(mpg = sum(mpg)) %>%
      pull(mpg)
    final_mpg <- final_mpg + total_mpg
    if (final_mpg > 90 & final_mpg < 100){
      break()
    }
    final_dat <- rbind(get0("final_dat"), get0("basic_dat"))
  }
  return(final_dat)
}

chosen_sample <- random_sample(mtcars)

But this function outputs samples with sum(mpg) > 100. How do I ensure that every sample it generates has a sum strictly within that range? Any help is much appreciated.
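For reference, here is a minimal sketch of one possible approach, not necessarily the best one. The loop above seems to add each drawn row to the running total before checking the range, so a single large mpg value can carry the total from below 90 to above 100. It also calls sample_n(1) on the full dataset every time, so the same row can be drawn more than once. The sketch below instead shuffles all rows once (which guarantees no replacement), takes the cumulative sum, and keeps the first prefix whose total lies strictly inside the range, reshuffling if the running total jumps over the window. The function name sample_within_range and its arguments (col, lower, upper, max_tries) are hypothetical and chosen only for illustration. Only base R is used.

```r
# Hypothetical helper: the name and arguments are illustrative only.
sample_within_range <- function(dataset, col = "mpg", lower = 90, upper = 100,
                                max_tries = 1000) {
  for (attempt in seq_len(max_tries)) {
    # Shuffle all rows once, so no row can appear twice in the sample.
    shuffled <- dataset[sample(nrow(dataset)), , drop = FALSE]
    running <- cumsum(shuffled[[col]])
    # Prefix lengths whose total lies strictly inside (lower, upper).
    hits <- which(running > lower & running < upper)
    if (length(hits) > 0) {
      return(shuffled[seq_len(hits[1]), , drop = FALSE])
    }
    # Otherwise the running total skipped the window: reshuffle and retry.
  }
  stop("no sample found within the range after ", max_tries, " attempts")
}

set.seed(42)  # assumption: fixed seed, only to make the example reproducible
chosen_sample <- sample_within_range(mtcars)
sum(chosen_sample$mpg)  # strictly between 90 and 100
```

Note that this rejection-style scheme does not draw uniformly from all subsets that satisfy the constraint; it only guarantees that whatever it returns has no repeated rows and a sum strictly inside the range.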



