Sunday, July 31, 2022

Sample Multiple Columns Without Repeats in R (dplyr)

I am trying to draw a weighted random sample with 5 observations per row. For each observation I draw a color from a set of possible colors, exclude that color from the next draw, and repeat. A color can appear more than once in a given column, but not more than once in the same row.
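To make the procedure concrete, here is a minimal sketch of the draw for a single row, written as an explicit loop in base R. The helper name `draw_without_repeats` and the shortened color list are hypothetical and only for illustration.

```r
# Hypothetical helper: draw k distinct items one at a time,
# removing each drawn item (and its weight) from the pool.
draw_without_repeats <- function(items, weights, k) {
  stopifnot(length(items) == length(weights), k <= length(items))
  picked <- character(0)
  for (i in seq_len(k)) {
    # Pick one index, weighted by the remaining weights
    idx <- sample.int(length(items), size = 1, prob = weights)
    picked <- c(picked, items[idx])
    # Exclude the drawn item from the next draw
    items <- items[-idx]
    weights <- weights[-idx]
  }
  picked
}

set.seed(1)
example_colors  <- c("red", "blue", "white", "yellow", "green", "orange")
example_weights <- c(0.85, 0.85, 0.75, 0.75, 0.65, 0.5)
one_row <- draw_without_repeats(example_colors, example_weights, k = 5)
```

The weights do not need to sum to 1; `sample.int()` treats `prob` as relative weights.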

Here is how I have approached the problem:

library(tidyverse)

data <- tibble(obsId = 1:100)

colors <- tibble(color = c('red', 'blue', 'white', 'yellow', 'green', 'orange', 
                           'gray', 'brown', 'purple', 'black', 'pink', 'navy', 
                           'maroon'), 
                  prob = c(0.85, 0.85, 0.75, 0.75, 0.65, 0.5, 0.5, 0.5, 0.4, 
                           0.4, 0.25, 0.15, 0.15))

# Each column is sampled independently of the others, so nothing
# prevents the same color from appearing twice in one row.
data <- data %>% 
      mutate(color1 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = TRUE),
             color2 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = TRUE),
             color3 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = TRUE),
             color4 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = TRUE),
             color5 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = TRUE))

The issue is that in some rows color2 equals color1 (and likewise for the later columns). Is there an easy way to prevent this?
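One possible approach, offered as a sketch rather than a definitive answer: sample per row instead of per column. Calling `sample()` with `replace = FALSE` and a `prob` vector draws items one at a time and removes each drawn item before the next draw, which matches the exclusion rule described above. The names `n_obs`, `n_draws`, and `draws` are introduced here for illustration.

```r
library(tidyverse)

set.seed(42)

colors <- tibble(color = c('red', 'blue', 'white', 'yellow', 'green', 'orange',
                           'gray', 'brown', 'purple', 'black', 'pink', 'navy',
                           'maroon'),
                 prob = c(0.85, 0.85, 0.75, 0.75, 0.65, 0.5, 0.5, 0.5, 0.4,
                          0.4, 0.25, 0.15, 0.15))

n_obs   <- 100  # number of rows
n_draws <- 5    # colors per row

# One weighted draw of 5 distinct colors per row.
# replicate() returns a 5 x 100 matrix, so transpose it to 100 x 5.
draws <- t(replicate(n_obs,
                     sample(x = colors$color, size = n_draws,
                            prob = colors$prob, replace = FALSE)))
colnames(draws) <- paste0("color", seq_len(n_draws))

data <- bind_cols(tibble(obsId = seq_len(n_obs)), as_tibble(draws))
```

Rows are still independent of each other, so a color can repeat within a column, but never within a row.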



