I am trying to simulate some data by sampling in multiple steps.
The first step (create x) works fine.
In the second step, I want to create the variable y by sampling from different vectors based on the value of x.
My code runs without errors, but it does not do what I want: it samples only one value for, e.g., x == "A", and then reuses that value for every row where x == "A". I want it to sample once for each row where x == "A".
Code:
library(tidyverse)
set.seed(1)
data <- tibble(
  x = sample(c("A", "B", "C"), size = 10000, prob = c(0.1, 0.2, 0.7), replace = TRUE),
  y = case_when(
    x == "A" ~ sample(c("A1", "A2", "A3"), size = 1, prob = c(0.3, 0.4, 0.3)),
    x == "B" ~ sample(c("B1", "B2", "B3"), size = 1, prob = c(0.3, 0.4, 0.3)),
    x == "C" ~ sample(c("C1", "C2", "C3"), size = 1, prob = c(0.3, 0.4, 0.3)),
  ))
unique(data$x)
[1] "C" "A" "B"
unique(data$y)
[1] "C1" "A2" "B3"
If the code worked as intended, unique(data$y) should return something similar to [1] "A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3"
I know the problem is the size = 1 argument in sample(), but what can I replace it with? Removing it returns the error:
Error: `x == "A" ~ sample(c("A1", "A2", "A3"), prob = c(0.3, 0.4, 0.3))` must be length 100 or one, not 3
I have also tried size = nrow(.data) and size = nrow(.), but those return errors as well.
Is there a simple solution to this?
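One possible approach, offered as a sketch rather than the definitive answer: case_when() evaluates each right-hand side once for the whole column, so a length-one result from sample(..., size = 1) is recycled to every matching row. Drawing as many values as there are rows, with replace = TRUE, gives each row its own draw, and case_when() then keeps only the draw from the matching branch. The helper variable n below is an assumption introduced for illustration; it is not in the original code.

```r
library(tidyverse)
set.seed(1)

# Hypothetical helper: number of rows, matching size = 10000 in the question.
n <- 10000

data <- tibble(
  x = sample(c("A", "B", "C"), size = n, prob = c(0.1, 0.2, 0.7), replace = TRUE),
  # Each branch draws n values, one per row; case_when() picks, row by row,
  # the value from the branch whose condition is TRUE.
  y = case_when(
    x == "A" ~ sample(c("A1", "A2", "A3"), size = n, prob = c(0.3, 0.4, 0.3), replace = TRUE),
    x == "B" ~ sample(c("B1", "B2", "B3"), size = n, prob = c(0.3, 0.4, 0.3), replace = TRUE),
    x == "C" ~ sample(c("C1", "C2", "C3"), size = n, prob = c(0.3, 0.4, 0.3), replace = TRUE)
  )
)

# Inspect which values of y were drawn.
sort(unique(data$y))
```

This draws more values than are strictly needed (n per branch, of which only the matching rows are kept), which seems an acceptable trade-off for a simulation of this size. Since later columns in tibble() can refer to earlier ones, size = length(x) should work in place of n as well.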