I have the following dataframes "df1" and "df2":
x1 <- c(1,1,1,2,2,3)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)
y <- c(0,1,2)
p <- c(0.1,0.6,0.9)
df2 <- data.frame(y,p)
I want to derive a new vector df1$x2 from df1$x1 based on a random experiment. With a fixed probability, this can be done using the following function and "lapply" on the vector df1$x1:
example_function <- function(x, p) {
  if (runif(1) <= p) return(x + 1)
  return(x)
}
set.seed(123)
df1$x2 <- unlist(lapply(df1$x1,example_function,0.5))
The function draws a uniform random number and compares it with a given probability p. Depending on the result, the value stored in df1$x2 is either x unchanged or x + 1.
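For a fixed p, the same experiment can probably also be written without lapply. The following is only a minimal sketch and not part of my original code: vectorized_update is a hypothetical name, and it assumes p is either a single value or a vector of the same length as x.

```r
# Hypothetical vectorized variant of example_function (the name is an assumption).
vectorized_update <- function(x, p) {
  # One uniform draw per element of x; TRUE is coerced to 1 and FALSE to 0,
  # so each element is increased by 1 with probability p.
  x + (runif(length(x)) <= p)
}

x1 <- c(1, 1, 1, 2, 2, 3)
set.seed(123)
x2 <- vectorized_update(x1, 0.5)
```

As far as I can tell, runif(length(x)) consumes the random numbers in the same order as repeated runif(1) calls, so with the same seed this should agree with the lapply version.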
In the procedure described above, "p" was set manually (here 0.5 for all x-values in df1). However, I want p to be chosen automatically for each entry of df1$x1, depending on the df1$y1 value in the same row. This is where df2 comes into play: it shows which p-value belongs to which y-value. For example, df1$x1[3] equals 1 and the corresponding y-value df1$y1[3] also equals 1. df2 shows that the associated p-value is 0.6 (the p-value for y equal to 1), so p = 0.6 should be used in "example_function" to determine the corresponding value of df1$x2. Following df1$y1, p should be 0.1 for df1$x1[1] and df1$x1[2], 0.6 for df1$x1[3] and df1$x1[4], and 0.9 for df1$x1[5] and df1$x1[6].
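To make the intended mapping concrete, here is a small sketch of the lookup on its own, assuming every y-value occurs exactly once in df2. The names idx and p_row are only used for illustration.

```r
df1 <- data.frame(x1 = c(1, 1, 1, 2, 2, 3), y1 = c(0, 0, 1, 1, 2, 2))
df2 <- data.frame(y = c(0, 1, 2), p = c(0.1, 0.6, 0.9))

# match() returns, for each element of df1$y1, its position in df2$y
# (NA if a y-value is missing from df2).
idx <- match(df1$y1, df2$y)
stopifnot(!anyNA(idx))

# One probability per row of df1, in the row order of df1.
p_row <- df2$p[idx]
p_row
# 0.1 0.1 0.6 0.6 0.9 0.9
```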
The following approach works, but only if the vector df1$x1 contains no duplicate values:
x1 <- c(1,2,3,4,5,6)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)
set.seed(123)
df1$x2 <- unlist(lapply(df1$x1,
                        function(z) {
                          example_function(z, df2$p[df2$y == df1$y1[df1$x1 == z]])
                        }))
df1
#  x1 y1 x2
#1  1  0  1
#2  2  0  2
#3  3  1  4
#4  4  1  4
#5  5  2  5
#6  6  2  7
Using x1 <- c(1,1,1,2,2,3), as mentioned above, leads to warnings and errors:
x1 <- c(1,1,1,2,2,3)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)
set.seed(123)
df1$x2 <- unlist(lapply(df1$x1,
                        function(z) {
                          example_function(z, df2$p[df2$y == df1$y1[df1$x1 == z]])
                        }))
Error in if (runif(1) <= p) return(x + 1) : argument is of length zero
In addition: Warning message:
In df2$y == df1$y1[df1$x1 == z] :
  longer object length is not a multiple of shorter object length
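As far as I can tell, the failure comes from the inner lookup df1$y1[df1$x1 == z]: once x1 contains duplicates, it returns the y1 values of all rows sharing that x1, not just the one for the current row. A small sketch to inspect this for z = 2 (y_for_z and p_for_z are names used only for illustration):

```r
df1 <- data.frame(x1 = c(1, 1, 1, 2, 2, 3), y1 = c(0, 0, 1, 1, 2, 2))
df2 <- data.frame(y = c(0, 1, 2), p = c(0.1, 0.6, 0.9))

z <- 2
# Two rows have x1 == 2, so this lookup has length 2 instead of 1.
y_for_z <- df1$y1[df1$x1 == z]

# Comparing the length-3 vector df2$y with a length-2 vector recycles the
# shorter one and warns; suppressWarnings() only keeps this sketch quiet.
p_for_z <- suppressWarnings(df2$p[df2$y == y_for_z])

length(y_for_z)  # 2
length(p_for_z)  # 0, so if() receives a zero-length condition
```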
Does anyone have an idea how to fix this problem? I would be very grateful for any help.
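One direction that might work, sketched under the assumption that every value of df1$y1 occurs exactly once in df2$y: look up p per row instead of per x-value, then pass df1$x1 and the per-row p in parallel with mapply. The vector name p_row is hypothetical.

```r
example_function <- function(x, p) {
  if (runif(1) <= p) return(x + 1)
  return(x)
}

df1 <- data.frame(x1 = c(1, 1, 1, 2, 2, 3), y1 = c(0, 0, 1, 1, 2, 2))
df2 <- data.frame(y = c(0, 1, 2), p = c(0.1, 0.6, 0.9))

# Per-row probability, looked up by y1 (assumes y-values in df2 are unique).
p_row <- df2$p[match(df1$y1, df2$y)]

set.seed(123)
# mapply() walks x1 and p_row in parallel, one pair per row,
# so duplicate values in x1 no longer affect the lookup.
df1$x2 <- mapply(example_function, df1$x1, p_row)
df1
```

Because the pairing is done by row position rather than by the value of x1, this should not depend on x1 being unique.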