Wednesday, March 21, 2018

Updating a vector within a dataframe using a random experiment

I have the following dataframes "df1" and "df2":

x1 <- c(1,1,1,2,2,3)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)

y <- c(0,1,2)
p <- c(0.1,0.6,0.9)
df2 <- data.frame(y,p)

I want to derive a new vector df1$x2 from df1$x1 based on a random experiment. This can be done manually by applying the following function to df1$x1 with "lapply":

example_function <- function(x,p){
   if(runif(1) <= p) return(x + 1)
   return(x)
}

set.seed(123)
df1$x2 <- unlist(lapply(df1$x1,example_function,0.5))

The function draws a uniform random number and compares it with a given probability p. Depending on the result, x either stays the same in df1$x2 or increases by 1.
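For reference, the same experiment can be sketched in vectorized form. This is only a sketch, assuming one uniform draw per element of x; "vectorized_example" is a hypothetical name that does not appear elsewhere in this post.

```r
# Hypothetical vectorized variant of example_function:
# one uniform draw per element of x, each compared against p.
vectorized_example <- function(x, p) {
  u <- runif(length(x))
  ifelse(u <= p, x + 1, x)
}

set.seed(123)
x1 <- c(1, 1, 1, 2, 2, 3)
x2 <- vectorized_example(x1, 0.5)
x2
```

Because runif(6) consumes the random stream in the same order as six calls to runif(1), this should give the same result as the "lapply" version under the same seed.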

In the procedure above, "p" was set manually (here 0.5 for all x-values in df1). However, I want p to be chosen automatically depending on the combination of df1$x1 and df1$y1. This is where df2 comes into play: it shows which p-value belongs to which y-value. For example, df1$x1[3] equals 1 and the corresponding y-value df1$y1[3] also equals 1. According to df2, the associated p-value is 0.6 (the p-value for y equal to 1), so p = 0.6 should be used in "example_function" to determine the corresponding value of df1$x2. Based on df1$y1, p should be 0.1 for df1$x1[1] and df1$x1[2], 0.6 for df1$x1[3] and df1$x1[4], and 0.9 for df1$x1[5] and df1$x1[6].
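The lookup described here can be sketched as follows. It assumes that p depends only on df1$y1 (as df2 suggests) and that every value in df1$y1 occurs exactly once in df2$y; "p_row" is a hypothetical helper name, and using "match" with "mapply" is just one possible way to pair each row with its own p.

```r
example_function <- function(x, p) {
  if (runif(1) <= p) return(x + 1)
  x
}

df1 <- data.frame(x1 = c(1, 1, 1, 2, 2, 3), y1 = c(0, 0, 1, 1, 2, 2))
df2 <- data.frame(y = c(0, 1, 2), p = c(0.1, 0.6, 0.9))

# Row-wise lookup: position of each y1 in df2$y, then the p at that position.
p_row <- df2$p[match(df1$y1, df2$y)]

# Call example_function once per row with that row's x1 and p.
set.seed(123)
df1$x2 <- mapply(example_function, df1$x1, p_row)
df1
```

Since the lookup is done per row rather than per x-value, repeated values in df1$x1 should not matter in this sketch.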

The following example is one approach, but it only works if df1$x1 contains no repeated values:

x1 <- c(1,2,3,4,5,6)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)
set.seed(123)

df1$x2 <- unlist(lapply(df1$x1, 
                     function(z) {
                       example_function(z, df2$p[df2$y == df1$y1[df1$x1 == z]])
                     }))
df1

   x1 y1 x2
#1  1  0  1
#2  2  0  2
#3  3  1  4
#4  4  1  4
#5  5  2  5
#6  6  2  7

Using x1 <- c(1,1,1,2,2,3), as defined at the top, leads to a warning and an error:

x1 <- c(1,1,1,2,2,3)
y1 <- c(0,0,1,1,2,2)
df1 <- data.frame(x1,y1)
set.seed(123)

df1$x2 <- unlist(lapply(df1$x1, 
                     function(z) {
                       example_function(z, df2$p[df2$y == df1$y1[df1$x1 == z]])
                     }))

Error in if (runif(1) <= p) return(x + 1) : argument is of length zero
In addition: Warning message:
In df2$y == df1$y1[df1$x1 == z] :
  longer object length is not a multiple of shorter object length
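A quick check of the inner subsetting suggests where this comes from: with repeated values in x1, df1$x1 == z selects several rows, so the lookup no longer yields exactly one p. The snippet below is only a diagnostic sketch for z = 2; "y_sel" and "p_sel" are hypothetical names.

```r
df1 <- data.frame(x1 = c(1, 1, 1, 2, 2, 3), y1 = c(0, 0, 1, 1, 2, 2))
df2 <- data.frame(y = c(0, 1, 2), p = c(0.1, 0.6, 0.9))

z <- 2
# x1 == 2 occurs twice, so two y-values come back instead of one.
y_sel <- df1$y1[df1$x1 == z]

# Comparing the 3 values of df2$y with 2 values recycles y_sel (hence the
# warning, suppressed here) and matches nothing, so no p is selected.
p_sel <- suppressWarnings(df2$p[df2$y == y_sel])

length(y_sel)
length(p_sel)
```

An empty p is then passed to "example_function", which is consistent with the "argument is of length zero" error in the if condition.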

Does anyone have an idea how to fix this problem? I would be very grateful for any help.



