I am trying to populate a data frame, dfB, with random samples from another data frame, dfA.
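dfA (columns: color, factor) and dfB (columns: factor, smp); their contents are given in the example data below.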
Each draw should only use the rows of dfA whose factor matches the factor of the dfB row being filled.
I have looked into the sample_n function in dplyr, but I have not been able to figure out how to restrict the sampling to rows that share the same factor level.
What I would like is to draw a random color from dfA and use it to populate the smp column in dfB. The draw, however, is restricted to the rows of dfA with the same factor. So for the first row in dfB (factor A) there are three choices: two reds and one blue. For the second row (factor B) there is only one choice: black. For the third row (also factor B) the result is NA, because I want sampling without replacement and the only black has already been used. And so on.
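The rule described above (sample without replacement within one factor level, and fall back to NA once that level's colors run out) can be isolated in a small helper. This is a minimal base R sketch, not an existing function: `draw_no_replace` is a hypothetical name, and it assumes the pool of colors is a character vector.

```r
# Hypothetical helper (not part of base R or dplyr): draw `n` values from
# `pool` without replacement; positions beyond the pool size stay NA.
draw_no_replace <- function(pool, n) {
  out <- rep(NA_character_, n)
  k <- min(n, length(pool))  # number of real draws possible without replacement
  if (k > 0) {
    # sample.int() avoids sample()'s special case for length-one numeric input
    out[seq_len(k)] <- pool[sample.int(length(pool), k)]
  }
  out
}

set.seed(1)
draw_no_replace(c("blue", "red", "red"), 1)  # factor A, one row to fill
draw_no_replace("black", 4)                  # factor B, four rows to fill
# -> "black" NA NA NA
```

The draws fill a level's rows in order, so the NAs land on the last rows of that level, matching the "second row black, third row NA" example.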
Some example data:
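# data.frame() rather than cbind(): cbind() builds a character matrix, not a data frame
dfA <- data.frame(color  = c("blue", "red", "red", "black", "blue", "red"),
                  factor = c("A", "A", "A", "B", "C", "C"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(factor = c("A", "B", "B", "B", "B", "C", "C"),
                  smp    = NA_character_,  # to be filled with sampled colors
                  stringsAsFactors = FALSE)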
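With this example data, one way to see in advance how many dfB rows per factor level must end up NA under sampling without replacement is to compare the rows needed with the colors available. A base R sketch; `levels_all`, `available`, `needed` and `shortfall` are illustrative names, and the data frames are rebuilt with `data.frame()` so the block runs on its own.

```r
dfA <- data.frame(color  = c("blue", "red", "red", "black", "blue", "red"),
                  factor = c("A", "A", "A", "B", "C", "C"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(factor = c("A", "B", "B", "B", "B", "C", "C"),
                  smp    = NA_character_,
                  stringsAsFactors = FALSE)

levels_all <- union(dfA$factor, dfB$factor)
available  <- sapply(levels_all, function(l) sum(dfA$factor == l))  # colors per level in dfA
needed     <- sapply(levels_all, function(l) sum(dfB$factor == l))  # rows per level in dfB
shortfall  <- pmax(needed - available, 0)  # rows that can only be NA
shortfall
# -> A 0, B 3, C 0
```

For this data only factor B runs short: four rows need a color but dfA holds a single black, so three of them stay NA.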
I have tried something along these lines
library(dplyr)
sample_n(dfA, color,?n?, replace = FALSE)
The code is not functional. The ?n? is the part I do not know how to fill in: it should somehow refer to the factor that dfA has in common with dfB. Would this be better done with a boolean (logical indexing) operation or with a for loop? I struggle with both, however.
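On the loop-versus-boolean question, the two combine naturally. The sketch below loops over the factor levels (three iterations here, not one per row) and uses logical indexing inside the loop to pick each level's pool of colors. It is one possible base R approach rather than a dplyr one, and it assumes character columns (`stringsAsFactors = FALSE`).

```r
dfA <- data.frame(color  = c("blue", "red", "red", "black", "blue", "red"),
                  factor = c("A", "A", "A", "B", "C", "C"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(factor = c("A", "B", "B", "B", "B", "C", "C"),
                  smp    = NA_character_,
                  stringsAsFactors = FALSE)

set.seed(42)  # only so the random draw is reproducible
for (lvl in unique(dfB$factor)) {
  rows <- which(dfB$factor == lvl)         # dfB rows with this level
  pool <- dfA$color[dfA$factor == lvl]     # logical indexing: dfA colors with this level
  k    <- min(length(rows), length(pool))  # draws possible without replacement
  if (k > 0) {
    dfB$smp[rows[seq_len(k)]] <- pool[sample.int(length(pool), k)]
  }
  # rows beyond the first k keep their initial NA
}
dfB
```

Colors are assigned to each level's rows in order, so once a pool is used up the remaining rows keep NA (rows 3 to 5 for factor B). Removing `set.seed()` gives a different draw on each run.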
The result is a random draw, so it will vary from run to run, but it would look something like this: [example output table not preserved]
Any insight would be most welcome; at my level of R syntax, I am quite stumped.