Thursday, 25 July 2019

Sampling elements according to common factor between data frames in R

I am trying to populate a dataframe B with random samples from another dataframe A.

dfA:

[Image: dfA, with columns color and factor]

dfB:

[Image: dfB, with columns factor and smp]

Each draw should be restricted to the rows that share the same factor value in both dataframes.

I have looked into the sample_n function in dplyr, but I have not been able to figure out how to only sample within the same factor in common.

What I would like is to draw a random color from dfA and use it to populate the smp column in dfB. The draw, however, is restricted to the rows of dfA that share the same factor. So for the first row in dfB there are three choices (two reds and one blue); for the second there is only one choice (black); the third row gets NA, because I am sampling without replacement and the only color for that factor has already been drawn; and so on.
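To make the "NA once the choices run out" rule concrete, here is a minimal sketch for a single factor, assuming the pool is the one black entry for factor B and four rows need a draw. The names pool, n_needed, shuffled and draws are only illustrative. Shuffling with sample.int() and then indexing past the end of the vector is one way to get the NA padding; it is not the only way.

```r
set.seed(42)  # only so the example is reproducible

# Colors available in dfA for factor "B", and the number of "B" rows in dfB
pool <- c("black")
n_needed <- 4

# Shuffle the pool, then index past its end: the missing positions become NA
shuffled <- pool[sample.int(length(pool))]
draws <- shuffled[seq_len(n_needed)]

print(draws)
```

Indexing through sample.int(length(pool)) rather than calling sample(pool) directly also sidesteps the case where sample() treats a single number as a range to sample from.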

Some example data:

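# data.frame() instead of cbind(), which would return character matrices
dfA <- data.frame(color = c("blue", "red", "red", "black", "blue", "red"),
                  factor = c("A", "A", "A", "B", "C", "C"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(factor = c("A", "B", "B", "B", "B", "C", "C"),
                  smp = NA_character_, stringsAsFactors = FALSE)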

I have tried something along these lines

library(dplyr)
sample_n(dfA, color,?n?, replace = FALSE)

The code is not functional. The ?n? is the part of the command I do not know how to fill in; it is basically the factor in common with dfB. Would it be more efficient as a boolean operation or a for loop? I struggle with both, however.
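One possible way to combine both ideas is a for loop over the factor levels, with a logical (boolean) comparison selecting the matching rows inside it. The sketch below is base R only and makes no claim about efficiency. It assumes dfA and dfB are data frames rather than the matrices cbind() returns, and the variable names f, rows, pool and shuffled are illustrative.

```r
# Example data from the question, built as data frames
dfA <- data.frame(color = c("blue", "red", "red", "black", "blue", "red"),
                  factor = c("A", "A", "A", "B", "C", "C"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(factor = c("A", "B", "B", "B", "B", "C", "C"),
                  smp = NA_character_, stringsAsFactors = FALSE)

set.seed(1)  # only so the draw is reproducible

for (f in unique(dfB$factor)) {
  rows <- which(dfB$factor == f)       # rows of dfB to fill for this factor
  pool <- dfA$color[dfA$factor == f]   # colors in dfA sharing that factor

  # Random permutation of the pool = sampling without replacement
  shuffled <- pool[sample.int(length(pool))]

  # Take one color per row; positions beyond the pool's length come back as NA
  dfB$smp[rows] <- shuffled[seq_len(length(rows))]
}

print(dfB)
```

With this data, the first B row always receives black and the remaining three B rows stay NA, while the A and C rows vary with the seed.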

The result depends on the random draw, but it would look something like this:

[Image: dfB with the smp column filled in]

Any insight will be most welcome; at my level of R syntax, I am quite stumped.



