random: Random sampling based on vector with multiple conditions R

Sunday, 17 November 2019

Random sampling based on vector with multiple conditions R

I have a large data frame, SYN_data, with 150000 rows and 3 columns named SNP, Gene and count. There is also a vector r of 2545 count values, some of which are duplicates. I need to randomly sample 2545 rows without replacement from SYN_data, keeping only rows whose count value appears in r. I have managed that much with this code:

test1 <- SYN_data[sample(which(SYN_data$count %in% r), 2545), ]

The second condition is that the 2545 sampled rows should contain exactly 1671 unique Genes, which means that some Genes will have more than one SNP. Is there a way to build this condition into the same code? Any other code that meets all the conditions would also be very helpful. Thanks!

Sample data:

r: 1, 7, 3, 14, 9
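
One possible way to add the gene-count condition, as a rough sketch rather than a definitive answer: first pick the required number of Genes among the rows whose count is in r, take one random row for each picked Gene so that every Gene appears at least once, then fill the remaining rows from the unused rows of those same Genes. The function name sample_rows_with_gene_target, its arguments n_rows, n_genes and max_tries, and the toy data are all made up for illustration. The sketch assumes exact matching on count (as in the %in% code above) and does not claim to sample uniformly over all valid selections.

```r
# Hypothetical helper (not from the original post): sample n_rows rows whose
# count is in r, covering exactly n_genes distinct Genes.
sample_rows_with_gene_target <- function(data, r, n_rows, n_genes, max_tries = 100) {
  gene <- as.character(data$Gene)          # avoid empty factor levels in split()
  eligible <- which(data$count %in% r)     # row indices that satisfy condition 1
  genes <- unique(gene[eligible])
  stopifnot(n_genes <= length(genes), n_genes <= n_rows, n_rows <= length(eligible))

  for (attempt in seq_len(max_tries)) {
    chosen <- sample(genes, n_genes)
    pool <- eligible[gene[eligible] %in% chosen]
    if (length(pool) < n_rows) next        # too few rows for these genes; redraw

    # One random row per chosen gene, so every chosen gene appears at least once.
    first <- vapply(split(pool, gene[pool]),
                    function(idx) idx[sample.int(length(idx), 1)],
                    integer(1))
    # Fill the remaining slots from the unused rows of the same genes.
    rest <- setdiff(pool, first)
    extra <- rest[sample.int(length(rest), n_rows - n_genes)]
    return(data[c(unname(first), extra), ])
  }
  stop("no feasible gene set found in ", max_tries, " tries")
}

# Toy data, invented for illustration: 30 genes with 10 SNPs each.
toy <- data.frame(
  SNP = paste0("snp", 1:300),
  Gene = rep(paste0("g", 1:30), each = 10),
  count = rep(1:10, times = 30)
)
r <- c(1, 7, 3, 14, 9)
picked <- sample_rows_with_gene_target(toy, r, n_rows = 20, n_genes = 12)
```

With the real data the call would presumably look like sample_rows_with_gene_target(SYN_data, r, n_rows = 2545, n_genes = 1671). If the chosen Genes do not have enough eligible rows between them, the sketch simply redraws the Genes and eventually stops with an error, so it may need a smarter gene selection when the constraint is tight.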
