I have a large data frame, SYN_data, with 150000 rows and 3 columns named SNP, Gene and count. There is a list r of 2545 count values, some of which are duplicates. I need to randomly sample 2545 rows from SYN_data without replacement, with count values matching those in r. I have managed this much with the following code:
test1 <- SYN_data[sample(which(SYN_data$count %in% r), 2545), ]
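`%in%` only tests membership, so in the line above the counts of the sampled rows follow their frequency in SYN_data, not in r; a value that appears twice in r is not guaranteed to appear twice in the sample. If the goal is to match r value by value, one option is to sample separately for each distinct count. A minimal sketch, assuming `count` is numeric and using a hypothetical helper name, `sample_matching_counts`:

```r
# Hypothetical helper (not from the question): draw exactly one row per
# element of r whose count equals that element, so a value listed twice in r
# contributes two distinct rows. Sampling is without replacement.
sample_matching_counts <- function(df, r) {
  needed <- table(r)                         # rows required per count value
  vals <- as.numeric(names(needed))
  ks <- as.integer(needed)
  idx <- unlist(Map(function(v, k) {
    pool <- which(df$count == v)             # rows with exactly this count
    if (length(pool) < k) stop("not enough rows with count ", v)
    pool[sample.int(length(pool), k)]        # sample.int avoids sample()'s length-1 quirk
  }, vals, ks))
  df[idx, , drop = FALSE]
}
```

With the question's objects this would be `test1 <- sample_matching_counts(SYN_data, r)`, which returns `length(r)` rows and stops with an error if some count value in r has too few matching rows.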
The second condition is that the 2545 sampled rows must contain exactly 1671 unique genes, which means some genes will have more than one SNP. Is there a way to incorporate this condition into the same code? Any other code that meets all the conditions would also be very helpful. Thanks!
Sample data:
r: 1, 7, 3, 14, 9
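One way to enforce the gene condition is a two-stage draw: pick exactly 1671 genes among those with eligible rows, take one row from each so every chosen gene is present, then fill the remaining 2545 - 1671 = 874 rows from the other eligible rows of those same genes. Because every extra row comes from an already chosen gene, the number of distinct genes is exactly the target. This is a sketch under stated assumptions: the helper name `sample_with_gene_target` and its `max_tries` retry limit are hypothetical, it keeps the question's `%in%` filter on count, and the result is not uniform over all valid subsets (each chosen gene gets one guaranteed row regardless of its size).

```r
# Hypothetical helper: sample n_rows rows whose count is in r such that the
# sample contains exactly n_genes distinct genes. Retries if the chosen genes
# do not have enough eligible rows in total.
sample_with_gene_target <- function(df, r, n_rows, n_genes, max_tries = 100) {
  eligible <- df[df$count %in% r, , drop = FALSE]   # same filter as the question
  rows_by_gene <- split(seq_len(nrow(eligible)), eligible$Gene, drop = TRUE)
  genes <- names(rows_by_gene)
  if (length(genes) < n_genes) stop("fewer eligible genes than n_genes")
  if (n_rows < n_genes) stop("n_rows must be at least n_genes")
  for (i in seq_len(max_tries)) {
    chosen <- genes[sample.int(length(genes), n_genes)]
    pools <- rows_by_gene[chosen]
    if (sum(lengths(pools)) < n_rows) next          # chosen genes too small; retry
    # one guaranteed row per chosen gene, so every chosen gene appears
    first <- vapply(pools, function(p) p[sample.int(length(p), 1)], integer(1))
    # fill the rest from the remaining rows of the chosen genes only
    rest_pool <- setdiff(unlist(pools, use.names = FALSE), first)
    extra <- rest_pool[sample.int(length(rest_pool), n_rows - n_genes)]
    return(eligible[c(first, extra), , drop = FALSE])
  }
  stop("no feasible gene set found in max_tries attempts")
}
```

For the numbers in the question the call would be `test1 <- sample_with_gene_target(SYN_data, r, n_rows = 2545, n_genes = 1671)`; afterwards `length(unique(test1$Gene))` should be 1671.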