Thursday, February 2, 2023

Random sampling of rows based on a categorical column for n iterations in R, then using mclust to cluster the resulting dataframes

I have a dataframe in which the rows are samples and the columns are proteins. An additional categorical column gives the group that each sample belongs to:

Sample  Group   protein1  protein2
s1      group1  2.5       0.1
s2      group2  0.2       3.0

The number of samples differs between groups, so I would like to draw a random sample whose size is based on the smallest group (say group1), combine the sampled rows into one dataframe, and then cluster that dataframe with mclust. I would like to repeat this process until all samples have been used, and for a fixed number of iterations, say 10. Every random draw should contain samples from each group. In the end I would like a table that records, for each iteration, which samples were selected for clustering and the optimal K that mclust found for those samples.

mclust optimal k  iteration  number of samples from group1  number of samples from group2
2                 1          5                              5
2                 2          5                              6
n                 ..10       5                              7
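
A minimal sketch of the procedure described above, not a definitive solution. It assumes that each iteration draws the same number of rows (the size of the smallest group) from every group, that the columns are named Sample and Group as in the example, and that candidate K values run from 1 to 5. The toy data, the names n_iter, n_min, protein_cols and results, and the G range are all illustrative assumptions. Random draws do not guarantee that every sample is used, so the last line only checks whether that happened.

```r
library(mclust)

set.seed(1)  # reproducible draws

# Toy data standing in for the real table (hypothetical values)
df <- data.frame(
  Sample   = paste0("s", 1:30),
  Group    = rep(c("group1", "group2"), times = c(10, 20)),
  protein1 = c(rnorm(10, mean = 0), rnorm(20, mean = 3)),
  protein2 = c(rnorm(10, mean = 0), rnorm(20, mean = 3)),
  stringsAsFactors = FALSE
)

n_iter       <- 10                          # fixed number of iterations
n_min        <- min(table(df$Group))        # size of the smallest group
protein_cols <- setdiff(names(df), c("Sample", "Group"))

results <- vector("list", n_iter)
for (i in seq_len(n_iter)) {
  # Draw n_min row indices from every group, without replacement
  rows_by_group <- split(seq_len(nrow(df)), df$Group)
  idx <- unlist(
    lapply(rows_by_group, function(rows) rows[sample.int(length(rows), n_min)]),
    use.names = FALSE
  )
  sub <- df[idx, ]

  # Fit Gaussian mixtures for K = 1..5 (assumed range); Mclust picks the best by BIC
  fit <- Mclust(sub[, protein_cols], G = 1:5, verbose = FALSE)
  k   <- if (is.null(fit)) NA_integer_ else fit$G

  # Per-group counts for this draw, as columns n_group1, n_group2, ...
  counts <- table(sub$Group)
  counts <- setNames(as.integer(counts), paste0("n_", names(counts)))

  results[[i]] <- data.frame(
    optimal_k = k,
    iteration = i,
    as.list(counts),
    samples   = paste(sub$Sample, collapse = ","),
    stringsAsFactors = FALSE
  )
}
results <- do.call(rbind, results)
print(results[, setdiff(names(results), "samples")])

# Were all samples drawn at least once across the iterations?
used <- unique(unlist(strsplit(results$samples, ",")))
all(df$Sample %in% used)
```

If every sample must be used, one option would be to keep iterating until that final check returns TRUE instead of stopping at a fixed count.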

I would be happy to receive any help :) Thanks a lot.



