I have a dataframe in which the rows are samples and the columns are proteins, plus an additional categorical column giving the group each sample belongs to:
Sample | Group | protein1 | protein2 |
---|---|---|---|
s1 | group1 | 2.5 | 0.1 |
s2 | group2 | 0.2 | 3.0 |
The number of samples differs between groups, so I would like to randomly sample from each group, using the size of the smallest group (say group1) as the number of samples to draw, combine the draws into one dataframe, and then cluster that data with mclust. I would like to repeat this process until all samples have been used and for a fixed number of iterations, say 10. Every random draw should contain samples from each group. Finally, I would like a table that records, for each iteration, the samples that were selected for clustering and the optimal K that mclust found for those samples:
mclust optimal k | iteration | number of samples from group1 | number of samples from group2 |
---|---|---|---|
2 | 1 | 5 | 5 |
2 | 2 | 5 | 6 |
n | ..10 | 5 | 7 |
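
To make the procedure concrete, here is a minimal sketch of the sampling loop only, written in plain Python. It is one possible reading of the description above, not a definitive implementation. Assumptions: every iteration draws the same number of samples (the size of the smallest group) from each group, samples that have not been used yet are preferred, and a draw is topped up with already-used samples once a group runs out of unused ones. The names `balanced_draws`, `cluster_balanced_draws` and `choose_k` are hypothetical; `choose_k` is a placeholder for the clustering step (in R, that would be the place to fit `Mclust()` on the selected rows and read the chosen number of components from `$G`).

```python
import random
from collections import defaultdict


def balanced_draws(groups, n_iter=10, seed=0):
    """Draw n_min samples per group in each iteration.

    groups: dict mapping sample id -> group label.
    Returns a list (one entry per iteration) of dicts: group label -> chosen ids.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample, group in groups.items():
        by_group[group].append(sample)

    # Size of the smallest group: how many samples to take from every group.
    n_min = min(len(members) for members in by_group.values())

    used = set()
    draws = []
    for _ in range(n_iter):
        chosen = {}
        for group, members in by_group.items():
            # Prefer samples that have not appeared in any earlier iteration.
            unused = [s for s in members if s not in used]
            rng.shuffle(unused)
            pick = unused[:n_min]
            if len(pick) < n_min:
                # Assumption: top up with already-used samples from this group.
                rest = [s for s in members if s not in pick]
                pick += rng.sample(rest, n_min - len(pick))
            chosen[group] = pick
        used.update(s for pick in chosen.values() for s in pick)
        draws.append(chosen)
    return draws


def cluster_balanced_draws(groups, choose_k, n_iter=10, seed=0):
    """Build one result row per iteration; choose_k(samples) is a placeholder."""
    rows = []
    for iteration, chosen in enumerate(balanced_draws(groups, n_iter, seed), start=1):
        samples = [s for pick in chosen.values() for s in pick]
        row = {"optimal_k": choose_k(samples), "iteration": iteration}
        for group, pick in chosen.items():
            row[f"n_{group}"] = len(pick)
        row["samples"] = samples
        rows.append(row)
    return rows


# Toy sample ids, made up for illustration only.
toy_groups = {f"s{i}": "group1" for i in range(1, 4)}
toy_groups.update({f"s{i}": "group2" for i in range(4, 11)})

# The lambda stands in for the real clustering call and returns no K.
for row in cluster_balanced_draws(toy_groups, choose_k=lambda samples: None, n_iter=3):
    print(row)
```

With equal draws per group, the per-group counts are identical in every iteration, which differs from the example table above, where the group2 count grows. If unequal draws are wanted, the `n_min` slice is the part to change.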
I would be happy to receive any help :) Thanks a lot!