For example, I have a column called companyId, along with many other columns I want to keep. The companyId column holds values like 100, 101, 102, ..., basically a list of IDs, and each ID appears a different number of times. How do I randomly sample rows based on the companyId column so that the sample preserves the proportion of each ID?
E.g., if I have 500 rows (100 for companyA, 100 for companyB, and 300 for companyC) and I want to sample 100 rows from this table, how do I make the sample contain 20 companyA, 20 companyB, and 60 companyC rows?
Thanks a lot.
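
What is being asked for is usually called proportional stratified sampling: draw the same fraction of rows from every companyId group. The question does not name a language or library, so the following is only a minimal sketch in plain Python, assuming the table is a list of dicts. The function name `stratified_sample`, the `value` column, and the fixed seed are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, frac, seed=None):
    """Draw the same fraction of rows from every group of `key`."""
    rng = random.Random(seed)

    # Bucket the rows by their group value (here: companyId).
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Sample without replacement inside each group.
    sample = []
    for members in groups.values():
        k = round(len(members) * frac)
        sample.extend(rng.sample(members, k))
    return sample


# The 500-row table from the question; "value" stands in for the other columns.
rows = (
    [{"companyId": "companyA", "value": i} for i in range(100)]
    + [{"companyId": "companyB", "value": i} for i in range(100)]
    + [{"companyId": "companyC", "value": i} for i in range(300)]
)

sample = stratified_sample(rows, "companyId", frac=100 / 500, seed=0)
counts = Counter(row["companyId"] for row in sample)
print(len(sample), dict(counts))
```

Because each group size is rounded separately, the total can be off by a few rows from the target when the group sizes do not divide evenly. If the data is in a pandas DataFrame, `df.groupby("companyId").sample(frac=0.2)` (pandas 1.1 or later) should give the same kind of result.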