Thursday, March 1, 2018

In PostgreSQL, how do I randomly sample from a table in proportion to each level of a categorical column?

For example, say I have a column called companyId, plus many other columns I want to keep. companyId holds values like 100, 101, 102, ..., basically a list of Ids, and each Id appears a different number of times. How do I randomly sample rows so that each Id is represented in the sample in the same proportion as in the full table?

E.g., if I have 500 rows (100 for companyA, 100 for companyB and 300 for companyC) and I want to sample 100 rows from this table, how do I make the sample contain 20 companyA, 20 companyB and 60 companyC rows?

Thanks a lot.
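One possible approach, sketched under assumptions: number the rows of each companyId in random order with a window function, then keep the first round(sample_size * group_size / total) rows of each group. The snippet below is only an illustration. The table name mytable, the id and payload columns, and the Ids 100, 101 and 102 (standing in for companyA, companyB and companyC) are made up, and it runs against an in-memory SQLite database (version 3.25 or later, for window functions) purely so that it is self-contained. row_number(), count(*) OVER (...), random() and round() also exist in PostgreSQL, so the query itself should carry over with little change, but it has not been run there, and the parameter placeholder depends on the client driver.

```python
import sqlite3
from collections import Counter

SAMPLE_SIZE = 100  # target number of rows, as in the example in the question

# Hypothetical table and data: 100, 100 and 300 rows for three company Ids.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, companyId INTEGER, payload TEXT)"
)
rows = [
    (cid, f"row-{cid}-{i}")
    for cid, n in ((100, 100), (101, 100), (102, 300))
    for i in range(n)
]
conn.executemany("INSERT INTO mytable (companyId, payload) VALUES (?, ?)", rows)

# rn       : random rank of the row inside its companyId group
# grp_size : number of rows in that group
# total    : number of rows in the whole table
# A row is kept when its rank is within the group's share of the sample.
query = """
SELECT id, companyId, payload
FROM (
    SELECT t.*,
           row_number() OVER (PARTITION BY companyId ORDER BY random()) AS rn,
           count(*)     OVER (PARTITION BY companyId)                   AS grp_size,
           count(*)     OVER ()                                         AS total
    FROM mytable t
) ranked
WHERE rn <= round(? * 1.0 * grp_size / total)
"""
sample = conn.execute(query, (SAMPLE_SIZE,)).fetchall()

# Count how many sampled rows each companyId contributed.
counts = Counter(company_id for _, company_id, _ in sample)
print(len(sample), dict(counts))
```

With the 100/100/300 split from the question, this keeps 20, 20 and 60 rows. When the proportions do not divide evenly, the per-group rounding can make the total differ slightly from the requested sample size.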



