I have a dataframe with a category column. The dataframe has a different number of rows for each category:
category number_of_rows
cat1 19189
cat2 13193
cat3 4500
cat4 1914
cat5 568
cat6 473
cat7 216
cat8 206
cat9 197
cat10 147
cat11 130
cat12 49
cat13 38
cat14 35
cat15 35
cat16 30
cat17 29
cat18 9
cat19 4
cat20 4
cat21 1
cat22 1
cat23 1
I want to select a different number of rows from each category (instead of a fixed number n of rows from each category).
Example input:
size_1 : {"cat1": 40, "cat2": 20, "cat3": 15, "cat4": 11, ...}
Another example input:
size_2 : {"cat1": 51, "cat2": 42, "cat3": 18, "cat4": 21, ...}
What I want is essentially stratified sampling, with a given number of instances for each category.
The rows should also be selected at random. For example, for size_1["cat1"] I don't need the top 40 rows, I need 40 random rows.
Thanks for the help.
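One possible approach, as a minimal sketch rather than a definitive answer: group by the category column and call `DataFrame.sample` on each group with the size looked up in the dictionary. The helper name `sample_per_category`, the column name `category`, the `value` column, and the toy data are assumptions made for illustration. Capping the request at the group size (so that a category with a single row does not raise an error) and skipping categories missing from the dictionary are also assumptions about the desired behaviour.

```python
import pandas as pd


def sample_per_category(df, sizes, column="category", seed=None):
    """Randomly draw sizes[cat] rows from each category (hypothetical helper)."""
    parts = []
    for cat, group in df.groupby(column):
        # Assumption: categories absent from `sizes` contribute no rows, and a
        # request larger than the group is capped at the group size.
        n = min(sizes.get(cat, 0), len(group))
        if n > 0:
            # Sampling is without replacement, so no row is picked twice.
            parts.append(group.sample(n=n, random_state=seed))
    if not parts:
        return df.iloc[0:0]  # empty frame with the same columns
    return pd.concat(parts)


# Toy data standing in for the real dataframe (made up for this example).
toy = pd.DataFrame({
    "category": ["cat1"] * 50 + ["cat2"] * 30 + ["cat3"] * 2,
    "value": range(82),
})
size_1 = {"cat1": 40, "cat2": 20, "cat3": 15}

sampled = sample_per_category(toy, size_1, seed=0)
print(sampled["category"].value_counts())
```

Passing a fixed `seed` makes the draw reproducible; leaving it as `None` gives a different random selection on each call. If a category should instead fail loudly when it has fewer rows than requested, the `min(...)` cap can be dropped so that `sample` raises an error.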