When I run a basic groupby to see the counts of my clusters as follows:
a.groupby('clusters').count()
my results look like so:
clusters a b c
0 10000 10000 10000
1 10000 10000 10000
2 20000 20000 20000
I then want to take a stratified sample in proportion to these counts, so that the output is prorated across the clusters, and I use the code below:
stratify = data.sample(n=10000, weights='clusters', random_state=0)
In this fake example my dataset should shrink by a factor of 4, so if I run the same groupby on the new dataframe created by the line above, I should get 2500 for row 0, 2500 for row 1, and 5000 for row 2. However, for some reason I can't figure out, I get the correct output for rows 1 and 2 but row 0 just disappears:
stratify.groupby('clusters').count()
The output looks as follows:
clusters a b c
1 2500 2500 2500
2 5000 5000 5000
Why in the world did my 1st row disappear? There looks to be nothing special about it...
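A likely explanation, sketched here rather than confirmed against the original data: when `weights` is given a column name, pandas `DataFrame.sample` uses that column's values as per-row sampling weights, so every row with `clusters == 0` has weight 0 and can never be drawn. The toy frame below reuses the cluster sizes from the question with placeholder column values, and the `groupby(...).sample(frac=0.25)` call is one possible way to keep the proportions, assuming pandas 1.1 or newer.

```python
import numpy as np
import pandas as pd

# Toy frame with the cluster sizes from the question (column values are placeholders).
clusters = np.repeat([0, 1, 2], [10000, 10000, 20000])
data = pd.DataFrame({"clusters": clusters, "a": 0, "b": 0, "c": 0})

# weights="clusters" uses each row's cluster label as its sampling weight,
# so every row in cluster 0 has weight 0 and can never be selected.
weighted = data.sample(n=10000, weights="clusters", random_state=0)
weighted_counts = weighted["clusters"].value_counts().sort_index()

# Sampling the same fraction within each group keeps the cluster proportions.
stratified = data.groupby("clusters").sample(frac=0.25, random_state=0)
stratified_counts = stratified["clusters"].value_counts().sort_index()

print(weighted_counts)
print(stratified_counts)
```

With the per-group version, each cluster is sampled separately, so no cluster depends on the numeric value of its label.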