Tuesday, March 2, 2021

Pandas: randomly create buckets with given probabilities

I have a dataframe with a "zone_id" column. Each "zone_id" takes one of Nj possible values, and each zone has three probabilities: "Pa", "Pb" and "Pc".

I'd like to randomly select three subsets for every zone, sized according to those three probabilities.

I thought of something like:

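for zone in df['zone_id'].unique():
    # Rows belonging to the current zone
    zone_rows = df[df['zone_id'] == zone]
    # Each call samples independently, so the subsets can share rows
    subset1 = zone_rows.sample(frac=Pa)
    subset2 = zone_rows.sample(frac=Pb)
    subset3 = zone_rows.sample(frac=Pc)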

The problem is that this approach gives me no way to avoid overlap between the subsets.

How can I select the three subsets without overlap while still respecting the probabilities? I cannot just remove subset1 from the dataframe before sampling the next subset, because the fractions would then apply to a smaller number of rows and the probabilities would no longer be right.
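
One possible way to get non-overlapping subsets, offered as a minimal sketch and not a definitive answer: shuffle each zone's rows once and cut the shuffled order at the cumulative fractions, so every fraction is taken relative to the zone's full row count. The function name assign_buckets, the zone_fracs dictionary (zone_id mapped to its Pa, Pb, Pc), the "bucket" column and the sample data are all assumptions made for illustration. The sketch also assumes that the dataframe index is unique and that Pa + Pb + Pc is at most 1 for each zone.

```python
import numpy as np
import pandas as pd


def assign_buckets(df, zone_fracs, seed=0):
    """Return a copy of df with a 'bucket' column: 0, 1, 2, or -1 (unassigned).

    zone_fracs is a hypothetical mapping zone_id -> (Pa, Pb, Pc).
    Assumes df has a unique index.
    """
    rng = np.random.default_rng(seed)
    bucket = pd.Series(-1, index=df.index, name="bucket")
    for zone, zone_rows in df.groupby("zone_id"):
        n = len(zone_rows)
        # Shuffle the zone's row labels once; each row appears exactly once.
        shuffled = rng.permutation(zone_rows.index.to_numpy())
        # Cut points are cumulative fractions of the zone's full row count,
        # so every bucket size is relative to the original total.
        cuts = np.rint(np.cumsum(zone_fracs[zone]) * n).astype(int)
        starts = np.concatenate(([0], cuts[:-1]))
        for k, (a, b) in enumerate(zip(starts, cuts)):
            # Consecutive slices of one shuffle cannot share rows.
            bucket.loc[shuffled[a:b]] = k
    return df.assign(bucket=bucket)


# Made-up example data: two zones with 100 rows each.
df = pd.DataFrame({"zone_id": np.repeat(["north", "south"], 100)})
zone_fracs = {"north": (0.5, 0.3, 0.2), "south": (0.2, 0.2, 0.1)}

labeled = assign_buckets(df, zone_fracs)
counts = labeled.groupby(["zone_id", "bucket"]).size()
print(counts)
```

Each subset is then just a filter on the label, for example labeled[labeled["bucket"] == 0]. Sampling sequentially from the remaining rows would also work if each fraction is rescaled to the rows that are left, for instance Pb / (1 - Pa) for the second subset.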



