I have a DataFrame with a "zone_id" column. Each "zone_id" takes one of Nj possible values, and each zone has its own probabilities "Pa", "Pb" and "Pc".
For every zone, I'd like to randomly select three subsets whose sizes follow those three probabilities.
I thought of something like this:
for zone in df['zone_id'].unique():
    zone_df = df[df['zone_id'] == zone]
    subset1 = zone_df.sample(frac=Pa)
    subset2 = zone_df.sample(frac=Pb)
    subset3 = zone_df.sample(frac=Pc)
Here is the problem: done this way, nothing prevents the three subsets from overlapping.
How can I select the three subsets without overlap while still respecting the probabilities? I can't simply remove subset1 from the zone before drawing subset2, because the fraction would then apply to a smaller total number of rows and the resulting subset size would be wrong.
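One way to get non-overlapping subsets whose sizes are fractions of the full zone total is to shuffle each zone's rows once and then cut consecutive slices from the shuffled order. The sketch below is a minimal illustration under stated assumptions: the per-zone probabilities are assumed to live in a separate DataFrame, here called `probs` (a hypothetical name and layout), indexed by `zone_id` with columns `Pa`, `Pb`, `Pc`, and Pa + Pb + Pc <= 1 for every zone. The helper names `split_zone` and `split_all_zones` are likewise illustrative only.

```python
import numpy as np
import pandas as pd


def split_zone(zone_df, pa, pb, pc, rng):
    """Shuffle one zone's rows once, then cut three consecutive slices.

    Every slice size is a fraction of the FULL zone size, so earlier draws
    do not distort later fractions, and the slices cannot overlap.
    """
    n = len(zone_df)
    shuffled = zone_df.iloc[rng.permutation(n)]
    # Cumulative cut points; rounding can shift a subset size by one row.
    cuts = np.round(np.cumsum([pa, pb, pc]) * n).astype(int)
    return (
        shuffled.iloc[: cuts[0]],
        shuffled.iloc[cuts[0] : cuts[1]],
        shuffled.iloc[cuts[1] : cuts[2]],
    )


def split_all_zones(df, probs, seed=None):
    """Apply split_zone to every zone and stack the results per subset.

    Assumes `probs` (hypothetical) is indexed by zone_id with columns
    Pa, Pb, Pc; a zone missing from `probs` raises a KeyError.
    """
    rng = np.random.default_rng(seed)
    parts = {"a": [], "b": [], "c": []}
    for zone, zone_df in df.groupby("zone_id"):
        pa, pb, pc = probs.loc[zone, ["Pa", "Pb", "Pc"]]
        for key, part in zip("abc", split_zone(zone_df, pa, pb, pc, rng)):
            parts[key].append(part)
    return tuple(pd.concat(parts[k]) for k in "abc")


# Usage with made-up data: zone 1 has 10 rows, zone 2 has 20 rows.
df = pd.DataFrame({"zone_id": [1] * 10 + [2] * 20, "value": range(30)})
probs = pd.DataFrame(
    {"Pa": [0.2, 0.5], "Pb": [0.3, 0.25], "Pc": [0.5, 0.25]},
    index=[1, 2],
)
sub_a, sub_b, sub_c = split_all_zones(df, probs, seed=0)
```

Because the cut points are computed from cumulative fractions of the whole zone, each subset's size is close to P × (zone total) regardless of the order in which the subsets are taken. If you would rather have each row drawn independently (expected rather than exact fractions), an alternative is to give every row one label with `rng.choice(["a", "b", "c", "none"], size=n, p=[Pa, Pb, Pc, 1 - Pa - Pb - Pc])`, which also rules out overlap by construction.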