Tuesday, 8 November 2022

Creating a new column as a function of probabilities stored in another column

Suppose I have a dataframe with a column p which represents the probability that an individual will choose option 1 as opposed to option 2.

id   p
A    0.2
C    0.4
B    0.7
E    0.2
D    0.9
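
For anyone who wants to reproduce this, the example frame above can be built roughly as follows. This is a minimal sketch that assumes pandas is installed; the variable name df matches the one used in the question.

```python
import pandas as pd

# Example data copied from the table above: one row per individual,
# with p = probability of choosing option 1 rather than option 2.
df = pd.DataFrame(
    {
        "id": ["A", "C", "B", "E", "D"],
        "p": [0.2, 0.4, 0.7, 0.2, 0.9],
    }
)

print(df)
```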

I want to make a new column choice which captures each individual's choice, given their probability of selecting each choice.

Using random, I can do something like

df['choice'] = df['p'].apply(lambda p: random.choices(population=[1, 2], weights=[p, 1 - p], k=1)[0])

I am hoping to find something that is faster than this, and makes fewer calls to random.choices (I am simulating choices in the dataset many times). Does anyone know a method that could help here?

If it helps, the values of p are discrete: there are only a few distinct values of p, and many individuals share the same value. I was thinking I could use some sort of groupby, but I am not sure exactly what it would look like.
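
One possible direction, offered as a sketch rather than a definitive answer: instead of calling random.choices once per row, draw one uniform random number per row with NumPy and compare it to p. A row gets choice 1 when the draw is below p, which happens with probability p, and choice 2 otherwise. The helper name simulate_choices, the fixed seed, and n_sims are assumptions made for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame(
    {"id": ["A", "C", "B", "E", "D"], "p": [0.2, 0.4, 0.7, 0.2, 0.9]}
)

# Seed fixed only so the sketch is reproducible; drop it for real simulations.
rng = np.random.default_rng(seed=0)


def simulate_choices(p, rng):
    """Return 1 with probability p and 2 otherwise, one draw per entry of p."""
    p = np.asarray(p, dtype=float)
    u = rng.random(p.shape)  # uniform draws on [0, 1)
    return np.where(u < p, 1, 2)


# One simulated choice per individual, with no Python-level loop.
df["choice"] = simulate_choices(df["p"], rng)

# Many simulations at once: one row per simulation, one column per individual.
n_sims = 1000  # hypothetical number of repetitions
draws = np.where(rng.random((n_sims, len(df))) < df["p"].to_numpy(), 1, 2)
```

Because the comparison is done on whole arrays, the cost of repeating the simulation is a single call to rng.random rather than one call per row. With this approach the groupby step may not be needed at all, even though p takes only a few distinct values.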

Any help would be greatly appreciated! Thanks



