Suppose I have a dataframe with a column p, which represents the probability that an individual will choose option 1 as opposed to option 2.
id p
A 0.2
C 0.4
B 0.7
E 0.2
D 0.9
I want to make a new column choice which captures each individual's choice, given their probability of selecting each choice.
Using random, I can do something like
import random
df['choice'] = df['p'].apply(lambda p: random.choices(population=[1, 2], weights=[p, 1 - p], k=1)[0])
I am hoping to find something that is faster than this, and makes fewer calls to random.choices (I am simulating choices in the dataset many times). Does anyone know a method that could help here?
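One direction that might avoid the per-row calls entirely, as a minimal sketch: draw one uniform number per row with NumPy and compare it to p, so that option 1 is picked with probability p and option 2 otherwise. The helper name simulate_choices and the seed are only illustrative, not from any library, and the dataframe is just the example data above.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({"id": list("ACBED"), "p": [0.2, 0.4, 0.7, 0.2, 0.9]})

# The seed is arbitrary; drop it for non-reproducible draws.
rng = np.random.default_rng(0)

def simulate_choices(p, rng):
    """Hypothetical helper: return 1 with probability p, else 2, per element."""
    # One uniform draw in [0, 1) per row; u < p happens with probability p.
    u = rng.random(len(p))
    return np.where(u < p, 1, 2)

df["choice"] = simulate_choices(df["p"].to_numpy(), rng)
```

For repeated simulations, the same comparison should also work on a 2-D array of draws, for example rng.random((n_sims, len(df))) < df["p"].to_numpy(), which gives one row of choices per simulation via broadcasting.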
If it helps, the values of p are discrete: there are only a limited number of distinct values of p, and many individuals share the same value. I was thinking I could use some sort of groupby, but I'm not sure exactly what it would look like.
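A rough sketch of what the groupby idea could look like, assuming one batch of draws per distinct value of p: the function name simulate_by_group is hypothetical, and the seed is arbitrary. It is not obvious that this beats a single vectorized comparison over the whole column, so it would be worth timing both.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({"id": list("ACBED"), "p": [0.2, 0.4, 0.7, 0.2, 0.9]})

# The seed is arbitrary; drop it for non-reproducible draws.
rng = np.random.default_rng(0)

def simulate_by_group(frame, rng):
    """Hypothetical helper: draw choices once per distinct value of p."""
    choice = np.empty(len(frame), dtype=int)
    # .indices maps each distinct p to the integer positions of its rows.
    for p, positions in frame.groupby("p").indices.items():
        draws = rng.random(len(positions))
        # Option 1 with probability p, option 2 otherwise.
        choice[positions] = np.where(draws < p, 1, 2)
    return choice

df["choice"] = simulate_by_group(df, rng)
```

If only the number of individuals choosing option 1 within each group matters, rather than who chose what, a single rng.binomial(n, p) draw per group would give that count directly.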
Any help would be greatly appreciated! Thanks