Suppose I want to select two records from a set of three, where the probabilities of the three are 0.1, 0.5, and 0.4, respectively. Per this SO answer, numpy.random.choice will work:
import pandas as pd
from numpy import random
df = pd.DataFrame({
    'id': [1, 2, 3],
    'prob': [0.1, 0.5, 0.4]
})
random.seed(0)
random.choice(df.id, p=df.prob, size=2, replace=False)
# array([2, 3])
Now suppose each item also has a weight, and rather than selecting a fixed number of items, I want to select items up to a maximum total weight. So if these items have weights of 4, 5, and 6, and I have a budget of 10, I could select {1, 2}, {1, 3}, or {3}. The relative probability of each item being included would still be governed by prob (though in practice I think an algorithm would return item 1 more often, because its low weight lets it serve as a filler).
Is there a way to adapt random.choice for this, or another approach to yield this result?
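To make the goal concrete, here is a rough sketch of one possible approach, not necessarily the right one: draw a full weighted ordering of the items without replacement, then walk that ordering and keep each item that still fits in the budget. The `weight` column, the `sample_within_budget` helper, and the choice to skip (rather than stop at) an item that does not fit are all assumptions made for illustration. The resulting inclusion frequencies are only influenced by `prob`; they will not match it exactly.

```python
import numpy as np
import pandas as pd

# Same data as above, plus a hypothetical 'weight' column (assumption).
df = pd.DataFrame({
    'id': [1, 2, 3],
    'prob': [0.1, 0.5, 0.4],
    'weight': [4, 5, 6],
})

def sample_within_budget(df, budget, rng):
    """Greedy fill: visit ids in a prob-weighted random order, keep what fits."""
    # Weighted random ordering of all ids, drawn without replacement.
    order = rng.choice(df['id'].to_numpy(), size=len(df),
                       replace=False, p=df['prob'].to_numpy())
    weight_of = dict(zip(df['id'].tolist(), df['weight'].tolist()))
    chosen, total = [], 0
    for item in order:
        item = int(item)
        w = weight_of[item]
        # Skip items that would exceed the budget; later, lighter items may still fit.
        if total + w <= budget:
            chosen.append(item)
            total += w
    return chosen, total

rng = np.random.default_rng(0)
chosen, total = sample_within_budget(df, budget=10, rng=rng)
print(chosen, total)
```

With these weights and a budget of 10, this variant always ends up with item 1 plus one of the other two, which illustrates the "filler" effect described above. Stopping at the first item that does not fit, instead of skipping it, would give a different distribution.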