Tuesday, October 26, 2021

Python random.choices: how can I get less deviation from the expected probabilities?

With arrays for traitName and traitWeights, using trait["traitX"] = random.choices(traitName, traitxWeights)[0] works fine. I'm currently using 5 different traits (so 5 sets of items and weights), and I return a number of random combinations of all 5. All good so far.
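For reference, the setup described above might look roughly like this minimal sketch. The trait data and the helpers TRAITS, random_combination and generate are hypothetical stand-ins; only the random.choices call comes from the question. Each trait is drawn independently for every combination.

```python
import random

# Hypothetical trait data: each trait maps to (option names, parallel weights).
# Only two of the five traits are shown.
TRAITS = {
    "traitA": (["a", "b", "c", "d"], [10, 20, 30, 40]),
    "traitB": (["x", "y", "z"], [50, 30, 20]),
}


def random_combination(traits):
    """Pick one option per trait independently, weighted by that trait's weights."""
    combo = {}
    for trait_key, (names, weights) in traits.items():
        # random.choices returns a list of k picks (k=1 by default); take the first.
        combo[trait_key] = random.choices(names, weights)[0]
    return combo


def generate(traits, n):
    """Return n independent random combinations."""
    return [random_combination(traits) for _ in range(n)]


combos = generate(TRAITS, 100)
print(combos[0])
```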

Problem: the results for multiple traits are dumped to JSON and CSV. I open the CSV to check whether the distribution of traits matches what I expect, and it doesn't. The counts deviate from the expected values by around 6 on average, and no matter how I change or specify the weights, every test shows approximately the same average deviation.
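One way to do the CSV check programmatically is to tally each trait column with collections.Counter. This is a sketch under assumptions: the column names and the inline CSV text are hypothetical stand-ins for the real output file, which you would open with open("results.csv", newline="") (file name also hypothetical) instead of io.StringIO.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the real CSV output; one column per trait.
CSV_TEXT = """traitA,traitB
a,x
b,y
d,x
c,z
d,y
"""


def tally_column(csv_file, column):
    """Count how often each value appears in one CSV column."""
    return Counter(row[column] for row in csv.DictReader(csv_file))


counts = tally_column(io.StringIO(CSV_TEXT), "traitA")
print(counts)
```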

Meaning: if my weights sum to 100 and I generate 100 random picks, then for weights of [10, 20, 30, 40] I'd expect results close to 10 of a, 20 of b, 30 of c, and 40 of d. Not exactly those amounts, of course, but close.
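The expected count for option i is n * w_i / sum(w). It may help to also compute how much spread independent draws produce around that value. With random.choices each pick is independent, so each option's count follows a binomial distribution with standard deviation sqrt(n * p_i * (1 - p_i)), where p_i = w_i / sum(w). The sketch below (helper names are hypothetical) computes both. For n = 100 and weights [10, 20, 30, 40], the standard deviations come out at roughly 3, 4, 4.6 and 4.9 picks, so swings of several picks per option are what independent sampling normally produces.

```python
import math


def expected_counts(weights, n):
    """Expected number of picks per option: n * w_i / sum(weights)."""
    total = sum(weights)
    return [n * w / total for w in weights]


def count_std_devs(weights, n):
    """Standard deviation of each option's count under independent draws.

    Each option's count is Binomial(n, p_i) with p_i = w_i / sum(weights),
    so its standard deviation is sqrt(n * p_i * (1 - p_i)).
    """
    total = sum(weights)
    return [math.sqrt(n * (w / total) * (1 - w / total)) for w in weights]


weights = [10, 20, 30, 40]
print(expected_counts(weights, 100))  # [10.0, 20.0, 30.0, 40.0]
print(count_std_devs(weights, 100))
```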

Instead, I can get results like 4, 26, 28, 42, which is a very different distribution.
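To put those numbers in context, the sketch below measures the average absolute deviation between observed and expected counts (the helper average_deviation is hypothetical). The quoted counts give deviations of 6, 6, 2 and 2, an average of 4. The second part repeats 100 independent random.choices picks many times so that single run can be compared with what ordinary sampling noise produces at this sample size.

```python
import random
from collections import Counter

NAMES = ["a", "b", "c", "d"]
WEIGHTS = [10, 20, 30, 40]


def average_deviation(picks, names, weights):
    """Mean absolute difference between observed and expected counts per option."""
    n = len(picks)
    total = sum(weights)
    observed = Counter(picks)  # names that never appear count as 0
    deviations = [abs(observed[name] - n * w / total) for name, w in zip(names, weights)]
    return sum(deviations) / len(deviations)


# The quoted counts (4, 26, 28, 42) deviate by 6, 6, 2 and 2 from the expected ones.
quoted = ["a"] * 4 + ["b"] * 26 + ["c"] * 28 + ["d"] * 42
print(average_deviation(quoted, NAMES, WEIGHTS))  # 4.0

# For comparison: the average deviation over many independent 100-pick runs.
runs = [
    average_deviation(random.choices(NAMES, WEIGHTS, k=100), NAMES, WEIGHTS)
    for _ in range(2000)
]
print(sum(runs) / len(runs))
```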

This is a simplified example; in reality I have 150 options for trait a, 100 for trait b, and so on. Across those 150 options, the average deviation is 6. Some deviate from the expected amount by only 1 or 2, which is fine, but others deviate by 20. As I said, the average stays around 6, but the big swings have too much effect on my intended distribution.
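With many options, the largest deviation will usually be well above the average one even when sampling is working as intended, simply because there are many counts that can each fluctuate. The sketch below reports both numbers for a hypothetical 150-option trait. The option names, the randomly generated weights and n = 10,000 are assumptions, not the question's real data, so the printed values will not match the figures above.

```python
import random
from collections import Counter


def deviation_summary(names, weights, n):
    """Draw n independent picks and return (mean, max) absolute deviation
    between observed and expected counts across all options."""
    total = sum(weights)
    observed = Counter(random.choices(names, weights, k=n))
    devs = [abs(observed[name] - n * w / total) for name, w in zip(names, weights)]
    return sum(devs) / len(devs), max(devs)


# Hypothetical stand-in for a 150-option trait with uneven weights.
names = [f"option{i}" for i in range(150)]
weights = [random.randint(1, 20) for _ in names]

mean_dev, max_dev = deviation_summary(names, weights, n=10_000)
print(f"mean deviation {mean_dev:.1f}, max deviation {max_dev:.1f}")
```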

Desired outcome: what could I use instead of, or in conjunction with, random.choices to generate outcomes that are more closely aligned with the probabilities? Hard-coding an accepted deviation for each trait could work, but it would mean a ton of checks and ifs, and it seems slow and unwieldy.
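One common alternative to independent draws is quota sampling: decide the exact count for each option up front, build a pool with that many copies, and shuffle it. The sketch below assumes integer weights and uses the largest-remainder method to round the counts so they always sum to n; the helper names quota_counts and quota_picks are hypothetical. The trade-off is that picks are no longer independent: every batch of n matches the weights as closely as integers allow, and only the order (and, across traits, the pairing) is random. On Python 3.9+, random.sample(names, k=n, counts=quota_counts(weights, n)) is a built-in way to draw from such a pool.

```python
import random
from collections import Counter


def quota_counts(weights, n):
    """Split n picks into integer counts proportional to integer weights.

    Largest-remainder method: each option gets floor(n * w / total), and the
    few leftover picks go to the options with the largest remainders.
    """
    total = sum(weights)
    counts, remainders = [], []
    for w in weights:
        q, r = divmod(n * w, total)
        counts.append(q)
        remainders.append(r)
    leftover = n - sum(counts)
    order = sorted(range(len(weights)), key=lambda j: remainders[j], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def quota_picks(names, weights, n):
    """Return n picks whose counts match the weights, in random order."""
    pool = []
    for name, count in zip(names, quota_counts(weights, n)):
        pool.extend([name] * count)
    random.shuffle(pool)
    return pool


print(quota_counts([10, 20, 30, 40], 100))  # [10, 20, 30, 40]

# Hypothetical usage: shuffle each trait's pool independently, then pair them up.
combos = [
    {"traitA": a, "traitB": b}
    for a, b in zip(
        quota_picks(["a", "b", "c", "d"], [10, 20, 30, 40], 100),
        quota_picks(["x", "y", "z"], [50, 30, 20], 100),
    )
]
print(Counter(c["traitA"] for c in combos))
```

Because each trait's pool is shuffled separately, the per-trait totals are exact while the combinations themselves still vary from run to run.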

Thanks kindly
