The title probably sounds weird, but I'll explain what I mean.
Given k values, I want to map those to m buckets (m < k by at least an order of magnitude), such that each bucket contains either floor(k/m) or floor(k/m)+1 of the initial values.
Writing code to do that is very simple. The issue is that I want to be able to reproduce the same distribution by including something in my code that makes it repeatable (something along the lines of a "seed"). Does anyone have any ideas on how this can be done?
For the record, what my Python code does now is pick a random non-full bucket for each of the k values, until every bucket contains floor(k/m) values. Once that is done, I assign each of the remaining values to a random bucket.
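One possible way to make this repeatable, as a rough sketch rather than a definitive answer: drive all the random choices from a private `random.Random(seed)` instance. Note that this is not the exact loop described above. It swaps in a shuffle-based variant that first fixes the bucket sizes and then shuffles the bucket labels, which also guarantees that no bucket ends up with more than floor(k/m)+1 values. The function name `assign_buckets` and its parameters are made up for illustration.

```python
import random


def assign_buckets(values, m, seed):
    """Split `values` into m buckets of size floor(k/m) or floor(k/m)+1.

    Hypothetical helper: the same `seed` and the same input order are
    assumed to give the same assignment on the same Python version.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    base, extra = divmod(len(values), m)

    # `extra` buckets hold base+1 values, the rest hold base values;
    # shuffling decides (reproducibly) which buckets get the extra value.
    sizes = [base + 1] * extra + [base] * (m - extra)
    rng.shuffle(sizes)

    # One slot per value, labelled with its bucket index, then shuffled so
    # that which value lands in which bucket is random but seeded.
    slots = [b for b, size in enumerate(sizes) for _ in range(size)]
    rng.shuffle(slots)

    buckets = [[] for _ in range(m)]
    for value, b in zip(values, slots):
        buckets[b].append(value)
    return buckets


if __name__ == "__main__":
    first = assign_buckets(list(range(103)), 10, seed=42)
    second = assign_buckets(list(range(103)), 10, seed=42)
    print(first == second)                 # same seed, same assignment
    print(sorted(len(b) for b in first))   # bucket sizes
```

Storing the seed alongside the results should be enough to rebuild the same distribution later, as long as the input values are passed in the same order.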