Thursday, April 30, 2020

What is the best way to create a random fair hash function that can be replicated?

The title probably sounds weird, but I'll explain what I mean.

Given k values, I want to map those to m buckets (m < k by at least an order of magnitude), such that each bucket contains either floor(k/m) or floor(k/m)+1 of the initial values.
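To make the size constraint concrete, here is a small sketch of the bucket sizes it implies. The function name `target_sizes` is made up for illustration; the only fact it relies on is that k = m*floor(k/m) + (k mod m), so exactly k mod m buckets hold one extra value.

```python
def target_sizes(k, m):
    """Return the m bucket sizes allowed by the constraint, largest first."""
    q, r = divmod(k, m)  # q = floor(k/m), r = k mod m
    # r buckets hold q + 1 values, the other m - r buckets hold q values
    return [q + 1] * r + [q] * (m - r)


sizes = target_sizes(103, 10)
print(sizes)       # three buckets of 11, seven buckets of 10
print(sum(sizes))  # 103
```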

Writing code to do that is very simple. The issue is that I want to be able to reproduce the same distribution on every run by including something along the lines of a "seed" in my code. Does anyone have any ideas on how this can be done?

For the record, what I'm doing now in my Python code is selecting a random non-full hash bucket for each of the k values until all buckets contain floor(k/m) values. After this is done, I assign each of the remaining values to a random bucket.
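A minimal sketch of that procedure with a seed added, not my exact code. It assumes the values are distinct and hashable, and it uses a private `random.Random(seed)` instance so the result does not depend on the global random state. The name `assign_buckets` is illustrative. One further assumption: the leftover values are sent to distinct buckets, which is what keeps every bucket at floor(k/m) or floor(k/m)+1.

```python
import random


def assign_buckets(values, m, seed):
    """Map each value to a bucket index in range(m), reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, independent of the global one
    base = len(values) // m    # floor(k/m)
    counts = [0] * m
    # Buckets that still hold fewer than `base` values
    open_buckets = list(range(m)) if base > 0 else []
    assignment = {}
    leftovers = []

    for v in values:
        if not open_buckets:
            # Every bucket already holds floor(k/m) values
            leftovers.append(v)
            continue
        b = rng.choice(open_buckets)
        assignment[v] = b
        counts[b] += 1
        if counts[b] == base:
            open_buckets.remove(b)

    # Assumption: leftovers go to distinct buckets, so no bucket exceeds base + 1.
    # There are k mod m leftovers, which is always fewer than m.
    extra_buckets = rng.sample(range(m), len(leftovers))
    for v, b in zip(leftovers, extra_buckets):
        assignment[v] = b
    return assignment


first = assign_buckets(list(range(103)), 10, seed=42)
second = assign_buckets(list(range(103)), 10, seed=42)
print(first == second)  # True: same seed and same input order give the same mapping
```

The mapping depends on the order of `values` as well as on the seed, so the input has to be iterated in the same order each time. Reproducibility here means the same Python version; the standard library does not promise that `choice` and `sample` produce identical sequences across versions.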



