mardi 28 juillet 2020

Fair algorithm to convert random bytes into digit sequence

Most (if not all) CSPRNG functions available out there provide us byte sequences as result (eg. getrandom, CryptGenRandom, BCryptGenRandom, RNGCryptoServiceProvider, SecureRandom, CRYPT_GEN_RANDOM etc).

However, depending where we are supposed to use this random sequence our charset may be different (suppose you need only digits), and we may need to convert the byte sequence to fit our constraint.

A naive way to solve the problem is to convert each byte to its decimal representation and concatenate all numbers (the programming language doesn't matter):

randbytes = AnyRandomGenerator()
sequence = ""
for byte in randbytes:
  sequence = sequence + byte.ToInt().ToStr()
return sequence

This way, the random sequence 0xFE501000 would become the sequence 25480100.

It works, however, there is a big mistake in this approach: the probability of the first number to be "1" or "2" is significantly bigger than any other number (because bytes ranges from 0 to 255 and not from 0 to 999).

To solve this, an improvement would be to MOD 10 each digit:

randbytes = AnyRandomGenerator()
sequence = ""
for byte in randbytes:
  sequence = sequence + (byte.ToInt() MOD 10).ToStr()
return sequence

However, since MAX_BYTE = 255 is not a multiple of 10, this leads to small unfair distribution for numbers greather than five.

In an example made by Microsoft for .NET, they avoid this problem considering some random numbers "unfair". When an unfair number is generated, it is discarded and a new one is generated. This solves the problem, but I must say it isn't very elegant (it "waste" numbers :P).

Is there a best/fast way to achieve this conversion keeping the numbers normally distributed? Or the method used by Microsoft is the only way to go?

(PS: Although my examples uses a 10-digit charset = [0-9], I'm asking about charsets of any size, like the Dice Roll charset = [0-5] in the Microsoft example)




Aucun commentaire:

Enregistrer un commentaire