mercredi 16 octobre 2019

Why so many collisions with the MariaDB 10.2 RAND() function?

Note: Running MariaDB 10.2.27 on Windows Server 2012 R2 Standard.

I wanted to generate random integers for use in MariaDB so I have been experimenting with the MariaDB RAND() function. Either my expectations & understanding are way off base (definitely possible!) or the MariaDB RAND() function is not very random.

Using a BIGINT(20) column I wanted to generate random integers up to 16 digits in length, so I used this SQL: FLOOR(RAND()*9999999999999999)+1). The exact SQL I use, in a loop is:

INSERT INTO rnd_test VALUES (FLOOR(RAND()*9999999999999999)+1);

Table rnd_test has a single column which is a BIGINT(20) and is the primary ID.

With a pool of 10^16 numbers and considering the Birthday Paradox I would expect about a 50% chance of a collision after 10^8 numbers generated. Obviously there is some variance to this but every time I run the insert loop I start seeing collisions almost immediately and then repeating every 2000 or 3000 generated numbers, sometimes more often. After ~50,000 randoms have been generated I'm seeing collisions every few hundred numbers.

Thinking that maybe my understanding was wildly incorrect I adjusted my looping vb.net code to generate the random locally and then insert that into the MariaDB table. I define a new System.Random at the top of the routine and then use this to generate random numbers:

Dim r As Long = CLng(Math.Floor(rNum.NextDouble() * 9999999999999999)) + 1

Generally this works much better but still not quite as well as I would expect. It will usually run for about 100,000 iterations before a collision occurs, and then after that there seems to be one or two collisions per 10,000 randoms generated. Sometimes a batch of 10,000 will go by without any collisions at all.

So, why is the MariaDB RAND() function performing so poorly compared to the vb.net function?




Aucun commentaire:

Enregistrer un commentaire