samedi 9 juin 2018

Random token generation - a supposedly unlikely collision occurred

A couple months ago, we were using UUIDs to generate random string IDs that needed to be unique across the board. I then changed the algorithm in order to save some data and index space in our database. I tested a few ways to generate unique string IDs, and I decided to use this function:

function generateToken($length) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz';
    $max = strlen($characters) - 1;

    $token = '';
    for ($i = 0; $i < $length; $i++) {
        $token .= $characters[mt_rand(0, $max)];
    }

    return $token;
}

I'm using this function to generate IDs that are 20 characters long using digits and letters, or you could say these IDs are numbers in base 36. The probability of any 2 IDs colliding should be 1/36^20, but due to the birthday paradox, it can be expected for a collision to occur after about 36^10 records - that's 3.6 quadrillion records. Yet, just a few hours ago a collision occurred, when there were only 5.3 million existing records in the database. Am I extremely unlucky, or my ID-generating function is flawed with respect to randomness? I know mt_rand() isn't truly random, but it is random enough, isn't it?

I would've written a loop that checks if the generated ID is unique and generate a new one if it isn't, but I thought that the chance of getting a collision was so small that the performance cost of such a loop wasn't worth it. I will now include such a loop in the code, but I'm still interested in perfecting the ID generation function if it is truly flawed.




Aucun commentaire:

Enregistrer un commentaire