mardi 24 novembre 2015

Why does this random string generator perform so poorly?

I found this bit of PHP code for generating random strings (alphabetical, alphanumeric, numeric, and hexadecimal).

<?php
function random($length = 8, $seeds = 'alpha') {
  // Possible seeds
  $seedings['alpha'] = 'abcdefghijklmnopqrstuvwqyz';
  $seedings['numeric'] = '0123456789';
  $seedings['alphanum'] = 'abcdefghijklmnopqrstuvwqyz0123456789';
  $seedings['hexidec'] = '0123456789abcdef';

  // Choose seed
  if (isset($seedings[$seeds])) {
    $seeds = $seedings[$seeds];
  }

  // Seed generator
  list($usec, $sec) = explode(' ', microtime());
  $seed = (float) $sec + ((float) $usec * 100000);
  mt_srand($seed);

  // Generate
  $str = '';
  $seeds_count = strlen($seeds);

  for ($i = 0; $length > $i; $i++) {
    $str .= $seeds{mt_rand(0, $seeds_count - 1)};
  }

  return $str;
}
?>

If I run this function with the default arguments (so it is generating 8 character strings, alphabetical only) and generate 1,000,000 strings, I'd think my collision rate would be low:

26^8 = 208,827,064,576
1,000,000 / 208,827,064,576 ~= 0.0004%

In actuality, when I run that on my machine, I get a 90% collision rate! Only 10% of my generated strings are unique.

Actually, it is suspiciously close to 10%. Generating multiple sets of 1,000,000 random strings, I find that each set generates...

  1. 100,032 unique strings
  2. 100,035 unique strings
  3. 100,032 unique strings
  4. 100,028 unique strings
  5. 100,030 unique strings
  6. you get the idea...

So what gives? Obviously it has to do with how I'm seeding mt_srand, or how php implements mt_rand, or something else.

So...

Why doesn't this code generate useful random strings?

And what would be a better approach?




Aucun commentaire:

Enregistrer un commentaire