mercredi 29 mars 2017

C#: Most efficient way to generate lots of "good" random ranged integers

I have a program wherein I need to generate a huge quantity of random integers, all within a specific range.

I've written a little benchmark tool that will test various ways to generate random integers and populate an array of ints with the results. The methods I've benchmarked are as follows:

  1. Simple Random.Next() without ranging, populating array using a for loop*
  2. Random.Next(x,y), with ranging, populating array using a for loop
  3. Random.next(x,y), with ranging, populating array using LINQ
  4. RNGCryptoServiceProvider, directly populating a byte[] array, no conversion to ints, no ranging*
  5. RNGCryptoServiceProvider, converting all values to int, without range constraints, using a for loop*
  6. RNGCryptoServiceProvider, generating 4x the desired random numbers as bytes, converting each value to a double between 0 and 1, then using multiplication/addition to range the value while converting to int. All in a for loop.
  7. RNGCryptoServiceProvider, same as #6 but using LINQ
  8. RNGCryptoServiceProvider, same as #6, but only generating n+3 bytes and using 8-bit offsets rather than 32-bit offsets to generate the floats
  9. RNGCryptoServiceProvider, bytes converted to ints and ranged using the modulus operator, in a for loop
  10. RNGCryptoServiceProvider, same as #9 but with 4x the bytes generated, each int converted from 32-bits rather than just 8 bits, also ranged with modulus
  11. RNGCryptoServiceProvider, same as #10 but rejecting values that might create an uneven distribution.
    • I do this by figuring out the largest multiple within the integer space. So for example, if working with 8-bit values, and we desire numbers from 1 to 10, we know that there are 10 possible values. We take 255 (maximum value of a byte) % 10 to get 5, and subtract 5 from 255 to get 250. Now, any random number we get that is >= 250 is rejected and a new number is generated in its place.

I haven't thought of any other strategies so far.

There are problems with most of these strategies:

  • Any strategy ending with a * does not enforce the range for the ints and thus the results are unacceptable.
  • The LINQ methods are always slower than their respective for loop methods. (This is a given.)
  • Random.Next() is not necessarily "cryptographically" secure or "good enough" randomness.
  • Method 8 uses overlapping offsets (e.g. each double is composed of three of the bits from the previous double, thereby affecting randomness) In this method, the double for value 0 is extracted from bytes 0-3 of the random pool; the double for value 1 is taken from bytes 1-4, and so on.
  • The methods using the modulus operator other than method 11 will not necessarily produce an even distribution. For example, if you need values betweeen 1 and 10 and you are working with 8-bit byte valuies, the numbers 1 through 6 will occur slightly more likely than 7 through 10, because the numbers 250-255 would map to 1-6 after modulus and ranging. These values would be 4% more likely to occur in any given value.

Here are the results of a benchmark run, running on a Xeon E5-2620v1, which has AES instruction support:

Generate 50,000,000 random numbers in the range 1 to 10...
 *1: 1.10 sec 45,335,560 num/sec [Random.Next, unranged ints, for loop]
 +2: 1.90 sec 26,266,911 num/sec [Random.Next, ranged ints, for loop]
 +3: 3.23 sec 15,476,958 num/sec [Random.Next, ranged ints, LINQ]
 *4: 0.11 sec 475,817,979 num/sec [RCSP.GetBytes, direct to byte array]
 *5: 0.44 sec 114,849,593 num/sec [RCSP.GetBytes, unranged ints, for loop]
  6: 3.11 sec 16,090,076 num/sec [RCSP.GetBytes, 32-bit offsets, ranged ints, for loop]
  7: 4.09 sec 12,215,084 num/sec [RCSP.GetBytes, 32-bit offsets, ranged ints, LINQ]
 +8: 2.59 sec 19,289,457 num/sec [RCSP.GetBytes, 8-bit offsets, ranged ints, for loop]
 +9: 0.42 sec 118,669,825 num/sec [RCSP.GetBytes, ranged ints via modulus, for loop]
+10: 0.99 sec 50,311,904 num/sec [RCSP.GetBytes, ranged ints from 32-bits via modulus, for loop]
 11: 1.10 sec 45,335,515 num/sec [RCSP.GetBytes, ranged ints from 32-bits via modulus with check, for loop]

*unacceptable result
+undesirable result

Method 11 turns out to be the fastest method and appears to give me what I want.

So my questions are:

  1. Is there any more efficient way to generate a large quantity of random, ranged integers in C#?
  2. Is my method of rejecting numbers on the tail end of the valid range valid for keeping the numbers evenly distributed?

Here's the code for method 11:

    int rnCount = 50000000;
    DateTime startTime;
    TimeSpan endTime;
    RNGCryptoServiceProvider rcsp = new RNGCryptoServiceProvider();

    int min = 1;
    int max = 10;

    int randomNumbers = new int[rnCount];

    // method 11: with RNGCryptoServiceProvider, ranged ints, 32 bits per int, ranged with modulus and checked
    byte[] randomBytes = new byte[rnCount * 4];
    rcsp.GetBytes(randomBytes);
    startTime = DateTime.Now;
    // The maximum value we should allow, i.e. the maximum multiple of the desired range delta
    uint maxValue = uint.MaxValue - (uint.MaxValue % (max-min));

    for (int i = 0; i < randomNumbers.Length; i++)
    {
        uint thisNumber = BitConverter.ToUInt32(randomBytes, i * 4);
        while (thisNumber >= maxValue)
        {
            byte[] newNumber = new byte[4];
            rcsp.GetBytes(newNumber);
            thisNumber = BitConverter.ToUInt32(newNumber, 0);
        }
        randomNumbers[i] = (int)(thisNumber % max) + min;
    }
    timeTaken = DateTime.Now - startTime;
    Console.WriteLine("+11: {0:F2} sec {1:N0} num/sec [RCSP.GetBytes, ranged ints from 32-bits via modulus with check, for loop]", timeTaken.TotalSeconds, rnCount / timeTaken.TotalSeconds);




Aucun commentaire:

Enregistrer un commentaire