I have a program wherein I need to generate a huge quantity of random integers, all within a specific range.
I've written a little benchmark tool that will test various ways to generate random integers and populate an array of ints with the results. The methods I've benchmarked are as follows:
- Simple
Random.Next()
without ranging, populating array using afor
loop* Random.Next(x,y)
, with ranging, populating array using afor
loopRandom.next(x,y)
, with ranging, populating array using LINQRNGCryptoServiceProvider
, directly populating abyte[]
array, no conversion to ints, no ranging*RNGCryptoServiceProvider
, converting all values toint
, without range constraints, using afor
loop*RNGCryptoServiceProvider
, generating 4x the desired random numbers as bytes, converting each value to adouble
between 0 and 1, then using multiplication/addition to range the value while converting toint
. All in afor
loop.RNGCryptoServiceProvider
, same as #6 but using LINQRNGCryptoServiceProvider
, same as #6, but only generatingn+3
bytes and using 8-bit offsets rather than 32-bit offsets to generate the floatsRNGCryptoServiceProvider
, bytes converted to ints and ranged using the modulus operator, in afor
loopRNGCryptoServiceProvider
, same as #9 but with 4x the bytes generated, each int converted from 32-bits rather than just 8 bits, also ranged with modulusRNGCryptoServiceProvider
, same as #10 but rejecting values that might create an uneven distribution.- I do this by figuring out the largest multiple within the integer space. So for example, if working with 8-bit values, and we desire numbers from 1 to 10, we know that there are 10 possible values. We take 255 (maximum value of a byte) % 10 to get 5, and subtract 5 from 255 to get 250. Now, any random number we get that is
>=
250 is rejected and a new number is generated in its place.
- I do this by figuring out the largest multiple within the integer space. So for example, if working with 8-bit values, and we desire numbers from 1 to 10, we know that there are 10 possible values. We take 255 (maximum value of a byte) % 10 to get 5, and subtract 5 from 255 to get 250. Now, any random number we get that is
I haven't thought of any other strategies so far.
There are problems with most of these strategies:
- Any strategy ending with a * does not enforce the range for the
int
s and thus the results are unacceptable. - The LINQ methods are always slower than their respective
for
loop methods. (This is a given.) Random.Next()
is not necessarily "cryptographically" secure or "good enough" randomness.- Method 8 uses overlapping offsets (e.g. each
double
is composed of three of the bits from the previousdouble
, thereby affecting randomness) In this method, the double for value 0 is extracted from bytes 0-3 of the random pool; the double for value 1 is taken from bytes 1-4, and so on. - The methods using the modulus operator other than method 11 will not necessarily produce an even distribution. For example, if you need values betweeen 1 and 10 and you are working with 8-bit byte valuies, the numbers 1 through 6 will occur slightly more likely than 7 through 10, because the numbers 250-255 would map to 1-6 after modulus and ranging. These values would be 4% more likely to occur in any given value.
Here are the results of a benchmark run, running on a Xeon E5-2620v1, which has AES instruction support:
Generate 50,000,000 random numbers in the range 1 to 10...
*1: 1.10 sec 45,335,560 num/sec [Random.Next, unranged ints, for loop]
+2: 1.90 sec 26,266,911 num/sec [Random.Next, ranged ints, for loop]
+3: 3.23 sec 15,476,958 num/sec [Random.Next, ranged ints, LINQ]
*4: 0.11 sec 475,817,979 num/sec [RCSP.GetBytes, direct to byte array]
*5: 0.44 sec 114,849,593 num/sec [RCSP.GetBytes, unranged ints, for loop]
6: 3.11 sec 16,090,076 num/sec [RCSP.GetBytes, 32-bit offsets, ranged ints, for loop]
7: 4.09 sec 12,215,084 num/sec [RCSP.GetBytes, 32-bit offsets, ranged ints, LINQ]
+8: 2.59 sec 19,289,457 num/sec [RCSP.GetBytes, 8-bit offsets, ranged ints, for loop]
+9: 0.42 sec 118,669,825 num/sec [RCSP.GetBytes, ranged ints via modulus, for loop]
+10: 0.99 sec 50,311,904 num/sec [RCSP.GetBytes, ranged ints from 32-bits via modulus, for loop]
11: 1.10 sec 45,335,515 num/sec [RCSP.GetBytes, ranged ints from 32-bits via modulus with check, for loop]
*unacceptable result
+undesirable result
Method 11 turns out to be the fastest method and appears to give me what I want.
So my questions are:
- Is there any more efficient way to generate a large quantity of random, ranged integers in C#?
- Is my method of rejecting numbers on the tail end of the valid range valid for keeping the numbers evenly distributed?
Here's the code for method 11:
int rnCount = 50000000;
DateTime startTime;
TimeSpan endTime;
RNGCryptoServiceProvider rcsp = new RNGCryptoServiceProvider();
int min = 1;
int max = 10;
int randomNumbers = new int[rnCount];
// method 11: with RNGCryptoServiceProvider, ranged ints, 32 bits per int, ranged with modulus and checked
byte[] randomBytes = new byte[rnCount * 4];
rcsp.GetBytes(randomBytes);
startTime = DateTime.Now;
// The maximum value we should allow, i.e. the maximum multiple of the desired range delta
uint maxValue = uint.MaxValue - (uint.MaxValue % (max-min));
for (int i = 0; i < randomNumbers.Length; i++)
{
uint thisNumber = BitConverter.ToUInt32(randomBytes, i * 4);
while (thisNumber >= maxValue)
{
byte[] newNumber = new byte[4];
rcsp.GetBytes(newNumber);
thisNumber = BitConverter.ToUInt32(newNumber, 0);
}
randomNumbers[i] = (int)(thisNumber % max) + min;
}
timeTaken = DateTime.Now - startTime;
Console.WriteLine("+11: {0:F2} sec {1:N0} num/sec [RCSP.GetBytes, ranged ints from 32-bits via modulus with check, for loop]", timeTaken.TotalSeconds, rnCount / timeTaken.TotalSeconds);
Aucun commentaire:
Enregistrer un commentaire