lundi 25 novembre 2019

boost::random generates identical values too often from the same seed at different states

Problem description

Sometimes I get the same random number from a uniform distribution using a Mersenne Twister engine even I properly used the engine and iterated it. I know that the number of possible states of the engine is finite and number of possible generated values is also finite, but this is not the case now.

Using boost's implementation, 1e6 number of uniformly distributed random values are generated on the range [0; 1e7). That means that there way more possible values than required number of random values. However, I get quite often the same values, sometimes more than 100 times in this range. How is it possible?

Code

A simple code is provided to reproduce the situation. On both platforms I get the same problem:

  • MSVS 2019 with boost-random:x64-windows 1.71.0, and
  • g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 with libboost-dev 1.58.0.1ubuntu1
#include <iostream>
#include <chrono>

#include <boost/random/mersenne_twister.hpp>          // random number generator
#include <boost/random/uniform_real_distribution.hpp> // uniform distribution generator
using namespace std;

int main()
{
    size_t seed = static_cast<int> (std::chrono::system_clock::now().time_since_epoch().count());
    cout << "seed = " << seed << endl;

    boost::random::mt19937 engine(seed);                         // the random number generator engine
    boost::random::uniform_real_distribution<double> u(0, 1e7);  // uniformly distributed double values on the range [0; 1e7)
    cout.precision(20);
    vector<double> history;                                      // stores the generated values for comparison
    for (size_t i = 0; i < 1e6; ++i)
    {
        history.push_back(u(engine));
        for (size_t j = 0; j < i; ++j)
            if (history[i] == history[j])
                cout << "Equal values ("<< history[i] <<") at ID = " << i << " and " << j << endl;
    }
}

Question

Is there a bug in the code that produces the same values? Or is it a bug in boost?

For my task it is important to generate numbers with uniform distribution. Finding identical values is one of the easiest tests but there are many more and I am sure I don't want to do quality analysis on a well-known library like Boost. I didn't want to use the standard library, because it is not guaranteed that two different compilers will give the same sequence for the same seed values, but it was a requirement for the task. What kind of a solution can you suggest?

Note

A strange behavior can be seen if one compares the generated values with the one std::random generates. Example for values from random::boost for seed 4561565448989 is

1755586.0406719148159
3354420.976247638464   <--
3630764.0071026980877
3488445.2889673411846  <--
7920481.4555123448372
8773544.1024415194988  <--

while standard library generates

3354420.9766563926823  <--
3488445.2898126943037  <--
8773544.1042856499553  <--
...

That is, every second generated value in the boost's sequence is very close to a corresponding value in the standard library's implementation. When two values in the boost-sequence are equal, the values in the standard-library-sequence are not equal, but close to each other. The similarity holds for MSVS and g++ compilers too, which have the right to have different implementation for Mersenne Twister and distributions.




Aucun commentaire:

Enregistrer un commentaire