Thursday, June 9, 2016

Bottleneck at random number generation with multiple threads

I was facing performance issues while generating random numbers from multiple threads. This was because I was using the same random engine for all threads. I then implemented a vector that contains one random engine per thread (I found this solution in another post here on Stack Overflow). I would expect the number of iterations per second to grow linearly with the number of threads I'm running, but this does not seem to be the case.

Here is a minimal example:

#include <cstdint>
#include <random>
#include <vector>
#include <omp.h>

const int threads = 4;

int main()
{
    std::uniform_int_distribution<uint64_t> uint_dist;
    std::vector<std::mt19937_64> random_engines;
    std::random_device rd;

    for (int i = 0;i < threads;i++)
        random_engines.push_back(std::mt19937_64((rd())));

    omp_set_num_threads(threads);

    int counter = 0;
    #pragma omp parallel for
    for (int i = 0;i < threads;++i)
    {
        int thread = omp_get_thread_num();
        while (counter < 100)
        {
            if (uint_dist((random_engines[thread])) < (1ULL << 42))
                counter++;
        }
    }
}

Running this code with one active thread takes about 4 seconds on average on my CPU. Setting threads to 4 gives an average execution time of about 2 seconds, so four times as many threads yields only a 2x speedup. Am I missing something?
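
For comparison, here is a minimal sketch of a variant in which nothing is shared while drawing numbers. In the example above, `counter` is written by all threads without synchronization, and the distribution object is shared as well, so this may be one place to look. The helper name `count_hits` and its parameters are hypothetical, not part of the original code. As a further assumption, the work is split evenly (each iteration collects its own fixed number of hits) instead of all threads racing toward one shared total.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper (not from the original post): every loop iteration owns
// its engine, distribution and counter, so no state is shared while drawing.
int count_hits(int threads, int hits_per_thread,
               std::uint64_t threshold, std::uint64_t seed)
{
    int total = 0;

    // Per-iteration results are combined through an OpenMP reduction.
    // If the code is compiled without OpenMP support, the pragma is ignored
    // and the loop simply runs serially with the same result.
    #pragma omp parallel for num_threads(threads) reduction(+:total)
    for (int i = 0; i < threads; ++i)
    {
        // Assumption: a simple "seed + index" scheme, chosen only for illustration.
        std::mt19937_64 engine(seed + static_cast<std::uint64_t>(i));
        std::uniform_int_distribution<std::uint64_t> dist;

        int local = 0; // private to this iteration
        while (local < hits_per_thread)
        {
            if (dist(engine) < threshold)
                ++local;
        }
        total += local;
    }
    return total;
}
```

With these assumptions, `count_hits(4, 25, 1ULL << 42, seed)` would correspond to the 100 hits above split across four threads, and timing it against `count_hits(1, 100, 1ULL << 42, seed)` might show whether the shared state is what limits the scaling.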



