I was seeing performance problems when generating random numbers from multiple threads. The cause was that all threads shared the same random engine. I then switched to a vector that holds one random engine per thread (a solution I found in another post here on Stack Overflow). With that change I would expect the number of iterations per second to grow linearly with the number of threads, but this does not seem to be the case.
Here is a minimal example:
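#include <cstdint>
#include <random>
#include <vector>
#include <omp.h>

const int threads = 4;

int main()
{
    std::uniform_int_distribution<uint64_t> uint_dist;

    // One engine per thread, each seeded from std::random_device.
    std::vector<std::mt19937_64> random_engines;
    std::random_device rd;
    for (int i = 0; i < threads; i++)
        random_engines.push_back(std::mt19937_64(rd()));

    omp_set_num_threads(threads);
    int counter = 0; // shared by all threads

    // One loop iteration per thread; each thread draws from its own engine.
    #pragma omp parallel for
    for (int i = 0; i < threads; ++i)
    {
        int thread = omp_get_thread_num();
        while (counter < 100)
        {
            // A draw passes this test with probability 2^42 / 2^64.
            if (uint_dist(random_engines[thread]) < (1ULL << 42))
                counter++;
        }
    }
}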
With one active thread, this code takes about 4 seconds on average on my CPU. With threads set to 4, the average is about 2 seconds, so four times as many threads yields a speedup of only 2. Am I missing something?