Tuesday, February 24, 2015

Determinism for the __gnu_parallel::random_shuffle generator

I was hoping to use the GNU parallel extensions to speed up shuffling an array; the shuffled array would then serve as the workload for an experiment involving sorting. My two goals are speed in generating the workload and reproducibility of the experiment's results. A description of these GNU extensions is provided here. I ended up confused about the interface:


From what I read, one option is to pass my own RNG, as the standard library allows, but then I assume I need to worry about thread safety. The documentation here (std::random_shuffle on cppreference) doesn't say much about thread-safety requirements on the generator, and neither does the GNU link above. So I assume the worst.


The other option is not to pass my own RNG, in which case I assume the library takes care of it. I assume the implementation handles the multi-threading issues of the RNG (though I couldn't find documentation confirming this either). But this leaves me with a problem: I want to control the seed so that my experiment is repeatable, and I found no documentation about controlling the seed of the default generator.


Both approaches have clear shortcomings.


A third option is to pass in a thread-safe generator, for example by protecting it with a lock. This seems silly, considering I'm trying to parallelize the shuffle. It may still be faster than a purely sequential shuffle; I haven't tried.
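To make the lock-based option concrete, here is a minimal sketch of what such a generator could look like. The class name LockedRng is hypothetical (it is not part of libstdc++), and I am assuming the parallel shuffle accepts the same kind of callable as std::random_shuffle, i.e. one that takes n and returns a value in [0, n).

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <random>

// Hypothetical wrapper (not a libstdc++ facility): serializes access to a
// single seeded engine so several threads can call it safely.
class LockedRng {
public:
    explicit LockedRng(std::uint64_t seed) : engine_(seed) {}

    // Same call shape as the generator argument of std::random_shuffle:
    // returns a value in [0, n). Requires n > 0.
    std::ptrdiff_t operator()(std::ptrdiff_t n) {
        std::lock_guard<std::mutex> guard(mutex_);
        // Plain modulo has a small bias; acceptable for a sketch.
        return static_cast<std::ptrdiff_t>(
            engine_() % static_cast<std::uint64_t>(n));
    }

private:
    std::mutex mutex_;
    std::mt19937_64 engine_;
};
```

Note that the lock only makes the calls safe. If several threads draw from the same engine, the order in which they draw depends on scheduling, so a fixed seed alone would not necessarily give a repeatable shuffle.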


Another option would be one based on thread-local state, but I'm starting to get annoyed at having to think so much about how to use this library function.
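For what it's worth, here is a rough sketch of the per-thread-state idea, written in plain standard C++ rather than against the GNU parallel mode. All the names (make_worker_engine, shuffle_chunk, shuffle_in_chunks) are made up for illustration. Each worker gets its own engine derived from one experiment seed plus the worker's index, and shuffles only its own chunk. The loop over workers runs sequentially here, but the iterations share nothing, so under that assumption each one could be handed to its own thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: derive one engine per worker from a single
// experiment seed and the worker's index.
inline std::mt19937_64 make_worker_engine(std::uint32_t base_seed,
                                          std::uint32_t worker) {
    std::seed_seq seq{base_seed, worker};
    return std::mt19937_64(seq);
}

// Fisher-Yates shuffle of data[first, last) using only the given engine.
// Written by hand so the result does not depend on how a particular
// standard library implements its distributions.
inline void shuffle_chunk(std::vector<int>& data, std::size_t first,
                          std::size_t last, std::mt19937_64& engine) {
    for (std::size_t i = last; i > first + 1; --i) {
        // Pick j in [first, i); modulo bias is ignored in this sketch.
        std::size_t j = first + static_cast<std::size_t>(engine() % (i - first));
        std::swap(data[i - 1], data[j]);
    }
}

// Shuffle each of `workers` contiguous chunks with its own engine.
inline void shuffle_in_chunks(std::vector<int>& data, std::uint32_t base_seed,
                              std::uint32_t workers) {
    const std::size_t n = data.size();
    for (std::uint32_t w = 0; w < workers; ++w) {
        // Disjoint range plus private engine: no synchronization needed
        // if these iterations were run on separate threads.
        const std::size_t first = n * w / workers;
        const std::size_t last = n * (w + 1) / workers;
        std::mt19937_64 engine = make_worker_engine(base_seed, w);
        shuffle_chunk(data, first, last, engine);
    }
}
```

Two caveats. This only permutes elements within each chunk, so it is not a uniform shuffle of the whole array; a real parallel shuffle would need an extra step that scatters elements across chunks. Also, the result depends on the number of workers, so that number would have to be fixed along with the seed for the experiment to be repeatable.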


Note: generating the workload in advance and remembering it is not an option, as I want to generate multiple GBs of data.


What would you do? Do you know of some other documentation I missed?




