Monday, 4 June 2018

Setting a random seed in Python disturbs multiprocessing

I've observed that setting a random seed before using multiprocessing in Python causes strange behaviour.

In Python 3.5.2, only 2 or 3 cores are used, each at a low CPU percentage. In Python 2.7.13, all requested cores run at 100%, but the code never seems to finish. When I remove the seed initialization, the parallelization works fine.

This happens even though there is no explicit use of random in the parallelized function. My current guess is that the seed is shared among processes and that this prevents multiprocessing from running smoothly, but can someone provide the correct answer?


I've run the code on Linux; here is a minimal code example:

  from multiprocessing import Pool
  import numpy as np
  import random

  # Seed initialization; removing this line makes the parallelization work fine
  random.seed = 2018

  NB_CPUS = 4

  def test(x):
      return x**2

  pool = Pool(NB_CPUS)
  args = [np.random.rand() for _ in range(100000)]

  results = pool.map(test, args)

  pool.terminate()
  results[-5:]
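
One thing worth checking in the example above: `random.seed = 2018` is an assignment, not a call. It rebinds the module attribute `random.seed` to the integer 2018 instead of seeding the generator. Fork-based worker processes in these Python versions appear to call `random.seed()` on startup to reseed themselves, and that call would fail once the name points to an integer. I have not traced this through the interpreter sources, so treat it as a probable cause rather than a confirmed one. Below is a minimal sketch of the corrected call, assuming Python 3 on Linux with the fork start method. `square` and `run` are illustrative names, not part of the original script, and the inputs come from `random.random()` rather than `np.random.rand()` so that the seed actually influences them (NumPy keeps its own generator state, seeded separately through `np.random.seed`).

```python
import multiprocessing
import random

NB_CPUS = 4  # same worker count as in the question


def square(x):
    # Stand-in for the question's test(x); uses no randomness itself.
    return x ** 2


def run(seed=2018, n=1000):
    # Call random.seed(...) instead of assigning to it, so the function
    # stays callable and the generator is actually seeded.
    random.seed(seed)
    args = [random.random() for _ in range(n)]

    # Assumption: Linux, so the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(NB_CPUS) as pool:
        results = pool.map(square, args)
    return args, results


if __name__ == "__main__":
    args, results = run()
    print(results[-5:])
```

Calling `run()` twice with the same seed returns the same inputs, which is a quick way to check that the seed is really being applied.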



