Sunday, 9 December 2018

Python: Thread-safe way to call NetworkX and functions on random graphs

I have created this program, mostly to understand how NetworkX and parallelization work:

import random
import multiprocessing
from functools import wraps  # needed by the unpack decorator below

import numpy as np
import networkx as nx

def unpack(func):
    @wraps(func)
    def wrapper(arg_tuple):
        return func(*arg_tuple)
    return wrapper

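@unpack
def parallel_job(seed, shift):
    N = 1000  # number of nodes
    k = 10    # approximate expected degree
    # Seed both global generators so the job is reproducible.
    random.seed(seed)
    np.random.seed(seed)
    # Use NetworkX to generate a random graph.
    G = nx.erdos_renyi_graph(N, k / float(N), seed=seed)
    # Print 10 random selections of 10 nodes each.
    for j in range(10):
        I = [10]        # number of selected nodes
        S = [N - I[0]]  # number of unselected nodes
        # Indicator column vector: 1 marks a selected node.
        X = np.array([0] * S[0] + [1] * I[0]).reshape((N, 1))
        np.random.shuffle(X)
        print(X)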



if __name__ == "__main__":
    threadnum = 10
    simnum = 10
    seed = [j*2759 + 37*j**2 + 4757 for j in range(threadnum)]
    shift = [j*simnum for j in range(simnum)]

    pool = multiprocessing.Pool(threadnum)
    # The tuple order must match parallel_job(seed, shift).
    arguments = zip(seed, shift)
    # Start threadnum worker processes and give them parallel jobs.
    pool.map(parallel_job, iterable=arguments)
    pool.close()
    pool.join()

So this program defines a list of seeds, starts a pool of workers (multiprocessing.Pool uses processes rather than threads), and assigns one seed to each job. Each job generates a random graph with its seed, then shuffles and prints a random selection of nodes 10 times.
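As a rough check of the per-job seeding described above, here is a minimal sketch that uses only the standard library, so it runs without NetworkX or NumPy. The function `pick_nodes` is a hypothetical stand-in for the job body, not part of the original program; the two seeds are the values the seed formula gives for j = 0 and j = 1.

```python
import random


def pick_nodes(seed, n_nodes=1000, n_pick=10):
    """Hypothetical stand-in for the job body: pick n_pick distinct nodes."""
    # Reseed the process-global generator at the start of the job,
    # as parallel_job does in the program above.
    random.seed(seed)
    return random.sample(range(n_nodes), n_pick)


first = pick_nodes(4757)   # seed for j = 0 in the seed formula
random.random()            # unrelated draw that advances the global state
pick_nodes(7553)           # another job (j = 1) handled by the same worker
again = pick_nodes(4757)

# The selection depends only on the seed, not on what ran before it.
print(first == again)  # prints True
```

Each worker process in a multiprocessing.Pool holds its own copy of the global generator, so reseeding at the top of every job is what makes a job's output independent of which worker happens to run it.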

My questions are:

1) If, instead of generating exactly one graph per thread, I would like to generate m different graphs, how should I modify it? After I generate each graph, should I change the seed with some method, or is there a better way? Is it even necessary to call NetworkX with the optional argument seed = seed? I read in the NetworkX documentation that it may use the global random generator to build the random graph, and I am a bit worried.

2) If I put the part of the program that selects the random nodes in a separate function and call it from the parallelized part of the code, will it use the "correct" seed to randomize the vector?

3) Is there a better way to create random graphs in parallel and then pick random nodes on those graphs, starting from a seed (that might be user given)?
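One possible approach to questions 1 and 2, sketched with the standard library only: derive m integer seeds per job from a single base seed, and give each graph its own private random.Random instance that is passed explicitly to any helper. The names `derive_seeds`, `select_nodes`, and `run_job`, the mixing constant 1000003, and the base seed 12345 are all assumptions made for illustration. Each derived integer could also be passed as `seed=` to `nx.erdos_renyi_graph`, as the program above already does, or to `np.random.RandomState` for a private NumPy generator.

```python
import random


def derive_seeds(base_seed, job_index, m):
    """Hypothetical helper: m reproducible integer seeds for one job."""
    # Mix the base seed and the job index into one integer. The constant is
    # an arbitrary choice; it keeps jobs distinct while job_index < 1000003.
    seeder = random.Random(base_seed * 1000003 + job_index)
    # Values below 2**32 are also valid seeds for np.random.RandomState.
    return [seeder.randrange(2**32) for _ in range(m)]


def select_nodes(rng, n_nodes, n_pick):
    """Pick n_pick distinct node labels using the generator passed in."""
    return sorted(rng.sample(range(n_nodes), n_pick))


def run_job(args):
    """Hypothetical job: one (seed, selection) pair per graph."""
    base_seed, job_index, m = args
    results = []
    for graph_seed in derive_seeds(base_seed, job_index, m):
        # Private generator: no global state is read or modified, so the
        # helper behaves the same wherever it is called from.
        rng = random.Random(graph_seed)
        results.append((graph_seed, select_nodes(rng, 1000, 10)))
    return results


# Serial run over 4 jobs with 3 graphs each.
jobs = [(12345, j, 3) for j in range(4)]
output = [run_job(job) for job in jobs]
print(len(output), len(output[0]))  # prints 4 3
```

Since `run_job` is a top-level function taking a single tuple, the list comprehension could be swapped for `pool.map(run_job, jobs)` without changing the results, because nothing in it depends on process-global random state.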

I am using NetworkX 2.2, NumPy 1.15.4, and Python 2.7.



