Saturday, November 23, 2019

How to efficiently perform billions of Bernoulli draws using numpy?

I am working on a thesis in epidemiology, and I have to simulate an SI epidemic on a temporal network. At each time step, each infected-susceptible pair transmits the infection with probability beta, i.e. a Bernoulli(beta) draw decides the outcome. I am using np.random.binomial(size=whatever, n=1, p=beta) to make these decisions. Now, I have to simulate the epidemic on the same network starting it from each one of the nodes. This has to be repeated K times to get statistically meaningful results for each node, and, since the temporal network is itself stochastic, everything has to be repeated NET_REALIZATION times.
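
For reference, this is a minimal sketch of the per-step draw I mean (beta, n_pairs, and the array shape are just illustrative):

    import numpy as np

    beta = 0.01      # per-contact transmission probability (illustrative)
    n_pairs = 1000   # number of S-I pairs at the current time step (illustrative)

    # one Bernoulli(beta) draw per S-I pair; 1 means the infection is transmitted
    outcomes = np.random.binomial(size=n_pairs, n=1, p=beta)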

So, in a network with N = 100, if K = 500 and NET_REALIZATION = 500, the epidemic has to be simulated 25,000,000 times. If T = 100, that means 2,500,000,000 draws per set of S-I pairs (which of course varies over time). When beta is small, which is often the case, this makes the computation very time-consuming. Given that a single Bernoulli draw takes 3.63 µs on my computer, I have to wait hours to get any results, which is really limiting the progress of my thesis. The problem is that more than half of the running time is spent on the random draws alone. I have to stay within numpy, since the results of the draws interact with other data structures. I tried numba, but it didn't seem to improve the speed of the draws. Is there a faster way to get the same results? I was thinking of doing one very large batch of draws once and for all, something like 10^12 draws of 0s and 1s, and just reading off a part of them for each different simulation (this would have to be repeated for several values of beta), but I wonder if there's a smarter move.
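
Concretely, what I had in mind is something like this, just a rough sketch, not a tested solution; the class name, buffer size, and wrap-around policy are all placeholders. Storing uniform draws instead of 0/1 outcomes would let the same buffer be reused for any value of beta by thresholding on demand:

    import numpy as np

    class BernoulliBuffer:
        """Pre-generate uniforms once, threshold against beta on demand.

        Keeping uniforms instead of 0/1 outcomes means one buffer can
        serve several values of beta.
        """
        def __init__(self, size=10**8, seed=None):
            self.buf = np.random.default_rng(seed).random(size, dtype=np.float32)
            self.pos = 0

        def draw(self, n, beta):
            # wrap around when the buffer runs out; acceptable as long as
            # the buffer is much larger than any single epidemic run
            if self.pos + n > self.buf.size:
                self.pos = 0
            out = self.buf[self.pos:self.pos + n] < beta
            self.pos += n
            return out

    # usage: a boolean array, True where transmission occurs
    bern = BernoulliBuffer(seed=42)
    transmissions = bern.draw(1000, beta=0.01)

Even without the buffer, I suppose batching many draws into a single call (e.g. np.random.default_rng().random(n) < beta) should already amortize most of the per-call overhead that dominates when drawing with size=1.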

Thanks for the help.



