Thursday, 26 January 2017

Generating a large number of random variates

I'm trying to figure out the best way to generate many random numbers in Python. The difficult part is that I won't know how many numbers I'll need before runtime.

I have a program that uses random numbers one at a time, but it needs to do this many times.

The things I've tried so far are:

  • generate random numbers one at a time using random.random()
  • generate random numbers one at a time using np.random.rand()
  • generate random numbers in a batch of N using np.random.rand(N)
  • generate random numbers in a batch of N using np.random.rand(N) and make a new batch after the first N have all been used (I've tried two different implementations, and both are slower than just generating one number at a time)

In the following script, I compare the first three methods (for both uniform and normally distributed random numbers).

I don't know whether the p function is really necessary, but I wanted to do equivalent things with the random numbers in each case, and this seemed like the simplest way to do that.

#!/bin/python3

import time
import random
import numpy as np

def p(x):
    pass

def gRand(n):
    for i in range(n):
        p(random.gauss(0,1))

def gRandnp1(n):
    for i in range(n):
        p(np.random.randn())

def gRandnpN(n):
    rr=np.random.randn(n)
    for i in rr:
        p(i)

def uRand(n):
    for i in range(n):
        p(random.random())

def uRandnp1(n):
    for i in range(n):
        p(np.random.rand())

def uRandnpN(n):
    rr=np.random.rand(n)
    for i in rr:
        p(i)

tStart=[]
tEnd=[]
N=1000000
for f in [uRand, uRandnp1, uRandnpN]:
    tStart.append(time.time())
    f(N)
    tEnd.append(time.time())

for f in [gRand, gRandnp1, gRandnpN]:
    tStart.append(time.time())
    f(N)
    tEnd.append(time.time())

print(np.array(tEnd)-np.array(tStart))

A representative example of the output of this timing script is:
[ 0.26499939 0.45400381 0.19900227 1.57501364 0.49000382 0.23000193]
Times are in seconds. The first three numbers are for uniform random numbers on [0,1), and the next three are for normally distributed numbers (mu=0, sigma=1).

For either type of random variate, the fastest method (of these three) is to generate all random numbers at once, store them in an array, and iterate over the array. The problem is that I won't know how many of these numbers I'll need until the program is already running.
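
A single pass with time.time() can be noisy, so the comparison might be more trustworthy if each function is run several times and the fastest run is kept. Below is a minimal sketch of that using the standard-library timeit module. The helper name best_time and the choice of three repeats are my own assumptions, and only two of the six functions are repeated here to keep it short.

```python
import random
import timeit

import numpy as np

def p(x):
    pass

def uRand(n):
    # One uniform variate per call to the standard library.
    for _ in range(n):
        p(random.random())

def uRandnpN(n):
    # One NumPy call for the whole batch, then iterate over it.
    for x in np.random.rand(n):
        p(x)

def best_time(func, n, repeats=3):
    """Return the fastest of `repeats` single runs of func(n), in seconds.

    Hypothetical helper, not part of the original script.
    """
    return min(timeit.repeat(lambda: func(n), number=1, repeat=repeats))

if __name__ == "__main__":
    for f in (uRand, uRandnpN):
        print(f.__name__, best_time(f, 100_000))
```

Taking the minimum rather than the mean is a common convention for this kind of micro-benchmark, since slower runs usually reflect interference from other processes.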

What I'd like to do is generate the random numbers in large batches. Then when I use all the numbers in one batch, I'll just repopulate the object where they're stored. The problem is that I don't know of a clean way to implement this. One solution I came up with is the following:

N=1000000
numRepop=4
N1=N//numRepop
__rands__=[]
irand=-1

def repop():
    global __rands__
    __rands__=np.random.rand(N1)

repop()

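def myRand():
    global irand
    irand += 1
    try:
        return __rands__[irand]
    except IndexError:
        # Batch exhausted: refill it and restart at index 0, so that
        # the next call returns element 1 rather than skipping it.
        repop()
        irand = 0
        return __rands__[0]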

but this is actually slower than any of the other options.
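
One possible contributor to the slowdown (a guess on my part, not something I have measured) is that indexing a NumPy array one element at a time builds a new NumPy scalar object on every call, on top of the cost of the function call and the global lookups. The short check below only shows the difference in the returned types and says nothing about timing by itself.

```python
import numpy as np

batch = np.random.rand(4)

# Indexing the array returns a NumPy scalar (numpy.float64) ...
print(type(batch[0]))

# ... while tolist() converts the whole batch to plain Python floats at once.
print(type(batch.tolist()[0]))
```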

If I convert the NumPy array to a list and then pop elements off, I get performance similar to just using NumPy to generate random variates one at a time:

__r2__=[]

def repop2():
    global __r2__
    rr=np.random.rand(N1)
    __r2__=rr.tolist()

repop2()

def myRandb():
    try:
        return __r2__.pop()
    except IndexError:
        # List exhausted: refill it, then pop from the new batch.
        repop2()
        return __r2__.pop()

Is there a better way to do this?
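
One variant of the refill-on-empty idea that I have not benchmarked is to wrap the batches in a generator, which removes the global state and the try/except from the calling code. This is only a sketch: the names batched_uniform, rand_stream and next_rand, and the default batch size, are assumptions made for illustration.

```python
import numpy as np

def batched_uniform(batch_size=250_000):
    """Yield uniform variates on [0, 1) indefinitely, one batch at a time."""
    while True:
        # Draw a whole batch with one NumPy call, convert it to plain
        # Python floats, and hand the values out one by one. When the
        # batch runs out, the loop simply draws the next one.
        yield from np.random.rand(batch_size).tolist()

rand_stream = batched_uniform()
next_rand = rand_stream.__next__  # bound method: each call returns one number

x = next_rand()
print(x)
```

The caller never needs to know how many numbers will be used in total. It keeps calling next_rand(), and a new batch is generated whenever the current one is exhausted.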



