I'm attempting to generate random numbers inside a nogil
prange
loop in Cython by extending numpy.random
RNGs. I'm attempting to use this example, and combine it with intuitions from here to create separate bit generators for each thread to avoid locking requirements. I am clearly misunderstanding something, however, as the bitgen_t
entity's state
and memory location are the same across what I assumed would be different bit generators with independent states. The result is that, e.g., when generating 1000 random numbers in this way, there are something like 40% duplicate numbers (which makes sense, given there is only actually one bit generator being used). When generating large amounts of random numbers, it also segfaults, which I imagine is related.
If there is an easier way to do this, let me know!
Here is current code:
#!/usr/bin/env python3
#cython: language_level=3
import numpy as np
cimport numpy as np
cimport cython
from cpython.pycapsule cimport PyCapsule_IsValid, PyCapsule_GetPointer
from cython.parallel import prange, threadid
from numpy.random import PCG64, SeedSequence, Generator
from numpy.random cimport bitgen_t
from libc.stdlib cimport malloc, free
from libc.stdint cimport uintptr_t
@cython.boundscheck(False)
@cython.wraparound(False)
def uniforms(Py_ssize_t n):
"""
Create an array of `n` uniformly distributed doubles.
A 'real' distribution would want to process the values into
some non-uniform distribution
"""
cdef Py_ssize_t i, thid
cdef const char *capsule_name = "BitGenerator"
cdef double[:] random_values = np.ones(n, dtype='float64') * -1.
# Create an RNG for each thread so we don't have to lock them.
cdef Py_ssize_t num_threads = 16
capsules = [Generator(PCG64(s)).bit_generator.capsule for s in SeedSequence(1234).spawn(num_threads)]
cdef bitgen_t **rngs_c
try:
# Cast capsules into ptr array for use outside of gil.
rngs_c = <bitgen_t**>malloc(num_threads * sizeof(bitgen_t*))
for i, capsule in enumerate(capsules):
if not PyCapsule_IsValid(capsule, capsule_name):
raise ValueError("Invalid pointer to anon_func_state")
rngs_c[i] = <bitgen_t *> PyCapsule_GetPointer(capsule, capsule_name)
print(<uintptr_t>rngs_c[i])
with nogil:
for i in prange(n, schedule="static", num_threads=num_threads):
thid = threadid()
rng = rngs_c[thid]
random_values[i] = rng.next_double(rng.state)
finally:
free(rngs_c)
return np.asarray(random_values)
If I run this as is, I get this output (note the print(<uintptr_t>rngs_c[i])
that prints the address of the bitgen_t
as retrieved from the different capsules.
>>> x = uniforms(1000)
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
>>> np.unique(x).size
594
Note that compiling this (and observing issues related to parallelization) requires compiling with openmp. Details can be found here.
Aucun commentaire:
Enregistrer un commentaire