Given predefined ranges, a list of percentages (one per range, expressed as fractions between 0 and 1), and some data, I need to randomly select the given percentage of IDs from the elements that fall within each range.
The code below shows how I do it; the for block is currently the bottleneck. I'm sure this could be made faster, probably with some vectorization, but I don't know how.
import numpy as np
import itertools
# Generate some random data
N = 1000
aa = np.random.uniform(12., 20., N)
# Define edges/ranges.
edges = np.array([16.67666667, 16.77721569, 16.87776471, 16.97831373,
                  17.07886275, 17.17941176, 17.27996078, 17.3805098,
                  17.48105882, 17.58160784, 17.68215686, 17.78270588,
                  17.8832549, 17.98380392, 18.08435294, 18.18490196,
                  18.28545098, 18.386])
# Percentage of elements in 'aa' that will be kept for each 'edges' range.
perc = np.random.uniform(0., 1., len(edges) - 1)
# Locate indexes of 'aa' elements within each 'edges' range.
c_indx = np.searchsorted(edges, aa, side='left')
# THIS IS THE BOTTLENECK
cc = []
# For each defined percentage value (one per edge range).
for i, p in enumerate(perc):
    # Locate IDs of elements within each range. Use 'i + 1' since elements
    # with c_indx=0 lie at or below the first edge and are discarded.
    idxs = np.where(c_indx == i + 1)[0]
    # Shuffle IDs within this edge range (in place)
    np.random.shuffle(idxs)
    # Estimate the number of elements from 'aa' to keep for
    # this range, given the fixed percentage 'p'.
    d = int(round(idxs.size * p, 0))
    # Store the correct percentage of random IDs from 'aa' for this range.
    cc.append(idxs[:d])
# Final list (flattened)
cc = list(itertools.chain.from_iterable(cc))
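For reference, here is a rough sketch of what a loop-free version might look like. It is only one possible approach, not a benchmarked replacement: it sorts all IDs once by range index with a random tie-breaker (which shuffles the IDs inside each range), then keeps the first d IDs of every range. The function name `sample_per_bin` and the `rng` parameter are made up for this illustration, and it uses NumPy's `default_rng` generator instead of `np.random.shuffle`, so the random draws will not match the loop above element for element.

```python
import numpy as np


def sample_per_bin(aa, edges, perc, rng=None):
    """Return IDs of 'aa', keeping a fraction perc[i] of each edge range.

    Hypothetical helper: the name and the 'rng' argument are assumptions
    made for this sketch, not part of the original code.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_bins = len(edges) - 1

    # Same binning as the loop version: 0 means at or below the first edge,
    # len(edges) means above the last edge; both are discarded later.
    c_indx = np.searchsorted(edges, aa, side='left')

    # Sort by range index, using a random key to break ties. This leaves
    # the IDs grouped by range and randomly ordered inside each range.
    order = np.lexsort((rng.random(aa.size), c_indx))
    sorted_bins = c_indx[order]

    # Number of elements per range index, and where each group starts
    # in the sorted order.
    counts = np.bincount(c_indx, minlength=len(edges) + 1)
    starts = np.cumsum(counts) - counts

    # Position of every element inside its own group (0, 1, 2, ...).
    rank = np.arange(aa.size) - starts[sorted_bins]

    # Number of IDs to keep per range; stays 0 for the two outer groups.
    keep = np.zeros(len(edges) + 1, dtype=int)
    keep[1:n_bins + 1] = np.rint(counts[1:n_bins + 1] * perc).astype(int)

    # Keep the first keep[b] shuffled IDs of every range b.
    return order[rank < keep[sorted_bins]]
```

With the variables defined above, `cc = sample_per_bin(aa, edges, perc)` should return a flat array of IDs grouped by range, with the same number of IDs per range as the loop produces.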