Wednesday, February 26, 2020

Python: fast way to sample data from an array when the sample size changes

I am trying to sample data from a list of integers. The tricky part is that each sample should have a different size, to emulate some other data I have. Right now I am using a for loop that does the job, but I am wondering whether there are faster ways that I am not aware of.

Since I think random.sample is supposed to be fast, I am doing:

import random

result = []
for i in range(100000):
    size = list_of_sizes[i]
    # draw `size` distinct elements from data (no replacement)
    result.append(random.sample(data, size))

So the result I get is something like:

>>> list_of_sizes
[3, 4, 1, 2, ...]

>>> result
[[1, 2, 3],
 [3, 6, 2, 8],
 [9],
 [10, 100],
 ...]
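For reference, here is a self-contained version of the loop above. The `data` and `list_of_sizes` values are made up for illustration (they are not the real inputs), and a seeded `random.Random` instance is used so the run is reproducible. The semantics are the same: each inner list is an independent draw without replacement.

```python
import random

# Hypothetical stand-ins for the real `data` and `list_of_sizes`
data = list(range(1, 101))
list_of_sizes = [3, 4, 1, 2]

rng = random.Random(0)  # seeded so the run is reproducible

# One independent draw per size; each draw is without replacement
result = [rng.sample(data, size) for size in list_of_sizes]

print([len(sample) for sample in result])  # [3, 4, 1, 2]
```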

I have tried np.random.choice(data, size, replace=False) and random.sample(data, k=size), but neither accepts a list of different sizes, so I cannot use them to vectorize the operation. When np.random.choice is given an array for the size parameter, it treats it as the shape of a single multidimensional output array, not as a list of per-sample sizes. Ideally, I would like something like:

>>> np.random.choice(data, list_of_sizes, replace=False)
[[1, 2, 3],
 [3, 6, 2, 8],
 [9],
 [10, 100],
 ...]
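To illustrate why the call above does not produce ragged samples: a sequence passed as `size` is read as the shape of one output array. The `data` below is hypothetical; with `replace=False` all 12 values are distinct because they come from a single draw, not from separate samples of 3 and 4.

```python
import numpy as np

data = np.arange(100)  # hypothetical data

# size=(3, 4) means "one 3x4 array", i.e. 12 distinct values in a grid,
# not "a 3-element sample followed by a 4-element sample".
out = np.random.choice(data, size=(3, 4), replace=False)
print(out.shape)  # (3, 4)
```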

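One possible way to remove the Python loop, sketched here as an assumption rather than a known answer: generate a row of i.i.d. uniform random keys per sample and `argsort` each row. Each row then becomes a uniformly random permutation of indices, so its first `sizes[i]` entries form a uniform sample without replacement. A boolean mask keeps those entries, and `np.split` cuts the flat result back into ragged samples. The function name `sample_variable_sizes` is hypothetical. The trade-off is memory: the key matrix has `len(sizes) * len(data)` entries, so for 100,000 samples over a large `data` it would need to be processed in chunks. It has not been benchmarked against the `random.sample` loop here.

```python
import numpy as np


def sample_variable_sizes(data, sizes, rng=None):
    """Hypothetical helper: one sample without replacement per entry of `sizes`.

    Sorting a row of i.i.d. uniform keys gives a random permutation, so the
    first k indices of each row are a uniform k-subset of `data`.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    sizes = np.asarray(sizes)
    max_size = sizes.max()
    if max_size > len(data):
        raise ValueError("a sample size exceeds len(data) (replace=False)")

    # One row of random keys per sample; memory is len(sizes) * len(data)
    keys = rng.random((len(sizes), len(data)))
    perms = np.argsort(keys, axis=1)[:, :max_size]

    # Keep only the first sizes[i] indices of row i (row-major order)
    mask = np.arange(max_size) < sizes[:, None]
    flat = data[perms[mask]]

    # Cut the flat array back into one array per sample
    return np.split(flat, np.cumsum(sizes)[:-1])


samples = sample_variable_sizes(
    np.arange(1, 101), [3, 4, 1, 2], rng=np.random.default_rng(0)
)
print([len(s) for s in samples])  # [3, 4, 1, 2]
```

Each element of the returned list is a NumPy array; call `.tolist()` on it if plain lists are needed.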

