I want to create an array (say output_list) from a given numpy array (say input_list) by resampling, such that each element of input_list appears in output_list at least once. The length of output_list will always be greater than the length of input_list.
I tried a few approaches, and I am looking for a faster method. Unfortunately, numpy's random.choice doesn't guarantee that every element appears at least once.
Step 1: Generate Data
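import random
import string

import numpy as np

size = 150000  # number of input strings to generate
# Assumed value: output_size is not defined in the original snippet; it only needs to be > size.
output_size = 2 * size
chars = string.digits + string.ascii_lowercase
# The original looped over dict_data[1]['unique_len'], which is not defined here; `size` is assumed.
input_list = [
    "".join(random.choice(chars) for _ in range(5))
    for _ in range(size)
]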
Option 1: Let's try numpy's random.choice with a uniform probability distribution.
output_list = np.random.choice(
    input_list,
    size=output_size,
    replace=True,
    p=[1 / len(input_list)] * len(input_list),
)
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"
This raises an AssertionError:
Output list has fewer elements than input list
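The failure is expected rather than bad luck. With n items and m uniform draws with replacement, a given item is never drawn with probability (1 - 1/n)^m, so about n(1 - 1/n)^m items go missing on average. The sketch below compares that estimate with a simulation; `n_draws = 2 * n_items` is an assumed output size, and `expected_missing` is a hypothetical helper name, not part of numpy.

```python
import numpy as np


def expected_missing(n_items: int, n_draws: int) -> float:
    """Expected number of items never drawn in n_draws uniform draws with replacement."""
    return n_items * (1.0 - 1.0 / n_items) ** n_draws


n_items = 150_000      # matches `size` in the post
n_draws = 2 * n_items  # assumed output size; the post does not state it

# Simulate on integer indices; the actual strings do not matter for coverage.
rng = np.random.default_rng(0)
drawn = rng.integers(0, n_items, size=n_draws)
observed_missing = n_items - np.unique(drawn).size

print(f"expected missing: {expected_missing(n_items, n_draws):.0f}")
print(f"observed missing: {observed_missing}")
```

When m = 2n the expected share of missing items is close to e^-2, or roughly 13.5%, so plain sampling with replacement will almost never cover every element at these sizes.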
Option 2: Let's append random samples to input_list and then shuffle the result.
output_list = np.concatenate(
    (
        np.array(input_list),
        np.random.choice(
            input_list,
            size=output_size - len(input_list),
            replace=True,
            p=[1 / len(input_list)] * len(input_list),
        ),
    ),
    axis=None,
)
np.random.shuffle(output_list)
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"
While this doesn't raise an assertion error, I am looking for a faster solution than this, either algorithmically or using one of numpy's built-in functions.
Thanks for any help.
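One direction worth timing is a variant of Option 2 that works on integer indices instead of the strings themselves. It is only a sketch under assumptions: `resample_with_coverage` is a hypothetical helper name, and it uses numpy's `Generator` API (`default_rng`, `integers`, `shuffle`). It keeps one copy of every index, draws the remaining indices uniformly without building an explicit `p` vector, shuffles the small integer array, and looks the strings up once at the end.

```python
import numpy as np


def resample_with_coverage(items, output_size, rng=None):
    """Resample `items` to `output_size` entries so that every item appears at least once."""
    items = np.asarray(items)
    n = items.size
    if output_size < n:
        raise ValueError("output_size must be at least len(items)")
    rng = np.random.default_rng() if rng is None else rng
    # One copy of every index guarantees coverage; the rest are uniform draws.
    extra = rng.integers(0, n, size=output_size - n)
    idx = np.concatenate((np.arange(n), extra))
    # Shuffle the integer indices in place, then index into the items once.
    rng.shuffle(idx)
    return items[idx]


# Small usage example with made-up data.
rng = np.random.default_rng(42)
input_list = ["a1b2c", "d3e4f", "g5h6i", "j7k8l"]
output_list = resample_with_coverage(input_list, 10, rng)
assert set(output_list) == set(input_list)
```

Whether this is actually faster depends on the sizes involved, so it should be measured against Option 2 (for example with `timeit`) on the real data. Note also that, as in Option 2, the guaranteed copy means the result is not an unconditional uniform sample.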