First, I want to take random samples from three small dataframes and concatenate the results. Second, I want to repeat this process as many times as possible, filter out the uninteresting selections, and store the interesting ones for later examination.
For part 1 I currently use the following approach:
def get_sample(n_A, n_B, n_C):
    # Draw the requested number of rows from each dataframe, without
    # replacement, and stack the three samples into one dataframe.
    A = df_A.sample(n=n_A, replace=False)
    B = df_B.sample(n=n_B, replace=False)
    C = df_C.sample(n=n_C, replace=False)
    return pd.concat([A, B, C])
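(To make the snippets here self-contained: df_A, df_B and df_C are preset module-level dataframes. The real data isn't shown, so this is a toy stand-in; value and price are hypothetical column names matching what the filter in part 2 looks at.)

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_frame(category, size=150):
    # Toy stand-in for the real data: one categorical column plus the
    # numeric columns the filter presumably inspects (hypothetical names).
    return pd.DataFrame({
        "category": category,
        "value": rng.integers(40, 90, size),
        "price": rng.uniform(2.0, 11.0, size),
    })

df_A, df_B, df_C = (make_frame(c) for c in "ABC")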
For part 2 I use:
def get_picks(n):
    # Draw n candidate picks and keep only the "interesting" ones.
    picks = [get_sample(5, 5, 3) for _ in range(n)]
    return [p for p in picks if pick_value(p) > 750 and pick_price(p) < 90]
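pick_value and pick_price aren't defined above; for the purposes of this question, assume they reduce to simple column sums (hypothetical):

def pick_value(pick):
    # Hypothetical scorer: total value of the selected rows.
    return pick["value"].sum()

def pick_price(pick):
    # Hypothetical scorer: total price of the selected rows.
    return pick["price"].sum()

picks = get_picks(1_000)  # scale n up to 50,000 for the real run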
Currently, repeating this 50,000 times takes about 1 minute and 40 seconds on my MacBook. Is that the best I can expect?
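For anyone who wants to reproduce the timing, a minimal harness (standard library only):

import time

start = time.perf_counter()
picks = get_picks(50_000)
print(f"{time.perf_counter() - start:.1f}s for 50,000 draws, {len(picks)} kept")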
Part 2 is a list comprehension (with an if clause) that calls get_sample 50,000 times. The get_sample function concatenates random samples from three different dataframes. The three dataframes used in get_sample() are preset, each has about 150 rows, and they don't change over the course of the experiment; they differ only in one categorical value.
Any advice on how to speed this process up, or alternative approaches to taking the random samples, is of course welcome; one idea I'm exploring is sketched below.
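The sketch, under the assumption that pick_value and pick_price reduce to the column sums shown above: since the dataframes never change, sample row positions with NumPy, score against precomputed arrays, and only build a dataframe for the picks that survive the filter.

import numpy as np
import pandas as pd

rng = np.random.default_rng()

def get_picks_fast(n, n_A=5, n_B=5, n_C=3):
    frames = (df_A, df_B, df_C)
    sizes = (n_A, n_B, n_C)
    # Precompute the numeric columns once; the dataframes are preset
    # and don't change over the course of the experiment.
    values = [f["value"].to_numpy() for f in frames]
    prices = [f["price"].to_numpy() for f in frames]

    picks = []
    for _ in range(n):
        # Sample row positions without replacement, one set per frame.
        idx = [rng.choice(len(f), size=k, replace=False)
               for f, k in zip(frames, sizes)]
        value = sum(v[i].sum() for v, i in zip(values, idx))
        price = sum(p[i].sum() for p, i in zip(prices, idx))
        if value > 750 and price < 90:
            # Materialise a dataframe only for the rare interesting picks.
            picks.append(pd.concat([f.iloc[i] for f, i in zip(frames, idx)]))
    return picks

Whether this is faster in practice depends on how selective the filter is, but avoiding df.sample and pd.concat on every one of the 50,000 draws is where I'd expect most of the time to go.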