First, I want to take random samples from three small dataframes and concatenate the results. Second, I want to repeat this process as many times as possible, filter out the uninteresting selections, and store the interesting ones for later examination.
For part 1 I currently use the following approach:
def get_sample(n_A, n_B, n_C):
    # Draw the requested number of rows from each dataframe, without
    # replacement, and stack the three samples into one dataframe.
    A = df_A.sample(n=n_A, replace=False)
    B = df_B.sample(n=n_B, replace=False)
    C = df_C.sample(n=n_C, replace=False)
    return pd.concat([A, B, C])
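(To make the snippets here self-contained: df_A, df_B and df_C are preset module-level dataframes. The real data isn't shown, so this is a toy stand-in; value and price are hypothetical column names matching what the filter in part 2 looks at.)

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_frame(category, size=150):
    # Toy stand-in for the real data: one categorical column plus the
    # numeric columns the filter presumably inspects (hypothetical names).
    return pd.DataFrame({
        "category": category,
        "value": rng.integers(40, 90, size),
        "price": rng.uniform(2.0, 11.0, size),
    })

df_A, df_B, df_C = (make_frame(c) for c in "ABC")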
For part 2 I use:
def get_picks(n):
    # Draw n candidate picks and keep only the "interesting" ones.
    picks = [get_sample(5, 5, 3) for _ in range(n)]
    return [p for p in picks if pick_value(p) > 750 and pick_price(p) < 90]
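pick_value and pick_price aren't defined above; for the purposes of this question, assume they reduce to simple column sums (hypothetical):

def pick_value(pick):
    # Hypothetical scorer: total value of the selected rows.
    return pick["value"].sum()

def pick_price(pick):
    # Hypothetical scorer: total price of the selected rows.
    return pick["price"].sum()

picks = get_picks(1_000)  # scale n up to 50,000 for the real run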
Currently, repeating this 50,000 times takes about 1 minute and 40 seconds on my MacBook. Is that the best I can expect?
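For anyone who wants to reproduce the timing, a minimal harness (standard library only):

import time

start = time.perf_counter()
picks = get_picks(50_000)
print(f"{time.perf_counter() - start:.1f}s for 50,000 draws, {len(picks)} kept")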
Part 2 is a list comprehension (with an if clause) that calls get_sample 50,000 times. The get_sample function concatenates random samples from three different dataframes. The three dataframes used in get_sample() are preset, each has about 150 rows, and they don't change over the course of the experiment; they differ only in one categorical value.
Any advice on how to speed this process up, or alternative approaches to taking the random samples, is of course welcome; one idea I'm exploring is sketched below.
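The sketch, under the assumption that pick_value and pick_price reduce to the column sums shown above: since the dataframes never change, sample row positions with NumPy, score against precomputed arrays, and only build a dataframe for the picks that survive the filter.

import numpy as np
import pandas as pd

rng = np.random.default_rng()

def get_picks_fast(n, n_A=5, n_B=5, n_C=3):
    frames = (df_A, df_B, df_C)
    sizes = (n_A, n_B, n_C)
    # Precompute the numeric columns once; the dataframes are preset
    # and don't change over the course of the experiment.
    values = [f["value"].to_numpy() for f in frames]
    prices = [f["price"].to_numpy() for f in frames]

    picks = []
    for _ in range(n):
        # Sample row positions without replacement, one set per frame.
        idx = [rng.choice(len(f), size=k, replace=False)
               for f, k in zip(frames, sizes)]
        value = sum(v[i].sum() for v, i in zip(values, idx))
        price = sum(p[i].sum() for p, i in zip(prices, idx))
        if value > 750 and price < 90:
            # Materialise a dataframe only for the rare interesting picks.
            picks.append(pd.concat([f.iloc[i] for f, i in zip(frames, idx)]))
    return picks

Whether this is faster in practice depends on how selective the filter is, but avoiding df.sample and pd.concat on every one of the 50,000 draws is where I'd expect most of the time to go.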