Thursday, 27 December 2018

Rapid Repetitive Sampling from Pandas DataFrames

First, I want to take random samples from 3 small dataframes and concatenate the results. Second, I want to repeat this process as many times as possible, filter out the uninteresting selections, and store the interesting ones so I can examine them later.

For part 1 I currently use the following approach:

import pandas as pd

def get_sample(n_A, n_B, n_C):
    # df_A, df_B and df_C are dataframes defined beforehand at module level
    A = df_A.sample(n=n_A, replace=False)
    B = df_B.sample(n=n_B, replace=False)
    C = df_C.sample(n=n_C, replace=False)
    return pd.concat([A, B, C])

For part 2 I use:

def get_picks(n):
    samples = (get_sample(5, 5, 3) for _ in range(n))  # lazy, one sample at a time
    return [pick for pick in samples
            if pick_value(pick) > 750 and pick_price(pick) < 90]

Currently, repeating this 50,000 times takes about 1 minute and 40 seconds on my MacBook. Is that the best I can expect?

Part 2 is a list comprehension (with an if clause) that calls get_sample 50,000 times. get_sample concatenates random samples from three different dataframes. The three dataframes used in get_sample() are preset, each has about 150 rows, and they don't change over the course of the experiment. They differ in one categorical value.

Any advice on how to improve the speed of this process, or on alternative approaches to taking random samples, is welcome of course.
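One direction that might be worth trying is to draw all row positions up front with NumPy and to build a DataFrame only for the draws that pass the filter. The sketch below runs under several assumptions: the frames are synthetic stand-ins for df_A, df_B and df_C; the columns "value" and "price" are hypothetical; and pick_value and pick_price, which are not shown above, are assumed to be plain column sums. The names make_frame, sample_indices and get_picks_fast are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_frame(category, n_rows=150):
    # Synthetic stand-in for one of the preset dataframes.
    # ASSUMPTION: "value" and "price" columns; the real columns are not shown.
    return pd.DataFrame({
        "category": category,
        "value": rng.integers(20, 100, size=n_rows),
        "price": rng.integers(2, 12, size=n_rows),
    })

df_A, df_B, df_C = make_frame("A"), make_frame("B"), make_frame("C")

def sample_indices(n_rows, k, n_draws):
    """Return an (n_draws, k) array of distinct row positions per draw."""
    # Sorting random keys yields one random permutation per row;
    # keeping the first k columns is sampling without replacement.
    return np.argsort(rng.random((n_draws, n_rows)), axis=1)[:, :k]

def get_picks_fast(n, sizes=(5, 5, 3), min_value=750, max_price=90):
    frames = (df_A, df_B, df_C)
    idx = [sample_indices(len(df), k, n) for df, k in zip(frames, sizes)]
    # ASSUMPTION: pick_value / pick_price are sums over the sampled rows.
    total_value = sum(df["value"].to_numpy()[i].sum(axis=1)
                      for df, i in zip(frames, idx))
    total_price = sum(df["price"].to_numpy()[i].sum(axis=1)
                      for df, i in zip(frames, idx))
    keep = np.flatnonzero((total_value > min_value) & (total_price < max_price))
    # Only the surviving draws are materialised as DataFrames.
    return [pd.concat([df.iloc[i[j]] for df, i in zip(frames, idx)])
            for j in keep]

picks = get_picks_fast(500)
```

The idea is that DataFrame.sample and pd.concat carry a fixed overhead per call, so avoiding them for rejected draws may save most of the time. Whether it helps in practice depends on how pick_value and pick_price are really defined. For very large n the random key matrix could be generated in batches to limit memory use.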



