Sunday, November 15, 2020

Grab 20 random samples from a grouped dataframe?

I'm trying to group a dataframe by date and store_id, then take 20 random samples from each (date, store_id) group. I keep getting an error and I don't understand why. There are thousands of samples per date and store_id in this dataframe. I do not want any repeated samples, i.e. sampling without replacement.

COLS = ['price', 'customer', 'item', 'currency']

data_to_use = new_k_data.groupby(['date', 'store_id'])[COLS].apply(pd.Series.sample, n=20, replace=False)

Error:

ValueError: Cannot take a larger sample than population when 'replace=False'
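
A likely cause, assuming the error comes from the sampling call itself: at least one (date, store_id) group has fewer than 20 rows, even if most groups have thousands. Sampling without replacement cannot return more rows than the group contains. Below is a minimal sketch of one way to check this and work around it by capping the sample size at the group size. The helper `sample_up_to` and the `toy` dataframe are hypothetical names made up for illustration; they are not part of the original code.

```python
import numpy as np
import pandas as pd

COLS = ['price', 'customer', 'item', 'currency']

def sample_up_to(group, n=20, seed=None):
    # Hypothetical helper: take n rows without replacement,
    # or every row if the group has fewer than n rows.
    return group.sample(n=min(len(group), n), replace=False, random_state=seed)

# Hypothetical toy data: one large group (50 rows) and one small group (5 rows).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'date': ['2020-11-01'] * 50 + ['2020-11-02'] * 5,
    'store_id': [1] * 50 + [2] * 5,
    'price': rng.random(55),
    'customer': np.arange(55),
    'item': np.arange(55) % 7,
    'currency': ['USD'] * 55,
})

# Inspect group sizes first: any group smaller than 20 would trigger the ValueError.
group_sizes = toy.groupby(['date', 'store_id']).size()
print(group_sizes[group_sizes < 20])

# Sample up to 20 rows per group, never more than the group holds.
sampled = toy.groupby(['date', 'store_id'])[COLS].apply(sample_up_to, n=20, seed=0)
```

Running the same size check on `new_k_data` should show whether any small groups exist. If such groups should be dropped rather than sampled in full, filtering them out before sampling is another option.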



