Tuesday, February 8, 2022

ValueError: Cannot take a larger sample than population when 'replace=False' using Groupby pandas

I want to randomly pick some of the groups (for example, 10) that I have in a dataframe, but I'm stuck on this error. What can I do if I want to apply a groupby before the random selection? I tried the following approaches:

random_selection = tot_groups.groupby('query_col').apply(lambda x: x.sample(3))
random_selection = tot_groups.groupby('query_col').sample(n=10)

Error: ValueError: Cannot take a larger sample than population when 'replace=False'
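
For context, both attempts above draw rows inside each group rather than picking whole groups, so the error appears as soon as one group has fewer rows than the requested sample size (every group in the data shown has fewer than 10 rows). If a per-group row sample were actually what is wanted, one possible workaround is to shuffle the rows and keep at most n per group. This is only a sketch: the toy frame, the column name subject_col, and the helper sample_rows_per_group are hypothetical and not part of the original post.

```python
import pandas as pd

# Toy data: "query_col" comes from the question; "subject_col" is a made-up name.
demo = pd.DataFrame({
    "query_col": ["A"] * 4 + ["B"] * 5 + ["C"] * 7,
    "subject_col": [f"s{i}" for i in range(16)],
})

def sample_rows_per_group(df, key, n, seed=None):
    """Return up to n random rows per group (hypothetical helper).

    Groups smaller than n are returned whole, so no ValueError is raised.
    """
    shuffled = df.sample(frac=1, random_state=seed)   # put rows in random order
    return shuffled.groupby(key, sort=False).head(n)  # keep the first n rows of each group

per_group = sample_rows_per_group(demo, "query_col", n=5, seed=0)
```

Here group A has only 4 rows, so all 4 are kept, while B and C contribute 5 rows each.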

Thanks!

UPDATE:

Current dataset

ABG23209.1,UBH04469.1,89.655,145,15,0,1,145,19,163,3.63e-100,275.0
ABG23209.1,UBH04470.1,89.655,145,15,0,1,145,20,164,4.68e-100,275.0
ABG23209.1,UBH04471.1,89.655,145,15,0,1,145,19,163,4.83e-100,275.0
ABG23209.1,UBH04472.1,89.655,145,15,0,1,145,24,168,5.58e-100,275.0
KOX89835.1,SFN69046.1,79.07,86,18,0,1,86,12,97,1.36e-49,143.0
KOX89835.1,SFE98714.1,77.907,86,19,0,1,86,19,104,2.1400000000000002e-49,143.0
KOX89835.1,WP_086938959.1,76.471,85,20,0,1,85,4,88,1.25e-48,140.0
KOX89835.1,WP_231794161.1,76.471,85,20,0,1,85,5,89,1.75e-48,140.0
KOX89835.1,WP_231794169.1,75.294,85,21,0,1,85,5,89,2.41e-48,140.0
WP_001287378.1,QBP98897.1,86.765,136,17,1,1,135,1,136,1.68e-85,241.0
WP_001287378.1,WP_005164157.1,86.765,136,17,1,1,135,1,136,1.68e-85,241.0
WP_001287378.1,WP_085071573.1,86.667,135,18,0,1,135,1,135,1.73e-85,241.0
WP_001287378.1,WP_014608965.1,86.765,136,17,1,1,135,1,136,2.49e-85,240.0
WP_001287378.1,WP_004932170.1,86.667,135,18,0,1,135,1,135,6.88e-78,221.0
WP_001287378.1,GGD19357.1,91.912,136,10,1,1,136,1,135,1.01e-77,221.0
WP_001287378.1,OMQ27200.1,85.926,135,19,0,1,135,1,135,1.79e-77,221.0
XP_037955766.1,WP_229689219.1,93.583,374,24,0,3,376,5,378,0.0,745.0
XP_037955766.1,WP_229799179.1,93.583,374,24,0,3,376,1,374,0.0,744.0
XP_037955766.1,WP_017454560.1,92.308,377,28,1,1,376,1,377,0.0,738.0
XP_037955766.1,WP_108127780.1,92.838,377,26,1,1,376,1,377,0.0,736.0

Desired output: randomly select n groups from the dataframe, grouped by query_col. For example, with n=2:

WP_001287378.1,QBP98897.1,86.765,136,17,1,1,135,1,136,1.68e-85,241.0
WP_001287378.1,WP_005164157.1,86.765,136,17,1,1,135,1,136,1.68e-85,241.0
WP_001287378.1,WP_085071573.1,86.667,135,18,0,1,135,1,135,1.73e-85,241.0
WP_001287378.1,WP_014608965.1,86.765,136,17,1,1,135,1,136,2.49e-85,240.0
WP_001287378.1,WP_004932170.1,86.667,135,18,0,1,135,1,135,6.88e-78,221.0
WP_001287378.1,GGD19357.1,91.912,136,10,1,1,136,1,135,1.01e-77,221.0
WP_001287378.1,OMQ27200.1,85.926,135,19,0,1,135,1,135,1.79e-77,221.0
ABG23209.1,UBH04469.1,89.655,145,15,0,1,145,19,163,3.63e-100,275.0
ABG23209.1,UBH04470.1,89.655,145,15,0,1,145,20,164,4.68e-100,275.0
ABG23209.1,UBH04471.1,89.655,145,15,0,1,145,19,163,4.83e-100,275.0
ABG23209.1,UBH04472.1,89.655,145,15,0,1,145,24,168,5.58e-100,275.0
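
One way the desired output could be obtained is to sample the distinct values of query_col first and then keep every row belonging to the chosen values. The sketch below assumes that approach. The toy frame, the column name subject_col, and the helper sample_groups are hypothetical, and n is capped at the number of available groups so that the same ValueError cannot occur.

```python
import pandas as pd

# Toy data: "query_col" comes from the question; "subject_col" is a made-up name.
demo = pd.DataFrame({
    "query_col": ["A"] * 4 + ["B"] * 5 + ["C"] * 7 + ["D"] * 4,
    "subject_col": [f"s{i}" for i in range(20)],
})

def sample_groups(df, key, n, seed=None):
    """Keep all rows of n randomly chosen groups (hypothetical helper)."""
    keys = pd.Series(df[key].unique())                 # one entry per group
    n = min(n, len(keys))                              # never ask for more groups than exist
    picked = keys.sample(n=n, random_state=seed)       # sample group labels, not rows
    return df[df[key].isin(picked)]                    # every row of the picked groups

picked_groups = sample_groups(demo, "query_col", n=2, seed=0)
```

Note that the rows come back in their original order in the dataframe, not in the order in which the groups were drawn.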


