Monday, April 2, 2018

Python - Pandas random sampling per group

I have a DataFrame very similar to this one, but with thousands of values:

import numpy as np
import pandas as pd 

# Setup fake data.
np.random.seed([3, 1415])      
df = pd.DataFrame({
    'Class': list('AAAAAAAAAABBBBBBBBBB'),
    'type': (['short']*5 + ['long']*5) *2,
    'image name': (['image01']*2  + ['image02']*2)*5,
    'Value2': np.random.random(20)})

I was able to randomly sample 2 values per image, per Class and per type with the following code:

df2 = df.groupby(['type', 'Class', 'image name'])[['Value2']].apply(lambda s: s.sample(min(len(s),2)))

I got the following result:

[Image: my table]

I'm looking for a way to subset that table so that a single image ('image name') is chosen at random per type and per Class, while keeping the 2 values for the selected image.
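
To make the goal concrete, here is a minimal sketch of one way this might be done. It assumes pandas 1.1 or later (for `DataFrameGroupBy.sample`). The names `keys`, `images`, `chosen`, `subset` and `result` are only illustrative, and the `random_state` values are arbitrary. It picks the image first and then samples the values, rather than subsetting `df2` afterwards, so it is a swapped-in ordering and not necessarily the best one.

```python
import numpy as np
import pandas as pd

# Same fake data as above.
np.random.seed([3, 1415])
df = pd.DataFrame({
    'Class': list('AAAAAAAAAABBBBBBBBBB'),
    'type': (['short']*5 + ['long']*5) * 2,
    'image name': (['image01']*2 + ['image02']*2) * 5,
    'Value2': np.random.random(20)})

keys = ['type', 'Class']

# Step 1: list each distinct image once per (type, Class),
# then draw one image at random from each (type, Class) group.
images = df[keys + ['image name']].drop_duplicates()
chosen = images.groupby(keys).sample(n=1, random_state=0)

# Step 2: keep only the rows belonging to the chosen images.
subset = df.merge(chosen, on=keys + ['image name'])

# Step 3: shuffle the rows, then keep at most 2 rows per chosen image.
# Taking the first rows of a shuffled group is a random sample
# without replacement of min(len(group), 2) rows.
result = (subset.sample(frac=1, random_state=1)
                .groupby(keys + ['image name'])
                .head(2)
                .sort_values(keys))

print(result)
```

With the fake data, every (type, Class) pair has two images with at least 2 rows each, so `result` should contain 2 rows per pair, all from a single image. Dropping the `random_state` arguments would give a different draw on each run.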



