Monday, April 25, 2016

pandas - groupby and select a variable number of random values according to a column

Starting from this simple dataframe df:

df = pd.DataFrame({'c':[1,1,2,2,2,2,3,3,3], 'n':[1,2,3,4,5,6,7,8,9], 'N':[1,1,2,2,2,2,2,2,2]})

I'm trying to select N random values from n for each value of c, where N is given by the N column. So far I have managed to group by c and get a single element per group with:

sample = df.groupby('c').apply(lambda x: x.iloc[np.random.randint(0, len(x))])

that returns:

   N  c  n
c         
1  1  1  2
2  2  2  4
3  2  3  8

My expected output would be something like:

   N  c  n
c         
1  1  1  2
2  2  2  4
2  2  2  3
3  2  3  8
3  2  3  7

That is, 1 sample for c=1 and 2 samples each for c=2 and c=3, as specified by the N column.
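
One possible way to get this, as a rough sketch rather than a definitive answer: read the sample size from the N column of each group and pass it to DataFrame.sample. This assumes N is constant within each group of c and never exceeds the group size. The helper name sample_group is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'n': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'N': [1, 1, 2, 2, 2, 2, 2, 2, 2]})

def sample_group(group):
    # Hypothetical helper: assumes N is the same on every row of the group,
    # so the first value is used as the number of rows to draw.
    k = int(group['N'].iloc[0])
    # Draws k distinct rows (sampling without replacement).
    return group.sample(n=k)

# Iterate over the groups explicitly and stitch the samples back together.
sample = pd.concat([sample_group(g) for _, g in df.groupby('c')])
print(sample)
```

Since the rows are drawn at random, the exact values of n differ from run to run, but the result should contain 1 row for c=1 and 2 rows each for c=2 and c=3. If N could be larger than a group, sample would raise a ValueError unless replace=True is passed.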



