Thursday, March 29, 2018

Groupby and sample in pandas

I am trying to sample rows after grouping a DataFrame by multiple columns. If a group has more than 2 rows, I want to take a random sample of 2 records; otherwise I want to keep all of its records.

df:

col1   col2   col3   col4
A1     A2     A3     A4
A1     A2     A3     A5
A1     A2     A3     A6
B1     B2     B3     B4
B1     B2     B3     B5
C1     C2     C3     C4

target df:

col1   col2   col3   col4
A1     A2     A3     A4 or A5 or A6
A1     A2     A3     A4 or A5 or A6
B1     B2     B3     B4
B1     B2     B3     B5
C1     C2     C3     C4

I have written A4 or A5 or A6 because sampling may return any two of the three.

This is what I have tried so far:

trial = pd.DataFrame(df.groupby(['col1', 'col2','col3'])['col4'].apply(lambda x: x if (len(x) <=2) else x.sample(2)))

However, the result of this does not contain col1, col2 and col3.
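One possible way to keep all four columns is to sample whole rows from each group rather than only the col4 values. The following is a minimal sketch, not a definitive answer: it rebuilds the example df from the table above, and sample_per_group is a hypothetical helper name introduced here for illustration. It iterates over the groups, keeps groups of at most 2 rows unchanged, samples 2 rows from larger groups, and concatenates the pieces.

```python
import pandas as pd

# Example data rebuilt from the table in the question.
df = pd.DataFrame({
    "col1": ["A1", "A1", "A1", "B1", "B1", "C1"],
    "col2": ["A2", "A2", "A2", "B2", "B2", "C2"],
    "col3": ["A3", "A3", "A3", "B3", "B3", "C3"],
    "col4": ["A4", "A5", "A6", "B4", "B5", "C4"],
})

keys = ["col1", "col2", "col3"]


def sample_per_group(frame, by, n=2, random_state=None):
    """Hypothetical helper: keep groups of at most n rows whole,
    and draw n random rows from larger groups."""
    parts = [
        # Each `group` is a full sub-DataFrame, so the key columns are kept.
        group if len(group) <= n else group.sample(n, random_state=random_state)
        for _, group in frame.groupby(by)
    ]
    # Restore the original row order, then renumber the rows.
    return pd.concat(parts).sort_index().reset_index(drop=True)


trial = sample_per_group(df, keys)
print(trial)
```

Because each group is handled as a complete sub-DataFrame, col1, col2 and col3 stay in the result as ordinary columns. Passing random_state makes the sample reproducible; leaving it as None gives a different pair of A rows on each run.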



