Tuesday, January 21, 2020

Get one random sample for each group and end up with a stratified sample in pandas

I'm working with a dataframe like this:

group    period
  A      20130101
  A      20130201
  .          .
  E      20130901
  E      20131001

Let's say I have 100 different groups and 10 possible dates, which are distributed like this: [.1,.05,.2,.05,.1,.1,.2,.05,.05,.1]. I need to get one sample for each group, such that 10% of the final sample comes from the first period, 5% from the second period, 20% from the third period, and so on. I managed to get a random sample for each group, but the result is heavily skewed. This is what I have:

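import numpy as np
# Pick one random row from each group (uniformly, so periods are not stratified)
fn = lambda obj: obj.loc[np.random.choice(obj.index, 1, replace=False),:]
dfrd = df[['group','period']].groupby('group', as_index=False).apply(fn)
# Drop the outer level of the resulting MultiIndex, keeping the original row labels
dfrd.index = [index[1] for index in dfrd.index]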

So, is there any way to do something similar, but stratified? Thanks
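
One possible direction, as a minimal sketch rather than a definitive answer: draw one row per group with weights derived from the target shares, so that the period distribution of the final sample matches the target in expectation (not exactly). The data below is hypothetical (100 groups, each containing every period once), and the names periods, shares, target, rows_per_cell and weights are made up for illustration. It relies on GroupBy.sample, which I believe is available from pandas 1.1 onward. If some groups are missing periods, the per-group normalization of the weights will pull the result away from the target.

```python
import pandas as pd

# Hypothetical data: 100 groups, each with one row per period.
periods = [20130101 + 100 * i for i in range(10)]  # 20130101 ... 20131001
groups = [f"G{i:03d}" for i in range(100)]
df = pd.DataFrame(
    [(g, p) for g in groups for p in periods],
    columns=["group", "period"],
)

# Target share of each period in the final sample (taken from the question).
shares = [.1, .05, .2, .05, .1, .1, .2, .05, .05, .1]
target = dict(zip(periods, shares))

# Row weight = target share of the row's period, divided by the number of
# rows with that period inside the same group, so that a period repeated
# within a group does not receive extra probability mass.
rows_per_cell = df.groupby(["group", "period"])["period"].transform("size")
weights = df["period"].map(target) / rows_per_cell

# One weighted draw per group; pandas normalizes the weights within each group.
sample = df.groupby("group").sample(
    n=1, weights=weights.to_numpy(), random_state=0
)

# Compare the realized period shares with the target.
realized = sample["period"].value_counts(normalize=True).sort_index()
print(pd.DataFrame({"target": pd.Series(target), "realized": realized}))
```

With only 100 draws the realized shares will still fluctuate around the target. If the proportions have to be exact (10 groups from the first period, 5 from the second, and so on), a different approach would be needed, for example assigning a period quota to shuffled groups first and then picking a row with that period inside each group.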



