Monday, 28 October 2019

Sample a dataframe with a predefined number of records per hour

I need to sample a dataframe (df1), and a second dataframe (df2) tells me how many records to draw from each hour of the day.

For example, df1:

   Hour number
0.  00    A
1.  00    B
2.  00    C
3.  01    D
4.  01    A
5.  01    B
6.  01    D

df2:

   Hour number
0.  00    1
1.  01    2
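
For anyone who wants to reproduce this, the two frames above could be built roughly like this. It is only a sketch: storing Hour as zero-padded strings is an assumption, since the real dtype is not shown.

```python
import pandas as pd

# Hour is kept as a zero-padded string here; the real dtype is an assumption.
df1 = pd.DataFrame({
    "Hour": ["00", "00", "00", "01", "01", "01", "01"],
    "number": ["A", "B", "C", "D", "A", "B", "D"],
})

# One row per hour, with the number of records to draw from that hour.
df2 = pd.DataFrame({"Hour": ["00", "01"], "number": [1, 2]})
```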

So in the end I would get, for example, record 1 for midnight and records 3 and 5 for 1 am (or any other combination, as long as it respects the counts in df2).

I need to wrap this in a function so that I can call it from inside another function.

So far I have

import numpy as np
def sampling(frame):
    # Returns a single index label chosen uniformly at random from the frame.
    return np.random.choice(frame.index)

but I can't work out how to apply the constraints from df2.

Could anybody help?
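
One possible direction, as a minimal sketch rather than a definitive answer: group df1 by Hour and draw from each group the number of rows that df2 lists for that hour. The function name `sample_per_hour` and the `seed` parameter are hypothetical. The sketch also assumes that hours missing from df2 should contribute no rows, and that a requested count larger than the group should be capped at the group size.

```python
import pandas as pd


def sample_per_hour(frame, counts, seed=None):
    """Draw counts['number'] rows from frame for each value of 'Hour'."""
    # Map each hour to the number of rows requested for it.
    wanted = counts.set_index("Hour")["number"]
    pieces = []
    for hour, group in frame.groupby("Hour"):
        # Assumption: hours absent from `counts` contribute no rows.
        n = int(wanted.get(hour, 0))
        # Assumption: cap at the group size, since rows are drawn
        # without replacement.
        n = min(n, len(group))
        if n > 0:
            pieces.append(group.sample(n=n, random_state=seed))
    # Return an empty frame with the same columns if nothing was drawn.
    return pd.concat(pieces) if pieces else frame.iloc[0:0]


# Example data matching the question.
df1 = pd.DataFrame({
    "Hour": ["00", "00", "00", "01", "01", "01", "01"],
    "number": ["A", "B", "C", "D", "A", "B", "D"],
})
df2 = pd.DataFrame({"Hour": ["00", "01"], "number": [1, 2]})

picked = sample_per_hour(df1, df2, seed=0)
print(picked)
```

Because the function takes both frames as arguments and returns a dataframe, it can be called from inside another function. If only the row labels are needed, as in the `np.random.choice(frame.index)` attempt, `picked.index` gives them.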



