Saturday, 23 February 2019

How to randomly generate unobserved data in Python 3

I have a DataFrame that contains the observed data:

import pandas as pd

d = {
    'humanID': [1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    'dogID': [1, 2, 1, 5, 4, 6, 7, 20, 9, 7],
    'month': [1, 1, 2, 3, 1, 2, 3, 1, 2, 2],
}
df = pd.DataFrame(data=d)

The df looks as follows:

    humanID  dogID  month
0        1      1      1
1        1      2      1
2        2      1      2
3        2      5      3
4        2      4      1
5        2      6      2
6        2      7      3
7        2     20      1
8        2      9      2
9        2      7      2

In total there are two humans and twenty dogs, and the df above contains the observed data. For example:

The first row means: human1 adopted dog1 in January

The second row means: human1 adopted dog2 in January

The third row means: human2 adopted dog1 in February

My goal is to randomly generate two unobserved samples for each (human, month) pair.

For example, in January human1 did not adopt dogs [3,4,5,6,7,..20], so I want to randomly create two unobserved samples for (human1, January) in triple form:

humanID dogID month
   1      20    1
   1      10    1
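
A minimal sketch of this single-pair step, assuming the dog IDs run from 1 to 20 (the names ALL_DOGS, rng and human1_jan are only illustrative, and the seed is there just to make the draw reproducible):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'humanID': [1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    'dogID': [1, 2, 1, 5, 4, 6, 7, 20, 9, 7],
    'month': [1, 1, 2, 3, 1, 2, 3, 1, 2, 2],
})

ALL_DOGS = np.arange(1, 21)     # assumption: dog IDs are 1..20
rng = np.random.default_rng(0)  # seeded only for reproducibility

# Dogs that human1 adopted in January (month 1).
observed = df.loc[(df['humanID'] == 1) & (df['month'] == 1), 'dogID']

# Candidates are all dogs not observed for this (human, month) pair.
candidates = np.setdiff1d(ALL_DOGS, observed)

# Draw two distinct unobserved dogs and store them as triples.
sampled = rng.choice(candidates, size=2, replace=False)
human1_jan = pd.DataFrame({'humanID': 1, 'dogID': sampled, 'month': 1})
```

Sampling without replacement keeps the two unobserved dogs distinct; whether duplicates should be allowed is not stated, so that is an assumption too.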

For human1, there is no activity in February, so we don't need to sample unobserved data for that month.

For human2, there is activity in January, February and March, so for each month we want to randomly create unobserved data. For example, in January human2 adopted dog4 and dog20. The two random unobserved samples could be:

humanID dogID month
   2      2    1
   2      6    1

The same process can be used for February and March.

I want to put all of the unobserved samples in one DataFrame, such as the following unobserved:

    humanID  dogID  month
0        1     20      1
1        1     10      1
2        2      2      1
3        2      6      1
4        2     13      2
5        2     16      2
6        2      1      3
7        2     20      3
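
One possible way to build such a DataFrame is sketched below. It assumes the dog IDs are 1 to 20 and that the two samples per pair must be distinct; ALL_DOGS, N_SAMPLES and rng are illustrative names, and the sampled dogs will differ from the example table because the draw is random.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'humanID': [1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    'dogID': [1, 2, 1, 5, 4, 6, 7, 20, 9, 7],
    'month': [1, 1, 2, 3, 1, 2, 3, 1, 2, 2],
})

ALL_DOGS = np.arange(1, 21)     # assumption: dog IDs are 1..20
N_SAMPLES = 2                   # unobserved samples per (human, month)
rng = np.random.default_rng(0)  # seeded only for reproducibility

rows = []
# Only (human, month) pairs with at least one observation appear as groups,
# so human1 in February is skipped automatically.
for (human, month), group in df.groupby(['humanID', 'month']):
    # Dogs this human did not adopt in this month.
    candidates = np.setdiff1d(ALL_DOGS, group['dogID'])
    for dog in rng.choice(candidates, size=N_SAMPLES, replace=False):
        rows.append((human, int(dog), month))

unobserved = pd.DataFrame(rows, columns=['humanID', 'dogID', 'month'])
```

The loop runs once per observed (human, month) pair, so its cost grows with the number of pairs rather than the number of rows. For very large data, rejection sampling (draw random dogs and discard the observed ones) may be faster, but that is not shown here.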

Is there a fast way to do this?

PS: this is a coding interview question for a start-up company.



