I have a dataframe that contains the observed data:
import pandas as pd

d = {
    'humanID': [1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    'dogID': [1, 2, 1, 5, 4, 6, 7, 20, 9, 7],
    'month': [1, 1, 2, 3, 1, 2, 3, 1, 2, 2],
}
df = pd.DataFrame(data=d)
The df looks as follows:
humanID dogID month
0 1 1 1
1 1 2 1
2 2 1 2
3 2 5 3
4 2 4 1
5 2 6 2
6 2 7 3
7 2 20 1
8 2 9 2
9 2 7 2
In total there are two humans and twenty dogs, and the df above contains the observed data. For example:
The first row means: human1 adopted dog1 in January.
The second row means: human1 adopted dog2 in January.
The third row means: human2 adopted dog1 in February.
My goal is to randomly generate two unobserved records for each (human, month) pair that appears in the data.
For example, in January human1 did not adopt any of the dogs [3, 4, 5, 6, 7, ..., 20], so I want to randomly create two unobserved samples for that (human, month) pair in triple form:
humanID dogID month
1 20 1
1 10 1
human1 has no activity in February, so we don't need to sample unobserved data for that month.
human2 has activity in January, February and March, so for each of those months we want to randomly create unobserved data. For example, in January human2 adopted dog1, dog4 and dog20. The two random unobserved samples could be:
humanID dogID month
2 2 1
2 6 1
The same process applies to February and March.
I want to put all of the unobserved samples into one dataframe, for example the following unobserved:
humanID dogID month
0 1 20 1
1 1 10 1
2 2 2 1
3 2 6 1
4 2 13 2
5 2 16 2
6 2 1 3
7 2 20 3
Is there a fast way to do this?
PS: this is a coding interview question for a start-up company.
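For reference, here is a minimal sketch of the sampling described above, not a definitive solution. It assumes the dog IDs run from 1 to 20 (the post only says there are twenty dogs), and the names `ALL_DOGS`, `N_SAMPLES` and `sample_unobserved` are made up for illustration. It groups by (humanID, month), removes the observed dogs from the full set, and draws two of the remaining dogs without replacement.

```python
import numpy as np
import pandas as pd

d = {
    'humanID': [1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    'dogID': [1, 2, 1, 5, 4, 6, 7, 20, 9, 7],
    'month': [1, 1, 2, 3, 1, 2, 3, 1, 2, 2],
}
df = pd.DataFrame(data=d)

ALL_DOGS = np.arange(1, 21)  # assumption: dog IDs are exactly 1..20
N_SAMPLES = 2                # unobserved samples per (human, month)


def sample_unobserved(df, all_dogs=ALL_DOGS, n=N_SAMPLES, seed=None):
    """Draw n unobserved dogs for every (humanID, month) present in df."""
    rng = np.random.default_rng(seed)
    rows = []
    # Only (human, month) pairs that occur in df are visited, so months
    # without any activity (e.g. human1 in February) are skipped.
    for (human, month), group in df.groupby(['humanID', 'month']):
        # Dogs this human did NOT adopt in this month.
        candidates = np.setdiff1d(all_dogs, group['dogID'].to_numpy())
        k = min(n, len(candidates))
        picked = rng.choice(candidates, size=k, replace=False)
        rows.extend((int(human), int(dog), int(month)) for dog in picked)
    return pd.DataFrame(rows, columns=['humanID', 'dogID', 'month'])


unobserved = sample_unobserved(df, seed=0)
print(unobserved)
```

The loop runs once per (human, month) group rather than once per row, which should be fine for data of this size. For much larger data, a vectorised approach (for example, building the full human/month/dog cross product and anti-joining it against df) may be worth comparing.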