I have two MultiIndexed pandas Series, with the Subject level matching in both:
import numpy as np
import pandas as pd

s1 = pd.DataFrame({'Subject': ['S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3'],
                   'Movie': ['mov1', 'mov1', 'mov2', 'mov2', 'mov1', 'mov1', 'mov2', 'mov2', 'mov1', 'mov1', 'mov2', 'mov2', 'mov3', 'mov3'],
                   'TimeStamp': ['Start', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'End'],
                   'Distance': np.random.rand(14) * 100}).set_index(['Subject', 'Movie', 'TimeStamp'])['Distance']
s2 = pd.DataFrame({'Subject': ['S1', 'S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3'],
                   'Movie': ['mov1', 'mov1', 'mov2', 'mov2', 'mov2', 'mov1', 'mov1', 'mov2', 'mov2', 'mov1', 'mov1', 'mov1', 'mov2', 'mov2', 'mov2'],
                   'TimeStamp': ['Start', 'End', 'Start', 'Mid', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'Mid', 'End', 'Start', 'Mid', 'End'],
                   'Distance': np.random.rand(15) * 100}).set_index(['Subject', 'Movie', 'TimeStamp'])['Distance']
I want to create two new Series, each containing all Subjects, where each Subject's entries are taken at random from one of the two original Series and the other new Series gets that Subject's entries from the other original. That is, if for example new_series1 contains the entries matching s2.xs('S1', level='Subject'), then new_series2 must contain the entries matching s1.xs('S1', level='Subject').
I'm hoping to avoid iterating over all Subject values and to do this faster using numpy/pandas randomization.
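To make the requirement concrete, here is a minimal sketch of one possible vectorized approach: flip one coin per Subject, then build boolean masks with Index.isin instead of looping. The function name swap_subjects, its level and seed parameters, and the internal variable names are hypothetical and only for illustration. It assumes the two new Series do not need to preserve any particular row order across Subjects.

```python
import numpy as np
import pandas as pd


def swap_subjects(s1, s2, level="Subject", seed=None):
    """Randomly assign each subject's rows from s1/s2 to two new Series.

    Hypothetical helper: for every subject, one new Series receives the
    rows from s1 and the other receives the rows from s2.
    """
    rng = np.random.default_rng(seed)

    # Subject label of every row in each Series.
    subj1 = s1.index.get_level_values(level)
    subj2 = s2.index.get_level_values(level)

    # All distinct subjects, then one coin flip per subject.
    subjects = subj1.unique().union(subj2.unique())
    pick = rng.random(len(subjects)) < 0.5

    # Subjects whose s1 rows go to the first new Series.
    first_from_s1 = subjects[pick]

    # Row-level boolean masks, computed without a Python loop.
    m1 = subj1.isin(first_from_s1)
    m2 = subj2.isin(first_from_s1)

    # Rows are concatenated as-is; no re-sorting is applied here.
    new_series1 = pd.concat([s1[m1], s2[~m2]])
    new_series2 = pd.concat([s1[~m1], s2[m2]])
    return new_series1, new_series2
```

With the Series above this would be called as new_series1, new_series2 = swap_subjects(s1, s2). Passing a seed makes the assignment reproducible. A Subject picked from s1 in new_series1 always comes from s2 in new_series2, and vice versa.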
Thanks!