Friday, 18 September 2020

Randomly Choose pd.Series Entries Based on Single Index Level

I have two MultiIndexed pd.Series whose Subject level holds the same values in both:

import numpy as np
import pandas as pd

s1 = pd.DataFrame({'Subject':['S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3'],
                  'Movie':['mov1', 'mov1', 'mov2', 'mov2', 'mov1', 'mov1', 'mov2', 'mov2', 'mov1', 'mov1', 'mov2', 'mov2', 'mov3', 'mov3'],
                  'TimeStamp':['Start', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'End'],
                  'Distance': np.random.rand(14)*100}).set_index(['Subject', 'Movie', 'TimeStamp'])['Distance']
s2 = pd.DataFrame({'Subject':['S1', 'S1', 'S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3'],
                  'Movie':['mov1', 'mov1', 'mov2', 'mov2', 'mov2', 'mov1', 'mov1', 'mov2', 'mov2', 'mov1', 'mov1', 'mov1', 'mov2', 'mov2', 'mov2'],
                  'TimeStamp':['Start', 'End', 'Start', 'Mid', 'End', 'Start', 'End', 'Start', 'End', 'Start', 'Mid', 'End','Start', 'Mid', 'End'],
                  'Distance': np.random.rand(15)*100}).set_index(['Subject', 'Movie', 'TimeStamp'])['Distance']

I want to create two new Series, each containing all Subjects, where each Subject's entries are taken at random from one of the two original Series and the other new Series gets that Subject from the other original. For example, if new_series1 contains the entries matching s2.xs('S1', level='Subject'), then new_series2 must contain the entries matching s1.xs('S1', level='Subject').
I'm hoping to avoid iterating over all Subject values and to do this in a faster way using numpy/pandas randomization.

Thanks!
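
One possible vectorized approach, offered as a sketch rather than a definitive answer: draw one random coin flip per Subject, then use Index.isin to turn that per-Subject choice into boolean row masks for both Series. The helper name swap_subjects_at_random, its seed parameter, and the small stand-in data are assumptions made for illustration. The sketch also assumes every Subject appears in both s1 and s2, as described in the question.

```python
import numpy as np
import pandas as pd


def swap_subjects_at_random(s1, s2, level="Subject", seed=None):
    """Hypothetical helper: build two complementary per-subject mixes of s1 and s2."""
    rng = np.random.default_rng(seed)
    subjects = s1.index.get_level_values(level).unique()
    # One coin flip per subject; a kept subject goes from s1 into new_series1.
    from_s1 = subjects[rng.random(len(subjects)) < 0.5]
    # Expand the per-subject choice to row masks without a Python loop.
    in_s1 = s1.index.get_level_values(level).isin(from_s1)
    in_s2 = s2.index.get_level_values(level).isin(from_s1)
    new_series1 = pd.concat([s1[in_s1], s2[~in_s2]])
    new_series2 = pd.concat([s1[~in_s1], s2[in_s2]])
    return new_series1, new_series2


# Small stand-in data with the same index layout as in the question (assumed values).
names = ["Subject", "Movie", "TimeStamp"]
idx1 = pd.MultiIndex.from_product(
    [["S1", "S2", "S3"], ["mov1", "mov2"], ["Start", "End"]], names=names)
idx2 = pd.MultiIndex.from_product(
    [["S1", "S2", "S3"], ["mov1", "mov2"], ["Start", "Mid", "End"]], names=names)
s1 = pd.Series(np.arange(12.0), index=idx1, name="Distance")
s2 = pd.Series(100 + np.arange(18.0), index=idx2, name="Distance")

new_series1, new_series2 = swap_subjects_at_random(s1, s2, seed=0)
```

The results are not sorted: in each new Series the Subjects taken from s1 come first, followed by those taken from s2. Calling sort_index() afterwards would group them by Subject, at the cost of also reordering the Movie and TimeStamp levels alphabetically.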



