I have a dataset of about 50k rows containing a job ID and the user ID of the person who performed each job. Here is a sample I've created:
import pandas as pd

df = pd.DataFrame(data={
    'job_id': ['00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009', '00010', '00011', '00012', '00013', '00014', '00015'],
    'user_id': ['frank', 'josh', 'frank', 'jessica', 'josh', 'eric', 'frank', 'josh', 'eric', 'jessica', 'jessica', 'james', 'frank', 'josh', 'james']
})
job_id user_id
0 00001 frank
1 00002 josh
2 00003 frank
3 00004 jessica
4 00005 josh
5 00006 eric
6 00007 frank
7 00008 josh
8 00009 eric
9 00010 jessica
10 00011 jessica
11 00012 james
12 00013 frank
13 00014 josh
14 00015 james
I want to assign peer reviewers to those jobs in a new column called 'reviewer_id'. The reviewer must come from the list of user_ids but cannot be the same user as the one who performed the job. For example, frank can't review his own job, but jessica can.
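One way to satisfy this rule without looping is a modular-offset trick. This is a minimal sketch, not an established pandas recipe: encode each user as an integer 0..k-1 with `pd.factorize`, draw a random offset between 1 and k-1 for every row, and add it modulo k. Because the offset is never 0 (mod k), the reviewer can never be the job's own user, and each of the other k-1 users is equally likely. The variable names and the seed are illustrative assumptions, and the trick needs at least two distinct users.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    'job_id': ['00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009', '00010', '00011', '00012', '00013', '00014', '00015'],
    'user_id': ['frank', 'josh', 'frank', 'jessica', 'josh', 'eric', 'frank', 'josh', 'eric', 'jessica', 'jessica', 'james', 'frank', 'josh', 'james']
})

# Encode each user_id as an integer code 0..k-1; names[code] recovers the user
codes, uniques = pd.factorize(df['user_id'])
names = np.asarray(uniques)
k = len(names)  # assumes k >= 2, otherwise no valid reviewer exists

rng = np.random.default_rng(seed=0)  # illustrative seed; omit it for fresh draws
# Offsets are drawn from 1..k-1 (high bound is exclusive), so (code + offset) % k
# never equals the original code, i.e. nobody reviews their own job
offsets = rng.integers(1, k, size=len(df))
df['reviewer_id'] = names[(codes + offsets) % k]
print(df)
```

The whole assignment is a handful of NumPy array operations, so it should scale to 50k rows without any per-row Python code.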
My desired output would be something like this:
job_id user_id reviewer_id
0 00001 frank jessica
1 00002 josh frank
2 00003 frank josh
3 00004 jessica eric
4 00005 josh james
...
11 00012 james frank
12 00013 frank josh
13 00014 josh eric
14 00015 james eric
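If the reviews should be spread out rather than random, a deterministic variant of the same modular idea is possible. This sketch is an assumption about what "spread out" might mean, not something the desired output requires: number each user's jobs with `groupby(...).cumcount()` and let the offset cycle through 1..k-1, so a user's consecutive jobs go to different colleagues. It does not balance the total number of reviews per reviewer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    'job_id': ['00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009', '00010', '00011', '00012', '00013', '00014', '00015'],
    'user_id': ['frank', 'josh', 'frank', 'jessica', 'josh', 'eric', 'frank', 'josh', 'eric', 'jessica', 'jessica', 'james', 'frank', 'josh', 'james']
})

# Integer code per user, in order of first appearance
codes, uniques = pd.factorize(df['user_id'])
names = np.asarray(uniques)
k = len(names)  # assumes at least two distinct users

# 0, 1, 2, ... for each user's jobs, in the order they appear in df
nth_job = df.groupby('user_id').cumcount().to_numpy()

# Offset cycles 1, 2, ..., k-1, 1, ...; it is never 0 mod k, so no self-review
offsets = 1 + nth_job % (k - 1)
df['reviewer_id'] = names[(codes + offsets) % k]
print(df)
```

With the sample data, frank's four jobs end up reviewed by josh, jessica, eric and james, one each.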
I'm quite new to Python, so the only approach I can think of is getting a list of unique user_ids with reviewers = df['user_id'].unique().tolist() and then iterating over the DataFrame to assign a reviewer ID to each row. However, I know you should generally avoid iterating over a pandas DataFrame, so I'm lost on how to go about this.
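For comparison, the list-based approach described above can be written without an explicit `iterrows` loop. This is a minimal sketch under the assumption that a plain Python list comprehension is acceptable at around 50k rows, which it usually is. It precomputes, for each user, the list of everyone else (the `others` dict is an illustrative name), then picks one at random per job with `random.choice`.

```python
import random
import pandas as pd

df = pd.DataFrame(data={
    'job_id': ['00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009', '00010', '00011', '00012', '00013', '00014', '00015'],
    'user_id': ['frank', 'josh', 'frank', 'jessica', 'josh', 'eric', 'frank', 'josh', 'eric', 'jessica', 'jessica', 'james', 'frank', 'josh', 'james']
})

reviewers = df['user_id'].unique().tolist()

# Hypothetical helper mapping: user -> every other user who may review them
others = {u: [r for r in reviewers if r != u] for u in reviewers}

random.seed(0)  # illustrative seed for reproducible output
# One uniform random pick per job, excluding the job's owner
df['reviewer_id'] = [random.choice(others[user]) for user in df['user_id']]
print(df)
```

The comprehension walks over a single column rather than whole rows, which avoids the overhead that makes `iterrows` slow, but the vectorized NumPy version will still be faster on large inputs.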