I am building a dummy dataset with a list of companies as user_id, the jobs posted by each company as job_id, and a candidate ID as c_id. I have already completed the first two steps, and my dataset looks like this:
     user_id      job_id
0    HP           HP2
1    Microsoft    Microsoft4
2    Accenture    Accenture2
3    HP           HP0
4    Dell         Dell4
5    FIS          FIS1
6    HP           HP0
7    Microsoft    Microsoft4
8    Dell         Dell2
9    Accenture    Accenture0
The rows are also shuffled. Now I want to add a random candidate ID to this dataset in such a way that no c_id is repeated within a particular job_id.
My approach is as follows, where joblist is a list of all job_ids:
    for i in range(50):
        l = list(range(0, len(df[df['job_id'] == joblist[i]])))
        random.shuffle(l)
        df['c_id'][df['job_id'] == joblist[i]] = l
I then tested it as follows:
len(df['c_id'][df['job_id'] == joblist[0]])
output = 168
df['c_id'][df['job_id'] == joblist[0]].nunique()
output = 101
The same happens for every job_id. I have rechecked the uniqueness of l after each step, and it has 168 unique values. What am I doing wrong here?
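One possible cause, which the question does not confirm, is the chained assignment df['c_id'][mask] = l: pandas does not guarantee that this form writes back to df, so c_id may keep whatever values it held before. The sketch below assigns through a single .loc call instead. The sample data, the rng seed, and the -1 placeholder are assumptions made for illustration and are not part of the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset described in the question.
rng = np.random.default_rng(0)
companies = ["HP", "Microsoft", "Accenture", "Dell", "FIS"]
user_id = rng.choice(companies, size=1000)
job_no = rng.integers(0, 5, size=1000)
df = pd.DataFrame({
    "user_id": user_id,
    "job_id": [f"{c}{n}" for c, n in zip(user_id, job_no)],
})

# Placeholder value so the column exists with an integer dtype.
df["c_id"] = -1

# groups maps each job_id to the index labels of its rows.
for job, idx in df.groupby("job_id").groups.items():
    # One shuffled run of 0..n-1 per job, written with a single .loc call
    # so the assignment targets df itself rather than a temporary object.
    df.loc[idx, "c_id"] = rng.permutation(len(idx))

# Same check as in the question, applied to every job_id at once.
per_job = df.groupby("job_id")["c_id"]
print((per_job.nunique() == per_job.size()).all())
```

This relies on df having a unique index, which holds for the default RangeIndex; with duplicate index labels, .loc would select more rows than intended.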