Thursday, 29 November 2018

Python random.shuffle does not give exact unique values to the data frame

I am building a dummy dataset with a list of companies as user_id, the jobs posted by each company as job_id, and a candidate id as c_id. I have already completed the first two columns, and my dataset looks like this:

     user_id      job_id
0         HP         HP2
1  Microsoft  Microsoft4
2  Accenture  Accenture2
3         HP         HP0
4       Dell       Dell4
5        FIS        FIS1
6         HP         HP0
7  Microsoft  Microsoft4
8       Dell       Dell2
9  Accenture  Accenture0

The rows are also shuffled. Now I want to add a random candidate id to this dataset in such a way that no c_id is repeated within a particular job_id.

My approach is as follows, where joblist is a list of all job_ids:

for i in range(50):
    l = list(range(0,len(df[df['job_id'] == joblist[i]])))
    random.shuffle(l)
    df['c_id'][df['job_id'] == joblist[i]] = l

After that, I tested it with:

len(df['c_id'][df['job_id'] == joblist[0]])

output = 168

df['c_id'][df['job_id'] == joblist[0]].nunique()

output = 101

The same happens for every job_id. I have rechecked the uniqueness of l after each step, and it has 168 unique values. What am I doing wrong here?
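One thing that may be worth checking (this is an assumption, not something confirmed in the post) is the chained assignment `df['c_id'][mask] = l`. Depending on the pandas version, a list assigned through a boolean mask this way may not be written to the selected rows in order. The sketch below assigns through a single `.loc` call instead. The small DataFrame and the `-1` placeholder are hypothetical stand-ins for the real data, and the loop runs over the unique job ids rather than over `joblist`.

```python
import random

import pandas as pd

# Hypothetical stand-in for the dataset described in the question.
df = pd.DataFrame({
    "user_id": ["HP", "Microsoft", "HP", "Dell", "HP", "Microsoft"],
    "job_id": ["HP2", "Microsoft4", "HP0", "Dell4", "HP2", "Microsoft4"],
})

# Placeholder value so the column exists before the per-job assignment.
df["c_id"] = -1

for job in df["job_id"].unique():
    mask = df["job_id"] == job
    # One candidate id per row of this job, in random order.
    ids = list(range(int(mask.sum())))
    random.shuffle(ids)
    # A single .loc call writes the ids to the selected rows in order.
    df.loc[mask, "c_id"] = ids

# Per-job check: the number of distinct c_id values should equal the row count.
unique_per_job = df.groupby("job_id")["c_id"].nunique()
rows_per_job = df.groupby("job_id").size()
```

If `unique_per_job` and `rows_per_job` match for every job_id, no candidate id is repeated within a job.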



