I'm having some trouble adding values to a sorted list. I have two dataframes. The first one is df_ordenaleatorio (the original DF), which has 45000 rows; here's a sample of its format:
id qwe nivel num_orden
17312 40914720 516 1107300 29
22231 41682691 516 1107300 11
22875 41793014 516 1107300 22
24797 42154820 516 1107300 32
25258 42478054 516 1107300 5
25315 42519052 516 1107300 1
26098 43119817 516 1107300 35
26268 43201796 516 1107300 4
26495 43301451 516 1107300 37
26529 43313762 516 1107300 31
26937 43413528 516 1107300 28
26957 43425220 516 1107300 15
26964 43425466 516 1107300 36
27568 43539349 516 1107300 38
27605 43552829 516 1107300 7
27643 43565565 516 1107300 19
27868 43608550 516 1107300 13
27875 43609319 516 1107300 27
28094 43651052 516 1107300 8
28371 43718414 516 1107300 20
28491 43746553 516 1107300 45
28515 43748545 516 1107300 10
28711 43802508 516 1107300 46
28915 43832421 516 1107300 18
28922 43833155 516 1107300 43
28967 43846105 516 1107300 25
29407 43931105 516 1107300 9
29443 43944652 516 1107300 42
29482 43958493 516 1107300 16
30307 44139221 516 1107300 21
For every pair {qwe,nivel} there are many different ids, and each one has a different num_orden. To give some context: id identifies a person (an applicant), and the pair {qwe,nivel} is a course at a specific university. Num_orden is the relative preference (ranking) that the university expressed. For example, if applicant id 1 has num_orden 1 for a given {qwe,nivel}, it means that applicant has first preference for entering that course at that university. The number of applicants differs from pair to pair, so for many pairs the maximum num_orden is 50, while for others it can be 150.
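The per-pair counts and the uniqueness of num_orden described above can be checked with a groupby on both columns. A minimal sketch on made-up toy data; it assumes, as the question implies, that within each pair num_orden runs from 1 to the number of applicants with no gaps, and ranks_are_valid is a hypothetical helper:

```python
import pandas as pd

# Hypothetical toy data with the same columns as df_ordenaleatorio
df_ordenaleatorio = pd.DataFrame({
    "id":        [1, 2, 3, 4, 5],
    "qwe":       [516, 516, 516, 127, 127],
    "nivel":     [1107300, 1107300, 1107300, 1101100, 1101100],
    "num_orden": [2, 3, 1, 1, 2],
})

# Number of applicants for each course {qwe, nivel}
sizes = df_ordenaleatorio.groupby(["qwe", "nivel"]).size()

def ranks_are_valid(df):
    """True if, inside every {qwe, nivel}, num_orden is exactly 1..n (no repeats, no gaps)."""
    return all(
        sorted(g["num_orden"]) == list(range(1, len(g) + 1))
        for _, g in df.groupby(["qwe", "nivel"])
    )
```

Here sizes.loc[(516, 1107300)] is 3, and running ranks_are_valid after every insertion is a cheap way to catch a missed shift.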
I have to add some applicants to df_ordenaleatorio, assigning their num_orden randomly. The dataframe df_new looks like this:
id qwe nivel
0 30004612 12683 1101200
1 30007619 127 1101100
2 30018027 24318 1101300
3 30116284 2330 1101200
4 30116078 127 1101300
5 30007603 127 1101100
So it has almost everything needed to join it to the original dataframe except num_orden. Num_orden has to be assigned randomly between 1 and the number of students that are applying to that pair {qwe,nivel} at that moment. Note that this will create a repeated num_orden when we add the application to the original DF (and num_orden has to be unique), so the application that already held that number (the original one, not the one we just added) and all the ones ranked under it have to move down one position, i.e. add one to their num_orden.
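The shift rule is easier to see on a plain Python list: if one pair's applicants are kept in num_orden order, list.insert at index k - 1 moves the applicant who held rank k, and everyone below it, down one position, and the ranks can be read back as position + 1. A minimal sketch; the helper name and ids are hypothetical:

```python
def insert_and_rerank(ranked_ids, new_id, k):
    """Place new_id at rank k (1-based) in a list of ids ordered by num_orden.

    list.insert shifts the id that held rank k, and every id after it, one
    position down, which is the same as adding 1 to their num_orden.
    Returns a dict mapping id -> num_orden.
    """
    ranked_ids = list(ranked_ids)       # work on a copy
    ranked_ids.insert(k - 1, new_id)    # rank k lives at index k - 1
    return {applicant: pos + 1 for pos, applicant in enumerate(ranked_ids)}

# Hypothetical pair with three applicants ranked 1..3
ranks = insert_and_rerank(["a", "b", "c"], "new", 2)
# ranks == {"a": 1, "new": 2, "b": 3, "c": 4}
```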
For example, take the pair {qwe,nivel}={127,1101100}, which has two applicants in df_new. When we add the first one:
id qwe nivel
30007619 127 1101100
Num_orden has to be randomly assigned between 1 and the original number of applicants for that {qwe,nivel}; let's suppose it was 100. Using numpy, the number can be drawn with np.random.randint(1, 101) (np.random.randint(100) would give a value between 0 and 99, because the upper bound is exclusive). Let's suppose numpy gave us 20 as num_orden. Hence, the following line has to be added to df_ordenaleatorio:
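A quick sketch of drawing a rank uniformly from 1..n with numpy, where n is the current number of applicants for the pair (100 in the example). Both np.random.randint and the newer Generator.integers exclude the upper bound by default, so it has to be n + 1:

```python
import numpy as np

n = 100  # current number of applicants for the pair, as in the example

# Legacy API: randint(low, high) samples from [low, high), so this gives 1..n
num_orden = np.random.randint(1, n + 1)

# Generator API: integers(low, high) is also high-exclusive by default
rng = np.random.default_rng(seed=0)
num_orden_rng = rng.integers(1, n + 1)
```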
id qwe nivel num_orden
30007619 127 1101100 20
Notice that there was already an application for the same pair {qwe,nivel} in df_ordenaleatorio with num_orden 20. That application and all the ones below it have to move down one position, so we have to add 1 to the num_orden of all of them.
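In pandas, this shift can be done with a boolean mask before appending the new row. A minimal sketch; insert_at_rank and the toy data (100 applicants for {127,1101100} plus one row from another pair) are hypothetical, and k would come from the random draw:

```python
import pandas as pd

def insert_at_rank(df, new_id, qwe, nivel, k):
    """Add one application at rank k for the pair {qwe, nivel}.

    Every existing application of the same pair with num_orden >= k moves
    down one position. Returns a new DataFrame; df itself is not modified.
    """
    df = df.copy()
    same_pair = (df["qwe"] == qwe) & (df["nivel"] == nivel)
    df.loc[same_pair & (df["num_orden"] >= k), "num_orden"] += 1
    new_row = pd.DataFrame({"id": [new_id], "qwe": [qwe], "nivel": [nivel], "num_orden": [k]})
    return pd.concat([df, new_row], ignore_index=True)

# Hypothetical data: ids 1..100 ranked 1..100 for {127, 1101100}, id 999 in another pair
df_example = pd.DataFrame({
    "id":        list(range(1, 101)) + [999],
    "qwe":       [127] * 100 + [516],
    "nivel":     [1101100] * 100 + [1107300],
    "num_orden": list(range(1, 101)) + [25],
})
out = insert_at_rank(df_example, 30007619, 127, 1101100, 20)
```

In out, the applicant that held rank 20 now has 21, the new one has 20, and the row from the other pair keeps 25. Each call scans the whole DataFrame, which is fine for a few additions but may get slow when many rows are added to a 45000-row table.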
Now we have to add the second application for the pair {qwe,nivel}={127,1101100}:
id qwe nivel
30007603 127 1101100
The process is the same as before, except that now num_orden has to be randomly assigned between 1 and 101 (remember that we already added an application to that pair {qwe,nivel}). We will also have the same problem with repeated values, so we will have to add 1 to the num_orden of the repeated one and of all the students below it.
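Putting the steps together, one possible approach (not a single built-in pandas command) is to keep one ordered id list per {qwe,nivel}, process df_new row by row so each draw sees the updated count, and renumber at the end. A sketch under assumptions: add_applicants is a hypothetical helper, np.random.default_rng stands in for np.random.randint, and a pair with no applicants yet gives its first newcomer num_orden 1:

```python
import numpy as np
import pandas as pd

def add_applicants(df_orig, df_new, seed=None):
    """Add every row of df_new to df_orig with a random num_orden, shifting ranks.

    Rows of df_new are processed in order. For each one, k is drawn uniformly
    from 1..n, where n is the pair's current number of applicants (including
    earlier additions). Assumption: an empty pair gives the newcomer rank 1.
    """
    rng = np.random.default_rng(seed)
    # One list of ids per (qwe, nivel), ordered by the existing num_orden
    ordered = (
        df_orig.sort_values("num_orden")
        .groupby(["qwe", "nivel"])["id"]
        .apply(list)
        .to_dict()
    )
    for row in df_new.itertuples(index=False):
        ids = ordered.setdefault((row.qwe, row.nivel), [])
        n = max(len(ids), 1)
        k = int(rng.integers(1, n + 1))   # uniform on 1..n (high is exclusive)
        ids.insert(k - 1, row.id)         # rank k and everything below move down one
    # Rebuild the DataFrame, numbering each pair 1..len
    records = [
        (applicant, qwe, nivel, pos + 1)
        for (qwe, nivel), ids in ordered.items()
        for pos, applicant in enumerate(ids)
    ]
    return pd.DataFrame(records, columns=["id", "qwe", "nivel", "num_orden"])
```

Because each insertion only touches one Python list, this avoids rescanning the whole DataFrame for every new applicant; passing a seed makes the random assignment reproducible.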
I have no idea how to do this. Is there a pandas command I can use, or will I have to write my own function?
Thanks a lot!