Sunday, 8 July 2018

Adding random value to a sorted list

I'm having trouble adding values to a sorted list. I have two dataframes. The first one, df_ordenaleatorio (the original DF), has 45000 rows; here is a sample of its format:

               id    qwe      nivel  num_orden
17312    40914720    516    1107300         29
22231    41682691    516    1107300         11
22875    41793014    516    1107300         22
24797    42154820    516    1107300         32
25258    42478054    516    1107300          5
25315    42519052    516    1107300          1
26098    43119817    516    1107300         35
26268    43201796    516    1107300          4
26495    43301451    516    1107300         37
26529    43313762    516    1107300         31
26937    43413528    516    1107300         28
26957    43425220    516    1107300         15
26964    43425466    516    1107300         36
27568    43539349    516    1107300         38
27605    43552829    516    1107300          7
27643    43565565    516    1107300         19
27868    43608550    516    1107300         13
27875    43609319    516    1107300         27
28094    43651052    516    1107300          8
28371    43718414    516    1107300         20
28491    43746553    516    1107300         45
28515    43748545    516    1107300         10
28711    43802508    516    1107300         46
28915    43832421    516    1107300         18
28922    43833155    516    1107300         43
28967    43846105    516    1107300         25
29407    43931105    516    1107300          9
29443    43944652    516    1107300         42
29482    43958493    516    1107300         16
30307    44139221    516    1107300         21

For every pair {qwe,nivel} there are many different ids, and each one has a different num_orden. To give some context: id is the identifier of a person, and the pair {qwe,nivel} is a course at a specific university. num_orden is the relative preference that the university has expressed. For example, if the applicant with id 1 has num_orden 1 in one {qwe,nivel}, that applicant has first preference for entering that course at that university. The number of applicants differs for each pair {qwe,nivel}, so in many pairs the maximum num_orden is 50, while in others it can be 150.

I have to add some applicants to df_ordenaleatorio, assigning their num_orden randomly. The dataframe df_new looks like this:

           id    qwe      nivel
0    30004612  12683    1101200
1    30007619    127    1101100
2    30018027  24318    1101300
3    30116284   2330    1101200
4    30116078    127    1101300
5    30007603    127    1101100

So it has everything needed to join it to the original dataframe except num_orden. num_orden has to be assigned randomly between 1 and the number of students applying at that moment to that pair {qwe,nivel}. Notice that when we add that application to the original DF, there will be a repeated num_orden, and num_orden has to be unique. So the application that originally held that number (not the one we just added) and all the ones below it have to go down one position, i.e. we add one to their num_orden.
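The single-insertion rule described above could be sketched roughly as follows. The function name insert_applicant and the use of a NumPy Generator are assumptions made for illustration, not part of the original data or of any pandas API; the sketch also assumes that a course with no applicants yet simply gets rank 1.

```python
import numpy as np
import pandas as pd


def insert_applicant(df, new_id, qwe, nivel, rng):
    """Hypothetical helper: add one applicant to a {qwe, nivel} course
    at a random num_orden and shift the displaced applicants down."""
    in_course = (df["qwe"] == qwe) & (df["nivel"] == nivel)
    n_current = int(in_course.sum())

    # Draw a rank between 1 and the current number of applicants (inclusive).
    # rng.integers excludes the upper bound, hence the + 1.
    # Assumption: an empty course gets rank 1.
    new_rank = int(rng.integers(1, max(n_current, 1) + 1))

    df = df.copy()
    # The original holder of new_rank and everyone below it move down one place.
    df.loc[in_course & (df["num_orden"] >= new_rank), "num_orden"] += 1

    new_row = pd.DataFrame(
        {"id": [new_id], "qwe": [qwe], "nivel": [nivel], "num_orden": [new_rank]}
    )
    return pd.concat([df, new_row], ignore_index=True)
```

Calling it once per new applicant keeps num_orden unique within the course, at the cost of scanning the whole dataframe on every call.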

For example, take the pair {qwe,nivel}={127,1101100}, which has two applicants in df_new. When we add the first one:

      id    qwe      nivel
30007619    127    1101100

num_orden has to be assigned randomly between 1 and the original number of applicants for that {qwe,nivel}; let's suppose it was 100. Using NumPy, the number can be drawn with np.random.randint(1, 101) (the upper bound is exclusive, so np.random.randint(100) would instead return a value between 0 and 99). Let's suppose NumPy gave us 20 as num_orden. Hence, the following row has to be added to df_ordenaleatorio:

      id    qwe      nivel  num_orden
30007619    127    1101100         20

Notice that there was already an application with num_orden 20 for the same pair {qwe,nivel} in df_ordenaleatorio. That application (the original one with num_orden 20) and all the ones below it have to go down one position, so we have to add 1 to the num_orden of each of them.

Now we have to add the second application for the pair {qwe,nivel}={127,1101100}:

      id    qwe      nivel
30007603    127    1101100

The process is the same as before, except that now num_orden has to be assigned randomly between 1 and 101 (remember that we already added an application to that pair {qwe,nivel}). We will also have the same problem with repeated values, so we will have to add 1 to the num_orden of the repeated one and of all the students below it.
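To make the whole procedure concrete, here is one possible sketch that processes every row of df_new in order. It is not a built-in pandas command: the function name add_applicants and the seed parameter are made up for illustration. It assumes that, within each course, the existing num_orden values already run from 1 to n with no gaps, so that inserting an id into an ordered list and renumbering is equivalent to "add 1 to everyone at or below the drawn rank".

```python
import numpy as np
import pandas as pd


def add_applicants(df_orden, df_new, seed=None):
    """Hypothetical helper: insert every row of df_new into its course
    at a random rank, then renumber num_orden from 1 within each course."""
    rng = np.random.default_rng(seed)

    # For each course, the ids listed in their current num_orden order.
    ordered = {
        (int(key[0]), int(key[1])): grp.sort_values("num_orden")["id"].tolist()
        for key, grp in df_orden.groupby(["qwe", "nivel"])
    }

    for row in df_new.itertuples(index=False):
        ids = ordered.setdefault((int(row.qwe), int(row.nivel)), [])
        # Rank between 1 and the number of applicants at this moment;
        # assumption: a course with no applicants yet gets rank 1.
        rank = int(rng.integers(1, max(len(ids), 1) + 1))
        # Inserting at rank - 1 pushes the old holder and all below it down.
        ids.insert(rank - 1, row.id)

    # Rebuild the dataframe; the list position becomes the new num_orden.
    records = [
        {"id": id_, "qwe": qwe, "nivel": nivel, "num_orden": pos}
        for (qwe, nivel), ids in ordered.items()
        for pos, id_ in enumerate(ids, start=1)
    ]
    return pd.DataFrame(records, columns=["id", "qwe", "nivel", "num_orden"])
```

Something like add_applicants(df_ordenaleatorio, df_new, seed=0) would then return the combined dataframe. Working on plain Python lists per course avoids rewriting the 45000-row dataframe once per new applicant, but any other columns in df_ordenaleatorio would have to be merged back separately.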

I have no idea how to do this. Is there a pandas command I can use, or will I have to write my own function?

Thanks a lot!



