Tuesday, 10 December 2019

How to randomly create a preference dataframe from a dataframe of choices?

I have a dataframe of votes and I would like to create a dataframe of preferences from it. For example, here is the number of votes for each party (P1, P2, P3) in each city (comm1, comm2, ...):

        Comm      Votes     P1     P2      P3
0       comm1     1315.0    2.0    424.0   572.0
1       comm2     4682.0    117.0  2053.0  1584.0
2       comm3     2397.0    2.0    40.0    192.0
3       comm4     931.0     2.0    12.0    345.0
4       comm5     842.0     47.0   209.0   76.0
...     ...       ...       ...    ...     ...
1524    comm1525  10477.0   13.0   673.0   333.0
1525    comm1526  2674.0    1.0    55.0    194.0
1526    comm1527  1691.0    331.0  29.0    78.0

For each political party, I would like to create preferences from random numbers. I assume that voters are honest. For example, for party "P1" in town "comm1", we know that 2 people voted for it and that there are 1315 voters. I need to create preferences showing how many people would rank it as their first, second or third choice. That is, for each party:

      Comm      Votes   P1_1   P1_2    P1_3   P2_1   P2_2   P2_3  P3_1   P3_2  P3_3
0     comm1     1315.0  2.0    1011.0  303.0  424.0  881.0  10.0  570.0  1.0   1.0
...   ...       ...     ...    ...     ...    ...
1526  comm1527  1691.0  331.0  1300.0  60.0   299.0  22.0   10.0  ...

So I have to do:

# For each column in parties, I create (parties - 1) other columns.
# I rename them all Party_i. The original column becomes Party_1.
# In the other columns I put a random number.
# For a given row, the sum of all Party_i for i in [1, parties] must be equal to Votes.
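For a single row, the steps above could be sketched as follows. This is only an illustration under assumptions: the function name `split_preferences` and its parameters are hypothetical, it expects at least two ranks, and it splits the remaining votes at random cut points, which is just one of many possible random schemes.

```python
import random

def split_preferences(total_votes, first_choice_votes, n_ranks, rng=random):
    """Return n_ranks counts that sum to total_votes.

    Rank 1 keeps the observed votes; the remaining voters are split
    at random over ranks 2..n_ranks (assumes n_ranks >= 2).
    """
    remaining = total_votes - first_choice_votes
    if remaining < 0:
        raise ValueError("a party cannot have more votes than the total")
    # Draw n_ranks - 2 random cut points in [0, remaining] and sort them.
    cuts = sorted(rng.randint(0, remaining) for _ in range(n_ranks - 2))
    bounds = [0] + cuts + [remaining]
    # The gaps between consecutive bounds are the counts for ranks 2..n_ranks.
    others = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    return [first_choice_votes] + others

# Party P1 in comm1: 2 first-choice votes out of 1315 voters, 3 ranks.
row = split_preferences(1315, 2, 3, random.Random(0))
```

By construction the first value is the observed vote count and the values add up to the number of voters, whatever the random draw.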

I drafted this so far:

from random import randrange

# For each column in parties, create one Party_i column per preference rank.
for party in parties:
    for i in range(1, len(parties) + 1):
        # Party_1 keeps the observed votes; the other ranks get a random number.
        # Wrong as written: the row sum of Party_i is not equal to df['Votes'].
        df['{party}_{preference}'.format(party=party, preference=i)] = [
            v if i == 1 else (randrange(0, int(t)) if v < t else 0)
            for v, t in zip(df[party], df['Votes'])]
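One way to make each party's columns add up to Votes, sketched here under assumptions rather than as a definitive answer: keep the observed votes as Party_1 and distribute the remaining voters over the other ranks with a multinomial draw. The helper name `add_random_preferences`, the equal rank probabilities and the `seed` parameter are all hypothetical choices, and the sketch assumes at least two parties and no missing values.

```python
import numpy as np
import pandas as pd

def add_random_preferences(df, parties, seed=None):
    """Build Party_1..Party_n columns whose row sum equals Votes for each party."""
    rng = np.random.default_rng(seed)
    n_other = len(parties) - 1  # ranks 2..n (assumes len(parties) >= 2)
    pvals = np.full(n_other, 1.0 / n_other)  # assumption: all other ranks equally likely
    out = df[["Comm", "Votes"]].copy()
    for party in parties:
        first = df[party].astype(int)
        # Voters who did not put this party first.
        remaining = (df["Votes"].astype(int) - first).clip(lower=0)
        # One multinomial draw per row; each draw sums to that row's remainder.
        draws = np.array([rng.multinomial(r, pvals) for r in remaining])
        out[f"{party}_1"] = first.to_numpy()
        for k in range(n_other):
            out[f"{party}_{k + 2}"] = draws[:, k]
    return out

# Small sample taken from the first rows of the table in the question.
df = pd.DataFrame({
    "Comm": ["comm1", "comm2", "comm3"],
    "Votes": [1315.0, 4682.0, 2397.0],
    "P1": [2.0, 117.0, 2.0],
    "P2": [424.0, 2053.0, 40.0],
    "P3": [572.0, 1584.0, 192.0],
})
prefs = add_random_preferences(df, ["P1", "P2", "P3"], seed=0)
```

This only enforces the per-party constraint stated in the question (Party_1 + ... + Party_n = Votes for each row). It does not try to make the rankings of different parties consistent with one another.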


