I have a dataframe (df) with a variable containing a group number. Each observation has a group number from 1 to 80. I would like to create a new variable, called new_group, containing a new random number from 1 to 80 for each observation. However, the new group numbers must be consistent with the original ones: if two observations were both in group 1, both observations should get the same new random group number.
Example:
observation group random_group
0 1 4
1 2 3
2 1 4
3 43 1
4 1 4
5 21 80
6 43 1
I am using Python 3.7. I tried the following:
1. I created a dictionary with keys from 1 to 80 and values from 1 to 80 in a different, random order. The idea is to use this dictionary for an Excel "vlookup"-style match.
2. I created a new dataframe with two columns: one column with the values 1 to 80, and another column with the numbers 1 to 80 in a different, random order. The idea is to merge the original dataframe with the new one.
Here is what I did:
import random
ordered_group = list(range(1,81))
random_group = random.sample(range(1, 81), 80)
group_dict = dict(zip(ordered_group ,random_group))
df['new_group'] = df.group.map(group_dict)
The new_group column contains only NaN.
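One possible cause of an all-NaN result from map is a type mismatch between the column values and the dictionary keys. The sketch below assumes, purely for illustration, that the group column holds strings such as "1" (for example after reading a CSV as text); the post does not confirm this, and the sample data is hypothetical.

```python
import random

import pandas as pd

# Hypothetical data: group numbers stored as strings. This is an assumption,
# not something the post confirms.
df = pd.DataFrame({"group": ["1", "2", "1", "43", "1", "21", "43"]})

# Same kind of shuffled lookup as in the post, with integer keys 1..80.
group_dict = dict(zip(range(1, 81), random.sample(range(1, 81), 80)))

# String values never match integer keys, so every lookup misses.
mismatched = df["group"].map(group_dict)

# Casting the column to int first lets the values match the keys.
df["new_group"] = df["group"].astype(int).map(group_dict)

print(df["group"].dtype)             # a non-integer dtype hints at the mismatch
print(mismatched.isna().all())       # whether every lookup missed
print(df["new_group"].isna().any())  # whether any lookup still misses
```

Checking df["group"].dtype on the real dataframe is a quick way to see whether this assumption applies.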
I also tried this instead of the last line:
df['new_group'] = df["group"].apply(lambda x: group_dict.get(x))
Now it maps each of the 80 groups correctly once, but it does not go through all observations.
I also tried using merge instead of map:
import random
import pandas as pd
random_group = list(range(1, 81))
random_group = pd.DataFrame(random_group)
random_group['new_group'] = random.sample(range(1, 81), 80)
random_group.rename(columns={0: 'group'}, inplace=True)
df = df.merge(random_group, on='group', how='outer')
It maps each of the 80 groups correctly once, but it does not go through all observations.
So I get something like this:
observation group random_group
0 1 4
1 2 3
2 1 nan
3 43 1
4 1 nan
5 21 80
6 43 nan
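For the merge variant, a left join is one way to keep exactly one output row per original observation, whereas how='outer' also adds rows for groups that never occur in df. This is a minimal sketch on hypothetical sample data with an integer group column; the name lookup is introduced here only for illustration.

```python
import random

import pandas as pd

# Hypothetical observations; the real df is not shown in the post.
df = pd.DataFrame({"group": [1, 2, 1, 43, 1, 21, 43]})

# Lookup table: one row per original group with its shuffled replacement.
lookup = pd.DataFrame({
    "group": range(1, 81),
    "new_group": random.sample(range(1, 81), 80),
})

# how="left" keeps every row of df in its original order;
# validate="many_to_one" raises if the lookup has duplicate group keys.
merged = df.merge(lookup, on="group", how="left", validate="many_to_one")

print(merged)
```

If NaN values still appear after a left join, the two group columns probably differ in dtype or content, which can be checked with df["group"].dtype and lookup["group"].dtype.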
My two methods seem to work well, but they do not go through the whole dataframe. Any idea where I went wrong? Also, any more efficient method is welcome.
Thank you!
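On the efficiency question, one alternative is to skip the dictionary and index a shuffled NumPy array directly. This is only a sketch under the assumption that group is an integer column with values in 1 to 80; the sample data and the names rng and shuffled are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical observations with integer group labels in 1..80.
df = pd.DataFrame({"group": [1, 2, 1, 43, 1, 21, 43]})

rng = np.random.default_rng()  # pass a seed here for reproducible output

# shuffled[i] is the new label for original group i + 1.
shuffled = rng.permutation(np.arange(1, 81))

# Index the permutation with the zero-based group numbers.
df["new_group"] = shuffled[df["group"].to_numpy() - 1]

print(df)
```

Because the permutation is a one-to-one relabelling, observations that share a group always share a new_group, and distinct groups never collide.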