I want to randomly assign individuals from an existing dataset to three different groups according to a fixed daily percentage. Below is a sample of the dataset:
    Date       Customer_ID
1.  1/3/2019   411
2.  1/3/2019   414
3.  1/3/2019   421
4.  5/3/2019   431
5.  5/3/2019   433
6.  5/3/2019   441
7.  6/3/2019   442
8.  6/3/2019   443
9.  6/3/2019   444
I used the Python code below to create the groups. While the overall traffic split is correct, the groups are not assigned according to the required percentages on each individual day.
Group %
A 10%
B 45%
C 45%
            Expected outcome               Actual outcome
Date        Group A  Group B  Group C      Group A  Group B  Group C
1/3/2019    10%      45%      45%          7%       2%       91%
1/4/2019    10%      45%      45%          12%      25%      63%
1/5/2019    10%      45%      45%          15%      50%      35%
1/6/2019    10%      45%      45%          20%      61%      19%
1/7/2019    10%      45%      45%          2%       7%       91%
1/8/2019    10%      45%      45%          1%       12%      87%
1/9/2019    10%      45%      45%          9%       21%      70%
1/10/2019   10%      45%      45%          13%      25%      62%
Overall     10%      45%      45%          10%      45%      45%
Current code:
import numpy as np

# Create 3 different groups that have traffic assigned 10%/45%/45%.
# Each row's label is drawn independently with these probabilities.
df['Groups'] = df.groupby('Date')['Customer_ID'].transform(
    lambda x: np.random.choice(['Group_A', 'Group_B', 'Group_C'], len(x), p=[0.1, 0.45, 0.45]))
The code gives the desired distribution only over the whole dataset, not per day (as shown in the actual outcome table).
Which Python code can I use to create the three groups according to the required distribution on each day?
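
For reference, here is a minimal sketch of one possible direction, not a confirmed solution. np.random.choice draws every label independently, so the shares within a single day are only 10%/45%/45% on average. The sketch instead fixes the number of rows per group for each day and shuffles those labels. The function names quota_labels and assign_daily_groups, the seed parameter, and the largest-remainder rounding are my own assumptions for illustration; the column names follow the sample dataset above.

```python
import numpy as np
import pandas as pd

GROUPS = ["Group_A", "Group_B", "Group_C"]
SHARES = [0.10, 0.45, 0.45]

def quota_labels(n, rng):
    """Hypothetical helper: n shuffled labels whose counts match SHARES as closely as n allows."""
    raw = np.array(SHARES) * n
    counts = np.floor(raw).astype(int)
    # Rows left over after flooring go to the groups with the largest remainders.
    leftover = n - counts.sum()
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:leftover]] += 1
    labels = np.repeat(GROUPS, counts)
    rng.shuffle(labels)  # randomise which row gets which label
    return labels

def assign_daily_groups(df, date_col="Date", seed=None):
    """Hypothetical helper: one group label per row, with quotas applied per day."""
    rng = np.random.default_rng(seed)
    labels = np.empty(len(df), dtype=object)
    # .indices maps each date to the integer positions of its rows.
    for positions in df.groupby(date_col).indices.values():
        labels[positions] = quota_labels(len(positions), rng)
    return pd.Series(labels, index=df.index)

# Usage with the sample rows from the question.
df = pd.DataFrame({
    "Date": ["1/3/2019"] * 3 + ["5/3/2019"] * 3 + ["6/3/2019"] * 3,
    "Customer_ID": [411, 414, 421, 431, 433, 441, 442, 443, 444],
})
df["Groups"] = assign_daily_groups(df, seed=42)
```

Note that the split can only be exact when a day's row count allows it. With three customers per day, as in the sample, 10% is less than one row, so some rounding is unavoidable on small days.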