Thursday, 14 March 2019

Cannot randomly assign individuals to 3 groups **per day in data set** according to required percentages - 10%/45%/45%

I want to randomly assign individuals from an existing dataset to 3 different groups according to fixed daily percentages. Below is the sample dataset:

 Date         Customer_ID
 1/3/2019     411
 1/3/2019     414
 1/3/2019     421
 5/3/2019     431
 5/3/2019     433
 5/3/2019     441
 6/3/2019     442
 6/3/2019     443
 6/3/2019     444

I used the Python code below to create the groups. While the overall traffic % is correct, the groups are not correctly assigned according to the required percentage per day.

Group   %
 A    10%
 B    45%
 C    45%

              Expected outcome               Actual outcome
 Date      Group A  Group B Group C     Group A Group B Group C
  1/3/2019  10%      45%    45%           7%    2%       91%
  1/4/2019  10%      45%    45%           12%   25%      63%
  1/5/2019  10%      45%    45%           15%   50%      35%
  1/6/2019  10%      45%    45%           20%   61%      19%
  1/7/2019  10%      45%    45%           2%    7%       91%
  1/8/2019  10%      45%    45%           1%    12%      87%
  1/9/2019  10%      45%    45%           9%    21%      70%
  1/10/2019 10%      45%    45%           13%   25%      62%
  Overall   10%      45%    45%           10%   45%      45%

Current code:

import numpy as np

# Create 3 different groups that have traffic assigned 10%/45%/45%
# (np.random.choice draws each row's label independently with these probabilities)
df['Groups'] = df.groupby('Date')['Customer_ID'].transform(
    lambda x: np.random.choice(['Group_A', 'Group_B', 'Group_C'], len(x), p=[0.1, 0.45, 0.45]))

The code only gives the desired distribution over the whole dataset, not per day (as shown in the actual outcome table).

Which Python code can I use to create the three groups according to the required distribution per day?
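
One possible direction, offered as a rough sketch under assumptions rather than a definitive answer: instead of drawing each row's label independently, fix the number of rows per group within each day (rounding the cumulative 10%/45%/45% shares to whole rows) and then shuffle those labels. The helper names `exact_labels` and `assign_per_day`, the seed, and the demo frame with 20 customers per day are all hypothetical and not part of the original data.

```python
import numpy as np
import pandas as pd

LABELS = ["Group_A", "Group_B", "Group_C"]
SHARES = [0.10, 0.45, 0.45]

def exact_labels(n, labels, shares, rng):
    """Return n labels in random order with counts as close to `shares` as rounding allows."""
    # Cumulative cut points rounded to whole rows, e.g. n=20 gives [0, 2, 11, 20].
    cuts = np.round(np.cumsum([0.0] + list(shares)) * n).astype(int)
    cuts[-1] = n  # guard against floating-point drift in the last cut
    out = np.repeat(labels, np.diff(cuts))
    rng.shuffle(out)  # random order, fixed counts
    return out

def assign_per_day(df, date_col, labels=LABELS, shares=SHARES, seed=None):
    """Assign labels separately within each value of `date_col`."""
    rng = np.random.default_rng(seed)
    out = np.empty(len(df), dtype=object)
    # .indices maps each date to the integer positions of its rows.
    for pos in df.groupby(date_col).indices.values():
        out[pos] = exact_labels(len(pos), labels, shares, rng)
    return pd.Series(out, index=df.index)

# Hypothetical demo data: 3 days with 20 customers each.
df = pd.DataFrame({
    "Date": np.repeat(["1/3/2019", "5/3/2019", "6/3/2019"], 20),
    "Customer_ID": np.arange(60),
})
df["Groups"] = assign_per_day(df, "Date", seed=0)

# Rows per group for each day.
counts = df.groupby("Date")["Groups"].value_counts().unstack(fill_value=0)
print(counts)
```

Note that the split can only be as exact as the day's row count allows: with 3 customers per day, as in the sample above, the rounded counts come out as 0/2/1, so an exact 10% share is not attainable on such small days.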



