Monday, 6 February 2023

Generating a random sample with a pre-defined mean and no duplicates in Python

I have the following data table of 15 patients, where each "ID" identifies an individual patient. I want to generate a random sample of 5 unique patients such that the mean "age" of the sampled patients is 50 (±5). The same patient (based on "ID") must not appear more than once in the sample. However, different patients may share the same "age", so the same age value may be counted more than once when calculating the mean. After that, I want to create a new column "sample" that is "yes" for each row whose "ID" was included in the mean calculation and "no" for all other rows.

import pandas as pd
htn = pd.DataFrame({"ID": [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                    "age": [55,55,44,55,69,55,37,45,50,52,62,37,22,70,29]
                    })

The code I have tried:

import pandas as pd
import random
import statistics as st
htn = pd.DataFrame({"ID": [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
                    "age": [55,55,44,55,69,55,37,45,50,52,62,37,22,70,29]
                    })
mean = 0
while abs(50 - mean) > 5:
    sample = random.choices(htn["age"], k=5)
    mean = st.mean(sample)
htn["sample"] = "no"
for i in sample:
    htn.loc[htn["age"] == i, "sample"] = "yes"
print(sample)
htn

I am using Spyder and Python on Deepnote. Many thanks for your help.
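
For reference, here is a minimal sketch of one way the requirements above could be met. It is not a definitive solution. Two things in the attempt appear to work against the goal: `random.choices` draws with replacement, so the same value can be picked twice, and matching on "age" marks every patient who shares a sampled age. The sketch instead draws 5 distinct IDs with NumPy's `Generator.choice(..., replace=False)`, retries until the mean age falls within 50 ± 5, and marks rows by "ID". The helper name `sample_with_target_mean` and its parameters (`k`, `target`, `tol`, `max_tries`, `seed`) are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

htn = pd.DataFrame({
    "ID": list(range(1, 16)),
    "age": [55, 55, 44, 55, 69, 55, 37, 45, 50, 52, 62, 37, 22, 70, 29],
})


def sample_with_target_mean(df, k=5, target=50, tol=5, max_tries=10_000, seed=None):
    """Hypothetical helper: redraw k distinct IDs until the mean age is within target +/- tol."""
    rng = np.random.default_rng(seed)
    ids = df["ID"].to_numpy()
    for _ in range(max_tries):
        # replace=False guarantees that no ID is drawn twice
        picked_ids = rng.choice(ids, size=k, replace=False)
        picked = df[df["ID"].isin(picked_ids)]
        if abs(picked["age"].mean() - target) <= tol:
            return picked
    # Assumption: give up after max_tries instead of looping forever
    raise ValueError("no sample with the requested mean found")


picked = sample_with_target_mean(htn, seed=0)

# Mark rows by ID (not by age), so patients who merely share an age are not flagged
htn["sample"] = np.where(htn["ID"].isin(picked["ID"]), "yes", "no")

print(picked["age"].mean())
print(htn)
```

Redrawing until the condition holds is simple rejection sampling. It is practical here because the overall mean age of the 15 patients is already close to 50, so most draws qualify. The `max_tries` cap is an added safeguard for data where the target mean may be unreachable.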



