Friday, October 22, 2021

Is there a way to draw a sample with a fixed percentage from a specific categorical variable?

Say I have a population of 1000 patients with data on their sex. I've been asked to draw a sample of size n in which exactly 65% of the patients are male.

Some sample data (here, the sex distribution is 50%-50%):

data <- data.frame(patient_id = 1:1000,
                   sex = append(rep("male", 500),
                                rep("female", 500)))

I can't really see a way to solve this task using sample_n or sample_frac in dplyr.

The result should look something like this for n = 500, but with random patient_ids:

data.frame(patient_id = 1:500,
           sex = append(rep("male", 325),
                        rep("female", 175))
           )
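
To make the target concrete, here is a rough base R sketch of the kind of thing I mean, not necessarily the best way to do it. It assumes the male count is simply round(n * 0.65) with the remainder female (so the 65% is only exact when n * 0.65 is a whole number), and that each group is large enough to sample without replacement. The helper name sample_fixed_share and its male_share argument are made up for illustration.

```r
# Hypothetical helper: draw n rows so that a fixed share of them are male.
sample_fixed_share <- function(df, n, male_share = 0.65) {
  n_male   <- round(n * male_share)  # 325 when n = 500
  n_female <- n - n_male             # 175 when n = 500

  male_rows   <- which(df$sex == "male")
  female_rows <- which(df$sex == "female")

  # Fail early if a group is too small to sample without replacement.
  stopifnot(n_male <= length(male_rows), n_female <= length(female_rows))

  # Sample positions with sample.int() and index into the row vectors,
  # so a group with a single row is not misread by sample() as 1:x.
  keep <- c(male_rows[sample.int(length(male_rows), n_male)],
            female_rows[sample.int(length(female_rows), n_female)])
  df[keep, ]
}

data <- data.frame(patient_id = 1:1000,
                   sex = c(rep("male", 500), rep("female", 500)))

set.seed(42)  # only so the draw is reproducible
result <- sample_fixed_share(data, n = 500)
table(result$sex)  # 175 female, 325 male
```

This samples each sex separately and stacks the two draws, so the rows come back grouped by sex rather than shuffled.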

Any insight is appreciated.



