random: Assign an individual in a dataset to a particular state based on predetermined probabilities in R

Wednesday, 29 May 2019

I have data that looks like this:

# Lookup table: one row per 10-year age group and sex
df <- data.frame(
  age_grp10 = rep(c("00-09", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89"), 2),
  sex = c(rep("M", 9), rep("F", 9)),
  prob_arr = round(runif(18, min = 0.11, max = 2.50), digits = 2),
  prob_dep = round(runif(18, min = 0.11, max = 2.50), digits = 2)
)

This dataset gives the probability, by 10-year age group and sex, that a person arrives or departs in a calendar year.
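Note that the example values range from 0.11 to 2.50, so they cannot be probabilities on a 0-1 scale. They are presumably percentages, which matches the "XX%" wording used later in the question, but that is an assumption. A minimal sketch of converting and sanity-checking such a table before sampling from it, using a hypothetical four-row stand-in called rates and hypothetical column names p_arr and p_dep:

```r
# Hypothetical stand-in for a few rows of df; values assumed to be percentages.
rates <- data.frame(
  age_grp10 = c("00-09", "10-19", "00-09", "10-19"),
  sex       = c("M", "M", "F", "F"),
  prob_arr  = c(0.11, 2.50, 1.30, 0.75),
  prob_dep  = c(0.40, 1.10, 2.20, 0.95)
)

# Convert percentages to probabilities on the 0-1 scale.
rates$p_arr <- rates$prob_arr / 100
rates$p_dep <- rates$prob_dep / 100

# Each age group / sex pair should appear exactly once, and every
# probability must lie in [0, 1] before it is used for sampling.
stopifnot(!any(duplicated(rates[c("age_grp10", "sex")])))
stopifnot(all(rates$p_arr >= 0 & rates$p_arr <= 1),
          all(rates$p_dep >= 0 & rates$p_dep <= 1))
```

If the values were in fact meant as probabilities already, the division by 100 would be dropped and the range check would flag anything above 1.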

Then I have population-level data, which looks like this:

pop_df <- data.frame(
  uniq_ID = c("AFG1234", "WED1234", "POJ1234", "DER234", "QWE1234", "BGR1234", "ABC1234", "DSE1234", "UHJ1234", "POI234",
              "EDC1234", "BGT1234", "MJI1234", "WEX1234", "FGH1234", "UJN1234", "LOK1234", "DRT1234", "URD1234", "MVR1234"),
  age_grp10 = c("50-59", "40-49", "20-29", "40-49", "00-09", "50-59", "30-39", "70-79", "60-69", "40-49",
                "80-89", "10-19", "30-39", "30-39", "50-59", "70-79", "00-09", "70-79", "20-29", "20-29"),
  sex = c("M", "M", "F", "M", "F", "F", "F", "M", "F", "M", "F", "F", "M", "M", "M", "M", "M", "F", "M", "F")
)

In this population dataset, each row is an individual; the real dataset has about 5 million rows. It gives each person's age group, sex, and unique ID number. Based on the probabilities in the first dataframe (df), I would like to assign arrival and departure status to the individuals in the population dataframe (pop_df).

My desired output would look like this:

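library(dplyr)

# Arrived and Departed are typed in by hand here, only to show the target format
pop_df <- pop_df %>%
  left_join(df, by = c("age_grp10", "sex")) %>%
  mutate(Arrived = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0),
         Departed = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))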

In this last dataset, the values of Arrived and Departed are dependent on the probabilities in the df dataframe. So XX% of males aged 0-9 years would be assigned arrival status, based on the value of prob_arr in the df dataframe.
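One possible way to do this, sketched in base R, is to look up each person's rate by age group and sex and then draw one independent 0/1 trial per person with rbinom(). This assumes prob_arr and prob_dep are percentages, and the names rates, people and assign_status() are made up for the illustration. With independent draws, the share assigned in each group will be close to XX% on average, not exactly XX%.

```r
set.seed(42)  # only so that this sketch is reproducible

# Small stand-ins for the two tables described above (hypothetical values).
rates <- data.frame(
  age_grp10 = rep(c("00-09", "10-19"), 2),
  sex       = c("M", "M", "F", "F"),
  prob_arr  = c(1.20, 0.50, 2.10, 0.11),  # assumed to be percentages
  prob_dep  = c(0.80, 2.50, 0.30, 1.00),
  stringsAsFactors = FALSE
)
people <- data.frame(
  uniq_ID   = sprintf("ID%05d", 1:10000),
  age_grp10 = sample(c("00-09", "10-19"), 10000, replace = TRUE),
  sex       = sample(c("M", "F"), 10000, replace = TRUE),
  stringsAsFactors = FALSE
)

assign_status <- function(people, rates) {
  # Find each person's row in the rate table by age group and sex.
  idx <- match(paste(people$age_grp10, people$sex),
               paste(rates$age_grp10, rates$sex))
  # Convert percentages to probabilities and draw one Bernoulli trial each.
  people$Arrived  <- rbinom(nrow(people), 1, rates$prob_arr[idx] / 100)
  people$Departed <- rbinom(nrow(people), 1, rates$prob_dep[idx] / 100)
  people
}

result <- assign_status(people, rates)
```

Because match() and rbinom() are both vectorised, the same function should scale to a few million rows without a loop. If exactly XX% of each group must be flagged, sampling a fixed number of rows within each group would be needed instead.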

Thanks for your help
