Thursday, July 7, 2022

Create random values in a dataframe with different weights by condition

I have been trying to create a simulated dataframe with sex and education as features, generating the data according to some proportions I already know.

Something like this:

import random
import pandas as pd

weight_sex = [0.55, 0.45]
options_sex = [0, 1]  # 0 = Men, 1 = Women

weight_educ = [0.6, 26.8, 23.6, 23.6, 24.1, 1.3]  # percentages; random.choices accepts relative weights
options_educ = [0, 1, 2, 3, 4, 5]  # 0 = None, 5 = Bachelor or more

# Draw 100 people, each feature sampled independently with its own weights
sex = pd.Series(random.choices(options_sex, weights=weight_sex, k=100), name='sex')
education = pd.Series(random.choices(options_educ, weights=weight_educ, k=100), name='education')
people = pd.concat([sex, education], axis=1)

Now I want to create a new column which says whether the person is unemployed, has informal work, or has formal work. I know these proportions differ depending on the features of the population. To keep it simple, let's say men with an education level higher than 3 have a better occupation rate than the rest of the population.

Something like this:

Option_work = [0, 1, 2] # 0 = Unemployed, 1 = informal work, 2 = formal work
weight_work_educated_man = [0.2, 0.3, 0.5]
weight_work_other_people = [0.3, 0.4, 0.3]

So if I say

(people['sex'] == 0) & (people['education'] > 3), generate the value with weight_work_educated_man

And if I say

people['education'] <= 3, generate the value with weight_work_other_people

How can I create a new column that randomizes the data with the weights I have, using the features of each row as the condition? I've been trying to find a way with random.choice or the sample function from pandas, but I got stuck. It is important that the data is randomized, so the results are not exactly the same the next time I run the code.
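To make the goal concrete, here is a minimal sketch of one possible approach. It is not necessarily the best one: it builds a boolean mask for the educated men and draws each group separately with NumPy's Generator.choice. It assumes NumPy is available, and it assumes that "other people" means every row that does not match the first mask, which is wider than the education <= 3 condition written above. The names rng, educated_man and work are only illustrative.

```python
import numpy as np
import pandas as pd

# Unseeded generator, so every run produces different draws.
rng = np.random.default_rng()

# Rebuild a small `people` frame like the one in the question.
n = 100
weight_educ = np.array([0.6, 26.8, 23.6, 23.6, 24.1, 1.3])
people = pd.DataFrame({
    "sex": rng.choice([0, 1], size=n, p=[0.55, 0.45]),
    # Generator.choice needs probabilities that sum to 1, so normalize.
    "education": rng.choice([0, 1, 2, 3, 4, 5], size=n,
                            p=weight_educ / weight_educ.sum()),
})

option_work = [0, 1, 2]  # 0 = Unemployed, 1 = informal work, 2 = formal work
weight_work_educated_man = [0.2, 0.3, 0.5]
weight_work_other_people = [0.3, 0.4, 0.3]

# Parentheses are required: `&` binds tighter than `==` and `>`.
educated_man = ((people["sex"] == 0) & (people["education"] > 3)).to_numpy()

# Draw one value per row, using the weights of the group the row belongs to.
work = np.empty(n, dtype=int)
work[educated_man] = rng.choice(
    option_work, size=educated_man.sum(), p=weight_work_educated_man)
work[~educated_man] = rng.choice(
    option_work, size=(~educated_man).sum(), p=weight_work_other_people)

people["work"] = work
```

With more than two groups, the same pattern could be repeated with one mask per group. Passing a seed to np.random.default_rng would make a run reproducible, which is the opposite of what is wanted here.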



