Friday, 26 May 2023

Generate birth date based on hire date and paygrade (Python)

In my data I have the hire dates of employees and their paygrades. Paygrades are divided into categories (1 = Intern, 2 = Junior, 3 = Senior, ...).
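As an illustration of that encoding: if the paygrade column stores the numeric codes rather than the labels, a lookup like the one below would translate them before any comparison against strings such as 'Junior'. This is only a sketch. The names `PAYGRADE_LABELS` and `paygrade_label` are hypothetical, and only the three codes listed above are included because the remaining ones are not given.

```python
# Hypothetical mapping from numeric paygrade codes to labels. Only the
# three codes named in the question are listed; the others are unknown.
PAYGRADE_LABELS = {1: "Intern", 2: "Junior", 3: "Senior"}


def paygrade_label(code):
    """Return the label for a numeric paygrade code, or None if unknown."""
    try:
        # int() accepts 2, 2.0 and "2"; it raises for NaN, None or text.
        return PAYGRADE_LABELS.get(int(code))
    except (TypeError, ValueError, OverflowError):
        return None
```

A code that is missing or not in the mapping comes back as None, so the caller can decide how to treat unknown paygrades.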

Based on this data, I'm trying to generate approximate birth dates for these employees, taking into account that an employee would be at least 23 years old when hired.

This is the function I developed:

import math
import random
from datetime import datetime

def generate_birth_date(paygrade, hire_date_str):
    if isinstance(hire_date_str, float) and math.isnan(hire_date_str):
        # Handle the case when hire_date_str is NaN
        return None
    if isinstance(hire_date_str, float):
        hire_date_str = str(int(hire_date_str))

    hire_date = datetime.strptime(hire_date_str, "%y-%m-%d").date()
    if paygrade == 'Intern':
        birth_year = random.randint(1998, 2000)
    elif paygrade == 'Junior':
        birth_year = random.randint(1996, 1998)
    elif paygrade == 'Senior':
        birth_year = random.randint(1994, 1996)
    elif paygrade == 'Manager':
        birth_year = random.randint(1992, 1994)
    elif paygrade == 'Senior Manager':
        birth_year = random.randint(1990, 1992)
    elif paygrade == 'Director':
        birth_year = random.randint(1988, 1990)
    else:
        birth_year = random.randint(1982, 1984)

    birth_month = random.randint(1, 12)
    birth_day = random.randint(1, 28)  # Assuming maximum of 28 days in a month

    birth_date = datetime(birth_year, birth_month, birth_day)

    return birth_date.date()

And this is how I'm calling it:

# Apply the function to the PAY_GRADE and HIRE_DATE columns to generate birth dates
df['BIRTH_DATE'] = df.apply(lambda row: generate_birth_date(row['PAY_GRADE'], row['HIRE_DATE']), axis=1)

The results are not fully accurate: it seems that sometimes only the paygrade is taken into account and sometimes only the hire date. For instance, an employee may have been hired in 2006 with paygrade 2, meaning he was a junior and at least 23 years old at that time. That means he would be about 40 or older by now. How can I correct my function so that the generated birth dates are consistent with both the hire date and the paygrade?
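To make the expected behavior concrete, here is a minimal sketch that ties the birth date to the hire date instead of to fixed birth years. It rests on several assumptions: hire dates are strings with a four-digit year such as "2006-03-15", paygrades are passed as labels, and the age ranges in `AGE_AT_HIRE` are made-up values (only the minimum of 23 comes from the description above). The names `AGE_AT_HIRE`, `DEFAULT_AGE_AT_HIRE`, `years_before` and `birth_date_from_hire` are hypothetical, and the NaN handling of the original function is left out for brevity.

```python
import random
from datetime import date, datetime, timedelta

# Hypothetical (min_age, max_age) at hire per paygrade label. Only the
# 23-year minimum comes from the question; the other bounds are assumed.
AGE_AT_HIRE = {
    "Intern": (23, 25),
    "Junior": (23, 27),
    "Senior": (27, 35),
}
DEFAULT_AGE_AT_HIRE = (30, 45)  # assumed fallback for any other paygrade


def years_before(d, years):
    """Return the date `years` years before `d` (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def birth_date_from_hire(paygrade, hire_date_str, rng=random):
    """Return a random birth date whose age at hire fits the paygrade range."""
    # Assumed input format: four-digit year, e.g. "2006-03-15".
    hire_date = datetime.strptime(hire_date_str, "%Y-%m-%d").date()
    min_age, max_age = AGE_AT_HIRE.get(paygrade, DEFAULT_AGE_AT_HIRE)
    age_at_hire = rng.randint(min_age, max_age)
    # Latest possible birthday for that age, then up to 364 days earlier,
    # so the employee is exactly `age_at_hire` years old on the hire date.
    latest_birthday = years_before(hire_date, age_at_hire)
    return latest_birthday - timedelta(days=rng.randint(0, 364))


# A junior hired on 2006-03-15 is born on or before 1983-03-15.
print(birth_date_from_hire("Junior", "2006-03-15"))
```

Because the age is drawn first and then subtracted from the hire date, both inputs always influence the result. The same function can be passed to `df.apply` in place of `generate_birth_date`.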



