Sunday, 22 September 2019

Create a column based on multiple conditions on other columns and some randomness

I have a data frame, shown below, with data on people who stay in an area.

ID   Nationality   Age   
1    India         38
2    China         45
3    USA           78
4    China         12
5    Pakistan      48
6    India         10
7    India         71
8    India         16
9    China         36
10   China         31
11   USA           82
12   Pakistan      3
13   Pakistan      36
14   India         26
15   USA           52
16   China         26
17   China         5
18   USA           4
19   Pakistan      24
20   Pakistan      85

I would like to add one more column, 'Owner_ID', to the above data frame.

Conditions:

1. Pick a random 20% of the IDs whose Nationality is India or Pakistan and whose age satisfies 20 < Age < 70, and tag them as No_Owner (here ID = 14, with Nationality = India and Age = 26, is tagged as No_Owner; similarly ID = 19).

2. Owner_ID should be one of the IDs (1-20), and the owner's Nationality should match the Nationality of the row it is assigned to.

3. The age of the owner should be between 30 and 50 for countries other than USA.

4. If the Nationality is USA, the owner's age can be between 25 and 85.

5. The percentage of Owner_IDs from USA should be more than 20%.
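
The conditions above could be sketched in pandas roughly as follows. This is only one possible reading, and it rests on assumptions the question does not settle: the 20% in condition 1 is taken over the rows that pass the nationality and age filter (the expected output tags two rows, so a different base may be intended, and `frac` can be adjusted), a row tagged No_Owner may still act as owner for other rows, and owners are drawn uniformly at random from the same-nationality candidates. The function name `assign_owners` and its `frac` and `seed` parameters are made up for illustration. Condition 5 is only reported, not enforced, because under a per-row reading condition 2 already fixes how many rows get a USA owner.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "ID": range(1, 21),
    "Nationality": [
        "India", "China", "USA", "China", "Pakistan",
        "India", "India", "India", "China", "China",
        "USA", "Pakistan", "Pakistan", "India", "USA",
        "China", "China", "USA", "Pakistan", "Pakistan",
    ],
    "Age": [38, 45, 78, 12, 48, 10, 71, 16, 36, 31,
            82, 3, 36, 26, 52, 26, 5, 4, 24, 85],
})


def assign_owners(df, frac=0.2, seed=0):
    """Hypothetical helper: add an Owner_ID column under the stated assumptions."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    owner = pd.Series([None] * len(out), index=out.index, dtype=object)

    # Condition 1: tag a random fraction of India/Pakistan rows with 20 < Age < 70.
    eligible = out.index[
        out["Nationality"].isin(["India", "Pakistan"])
        & (out["Age"] > 20)
        & (out["Age"] < 70)
    ].to_numpy()
    n_tag = int(round(frac * len(eligible)))  # assumption: 20% of the eligible rows
    if n_tag > 0:
        owner.loc[rng.choice(eligible, size=n_tag, replace=False)] = "No_Owner"

    # Conditions 2-4: draw each remaining row's owner from same-nationality
    # candidates whose age is in the allowed range (bounds taken as inclusive).
    for nat, grp in out.groupby("Nationality"):
        lo, hi = (25, 85) if nat == "USA" else (30, 50)
        candidates = grp.loc[grp["Age"].between(lo, hi), "ID"].to_numpy()
        todo = grp.index[owner.loc[grp.index].isna()]
        if len(candidates) > 0 and len(todo) > 0:
            owner.loc[todo] = rng.choice(candidates, size=len(todo)).tolist()

    out["Owner_ID"] = owner
    return out


result = assign_owners(df, seed=42)
print(result)

# Condition 5 (reported only): share of owned rows whose owner is from USA.
owned = result[result["Owner_ID"] != "No_Owner"]
usa_share = (owned["Nationality"] == "USA").mean()
print(f"Share of owned rows with a USA owner: {usa_share:.0%}")
```

Changing `seed` gives a different random assignment. If a nationality had no candidate in the allowed age range, its rows would be left without an owner in this sketch.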

The Expected Output:

    ID   Nationality   Age   Owner_ID
    1    India         38    1
    2    China         45    2
    3    USA           78    15
    4    China         12    9
    5    Pakistan      48    5
    6    India         10    1
    7    India         71    1
    8    India         16    1
    9    China         36    2
    10   China         31    2
    11   USA           82    11
    12   Pakistan      31    5
    13   Pakistan      36    5
    14   India         26    No_Owner
    15   USA           52    15
    16   China         26    9
    17   China         5     9
    18   USA           4     15
    19   Pakistan      24    No_Owner
    20   Pakistan      85    5
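
A candidate result such as the table above can be checked against conditions 2 to 4 with a small helper. This is a minimal sketch: the name `owner_violations` is hypothetical, the age bounds are taken as inclusive, and the table is typed in exactly as shown in the expected output (where ID 12 has Age 31, whereas the input table lists Age 3).

```python
import pandas as pd

# Expected output copied as shown in the question.
expected = pd.DataFrame({
    "ID": range(1, 21),
    "Nationality": [
        "India", "China", "USA", "China", "Pakistan",
        "India", "India", "India", "China", "China",
        "USA", "Pakistan", "Pakistan", "India", "USA",
        "China", "China", "USA", "Pakistan", "Pakistan",
    ],
    "Age": [38, 45, 78, 12, 48, 10, 71, 16, 36, 31,
            82, 31, 36, 26, 52, 26, 5, 4, 24, 85],
    "Owner_ID": [1, 2, 15, 9, 5, 1, 1, 1, 2, 2,
                 11, 5, 5, "No_Owner", 15, 9, 9, 15, "No_Owner", 5],
})


def owner_violations(df):
    """Return the IDs of rows whose Owner_ID breaks conditions 2 to 4."""
    by_id = df.set_index("ID")
    bad = []
    for row in df.itertuples(index=False):
        if row.Owner_ID == "No_Owner":
            continue  # condition 1 rows have no owner to check
        owner_id = int(row.Owner_ID)
        if owner_id not in by_id.index:
            bad.append(int(row.ID))  # condition 2: owner must be an existing ID
            continue
        owner = by_id.loc[owner_id]
        # Conditions 3 and 4: allowed owner age depends on the owner's country.
        lo, hi = (25, 85) if owner["Nationality"] == "USA" else (30, 50)
        same_country = owner["Nationality"] == row.Nationality
        if not same_country or not (lo <= owner["Age"] <= hi):
            bad.append(int(row.ID))
    return bad


print(owner_violations(expected))
```

An empty list means no row breaks conditions 2 to 4. Conditions 1 and 5 are not checked here, since the base of each percentage is open to interpretation.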



