Sunday, 22 September 2019

Random data generation based on a condition in pandas and NumPy

I have a data frame as shown below.

ID      
1    
2    
3    
4    
5    
6    
7    
8    
9    
10   
11   
12   
13   
14   
15   
16  
17   
18   
19   
20

It has a single column, ID, with 20 unique values. I want to randomly pick 25% of the unique values of ID (5 values) and create a new column, OWNER_ID, by randomly distributing those picked values across the 20 rows, with 10% of the rows (2 rows) left missing.

For the randomly picked IDs, ID and OWNER_ID should match.

For example, suppose I randomly picked 2, 3, 8, 9 and 11.

The expected output:

ID   OWNER_ID
1    2
2    2
3    3
4    11
5    9
6    11
7    11
8    8
9    9
10   2
11   11
12   2
13   na
14   8
15   9
16   8
17   9
18   2
19   2
20   na

I don't know how to start on this, so I have not tried anything yet. I am just learning random data generation with pandas.
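
One possible starting point, as a minimal sketch rather than a definitive solution. It assumes NumPy's `default_rng` generator with an arbitrary seed, and it assumes (based on the example output, where the missing rows are 13 and 20) that the 2 missing rows should be drawn only from rows whose ID was not picked, so that every picked ID still matches its OWNER_ID. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Arbitrary seed, only so the sketch is reproducible (an assumption).
rng = np.random.default_rng(0)

df = pd.DataFrame({"ID": np.arange(1, 21)})

# Pick 25% of the unique IDs (5 of 20) without replacement.
unique_ids = df["ID"].unique()
n_pick = int(round(0.25 * len(unique_ids)))
picked = rng.choice(unique_ids, size=n_pick, replace=False)

# Rows whose ID was picked must own themselves.
is_picked = df["ID"].isin(picked).to_numpy()

# Every other row gets a randomly chosen picked ID.
random_owner = rng.choice(picked, size=len(df))
owner_values = np.where(is_picked, df["ID"].to_numpy(), random_owner)

# Nullable integer dtype keeps the IDs as integers alongside missing values.
owner = pd.Series(owner_values, index=df.index, dtype="Int64")

# Blank out 10% of the rows (2 of 20), chosen among rows that were not picked
# (assumption taken from the example output).
n_missing = int(round(0.10 * len(df)))
candidates = np.flatnonzero(~is_picked)
missing_pos = rng.choice(candidates, size=n_missing, replace=False)
owner.iloc[missing_pos] = pd.NA

df["OWNER_ID"] = owner
print(df)
```

The picked values and the missing rows will differ from the example above because they depend on the seed. If the missing rows may also fall on picked IDs, `candidates` can be replaced with `np.arange(len(df))`.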



