I want to create a dataset with years of experience from 1 to 10 and salaries from 30k to 100k. The salaries should be random but generally increase with years of experience, so that sometimes a person with more experience makes less than a person with less experience.
For example:
years of experience | Salary
1 | 30050
2 | 28500
3 | 36000
...
10 | 100500
Here is what I have done so far:
import numpy as np
import random
import pandas as pd
years = np.linspace(1.0, 10.0, num=10)
salary = np.linspace(30000.0, 100000.0, num=10) + random.uniform(-1, 1) * 5000  # plus/minus 5k
data = pd.DataFrame({'experience': years, 'salary': salary})
print(data)
Which gives me:
   experience         salary
0         1.0   31060.903965
1         2.0   38838.681742
2         3.0   46616.459520
3         4.0   54394.237298
4         5.0   62172.015076
5         6.0   69949.792853
6         7.0   77727.570631
7         8.0   85505.348409
8         9.0   93283.126187
9        10.0  101060.903965
As the output shows, salary rises strictly with experience: there are no records where a person with more experience makes less than a person with less experience. How can I fix this? I also want to scale this up to 1000 rows.
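A minimal sketch of one possible fix, assuming the goal is an independent offset for each row. `random.uniform(-1, 1)` returns a single float, so the code above shifts every salary by the same amount. Passing `size=` to a NumPy generator draws one offset per row instead. The seed, the name `n_rows`, and the choice to sample years as random integers are illustrative assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

# Seeded generator; the seed value is arbitrary and only makes the run reproducible.
rng = np.random.default_rng(seed=0)
n_rows = 1000  # hypothetical target size taken from the question

# Years of experience as integers from 1 to 10 (the upper bound 11 is exclusive).
years = rng.integers(1, 11, size=n_rows)

# Linear trend: 30k at 1 year of experience, 100k at 10 years.
base = 30000 + (years - 1) * (100000 - 30000) / 9

# One independent offset per row, uniform in [-5000, 5000).
noise = rng.uniform(-5000, 5000, size=n_rows)

data = pd.DataFrame({'experience': years, 'salary': base + noise})
data = data.sort_values('experience').reset_index(drop=True)
print(data.head())
```

Because the trend rises by roughly 7.8k per year while the noise spans 10k, salaries for neighbouring experience levels can overlap, which produces the occasional "more experience, lower salary" rows. A wider noise range, or normally distributed noise via `rng.normal`, would make such rows more frequent.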