I have extracted some variables from my Python data set and I want to generate a larger data set from the distributions I have. The problem is that I am trying to introduce some variability into the new data set while maintaining similar behaviour. This is an example of my extracted data, which consists of 400 observations:
Value   Observation Count   Ratio of Entries
1       352                 0.88
2       28                  0.07
3       8                   0.02
4       4                   0.01
7       4                   0.01
13      4                   0.01
Now I am trying to use this information to generate a similar data set with 2,000 observations. I am aware of the numpy.random.choice and random.choice functions, but I do not want to use the exact same distribution. Instead I would like to generate random variables (the values column) based on the distribution, but with more variability (a rough sketch of one idea I have tried follows the second table). An example of how I want my larger data set to look:
Value   Observation Count   Ratio of Entries
1       1763                0.8815
2       151                 0.0755
3       32                  0.0160
4       19                  0.0095
5       10                  0.0050
6       8                   0.0040
7       2                   0.0010
8       4                   0.0020
9       2                   0.0010
10      3                   0.0015
11      1                   0.0005
12      1                   0.0005
13      1                   0.0005
14      2                   0.0010
15      1                   0.0005
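Here is the rough sketch I mentioned above, just so the variability idea is concrete. It perturbs the empirical probabilities with a Dirichlet draw before passing them to numpy.random.choice; the noise_scale value is an arbitrary number I picked, not something derived from the data. One obvious limitation is that it can only ever produce the values that already appear in my original sample:

```python
import numpy as np

# Observed values and counts from the original 400-observation data set
values = np.array([1, 2, 3, 4, 7, 13])
counts = np.array([352, 28, 8, 4, 4, 4])

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for reproducibility

# Draw a perturbed probability vector from a Dirichlet distribution whose
# concentration parameters are the observed counts scaled down.
# A noise_scale below 1 increases the spread of the sampled probabilities.
noise_scale = 0.5  # illustrative value only
perturbed_probs = rng.dirichlet(counts * noise_scale)

# Sample 2,000 new observations from the perturbed distribution
new_sample = rng.choice(values, size=2000, p=perturbed_probs)

# Summarise the generated data in the same Value / Count / Ratio form
uniq, new_counts = np.unique(new_sample, return_counts=True)
for v, c in zip(uniq, new_counts):
    print(v, c, c / new_sample.size)
```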
So the new distribution shown above is something that could be estimated by fitting my original data with an exponential decay function; however, I am not interested in continuous variables. How do I get around this, and is there a particular mathematical method relevant to what I am trying to do?
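To show the direction I am thinking in, here is a rough sketch of the discrete-fit idea, assuming a geometric distribution (a discrete analogue of exponential decay) with its maximum-likelihood estimate p = 1 / sample mean. I do not expect this to reproduce my observed ratios exactly; it just illustrates fitting a discrete distribution and sampling new values from it, including values absent from the original data:

```python
import numpy as np

# Original observations reconstructed from the value/count table
values = np.array([1, 2, 3, 4, 7, 13])
counts = np.array([352, 28, 8, 4, 4, 4])
data = np.repeat(values, counts)  # the 400 raw observations

# Maximum-likelihood estimate of the geometric parameter on support {1, 2, ...}:
# p_hat = 1 / sample mean
p_hat = 1.0 / data.mean()

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily

# Draw 2,000 discrete observations; values such as 5, 6 or 15 can now appear
# even though they were absent from the original sample.
new_sample = rng.geometric(p_hat, size=2000)

uniq, new_counts = np.unique(new_sample, return_counts=True)
for v, c in zip(uniq, new_counts):
    print(v, c, c / new_sample.size)
```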