I am trying to simulate a geometric distribution, using the Inverse CDF method, however I am getting slightly wrong results and I am not sure why.
To be more specific, a geometric distribution with a shape factor p = 0.8, should have the following characteristics:
mean: 1.25
variance: 0.31
However, running the code below, I am getting:
mean: 0.6224363901913519
var: 0.391813011265263
[Finished in 0.3s]
As you can see, I am getting a wildly different mean value compared to the expected one.
np.log(uniform[i])/np.log(1-p) is the result of solving the equation: F(X) = R for X in terms of R, F(X) = CDF of geometric distribution = 1 - (1 - p)^k.
R is a uniform distribution over the interval (0,1).
So solving it results in the following:
X = ln(1-R)/ln(1-p)
However, since both 1-R and R are uniformly distributed on (0,1), we can do the following simplification:
X = ln(R)/ln(1-p)
The above equation is correct and should result in a geometric distribution sample.
import numpy as np
n = 10000
p = 0.8
geo_dist = np.zeros(n,dtype = np.float64)
uniform = np.random.uniform(0, 1, n)
for i in range(n):
geo_dist[i] = np.log(uniform[i])/np.log(1-p)
print("mean: " +str(geo_dist.mean()))
print("var: " +str(geo_dist.var()))
I have tried increasing the calculation precision by using np.float64 in a desperate attempt to fix what should be a trivial script, to no avail.
I have also tried generating the uniform distribution using scipy uniform.rvs() instead of np.uniform and the problem persists.
If p = 0.5:
expected mean: 2
expected variance : 2
However, the code I have written has the following result:
mean: 1.4440009653569306
var: 2.0421079966161093
[Finished in 0.3s]
Anybody have any idea why this is not working? Thank you.
Aucun commentaire:
Enregistrer un commentaire