lundi 28 janvier 2019

Wrong distribution from inverse(CDF) transform sampling

I am trying to simulate a geometric distribution, using the Inverse CDF method, however I am getting slightly wrong results and I am not sure why.

To be more specific, a geometric distribution with a shape factor p = 0.8, should have the following characteristics:

mean: 1.25 
variance: 0.31

However, running the code below, I am getting:

mean: 0.6224363901913519
var: 0.391813011265263
[Finished in 0.3s]

As you can see, I am getting a wildly different mean value compared to the expected one.

np.log(uniform[i])/np.log(1-p) is the result of solving the equation: F(X) = R for X in terms of R, F(X) = CDF of geometric distribution = 1 - (1 - p)^k.

R is a uniform distribution over the interval (0,1).

So solving it results in the following:

X = ln(1-R)/ln(1-p)

However, since both 1-R and R are uniformly distributed on (0,1), we can do the following simplification:

X = ln(R)/ln(1-p)

The above equation is correct and should result in a geometric distribution sample.

import numpy as np

n = 10000
p = 0.8
geo_dist = np.zeros(n,dtype = np.float64)
uniform = np.random.uniform(0, 1, n)
for i in range(n):
    geo_dist[i] = np.log(uniform[i])/np.log(1-p)
print("mean: " +str(geo_dist.mean()))
print("var: " +str(geo_dist.var())) 

I have tried increasing the calculation precision by using np.float64 in a desperate attempt to fix what should be a trivial script, to no avail.

I have also tried generating the uniform distribution using scipy uniform.rvs() instead of np.uniform and the problem persists.

If p = 0.5:

expected mean: 2
expected variance : 2

However, the code I have written has the following result:

mean: 1.4440009653569306
var: 2.0421079966161093
[Finished in 0.3s]

Anybody have any idea why this is not working? Thank you.




Aucun commentaire:

Enregistrer un commentaire