mercredi 30 octobre 2019

Simulating correlated lognormals in Python

I'm following the answer in this question How can I sample a multivariate log-normal distribution in Python?, but I'm getting that the marginal distributions of the sample data fail to have the same mean and standard deviation of the inputted marginals. For example, consider the multivariate distribution below in the code sample. If we label the marginals as X, Y, and Z, then I would expect that the scale and location parameters (implied from the sample data) to match inputted data. However, for X, you can see below that the scale and location parameters are 0.1000 and 0.5219. So the scale is what we expect, but the location is off by 4%. I'm thinking I'm doing something wrong with the covariance matrix, but I can't seem to figure out what is wrong. I tried setting the correlation matrix to the identity matrix and then the location and scale of the sample data match with the inputted data. Something must be wrong with my covariance matrix, or I'm making another fundamental error. Any help would be appreciated. Please advise if the question is unclear.

import pandas as pd
import numpy as np
from copy import deepcopy

mu  = [0.1, 0.2, 0.3]
sigma = [0.5, 0.8, 0.6]
sims = 3000000
rho = [[1, 0.9, 0.3], [0.9, 1, 0.8], [0.3, 0.8 ,1]]

cov = deepcopy(rho)
for row in range(len(rho)):
    for col in range(len(rho)):
        cov[row][col] = rho[row][col] * sigma[row] * sigma[col]

mvn = np.random.multivariate_normal(mu, cov, size=sims) 

sim = pd.DataFrame(np.exp(mvn), columns=['X', 'Y', 'Z'])

def computeImpliedLogNormalsParams(mean, std):
    # This method implies lognormal params which match the moments inputed 
    secondMoment = std * std + mean *mean
    location = np.log(mean*mean / np.sqrt(secondMoment))
    scale = np.sqrt(np.log(secondMoment / (mean * mean)))
    return (location, scale)

def printDistributionProp(col, sim):
    print(f"Mean = {sim[col].mean()}, std = {sim[col].std()}")
    location, scale = computeImpliedLogNormalsParams(sim[col].mean(), sim[col].std())
    print(f"Matching moments gives a lognormal with location {location} and scale {scale}")


printDistributionProp('X', sim)

Output:

Mean = 1.2665338803521895, std = 0.708713940557892
Matching moments gives a lognormal with location 0.10008162992913544 and scale 0.5219239625443672   

Observing the output, we would expect that scale parameter to be very close to 0.5, but it's a bit off. Increasing the number of simulations does nothing since the value has converged.




Aucun commentaire:

Enregistrer un commentaire