Monday, June 14, 2021

Numpy.random.normal gives bad results

I am trying to model a normally distributed random variable (mean=0, std=1) with numpy.random.normal. From this distribution:

  1. I draw multiple samples of similar size (e.g., m=100)
  2. I compute the std of each sample
  3. I take the mean of all standard deviations
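The three steps above can be sketched compactly like this (a minimal sketch of the same procedure; the seeded `default_rng` generator and the specific values of `n` and `m` are my additions for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, for reproducibility

m = 100     # size of each sample
n = 10_000  # number of samples

# Step 1: draw n samples of size m from N(0, 1)
samples = rng.normal(0.0, 1.0, size=(n, m))

# Step 2: compute the std of each sample (one value per row)
stds = np.std(samples, axis=1)

# Step 3: take the mean of all the standard deviations
print(np.mean(stds))  # consistently a bit below 1.0 with m = 100
```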

Theoretical statistics, and R as well, tell me that this should converge to the chosen std (which is 1). But somehow, using numpy (and scipy.stats), it does not.

This code generates a figure that shows this strange behavior:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, tstd

# system setup
m = 100         # number of measurements
sigma = 1       # sensor std

ez = np.arange(1,6,.05)
sample_sizes = [int(10**e) for e in ez]

# testing normal and std - they seem to work fine
sig_est = []
for n in sample_sizes:
    sample = np.random.normal(0, sigma, (n*m))
    sig_est += [np.std(sample)]
plt.plot(ez, sig_est, marker='.', color='b', ls='', label='numpy - no means')

# numpy implementation of problem
sig_est = []
for n in sample_sizes:
    sample = np.random.normal(0, sigma, (n,m))
    sigma_est = np.std(sample, axis=1)
    sig_est += [np.mean(sigma_est)]
plt.plot(ez, sig_est, marker='.', color='k', ls='', label='numpy')

# scipy.stats implementation
sig_est = []
for n in sample_sizes:
    sample = norm.rvs(loc=0, scale=sigma, size=(n,m))
    sigma_est = tstd(sample, axis=1)
    sig_est += [np.mean(sigma_est)]
plt.plot(ez, sig_est, marker='.', color='r', ls='', label='scipy.stats')

plt.gca().set(xlabel = 'Number of samples [log10]')
plt.gca().legend()
plt.gca().grid(color='.9')
plt.show()
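One side note on the two library calls (this is just their documented defaults, not a fix): np.std uses ddof=0 by default (divides by N), while scipy.stats.tstd uses ddof=1 (divides by N-1), so the black and red curves are not computing exactly the same estimator. A small illustration:

```python
import numpy as np
from scipy.stats import tstd

x = np.array([1.0, 2.0, 3.0, 4.0])

# np.std divides by N by default (ddof=0) ...
print(np.std(x))             # population-style std

# ... while scipy.stats.tstd divides by N-1 by default (ddof=1)
print(tstd(x))               # sample std, slightly larger

# the two agree once the ddof arguments match
print(np.std(x, ddof=1), tstd(x))
```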

[Figure: estimated std vs. number of samples (log10) for the three implementations]

Any ideas?



