I tried to model a random number using numpy.random.normal
. From this random number (mean=0, std=1)
- I draw multiple samples of similar size (e.g., m=100)
- I compute the std of each sample
- I take the mean of all standard deviations
Theoretical statistics, and also R tells me that this must converge towards the chosen std (which is 1). But somehow, using numpy (and scipy.stats), it does not.
This code generates a figure that shows this strange behavior:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, tstd
# system setup
m = 100 # number of measurments
sigma = 1 # sensor std
ez = np.arange(1,6,.05)
sample_sizes = [int(10**e) for e in ez]
# testing normal and std - they seem to work fine
sig_est = []
for n in sample_sizes:
sample = np.random.normal(0, sigma, (n*m))
sig_est += [np.std(sample)]
plt.plot(ez, sig_est, marker='.', color='b', ls='', label='numpy - no means')
# numpy implementation of problem
sig_est = []
for n in sample_sizes:
sample = np.random.normal(0, sigma, (n,m))
sigma_est = np.std(sample, axis=1)
sig_est += [np.mean(sigma_est)]
plt.plot(ez, sig_est, marker='.', color='k', ls='', label='numpy')
# scipy.stats implementation
sig_est = []
for n in sample_sizes:
sample = norm.rvs(loc=0, scale=sigma, size=(n,m))
sigma_est = tstd(sample, axis=1)
sig_est += [np.mean(sigma_est)]
plt.plot(ez, sig_est, marker='.', color='r', ls='', label='scipy.stats')
plt.gca().set(xlabel = 'Number of samples [log10]')
plt.gca().legend()
plt.gca().grid(color='.9')
plt.show()
Any ideas?
Aucun commentaire:
Enregistrer un commentaire