Friday, 8 January 2021

Why is the linear regression wrong when I switch X and Y of a random variable Z(X, Y)?

I have run into a strange bug that I cannot seem to understand:

  1. I draw N values (X, Y) of a random variable in two dimensions Z(X,Y).
  2. I construct the histogram and plot it with imshow.
  3. I calculate the linear regression with the (X,Y) values and plot it. [image] So far everything looks normal.
  4. Now I repeat steps 1, 2 and 3 but with X and Y switched. I would expect to find the same picture with the axes swapped. However, this time the linear regression (orange dotted line) looks incorrect: its slope differs from the expected 1/0.25 (red dashed line). [image]

Any ideas where the error might lie?

The code in Python:

from scipy.stats import linregress
import numpy as np
import matplotlib.pyplot as plt

#Parameters
delta = 0.2
N = 10**5

#Bins
x = y = np.arange(-3.0, 3.0, delta)

#Draw N values of the random variable Z(X,Y)
rnd = np.random.default_rng(seed = 0)
Z = rnd.uniform(0, 1, N)
X = rnd.uniform(-3, 3, N)
Y = 0.25*X + np.sqrt(np.log( 1 / Z ) ) - 0.89

#Construct histogram
H, xedges, yedges = np.histogram2d(X, Y, bins=[x, y])
#Transpose to have x in columns and y in rows
H = H.T

#Plot
plt.imshow(H, cmap='Purples',
            origin='lower', extent=[-3, 3, -3, 3])

#Do linear regression of Y on X
lr = linregress(X, Y)
poly1d_fn = np.poly1d([lr.slope, lr.intercept])
xLine=[xedges[0], xedges[-1]]
#Note: \\pm keeps the backslash for mathtext without an invalid escape sequence
plt.plot(xLine, poly1d_fn(xLine), 'orange', ls = ':',
            label = '$y = ax+b$\n $a = %.2f \\pm %.2f$\n $b = %.2f$, $R^2 = %.2f$ '%(lr.slope, lr.stderr, lr.intercept, lr.rvalue**2))

plt.colorbar()
plt.legend()
plt.savefig("first.png", dpi = 300)

#Repeat but switching X with Y
plt.figure()
X2 = Y
Y2 = X
H, xedges, yedges = np.histogram2d(X2, Y2, bins=[x, y])
H = H.T

plt.imshow(H, cmap='Purples',
            origin='lower', extent=[-3, 3, -3, 3])

lr = linregress(X2, Y2)
poly1d_fn = np.poly1d([lr.slope, lr.intercept])
xLine=[xedges[0], xedges[-1]]
plt.plot(xLine, poly1d_fn(xLine), 'orange', ls = ':',
            label = '$y = ax+b$\n $a = %.2f \\pm %.2f$\n $b = %.2f$, $R^2 = %.2f$ '%(lr.slope, lr.stderr, lr.intercept, lr.rvalue**2))

plt.plot(xLine, [4*z for z in xLine], 'red', ls = '--')


plt.ylim([-3, 3])
plt.colorbar()
plt.legend()
plt.savefig("second.png", dpi = 300)
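
One check that may help narrow this down: for ordinary least squares, the slope of Y regressed on X multiplied by the slope of X regressed on Y equals r^2, so the switched slope is r^2/a rather than 1/a unless the points lie exactly on a line. The sketch below reuses the same draw as the code above and compares the two fits. The names lr_yx, lr_xy, product and r2 are introduced here for illustration only, and this is offered as a possible explanation, not a confirmed diagnosis.

```python
import numpy as np
from scipy.stats import linregress

# Same draw as in the question (same seed and parameters)
rng = np.random.default_rng(seed=0)
N = 10**5
Z = rng.uniform(0, 1, N)
X = rng.uniform(-3, 3, N)
Y = 0.25 * X + np.sqrt(np.log(1 / Z)) - 0.89

lr_yx = linregress(X, Y)  # Y regressed on X (first plot)
lr_xy = linregress(Y, X)  # X regressed on Y (switched plot)

# OLS identity: slope(Y on X) * slope(X on Y) == r^2
product = lr_yx.slope * lr_xy.slope
r2 = lr_yx.rvalue**2

print("slope of Y on X:       ", lr_yx.slope)
print("slope of X on Y:       ", lr_xy.slope)
print("naive 1 / slope(Y on X):", 1 / lr_yx.slope)
print("r^2 / slope(Y on X):    ", r2 / lr_yx.slope)
```

If the last line matches the switched slope while the naive reciprocal does not, the orange line in the second plot is the correct least-squares fit for the switched data, and the red 1/0.25 line is simply a different line (the inverse of the first fit).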



