mercredi 21 février 2018

Understanding how pseudo random numbers in Matlab imply statistical independence

Consider the following Matlab code in which I generate some data and then extract some rows of them at random. I would like your help to understand "how" random are these draws from a statistical point of view, in the terms I explain below.

I first set some parameters

%%%%%%%%Parameters
clear
rng default
Xsup=-1:6; 
Zsup=1:10; 
n_m=200; 
n_w=200; 
R=n_m;
T=100000; %number of draws

Then I generate the data

%%%%%%%%Creation of data [XZ,etapair,zetapair,etasingle,zetasingle]

%Vector X of dimension n_mx1
idX=randi(size(Xsup,2),n_m,1); %n_mx1
X=Xsup(idX).'; %n_mx1

%Vector Z of dimension n_wx1
idZ=randi(size(Zsup,2),n_w,1); 
Z=Zsup(idZ).'; %n_wx1

%Combine X and Z in a matrix XZ of dimension (n_m*n_w)x2 
which lists all possible combinations of values in X and Z
[cX, cZ] = ndgrid(X,Z);
XZ = [cX(:), cZ(:)]; %(n_m*n_w)x2

%Vector etapair of dimension (n_m*n_w)x1
etapair=randn(n_m*n_w,1); %(n_m*n_w)x1

%Vector zetapair of dimension (n_m*n_w)x1
zetapair=randn(n_m*n_w,1); %(n_m*n_w)x1

%Vector etasingle of dimension n_mx1
etasingle=max(randn(n_m,R),[],2); %n_mx1 

%Vector zetasingle of dimension n_wx1
zetasingle=max(randn(n_w,R),[],2); %n_wx1

Then I extract some rows at random

%%%%%%%%%%Random draws from [XZ,etapair,zetapair,etasingle,zetasingle]
idm=unidrnd(n_m,T,1); %Tx1
idw=unidrnd(n_w,T,1); %Tx1
idmupair=idm+n_m*(idw-1); %Tx1

Xsample=X(idm); %Tx1
Zsample=Z(idw); %Tx1
etapairsample=etapair(idmupair);%Tx1
zetapairsample=zetapair(idmupair);%Tx1
etasinglesample=etasingle(idm); %Tx1
zetasinglesample=zetasingle(idw); %Tx1

Let me now translate these draws into statistical terms:

For t=1,...,T, Xsample(t) can be thought as a realisation of a random variable X_t

For t=1,...,T, Zsample(t) can be thought as a realisation of a random variable Z_t

For t=1,...,T, etapairsample(t) can be thought as a realisation of a random variable E_t

For t=1,...,T, zetapairsample(t) can be thought as a realisation of a random variable Q_t

For t=1,...,T, etasinglesample(t) can be thought as a realisation of a random variable Y_t

For t=1,...,T, zetasinglesample(t) can be thought as a realisation of a random variable S_t

Since I draw realisations of these random variables at random with replacement (through the function unidrnd) I thought I could claim that (X_1,...,X_T, Z_1,...,Z_T, E_1,...,E_T, Q_1,...,Q_T, Y_1,...,Y_T,S_1,...,S_T) are mutually independent

As a check of this hypothetical claim, I define W_t:=-E_t-Q_t+Y_t+S_t and empirically compute Pr(W_t<=1|X_t=5, Z_t=1)

If mutual independence holds, then Pr(W_t<=1|X_t=5, Z_t=1)=Pr(W_t<=1) and their empirical counterparts below, named option1 and option2, should be ALMOST the same.

%option 1: conditional probability
num1=zeros(T,1);
for h=1:T
    if -etapairsample(h)-zetapairsample(h)+etasinglesample(h)+zetasinglesample(h)<=1 && Xsample(h)==5 && Zsample(h)==1
        num1(h)=1;
    end
end
den1=zeros(T,1);
for h=1:T
    if  Xsample(h)==5 && Zsample(h)==1
        den1(h)=1;
    end
end
option1=sum(num1)/sum(den1);


%option 2: unconditional probability
num2=zeros(T,1);
for h=1:T
    if -etapairsample(h)-zetapairsample(h)+etasinglesample(h)+zetasinglesample(h)<=1 
        num2(h)=1;
    end
end
option2=sum(num2)/T;

Question: the difference between option1 (=0.0034) and option2 (=0.0013) is referred to the "ALMOST" or I am doing something wrong?




Aucun commentaire:

Enregistrer un commentaire