I am running a Monte Carlo simulation in MATLAB, using parallelisation because the simulation takes a very long time to run.
The main objective is to create a very large panel data set and use it to estimate some regressions.
The problem is that the simulation takes A LOT of time to run without parallelisation, so I decided to use the spmd construct. However, the results from the parallelised code are very different from those of the serial code.
rng(3857);
for r=1:MCREP
    Ycom=[];
    Xcom=[];
    YLcom=[];
    spmd
        for it=labindex:numlabs:NT
            % (code to generate the different components: alpha, delta, x_it, eps_it)
            % e.g. x_it=2+1*randn(TT,1);
            % (uses the random number generator randn)
            % Create the observations for the different time periods of each individual
            for t=2:TT
                yi(t)=xi*alpha+mu*delta+rho*yi(t-1)+beta*x_it(t)+eps_it(t);
                yLi(t)=yi(t-1);
            end
            % Concatenate each individual into one big matrix: create the panel
            Ycom=[Ycom yi];
            Xcom=[Xcom x_it];
            YLcom=[YLcom yLi];
        end
    end
    % Retrieve the data stored in Composite form
    mm=matlabpool('size');
    for i=1:mm
        Y(:,(i-1)*(NT/mm)+1:i*(NT/mm))=Ycom{i};
        X(:,(i-1)*(NT/mm)+1:i*(NT/mm))=Xcom{i};
        YL(:,(i-1)*(NT/mm)+1:i*(NT/mm))=YLcom{i};
    end
    % (rest of the code: run regressions)
end
The intensive part of the code is the one parallelised with spmd. It creates a very large panel data set in which columns are independent individuals and rows are dependent time periods.
My main problem is that the results with parallelisation differ from the results without it; moreover, the results differ depending on whether I use 8 workers or 16 workers. However, for reasons of time it is infeasible to run the code without parallelisation.
I believe the problem comes from the random number generation, but I cannot fix the seed inside the spmd block, because that would mean fixing the seed inside the Monte Carlo loop, so all the repetitions would use the same numbers.
I would like to know how I can set up the random number generator so that, no matter how many workers I use, it gives me the same results.
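To make the requirement concrete, here is a minimal sketch of one technique that might achieve it: give every (repetition, individual) pair its own substream of a generator that supports substreams, such as mrg32k3a. This is an assumption about what could work, not a tested fix for the code above. The sizes, the variable names (draws, workerCounts, nW) and the substream index (r-1)*NT+it are made up for illustration, and the worker striding is simulated serially so that the sketch runs without a pool.

```matlab
% Sketch: one substream per (repetition, individual), so the draws for an
% individual do not depend on which worker generates them.
% MCREP, NT, TT and the worker counts are illustrative values only.
MCREP = 2; NT = 8; TT = 5; seed = 3857;

workerCounts = [2 4];            % two hypothetical pool sizes to compare
draws = cell(1, numel(workerCounts));
for c = 1:numel(workerCounts)
    nW = workerCounts(c);
    Xc = zeros(TT, NT, MCREP);
    for r = 1:MCREP
        for w = 1:nW                                   % stands in for labindex = 1:numlabs
            s = RandStream('mrg32k3a', 'Seed', seed);  % each "worker" builds the same stream
            for it = w:nW:NT                           % same striding as labindex:numlabs:NT
                s.Substream = (r-1)*NT + it;           % unique index per (r, it)
                Xc(:, it, r) = 2 + 1*randn(s, TT, 1);  % store by individual index, not by worker
            end
        end
    end
    draws{c} = Xc;
end

% The draws should match for both worker counts, and differ across repetitions
sameAcrossWorkerCounts = isequal(draws{1}, draws{2});
differsAcrossReps = ~isequal(draws{1}(:, :, 1), draws{1}(:, :, 2));
```

Inside a real spmd block, the same idea would mean creating the stream once per worker and setting s.Substream at the top of the loop over it. Writing each individual into column it, rather than concatenating per worker, also keeps the column order independent of the pool size.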
PS. Another solution would be to parallelise the outermost loop (the Monte Carlo loop); however, I cannot see a performance gain when I use the parallelisation in that way.
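For the outer-loop variant, a sketch of how reproducibility might be kept is to tie one substream to each repetition, so that the result of repetition r does not depend on which worker runs it. This is only an illustration under assumptions: it uses parfor rather than spmd, the sizes are made up, and the mean of the draws is a hypothetical stand-in for the regression step.

```matlab
% Sketch: one substream per Monte Carlo repetition in the outer loop.
% MCREP, NT, TT are illustrative; mean(x(:)) stands in for the regressions.
MCREP = 4; NT = 3; TT = 5; seed = 3857;

est = zeros(1, MCREP);
parfor r = 1:MCREP
    s = RandStream('mrg32k3a', 'Seed', seed);  % same generator on every worker
    s.Substream = r;                           % repetition r always uses substream r
    x = 2 + 1*randn(s, TT, NT);                % all draws for this repetition
    est(r) = mean(x(:));                       % placeholder for the estimation step
end

% Serial re-run of the same repetitions, for comparison
estSerial = zeros(1, MCREP);
for r = 1:MCREP
    s = RandStream('mrg32k3a', 'Seed', seed);
    s.Substream = r;
    x = 2 + 1*randn(s, TT, NT);
    estSerial(r) = mean(x(:));
end
matchesSerial = isequal(est, estSerial);
```

Whether this variant is faster depends on how the work per repetition compares with the overhead of sending data to the workers, which the sketch does not measure.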
Thank you very much for your help.