Friday, 21 August 2015

Generate random numbers inside spmd in MATLAB

I am running a Monte Carlo simulation in MATLAB and parallelising it because the simulation takes a very long time to run.

The main objective is to create a really big panel data set and use it to estimate some regressions.

The problem is that without parallelisation the simulation takes A LOT of time to run, so I decided to use spmd. However, the results from the parallelised code are very different from those of the serial code.

rng(3857);
for r=1:MCREP
    Ycom=[];
    Xcom=[];
    YLcom=[];

    spmd
        for it=labindex:numlabs:NT
            % (code to generate the different components: alpha, delta, x_it, eps_it)
            % e.g. x_it=2+1*randn(TT,1);
            % (uses the random number generator: randn)

            % Create the observations for the different time periods of each individual
            for t=2:TT
                yi(t)=xi*alpha+mu*delta+rho*yi(t-1)+beta*x_it(t)+eps_it(t);
                yLi(t)=yi(t-1);
            end

            % Concatenate each individual into a big matrix: create the panel
            Ycom=[Ycom yi];
            Xcom=[Xcom x_it];
            YLcom=[YLcom yLi];
        end
    end

    % Retrieve the data stored in Composite form
    mm=matlabpool('size');
    for i=1:mm
        Y(:,(i-1)*(NT/mm)+1:i*(NT/mm))=Ycom{i};
        X(:,(i-1)*(NT/mm)+1:i*(NT/mm))=Xcom{i};
        YL(:,(i-1)*(NT/mm)+1:i*(NT/mm))=YLcom{i};
    end

    % (rest of the code: run regressions)

end

The intensive part of the code is the one parallelised with spmd. It creates a really large panel data set in which columns are independent individuals and rows are dependent time periods.

My main problem is that the results from the parallel code differ from the results of the serial code; moreover, the results differ depending on whether I use 8 workers or 16 workers. However, because of the run time, it is infeasible to run the code without parallelisation.

I believe the problem comes from the random number generation, but I cannot fix the seed inside the spmd block, because that would mean fixing the seed inside the Monte Carlo loop, so all the repetitions would use the same random numbers.

I would like to know how I can set up the random number generator so that the results are the same no matter how many workers I use.
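One possible direction, shown only as a minimal sketch and not as a confirmed fix for the code above: tie the random draws to the pair (repetition r, individual it) instead of to the worker, using the Substream property of a RandStream created with the 'mrg32k3a' generator. The sizes NT and TT, the repetition index r, the variable names workerCounts, X, nW, w, s and sameDraws, and the substream numbering (r-1)*NT+it are all assumptions made for illustration. The workers are emulated here with an ordinary loop so that the sketch runs without a parallel pool.

```matlab
% Hypothetical sizes and repetition index, chosen only for illustration
NT = 8; TT = 5; seed = 3857; r = 2;

workerCounts = [1 4];                 % emulate a run with 1 worker and one with 4
X = cell(1, numel(workerCounts));

for k = 1:numel(workerCounts)
    nW = workerCounts(k);             % stands in for numlabs
    X{k} = zeros(TT, NT);
    for w = 1:nW                      % stands in for labindex
        % Every emulated worker builds the same generator from the same seed
        s = RandStream('mrg32k3a', 'Seed', seed);
        for it = w:nW:NT              % same stride as labindex:numlabs:NT
            % Jump to the substream that belongs to (r, it); this assumed
            % numbering gives every individual in every repetition its own substream
            s.Substream = (r-1)*NT + it;
            X{k}(:, it) = 2 + 1*randn(s, TT, 1);
        end
    end
end

% Compare the draws obtained with the two emulated worker counts
sameDraws = isequal(X{1}, X{2});
disp(sameDraws)
```

Inside a real spmd block, w and nW would be replaced by labindex and numlabs, and every randn call for individual it would need to take the stream s as its first argument. Because the column index is it itself, the assembled panel would also keep the same column order for any number of workers.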

PS. Another solution would be to put the spmd in the outermost loop (the Monte Carlo loop); however, I do not see any performance gain when I parallelise that way.

Thank you very much for your help.



