Wednesday, 28 April 2021

Generating very large (million+) highly correlated random datasets

Currently I have a point-cloud containing 1 million+ correlated points.

My aim is to generate a "virtual" point-cloud: a random point-cloud created by drawing 1 million+ random samples from a Gaussian distribution that follows the same correlation structure as my point-cloud. I am only focusing on one dimension for now, so I am not perturbing the x and y axes. I am able to calculate the covariance matrix for these points (incrementally) from work done previously. The problem is that the covariance matrix is huge (many terabytes), so I can't calculate the whole thing, nor can I store it. The covariance is not sparse, but it changes smoothly across the entire point-cloud.
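
To illustrate the storage problem, here is a quick back-of-the-envelope check (assuming float64 entries, which the setup above does not specify):

# Rough size of a dense covariance matrix for n points,
# assuming 8-byte float64 entries (assumption; not stated above).
n = 1_000_000
bytes_needed = n * n * 8
print(f"~{bytes_needed / 1e12:.0f} TB")     # -> ~8 TB for the full matrix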

I am aware of the technique for generating correlated random numbers using a Cholesky decomposition. But even if I could store the covariance matrix, calculating the Cholesky decomposition of a (1 million+ x 1 million+) matrix is not feasible.
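
For concreteness, a minimal sketch of that standard Cholesky technique at a small size, using a made-up smooth exponential covariance purely as a stand-in for the real point-cloud covariance; the O(n^2) storage and O(n^3) factorisation cost are what rule this out at 1 million+ points:

import numpy as np

n = 2_000                                   # small enough to hold n x n in RAM
coords = np.linspace(0.0, 1.0, n)           # hypothetical 1-D point positions

# Hypothetical smooth, dense covariance: correlation decays with distance.
cov = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 0.1)
cov += 1e-9 * np.eye(n)                     # small jitter for numerical stability

# cov = L @ L.T, so L @ z is distributed N(0, cov) when z ~ N(0, I).
L = np.linalg.cholesky(cov)
z = np.random.default_rng(0).standard_normal(n)
correlated_sample = L @ z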

For background: This is for conducting a Monte-Carlo simulation on a point-cloud.

My question: Is this approach remotely feasible? Perhaps there is a more efficient way of representing the covariance matrix? Perhaps with a neural network (I have TensorFlow experience) I could simulate the generation of specifically correlated random numbers from a smaller random seed? Has this been done before? I am struggling to find solutions to similar problems online.

This is being coded in Python. Thanks for any help.



