Monday, 29 June 2020

Finding the correlation matrix of a large dataset in R

I have a data set with 98 variables and nearly 1.1 million observations. I want to see the correlations between the variables; however, because the data set is so large, R cannot complete the computation and fails with a memory allocation error.

I then tried to draw a stratified sample from the data set so that I could compute the correlations on the sample instead. But I got the same memory error again: "Error: cannot allocate vector of size ... Mb".

So, how can I find the correlation matrix of either the whole data or the sampled data?
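
One possible direction, sketched here under assumptions rather than offered as a definitive answer: the Pearson correlation matrix only needs the row count, the column sums, and the cross-product matrix, and all three can be accumulated one block of rows at a time, so the full data never has to sit in memory as one object. The names chunked_cor and get_chunk below are hypothetical; get_chunk(i) stands for whatever returns the i-th block of rows (for example, a piece read from disk). The sketch assumes all columns are numeric and contain no missing values, and the raw-sums formula can lose precision when column means are large relative to their spread.

```r
# Hypothetical helper: correlation matrix from row blocks.
# get_chunk(i) must return the i-th block of rows as a numeric
# matrix or data frame with no missing values (assumption).
chunked_cor <- function(get_chunk, n_chunks) {
  n  <- 0      # rows seen so far
  s  <- NULL   # running column sums
  ss <- NULL   # running cross-product matrix t(X) %*% X
  for (i in seq_len(n_chunks)) {
    x <- as.matrix(get_chunk(i))
    if (is.null(s)) {
      s  <- numeric(ncol(x))
      ss <- matrix(0, ncol(x), ncol(x))
    }
    n  <- n + nrow(x)
    s  <- s + colSums(x)
    ss <- ss + crossprod(x)
  }
  # Sample covariance from the accumulated sums, then rescale.
  covm <- (ss - tcrossprod(s) / n) / (n - 1)
  cov2cor(covm)
}

# Small simulated check: 1000 rows, 5 variables, 10 blocks of 100 rows.
set.seed(1)
x   <- matrix(rnorm(1000 * 5), nrow = 1000, ncol = 5)
idx <- split(seq_len(nrow(x)), rep(1:10, each = 100))
r   <- chunked_cor(function(i) x[idx[[i]], , drop = FALSE], 10)
isTRUE(all.equal(r, cor(x)))
```

With 98 variables the accumulated objects are only a 98-element vector and a 98 x 98 matrix, so memory use is governed by the block size rather than by the 1.1 million rows.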



