Tuesday, 22 June 2021

Using a for loop to introduce blanks at random in a large matrix in R

I have a large n x m matrix (n rows, m columns); the dataset is called data. In practice, the number of columns could range from 80 to 200.

I want to introduce random missing cell values, say 1% in each of the columns. I first sample 1% of the values from each column using

res<-do.call(cbind,lapply(lapply(data[,1:ncol(data)],function(x) data.frame(x)),function(x) x[sample(1:nrow(x),0.01*nrow(x)),]))

The missing cells should then contain NA, which I set using

data[,1][data[,1]%in%res[,1]]<- NA

Here [,1] is column 1. If the number of columns, ncol(data), is 5, I can do this manually by changing the column number in the line above each time. This gets very time consuming with, say, 50 columns, let alone 200 or more.

I tried using a for loop, i.e.

ncol(data)
n = length(ncol(data))

for (i in 1:n) {
  data[,i][data[,i]%in%res[,i]]<- NA
}

But this didn't work - no random NAs were inserted.
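
One likely reason the loop does so little: ncol(data) is a single number, so length(ncol(data)) is always 1 and the loop body runs only for column 1. Separately, matching on values with %in% would also blank any other cell in the column that happens to share a sampled value. A minimal sketch of a loop that avoids both, assuming data is a numeric data frame (the toy data, the seed, and the names n_blank and rows are illustrative, not from the original code), samples row positions directly:

```r
# Toy stand-in for the real dataset (assumption: a numeric data frame).
set.seed(1)
data <- as.data.frame(matrix(rnorm(200 * 5), nrow = 200, ncol = 5))

length(ncol(data))  # 1, because ncol(data) is a single number

# Number of cells to blank per column: 1% of the rows.
n_blank <- round(0.01 * nrow(data))

# seq_len(ncol(data)) walks over every column index.
for (i in seq_len(ncol(data))) {
  rows <- sample(nrow(data), n_blank)  # row positions to blank in column i
  data[rows, i] <- NA
}

colSums(is.na(data))  # NA count per column
```

Because the assignment is by row position rather than by value, each column ends up with exactly n_blank missing cells even when values repeat.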

My questions:

[1] How do I generate random NAs in an n x m matrix, at 0.1%, 1%, and 5%, using for loops?

[2] I have no doubt there is a more efficient way to do this, but I have had no luck so far. What would be the best method?

[3] If I take the manual approach, the column contents are changed as required. Is there a way to save the changed [i.e. now containing random NAs] n x m matrix?
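
As a rough sketch touching on all three questions: the per-column step can be wrapped in a small function and applied once per missing-data rate, and the result is kept simply by assigning it to a name and, if wanted, writing it to disk. The function name add_random_na, the toy matrix, the seed, and the temporary file path are all hypothetical choices for illustration; saveRDS() and readRDS() are base R. A loop over a few hundred columns is cheap, so this may already be efficient enough.

```r
# Hypothetical helper: blank a fraction `prop` of the cells in every column.
# Works on a matrix or a data frame and returns the modified copy.
add_random_na <- function(x, prop) {
  n_blank <- round(prop * nrow(x))  # cells to blank per column
  for (j in seq_len(ncol(x))) {
    x[sample(nrow(x), n_blank), j] <- NA
  }
  x
}

# Toy stand-in for the real dataset (assumption: 1000 rows, 100 columns).
set.seed(42)
data <- matrix(rnorm(1000 * 100), nrow = 1000, ncol = 100)

# One blanked copy per rate; the original `data` is left untouched.
rates <- c(0.001, 0.01, 0.05)
versions <- lapply(rates, function(p) add_random_na(data, p))
names(versions) <- c("0.1%", "1%", "5%")

# Overall share of missing cells in each copy.
sapply(versions, function(v) mean(is.na(v)))

# Save one copy to disk and read it back (swap in a real path as needed).
out_file <- tempfile(fileext = ".rds")
saveRDS(versions[["1%"]], out_file)
reloaded <- readRDS(out_file)
```

For a plain-text copy, write.csv() would work in place of saveRDS(), at the cost of losing the exact R object on reload.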

Many thx - s.



