I have a large n x m matrix called data, where m is the number of columns. In practice, m ranges from 80 to 200.
I want to introduce random missing values, say 1% of the cells in each column. I first sample the values to remove using
res<-do.call(cbind,lapply(lapply(data[,1:ncol(data)],function(x) data.frame(x)),function(x) x[sample(1:nrow(x),0.01*nrow(x)),]))
The selected cells are then set to NA using
data[,1][data[,1]%in%res[,1]]<- NA
Here [,1] refers to column 1. If the number of columns, ncol(data), is 5, I can do this manually by changing the column index in the expression above each time. That becomes very time-consuming with, say, 50 columns, let alone 200 or more.
I tried using a for loop, i.e.
ncol(data)
n = length(ncol(data))
for (i in 1:n)
{
data[,i][data[,i]%in%res[,i]]<- NA
}
But this didn't work - no random NAs were inserted.
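For what it's worth, one likely problem with the loop is that length(ncol(data)) is always 1, because ncol(data) is a single number, so the loop only ever visits column 1. Separately, %in% matches by value rather than by position, so any duplicated values in a column would be set to NA as well. Below is a minimal sketch of a loop that samples row positions directly, assuming data is a numeric matrix or data frame. The toy data and the names frac and n_na are illustrative, not from the original setup.

```r
# Toy data standing in for the real matrix (assumption: numeric, 1000 x 5).
set.seed(1)
data <- matrix(rnorm(1000 * 5), nrow = 1000, ncol = 5)

frac <- 0.01                      # fraction of cells to blank out per column
n_na <- round(frac * nrow(data))  # number of NAs per column (10 here)

# seq_len(ncol(data)) visits every column; length(ncol(data)) would be 1.
for (i in seq_len(ncol(data))) {
  rows <- sample(nrow(data), n_na)  # sample row positions, not values
  data[rows, i] <- NA
}

colSums(is.na(data))  # number of NAs in each column
```

Because rows are sampled without replacement, each column ends up with exactly n_na missing cells, regardless of repeated values in the data.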
My questions:
[1] How do I generate random NAs in an n x m matrix, at rates of 0.1%, 1%, and 5% per column, using for loops?
[2] I have no doubt there is a more efficient way to do this, but I have had no luck so far. What would be the best method?
[3] If I take the manual approach, the column contents are changed as required. Is there a way to save the changed n x m matrix (i.e. the one now containing random NAs)?
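As a rough sketch covering questions [2] and [3]: the per-column sampling can be wrapped in a small helper that assigns all NAs in one step via a two-column (row, column) index matrix, and the result can be written to disk with saveRDS. The helper name add_random_na, the toy data, and the output file name are all hypothetical, and this is only one of several reasonable approaches.

```r
# Hypothetical helper: set a fraction `frac` of the cells in each column to NA.
add_random_na <- function(x, frac) {
  n_na <- round(frac * nrow(x))
  # Sample n_na distinct row positions independently for every column.
  rows <- unlist(lapply(seq_len(ncol(x)), function(j) sample(nrow(x), n_na)))
  cols <- rep(seq_len(ncol(x)), each = n_na)
  x[cbind(rows, cols)] <- NA  # matrix indexing: one (row, col) pair per NA
  x
}

# Toy data standing in for the real matrix (assumption: numeric, 2000 x 100).
set.seed(42)
data <- matrix(rnorm(2000 * 100), nrow = 2000, ncol = 100)

data_01 <- add_random_na(data, 0.001)  # 0.1% per column
data_1  <- add_random_na(data, 0.01)   # 1% per column
data_5  <- add_random_na(data, 0.05)   # 5% per column

# Save one of the modified matrices; the path here is only an example.
out <- file.path(tempdir(), "data_with_na.rds")
saveRDS(data_1, out)
restored <- readRDS(out)
```

The helper returns a modified copy and leaves the original untouched, which makes it easy to produce several versions at different rates from the same data. write.csv is an alternative to saveRDS if the file needs to be readable outside R.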
Many thx - s.