random: R: How to randomly remove data from a matrix while keeping some columns constant

mercredi 24 mars 2021

R: How to randomly remove data from a matrix while keeping some columns constant

For context, I am trying to test if removing certain species from a community has a different effect on diversity than removing any other species randomly. The idea is to build a null model in which I keep constant the values in some of the columns in my df and randomly change to 0 some of the rest. In each row I want to change to 0 the same number of columns as I have held constant so that the number of data in each row is equal to the total number of presences in a row minus the number of columns held en each row. The main problem is that number of data to change to 0 in each row changes and thus I have to do it one row at a time.

So far I have been able to do it with a loop, but it is slow, I have to do it a thousand times and my actual matrix is much bigger. In the following example, let's say I want to keep columns 'V1', 'V7', 'V10' and 'V12':

set.seed(20)
library(picante)
phylo<-rtree(n = ncol(df), tip.label = colnames(df)) #This is needed later for the diversity index
df<-as.data.frame(matrix(data = sample(c(0,1), size = 10*20, replace = T), 
           nrow = 10, ncol = 20))
constant.sps<-c('V1', 'V7', 'V10', 'V12')

I created to separete dfs, one with the columns I want to hold (df1) and one with the rest (df2). Then calculate how many columns i have to change to 0 in df2 based on df1, randomly select those columns in each row and change them. After that I joined df1 and the randomized df and calculated the diversity measure I needed. Then I would have to do it again q times in order to obtain a null distribution of diveristy measures for each row.

df1<-df[, colnames(df) %in% constant.sps]
df2<-df[, !colnames(df) %in% constant.sps]
constant.n<-rowSums(df1)

runs<-10
results<-c()
for (q in 1:runs) {
  temp<-df2
  for (i in 1:dim(temp)[1]) {
  temp[i,][sample(which(temp[i,] > 0), constant.n[i])]<-0
  
}
  results[[q]]<-pd(cbind(df1, temp), phylo)[1]
}

This is super slow and inefficient but I can't think of other way, does anyone have an idea on how to properly do it? Thanks in advance

random

mercredi 24 mars 2021

R: How to randomly remove data from a matrix while keeping some columns constant

Aucun commentaire:

Enregistrer un commentaire