I have a question similar to the one in this thread: Using R, replace all values in a matrix <0.1 with 0?
In my case, though, I have a (hypothetically) larger dataset and a threshold that varies. I need to fill a dataframe where each new value is chosen by a condition that uses the values in the first columns of the same dataframe. These values are different for each row.
Here is an example of the dataframe:
SNP A1 A2 MAF
rs3094315 G A 0.172
rs7419119 G T 0.240
rs13302957 G A 0.081
rs6696609 T C 0.393
Here is a sample of my code:
seqIndividuals = seq_len(201)
nSNP = nrow(alFrequ)
for(i in seqIndividuals) {
  # For each SNP, take A1 if a uniform random draw is below MAF, otherwise A2
  alFrequ[paste("IND",i,"a",sep="")] = ifelse(runif(nSNP,0.00,1.00) < alFrequ$MAF, alFrequ$A1, alFrequ$A2)
  alFrequ[paste("IND",i,"b",sep="")] = ifelse(runif(nSNP,0.00,1.00) < alFrequ$MAF, alFrequ$A1, alFrequ$A2)
}
I am creating two new columns for each individual "i" in "seqIndividuals". Each entry takes the value from column "A1" if a random value is lower than column "MAF" for that row, and the value from "A2" otherwise. The code works, but as the dataset grows in rows and columns (individuals), the run time grows significantly.
Is there a way to avoid using ifelse in this situation, since as I understand it, it works like a loop? I tried generating a matrix of random values and then replacing them, but it takes the same time or even longer.
mtxAlFrequ = matrix(runif(length(alFrequ$SNP)*(201)),nrow=length(alFrequ$SNP),ncol=201)
mtxAlFrequ[mtxAlFrequ < alFrequ$MAF] = alFrequ$A1
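One thing to note about the matrix attempt: in an assignment like x[mask] = values, R recycles the right-hand side from its first element over the selected cells, so alFrequ$A1 is not matched to the row each cell belongs to. Below is a minimal sketch of how the same idea could be made to line up by row. It assumes the allele columns are character (not factor), and the names nInd, nCol, useA1, alleles and result are made up for illustration. I have not timed it on a large dataset, so whether it is actually faster would need checking.

```r
# Example data copied from the question; alleles kept as character, not factor
alFrequ <- data.frame(
  SNP = c("rs3094315", "rs7419119", "rs13302957", "rs6696609"),
  A1  = c("G", "G", "G", "T"),
  A2  = c("A", "T", "A", "C"),
  MAF = c(0.172, 0.240, 0.081, 0.393),
  stringsAsFactors = FALSE
)

nInd <- 201                # number of individuals (hypothetical name)
nSNP <- nrow(alFrequ)
nCol <- 2 * nInd           # two columns ("a" and "b") per individual

set.seed(1)                # only so the sketch is reproducible

# One matrix of uniform draws: rows are SNPs, columns are simulated alleles.
# Matrices are stored column by column, so the length-nSNP vector MAF is
# recycled down each column and row j is always compared against MAF[j].
useA1 <- matrix(runif(nSNP * nCol), nrow = nSNP, ncol = nCol) < alFrequ$MAF

# Start with A2 everywhere, then overwrite the selected cells with A1.
# row(useA1)[useA1] is the row index of each selected cell, so every
# replacement takes the A1 value of its own SNP.
alleles <- matrix(alFrequ$A2, nrow = nSNP, ncol = nCol)
alleles[useA1] <- alFrequ$A1[row(useA1)[useA1]]

# Column names in the same order the loop produces: IND1a, IND1b, IND2a, ...
colnames(alleles) <- paste0("IND", rep(seq_len(nInd), each = 2), c("a", "b"))

# Bind all new columns to the original dataframe in a single step
result <- cbind(alFrequ, as.data.frame(alleles, stringsAsFactors = FALSE))
```

The design choice here is to build every new column in one matrix and attach it once at the end, instead of adding columns to the dataframe one at a time inside the loop.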
Thanks!