Wednesday, 21 July 2021

How to introduce pre-specified random variation into a continuous variable in R?

I've been asked to introduce simulated random variation into a set of continuous biomarkers, to see which of them are most robust to possible analytical variation.

Let's say we have the initial values of biomk:

df = structure(list(biomk = c(4.97673374242057, 4.9600435079802, 4.73707525686803, 
            4.6737629774537, 5.12038615805537, 5.16421438202456, 5.94437293957413, 
            5.33464929579543, 5.12871458216186, 4.50424426739813)), row.names = c(NA, 
            -10L), class = "data.frame")
    
        
> df
          biomk
    1  4.976734
    2  4.960044
    3  4.737075
    4  4.673763
    5  5.120386
    6  5.164214
    7  5.944373
    8  5.334649
    9  5.128715
    10 4.504244

 

Let's say we want 15% variation, stored in biomk15. The person who worked on this before me had coded:

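set.seed(20)
# Candidate percent changes from -15% to +15% in steps of 0.01
seq_15 <- seq(from = -15, to = 15, by = .01)
# Draw one percent change per row, with replacement
df$factor15 <- sample(seq_15, size = 10, replace = TRUE)
# Apply the percent change to each original value
df$biomk15 <- df$biomk + ((df$factor15 * df$biomk) / 100)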
    
> df
          biomk factor15  biomk15
    1  4.976734   -13.35 4.312340
    2  4.960044    -2.86 4.818186
    3  4.737075     3.98 4.925611
    4  4.673763     4.11 4.865855
    5  5.120386     1.65 5.204873
    6  5.164214    13.08 5.839694
    7  5.944373     3.89 6.175609
    8  5.334649    -9.60 4.822523
    9  5.128715    -5.11 4.866637
    10 4.504244     3.36 4.655587

This is a simple way to simulate some random variation.

It comes from the idea of simulating some sort of "inter-assay" coefficient of variation (CV). The CV is conventionally defined as sd/mean; the snippets below compute mean/sd, which is its reciprocal. The issue with this approach is that the variation is limited to the (-15, +15) range and that it ignores the initial "intra-assay" CV:

# Original biomk (mean/sd, as computed here)
> (mean(df$biomk)/sd(df$biomk))
[1] 12.56766
# New biomk (mean/sd, as computed here)
> (mean(df$biomk15)/sd(df$biomk15))
[1] 9.047034
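To put the (-15, +15) limitation in numbers: a factor drawn uniformly from [-a, a] has a standard deviation of a/sqrt(3), so a = 15 injects a percent error with an sd of roughly 8.66, not 15. The sketch below checks this by simulation. The names sd_theory and sd_empirical and the sample size are illustrative choices, not part of the original code.

```r
# Same grid of percent changes as in the original code
seq_15 <- seq(from = -15, to = 15, by = 0.01)

# Theoretical sd of a continuous uniform distribution on [-15, 15]
sd_theory <- 15 / sqrt(3)

# Empirical sd from a large sample (sample size chosen arbitrarily)
set.seed(1)
sd_empirical <- sd(sample(seq_15, size = 1e5, replace = TRUE))

print(c(theory = sd_theory, empirical = sd_empirical))
```

So "15%" in this scheme is a hard bound on each individual change rather than a CV of 15%.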

Of course, ideally these CV simulations should be done not in silico but with inter-lab data and so on, but I have to do this.
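Staying in silico, one possible alternative to the bounded uniform draw is multiplicative Gaussian noise, where each value is multiplied by (1 + e) with e drawn from a Normal distribution whose sd is the target analytical CV (as sd/mean). This is only a sketch: add_cv_noise and cv_target are hypothetical names, and the assumption that analytical error is Gaussian and proportional to the measured value is mine, not something established above. The biomk values are the rounded ones printed earlier.

```r
# Rounded biomk values, copied from the printed data frame
biomk <- c(4.976734, 4.960044, 4.737075, 4.673763, 5.120386,
           5.164214, 5.944373, 5.334649, 5.128715, 4.504244)

# Hypothetical helper: multiply each value by (1 + e), e ~ Normal(0, cv_target).
# Assumes the analytical error is Gaussian and proportional to the value.
add_cv_noise <- function(x, cv_target) {
  x * (1 + rnorm(length(x), mean = 0, sd = cv_target))
}

set.seed(20)
biomk_noisy <- add_cv_noise(biomk, cv_target = 0.15)
print(data.frame(biomk = biomk, biomk_noisy = biomk_noisy))
```

Unlike the uniform draw, the changes here are not capped at 15%, and with a large cv_target a value could in principle become negative, so a log-normal error may be worth considering for strictly positive biomarkers.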

QUESTION: Do you see a way this could be done better? Or a way to introduce this random variation while leaving the "intra-assay" CV unchanged, so that the new values would still give the same value (12.56766)?

I'm not sure if it makes sense, but thanks anyway.
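For the second part of the question, one way to keep the ratio unchanged is to perturb the values first and then rescale them around their own mean so that mean/sd matches the original. The sketch below assumes that matching the sample ratio exactly is the goal. match_ratio is a hypothetical helper, and x holds the rounded biomk values printed earlier.

```r
# Rounded biomk values, copied from the printed data frame
x <- c(4.976734, 4.960044, 4.737075, 4.673763, 5.120386,
       5.164214, 5.944373, 5.334649, 5.128715, 4.504244)

# Perturb as in the original approach: uniform percent change in [-15, 15]
set.seed(20)
pct <- sample(seq(from = -15, to = 15, by = 0.01),
              size = length(x), replace = TRUE)
y <- x + (pct * x) / 100

# Hypothetical helper: rescale y around its mean so that mean/sd equals
# that of x. Assumes positive means and a non-zero sd for y.
match_ratio <- function(y, x) {
  target_sd <- mean(y) * sd(x) / mean(x)  # sd giving the same sd/mean as x
  mean(y) + (y - mean(y)) * target_sd / sd(y)
}

z <- match_ratio(y, x)
print(c(original  = mean(x) / sd(x),
        perturbed = mean(y) / sd(y),
        rescaled  = mean(z) / sd(z)))
```

The rescaling keeps the mean of the perturbed values but shrinks or stretches the individual changes, so the rescaled values no longer correspond to the percent changes that were drawn.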



