mardi 12 novembre 2019

How can I perform constrained randomization in R, nested by custom number of study factors?

TLDR: I want to achieve randomization of samples in a nested manner, given an arbitrary number of factors.

Background
In several types of biochemical analysis there is systematic bias in response due to e.g. instrumental drift or sample degradation. Hence the need to randomize analytical sequence order. However, we also wish to keep samples from the the same individual and e.g. treatment close together to minimize drift between these samples. Thus, we can employ constrained randomization:

  1. We first randomize between individuals
  2. Within individual, we randomize between treatments
  3. Within treatment, we randomize different time points

Some pseudocode for a three-factor case:

constRandom <- function(metadata, factors) {
  for (i in factors[1]) {
    sample(factors[j]) # random order of the 2nd factor within the 1st factor
    for (j in factors[2]) {
      sample(factor[3]) # random order of the 3rd factor within the 2nd factor
    }
  }
}

I have looked at Constrained Randomization of data frame in R which makes nice code for constrained/nested randomization for two study factors.

The first issue is that I want to be able to perform this for >2 factors (sometimes I have 4 or even 5 such factors). I have coded a (less tidyverse) example for a three factor case:

constRand3 <- function(metadata, factors) {
  newMeta <- metadata # set up df for constrained randomized data
  uniq1 <- unique(metadata[,factors[1]]) %>% sample() # extract random order of 1st factor
  n1 <- length(uniq1) # Number of levels in 1st factor
  nSamp <- 0 # Index of where in newMeta to put in random samples
  # Loop thru randomized levels of 1st factor
  for (i in 1:n1) {
    level1 <- uniq1[i]
    meta1 <- metadata[metadata[,factors[1]]==level1,] # extract samples at the ith level of factor 1
    uniq2 <- unique(meta1[,factors[2]]) %>% sample() # extract random order of 2nd factor within 1st factor
    n2 <- length(uniq2)
    # Loop thru randomized levels of 2nd factor
    for (j in 1:n2) {
      level2 <- uniq2[j]
      meta2 <- meta1[meta1[,factors[2]]==level2,] # extract samples at the jth level of factor 2
      uniq3 <- unique(meta2[,factors[3]]) %>% sample() # extract random order of 2nd factor within 1st factor
      newMeta[(nSamp+1):(nSamp+nrow(meta2)),] <- meta2[match(meta2[,factors[3]], uniq3) %>% order(),]
      nSamp <- nSamp+nrow(meta2)
    }
  }
  return(newMeta)
}

Which seems to perform well, although I'm sure a more tidyverse solution would look nicer (tips on coding solutions and style are more than welcome):

# Make synthetic data
n_ID <- 3
n_Trt <- 3
n_Time <- 3
df <- data.frame(sample=paste0('s',1:(n_ID*n_Trt*n_Time)),ID=paste0('id',rep(1:n_ID, each=n_Trt*n_Time)), Trt=rep(paste0('Trt',LETTERS[1:n_Trt]),each=n_Trt), time=paste0('t',1:n_Time))
# Perform constrained randomization
constRand3(df,2:4)

   sample  ID  Trt time
1     s21 id3 TrtA   t3
2     s20 id3 TrtA   t2
3     s19 id3 TrtA   t1
4     s25 id3 TrtC   t1
5     s27 id3 TrtC   t3
6     s26 id3 TrtC   t2
7     s23 id3 TrtB   t2
8     s22 id3 TrtB   t1
9     s24 id3 TrtB   t3
10     s6 id1 TrtB   t3
11     s5 id1 TrtB   t2
12     s4 id1 TrtB   t1
13     s2 id1 TrtA   t2
14     s1 id1 TrtA   t1
15     s3 id1 TrtA   t3
16     s9 id1 TrtC   t3
17     s7 id1 TrtC   t1
18     s8 id1 TrtC   t2
19    s11 id2 TrtA   t2
20    s12 id2 TrtA   t3
21    s10 id2 TrtA   t1
22    s18 id2 TrtC   t3
23    s16 id2 TrtC   t1
24    s17 id2 TrtC   t2
25    s14 id2 TrtB   t2
26    s13 id2 TrtB   t1
27    s15 id2 TrtB   t3

We see that all samples for an individual are kept together (but in random order between individuals). All samples per treatment are kept together (but in random order between treatments) and timepoints per treatment are in random order.

The second and main issue is that I want to be able to generalize this into a function where I can specify a user-defined number of randomization factors, such that I can call:

constRand(df, 1)

To perform a complete randomization on sample level only. Or:

constRand(df, 2:3)

To randomize between individuals and treatments but leave timepoint order intact. Or:

constRand(otherDF, c(4,2,5,7))

To randomize by four factors in a specific order (I hope you'll forgive that the last is not a reproducible example).

But I don't know how to construct a recursive loop or split to accomodate for a user-defined number of factors. Using my for-loop approach, I don't see how I can nest a dynamic amount of levels. Perhaps something using eval/do.call or tidyverse with splits? Any help is greatly appreciated.

Kind regards,

Carl




Aucun commentaire:

Enregistrer un commentaire