Sunday, December 6, 2020

How to efficiently sample sequences from a bed file using R?

I need to sample sequences of differing lengths from a bed file (a file that provides the start and end coordinates of each sequence, plus a category). For example, given the bed file:

library(data.table)

bed <- data.table(category = c("A", "A", "A", "A", "B", "B"),
                  start = c(1, 100, 300, 410, 1, 810),
                  end = c(80, 220, 400, 700, 400, 900))

And a vector of sequence lengths:

seq_lengths <- sample(10:100, 100, replace = TRUE)

How can I randomly sample, from within the bed file coordinates, the same number of sequences with the same lengths as those in seq_lengths?

The dataset I am applying this to is very large, and so performance is important.
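
To make the goal concrete, here is a minimal sketch of the kind of sampling I mean. It is not a tuned solution. It rests on several assumptions of mine: coordinates are 1-based and inclusive, each sampled sequence must fit entirely inside a single bed interval, every valid start position is equally likely, and the category is only carried through to the output. `sample_from_bed` is a hypothetical helper name, not an existing function.

```r
library(data.table)

bed <- data.table(category = c("A", "A", "A", "A", "B", "B"),
                  start = c(1, 100, 300, 410, 1, 810),
                  end = c(80, 220, 400, 700, 400, 900))
seq_lengths <- sample(10:100, 100, replace = TRUE)

# Hypothetical helper: draw one random sub-interval per requested length.
sample_from_bed <- function(bed, seq_lengths) {
  # Interval widths, assuming 1-based inclusive coordinates.
  width <- bed$end - bed$start + 1
  out <- lapply(unique(seq_lengths), function(len) {
    n <- sum(seq_lengths == len)
    # Number of valid start positions in each interval (0 if too short).
    n_starts <- pmax(width - len + 1, 0)
    if (sum(n_starts) == 0) {
      stop("no interval can hold a sequence of length ", len)
    }
    # Pick intervals in proportion to their number of valid starts,
    # then pick a uniform offset inside the chosen interval.
    idx <- sample.int(nrow(bed), n, replace = TRUE, prob = n_starts)
    offset <- floor(runif(n) * n_starts[idx])
    data.table(category = bed$category[idx],
               start = bed$start[idx] + offset,
               end = bed$start[idx] + offset + len - 1)
  })
  rbindlist(out)
}

sampled <- sample_from_bed(bed, seq_lengths)
```

This loops only over the distinct lengths and vectorises the draws within each one, so the work per length is a handful of vector operations. The rows come back grouped by length rather than in the order of seq_lengths, and I do not know how well this scales to my real data.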

Any help will be greatly appreciated!



