I need to sample sequences of differing lengths from a bed file (a file that provides the start and end coordinates of each sequence, plus a category). For example, given the bed file:
library(data.table)
bed <- data.table(category = c("A", "A", "A", "A", "B", "B"),
                  start = c(1, 100, 300, 410, 1, 810),
                  end = c(80, 220, 400, 700, 400, 900))
And a vector of sequence lengths:
seq_lengths <- sample(10:100, 100, replace = TRUE)
How can I randomly sample sequences from within the bed file coordinates so that the result has the same number of sequences, with the same lengths, as seq_lengths?
The dataset I am applying this to is very large, and so performance is important.
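To make the goal concrete, here is a minimal sketch of one possible approach, not a definitive solution. It rests on several assumptions that are not stated above: coordinates are 1-based and inclusive, each sampled sequence must lie entirely inside a single interval, sampled sequences may overlap one another, and intervals are chosen with probability proportional to the number of valid start positions they offer. The helper name `sample_from_bed` is hypothetical. It loops only over the unique lengths rather than over every requested sequence, so the output is grouped by length rather than kept in the original order of `seq_lengths`.

```r
library(data.table)

bed <- data.table(category = c("A", "A", "A", "A", "B", "B"),
                  start = c(1, 100, 300, 410, 1, 810),
                  end = c(80, 220, 400, 700, 400, 900))

# Hypothetical helper: draw one sub-interval per requested length.
sample_from_bed <- function(bed, seq_lengths) {
  # Interval widths, assuming 1-based inclusive coordinates
  widths <- bed$end - bed$start + 1
  # Process all requests of the same length together
  by_length <- split(seq_lengths, seq_lengths)
  rbindlist(lapply(by_length, function(lens) {
    len <- lens[1]
    k <- length(lens)
    # Number of valid start positions in each interval (0 if too short)
    n_starts <- pmax(widths - len + 1, 0)
    if (sum(n_starts) == 0) stop("no interval can hold a sequence of length ", len)
    # Pick intervals, weighted by how many start positions they offer
    i <- sample.int(nrow(bed), k, replace = TRUE, prob = n_starts)
    # Pick a uniform offset in 0 .. n_starts[i] - 1 within each chosen interval
    offset <- floor(runif(k) * n_starts[i])
    s <- bed$start[i] + offset
    data.table(category = bed$category[i], start = s, end = s + len - 1)
  }))
}

seq_lengths <- sample(10:100, 100, replace = TRUE)
sampled <- sample_from_bed(bed, seq_lengths)
```

Here `sampled` has one row per element of `seq_lengths`, and `sampled$end - sampled$start + 1` reproduces the requested lengths. Whether this is fast enough for a very large bed file is untested.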
Any help will be greatly appreciated!