I want to resample (with replacement) DNA sequences stored in a dictionary like this:
fastadict = {"seq1" : "ATGCAGTCACGT", "seq2" : "ATGTGTGTACG"}
I wrote the following function:
import random

def resampling_f(fastadict, seq, num):
    # Remove ambiguous bases (N/n) before sampling.
    fastadict[seq] = fastadict[seq].replace("N", "").replace("n", "")
    l = []
    # Draw num letters with replacement, one random.choice call per letter.
    new_seq = ''.join([random.choice(fastadict[seq]) for i in range(num)])
    l.append(new_seq)
    return l

# Run function for 20 replicates:
for i in range(20):
    print(resampling_f(fastadict, "seq1", 10))
This works fine for a small sequence like the one in the example. In my work, however, I need to sample about 1 million letters (DNA bases: A, C, T, G) 10,000 times, and this function is too slow for that. Is there a faster way to sample with replacement in Python?
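For reference, here is a minimal sketch of one direction that might help, assuming Python 3.6 or later is available: `random.choices` draws all `num` letters with replacement in a single call instead of calling `random.choice` once per letter. The helper name `resample_fast` is hypothetical, and I have not measured the speed-up on the real data.

```python
import random

def resample_fast(sequence, num):
    """Draw num letters from sequence with replacement (hypothetical helper)."""
    # Drop ambiguous bases once, as the original function does.
    cleaned = sequence.replace("N", "").replace("n", "")
    # random.choices (Python 3.6+) samples with replacement in one call.
    return "".join(random.choices(cleaned, k=num))

fastadict = {"seq1": "ATGCAGTCACGT", "seq2": "ATGTGTGTACG"}

# 20 replicates of 10 letters each, mirroring the loop above.
replicates = [resample_fast(fastadict["seq1"], 10) for _ in range(20)]
```

If NumPy is an option, `numpy.random.default_rng().choice` on an array of bases is another candidate for the million-letter case, though whether it is actually faster here would need to be timed.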