Saturday, 17 February 2018

Efficient resampling with Python

I want to resample, with replacement, the sequences stored in a dictionary such as:

fastadict = {"seq1" : "ATGCAGTCACGT", "seq2" : "ATGTGTGTACG"}

I wrote the following function:

import random

def resampling_f(fastadict, seq, num):
    # Drop ambiguous bases (N/n) without modifying the caller's dictionary.
    cleaned = fastadict[seq].replace("N", "").replace("n", "")
    # Draw num letters, one at a time, with replacement.
    new_seq = ''.join([random.choice(cleaned) for i in range(num)])
    # Return a one-element list, as in the original version.
    return [new_seq]

# Run function for 20 replicates:
for i in range(20):
    print(resampling_f(fastadict, "seq1", 10))

This works fine for a small sequence like the one in the example. In my work, I need to sample about 1 million letters (DNA bases: A, C, G, T) 10,000 times, and this function is too slow for that. Is there a faster way to sample with replacement in Python?
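One possible direction, given only as a sketch: on Python 3.6 or later, the standard library's random.choices draws all k items with replacement in a single call, which avoids calling random.choice once per letter. The function name resample_with_replacement is hypothetical, and no timing has been measured here, so any speed-up would need to be checked on the real data.

```python
import random

def resample_with_replacement(sequence, num):
    # Hypothetical helper: drop ambiguous bases first (N/n), as in the question.
    cleaned = sequence.replace("N", "").replace("n", "")
    # random.choices (Python 3.6+) samples num items with replacement in one call.
    return "".join(random.choices(cleaned, k=num))

fastadict = {"seq1": "ATGCAGTCACGT", "seq2": "ATGTGTGTACG"}

# 20 replicates of 10 letters each, drawn from seq1.
replicates = [resample_with_replacement(fastadict["seq1"], 10) for _ in range(20)]
```

Filtering the N/n characters once, outside the replicate loop, would also avoid repeating that work on a 1-million-letter sequence for every replicate.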



