I have a txt file with tab-separated gene codes, similar to this structure:
ENSG00000111111 ENSG00000111111 ENSG00000111111 ENSG00000111555
ENSG00000111111 ENSG00000111111 ENSG00000111111 ENSG00000111222
ENSG00000111111 ENSG00000111111 ENSG00000111111 ENSG00000333555
and I want to create a list by randomly selecting one item from each row, where the selected items must all be DIFFERENT from each other. I then want to repeat the process n times to obtain an output file with this structure:
ENSG00000111111 ENSG00000111222 ENSG00000333555
ENSG00000111555 ENSG00000111222 ENSG00000333555
ENSG00000111555 ENSG00000111222 ENSG00000111111
...
(each row corresponds to one generated list of random items). At the moment I have this script, where all_cand is the input txt file:
#!/usr/bin/python
import sys
import os
import random
import itertools
import numpy as np

def rand_cand(all_cand):
    cand_list = []
    main_list = []
    cand_file = open(all_cand, "r")
    # repeat the selection 10 times
    for _ in itertools.repeat(None, 10):
        # read each row of the file into a list of gene codes
        for line in cand_file:
            cand_rows = line.split()
            cand_list.append(cand_rows)
        # pick one gene code from each row
        for item in cand_list:
            aux_old = np.random.choice(item, replace=False)
            if aux_old not in main_list:
                main_list.append(aux_old)
            else:
                # redraw once if the first pick is already in main_list
                aux_new = np.random.choice(item, replace=False)
                main_list.append(aux_new)
        print(main_list)
With my script, every generated list contains repetitions, and I think that is due to the if statement. I try to compare every item that is about to be appended to the list against those already stored, but it fails, so one of my wrong outputs is:
ENSG00000111111 ENSG00000111111 ENSG00000111111
ENSG00000111111 ENSG00000111111 ENSG00000111222
ENSG00000111111 ENSG00000111111 ENSG00000111111
ENSG00000111555 ENSG00000111111 ENSG00000111111
...
Thanks in advance! I hope my explanation of the problem is clear.
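For reference, the repetitions can be traced to the else branch of the script: the second draw is appended without being checked against the list, and since most rows contain the same code three times, the redraw often returns that code again. One way to meet the requirement, as a minimal sketch rather than a definitive fix, is to draw only from the codes in a row that have not been used yet. The names pick_distinct, sample_lists, max_tries and rows are hypothetical, the rows are hard-coded here instead of read from the input file, and the standard random module stands in for np.random.choice. The sketch assumes a code that appears several times in a row should stay proportionally more likely to be picked.

```python
import random


def pick_distinct(rows):
    """Pick one item per row so that no item is picked twice.

    Returns None when a row has no unused item left (greedy dead end).
    """
    chosen = []
    used = set()
    for row in rows:
        # Keep only items not picked from an earlier row. Duplicates
        # inside the row are kept, so they weight the draw.
        candidates = [gene for gene in row if gene not in used]
        if not candidates:
            return None
        pick = random.choice(candidates)
        chosen.append(pick)
        used.add(pick)
    return chosen


def sample_lists(rows, n, max_tries=1000):
    """Build up to n lists of distinct picks, retrying dead ends."""
    results = []
    tries = 0
    while len(results) < n and tries < max_tries:
        tries += 1
        picks = pick_distinct(rows)
        if picks is not None:
            results.append(picks)
    return results


# Assumption: in the real script each row would come from line.split()
# applied to one line of the input file.
rows = [
    ["ENSG00000111111", "ENSG00000111111", "ENSG00000111111", "ENSG00000111555"],
    ["ENSG00000111111", "ENSG00000111111", "ENSG00000111111", "ENSG00000111222"],
    ["ENSG00000111111", "ENSG00000111111", "ENSG00000111111", "ENSG00000333555"],
]

for picks in sample_lists(rows, 10):
    print("\t".join(picks))
```

The greedy pass can reach a dead end on other inputs even when a valid selection exists, which is why sample_lists retries up to max_tries times and may return fewer than n lists.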