Sunday, 25 January 2015

Changing the identifier line to a random shortened name in a FASTA file

I have a FASTA file with about 8,000 sequences in it. I need to change each identifier line to a random, unique, shortened name (max length 10). The FASTA file contains sequences like this:



>AX039539.1.1212 Bacteria;Chloroflexi;Dehalococcoidia;Dehalococcoidales;
GAUGAACGCUAGCGGCGUGCCUUAUGCAUGCAAGUCGAACGGUCUUAAGCAAUUAAGAUAGUGGCAAACGGGUGAGUAACGCGUAAGUAACCUACCUCUAAGUGGGGGAUAGCUUCGGGAAACUGAAGGUAAUACCGCAUGUGGUGGGCCGACAUAAGUUGGUUCACUAAAGCCGUAAGGUGCUUGGUGAGGGGCUUGCGUCCGAUUAGCUAGUUGGUGGGGUAACGGCCUACCAAGGCUUCGAUCGGUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAG



use strict;
use warnings;

# change ID line name to a random unique shortened string (max 10 characters)

open (my $fh, "$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">$ARGV[0]_shorten_ID.fasta");

my $string;

while (<$fh>) {
    for (0..9) { $string .= chr( int(srand(rand(25) + 65) )); }

    if ($_ =~ s/^>*.+\n/>$string/) { # change FASTA header
        print $out_fh "$_";
    }
}

close $fh;
close $out_fh;


I have been trying this, but the name starts with 10 characters and then grows by 10 more on each line as it goes down, and I lose the sequence. I realize there are similar questions already, but mine is slightly different: I need random, unique, shortened names.
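The symptoms described above point to three likely causes. The name variable is declared once outside the loop and only ever appended to, so it grows by 10 characters per line. The substitution pattern `^>*.+\n` also matches sequence lines, because `>*` allows zero `>` characters, so sequences get overwritten. And the replacement drops the trailing newline. Below is a minimal sketch of one possible fix, not a definitive implementation. The helper names `unique_id` and `shorten_headers` are made up for illustration, and the sketch assumes that names of exactly 10 uppercase letters are acceptable and that any line starting with `>` is a header.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return a random ID of $len uppercase letters that has not been used yet.
# %$seen records every ID handed out so far, which guarantees uniqueness.
sub unique_id {
    my ($seen, $len) = @_;
    my $id;
    do {
        # Build a fresh string on every attempt, so nothing accumulates.
        $id = join '', map { chr(65 + int rand 26) } 1 .. $len;
    } while ($seen->{$id}++);    # retry only if this ID was already issued
    return $id;
}

# Copy FASTA text from $in_fh to $out_fh. Header lines (starting with '>')
# are replaced by '>' plus a new ID; all other lines pass through unchanged.
sub shorten_headers {
    my ($in_fh, $out_fh, $len) = @_;
    my %seen;
    while (my $line = <$in_fh>) {
        if ($line =~ /^>/) {
            $line = '>' . unique_id(\%seen, $len) . "\n";
        }
        print {$out_fh} $line;
    }
    return;
}

# Run on a file only when a path is given on the command line.
if (@ARGV) {
    my $path = $ARGV[0];
    open my $in_fh, '<', $path
        or die "Failed to open $path: $!\n";
    open my $out_fh, '>', "${path}_shorten_ID.fasta"
        or die "Failed to open output file: $!\n";
    shorten_headers($in_fh, $out_fh, 10);
    close $in_fh;
    close $out_fh or die "Failed to close output file: $!\n";
}
```

With roughly 8,000 sequences and 26^10 possible names, a collision is very unlikely, but the `%seen` hash makes uniqueness certain rather than probable. Note that the original identifiers are discarded; if they are needed later, the old and new names would have to be written to a separate mapping file.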




