I have a FASTA file with about 8,000 sequences in it. I need to change each identifier line to a random, unique, shortened name (max length 10). The FASTA file contains sequences like this:
>AX039539.1.1212 Bacteria;Chloroflexi;Dehalococcoidia;Dehalococcoidales;
GAUGAACGCUAGCGGCGUGCCUUAUGCAUGCAAGUCGAACGGUCUUAAGCAAUUAAGAUAGUGGCAAACGGGUGAGUAACGCGUAAGUAACCUACCUCUAAGUGGGGGAUAGCUUCGGGAAACUGAAGGUAAUACCGCAUGUGGUGGGCCGACAUAAGUUGGUUCACUAAAGCCGUAAGGUGCUUGGUGAGGGGCUUGCGUCCGAUUAGCUAGUUGGUGGGGUAACGGCCUACCAAGGCUUCGAUCGGUAGCUGGUCUGAGAGGAUGAUCAGCCACACUGGGACUGAGACACGGCCCAGACUCCUACGGGAG
use strict;
use warnings;

# change ID line name to a random, unique, shortened (max 10 characters) string
open (my $fh, "$ARGV[0]") or die "Failed to open file: $!\n";
open (my $out_fh, ">$ARGV[0]_shorten_ID.fasta");
my $string;
while (<$fh>) {
    for (0..9) { $string .= chr( int(srand(rand(25) + 65) )); }
    if ($_ =~ s/^>*.+\n/>$string/) { # change FASTA header
        print $out_fh "$_";
    }
}
close $fh;
close $out_fh;
I have been trying this, but it starts with 10 characters, then adds 10 more on each line as it goes down, and I lose the sequence. I realize there are similar questions already, but this one is slightly different: I need random, unique, shortened names.
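A possible fix, as a minimal sketch rather than a definitive answer. Several things in the script above seem to combine into the behaviour described: `$string` is declared outside the loop and never reset, so every pass appends 10 more characters; `srand` seeds the generator rather than producing a random value, so `rand` alone is what is needed; and the pattern `^>*.+\n` matches every non-empty line (the `>*` allows zero `>` characters) and also swallows the newline, so sequence lines get overwritten as well. The version below assumes header lines start with `>`, builds a fresh 10-letter ID for each header, and uses a hash to guarantee uniqueness. The sub names `random_id` and `shorten_ids` are made up for this example; the output file name follows the one in the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @ALPHABET  = ('A' .. 'Z');   # characters allowed in the new IDs
my $ID_LENGTH = 10;             # maximum length requested in the question

# Hypothetical helper: return a random ID that is not yet recorded in %$seen.
sub random_id {
    my ($seen) = @_;
    my $id;
    do {
        # Build a fresh string each time, so nothing accumulates between calls.
        $id = join '', map { $ALPHABET[ int rand @ALPHABET ] } 1 .. $ID_LENGTH;
    } while ( $seen->{$id}++ );   # retry on the (unlikely) collision
    return $id;
}

# Hypothetical helper: copy FASTA records from $in_fh to $out_fh,
# replacing each header line with ">" plus a new random ID.
sub shorten_ids {
    my ( $in_fh, $out_fh ) = @_;
    my %seen;
    while ( my $line = <$in_fh> ) {
        if ( $line =~ /^>/ ) {    # only lines that really start with ">"
            $line = '>' . random_id( \%seen ) . "\n";
        }
        print {$out_fh} $line;    # sequence lines pass through unchanged
    }
    return;
}

# Run on a file only when one is given on the command line.
if (@ARGV) {
    my $in = $ARGV[0];
    open my $fh, '<', $in
        or die "Failed to open $in: $!\n";
    open my $out_fh, '>', "${in}_shorten_ID.fasta"
        or die "Failed to open output file: $!\n";
    shorten_ids( $fh, $out_fh );
    close $fh;
    close $out_fh or die "Failed to close output file: $!\n";
}
```

With 26 letters and 10 positions, collisions among 8,000 IDs are very unlikely, but the `%seen` check makes uniqueness certain rather than probable. If the original identifiers are needed later, it may be worth also writing each old/new pair to a separate file.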