I am new at this and I must confess I am not a programmer, so any help would be great. I saw a couple of similar problems here but still don't know how to obtain the output I am looking for. In contrast to the other questions, I don't mind if chunks overlap or if the same sequence is selected more than once. I'm just looking for n random contiguous chunks of a certain length (for example, 5,000 chunks of 250 characters each) in order to obtain an output like this (in this case, 3 chunks of length 100):
->1
GPPXTXTPXXPPPPGGTPTGTTGXTTTXPXPPPPPGXTGXPTTXGPTXPTGTGTPTPPTXTPGXPPPPXTPGTPGGPGGPGXPPPPTPXXXXGPPPTTGT
->2
PCCXYCCPCYXYYYYCYPCYXCYCCPCCYCPPCXXCYPPPXYXXCXCCYYXPPCYYYPPCXCCCYXPPXCPCPYXPPYCYXXYYPYPYYYXXCYPXYYYP
->3
PCYCPXPXCYCCPXCYPCYXCPCPPCYPCPPYYCPPPXCYCCCPPYXXPCXPXCPPPYCCYXCPXXXYCPCCPXCYXXPXCPXXYXCPPPYXCPCCCPPY
The input file I have is around 60 GB:
XXPPGPYXGXYGGYYGGXGPPYXPPYYYXPYPPPXGXXYPXGXYYYXPPGGPPXGYGYYPPGPPYGYYXY
GGXXGPGYYXXYYPYGPGPXGYYYXGXGYXXXYYPPPYXGPPYPPXGPXPXGPPXXYYGYXGXXGYXPYY
PPGPPPPXXXYYYGXXYYXYYGGXXYYPPYXYGPPYPYXPXGGGYGYXXGYYPXPGGYXGXPPXYGGPYY
YXXYYGPXYYXPGPPPXPGPXYYPXGYGPPYYXYYXYYGPYYYXYYYXYGPXGXYYYYXPYYYYXPYPXY
GGPPXYYGXXGYPPYXPPYGPYXYYPXPPPXPGGPPYPYXPXXXYYPYXPGPGPYXPPYPXXPPPYXPPG
YYXGGXPYXPPPPGXGXGPYXPPGYGXGYXYYXPPYGYXGPGGPXXGYYGYYYXYYXPXXGYXPPXXPPP
XGPPYYGYGGPGGPXYYGPYGYXGYXYXGGGYPXYPPYYYYPYYXPXGYPYPYGYYPXYXXYYPYGYYGY
Cheers
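
For what it's worth, here is a minimal sketch of one way this could be done in Python without loading the 60 GB file into memory. It seeks to a random byte offset, reads forward, and drops line breaks until it has collected enough characters. The function names `random_chunks` and `write_chunks`, the `seed` parameter, and the file names in the usage note are all made up for illustration. It assumes the file is plain ASCII text with one character per byte, that line breaks are not part of the sequence, and that the file holds far more sequence characters than one chunk.

```python
import os
import random


def random_chunks(path, n_chunks, chunk_len, seed=None):
    """Yield n_chunks random contiguous chunks of chunk_len characters.

    Chunks may overlap or repeat. Line breaks are skipped, so a chunk
    can span several lines of the input file.
    """
    rng = random.Random(seed)
    file_size = os.path.getsize(path)
    if file_size < chunk_len:
        raise ValueError("file is shorter than one chunk")
    with open(path, "rb") as handle:
        produced = 0
        while produced < n_chunks:
            # Jump straight to a random byte; nothing before it is read.
            handle.seek(rng.randrange(file_size))
            pieces = []
            have = 0
            while have < chunk_len:
                block = handle.read(chunk_len - have)
                if not block:
                    break  # reached the end of the file
                block = block.replace(b"\n", b"").replace(b"\r", b"")
                pieces.append(block)
                have += len(block)
            if have < chunk_len:
                continue  # started too close to the end; draw again
            produced += 1
            yield b"".join(pieces).decode("ascii")


def write_chunks(in_path, out_path, n_chunks, chunk_len, seed=None):
    """Write the chunks with '->1', '->2', ... header lines."""
    with open(out_path, "w") as out:
        chunks = random_chunks(in_path, n_chunks, chunk_len, seed)
        for number, chunk in enumerate(chunks, start=1):
            out.write(f"->{number}\n{chunk}\n")
```

With hypothetical file names, a call such as `write_chunks("input.txt", "chunks.txt", 5000, 250)` would produce 5,000 chunks of 250 characters in the format shown above. Because each chunk costs one seek and a small read, the running time depends on the number of chunks rather than on the size of the file.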