Friday, January 22, 2021

How to obtain n random continuous chunks of size x from a big file

I am new to this and must confess I am not a programmer, so any help would be great. I saw a couple of similar questions here but still don't know how to get the output I am looking for. Unlike those questions, I don't mind if chunks overlap or if a sequence is selected more than once. I'm just looking for n random continuous chunks of a given length (for example, 5,000 chunks of 250 characters each) in order to obtain an output like this (in this case, 3 chunks of length 100):

->1     
GPPXTXTPXXPPPPGGTPTGTTGXTTTXPXPPPPPGXTGXPTTXGPTXPTGTGTPTPPTXTPGXPPPPXTPGTPGGPGGPGXPPPPTPXXXXGPPPTTGT
->2     
PCCXYCCPCYXYYYYCYPCYXCYCCPCCYCPPCXXCYPPPXYXXCXCCYYXPPCYYYPPCXCCCYXPPXCPCPYXPPYCYXXYYPYPYYYXXCYPXYYYP
->3     
PCYCPXPXCYCCPXCYPCYXCPCPPCYPCPPYYCPPPXCYCCCPPYXXPCXPXCPPPYCCYXCPXXXYCPCCPXCYXXPXCPXXYXCPPPYXCPCCCPPY

The input file I have is around 60 GB and looks like this:

XXPPGPYXGXYGGYYGGXGPPYXPPYYYXPYPPPXGXXYPXGXYYYXPPGGPPXGYGYYPPGPPYGYYXY
GGXXGPGYYXXYYPYGPGPXGYYYXGXGYXXXYYPPPYXGPPYPPXGPXPXGPPXXYYGYXGXXGYXPYY
PPGPPPPXXXYYYGXXYYXYYGGXXYYPPYXYGPPYPYXPXGGGYGYXXGYYPXPGGYXGXPPXYGGPYY
YXXYYGPXYYXPGPPPXPGPXYYPXGYGPPYYXYYXYYGPYYYXYYYXYGPXGXYYYYXPYYYYXPYPXY
GGPPXYYGXXGYPPYXPPYGPYXYYPXPPPXPGGPPYPYXPXXXYYPYXPGPGPYXPPYPXXPPPYXPPG
YYXGGXPYXPPPPGXGXGPYXPPGYGXGYXYYXPPYGYXGPGGPXXGYYGYYYXYYXPXXGYXPPXXPPP
XGPPYYGYGGPGGPXYYGPYGYXGYXYXGGGYPXYPPYYYYPYYXPXGYPYPYGYYPXYXXYYPYGYYGY

Cheers
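
One possible approach, as a minimal Python sketch rather than a definitive solution: jump to a random byte offset with seek() and read forward until enough characters have been collected, so the 60 GB file is never loaded into memory. It assumes the file is plain ASCII with newline-separated lines, and that line breaks should be dropped so a chunk can span several lines. The function names random_chunks and write_chunks, and the file names in the usage note, are hypothetical.

```python
import os
import random


def random_chunks(path, n, size, seed=None, max_retries=1000):
    """Yield n random contiguous chunks of `size` characters from `path`.

    Line breaks are skipped, so a chunk may span several lines.
    Chunks may overlap or repeat, as the question allows.
    """
    rng = random.Random(seed)
    file_size = os.path.getsize(path)
    if file_size < size:
        raise ValueError("file is smaller than the requested chunk size")

    with open(path, "rb") as fh:
        produced = 0
        failures = 0
        while produced < n:
            # Jump straight to a random byte offset; nothing before it is read.
            fh.seek(rng.randrange(file_size))
            buf = bytearray()
            while len(buf) < size:
                block = fh.read(size - len(buf))
                if not block:  # reached end of file
                    break
                buf += block.replace(b"\n", b"").replace(b"\r", b"")
            if len(buf) < size:
                # Started too close to the end of the file: try another offset.
                failures += 1
                if failures > max_retries:
                    raise RuntimeError("could not find a long enough chunk")
                continue
            failures = 0
            produced += 1
            yield buf.decode("ascii")


def write_chunks(in_path, out_path, n, size, seed=None):
    """Write the chunks in the '->1', '->2', ... layout shown in the question."""
    with open(out_path, "w") as out:
        chunks = random_chunks(in_path, n, size, seed)
        for i, chunk in enumerate(chunks, start=1):
            out.write(f"->{i}\n{chunk}\n")
```

For example, write_chunks("big_file.txt", "chunks.txt", n=5000, size=250) would produce 5,000 chunks of 250 characters each. Start positions are drawn uniformly over byte offsets, and offsets too close to the end of the file are simply redrawn. Passing a seed makes the selection reproducible.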



