From a comment on a question about selecting n random lines from a text file here:
Select random lines from a file
A user commented that they used the `shuf` command to randomly select lines from a text file with 78 billion lines in under a minute.
I see from various sources on the internet that text files in the range of 100 GB to 300 GB hold a mere 7-15 billion lines, depending on the data in each line.
I am curious about:
- What would be the estimated size of a text file holding raw data spread over 78 billion lines? (and some pointers on calculating that; a rough sketch follows this list)
- How does bash, running on a system with limited computing power (say 16 GB RAM, a ~512 GB SSD, and a 2.5 GHz Intel Core i7, i.e. a typical MacBook Pro), process this data in under a minute?
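For the first point, a back-of-the-envelope estimate only needs an assumed average line length; the 20 bytes per line below is a made-up figure for illustration, not something taken from the linked question:

```bash
# Back-of-the-envelope size estimate (assumed values; adjust for your data):
# 78 billion lines at an average of ~20 bytes per line (19 data bytes + newline).
lines=78000000000
bytes_per_line=20

total_bytes=$(( lines * bytes_per_line ))
echo "total: $total_bytes bytes (~$(( total_bytes / 1024 / 1024 / 1024 )) GiB)"
# -> 1560000000000 bytes, roughly 1450 GiB (about 1.5 TB) at 20 bytes/line
```

Changing `bytes_per_line` to match the actual average line length of the data scales the result linearly.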
I get that there are several ways to store and retrieve huge amounts of data; I am just curious how bash will process it in such a short time.
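For reference, the kind of invocation being discussed looks something like the sketch below. The file name `huge.txt` is a placeholder, and the comment about reservoir sampling reflects how GNU coreutils documents `shuf -n`, not anything stated in the linked comment:

```bash
# Hypothetical example: pick 1000 random lines from a (placeholder) huge file.
# With -n, GNU shuf (coreutils >= 8.22) uses reservoir sampling, so memory use
# scales with the number of lines requested rather than with the file size.
shuf -n 1000 huge.txt > sample.txt
```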