Wednesday, May 27, 2020

Estimating the size of a large text file

From a comment on a question about selecting n random lines from a text file here:

Select random lines from a file

A user commented that they used the shuf command to randomly select lines from a text file with 78 billion lines in less than a minute.

I see from various sources on the internet that text files range from roughly 100 GB to 300 GB for a mere 7-15 billion lines, depending on how much data each line holds.

I am curious about:

  1. What would be the estimated size of a text file with raw data spread over 78 billion lines? (Some pointers on calculating that would help; a rough back-of-the-envelope calculation is sketched after this list.)
  2. How does bash, running on a system with limited computing power (say 16 GB RAM, a ~512 GB SSD, and a 2.5 GHz Intel Core i7 processor, i.e. a typical MacBook Pro), process this data in under a minute?
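For the first question, a rough estimate is simply the line count multiplied by an assumed average line length (including the newline byte). The bash sketch below uses 20, 50 and 100 bytes per line purely as illustrative assumptions; measuring a sample of the real file gives a better figure.

    #!/usr/bin/env bash
    # Back-of-the-envelope size estimate: lines * average bytes per line.
    # The average line lengths are assumptions for illustration only.
    lines=78000000000                     # 78 billion lines

    for avg_bytes in 20 50 100; do
        bytes=$((lines * avg_bytes))
        # 1 TB = 10^12 bytes here, to keep the arithmetic simple.
        printf '%3d bytes/line -> %d bytes (~%d TB)\n' \
            "$avg_bytes" "$bytes" "$((bytes / 1000000000000))"
    done

    # To measure the real average on an existing file, sample it, e.g.:
    # head -n 1000000 file.txt | wc -c    # divide by 1,000,000 for bytes per line

At 50 bytes per line, for example, 78 billion lines comes to roughly 3.9 TB, which is already far larger than the 512 GB SSD in the setup above.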

I get that there are several ways to store and retrieve huge amounts of data; I'm just curious how bash can process it in such a short time.
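One plausible way for a tool to pick n random lines from a file far larger than RAM is reservoir sampling: a single pass over the input that keeps only n lines in memory at any moment. The awk sketch below is a minimal illustration of that idea; it is not taken from shuf's actual implementation, and file.txt and n=10 are placeholders.

    #!/usr/bin/env bash
    # Reservoir sampling: select n random lines in one pass, holding only n lines in memory.
    # file.txt and n=10 are placeholders; this illustrates the idea, not shuf's internals.
    n=10
    awk -v n="$n" '
        BEGIN { srand() }
        NR <= n { res[NR] = $0; next }      # fill the reservoir with the first n lines
        {
            i = int(rand() * NR) + 1        # pick a uniform slot in 1..NR
            if (i <= n) res[i] = $0         # keep this line with probability n/NR
        }
        END { for (i = 1; i <= n; i++) print res[i] }
    ' file.txt

Memory use stays bounded by n lines no matter how large the file is; the runtime is dominated by the single sequential read, so the disk's sequential throughput sets the lower bound on how quickly any such pass can finish.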



