Thursday, 16 August 2018

awk split into random subsets

I have a CSV file with 500k rows that I need to split into two sets of 400k and 100k rows. I can't simply take the first rows with something like awk 'NR <= 100000' file.csv > subset1.csv, because the rows are sorted and I need each set to be a random sample.
How can I randomize the two sets? As a side note, the sizes don't have to be exact: a split such as 398111 and 101889 would be acceptable if a perfect split is not possible in awk.
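
For illustration, here is a minimal sketch of two ways this could be done in awk; it is not a definitive answer. It assumes the file has no header row, and the sample input generated with seq, the output file names, and the variables p, n, k and seed are all placeholders. The first command assigns each row independently with probability 0.8, so the sizes are only approximate. The second uses selection sampling (keep a row with probability "rows still needed / rows still remaining"), which yields exactly k rows but requires knowing the total row count n in advance.

```shell
# Hypothetical sample input standing in for the real 500k-row file.
seq 1 500000 > file.csv

# Approximate split: each row goes to subset1.csv with probability p,
# otherwise to subset2.csv. The seed (42 is arbitrary) makes the run
# repeatable; use srand() with no argument to seed from the clock.
awk -v p=0.8 -v seed=42 '
    BEGIN { srand(seed) }
    { print > (rand() < p ? "subset1.csv" : "subset2.csv") }
' file.csv

# Exact split via selection sampling: n is the total row count,
# k is the number of rows wanted in the first file.
awk -v n=500000 -v k=400000 -v seed=42 '
    BEGIN { srand(seed) }
    {
        # (n - NR + 1) rows remain, including this one; k are still needed.
        if (rand() * (n - NR + 1) < k) { k--; print > "exact1.csv" }
        else                           { print > "exact2.csv" }
    }
' file.csv

wc -l subset1.csv subset2.csv exact1.csv exact2.csv
```

With the first command the sizes usually land within several hundred rows of 400k and 100k, which fits the tolerance described above. Both commands keep the rows in their original sorted order within each output file; only the assignment to a file is random.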



