Monday, April 23, 2018

How to get n random "paragraphs" (groups of ordered lines) from a file

I have a file with a known structure: every line whose number i satisfies i % 4 == 1 (lines 1, 5, 9, ...) starts with the character "@" and begins an ordered group of 4 lines. I want to randomly select n groups (half of them) in the most efficient way, preferably in bash or another Unix tool.

My attempt in Python is:

import random

path = "origin.txt"
new_path = "subset.txt"

with open(path) as f:
  lines = f.readlines()

# Each group is 4 consecutive lines; pick half of the groups at random.
subset_size = round((len(lines) / 4) * 0.5)
starts = random.sample(range(0, len(lines), 4), subset_size)

# Collect the 4 lines of every selected group.
selected_lines = [line for i in starts for line in lines[i:i + 4]]

with open(new_path, 'w+') as f2:
  f2.writelines(selected_lines)

Can you help me find another (and faster) way to do it?
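
One possible direction, offered as a sketch rather than a benchmarked answer: read the file twice instead of holding every line in memory. The first pass only counts lines, and the second pass streams the file and writes the lines of the chosen groups. The function name `sample_groups` and its `fraction`, `group_size`, and `seed` parameters are hypothetical names introduced here for illustration; whether this is actually faster depends on file size and disk speed.

```python
import random


def sample_groups(path, new_path, fraction=0.5, group_size=4, seed=None):
    """Copy a random fraction of fixed-size line groups from path to new_path.

    Hypothetical helper: groups are assumed to be exactly `group_size`
    consecutive lines, and any incomplete trailing group is dropped.
    Returns the number of groups written.
    """
    rng = random.Random(seed)

    # Pass 1: count lines without keeping them in memory.
    with open(path) as f:
        n_lines = sum(1 for _ in f)

    n_groups = n_lines // group_size
    subset_size = round(n_groups * fraction)

    # Choose group indices; a set gives constant-time membership tests.
    chosen = set(rng.sample(range(n_groups), subset_size))

    # Pass 2: stream the file and keep only lines from chosen groups.
    # Selected groups are written in their original file order.
    with open(path) as f, open(new_path, "w") as out:
        for i, line in enumerate(f):
            if i // group_size in chosen:
                out.write(line)

    return subset_size
```

Calling `sample_groups("origin.txt", "subset.txt")` would keep half of the groups. If a shell pipeline is preferred, one commonly used pattern is `paste - - - - < origin.txt | shuf -n N | tr '\t' '\n'`, where N is the number of groups to keep. This assumes GNU `shuf` is available and that no line contains a tab, and it writes the groups in shuffled order rather than file order.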



