I have a file with a known structure: every line whose 1-based line number i satisfies i % 4 == 1 (lines 1, 5, 9, ...) starts with the character "@" and begins an ordered group of 4 lines. I want to randomly select n groups (half of all groups) in the most efficient way, preferably in bash or another Unix tool.
My current attempt in Python is:
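import random

path = "origin.txt"
new_path = "subset.txt"

with open(path) as f:
    lines = f.readlines()

# Each group is 4 consecutive lines; keep half of the groups.
subset_size = round((len(lines) / 4) * 0.5)
# Pick the index of the first line of each selected group.
starts = random.sample(range(0, len(lines), 4), subset_size)
# Expand each starting index into the 4 lines of its group.
selected_lines = [lines[j] for i in starts for j in range(i, i + 4)]

with open(new_path, "w") as f2:
    f2.writelines(selected_lines)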
Can you help me find another (and faster) way to do it?
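For comparison, here is a minimal sketch of a streaming variant of the task described above. It is only an illustration under stated assumptions, not a benchmarked solution: it reads the file twice (once to count lines, once to copy the chosen groups) so the whole file never has to sit in memory, and it writes the selected groups in their original file order. The function name `sample_groups` and its parameters `fraction`, `group_size`, and `seed` are hypothetical names chosen for this example. Whether this is actually faster than the in-memory version will depend on file size and disk speed.

```python
import random
from itertools import islice


def sample_groups(src_path, dst_path, fraction=0.5, group_size=4, seed=None):
    """Copy a random subset of fixed-size line groups from src_path to dst_path.

    Hypothetical helper for illustration. Returns the number of groups written.
    A trailing partial group (fewer than group_size lines) is ignored.
    """
    rng = random.Random(seed)

    # First pass: count lines without keeping them in memory.
    with open(src_path) as src:
        n_lines = sum(1 for _ in src)
    n_groups = n_lines // group_size
    subset_size = round(n_groups * fraction)

    # Choose which group indices to keep; a set gives fast membership tests.
    keep = set(rng.sample(range(n_groups), subset_size))

    # Second pass: read one group at a time and copy only the chosen ones.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for index in range(n_groups):
            group = list(islice(src, group_size))
            if index in keep:
                dst.writelines(group)
    return subset_size
```

With the file names from the question, a call would look like `sample_groups("origin.txt", "subset.txt")`. Passing a `seed` makes the selection reproducible.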