Thursday, 25 April 2019

Parallel keyword substitution across two files

Assume two text files (file1 and file2) that are identical in structure (same number of lines) but different in content. Assume further that a keyword (e.g., "KEYW") occasionally occurs on the same lines of both files.

$ cat file1
foo bar baz
foo KEYW baz
foo bar KEYW

$ cat file2
foo bar qux
foo qux KEYW
qux bar KEYW

I would like to replace each instance of KEYW with a random number such that the replacement is done in parallel across the two files (i.e., the first occurrence in both files is replaced with random number 1, the second occurrence in both files is replaced with random number 2, etc.).

$ sought_command file1 file2
# RANDOM1=123
# RANDOM2=456

$ cat file1.new
foo bar baz
foo 123 baz
foo bar 456

$ cat file2.new
foo bar qux
foo qux 123
qux bar 456

In your opinion, what is the quickest way to implement this task? Should I try to implement it via sed, awk, or Python?

--

EDIT 1:

The pseudocode I am thinking of goes like this:

1. Identify the lines that contain KEYW and save their line numbers in a list.
2. Iterate through the list with a for-loop.
3. In each iteration, do:
   3.a. Generate a random number.
   3.b. Replace the occurrence of KEYW in the current line of both files with that random number using `sed -i -e "<line_number>s/KEYW/<random_number>/" file*`.
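For comparison, the steps above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: the function names `replace_parallel` and `process_files` are made up for illustration, the random numbers are three-digit integers (matching the 123/456 style of the example), only the first KEYW on a line is replaced (as the sed command without the `g` flag would do), and the results are written to `<name>.new` instead of editing in place.

```python
import random


def replace_parallel(lines_a, lines_b, keyword="KEYW", rng=None):
    """Replace `keyword` on the same line of both inputs with one shared random number.

    Hypothetical helper: walks both line lists in lockstep and draws a new
    number for every line on which the keyword occurs.
    """
    rng = rng or random.Random()
    out_a, out_b = [], []
    for line_a, line_b in zip(lines_a, lines_b):
        if keyword in line_a or keyword in line_b:
            # Assumption: a three-digit number is random enough for this task.
            number = str(rng.randint(100, 999))
            # Replace only the first occurrence on the line, like sed without /g.
            line_a = line_a.replace(keyword, number, 1)
            line_b = line_b.replace(keyword, number, 1)
        out_a.append(line_a)
        out_b.append(line_b)
    return out_a, out_b


def process_files(path_a, path_b, keyword="KEYW"):
    """Read both files, substitute in parallel, and write <path>.new copies."""
    with open(path_a) as fa, open(path_b) as fb:
        lines_a, lines_b = fa.readlines(), fb.readlines()
    if len(lines_a) != len(lines_b):
        raise ValueError("both files must have the same number of lines")
    new_a, new_b = replace_parallel(lines_a, lines_b, keyword)
    for path, lines in ((path_a, new_a), (path_b, new_b)):
        with open(path + ".new", "w") as out:
            out.writelines(lines)
```

Calling `process_files("file1", "file2")` would then produce file1.new and file2.new. Compared with running sed once per matching line, this reads each file only once, which may matter for large inputs.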



