Sunday, March 15, 2015

Replace real data with fake data and store it in a new file

In the sample input file below, col[0] holds sensitive data that cannot be used as is, so I need to replace it with fake data.

00022d7317d7 1073260810 819251 440006 819251 440006
00022d9064bc 1073260810 819251 440006 819251 440006
00022d9064bc 1073260810 819251 440006 819251 440006
00022d9064bc 1073260810 819251 440006 819251 440006
0030650c9eda 1073260811 820356 439224 820356 439224
0030650c9eda 1073260813 820356 439224 820356 439224
0030650c9eda 1073260811 820356 439224 820356 439224
0030650c9eda 1073260813 820356 439224 820356 439224
00022d0e0cec 1073260813 820187 439271 820187 439271
00022d176cf3 1073260813 817721 439564 817721 439564
00022d9064bc 1073260810 1073276155 819251 440006 819251 440006
0030650497a0 1073260810 1073272525 819251 440006 819251 440006
00904b8150f1 1073260810 1073260999 819251 440006 819251 440006
00904ba69d11 1073260810 1073260857 819251 440006 819251 440006
0030658a61de 1073260811 1073260813 820356 439224 820356 439224
00904b16c23a 1073260811 1073260813 820356 439224 820356 439224
00904bacceaf 1073260811 1073260813 820356 439224 820356 439224
00904bf058d0 1073260811 1073260813 820356 439224 820356 439224
0030650c9eda 1073260813 1073262843 820187 439271 820187 439271
00904ba8b682 1073260813 1073260962 817721 439564 817721 439564
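A minimal Python sketch of reading this format, assuming the columns are whitespace-separated and the sensitive value is always the first field. The sample mixes rows with six and seven columns, so the sketch keeps everything after col[0] as a list instead of assuming a fixed width. The function name `read_records` is hypothetical.

```python
import io


def read_records(lines):
    """Split each non-empty line into (real_id, other_fields).

    Assumes whitespace-separated columns with the sensitive value in col[0].
    Rows may have 6 or 7 columns; the remaining fields are kept unchanged.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        records.append((fields[0], fields[1:]))
    return records


# Two rows copied from the sample above: one with 6 columns, one with 7.
sample = io.StringIO(
    "00022d7317d7 1073260810 819251 440006 819251 440006\n"
    "00022d9064bc 1073260810 1073276155 819251 440006 819251 440006\n"
)
for real_id, rest in read_records(sample):
    print(real_id, len(rest))  # prints "00022d7317d7 5", then "00022d9064bc 6"
```

With a real file, pass the open file object instead, e.g. `with open("input.txt") as f: records = read_records(f)` (the file name is a placeholder).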

I am lost on how to start.

What I want to do:

  1. Read the input file.

  2. Create a reference file with two columns: the real value and its fake value. There must be only one reference file, and it is consulted every time a new input file arrives.

    For example:

    00022d7317d7 0001
    00022d9064bc 0002
    00022d9064bc 0002
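One possible way to keep the single reference file from step 2 is a plain two-column text file ("real fake" per line) that is loaded into a dictionary at the start of each run and written back at the end. This is a sketch under that assumption; the names `load_reference`, `save_reference` and `reference.txt` are hypothetical.

```python
import os
import tempfile


def load_reference(path):
    """Load 'real fake' pairs into a dict; return {} if the file does not exist yet."""
    mapping = {}
    if not os.path.exists(path):
        return mapping  # first run: no reference file yet
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2:  # ignore blank or malformed lines
                real, fake = fields
                mapping[real] = fake
    return mapping


def save_reference(path, mapping):
    """Rewrite the reference file, one 'real fake' pair per line, ordered by fake ID."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        for real, fake in sorted(mapping.items(), key=lambda kv: int(kv[1])):
            f.write(f"{real} {fake}\n")
    # Replace the old file in one step, so an interrupted write does not
    # leave a half-written reference file behind.
    os.replace(tmp_path, path)


with tempfile.TemporaryDirectory() as d:
    ref = os.path.join(d, "reference.txt")
    print(load_reference(ref))  # {} on the first run
    save_reference(ref, {"00022d9064bc": "0002", "00022d7317d7": "0001"})
    print(load_reference(ref))  # {'00022d7317d7': '0001', '00022d9064bc': '0002'}
```

Sorting by the integer value of the fake ID keeps the file readable even if the numbers later outgrow four digits.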

If the real entry already exists, its fake entry is looked up and written to the output file.

00022d9064bc 0002

If not, a new fake number is allocated and the new entry is written to the output file.

0030650c9eda 0003
0030650c9eda 0003
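The lookup-or-allocate rule in the two paragraphs above can be sketched as a single function over an in-memory dictionary. This assumes fake IDs are sequential and zero-padded to four digits, as in the examples ("0001", "0002", ...); `get_fake` is a hypothetical name. Because a new number is always one more than the largest number already in use, an ID that has been mapped once keeps its fake value.

```python
def get_fake(real_id, mapping, width=4):
    """Return the fake ID for real_id, allocating the next number if it is new.

    The 4-digit zero-padded width is an assumption taken from the examples.
    """
    if real_id in mapping:
        return mapping[real_id]  # existing entry: reuse its fake ID
    next_number = max((int(v) for v in mapping.values()), default=0) + 1
    fake = str(next_number).zfill(width)
    mapping[real_id] = fake  # new entry: remember it for later rows
    return fake


mapping = {}
for real in ["00022d7317d7", "00022d9064bc", "00022d9064bc",
             "0030650c9eda", "0030650c9eda"]:
    print(real, get_fake(real, mapping))
# 00022d7317d7 0001
# 00022d9064bc 0002
# 00022d9064bc 0002
# 0030650c9eda 0003
# 0030650c9eda 0003
```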

  3. Create a new output file containing the fake data plus the other columns from the input file.

    0001 1073260810 819251 440006 819251 440006
    0002 1073260810 819251 440006 819251 440006
    0002 1073260810 819251 440006 819251 440006
    0002 1073260810 819251 440006 819251 440006
    0003 1073260811 820356 439224 820356 439224
    0003 1073260813 820356 439224 820356 439224
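Putting the three steps together, here is a minimal end-to-end sketch, assuming whitespace-separated input, the two-column reference file format from step 2, and four-digit sequential fake IDs. The function `anonymize_file` and all file names are hypothetical. Run on the first six rows of the sample, it produces the expected output above. It rejoins the columns with single spaces, so any original column alignment is not preserved.

```python
import os
import tempfile


def anonymize_file(input_path, output_path, reference_path, width=4):
    """Replace col[0] of each input row with a fake ID and write the result.

    The reference file ("real fake" per line) is read first and rewritten at
    the end, so IDs already seen in earlier input files keep their fake value.
    """
    mapping = {}
    if os.path.exists(reference_path):
        with open(reference_path) as ref:
            for line in ref:
                fields = line.split()
                if len(fields) == 2:
                    mapping[fields[0]] = fields[1]
    # Next unused sequential number (1 if the reference file is empty or missing).
    next_number = max((int(v) for v in mapping.values()), default=0) + 1

    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            real = fields[0]
            if real not in mapping:  # new entry: allocate the next fake ID
                mapping[real] = str(next_number).zfill(width)
                next_number += 1
            dst.write(" ".join([mapping[real]] + fields[1:]) + "\n")

    # Persist the (possibly extended) mapping for the next input file.
    with open(reference_path, "w") as ref:
        for real, fake in sorted(mapping.items(), key=lambda kv: int(kv[1])):
            ref.write(f"{real} {fake}\n")


sample = """\
00022d7317d7 1073260810 819251 440006 819251 440006
00022d9064bc 1073260810 819251 440006 819251 440006
00022d9064bc 1073260810 819251 440006 819251 440006
00022d9064bc 1073260810 819251 440006 819251 440006
0030650c9eda 1073260811 820356 439224 820356 439224
0030650c9eda 1073260813 820356 439224 820356 439224
"""

with tempfile.TemporaryDirectory() as d:
    inp, out, ref = (os.path.join(d, n) for n in ("input.txt", "output.txt", "reference.txt"))
    with open(inp, "w") as f:
        f.write(sample)
    anonymize_file(inp, out, ref)
    # Print the anonymized rows, then the updated reference file.
    with open(out) as f:
        print(f.read(), end="")
    with open(ref) as f:
        print(f.read(), end="")
```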

Random number generation might not be suitable here, because at every fixed interval a new set of input files arrives containing a mix of old and new entries.
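To illustrate why a persisted, sequential mapping fits this situation better than fresh random numbers, the sketch below processes two batches in a row and saves the reference file in between. The batches are made up from IDs in the sample above (an assumption for illustration only), and `process_batch`, `load` and `save` are hypothetical helpers. IDs seen in the first batch get the same fake value in the second, and only genuinely new IDs receive new numbers; drawing a random number on each run would not guarantee that.

```python
import os
import tempfile


def load(path):
    """Read 'real fake' lines into a dict; empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return dict(line.split() for line in f if line.strip())


def save(path, mapping):
    """Write the mapping back as 'real fake' lines, ordered by fake ID."""
    with open(path, "w") as f:
        for real, fake in sorted(mapping.items(), key=lambda kv: int(kv[1])):
            f.write(f"{real} {fake}\n")


def process_batch(ids, reference_path):
    """Map one batch of real IDs to fake IDs, reusing and extending the reference file."""
    mapping = load(reference_path)
    next_number = max((int(v) for v in mapping.values()), default=0) + 1
    fakes = []
    for real in ids:
        if real not in mapping:
            mapping[real] = f"{next_number:04d}"  # only new IDs get a new number
            next_number += 1
        fakes.append(mapping[real])
    save(reference_path, mapping)  # persist so the next batch sees these IDs
    return fakes


# Hypothetical batches built from IDs in the sample: the second one mixes
# old entries (00022d9064bc, 0030650c9eda) with a new one (0030650497a0).
batch1 = ["00022d7317d7", "00022d9064bc", "0030650c9eda"]
batch2 = ["00022d9064bc", "0030650497a0", "0030650c9eda"]

with tempfile.TemporaryDirectory() as d:
    ref = os.path.join(d, "reference.txt")
    print(process_batch(batch1, ref))  # ['0001', '0002', '0003']
    print(process_batch(batch2, ref))  # ['0002', '0004', '0003']
```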

Any suggestions on how to move forward are appreciated. This may seem like a broad question, but I have tried to explain it in detail.
