mercredi 6 juillet 2016

Oracle Data Masking using random names from a temp table

We need to mask some Personally Identifiable Information in our Oracle 10g database. The process I'm using is based on another masking script that we are using for Sybase (which works fine), but since the information in the Oracle and Sybase databases is quite different, I've hit a bit of a roadblock.

The process is to select all data out of the PERSON table, into a PERSON_TRANSFER table. We then use a random number to select a random name from the PERSON_TRANSFER table, and then update the PERSON table with that random name. This works fine in Sybase because there is only one row per person in the PERSON table.

The issue I've encountered is that in the Oracle DB, there are multiple rows per PERSON, and the name may or may not be different for each row, e.g.

|PERSON| :-------------: |PERSON_ID|SURNAME| |1 |Purple| |1 |Purple| |1 |Pink | <-- |2 |Gray | |2 |Blue | <-- |3 |Black | |3 |Black |

The PERSON_TRANSFER is a copy of this table. The table is in the millions of rows, so I'm just giving a very basic example here :)

The logic I'm currently using would just update all rows to be the same for that PERSON_ID, e.g.

|PERSON| :----------------: |PERSON_ID|SURNAME| |1 |Brown | |1 |Brown | |1 |Brown | <-- |2 |White | |2 |White | <-- |3 |Red | |3 |Red |

But this is incorrect as the name that is different for that PERSON_ID needs to be masked differently, e.g.

|PERSON| :-------------: |PERSON_ID|SURNAME| |1 |Brown | |1 |Brown | |1 |Yellow| <-- |2 |White | |2 |Green | <-- |3 |Red | |3 |Red |

How do I get the script to update the distinct names separately, rather than just update them all based on the PERSON_ID? My script currently looks like this

DECLARE v_SURNAME VARCHAR2(30);

BEGIN

select pt.SURNAME
into  v_SURNAME
from   PERSON_TRANSFER pt
where   pt.PERSON_ID = (SELECT PERSON_ID FROM
                        ( SELECT PERSON_ID FROM PERSON_TRANSFER
                        ORDER BY dbms_random.value )
                        WHERE rownum = 1);

END;

Which causes an error because too many rows are returned for that random PERSON_ID.

1) Is there a more efficient way to update the PERSON table so that names are randomly assigned? 2) How do I ensure that the PERSON table is masked correctly, in that the various surnames are kept distinct (or the same, if they are all the same) for any single PERSON_ID?

I'm hoping this is enough information. I've simplified it a fair bit (the table has a lot more columns, such as First Name, DOB, TFN, etc.) in the hope that it makes the explanation easier.

Any input/advice/help would be greatly appreciated :)

Thanks.




Aucun commentaire:

Enregistrer un commentaire