mardi 1 octobre 2019

Updating a lot of rows with random data in Mysql

I have a database with personal data that I want to anonymize with random names etc. Eg., the table person (id int, first_name varchar, last_name varchar, birth_date date). I also have a table with random names, eg. first_name (id int, value varchar). I know that this boils down to optimizing select * from ... order by rand() and I've read several StackOverflow questions on it (and http://jan.kneschke.de/projects/mysql/order-by-rand/), but this doesn't answer the question of updating with random data in bulk.

I've tried:

  1. (Initially) UPDATE _person me SET birth_date = (SELECT value FROM anonymizer.birth_date ORDER BY RAND() LIMIT 1) (...same for other columns). As there are multiple tables, anonymizing the whole DB was going for an hour when I interrupted it.
  2. (Currently) UPDATE _person me join anonymizer.first_name on anonymizer.first_name.id = round(rand() * (select max(id) from anonymizer.first_name)) SET first_name = anonymizer.first_name.value. This single query took 5 minutes (apart from other columns and tables).

I can prepare my fake data tables in any way (init with sequence numbers etc.). Is there a way to anonymize the DB efficiently?




Aucun commentaire:

Enregistrer un commentaire