dimanche 11 juillet 2021

I found an easy way to generate pseudo-random MySQL IDs, what's wrong with this approach?

There's business intelligence risk to exposing auto-incremented IDs to users. The methods that I commonly see people using to solve this are:

  • use GUID/MD5 (larger and slower than 4 byte INT)
  • generate integer, check for collision, repeat (slow, have to lock table)
  • expose "public" ID in the application (easy to leak actual ID, might be slow)

Instead, I created a table:

CREATE TABLE `foo` (
  `id` int unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;

Then, every time I insert a row, I'd manually insert the ID like this:

insert into `foo` (id) values (
    last_insert_id((0xe8e5* last_insert_id()) % power(2, 32))
);

The formula is a linear congruential generator. My understanding is that it would generate a pseudo-random integer between 1 and 2^32, with no collisions unless all 2^32-1 integers are exhausted. The parameter 0xe8e5 is from https://arxiv.org/pdf/2001.05304.pdf

Surprisingly, this even works with batch inserts. I tried inserting 100,000 rows and there weren't collisions.

I haven't seen anyone else do this before, are there potential issues with this?




Aucun commentaire:

Enregistrer un commentaire