There's business intelligence risk to exposing auto-incremented IDs to users. The methods that I commonly see people using to solve this are:
- use GUID/MD5 (larger and slower than 4 byte INT)
- generate integer, check for collision, repeat (slow, have to lock table)
- expose "public" ID in the application (easy to leak actual ID, might be slow)
Instead, I created a table:
CREATE TABLE `foo` (
`id` int unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;
Then, every time I insert a row, I'd manually insert the ID like this:
insert into `foo` (id) values (
last_insert_id((0xe8e5* last_insert_id()) % power(2, 32))
);
The formula is a linear congruential generator. My understanding is that it would generate a pseudo-random integer between 1 and 2^32, with no collisions unless all 2^32-1 integers are exhausted. The parameter 0xe8e5
is from https://arxiv.org/pdf/2001.05304.pdf
Surprisingly, this even works with batch inserts. I tried inserting 100,000 rows and there weren't collisions.
I haven't seen anyone else do this before, are there potential issues with this?
Aucun commentaire:
Enregistrer un commentaire