I'm generating random IDs in javascript which serve as unique message identifiers for an analytics suite.
When checking the data (more than 10MM records), there are some minor collisions for some IDs for various reasons (network retries, robots faking data etc), but there is one in particular which has an intriguing number of collisions: akizow-dsrmr3-wicjw1-3jseuy.
This is the algorithm used to generate the random IDs:
function generateUUID() {
return [
generateUUID4(), generateUUID4(), generateUUID4(), generateUUID4()
].join("-");
}
function generateUUID4() {
return Math.abs(Math.random() * 0xFFFFFFFF | 0).toString(36);
}
I reversed the algorithm and it seems like for akizow-dsrmr3-wicjw1-3jseuy the browser's Math.random() is returning the following four numbers in this order: 0.1488114111471948, 0.19426893796638328, 0.45768366415465334, 0.0499740378116197, but I don't see anything special about them. Also, from the other data I collected it seems to appear especially after a redirect/preload (e.g. google results, ad clicks etc).
So I have 3 hypotheses:
- There's a statistical problem with the algorithm that causes this specific collision
- Redirects/preloads are somehow messing with the seed of the pseudo-random generator
- A robot is smart enough that it fakes all the other data but for some reason is keeping the random id the same. The data comes from different user agents, IPs, countries etc.
Any idea what could cause this collision?
Aucun commentaire:
Enregistrer un commentaire