I am looking for an algorithm that fairly samples p percent of users from a potentially infinite list of users.
A naive algorithm looks something like this:
    if (((user.userId.toString.hashCode % 1000) / 1000.0) < p) {
      processUser(user)
    }
There are issues with this code, though: hashCode may favor shorter strings, and the modulo arithmetic discretizes the value, so the sampled fraction is not exactly p.
What is the "more correct" way of doing this?
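For reference, here is a rough sketch of one direction I have been considering, not a definitive answer. It swaps `hashCode` for `MurmurHash3` from the Scala standard library and uses all 32 hash bits instead of `% 1000`, which should reduce both the bias and the discretization. It assumes p is a fraction in [0, 1] (0.1 for 10 percent); the names `HashSampler`, `unitInterval`, `isSampled`, and the default seed are made up for illustration.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical helper: deterministic, hash-based sampling of user ids.
object HashSampler {
  // Map a user id to a repeatable value in [0, 1) using all 32 hash bits.
  def unitInterval(userId: String, seed: Int = 42): Double = {
    val h = MurmurHash3.stringHash(userId, seed)
    // Reinterpret the signed Int as an unsigned 32-bit value, then divide by 2^32.
    (h.toLong & 0xFFFFFFFFL).toDouble / 4294967296.0
  }

  // p is a fraction in [0, 1]; the same id always gets the same decision.
  def isSampled(userId: String, p: Double, seed: Int = 42): Boolean =
    unitInterval(userId, seed) < p
}
```

Usage would be something like `if (HashSampler.isSampled(user.userId.toString, p)) processUser(user)`. Note that this makes an independent decision per user, so over a long stream the sampled fraction is only approximately p, and the granularity is 1/2^32 rather than 1/1000.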