Wednesday, 16 November 2016

How to randomly sample p percent of users in a user event stream in Scala

I am looking for an algorithm that fairly samples p percent of users from a potentially infinite list of users.

A naive algorithm looks something like this:

if (((user.userId.toString.hashCode % 1000)/1000.0) < p) {
    processUser(user)
}

There are issues with this code, though (hashCode may favor shorter strings, modulo arithmetic discretizes the value so the sampled fraction is not exactly p, etc.).

What is the "more correct" way of doing this?
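
One possible direction, offered as a minimal sketch rather than a definitive answer: hash the user ID with a better-mixing function and compare the full 32-bit result against a threshold, instead of reducing it modulo 1000. The sketch below assumes p is a fraction between 0 and 1 (as in the comparison above) and that user IDs can be rendered as strings. The names UserSampler, isSampled, and seed are hypothetical and chosen only for illustration. It uses MurmurHash3 from the Scala standard library. Treating the hash as unsigned also avoids a further problem with the naive version: hashCode can be negative, and % then yields a negative remainder that is always below p.

```scala
import scala.util.hashing.MurmurHash3

object UserSampler {
  // 2^32 as a Double: the number of distinct 32-bit hash values.
  private val HashSpace: Double = 4294967296.0

  /** Returns true for roughly a fraction `p` of distinct user IDs.
    *
    * The decision depends only on (userId, seed), so a given user is either
    * always in the sample or never in it, for every event in the stream.
    * Assumption: `p` is a fraction in [0, 1], not a percentage.
    */
  def isSampled(userId: String, p: Double, seed: Int = 0): Boolean = {
    require(p >= 0.0 && p <= 1.0, "p must be a fraction between 0 and 1")
    // MurmurHash3 mixes its input bits more thoroughly than String.hashCode.
    val hash = MurmurHash3.stringHash(userId, seed)
    // Reinterpret the signed Int as an unsigned value in [0, 2^32).
    val unsigned = hash.toLong & 0xFFFFFFFFL
    // Granularity is 1/2^32 instead of the 1/1000 imposed by `% 1000`.
    unsigned < p * HashSpace
  }
}

// Hypothetical usage with the names from the snippet above:
//   if (UserSampler.isSampled(user.userId.toString, p)) processUser(user)
```

Because the decision is a deterministic function of the ID, the sampled share of distinct users is only approximately p, and it is fair only to the extent that the hash spreads IDs uniformly. Changing the seed selects a different, largely independent subset of users.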



