I have a data set with objects x_1, ..., x_N, where object x_i appears c_i times in the data. I would like to sample efficiently from the resulting distribution, so that object x_i is selected with probability c_i/c, where c = c_1 + ... + c_N.
This must be a well-known problem, but I wasn't able to find a good algorithm for it. What is the most efficient way to do this when N is on the order of a few million?
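To make the setup concrete, here is a minimal sketch of one common approach to count-weighted sampling: precompute the running totals of the counts once, then binary-search them for each draw. It assumes the counts are non-negative integers with a positive total. The function names (`build_cumulative`, `locate`, `sample_index`) are made up for illustration and are not from any particular library.

```python
import bisect
import itertools
import random


def build_cumulative(counts):
    """Return the running totals c_1, c_1 + c_2, ..., c (O(N), done once)."""
    return list(itertools.accumulate(counts))


def locate(cumulative, r):
    """Map an integer r in [0, c) to the first index whose running total exceeds r."""
    return bisect.bisect_right(cumulative, r)


def sample_index(cumulative, rng=random):
    """Draw index i with probability counts[i] / c, in O(log N) per draw."""
    total = cumulative[-1]
    r = rng.randrange(total)  # uniform integer in [0, total)
    return locate(cumulative, r)


# Hypothetical example: three objects appearing 2, 1 and 3 times.
counts = [2, 1, 3]
cumulative = build_cumulative(counts)
draws = [sample_index(cumulative) for _ in range(10)]
```

With this layout, setup costs O(N) time and memory and each draw costs O(log N), which is around 20 to 25 comparisons for a few million objects. If per-draw cost matters more than simplicity, Walker's alias method is a known alternative with O(N) setup and O(1) per draw.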