I've seen plenty of posts advocating for the use of something along the lines of SELECT * FROM tbl ORDER BY -LOG(RAND())/weight LIMIT 1;
to select a random entry from a table. However, this seems terribly inefficient to me, since we have to run through the entire table and generate random numbers for each before sorting. This answer seems to be more of a step in the right direction assuming that the overall sum is calculated beforehand, and now we've got ourselves a linear search.
Still, it's definitely possible to do better, but the methods that come to mind all seem to require sketchy manoeuvres.
- We could store a cumulative weight distribution, generate a random number between 0 and max of
weights
and use an index onweights
coupled withBETWEEN
to find posts. However, deleting or moving entries in the middle requires a lot of work updating the weights after it. - We could fragment the table into
sqrt(n)
smaller tables and calculate the sum of the weights within it. We first search through these ranges until we arrive at the one containing our selected random number, then perform a linear search on that table. However, having so many tables for largen
seems like bad database design, and ideally I'd want to get it down to logarithmic time instead ofO(sqrt(n))
.
I've been struggling with this problem for a while now, and this is the best I've come up with. Any other ideas?
Aucun commentaire:
Enregistrer un commentaire