Thursday, 27 August 2015

How to retrieve a given number of random hits?

I have a database from which I would like to retrieve 100 randomly distributed documents that match either of two should constraints.

If I simply query with "size": 100, I get 100 hits that all correspond to only one of the constraints. Fair enough: this probably reflects how Elasticsearch indexed the data, and the result is still correct (100 documents that match either of the constraints).

To check that the query body matches what I expect, I set size to 100000 (more than the total number of documents) and got about 40k documents, roughly 20k for each constraint. This is consistent with the data.

As for the random part, I know about random scoring; a good example was given in an answer to another question.

Now, combining both, i.e. "size": 100 and random_score, I get, as in the first case, 100 documents matching only one of the two constraints. So there is no apparent randomness.
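For reference, here is a minimal sketch of what such a combined request body could look like, written as a Python dictionary. The field names and values (field_a, value_a, field_b, value_b) and the helper name build_random_query are hypothetical placeholders, not taken from my actual index, and the exact random_score syntax may differ between Elasticsearch versions. It wraps the two should clauses in a function_score query and sets boost_mode to "replace" so that the random value replaces the relevance score rather than being multiplied with it.

```python
import json


def build_random_query(clause_a, clause_b, size=100, seed=None):
    """Build a search body that scores bool/should matches randomly.

    clause_a and clause_b are query clauses (dicts); the ones passed
    below are hypothetical placeholders.
    """
    # An empty random_score object lets the server pick its own seed.
    random_score = {} if seed is None else {"seed": seed}
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "should": [clause_a, clause_b],
                        "minimum_should_match": 1,
                    }
                },
                "functions": [{"random_score": random_score}],
                # Use the random value as the score instead of
                # multiplying it with the relevance score.
                "boost_mode": "replace",
            }
        },
    }


body = build_random_query(
    {"term": {"field_a": "value_a"}},  # hypothetical constraint 1
    {"term": {"field_b": "value_b"}},  # hypothetical constraint 2
    seed=12345,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. Whether this ordering is applied before or after the size cut is exactly what I am asking about.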

It therefore looks as if the randomization is done internally in ES after size has been applied. Is that a correct conclusion?

If so, the "random" part would only apply to the order of the returned results, and not to the query, which is not what I understood from the documentation.


If the above conclusion is correct, the solution would be to first fetch all 20k + 20k = 40k hits, then shuffle and slice them locally. This is feasible for such a small number of documents, but it is neither efficient nor scalable, especially given that random_score exists.
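The local fallback described above could be sketched roughly as follows. The list all_hits is synthetic stand-in data shaped like the roughly 40k hits (20k per constraint), and sample_hits is a hypothetical helper name; in practice the list would come from the search response.

```python
import random
from collections import Counter


def sample_hits(hits, k=100, seed=None):
    """Return k hits chosen uniformly at random, without replacement."""
    rng = random.Random(seed)
    # Guard against asking for more hits than were returned.
    return rng.sample(hits, min(k, len(hits)))


# Synthetic stand-in for the ~40k hits: 20k per constraint.
all_hits = [
    {"_id": i, "constraint": "a" if i < 20000 else "b"}
    for i in range(40000)
]

picked = sample_hits(all_hits, k=100, seed=0)

# Count how many sampled hits came from each constraint.
print(Counter(hit["constraint"] for hit in picked))
```

This works, but it means transferring every matching document to the client just to keep 100 of them.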



