vendredi 4 septembre 2015

Select N unique rows randomly while excluding 1 specific row

I need to select x unique rows randomly from a table with n rows, while excluding 1 specific row. (x is small, 3 for example) This can be done in several queries if needed and I can also compute anything in programming language (Java). The one important thing is that it must be done faster than O(n), consuming O(x) memory and indefinite looping (retrying) is also undesirable.

I tried the following approach:

  • get row count (= n);
  • get the position of the excluded row by something like select count(*) where id < excluded_id, assuming id is monotonically increasing;
  • select the numbers from 0..n, obeying all the rules, by using some "clever" algorithms, it's something like O(x), in other words fast enough;
  • select these x rows one by one by using limit(index, 1) SQL clause.

However, it turned out that it's possible for rows to change positions (I'm not sure why), so the auto-generated ids are not monotonically increasing. And in this case the second step (get the position of the excluded row) produces wrong result and the algorithm fails to do its job correctly.

How can this be improved?

Also, if this is vastly easier with a different SQL-like database system, it would be interesting, too. (the DB is on a server and I can install any software there as long as it's compatible with Ubuntu 14.04 LTS)




Aucun commentaire:

Enregistrer un commentaire