jeudi 9 avril 2020

Select random rows according to a given criteria PostgreSQL

I have table user with ten million rows. It has fields: id int4 primary key, rating int4, country varchar(32), last_active timestamp. It has gaps in identifiers. The task is to select five random users for a given country which were active in a period of last two days and has rating in a given range. Is there a tricky way to select them faster than the query below?

SELECT id
FROM user
WHERE last_active > '2020-04-07'
    AND rating between 200 AND 280
    AND country = 'US'
ORDER BY random()
LIMIT 5

It thought about this query:

SELECT id
FROM user
WHERE last_active > '2020-04-07'
    AND rating between 200 AND 280
    AND country = 'US'
    AND id > (SELECT random()*max(id) FROM user)
ORDER BY id ASC
LIMIT 5

but the problem is that there lots of inactive users with small identifier values, the majority of new users are in the end of the id range. So, this query would select some users too often.




Aucun commentaire:

Enregistrer un commentaire