jeudi 5 juillet 2018

Random row selection with weighted filters in SQL/PostgreSQL

I have a questions table. The questions need to be filtered according to multiple criteria (subject, institution, area, etc.), each with different weights.

The filters weight are dynamically setted and normalized outside the query. Ex.:

  1. Subject 1 — 0.4
  2. Subject 2 — 0.1
  3. Subject 3 — 0.5
  4. Institution 1 — 0.2
  5. Institution 2 — 0.04
  6. Institution 3 — 0.76
  7. Area 1 — 1

Some other points:

  • The questions table have ~500k rows;
  • The filters are N — N with the questions;
  • After the filtering, I want to limit the returned rows;
  • I’m very concerned with the performance of this query.

To illustrate, if I didn’t want to weight the filters, I would do something like that:

SELECT
    *
FROM
    public.questions q
INNER JOIN
    public.subjects_questions sq
ON
    q.id = sq.question_id
INNER JOIN
    public.subjects s
ON
    s.id = sq.subject_id
INNER JOIN
    public.institutions_questions iq
ON
    iq.question_id = q.id
INNER JOIN
    public.institutions i
ON
    i.id = iq.institution_id
INNER JOIN
    public.areas_questions aq
ON
    aq.question_id = q.id
INNER JOIN
    public.areas a
ON
    a.id = aq.area_id
WHERE
    s.id IN :subjects
AND a.id IN :areas
AND i.id IN :institutions
ORDER BY
    random() limit 200

Desired output:

Question — Subject — Institution — Area

I thought in something along the lines:

  1. Create a CTE with the questions returned by the filter; must consider that the same question can be returned by more than one filter — do I need to evaluate each filter apart and UNION ALL then to solve this? Must assign, too, from what filter the question came from;
  2. Create another CTE with weights and the respective filter associated;
  3. JOIN the CTE’s, but at this point the questions must be grouped and the weights SUMmed;
  4. Apply a Window Function and return the results, limitted to X rows (LIMIT X).

How would you write such query / solve this problem?




Aucun commentaire:

Enregistrer un commentaire