I have a questions table. The questions need to be filtered according to multiple criteria (subject, institution, area, etc.), each with different weights.
The filters weight are dynamically setted and normalized outside the query. Ex.:
- Subject 1 — 0.4
- Subject 2 — 0.1
- Subject 3 — 0.5
- Institution 1 — 0.2
- Institution 2 — 0.04
- Institution 3 — 0.76
- Area 1 — 1
Some other points:
- The questions table have ~500k rows;
- The filters are N — N with the questions;
- After the filtering, I want to limit the returned rows;
- I’m very concerned with the performance of this query.
To illustrate, if I didn’t want to weight the filters, I would do something like that:
SELECT
*
FROM
public.questions q
INNER JOIN
public.subjects_questions sq
ON
q.id = sq.question_id
INNER JOIN
public.subjects s
ON
s.id = sq.subject_id
INNER JOIN
public.institutions_questions iq
ON
iq.question_id = q.id
INNER JOIN
public.institutions i
ON
i.id = iq.institution_id
INNER JOIN
public.areas_questions aq
ON
aq.question_id = q.id
INNER JOIN
public.areas a
ON
a.id = aq.area_id
WHERE
s.id IN :subjects
AND a.id IN :areas
AND i.id IN :institutions
ORDER BY
random() limit 200
Desired output:
Question — Subject — Institution — Area
I thought in something along the lines:
- Create a CTE with the questions returned by the filter; must consider that the same question can be returned by more than one filter — do I need to evaluate each filter apart and UNION ALL then to solve this? Must assign, too, from what filter the question came from;
- Create another CTE with weights and the respective filter associated;
- JOIN the CTE’s, but at this point the questions must be grouped and the weights SUMmed;
- Apply a Window Function and return the results, limitted to X rows (LIMIT X).
How would you write such query / solve this problem?
Aucun commentaire:
Enregistrer un commentaire