mardi 1 juin 2021

Sample from groups proportional to score

I have data with the following structure:

CREATE TABLE if not EXISTS scores (
  id int,
  class char,
  score float
);

INSERT INTO scores VALUES
(1, 'A', 0.5),
(1, 'B', 0.2),
(1, 'C', 0.1),
(2, 'A', 0.1),
(2, 'B', 0.2),
(3, 'D', 0.01),
(4, 'A', 0.5),
(4, 'B', 0.5);

I want to randomly sample, for each id, a class. A possible sample obtained would be:

1,'A'
2,'B'
3,'D'
4,'A'

The logic I want to use for the sample is the following: each class is sampled proportionally to its score. For instance:

  • In id = 1, sampling class 'B' should be twice as likely to be sampled than class 'C'.
  • In id = 2, sampling class 'B' should be twice as likely to be sampled than class 'A'.
  • In id = 3, we should only sample class 'D'.
  • In id = 4, sampling class 'B' should be as likely as sampling class 'A'.

I'm looking for ways of doing this in BigQuery/PostgreSQL. Also, are there solutions with a fixed random seed that can be reproduced?

Thanks!




Aucun commentaire:

Enregistrer un commentaire