I have data with the following structure:
CREATE TABLE if not EXISTS scores (
id int,
class char,
score float
);
INSERT INTO scores VALUES
(1, 'A', 0.5),
(1, 'B', 0.2),
(1, 'C', 0.1),
(2, 'A', 0.1),
(2, 'B', 0.2),
(3, 'D', 0.01),
(4, 'A', 0.5),
(4, 'B', 0.5);
I want to randomly sample, for each id, a class. A possible sample obtained would be:
1,'A'
2,'B'
3,'D'
4,'A'
The logic I want to use for the sample is the following: each class is sampled proportionally to its score. For instance:
- In
id = 1
, sampling class 'B' should be twice as likely to be sampled than class 'C'. - In
id = 2
, sampling class 'B' should be twice as likely to be sampled than class 'A'. - In
id = 3
, we should only sample class 'D'. - In
id = 4
, sampling class 'B' should be as likely as sampling class 'A'.
I'm looking for ways of doing this in BigQuery/PostgreSQL. Also, are there solutions with a fixed random seed that can be reproduced?
Thanks!
Aucun commentaire:
Enregistrer un commentaire