jeudi 3 novembre 2022

Random Select 2 Records from all categories Groups in MariaDB

I'm trying to return the following result in a MariaDB online compiler (inspired from this old solution Select 2 products from each category in MySQL ):

Return random groups of 2 records from each categories of database test1.csv.

test1 dataset:

id,name,category
150,apple,fruit
153,orange,fruit
187,kiwi,fruit
189,banana,fruit
142,beef,meat
141,pork,meat
143,rabbit,meat
146,chicken,meat
410,pepsi,drinks
413,milk,drinks
418,coffee,drinks
415,mate,drinks
581,carrot,vegetables
586,tomato,vegetables
589,cucumber,vegetables
582,broccoli,vegetables

I reproduce the test on MariaDB Compiler here: https://onecompiler.com/mariadb/3ymv4a6e6

gif

mp4

Code:

-- create
CREATE TABLE test1 (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  category TEXT NOT NULL
);

-- insert
INSERT INTO test1 VALUES (150,'apple','fruit');
INSERT INTO test1 VALUES (153,'orange','fruit');
INSERT INTO test1 VALUES (187,'kiwi','fruit');
INSERT INTO test1 VALUES (189,'banana','fruit');
INSERT INTO test1 VALUES (142,'beef','meat');
INSERT INTO test1 VALUES (141,'pork','meat');
INSERT INTO test1 VALUES (143,'rabbit','meat');
INSERT INTO test1 VALUES (146,'chicken','meat');
INSERT INTO test1 VALUES (410,'pepsi','drinks');
INSERT INTO test1 VALUES (413,'milk','drinks');
INSERT INTO test1 VALUES (418,'coffee','drinks');
INSERT INTO test1 VALUES (415,'mate','drinks');
INSERT INTO test1 VALUES (581,'carrot','vegetables');
INSERT INTO test1 VALUES (586,'tomato','vegetables');
INSERT INTO test1 VALUES (589,'cucumber','vegetables');
INSERT INTO test1 VALUES (582,'broccoli','vegetables');

-- fetch 
SET @prev=0,@rownum=0;
SELECT id, name, category 
FROM (
  SELECT *, 
         IF( @prev <> category, 
             @rownum := 1, 
             @rownum := @rownum+1 
         ) AS rank, 
         @prev := category, 
         @rownum  
  FROM (
    SELECT * FROM test1 
    ORDER BY category, rand()
  ) random_test1
) test1_ranked 
WHERE rank <= 2;

As shown above the code only returns 2 random records and from one single and same category.

An example of intended result on given dataset test1 would be:

2 records (or more, randomly selected) x 4 categories (all available categories, will be much more in production dataset)

For example:

150,apple,fruit
153,orange,fruit
142,beef,meat
141,pork,meat
418,coffee,drinks
415,mate,drinks
589,cucumber,vegetables
582,broccoli,vegetables

What is the way to control/expand the random selection to all categories and not just to a single one?

I looked at the order By and the rank and Rand docs but I couldn't find mentions for subcategories/groups application.

I found a similar python and pandas example of the exact expected result: 5. Pandas sample rows by group

df.groupby('color').apply(lambda x: x.sample(n=3)).reset_index(drop = True)

How to achieve the same result in MariaDb?

Your help is appreciated.




Aucun commentaire:

Enregistrer un commentaire