lundi 24 mai 2021

SQL : select n. random groups from a table

I have a table on Bigquery that looks like this:

authorId text     textid 
--------------------------     
12         bla      234
22         cat      134
22         fish    1312
33         dog      432
33         catcall  442
12         ..  
53
..

The table is very big with more than 100 million author Ids. I want to sample n=1000 random authors and each row from those authors. How can I do that?

The expected results would look like this for n=2

authorId text     textid 
------------------------    
12         bla      234
33         dog      432
33         catcall  442
12         ..  
..

I was thinking to create a list of random authors and then select rows where author id appears in the list, but I am not sure it is the best way to proceed.

In case it was, I can create a table with the unique ids as follows but I don't know how to then subset table1 based on this..

WITH table2 AS 
(
    SELECT authorId  
    FROM table1  
    GROUP BY authorId 
    ORDER BY RAND() 
    LIMIT 1000
)



Aucun commentaire:

Enregistrer un commentaire