I have a table on Bigquery that looks like this:
authorId text textid
--------------------------
12 bla 234
22 cat 134
22 fish 1312
33 dog 432
33 catcall 442
12 ..
53
..
The table is very big with more than 100 million author Ids. I want to sample n=1000 random authors and each row from those authors. How can I do that?
The expected results would look like this for n=2
authorId text textid
------------------------
12 bla 234
33 dog 432
33 catcall 442
12 ..
..
I was thinking to create a list of random authors and then select rows where author id appears in the list, but I am not sure it is the best way to proceed.
In case it was, I can create a table with the unique ids as follows but I don't know how to then subset table1 based on this..
WITH table2 AS
(
SELECT authorId
FROM table1
GROUP BY authorId
ORDER BY RAND()
LIMIT 1000
)
Aucun commentaire:
Enregistrer un commentaire