I have a table partitioned by day, with each partition containing almost 80M rows.
I want to take a random sample of 100,000 rows from each daily partition for a particular month.
Currently I do this by ranking the rows within each partition, ordered by rand(), and then filtering on the rank, but it takes 45 to 60 minutes.
Is there a faster way to do the same thing without compromising on the quality of the sample?
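For illustration, here is a minimal Python sketch of the rank-based approach described above. It assumes rows arrive as (partition_key, row) pairs; the function name sample_per_partition and this row layout are hypothetical and not taken from the original query. The sketch shows where the cost comes from: every row in every partition gets a random key, and each partition is sorted in full before all but k rows are thrown away.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple, Any


def sample_per_partition(
    rows: Iterable[Tuple[Hashable, Any]],
    k: int,
    seed: Optional[int] = None,
) -> Dict[Hashable, List[Any]]:
    """Keep up to k random rows per partition by ranking on a random key.

    Roughly mirrors: rank() over (partition by day order by rand()) <= k.
    (Hypothetical helper for illustration; not the original query.)
    """
    rng = random.Random(seed)
    # Group rows by partition key, attaching a random sort key to each row.
    keyed: Dict[Hashable, List[Tuple[float, Any]]] = defaultdict(list)
    for part, row in rows:
        keyed[part].append((rng.random(), row))

    result: Dict[Hashable, List[Any]] = {}
    for part, items in keyed.items():
        # Full sort of the partition on the random key only
        # (the rows themselves need not be comparable).
        items.sort(key=lambda pair: pair[0])
        # Keep the rows whose rank is <= k.
        result[part] = [row for _, row in items[:k]]
    return result


if __name__ == "__main__":
    # Three small "daily" partitions of 1,000 rows each.
    data = [(day, (day, i)) for day in range(1, 4) for i in range(1000)]
    sample = sample_per_partition(data, k=10, seed=42)
    for day, picked in sorted(sample.items()):
        print(day, len(picked))
```

With 80M rows per partition, the full sort in this pattern touches every row even though only 100,000 survive, which is consistent with the long runtimes described.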