random: Amazon Redshift - SQL - behavior of RANDOM() when called in multiple ROW

vendredi 15 janvier 2016

Amazon Redshift - SQL - behavior of RANDOM() when called in multiple ROW_NUMBER() ORDER BY clauses

Wondering whether anyone has some insight to this odd behavior I'm experiencing with an Amazon Redshift query, involving multiple ROW_NUMBER() OVER (ORDER BY RANDOM()) columns in a single table (a simple way to draw several random samples from a table).

Minimal working example: let's say I have a table data with one column id and 10 rows. This query:

SELECT id,
       ROW_NUMBER() OVER (ORDER BY RANDOM()) AS rk1,
       ROW_NUMBER() OVER (ORDER BY RANDOM()) AS rk2
FROM data ORDER BY rk1

always gives me something like:

id  rk1 rk2
5   1   1
3   2   2
7   3   3
10  4   4
6   5   5
8   6   6
1   7   7
9   8   8
2   9   9
4   10  10

when I would've expected something like:

id  rk1 rk2
5   1   6
3   2   3
7   3   9
10  4   1
6   5   8
8   6   5
1   7   7
9   8   10
2   9   4
4   10  2

That is, it seems like Redshift is only calculating RANDOM() once on each row despite the two calls, so the orders are always the same. But this is weird because this simple query:

SELECT RANDOM(), RANDOM()

returns something like:

random              random
0.87943100551116793 0.23245479410162884

as expected. So what's going on here? Does it have to do with the call to RANDOM() being in a window function?

Not really a critical issue for my needs, as I can just do these random ranks in separate queries and join, but this would have been a super simple way to do it.

Couldn't find anything on the site that resembled this, although apologies if it's already been asked and addressed. Thanks!!

random

vendredi 15 janvier 2016

Amazon Redshift - SQL - behavior of RANDOM() when called in multiple ROW_NUMBER() ORDER BY clauses

Aucun commentaire:

Enregistrer un commentaire