jeudi 24 mars 2016

Random function in DB2 is not evenly distributed

I find this completely astounding but the rand() function in DB2 occasionally returns a value of one. Consider this select against a table that has about 150K rows in it:

select integer(rand()*10) as Num, count(*) as N
from TabWithAbout150KRows
group by integer(rand()*10)
order by 1 desc;

In most languages/DB's, etc, I'd expect this to return 10 rows of data, with the distribution being roughly equal. What I actually get is 11 rows, as in the following:

Num       N
---   -----
10       12 
9     14871 
8     14975 
7     15213 
6     15004 
5     15196 
4     14998 
3     14916 
2     14926 
1     15081 
0     15017 

Shocking! In my use case I'm updating rows in a table and want to assign a random value, but it needs to be randomly distributed as opposed to the horrible situation above.

So I'm currently thinking I'll have to do the update multiple times in a loop, continuing in the 2nd...nth iterations to try again for the rows that were unlucky enough to end up with rand()=1.0

Or, I could use rand()/1.00001, but this is just silly (and not evenly distributed, either)!

Any ideas on a better way to approach this (without, for example, writing UDF's, etc, would be appreciated).




Aucun commentaire:

Enregistrer un commentaire