jeudi 28 octobre 2021

SQL Server CHOOSE() function behaving unexpectedly with RAND() function

I've encountered an interesting SQL server behaviour while trying to generate random values in T-sql using RAND and CHOOSE functions.

My goal was to try to return one of two given values using RAND() as rng. Pretty easy right?

For those of you who don't know it, CHOOSE function accepts in an index number(int) along with a collection of values and returns a value at specified index. Pretty straightforward.

At first attempt my SQL looked like this:

    select choose(ceiling((rand()*2)) ,'a','b')

To my surprise, this expression returned one of three values: null, 'a' or 'b'. Since I didn't expect the null value i started digging. RAND() function returns a float in range from 0(included) to 1 (excluded). Since I'm multiplying it by 2, it should return values anywhere in range from 0(included) to 2 (excluded). Therefore after use of CEILING function final value should be one of: 0,1,2. After realising that i extended the value list by 'c' to check whether that'd be perhaps returned. I also checked the docs page of CEILING and learnt that:

Return values have the same type as numeric_expression.

I assumed the CEILINGfunction returned int, but in this case would mean that the value is implicitly cast to int before being used in CHOOSE, which sure enough is stated on the docs page:

If the provided index value has a numeric data type other than int, then the value is implicitly converted to an integer.

Just in case I added an explicit cast. My SQL query looks like this now:

    select choose(cast(ceiling((rand()*2)) as int) ,'a','b','c')

However, the result set didn't change. To check which values cause the problem I tried generating the value beforehand and selecting it alongside the CHOOSE result. It looked like this:

    declare @int int = cast(ceiling((rand()*2)) as int)
    select @int,choose( @int,'a','b','c') 

Interestingly enough, now the result set changed to (1,a), (2,b) which was my original goal. After delving deeper in the CHOOSE docs page and some testing i learned that 'null' is returned in one of two cases:

  1. Given index is a null
  2. Given index is out of range

In this case that would mean that index value when generated inside the SELECT statement is either 0 or above 2/3 (I'm assuming that negative numbers are not possible here and CHOOSE function indexes from 1). As I've stated before 0 should be one of possibilities of:

    ceiling((rand()*2))

,but for some reason it's never 0 (at least when i tried it 1 million+ times like this)

    set nocount on
    
    declare @test table(ceiling_rand int)
    declare @counter int = 0
    
    while @counter<1000000
    begin
    insert into @test 
    select ceiling((rand()*2))
    
    set @counter=@counter+1
    end
    
    select distinct ceiling_rand from @test

Therefore I assume that the value generated in SELECT is greater than 2/3 or NULL. Why would it be like this only when generated in SELECT statement? Perhaps order of resolving CAST, CELING or RAND inside SELECT is different than it would seem? It's true I've only tried it a limited number of times, but at this point the chances of it being a statistical fluctuation are extremely small. Is it somehow a floating-point error? I truly am stumbled and looking forward to any explanation.

TL;DR: When generating a random number inside a SELECT statement result set of possible values is different then when it's generated before the SELECT statement.

Cheers, NFSU

EDIT: Formatting




Aucun commentaire:

Enregistrer un commentaire