Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> They cached my calls to rand() way too often although documentation promised not to do that.

For some DBs (SQL Server definitely), RAND() and similar are handled as if they are deterministic and so are called once per use. For instance:

    SELECT TOP 10 RAND() FROM sys.objects
    SELECT TOP 10 RAND() FROM sys.objects
just returned ten lots of 0.680862566387624 and ten lots of 0.157039657790194.

    SELECT TOP 10 RAND(), RAND(), RAND()-RAND() FROM sys.objects
returns a different value for each column (0.451758385842036 & 0.0652620609942665, -0.536618123021777), so the optimisation is per use not per statement or even per column (if it were per column that last value would be 0, or as close to as floating point arithmetic oddities allow).

This surprises a lot of people when they try “… ORDER BY RAND()” and get the same order on each run.

One workaround for this is to use a non-deterministic function like NEWID(), though you need some extra jiggery-pokery to get a 0≤v<1 value to mimic rand:

    SELECT TOP 10 CAST(CAST(CAST(NEWID() AS VARBINARY(4)) AS BIGINT) AS FLOAT)/(4.0*1024*1024*1024) FROM sys.objects
For the example of sorting, the outer cast is not needed. You might think just using “ORDER BY NEWID()” would be sufficient, but that is an undefined behaviour so you shouldn't rely upon it. It might work now, a quick test has just worked as expected here, but at any point the optimiser could decide it is more efficient to consider all UUIDs as having the same weight for sorting purposes.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: