It seems that in Haskell, the behavior of split
can be strongly dependent on the chosen (pseudo)random number generator (PRNG).
I was drawn to split
by looking at various ways within the API to generate random numbers.
Say we take the comparatively recent Threefish generator.
We will need:
λ> import System.Random
λ> import System.Random.TF
λ> import Control.Monad.Random
Next, we generate sequences of 20 two-digits numbers in 3 different ways:
λ> tg0 = mkTFGen 42
λ>
λ> evalRand (sequence (replicate 20 $ getRandomR (10,99))) tg0
[62,99,78,18,38,21,54,25,54,94,62,57,55,69,98,78,99,73,59,24]
λ>
λ> take 20 evalRand (sequence (repeat getRandomR (10,99))) tg0
[62,99,78,18,38,21,54,25,54,94,62,57,55,69,98,78,99,73,59,24]
λ>
λ> take 20 $ evalRand (getRandomRs (10,99)) tg0
[62,99,78,18,38,21,54,25,54,94,62,57,55,69,98,78,99,73,59,24]
λ>
We get 3 times the same sequence, regardless of whether we're take'ing a prefix of an unlimited sequence or not.
But this is not so for the “standard” StdGen generator. Here:
λ>
λ> sg0 = mkStdGen 42
λ>
λ> evalRand (sequence (replicate 20 $ getRandomR (10,99))) sg0
[69,79,35,32,84,89,76,39,16,73,73,37,34,96,12,23,45,59,31,34]
λ>
λ> take 20 evalRand (sequence (repeat getRandomR (10,99))) sg0
[69,79,35,32,84,89,76,39,16,73,73,37,34,96,12,23,45,59,31,34]
λ>
λ> take 20 $ evalRand (getRandomRs (10,99)) sg0
[33,66,18,33,73,38,73,35,59,82,42,44,48,20,58,21,89,14,43,14]
λ>
So here, the third sequence differs from the other ones. It turns out that this is because getRandomRs calls split
on the current generator.
This can be checked directly in this way:
λ>
λ> (sg0a, sg0b) = split sg0
λ> (tg0a, tg0b) = split tg0
λ>
λ>: {
|λ> let { getRandomDouble::RandomGen g => g -> Double ;
|λ> getRandomDouble g = fst $ random g }
|λ>: }
λ>
λ> λ>
λ> getRandomDouble tg0
0.6499718678721916
λ> getRandomDouble tg0a
0.6499718678721916
λ>
λ> getRandomDouble tg0b
0.7724686371301966
λ>
λ> getRandomDouble sg0
1.0663729393723398e-2
λ> getRandomDouble sg0a
0.36531519389010025
λ>
λ> getRandomDouble sg0b
0.7740913257381021
λ>
So, in the case of the Threefish generator, the left component of the split (used by getRandomRs
) generates the same sequence as the original. For the StdGen generator, it generates a different sequence, leading to the surprising result noted in the beginning.
The source code for the stdSplit
function is here. It includes a comment:
-- no statistical foundation for this!
Question: is the behavior of stdSplit
legal (in conformity with the language standard) and/or statistically legitimate? I understand there might be a necessity not to break existing client code, but are there guidelines about this to be applied to newer PRNGs?
__Note: __ It is rather unusual for a PRNG API to provide a split
function. It is more common to provide an advance
function, like here in Python/NumPy for example, and extra generators can be created as required using that advance
function. For some PRNGs, the advance
function has a cost of only O(log(numSteps)).
Aucun commentaire:
Enregistrer un commentaire