mercredi 18 mai 2016

Random generator unexpected behavior

I'm working with the System.Random library and I've come across some behavior that I don't fully understand. The shuffle function below is an implementation of the Fischer-Yates shuffle that also works as a random sample without replacement. E.g. calling shuffle with a list and the length of the list will shuffle the whole list, but calling it with a list and the number 2 should extract a random sample of length 2.

import           Control.Monad               as M
import           Control.Monad.ST
import           Data.Vector.Unboxed         as VU
import           Data.Vector.Unboxed.Mutable as VUM
import           System.Random

go = do
  g <- newStdGen
  let (rand_vec1, g1) = randVector 10 g
  let (rand_vec2, g2) = randVector 10 g
  let (rand_sample1, g3) = shuffle rand_vec1 2 g
  let (rand_sample2, g4) = shuffle rand_vec1 2 g
  print rand_vec1
  print rand_vec2
  print rand_sample1
  print rand_sample2

randVector :: (RandomGen g) => Int -> g -> (VU.Vector Int, g)
randVector n = shuffle vector (VU.length vector) where
  vector = VU.enumFromN 0 n

shuffle :: (RandomGen g, Unbox a) => VU.Vector a -> Int -> g -> (VU.Vector a, g)
shuffle li size g = runST $ do
  vector <- VU.unsafeThaw li
  let n = VUM.length vector - 1
  let step g i = do
              let (j,g') = randomR (0,n) g
              VUM.swap vector i j
              return g'
  g' <- M.foldM step g [0..size-1]
  v' <- VU.unsafeFreeze vector
  let vec = VU.take size v'
  return (vec, g')

I notice that rand_vec1 and rand_vec2 are always identical, which is to be expected, since the same random number generator is used.

However, rand_sample1 and rand_sample2 differs even though they both use the same random generator. Even stranger, more than half the time, but not always, rand_sample2 just contains the two first numbers of the vector being sampled from (like in the example below). How come? Example output:

[3,0,4,9,7,2,1,8,5,6]

[3,0,4,9,7,2,1,8,5,6]

[9,2]

[3,0]

(Also, code review is appreciated)




Aucun commentaire:

Enregistrer un commentaire