dimanche 7 mai 2017

Is ScalaCheck's Gen.pick really random?

I observed the following unexpected behaviour when using ScalaCheck's Gen.pic, which (for me) indicates that its picking is not quite random, even though its documentation says so:

/** A generator that picks a given number of elements from a list, randomly */

I ran the below three little programs in order (in a span of 2 days, at different times, as it might matter) after setting

implicit override val generatorDrivenConfig = PropertyCheckConfig(
  maxSize = 1000, 
  minSize = 1000, 
  minSuccessful = 1000)

to get a decent sample size.

Program #1

val set = Set(1,2,3,4,5,6,7,8,9,10,
      11,12,13,14,15,16,17,18,19,20,
      21,22,23,24,25,26,27,28,29,30,
      31,32,33,34,35,36,37,38,39,40,
      41,42,43,44,45,46,47,48,49,50)

// Thanks to @Jubobs for the solution
// See: http://ift.tt/2qRe4yp
val g = Gen.pick(3, set).map { _.toList }
forAll (g) { s => println(s) }

Out of the 3000 numbers generated at 2 different runs I got a surprisingly similar, and quite non-random distribution (numbers are rounded, only top 5 listed, as for all listing from here on):

  • Number: frequency in run #1, frequency in run #2
  • 15: 33%, 33%
  • 47: 22%, 22%
  • 4: 15%, 16%
  • 19: 10%, 10%
  • 30: 6%, 6%

(Disclaimer: I couldn't find how to create a table here other then this way)

Program 2

val list: List[Int] = List.range(1, 50)
val g = Gen.pick(3, list)
forAll (g) { s => println(s) }

In case of using a List, the numbers seem to get "stuck" at the end of the range (3x1000 numbers in case of both runs):

  • 49: 33%, 33%
  • 48: 22%, 22%
  • 47: 14%, 14%
  • 46: 10%, 10%
  • 45: 6%, 6%

Interestingly, the frequencies are pretty much the same as in the case of Program 1.

Remark: I repeated the runs for lists up to 10 times, and experienced the very same distribution with +/- 1% differences, just didn't want to list all the numbers here in this strange "table" format.

Program 3

Just to spice up things a bit, I ran a third little snippet, creating the Set (Program 1) from a List (Program 2):

val set: Set[Int] = List.range(1, 50).toSet
val g = Gen.pick(3, set).map { _.toList }
forAll (g) { s => println(s) }

Now the numbers are the same as for Program 2 (List wins!), although the frequencies (again, for 3*1000 numbers in 2 runs) got slightly different at the end:

  • 49: 33%, 33%
  • 48: 23%, 22%
  • 47: 16%, 15%
  • 46: 9%, 10%
  • 45: 7%, 6%

Question

Even though the sample size is not enough (as it is never enough) to tell true randomness, I can't help but question Gen.pick's claimed randomness (as far as using it out-of-the-box, I might need to set some seed for it to work "more" randomly), since numbers got "stuck", and frequencies are almost the same.

Upon looking at Gen.pick's source code, at line #672 a certain seed0 is used:

def pick[T](n: Int, l: Iterable[T]): Gen[Seq[T]] = {
    if (n > l.size || n < 0) throw new IllegalArgumentException(s"invalid choice: $n")
    else if (n == 0) Gen.const(Nil)
    else gen { (p, seed0) =>
    // ...

which I can't find defined anywhere else (in Gen.scala source code, or in scala.util.Random documentation), but I have a hunch it might have something to do with the observed behaviour. Is this expected behaviour of Gen.pick? If so, how can I get "more" random picking?




Aucun commentaire:

Enregistrer un commentaire