I observed the following unexpected behaviour when using ScalaCheck's Gen.pic, which (for me) indicates that its picking is not quite random, even though its documentation says so:
/** A generator that picks a given number of elements from a list, randomly */
I ran the below three little programs in order (in a span of 2 days, at different times, as it might matter) after setting
implicit override val generatorDrivenConfig = PropertyCheckConfig(
maxSize = 1000,
minSize = 1000,
minSuccessful = 1000)
to get a decent sample size.
Program #1
val set = Set(1,2,3,4,5,6,7,8,9,10,
11,12,13,14,15,16,17,18,19,20,
21,22,23,24,25,26,27,28,29,30,
31,32,33,34,35,36,37,38,39,40,
41,42,43,44,45,46,47,48,49,50)
// Thanks to @Jubobs for the solution
// See: http://ift.tt/2qRe4yp
val g = Gen.pick(3, set).map { _.toList }
forAll (g) { s => println(s) }
Out of the 3000 numbers generated at 2 different runs I got a surprisingly similar, and quite non-random distribution (numbers are rounded, only top 5 listed, as for all listing from here on):
- Number: frequency in run #1, frequency in run #2
- 15: 33%, 33%
- 47: 22%, 22%
- 4: 15%, 16%
- 19: 10%, 10%
- 30: 6%, 6%
(Disclaimer: I couldn't find how to create a table here other then this way)
Program 2
val list: List[Int] = List.range(1, 50)
val g = Gen.pick(3, list)
forAll (g) { s => println(s) }
In case of using a List, the numbers seem to get "stuck" at the end of the range (3x1000 numbers in case of both runs):
- 49: 33%, 33%
- 48: 22%, 22%
- 47: 14%, 14%
- 46: 10%, 10%
- 45: 6%, 6%
Interestingly, the frequencies are pretty much the same as in the case of Program 1.
Remark: I repeated the runs for lists up to 10 times, and experienced the very same distribution with +/- 1% differences, just didn't want to list all the numbers here in this strange "table" format.
Program 3
Just to spice up things a bit, I ran a third little snippet, creating the Set (Program 1) from a List (Program 2):
val set: Set[Int] = List.range(1, 50).toSet
val g = Gen.pick(3, set).map { _.toList }
forAll (g) { s => println(s) }
Now the numbers are the same as for Program 2 (List wins!), although the frequencies (again, for 3*1000 numbers in 2 runs) got slightly different at the end:
- 49: 33%, 33%
- 48: 23%, 22%
- 47: 16%, 15%
- 46: 9%, 10%
- 45: 7%, 6%
Question
Even though the sample size is not enough (as it is never enough) to tell true randomness, I can't help but question Gen.pick's claimed randomness (as far as using it out-of-the-box, I might need to set some seed for it to work "more" randomly), since numbers got "stuck", and frequencies are almost the same.
Upon looking at Gen.pick's source code, at line #672 a certain seed0 is used:
def pick[T](n: Int, l: Iterable[T]): Gen[Seq[T]] = {
if (n > l.size || n < 0) throw new IllegalArgumentException(s"invalid choice: $n")
else if (n == 0) Gen.const(Nil)
else gen { (p, seed0) =>
// ...
which I can't find defined anywhere else (in Gen.scala source code, or in scala.util.Random documentation), but I have a hunch it might have something to do with the observed behaviour. Is this expected behaviour of Gen.pick? If so, how can I get "more" random picking?
Aucun commentaire:
Enregistrer un commentaire