lundi 12 décembre 2016

How to understand autocorrelations caused by seeding a RNG too much?

In response to this question I ran the following VBA experiment:

Sub Test()
    Dim i As Long, A As Variant
    Dim count1 As Long, count2 As Long
    ReDim A(1 To 10000)

    For i = 1 To 10000
        Randomize
        A(i) = IIf(Rnd() < 0.5, 0, 1)
    Next i

    'count how often A(i) = A(i+1)
    For i = 1 To 9999
        If A(i) = A(i + 1) Then count1 = count1 + 1
    Next i

    For i = 1 To 10000
        A(i) = IIf(Rnd() < 0.5, 0, 1)
    Next i

    'count how often A(i) = A(i+1)
    For i = 1 To 9999
        If A(i) = A(i + 1) Then count2 = count2 + 1
    Next i

   Debug.Print "First Loop: " & count1
   Debug.Print "Second Loop: " & count2 & vbCrLf

End Sub

When I saw output like this:

First Loop: 5550
Second Loop: 4976

I was pretty sure that I knew what was happening: VBA was converting the system clock into something of lower resolution (perhaps microsecond) which as a consequence would lead to Randomize sometimes producing identical seeds in two or more passes through the loop. In my original answer I even confidently asserted this. But then I ran the code some more and noticed that the output was sometimes like this:

First Loop: 4449
Second Loop: 5042

The overseeding is still causing a noticeable autocorrelation -- but in the opposite (and unexpected) direction. Successive passes through the loop with the same seed should produce identical outputs, hence we should see successive values agreeing more often than chance would predict, not disagreeing more often than chance would predict.

Curious now, I modified the code to:

Sub Test2()
    Dim i As Long, A As Variant
    Dim count1 As Long, count2 As Long
    ReDim A(1 To 10000)

    For i = 1 To 10000
        Randomize
        A(i) = Rnd()
    Next i

    'count how often A(i) = A(i+1)
    For i = 1 To 9999
        If A(i) = A(i + 1) Then count1 = count1 + 1
    Next i

    For i = 1 To 10000
        A(i) = Rnd()
    Next i

    'count how often A(i) = A(i+1)
    For i = 1 To 9999
        If A(i) = A(i + 1) Then count2 = count2 + 1
    Next i

   Debug.Print "First Loop: " & count1
   Debug.Print "Second Loop: " & count2 & vbCrLf

End Sub

Which always gives the following output:

First Loop: 0
Second Loop: 0

It seems that it isn't the case that successive calls to Randomize sometimes returns the same seed (at least not often enough to make a difference).

But if that isn't the source of the autocorrelation -- what is? And -- why does it sometimes manifest itself as a negative rather than a positive autocorrelation?




Aucun commentaire:

Enregistrer un commentaire