mercredi 2 mars 2016

18 trillion coin tosses, where did I go wrong?

A little fun to distract you from all the important stuff...

Re-reading Iain M. Banks' seminal science fiction novel Consider Phlebas, I took issue with Banks' assertion that "toss 18 trillion coins in the air and a few of them are going to keep landing the same way up for a long, long time". I think not, after all 2^44 = 17,592,186,044,416, so I'd expect the longest same side sequence to be somewhere in the low to mid 40's, and probably 44.

I always enjoy doing little experiments and so, to see what the longest sequence I would actually get was, I wrote a Python program to count the sequences of the same side in a row for 18 trillion coin tosses and did a test run of 1,000,000 to see how long it would take. Oops, Python was a poor choice, 18 trillion 'coin tosses' would have taken 229.2 days. Umm, I thought, this calls for C.

A few minutes later the C code was written and I'd worked out that it would take only 2.5 days to run on my powerful Linux desktop. I didn't really want to leave my desktop on for all that time so I copied the code to a 24/7 Linux server that I have an account on, and worked out that it would take 3.6 days to run on that, that's doable I thought and started it running.

Some days later, earlier today in fact, I took a look at the progress and saw that after 15,680,103,708,266 (15.7 trillion) coin tosses the longest same side in a row sequence so far was only 29.

Odd, I thought, that's way off what I should have, and I remembered that in some test runs on my desktop I had got sequences of over 30 while still in the low tens of billions of coin tosses. I ran it again on my desktop to see and, sure enough, after only 4,684,457,663 (4.7 billion) coin tosses the longest sequence was already 31, since 2^31 = 2,147,483,648 that sounded about right.

So why have I got a sequence of only 29 on the server after 15.7 trillion coin tosses but a sequence of 31 after only 4.7 billion on my desktop?

Modulo bias was my first thought. I wrote some code to print RAND_MAX - both desktop and server gave me the same value, 2,147,483,647 (a 32 bit signed long). So the rand() function will give me a number 0 <= rand() <= 2,147,483,647. 0 is even and 2,147,483,647 is odd, so unless I'm very much mistaken there's no modulo bias introduced by my int rand_num = (rand() % 2); line of code.

Then I thought about how the C standard library's pseudo-random number generator is not considered adequate for cryptography. Surely that could not be a factor when generating, admittedly really rather long, sequences of zeros and ones. Could it?

I compiled the code with the same gcc command on both desktop and server, the version of gcc that's running on the desktop is 4.8.4, on the server it's 4.9.2. Not knowing how to find which version of the C libraries both machines had I settled on diffing the 2 versions of /usr/include/stdlib.h, they were identical. The desktop's OS is Linux Mint 17, the server's OS is Debian 8. But I can't imagine these things would matter.

Clearly I've gone wrong somewhere, and my C is somewhat rusty to say the least, but for the life of my I can't see what the problem is. What was supposed to be a 15 minute experiment (with the results to look at 3 days later) has left me baffled but determined to resolve.

Can you C gurus enlighten me please? Many thanks.

Here's the source:

Compiled on both machines using: gcc -O3 -o 18TCT 18TrillionCoinTosses.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char* argv[])
{
    srand(time(NULL));

    int current_seq = 0;
    int longest_seq = 0;
    int prev_rand_num = -1;

    long long i = 0;
    long long total = 18000000000000;

    // To serve as a rudimentary progress indicator.
    long billion_counter = 0;
    long billion = 1000000000;

    while (i < total)
    {
        int rand_num = (rand() % 2);

        if (rand_num == prev_rand_num)
        {
            current_seq++;

            if (current_seq >= longest_seq)
            {
                longest_seq = current_seq;
                printf("Longest sequence so far: %d (on iteration %lli)\n", longest_seq, i);
            }
        }
        else
            current_seq = 1;

        if (billion_counter == billion)
        {
            billion_counter = 0;
            printf("Progress report, current iteration: %lli\n", i);
        }

        prev_rand_num = rand_num;

        i++;
        billion_counter++;
    }

    printf("\nTotal coins tossed: %lli\n", i);
    printf("Longest sequence: %d\n", longest_seq);
}




Aucun commentaire:

Enregistrer un commentaire