mardi 3 octobre 2017

Can I increase randomness by changing my algorithm?

Here is a generator that I'm currently using:

from random import Random

def shuffle(size):
    """Yield random items from range(size) without replacement."""
    pool = list(range(size))
    rng = Random()
    while pool:
        yield pool.pop(rng.randrange(len(pool)))

As I use this generator, it seems less random than it could be. For example, the first 4 items often all end up in either the first or second half of the result.

An obvious change I could make is this:

def shuffle(size):
    """Yield random items from range(size) without replacement."""
    pool = list(range(size))
    rng = Random()
    while pool:
        i = rng.randrange(len(pool))
        yield pool[i]
        pool[i] = pool[-1]
        del pool[-1]

I prefer the first example for simplicity, but the second one mixes things up a bit more. Is there a way to prove whether or not the second example would be more random, perhaps by citing weaknesses in the Mersenne Twister algorithm (which Python uses)?

If it's not possible to prove anything one way or the other, how would I test both algorithms for randomness? I know I need to write a test with many trials, but I have no idea how to analyze the results.

I don't want to use random.sample, because I want my final list to be partially sorted, and I think a generator is better for that.




Aucun commentaire:

Enregistrer un commentaire