I have written the following code to perform a three-letter monkey test on the notorious Randu number generator.
from itertools import product
from scipy.stats import chisquare
from numpy.random import randint
multiple = 10
number_of_random_numbers = 17576*multiple #There are 17576 possible three letter words. Therefore it's easy use a sequence length of
# 17576 times some natural number.
expectation = multiple #Given that the null hypothesis is true
# (equivalent to saying: given that the random sequence is iid and uniform) we expect
# that each word occurs once every 17576 times. Therefore the expected occurrence of each
# combinations is equal to the multiple. If we have a random sequence of length 2*17576 we
# expect each word to occur twice.
def seedlcg(init_Val):
global rand
rand = init_Val
def lcg_randu():
a = 65539
c = 0
m = 2**31
global rand
rand = (a*rand + c) % m
return rand/m
seedlcg(30000)
random_sequence = []
for i in range(1,number_of_random_numbers):
random_sequence.append(lcg_randu())
# def collect_words():
# This function collects the observed words from the sequence of random numbers. It
# returns a list that contains the observed words.
def collect_words(random_sequence):
list_of_words = []
for i in range(0,len(random_sequence)):
list_of_words.append((random_sequence[i-2],random_sequence[i-1], random_sequence[i]))
return list_of_words
# def initiate_list_of_possible_words():
# This function initiates a list of all possible words and returns that list.
def initiate_list_of_possible_words():
alphabet = [x for x in range(1,27)]
list_of_possible_words = [letter for letter in product(alphabet, repeat=3)]
return list_of_possible_words
# def count_words(observed_words):
# This function checks how many times each possible word combination occurred. It returns
# a list with the number of occurrences for each word.
def count_words(observed_words):
list_of_possible_words = initiate_list_of_possible_words()
word_count = []
for word in list_of_possible_words:
word_count.append(observed_words.count(word))
return word_count
word_combinations = collect_words(random_sequence) #Collect the observed words from the sequence
word_count = count_words(word_combinations) #Collect how many times each possible word has occurred.
print(chisquare(word_count, multiple)) #Compute and print the Chisquare statistic and the corresponding p-value.
When I run this code I get the following result:
Power_divergenceResult(statistic=117915.47220000002, pvalue=0.0)
This code is supposed to count the occurrences of every possible three letter word in the generated random sequence and then perform a chi-square test. I understand that this code can probably be made to run ten times faster. However, I don't really believe that this chi-square test statistic has a p value of zero (this would mean that the monkey test would reject the the idd-ness of the Randu generator).
Question: Have I made a mistake in my code? Is there something I don't comprehend that results in this extreme p-value?
Aucun commentaire:
Enregistrer un commentaire