I guess many as me would always avoid using rand(), or rand()%N because we've been always thought that is pretty bad. I was curious to see how "wrong" random integers generated with c rand()%N effectively are. This is also a follow up of Ryan Reich's answer in How to generate a random integer number from within a range.
The explanation there sounds very convincing to be honest, nevertheless, I thought giving it a try. So, I compare the distributions in a VERY naïve way. I just run both random generators for different numbers of samples and domains. I didn't see the point in computing a density instead of the histograms, so I just computed histograms and just by looking I would say they look both as uniform. Regarding the other point that was raised, about the actual randomness (despite being uniformly distributed). I -again naïvly, compute the permutation entropy for this runs, which is the same for both sample sets, which tell us that there's no difference between both regarding the ordering of the occurrence.
So, for many purposes, it seems to me that rand()%N would be just fine, how can we see their flaws?
Here I show you a very simple, inefficient and not very elegant (but i think correct) way of computing these samples and get the histograms together with the permutation entropies. I show plots for domains (0,i) with i in {5,10,25,50,100} for different number of samples:
There's not much to see on the code I guess, so I will leave here both the C and the matlab code for replication purposes.
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int main(int argc, char *argv[]){
unsigned long max = atoi(argv[2]);
int samples=atoi(argv[3]);
srand(time(NULL));
if(atoi(argv[1])==1){
for(int i=0;i<samples;++i)
printf("%ld\n",rand()%(max+1));
}else{
for(int i=0;i<samples;++i){
unsigned long
num_bins = (unsigned long) max + 1,
num_rand = (unsigned long) RAND_MAX + 1,
bin_size = num_rand / num_bins,
defect = num_rand % num_bins;
long x;
do {
x = rand();
}
while (num_rand - defect <= (unsigned long)x);
printf("%ld\n",x/bin_size);
}
}
return 0;
}
and the matlab code to plot this and compute the PEs (the recursion for the permutations I took it from: https://www.mathworks.com/matlabcentral/answers/308255-how-to-generate-all-possible-permutations-without-using-the-function-perms-randperm):
system('gcc randomTest.c -o randomTest.exe;');
max = 100;
samples = max*10000;
trials = 200;
system(['./randomTest.exe 1 ' num2str(max) ' ' num2str(samples) ' > file1'])
system(['./randomTest.exe 2 ' num2str(max) ' ' num2str(samples) ' > file2'])
a1=load('file1');
a2=load('file2');
uni = figure(1);
title(['Samples: ' num2str(samples)])
subplot(1,3,1)
h1 = histogram(a1,max+1);
title('rand%(max+1)')
subplot(1,3,2)
h2 = histogram(a2,max+1);
title('Integer arithmetic')
as=[a1,a2];
ns=3:8;
H = nan(numel(ns),size(as,2));
for op=1:size(as,2)
x = as(:,op);
for n=ns
sequenceOcurrence = zeros(1,factorial(n));
sequences = myperms(1:n);
sequencesArrayIdx = sum(sequences.*10.^(size(sequences,2)-1:-1:0),2);
for i=1:numel(x)-n
[~,sequenceOrder] = sort(x(i:i+n-1));
out = sequenceOrder'*10.^(numel(sequenceOrder)-1:-1:0).';
sequenceOcurrence(sequencesArrayIdx == out) = sequenceOcurrence(sequencesArrayIdx == out) + 1;
end
chunks = length(x) - n + 1;
ps = sequenceOcurrence/chunks;
hh = sum(ps(logical(ps)).*log2(ps(logical(ps))));
H(n,op) = hh/log2(factorial(n));
end
end
subplot(1,3,3)
plot(ns,H(ns,:),'--*','linewidth',2)
ylabel('PE')
xlabel('Sequence length')
filename = ['all_' num2str(max) '_' num2str(samples) ];
export_fig(filename)
Aucun commentaire:
Enregistrer un commentaire