Friday, December 21, 2018

How to store random bytes in a file using character encoding?

I'm trying to run someone else's Python 2 program on Python 3 (on Windows 7). Its purpose is to generate large factorials and then use them as a stream of random numbers. The program converts a decimal factorial to byte values from 0 to 255 and writes chr(byte value) to a file. It computes each byte by moving through the factorial in sections of 8 decimals. However, string encoding changed between Python 2 and 3 (I'm not sure exactly what changed or why it matters), and writing the result of chr() fails for any value from 128 to 159 (values 160 to 255 work). The program raises "UnicodeEncodeError: 'charmap' codec can't encode character '(the character point)' in position 0: character maps to <undefined>".
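The failing range can be reproduced without the original program. This is a minimal sketch, assuming the default file encoding on this Windows setup is cp1252 (the 'charmap' wording in the error suggests it, but the post does not confirm it). The helper name encodable_in_cp1252 is made up for illustration.

```python
def encodable_in_cp1252(value):
    """Return True if chr(value) can be encoded with the cp1252 codec."""
    try:
        chr(value).encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

# Collect every value in 0..255 whose character cp1252 cannot encode.
failing = [v for v in range(256) if not encodable_in_cp1252(v)]
print(failing[0], failing[-1], len(failing))
```

Under that assumption, the values that fail are exactly 128 to 159, which matches the behaviour described above.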

I have tried changing the file encoding with open(filename, "w", encoding="utf-8"), and this writes all the values without raising an error. However, when I test the file's randomness properties, they are significantly worse than the results the author got.
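A likely reason for the difference is that UTF-8 does not store one byte per character for values of 128 and above. The sketch below is not taken from the original program; it only compares the raw byte values with the UTF-8 encoding of the same values passed through chr().

```python
values = list(range(256))

# One byte per value, which is what a random-byte file needs.
raw_bytes = bytes(values)

# What a text-mode file with encoding="utf-8" stores for the same values.
utf8_bytes = "".join(chr(v) for v in values).encode("utf-8")

# Values 128..255 each become two bytes, and the first of the two
# is always one of the same two lead bytes.
lead_bytes = {b for b in utf8_bytes if b >= 0xC0}

print(len(raw_bytes), len(utf8_bytes), sorted(hex(b) for b in lead_bytes))
```

If roughly half of the values are 128 or above, the file grows by about 50 percent and is dominated by a few repeated lead bytes. That would be consistent with the larger file size reported below (471812 bytes versus the author's 313417).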

What should I change to store the character bytes without affecting the randomness of the data?
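For reference, one common way to store exactly one byte per value in Python 3 is to open the file in binary mode and write a bytes object instead of text. This is a minimal sketch, not the original program's code. The helper write_byte_values and the file name sample.bin are hypothetical.

```python
import os
import tempfile

def write_byte_values(path, values):
    """Write each integer in 0..255 as exactly one byte (hypothetical helper)."""
    # "wb" opens the file in binary mode: no text encoding and no
    # newline translation is applied to the data.
    with open(path, "wb") as f:
        f.write(bytes(values))

values = list(range(256))
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
write_byte_values(path, values)

# Read the file back in binary mode to check what was stored.
with open(path, "rb") as f:
    data = f.read()

print(len(data), data == bytes(values))
```

Binary mode also avoids the Windows text-mode translation of "\n" to "\r\n", which would otherwise add an extra byte wherever the value 10 is written.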

The test program is called "ent". It is run from the command prompt, takes a file as an argument, and outputs a few randomness statistics. For more information, see its website: http://www.fourmilab.ch/random/

  • My ent results for the file generated from 500,000!, using open(filename, "w", encoding="utf-8"):

    Entropy = 6.251272 bits per byte.
    
    Optimum compression would reduce the size of this 471812 byte file by 21 percent.
    
    Chi square distribution for 471812 samples is 6545600.65, and randomly
    would exceed this value less than 0.01 percent of the times.
    
    Arithmetic mean value of data bytes is 138.9331 (127.5 = random).
    Monte Carlo value for Pi is 3.173294335 (error 1.01 percent).
    Serial correlation coefficient is 0.162915 (totally uncorrelated = 0.0).
    
    
  • The author's ent results for a file generated from 500,000!:

    Entropy = 7.999373 bits per byte.
    
    Optimum compression would reduce the size of this 313417 byte file by 0 percent.
    
    Chi square distribution for 31347 samples is 272.63, and randomly would
    exceed this value 25.00 percent of the times.
    
    Arithmetic mean value of data bytes is 127.6336 (127.5 = random).
    Monte Carlo value for Pi is 3.149475458 (error 0.25 percent).
    Serial correlation coefficient is -0.001209 (totally uncorrelated = 0.0).
    
    


