Sunday, 26 April 2015

Generate a pseudorandom 16-character string of 0s and 1s

I need to generate a 16-character string from a SHA1 hash that contains only 0s and 1s, each with 50% probability (so that, statistically, most strings have about as many 1s as 0s).

So I wrote a benchmark and tried converting each character of $hash to binary. The results are bad: if I add leading zeros to the binary-converted characters, the distribution is far from correct. When I don't add leading zeros, the distribution is closer to correct:

Percentage all 0 or all 1: 0.0012%
Percentage all 0 or all 1 except 1 character : 0.0146%
Percentage all 0 or all 1 except 2 characters: 0.0812%

But that is still far from the correct probabilities that the code below should produce, which are:

Percentage all 0 or all 1: 0.003%
Percentage all 0 or all 1 except 1 character : 0.048%
Percentage all 0 or all 1 except 2 characters: 0.376%

How do I know these are the correct probabilities? I replaced the binary conversion with a simple mt_rand(0,1) called sixteen times (and ran other confirmation tests).
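As a cross-check on these figures, the expected percentages can also be computed exactly, assuming 16 independent fair bits. The sketch below is not part of my benchmark; the helper names `binomial` and `percentExactlyKOff` are made up for illustration, and it assumes PHP 7 or later.

```php
<?php
// Number of ways to choose $k positions out of $n.
function binomial(int $n, int $k): int
{
    $result = 1;
    for ($i = 1; $i <= $k; $i++) {
        // The running product is an exact integer at every step.
        $result = intdiv($result * ($n - $k + $i), $i);
    }
    return $result;
}

// Percentage of $length-bit strings that are all 0 or all 1 except for
// exactly $k characters. Valid for $k < $length / 2, where the two
// symmetric cases (mostly 0s, mostly 1s) cannot overlap.
function percentExactlyKOff(int $length, int $k): float
{
    return 2 * binomial($length, $k) * 100 / (2 ** $length);
}

foreach ([0, 1, 2] as $k) {
    printf("%d off: %.4f%%\n", $k, percentExactlyKOff(16, $k));
}
// 0 off: 0.0031%
// 1 off: 0.0488%
// 2 off: 0.3662%
```

These values are close to what the mt_rand(0,1) run gave, within the sampling noise expected from 500000 iterations.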

The string must be generated from the SHA1 hash so that it is deterministic for that hash. Does anyone have an idea how to fix my code so it produces the correct probabilities? I have already been trying for 10 hours straight.

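    // Converts the first 16 characters of $text into one string of binary digits.
    function binary($text){
        $list = '';
        $temp = '';
        $i = 0;
        while ($i < 16){
            if (is_numeric($text[$i])){
                // Digit 0-9: binary form of the digit, without leading zeros.
                $list .= decbin( $text[$i] );//sprintf( "%08d", decbin( $text[$i] ));
            } else {
                // Letter a-f: binary form of its ASCII code, without leading zeros.
                $temp = ord($text[$i]);
                $list .= decbin( $temp );
    //          $list .= sprintf( "%08d", decbin( $temp ));// substr("00000000",0,8 - strlen($temp)) . $temp;
            }
            $i++;
        }
        return $list;
    }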

    $y = 0;
    $trafien = 0;   // hits: all 0 or all 1
    $trafien1= 0;   // hits: all 0 or all 1 except 1 character
    $trafien2= 0;   // hits: all 0 or all 1 except 2 characters
    $max = 500000;
    while ($y < $max){

        // Build a fresh input for every iteration and hash it.
        $time = uniqid()  . mt_rand(1,999999999999);
        $seed = 'eqm2890rmn9ou8nr9q2';
        $hash = sha1($time . $seed);

        // Despite its name, $last4 holds the full 40-character hash.
        $last4 = substr($hash, 0, 40);
        $binary =  binary($last4);
        $final = substr($binary, 0,16);

        $ile = substr_count($final, '0');
        $ile2= substr_count($final, '1');
        if ($ile == 16 || $ile2 == 16){
            echo "\n".$last4 ." " . 'binary: '. $binary .' final: '. $final;
            $trafien += 1;
        }

        if ($ile == 15 || $ile2 == 15){
            $trafien1 += 1;
        }

        if ($ile == 14 || $ile2 == 14){
            $trafien2 += 1;
        }

        $y++;
    }

    $procent = ($trafien * 100)  / $max;
    $procent1= ($trafien1 * 100) / $max;
    $procent2= ($trafien2 * 100) / $max;
    echo "\nPercentage all 0 or all 1: ". $procent . "%";
    echo "\nPercentage all 0 or all 1 except 1 character : ". $procent1 . "%";
    echo "\nPercentage all 0 or all 1 except 2 characters: ". $procent2 . "%";
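
One likely source of the skew, offered as a sketch rather than a confirmed fix: decbin() drops leading zeros, so every digit from 1 to 9 contributes a group that starts with 1, and every letter a-f goes through ord() (97 to 102), whose binary form always starts with 1100. Padding those values to 8 digits swings the bias the other way, towards 0. Since each hex character of the hash carries exactly 4 bits, an alternative is to convert it with hexdec() and always emit 4 digits. The helper name `hashToBits` is hypothetical, and the code assumes PHP 7 or later and that the SHA1 hex output is close to uniformly distributed.

```php
<?php
// Hypothetical helper: turn the leading hex characters of a hash into a
// fixed-width bit string, 4 bits per hex character.
function hashToBits(string $hexHash, int $bits = 16): string
{
    $out = '';
    $hexChars = intdiv($bits + 3, 4); // number of hex characters needed
    for ($i = 0; $i < $hexChars; $i++) {
        // hexdec() maps '0'..'f' to 0..15; %04b keeps the leading zeros,
        // so every character contributes exactly 4 digits.
        $out .= sprintf('%04b', hexdec($hexHash[$i]));
    }
    return substr($out, 0, $bits);
}

// Same seed as in the benchmark; the input string is only an example.
$hash = sha1('example input' . 'eqm2890rmn9ou8nr9q2');
echo hashToBits($hash), "\n";
```

In the benchmark this would replace the two lines that call binary() and substr(), for example `$final = hashToBits($hash);`, and the result stays deterministic for a given hash.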



