mercredi 26 août 2020

Frugal conversion of uniformly distributed random numbers from one range to another

Is there a way to convert uniformly distributed random numbers of one range to uniformly distributed random numbers of another range frugally?

Let me explain what I mean by "frugally".

The typical approach to generate random number within given range (e.g. $r \in [0..10)$ ) is to take some fixed random bits, let's say 31, which result non-negative random number less than 2147483648. Then make sure that the value is less than 2147483640 (because 2147483648 is not divisible by 10, and hence may lead to uneven distribution). If the value is greater or equal to 2147483640, throw it away and try again (get next 31 random bits and so on). If value if less than 2147483640, then just return the remainder of division by 10. This approach consumes at least 31 bit per decimal digit. Since theoretical limit is $\log_2{10} = 3.321928...$, it is quite wasteful.

We can improve this, if we use 4 bits instead if 31. In this case we will consume $4 \times 1.6 = 6.4$ bits per decimal digit. This is more frugal, but still far from the ideal.

    public int nextDit() {
        int result;
        do {
            result = next4Bits();
        } while (result >= 10);
        return result;
    }

We can try to generate 3 decimal digits at once. Since 1024 is quite close to 1000, the probability that raw source random number will be rejected is less than in previous case. Once we generated 3 decimal digits, we return 1 digit and reserve the rest 2 digits.

Something like below

    private int _decDigits = 0;
    private int _decCount = 0;

    public int nextDit() {
        if (_decCount > 0) {
            // take numbers from the reserve
            int result = _decDigits % 10;
            _decDigits /= 10;
            _decCount -= 1;
            return result;
        } else {
            int result;
            do {
                result = next10Bits();
            } while (result >= 1000);
            // reserve 2 decimal digits
            _decCount = 2;
            _decDigits = result % 100;
            result /= 100;
            return result;
        }
    }

This approach is much more frugal: it consumes $10 \times 1.024 \div 3 = 3.41(3)$ bits per decimal digit.

We can even go farther if we try to reuse the numbers, which we previously have been throwing away. The random number $r \in [0, 1024)$ falls into one of the 3 ranges: $[0, 1000)$, $[1000, 1020)$, $[1020, 1024)$.

If it falls into $[0, 1000)$, we do as we did before, reserve 2 decimal digits (in decimal digit reserve) and return 1 decimal digit.

If it falls into $[1000, 1020)$, we subtract 1000 converting to the range $[0, 20)$. Then we get 1 bit by dividing it by 10 and 1 decimal digit by getting remainder of division by 10. We put the bit to the binary digit reserve and return the decimal digit.

If it falls into $[1020, 1024)$, we subtract 1020 converting to the range $[0, 4)$. Here we get just 2 bits, which we put to the binary digits reserve.

    // decimal digit reserve
    private int _decDigits = 0;
    private int _decCount = 0;
    // binary digit reserve
    private int _binDigits = 0;
    private int _binCount = 0;

    private int nextBits(int bits, int n) {
        for (int i = 0; i < n; i += 1) {
            bits = (bits << 1) + _bitRandomDevice.nextBit();
        }
        return bits;
    }

    private int next10Bits() {
        // we bits from the binary reserve first, then from _bitRandomDevice
        int result;
        if (_binCount >= 10) {
            result = _binDigits >> (_binCount - 10);
            _binDigits = _binDigits & (1 << (_binCount - 10) - 1);
            _binCount -= 10;
        } else {
            result = nextBits(_binDigits, 10 - _binCount);
            _binCount = 0;
            _binDigits = 0;
        }
        return result;
    }

    public int nextDit() {
        if (_decCount > 0) {
            // take numbers from the decimal reserve
            int result = _decDigits % 10;
            _decDigits /= 10;
            _decCount -= 1;
            return result;
        } else {
            int result;
            while (true) {
                result = next10Bits();
                if (result < 1000) {
                    assert result >= 0 && result < 1000;
                    // reserve 2 decimal digits
                    _decCount = 2;
                    _decDigits = result % 100;
                    result /= 100;
                    // return 1 decimal digit
                    return result;
                } else if (result < 1020) {
                    result -= 1000;
                    assert result >= 0 && result < 20;
                    // reserve 1 binary digit
                    _binCount += 1;
                    _binDigits = (_binDigits << 1) + (result / 10);
                    // return 1 decimal digit
                    return result % 10;
                } else {
                    result -= 1020;
                    assert result >= 0 && result < 4;
                    // reserve 2 binary digits
                    _binCount += 2;
                    _binDigits = (_binDigits << 2) + result;
                }
            }
        }
    }

This approach consumes about 3.38... bits per decimal digit. This is the most frugal approach I can find, but it still wastes/loses some information from the source of randomness.

Thus, my question is: Is there any universal approach/algorithm that converts uniformly distributed random numbers of one arbitrary range $[0, s)$ (later called source numbers) to uniformly distributed random numbers of another arbitrary range $[0, t)$ (later called target numbers), consuming only $log_s{t} + C$ source numbers per target number? where C is some constant. If there is no such approach, why? What prevents from reaching the ideal limit?




Aucun commentaire:

Enregistrer un commentaire