lundi 3 octobre 2022

The fastest way to generate a random permutation

I need to permute N numbers between 0 and N-1 in the fastest way (on a CPU, without multi-threading, but maybe with SIMD). N is not large, I think in most cases, N<=12, so N! fits a signed 32-bit integer.

What I have tried so far is roughly the following (some optimizations are omitted, and my original code is in Java, but we speak performance in C++ if not pseudo-code):

static inline uint64_t rotl(const uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}


static uint64_t s[2];

uint64_t Next(void) {
    const uint64_t s0 = s[0];
    uint64_t s1 = s[1];
    const uint64_t result = rotl(s0 + s1, 17) + s0;

    s1 ^= s0;
    s[0] = rotl(s0, 49) ^ s1 ^ (s1 << 21); // a, b
    s[1] = rotl(s1, 28); // c

    return result;
}

// Assume the array |dest| must have enough space for N items
void GenPerm(int* dest, const int N) {
    for(int i=0; i<N; i++) {
        dest[i] = i;
    }
    for(int i=0; i+1<N; i++) {
        const int pos = Next() % (N-i) + i;
        const int t = dest[pos];
        dest[pos] = dest[i];
        dest[i] = t;
    }
}

The above is slow because the CPU's modulo operation (%) is slow. I could think of generating one random number between 0 and N!-1 (inclusive); this will reduce the number of modulo operations and Next() calls, but I don't know how to proceed then. Another approach could be to replace the division operation with multiplication by the inverse integer number at the cost of small bias in the modulos generated, but I don't these inverse integers and multiplication will probably not be much faster (bitwise operations & shifts should be faster).

Any more concrete ideas?




Aucun commentaire:

Enregistrer un commentaire