lundi 15 juin 2020

Walkers Alias Method for weighted random selection isn't faster than a linear search (in my implementation)

I came across Walkers Method for selecting weighted random variables and found it genius. I tried to implement it and it works correctly (gives the valid statistical properties), but when running a test for speed, it performs no better (if not worse) than my original linear algorithm. Looking at the logic, I cant figure out why that would be. Perhaps i'm missing something?

Notes:

//Alias method. Selection Method
//See full code in link above for the setup, but in summary, the setup creates "share" and "pair" data members.
//If you know the walker method, you know it kind of creates "buckets".
//The picker generates a real number, takes the integer part of it and used that to select the "bucket", 
//Then the remaining decimal component is compared what is in "share", and the function either returns the selected bucket or it's "pair" bucket.


size_t wrs_pick(wrs_t* data)
{
    int MAX = 1000000;
    int dice = bounded_rand(MAX);

    double rng = (double)dice / (double)MAX;

    double pct = rng*data->N;;
    size_t idx = (int)pct;
    if (pct - idx > data->share[idx]) { idx = data->pair[idx]; }
    return idx;
}

I've excluded the setup of the Walker's Algorithm. because i've checked that and it correctly is creating those arrays. Also, when I run the test, I actually do the setup regardless of which selection algorithm I use, so I'm sure it's not the setup that is creating the lag.

//Original, linear method
//I ensure that my arrays are of the form {numElements, total, item1, item2...item n}

int weightedDice(int* dist)
{
    int pick;

    int dice = bounded_rand(dist[1]);
    //int dice = rng(dist[1]);

    int checkpoint = 0;

    for (pick = 2; pick <= (dist[0] + 1); pick++)
    {
        checkpoint += dist[pick];
        if (dice < checkpoint) break;
    }
    return pick - 2;
}

So, what's confusing to me is that im using the same RNG function for both, the Alias should be doing just the RNG call, checking a quick lockup and then returning a number. And the second one has to do that check across the arrays each time. I even tried to create arrays which were weighted more heavily towards the "end" to force that algorithm to run longer on average. And still, no luck. The following is my statistics collection code:

double numTrials = 10000000;
setupRNG(); //this sets up the xhoshiro. (code omitted)


int diceWeights_weightedDice[11] = { 9,90, 10,2,3,4,5,6,20,20,20 };
double diceWeights_alias[9] = { 10,2,3,4,5,6,20,20,20 };


clock_t tStart = clock();

//setup the Alias
wrs_t* dist = wrs_create(diceWeights_alias, 9); 

//Run the trials
    for (double t = 0; t < numTrials; t++)
    {

        int dice = weightedDice(diceWeights_weightedDice);

        //int dice = wrs_pick(dist);    

        stats_diceHits[dice]++;

    }

    printf("Time taken: %.2fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);
    for (int d = 0; d < 9; d++)
    {
        printf("%i: %f\n", d, stats_diceHits[d]);
    }


    wrs_destroy(dist);

To switch methods, i just comment out one line and use the other (for dice assignment). But, as mentioned, my printout is reporting a similar time to execute both.




Aucun commentaire:

Enregistrer un commentaire