vendredi 27 novembre 2020

Weighted random sample of array items *without replacement*

I want to generate a random sample from an array of objects using an array of weighted values for each object. The weighted list may, or may not, be integer values. It could be floats like [0.2, 0.1, 0.7, 0.3], or it could be integers like [20, 10, 70, 30]. The weights do not have to add up to a value that represents 100%.

Peter below gave me a good reference on the general algorithm that I'll have to take a closer look at. It's not specific to JS, but this was what he linked: https://stackoverflow.com/a/62459274/7915759

I didn't see a ready-made accumulate() that I could use, so put this together:

function accumulate(numbers) {
    let accum = [];
    let total = 0;
    for (let n of numbers) {
        total += n;
        accum.push(total);
    }
    return accum;
}

To choose a single object from the population:

function wchoice(pop, wts) {
    let acm = accumulate(wts);
    let tot = acm[acm.length - 1]; // sum of all weights.
    let rnd = Math.random() * tot;

    // a *native* bisect_left() would be more efficient here:
    let idx = acm.findIndex((elm) => rnd <= elm); 

    return [idx, pop[idx]];
}

(Error handling logic is removed from these examples to keep them simple.)

Array.findIndex() is alright for the purposes I need this for, but it would be more efficient if there were a native bisect_left() function to locate the index of the chosen member.

Generating a sample with replacement is just a matter of calling wchoice() k number of times. But, to choose k members from the population without replacement is more involved:

function wsample(population, weights, k) {
    weights     = [...weights];
    let sample  = [];
    let indices = [];
    let index   = 0;
    let choice  = null;

    for (let i=0; i < k; i++) {
        [index, choice] = wchoice(population, weights);
        weights[index]  = 0; // Eliminates the item from subsequent picks.
        sample.push(choice);
        indices.push(index);
    }
    return [indices, sample];
}

k must be <= the number of non-zero items in weights.

So that's what I have so far. I have somewhat of a solution, but I'm not certain this is the best way. Any tips are appreciated.




Aucun commentaire:

Enregistrer un commentaire