vendredi 21 juillet 2017

Mimicking random.sample() for non-uniform distributions

I want to emulate the functionality of random.sample() in python, but with a non-uniform (triangular, in this case) distribution of choices. Important for this is that a single item is not chosen twice (as described in the random.sample docs). Here's what I have:

...

def tri_sample(population, k, mode=0):
    """
    Mimics the functionality of random.sample() but with a triangular
    distribution over the length of the sequence.

    Mode defaults to 0, which favors lower indices.
    """
    psize = len(population)
    if k > psize:
        raise ValueError("k must be less than the number of items in population.")
    if mode > psize:
        raise ValueError("mode must be less than the number of items in population.")
    indices_chosen = []
    sample = []
    for i in range(k):
        # This ensures unique selections
        while True:
            choice = math.floor(random.triangular(0, psize, mode))
            if choice not in indices_chosen:
                break
        indices_chosen.append(choice)
        sample.append(population[choice])
    return sample

...

My suspicion is that this is not an ideal way of preventing duplicate items being pulled. My first thought when designing this was to make a duplicate of population and .pop() the items as they're sampled to prevent choosing the same item twice, but I saw two problems with that:

  1. If population is a list of objects, there could be some difficulty with duplicating the list while still ensuring that the items in sample point to the same objects in population.
  2. Using .pop() on the population would change the size of the population, altering the distribution each time. Ideally, the distribution (not sure if I'm using the term correctly--the probability of each item being called) would be the same no matter what order the items are chosen in.

Is there a more efficient way of taking a non-uniform random sample from a population?




Aucun commentaire:

Enregistrer un commentaire