I want to emulate the functionality of random.sample() in python, but with a non-uniform (triangular, in this case) distribution of choices. Important for this is that a single item is not chosen twice (as described in the random.sample docs). Here's what I have:
...
def tri_sample(population, k, mode=0):
"""
Mimics the functionality of random.sample() but with a triangular
distribution over the length of the sequence.
Mode defaults to 0, which favors lower indices.
"""
psize = len(population)
if k > psize:
raise ValueError("k must be less than the number of items in population.")
if mode > psize:
raise ValueError("mode must be less than the number of items in population.")
indices_chosen = []
sample = []
for i in range(k):
# This ensures unique selections
while True:
choice = math.floor(random.triangular(0, psize, mode))
if choice not in indices_chosen:
break
indices_chosen.append(choice)
sample.append(population[choice])
return sample
...
My suspicion is that this is not an ideal way of preventing duplicate items being pulled. My first thought when designing this was to make a duplicate of population
and .pop()
the items as they're sampled to prevent choosing the same item twice, but I saw two problems with that:
- If
population
is a list of objects, there could be some difficulty with duplicating the list while still ensuring that the items insample
point to the same objects inpopulation
. - Using
.pop()
on the population would change the size of the population, altering the distribution each time. Ideally, the distribution (not sure if I'm using the term correctly--the probability of each item being called) would be the same no matter what order the items are chosen in.
Is there a more efficient way of taking a non-uniform random sample from a population?
Aucun commentaire:
Enregistrer un commentaire