lundi 19 septembre 2022

Difference between random() and randint() when need to generate weighted sampling

I need to generate n indices, from 0 to n-1 randomly (the goal is to later use the generated index to implement weighted sampling).

First, I used randint(0, n-1). However it did not pass the test for the problem https://leetcode.com/problems/random-pick-with-weight/

The solution passes when the index is generated using random.random() since the latter outputs the float, which you then look for in the array:

def pickIndex(self) -> int:
    
    random_index_not_working = randint(0, self.prefixed_weights[-1])
    random_index = self.prefixed_weights[-1] * random.random()
    
    print(random_index_not_working, random_index)

Output:

23 6.6781083742021154

0 11.119979387153617

14 22.958546966743995

So for the true "weighted" sampling to work, I apparently need to use float.

The rest of the solution:

from random import randint, random 
from itertools import accumulate

def __init__(self, w: List[int]):
    self.prefixed_weights = []
    for ww in accumulate(w):
        self.prefixed_weights.append(ww)
            
def find_index(self, random_index):
    # 0..8 [1, 4, 9] 3
    
    left = 0
    right = len(self.prefixed_weights) - 1
    
    while left <= right:
        mid = (left + right) // 2
        
        if self.prefixed_weights[mid] > random_index:
            # search in the left part
            if mid - 1 >= 0 and self.prefixed_weights[mid-1] > random_index:
                right = mid - 1
            else:
                return mid
        elif self.prefixed_weights[mid] < random_index:
            # search in the right part
            if mid + 1 < len(self.prefixed_weights) and self.prefixed_weights[mid+1] < random_index:
                left = mid + 1
            else:
                return mid+1
        else:
            return mid
    
           

Why generating integer indices using randint() between [0..n-1] does not yield the same probability distribution as compared to generating a float between (0,1) using random()and multiplying it by n?




Aucun commentaire:

Enregistrer un commentaire