jeudi 21 juin 2018

sampling from discrete distribution without replacement where the probabilities change each draw

having a sequence S = (s1,s2,...sk) with probability weights for each sequence site P = (p1,p2,...pk) where the sum of P = 1 maximum length of S may be around 10^9 By Simulation a site k is picked and modified after each draw , as reason the pk also changes each run through.

Question 1: How would you suggest to draw site? Actually I implemented this logic which seems to be ok itself as going along literature see e.g. here:

    counter = 0
    random_number = draw_random()  #<= float in range 0,1
    while P_sum <  random_number
        P_sum += P[counter]
        counter++
    return counter

By testing the simulation I observed a strong bias which seems to rebuilt random generators distribution ( see_here ) Three different generators generate 3 different results... which is fairly ok but none of them is correct at all states Walkers and Knuth's methods with lookup table seem to be too time expensive for me as the lookup tables have to be recalculated each time.

Question 2 How can I reduce bias from randomness? Actual built in 3 different generators (only one used per simulation) which are uniform distributed in kindness to their chances. Knowing this is a heavy question when not knowing a line of simulation code

Question 3 Library for the thing ? As it's not to much code I don't have problem to write on my own, but is there a another library for it which may not BOOST? Asking as this question may be outdated... Not Boost as I don't want to built in a fourth random generator and use that large thing

Question 4 Faster alternative?

I know that this topic was answered may thousands of time before - but none of the answers satisfies me enough nor gave me a wise alternative e.g. here seems to have the same problem but I don't understand which heap is where built and why in addition it seems very complicated for such a "easy" thing

Thank you for your support!




Aucun commentaire:

Enregistrer un commentaire