mercredi 29 août 2018

Improving scaling of sample from discrete distribution

I have recently started playing around with Julia and I am currently working on a Monte Carlo simulation of some stochastic process on a 2-dimensional lattice. Each site has some associated rate of activation (the number of times it "does something" per second on average) which we can assume to be approximately constant. The order in which lattice sites activate is relevant so we need a method for randomly picking one particular site with probability proportional to its rate of activation.

It looks like sample(sites,weights(rates)) from the StatsBase package is exactly what I am looking for BUT from testing out my code structure (no logic, just loops and RNG), it turns out that sample() scales linearly with the number of sites. This means that overall my runtimes scale like N^(2+2), where N in the side length of my 2-dimensional lattice (one factor of 2 from the increase in total rate of activity, the other from the scaling of sample()).

Now, the increase in total rate of activity is unavoidable but I think the scaling of the "random pick with weights" method can be improved. More specifically, one should be able to achieve a logarithmic scaling with the number of sites (rather than linear). Consider for example the following function (and, please, forgive the poor coding)

function randompick(indices,rates)
cumrates = [sum(rates[1:i]) for i in indices]
pick = rand()*cumrates[end]
tick = 0
lowb = 0
highb = indis[end]

while tick == 0
    mid = floor(Int,(highb+lowb)/2)
    midrate = cumrates[mid]

    if pick > midrate
        lowb = mid
    else
        highb = mid
    end

    if highb-lowb == 1
        tick = 1
    end
end
return(highb)
end

Because we half the number of "pickable" sites at each step, it would take n steps to pick one specific site out of 2^n (hence the logarithmic scaling). However, in its current state randompick() is so much slower than sample() that scaling is practically irrelevant. Is there any way of reducing this method to a form that can compete with sample() and hence take advantage of the improved scaling?




Aucun commentaire:

Enregistrer un commentaire