samedi 31 août 2019

Evenly-spaced samples from a stream of unknown length

I want to sample K items from a stream of N items that I see one at a time. I don't know how big N is until the last item turns up, and I want the space consumption to depend on K rather than N.

So far I've described a reservoir sampling problem. The last ask though is that I'd like the samples to be 'evenly spaced', or at least more evenly spaced than reservoir sampling manages. This is vague; one formalization would be that the sample indices are a low-discrepancy sequence, but I'm not particularly tied to that.

My intuition is that this is a feasible problem, and the algorithm I imagine preferentially drops samples from the 'highest density' part of the reservoir in order to make space for samples from the incoming stream. It also seems like a common enough problem that someone should have written a paper on it, but Googling combinations of 'evenly spaced', 'reservoir', 'quasirandom', 'sampling' haven't gotten me anywhere.




Aucun commentaire:

Enregistrer un commentaire