I want to sample K
items from a stream of N
items that I see one at a time. I don't know how big N
is until the last item turns up, and I want the space consumption to depend on K
rather than N
.
So far I've described a reservoir sampling problem. The last ask though is that I'd like the samples to be 'evenly spaced', or at least more evenly spaced than reservoir sampling manages. This is vague; one formalization would be that the sample indices are a low-discrepancy sequence, but I'm not particularly tied to that.
My intuition is that this is a feasible problem, and the algorithm I imagine preferentially drops samples from the 'highest density' part of the reservoir in order to make space for samples from the incoming stream. It also seems like a common enough problem that someone should have written a paper on it, but Googling combinations of 'evenly spaced', 'reservoir', 'quasirandom', 'sampling' haven't gotten me anywhere.
Aucun commentaire:
Enregistrer un commentaire