I just want to know that my code is reservoir sampling. I have a stream of pageviews that I just want to process. I'm processing one pageview at a time. However, since most of the pageviews are the same so I just want to randomly pick any pageview (one at a time to process). For example, I have a pageview of
[www.example.com, www.example.com, www.example1.com, www.example3.com, ...]
I'm processing one element at a time. Here's my code.
import random
def __init__(self):
self.counter = 0
def processable():
self.counter += 1
return random.random() < 1.0 / self.counter
Aucun commentaire:
Enregistrer un commentaire