I have a bunch of Things. A Thing is a struct with a field, source, typed as a string.
Currently I get a deterministic sampled selection of Things by simply hashing the Thing.
def is_thing_sampled(t: Thing):
    hashed_thing = my_deterministic_hash(t)
    return hashed_thing % 100 < sample_size_pct
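For concreteness, here is a minimal runnable sketch of that baseline. It assumes Thing is a dataclass with a hypothetical extra payload field, and that my_deterministic_hash is built on hashlib.sha256; neither is given above, so treat both as placeholders. One reason to avoid Python's built-in hash() here is that it is randomized per process for strings, so it would not give the same selection across runs.

```python
import hashlib
from dataclasses import dataclass

SAMPLE_SIZE_PCT = 10  # hypothetical value standing in for sample_size_pct


@dataclass(frozen=True)
class Thing:
    source: str
    payload: str = ""  # hypothetical extra field, only for illustration


def my_deterministic_hash(t: Thing) -> int:
    # Assumed implementation: hash a stable byte representation of the Thing
    # and keep the first 8 bytes of the digest as an unsigned integer.
    data = f"{t.source}|{t.payload}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def is_thing_sampled(t: Thing) -> bool:
    # The same Thing always lands in the same bucket 0..99.
    return my_deterministic_hash(t) % 100 < SAMPLE_SIZE_PCT
```

With a well-mixed hash, roughly SAMPLE_SIZE_PCT percent of distinct Things should pass, and the same Thing gets the same answer on every run.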
Now I want to extend this function so that it additionally samples Things of a specific source. If the source is "foo", I want to do another level of sampling on it.
def is_thing_sampled(t: Thing):
    hashed_thing = my_deterministic_hash(t)
    base = hashed_thing % 100 < sample_size_pct
    if base and t.source == "foo":
        # try to sample again. How do I do this??
        double_hash = my_deterministic_hash(hashed_thing)
        return double_hash % 100 < foo_sample_size_pct
    return base
Can someone help me understand what's the right approach? I'd love some pointers - I'm a total noob at statistics.
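One direction that might work, sketched under assumptions rather than as a definitive answer: use a differently salted hash for the second level, so the second decision does not reuse the same bucket value as the first. The salted_hash helper, the salt strings, the payload field, and the percentage values below are all hypothetical; only Thing.source, sample_size_pct, and foo_sample_size_pct come from the question.

```python
import hashlib
from dataclasses import dataclass

SAMPLE_SIZE_PCT = 50       # hypothetical value for sample_size_pct
FOO_SAMPLE_SIZE_PCT = 20   # hypothetical value for foo_sample_size_pct


@dataclass(frozen=True)
class Thing:
    source: str
    payload: str = ""  # hypothetical extra field, only for illustration


def salted_hash(t: Thing, salt: str) -> int:
    # Hypothetical helper: a different salt gives a different,
    # effectively unrelated hash value for the same Thing.
    data = f"{salt}|{t.source}|{t.payload}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def is_thing_sampled(t: Thing) -> bool:
    # First level: applies to every Thing.
    base = salted_hash(t, "base") % 100 < SAMPLE_SIZE_PCT
    if base and t.source == "foo":
        # Second level: only for "foo" Things that passed the first level.
        return salted_hash(t, "foo-second-level") % 100 < FOO_SAMPLE_SIZE_PCT
    return base
```

If the two hashes behave independently, the overall rate for "foo" Things is the product of the two rates, here about 50% × 20% = 10%, while every other source stays at about 50%.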