Suppose I have an array of 100_000 records (this is Ruby code, but any language will do):
ary = ['apple', 'orange', 'dog', 'tomato', 12, 17, 'cat', 'tiger', ...]
results = []
I can only make random calls to the array (I cannot traverse it in any way):
results << ary.sample
# In Ruby, ary.sample pulls a uniformly random record from the array,
# and << pushes it onto the results array
How many random calls like that do I need to make to get at least 80% of the records from ary? Or, expressed another way: how large should results be so that results.uniq contains around 80_000 records from ary?
From my rusty memory of my stats class in college, I think it needs to be about 2x the desired result-set size, or around 160_000 calls (assuming the random function is actually uniform and there is no other underlying issue). My testing seems to confirm this:
ary = [*1..100_000]
result = []
160_000.times { result << ary.sample }
result.uniq.size # => ~80_000
This is stats, so we are talking about probabilities, not guaranteed results. I just need a reasonable guess.
So the real question is: what's the formula to confirm this?
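For reference, a sketch of the standard expected-coverage argument (a back-of-the-envelope estimate, not a rigorous bound): after k uniform draws with replacement from n records, a given record is never drawn with probability (1 - 1/n)^k ≈ e^(-k/n), so the expected number of distinct records is n * (1 - e^(-k/n)). Setting 1 - e^(-k/n) = 0.8 and solving for k gives k = n * ln(5) ≈ 1.609 * n, i.e. about 160_944 draws for n = 100_000, which lines up with both the 2 x 80_000 heuristic and the simulation above. A minimal Ruby sketch (draws_needed is an illustrative helper name, not a library method):

def draws_needed(n, coverage)
  # Solve 1 - exp(-k/n) = coverage for k, rounding up to a whole draw
  (-n * Math.log(1 - coverage)).ceil
end

draws_needed(100_000, 0.8) # => 160944
draws_needed(100_000, 0.5) # => 69315

For n this large the exponential approximation is very tight; the exact version solves (1 - 1/n)^k = 1 - coverage, i.e. k = Math.log(1 - coverage) / Math.log(1 - 1.0/n).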