I'm trying to understand solution for the following task: Randomly generate a set of M elements from an array of size N. Each element must have equal probability of being chosen.
I found the following solution (I've already read this question, and this one, but I still have questions that are too long for comments):
int rand(int min, int max) {
return min + (int)(Math.random() * (max - min + 1));
}
int[] generateSet(int[] arr, int m, int n) {
if (n + 1 == m) { //base case
int[] set = new int[m];
for (int k = 0; k < m; k++) {
set[k] = arr[k];
}
return set;
}
int[] set = generateSet(arr, m, n - 1);
int r = rand(0, n);
if (r < m) set[r] = arr[n];
return set;
}
// rand() function returns inclusive value
// i.e. rand(0, 5) will return from 0 to 5
This code was found in book "Cracking the coding interview" (Section Hard, Task 3). Author explains it as follows:
Suppose we have an algorithm that can pull a random set of
m
elements from an array of sizen - 1
. How can we use this algorithm to pull a random set ofm
elements from an array of sizen
? We can first pull a random set of size m from the firstn - 1
elements. Then, we just need to decide ifarray[n]
should be inserted into our subset (which would require pulling out a random element from it). An easy way to do this is to pick a random number k from 0 through n. Ifk < m
, then insertarray[n]
intosubset[k]
. This will both "fairly" (i.e., with proportional probability) insertarray[n]
into the subset and "fairly" remove a random element from the subset. This is even cleaner to write iteratively. In this approach, we initialize an array subset to be the firstm
elements in original. Then, we iterate through the array, starting at elementm
, insertingarray[i]
into the subset at (random) positionk
wheneverk < m
.
I fully understand the base case.It says that: if we have an array of size N
and M == N
, therefore, we should return first M
elements from the array, because each of them will be selected with equal probability.
Than comes the hard part (recursive case) that I do not understand at all.
- Code generates set of size
M
from array of sizeN - 1
- Now code should decide add or not "new" element
arr[N]
to the set -
With the probability
M / N
code decides add or not "new" element. Random works as follow:- Generates random number
r
between0
andN
- If
(r < m)
it means thatr
was generated withM / N
probability - Also if
(r < m)
it means that with probability1 / M
we will change one of M elements in the set.
- Generates random number
I have no idea where M / N
probability comes from. I think code should be implemented as follow instead:
- We need to decide take
Nth
element or not. We haveN
elements so we should decide take or not with1 / N
probability i.e.r = random(0, n)
- If
r == n
then we should take this element - Next with probability
1/M
we should find what element in the set we will replace, so it will beset[random(0, m)] = arr[n]
Therefore, the whole recursive part will be:
int[] set = generateSet(arr, m, n - 1);
int r = rand(0, n);
if (r == n) {
int indexToReplace = random(0, m);
set[indexToReplace] = arr[n];
}
return set;
Could you help me understand:
- Why we should decide with
M / N
probability replace element in the set or not? - Is my code incorrect?
Aucun commentaire:
Enregistrer un commentaire