I'm trying to understand solution for the following task: Randomly generate a set of M elements from an array of size N. Each element must have equal probability of being chosen.
I found the following solution (I've already read this question, and this one, but I still have questions that are too long for comments):
int rand(int min, int max) {
return min + (int)(Math.random() * (max - min + 1));
}
int[] generateSet(int[] arr, int m, int n) {
if (n + 1 == m) { //base case
int[] set = new int[m];
for (int k = 0; k < m; k++) {
set[k] = arr[k];
}
return set;
}
int[] set = generateSet(arr, m, n - 1);
int r = rand(0, n);
if (r < m) set[r] = arr[n];
return set;
}
// rand() function returns inclusive value
// i.e. rand(0, 5) will return from 0 to 5
This code was found in book "Cracking the coding interview" (Section Hard, Task 3). Author explains it as follows:
Suppose we have an algorithm that can pull a random set of
melements from an array of sizen - 1. How can we use this algorithm to pull a random set ofmelements from an array of sizen? We can first pull a random set of size m from the firstn - 1elements. Then, we just need to decide ifarray[n]should be inserted into our subset (which would require pulling out a random element from it). An easy way to do this is to pick a random number k from 0 through n. Ifk < m, then insertarray[n]intosubset[k]. This will both "fairly" (i.e., with proportional probability) insertarray[n]into the subset and "fairly" remove a random element from the subset. This is even cleaner to write iteratively. In this approach, we initialize an array subset to be the firstmelements in original. Then, we iterate through the array, starting at elementm, insertingarray[i]into the subset at (random) positionkwheneverk < m.
I fully understand the base case.It says that: if we have an array of size N and M == N, therefore, we should return first M elements from the array, because each of them will be selected with equal probability.
Than comes the hard part (recursive case) that I do not understand at all.
- Code generates set of size
Mfrom array of sizeN - 1 - Now code should decide add or not "new" element
arr[N]to the set -
With the probability
M / Ncode decides add or not "new" element. Random works as follow:- Generates random number
rbetween0andN - If
(r < m)it means thatrwas generated withM / Nprobability - Also if
(r < m)it means that with probability1 / Mwe will change one of M elements in the set.
- Generates random number
I have no idea where M / N probability comes from. I think code should be implemented as follow instead:
- We need to decide take
Nthelement or not. We haveNelements so we should decide take or not with1 / Nprobability i.e.r = random(0, n) - If
r == nthen we should take this element - Next with probability
1/Mwe should find what element in the set we will replace, so it will beset[random(0, m)] = arr[n]
Therefore, the whole recursive part will be:
int[] set = generateSet(arr, m, n - 1);
int r = rand(0, n);
if (r == n) {
int indexToReplace = random(0, m);
set[indexToReplace] = arr[n];
}
return set;
Could you help me understand:
- Why we should decide with
M / Nprobability replace element in the set or not? - Is my code incorrect?
Aucun commentaire:
Enregistrer un commentaire