I have been working on the randomized binary search (RBS) problem. We are given a sorted array A[] of n elements and need to determine whether some random number x is present in A or not. Ordinary binary search always probes the middle element; here we instead probe an element chosen uniformly at random from the current range.
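For reference, here is a minimal Python sketch of the variant I mean (function names are just for illustration; the probe counter is only there so the number of queries can be measured):

```python
import random

def randomized_binary_search(A, x):
    """Search for x in the sorted list A, probing a uniformly random
    index of the current range instead of the middle one.
    Returns (found, number_of_probes)."""
    lo, hi = 0, len(A) - 1
    probes = 0
    while lo <= hi:
        pivot = random.randint(lo, hi)  # uniform over [lo, hi], inclusive
        probes += 1
        if A[pivot] == x:
            return True, probes
        elif A[pivot] < x:
            lo = pivot + 1
        else:
            hi = pivot - 1
    return False, probes

# Example: search for a present key in a small sorted array.
print(randomized_binary_search([1, 3, 5, 7, 9], 7))
```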
It is clear to me that the expected number of queries depends on the length of the array; call it T(n) for an array of length n. I have worked out a recurrence relation for it, but this recurrence does not seem to give a clear logarithmic growth for T(n): $$T(n) = 1 + \frac{2}{n^2} \sum_{j=1}^{n-1} j\, T(j)$$
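The recurrence can be tabulated numerically, for example like this (a small sketch, assuming T(1) = 1, i.e. a single-element range costs one query, and comparing against log2 n):

```python
from math import log2

def tabulate_my_recurrence(n_max):
    """T(n) = 1 + (2 / n^2) * sum_{j=1}^{n-1} j * T(j), with T(1) = 1."""
    T = [0.0] * (n_max + 1)
    T[1] = 1.0
    weighted_sum = 1.0 * T[1]          # running value of sum_{j<n} j * T(j)
    for n in range(2, n_max + 1):
        T[n] = 1.0 + 2.0 * weighted_sum / (n * n)
        weighted_sum += n * T[n]
    return T

T = tabulate_my_recurrence(1 << 20)
for n in (10, 100, 10**4, 10**6):
    print(n, round(T[n], 3), round(log2(n), 3))
```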
My main issue is that there are many resources on the internet presenting a proof, but I don't follow its logic, and when I run the numbers I get a much worse complexity for RBS than it suggests. The proof goes as follows:
After we choose one random pivot, the array size reduces to some size, say k. So T(n) is one plus the sum, over all possible sizes after choosing the pivot, of the time for that size multiplied by the probability of choosing that pivot. This gives the formula $$T(n) = 1 + \frac{1}{n} \sum_{k=1}^{n-1} T(k)$$
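If I read that formula right, it can be tabulated in the same way as mine (again a sketch, assuming T(1) = 1):

```python
from math import log2

def tabulate_sources_recurrence(n_max):
    """T(n) = 1 + (1 / n) * sum_{k=1}^{n-1} T(k), with T(1) = 1,
    i.e. the recurrence as I read it from the linked sources."""
    T = [0.0] * (n_max + 1)
    T[1] = 1.0
    prefix_sum = T[1]                  # running value of sum_{k<n} T(k)
    for n in range(2, n_max + 1):
        T[n] = 1.0 + prefix_sum / n
        prefix_sum += T[n]
    return T

T = tabulate_sources_recurrence(1 << 20)
for n in (10, 100, 10**4, 10**6):
    print(n, round(T[n], 3), round(log2(n), 3))
```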
That recurrence is the one I can't understand: to my mind it is much more likely to end up with a larger interval (the key itself is also uniformly distributed at random), whereas this formula gives every size from 1 to n-1 the same probability 1/n. Specifically, if the key is in the middle of the range, we can never reduce the interval to size one in a single step. Maybe this distinction between the key x being random or not changes the complexity, but conceptually that makes no sense to me.
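For what it's worth, the expectation can also be estimated directly by simulation, which is how the two recurrences could be checked against the actual algorithm. Here is a sketch under my model, where the key is present and uniformly distributed over the array (the trial count and function names are arbitrary):

```python
import random

def probes_for(n, key_index):
    """One randomized binary search for the element at key_index in a
    sorted array of n distinct elements; only indices matter, so the
    array itself is never materialized."""
    lo, hi = 0, n - 1
    probes = 0
    while lo <= hi:
        pivot = random.randint(lo, hi)
        probes += 1
        if pivot == key_index:
            return probes
        elif pivot < key_index:
            lo = pivot + 1
        else:
            hi = pivot - 1
    return probes  # unreachable when the key is present

def estimate_expected_probes(n, trials=100_000):
    """Monte Carlo estimate of the expected number of probes when the
    key is uniform over the n array positions."""
    total = 0
    for _ in range(trials):
        total += probes_for(n, random.randrange(n))
    return total / trials

for n in (10, 100, 10_000):
    print(n, round(estimate_expected_probes(n), 3))
```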
The proof above is presented, in essentially the same form, in the following "independent" sources:
- https://www.geeksforgeeks.org/randomized-binary-search-algorithm/
- https://tutorialspoint.dev/algorithm/randomized-algorithms/randomized-binary-search-algorithm
- Running Time of Randomized Binary Search