Monday, January 1, 2018

How to bound the probability that quicksort takes more than n lg n time?

I am working on exercise 12.4-5 of CLRS (Cormen et al., Introduction to Algorithms, 3rd ed.):

Consider RANDOMIZED-QUICKSORT operating on a sequence of n distinct input numbers. Prove that for any constant k > 0, all but O(1/nᵏ) of the n! input permutations yield an O(n lg n) running time.

By RANDOMIZED-QUICKSORT, they simply mean quicksort with the pivot chosen uniformly at random from the subarray being partitioned, as opposed to always being chosen as, say, the last element.
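For concreteness, here is a minimal Python sketch of that procedure, following the CLRS PARTITION scheme; the stats comparison counter is my own addition (not part of CLRS), included so the tail behavior discussed below can be experimented with:

    import random

    def randomized_partition(A, p, r, stats):
        # Choose the pivot uniformly at random from A[p..r], then run the
        # standard CLRS PARTITION with that pivot swapped into position r.
        i = random.randint(p, r)
        A[i], A[r] = A[r], A[i]
        pivot = A[r]
        i = p - 1
        for j in range(p, r):
            stats[0] += 1            # one comparison of A[j] against the pivot
            if A[j] <= pivot:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        return i + 1

    def randomized_quicksort(A, p=0, r=None, stats=None):
        # Sorts A[p..r] in place and returns the total comparison count.
        if r is None:
            r = len(A) - 1
        if stats is None:
            stats = [0]
        if p < r:
            q = randomized_partition(A, p, r, stats)
            randomized_quicksort(A, p, q - 1, stats)
            randomized_quicksort(A, q + 1, r, stats)
        return stats[0]

    data = list(range(1000))
    random.shuffle(data)
    print(randomized_quicksort(data))   # expected about 2n ln n ≈ 13,800 comparisons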

A previous thread on this same question had a single answer, consisting entirely of a dead link plus a working link to a set of (in my opinion) poorly written notes for an undergraduate class; the notes do not answer the question, offering only a very brief discussion of a particular case of quicksort with n ≈ 20.

I'm not even sure how to rigorously pose the question; my best shot is this:

Prove that for any k > 0, there exist positive constants a, b, and n₀ such that for all n > n₀, the probability that RANDOMIZED-QUICKSORT makes more than a n lg n comparisons is less than b / nᵏ.
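In symbols, writing Cₙ for the number of comparisons RANDOMIZED-QUICKSORT makes on an n-element input (the notation Cₙ is mine, not CLRS's):

    \forall k > 0 \;\; \exists\, a, b, n_0 > 0 \;\; \forall n > n_0 :
    \quad \Pr\bigl[ C_n > a \, n \lg n \bigr] < \frac{b}{n^k}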

A similar problem appears in Knuth's The Art of Computer Programming, vol. 3 (Sorting and Searching), as problem 42 on page 137 (with some minor notational edits for clarity):

For any real number c > 0, prove that the probability is less than exp(-c) that quicksort will make more than (c + 1)(n + 1)H comparisons when sorting random data, where H = 1 + 1/2 + 1/3 + ⋯ + 1/n. (This upper bound is especially interesting when c is, say, […], where h is small and positive.)
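Restated in LaTeX, with Cₙ as above and a subscript n added to Knuth's H:

    % Knuth's tail bound, for any real c > 0, on a uniformly random input:
    \Pr\bigl[ C_n > (c + 1)(n + 1) H_n \bigr] < e^{-c},
    \qquad H_n = \sum_{i=1}^{n} \frac{1}{i} = \Theta(\lg n)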

The solution to this problem is given at the end of the book, and consists only of the following:

This is a special case of a general theorem due to R. M. Karp, see JACM 41 (1994), 1136-1150, section 2.8. Significantly sharper asymptotic bounds for tails of the quicksort distributions have been obtained by McDiarmid and Hayward, J. Algorithms 21 (1996), 476-507.

I hunted down the Karp paper, and it indeed proves the result from the Knuth problem.

While CLRS seemingly refers to Knuth for a great deal of its material, it seems to me that the result CLRS asks us to prove is actually stronger than the result in the Knuth problem. In particular, suppose we take the parameter c in Knuth's result to be any c(n) = ω(1), even something extremely slowly growing like c(n) = ln ln ln n. Then the Karp result says that the probability is less than exp[-c(n)] that quicksort makes more than [c(n) + 1](n + 1)H comparisons. Since H = Θ(lg n), this comparison threshold is Θ[c(n) n lg n], and since c(n) = ω(1), it is ω(n lg n) rather than O(n lg n). Meanwhile, with the choice c(n) = ln ln ln n, the probability bound is only exp[-c(n)] = 1/(ln ln n), which is a much weaker upper bound than what CLRS is asking for: O(1/nᵏ) for every constant k > 0.
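Concretely, here is the arithmetic for the two regimes of c(n) in Karp's bound; the second choice, c(n) = k ln n, is roughly the smallest one that forces a polynomially small failure probability:

    % Slowly growing c(n): near-optimal threshold, weak probability bound.
    c(n) = \ln\ln\ln n \;\Longrightarrow\;
      \Pr\bigl[ C_n > \Theta(n \lg n \cdot \ln\ln\ln n) \bigr]
      < e^{-\ln\ln\ln n} = \frac{1}{\ln\ln n}

    % c(n) large enough for probability O(1/n^k): inflated threshold.
    c(n) = k \ln n \;\Longrightarrow\;
      \Pr\bigl[ C_n > \Theta(n \lg^2 n) \bigr]
      < e^{-k \ln n} = \frac{1}{n^k}

Either way, one side of the CLRS statement has to give: either the comparison threshold is ω(n lg n), or the failure probability falls short of O(1/nᵏ).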

My questions are the following:

1. Is my interpretation of the CLRS question (the statement above, immediately after I say "my best shot is this") correct? If not, what, specifically, is CLRS actually asking us to prove?

2. What is the relationship between the CLRS question and the Knuth question? Is the Knuth result weaker than the CLRS result?

3. If so, does the CLRS result draw on more recent published research than the 1994 Karp paper cited in Knuth's "answer"?



