Monday, August 21, 2017

How to understand the analysis of the expected running time of randomized quick-sort in this book?

I'm working through the book Data Structures & Algorithms in Python.

On pages 557-558, there is a proof of the expected running time of randomized quick-sort. Some parts of it have been confusing me for days; I have marked them below with <1> and <2>.

Here's the original text:

Proposition 12.3: The expected running time of randomized quick-sort on a sequence S of size n is O(n log n).

Justification: We assume two elements of S can be compared in O(1) time. Consider a single recursive call of randomized quick-sort, and let n denote the size of the input for this call. Say that this call is “good” if the pivot chosen is such that subsequences L and G have size at least n/4 and at most 3n/4 each; otherwise, a call is “bad.”

Now, consider the implications of our choosing a pivot uniformly at random. Note that there are n/2 possible good choices for the pivot for any given call of size n of the randomized quick-sort algorithm. Thus, the probability that any call is good is 1/2. Note further that a good call will at least partition a list of size n into two lists of size 3n/4 and n/4, and a bad call could be as bad as producing a single call of size n−1.
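(To convince myself of the "probability 1/2" claim above, I counted the good pivot ranks directly. This is just my own sketch in Python, not from the book; count_good_pivots is a name I made up. A pivot of rank r leaves r elements in L and n - 1 - r in G, and the call is good when both parts have size between n/4 and 3n/4. Depending on how the boundary ranks are counted the total shifts by one or two, but the fraction of good ranks stays essentially 1/2.)

    def count_good_pivots(n):
        # A pivot of rank r (0-indexed) leaves r elements in L and
        # n - 1 - r elements in G; the call is "good" when both parts
        # have size between n/4 and 3n/4.
        return sum(1 for r in range(n)
                   if n / 4 <= r <= 3 * n / 4
                   and n / 4 <= n - 1 - r <= 3 * n / 4)

    for n in (8, 100, 1001, 10 ** 6):
        good = count_good_pivots(n)
        print(n, good, good / n)   # the ratio stays close to 0.5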

Now consider a recursion trace for randomized quick-sort. This trace defines a binary tree, T, such that each node in T corresponds to a different recursive call on a subproblem of sorting a portion of the original list.

Say that a node v in T is in size group i if the size of v’s subproblem is greater than (3/4)^(i+1) * n and at most (3/4)^i * n. Let us analyze the expected time spent working on all the subproblems for nodes in size group i. By the linearity of expectation (Proposition A.19), the expected time for working on all these subproblems is the sum of the expected times for each one. Some of these nodes correspond to good calls and some correspond to bad calls. But note that, since a good call occurs with probability 1/2, the expected number of consecutive calls we have to make before getting a good call is 2. <1>Moreover, notice that as soon as we have a good call for a node in size group i, its children will be in size groups higher than i. <2>Thus, for any element x from the input list, the expected number of nodes in size group i containing x in their subproblems is 2. In other words, the expected total size of all the subproblems in size group i is 2n. Since the nonrecursive work we perform for any subproblem is proportional to its size, this implies that the total expected time spent processing subproblems for nodes in size group i is O(n).
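(Another sketch of my own, not from the book: to see the "total size 2n per group" claim numerically, the code below runs randomized quick-sort once on n distinct keys, records the size of every recursive call in the recursion trace, and adds up the sizes within each size group. The helper names trace_sizes and size_group, the seed, and n = 10000 are all arbitrary choices of mine. Each per-group total should come out as a small constant times n; the proof bounds its expectation by 2n.)

    import random

    def trace_sizes(seq):
        # Randomized quick-sort on a list of distinct keys; we only need
        # the recursion trace, so record the size of every recursive call.
        sizes = []
        def qsort(s):
            sizes.append(len(s))
            if len(s) <= 1:
                return s
            pivot = random.choice(s)
            L = [x for x in s if x < pivot]
            E = [x for x in s if x == pivot]
            G = [x for x in s if x > pivot]
            return qsort(L) + E + qsort(G)
        qsort(seq)
        return sizes

    def size_group(size, n):
        # Size group i means (3/4)^(i+1) * n < size <= (3/4)^i * n.
        i, bound = 0, n
        while size <= bound * 3 / 4:
            i += 1
            bound *= 3 / 4
        return i

    random.seed(1)
    n = 10_000
    totals = {}
    for s in trace_sizes(list(range(n))):
        if s > 0:
            g = size_group(s, n)
            totals[g] = totals.get(g, 0) + s
    for i in sorted(totals):
        print(f"group {i:2d}: total subproblem size = {totals[i]:6d} (= {totals[i] / n:.2f} n)")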

The number of size groups is log_{4/3} n, since repeatedly multiplying by 3/4 is the same as repeatedly dividing by 4/3. That is, the number of size groups is O(log n). Therefore, the total expected running time of randomized quick-sort is O(n log n). (See Figure 11.13.)

[Image: Figure 11.13 from the book.]
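(My own back-of-the-envelope check of the group count, not from the book: the number of size groups is the smallest k with (3/4)^k * n <= 1, i.e. ceil(log base 4/3 of n), and multiplying that O(log n) count by the O(n) expected work per group gives the O(n log n) bound.)

    import math

    # Number of size groups needed to shrink a subproblem of size n down
    # to size 1: the smallest k with (3/4)**k * n <= 1, i.e. ceil(log_{4/3} n).
    for n in (10, 100, 10_000, 10 ** 6):
        groups = math.ceil(math.log(n, 4 / 3))
        # The ratio to log2(n) stays near 1/log2(4/3) ~ 2.41, so the
        # number of groups is Theta(log n).
        print(n, groups, groups / math.log2(n))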

Here are my questions:

  1. According to this sentence:

    <1>Moreover, notice that as soon as we have a good call for a node in size group i, its children will be in size groups higher than i.

Why is the node S(s) in the picture of Figure 11.13 in size group 0?

  2. And according to this sentence:

    <2>Thus, for any element x from the input list, the expected number of nodes in size group i containing x in their subproblems is 2. In other words, the expected total size of all the subproblems in size group i is 2n.

I can't really understand the meaning of this sentence.
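To make sentence <2> concrete for myself, I also tried the small experiment below (my own code; groups_containing_x and size_group are names I made up, and n = 1000, x = 137 are arbitrary). It follows one fixed element x down the recursion of randomized quick-sort and records the size group of every recursive call whose subproblem contains x. Averaged over many runs, the count for each group stays at most around 2, and summing that over all n elements is what the "expected total size 2n" statement says.

    import random

    def size_group(size, n):
        # Size group i means (3/4)^(i+1) * n < size <= (3/4)^i * n.
        i, bound = 0, n
        while size <= bound * 3 / 4:
            i += 1
            bound *= 3 / 4
        return i

    def groups_containing_x(n, x):
        # Follow element x through one run of randomized quick-sort on
        # range(n): record the size group of every call whose subproblem
        # contains x.  Once x is chosen as the pivot, or its subproblem
        # has size 1, no deeper call contains x.
        seq = list(range(n))
        groups = []
        while seq:
            groups.append(size_group(len(seq), n))
            if len(seq) == 1:
                break
            pivot = random.choice(seq)
            if pivot == x:
                break
            keep_smaller = x < pivot
            seq = [y for y in seq if y != pivot and (y < pivot) == keep_smaller]
        return groups

    random.seed(2)
    n, x, runs = 1000, 137, 2000
    counts = {}
    for _ in range(runs):
        for g in groups_containing_x(n, x):
            counts[g] = counts.get(g, 0) + 1
    for i in sorted(counts):
        print(f"group {i:2d}: average number of calls containing x = {counts[i] / runs:.2f}")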



