Are you back or are you new here? If you're back, why'd you even come back? If you're new, you are in for hell. People who are back, you know what's coming. People who are new, you don't know what's coming. But you will. You will. Last time, we discussed "Why Comparison-based Sorting Algorithms have $\Omega(n log n)$ Lower Bound" and this is a follow-up to that. Also, if you haven't read that, go read the previous article first, then come back here and read this. Alright, ready? Let's go.

Today, we will discuss average-case lower bounds for comparison-based sorting algorithms. Now, I don't expect your little brain to remember everything you were spoon-fed last time, so I'll give you a quick recap. As expected, our focus in this article, once again, is comparison-based sorting Algorithms. In our last article, we were able to define a comparison-based sorting algorithm.

Definition: A comparison-based sorting algorithm takes as input an array $[a_1, a_2, a_3, ..., a_n]$ of $n$ items, and can only gain information about the items by comparing pairs of them. Each comparison $($"is $a_i > a_j$?"$)$ returns $YES$ or $NO$ and counts a 1 time-step. The algorithm may also for free reorder items based on the results of comparisons made. In the end, the algorithm must output a permutation of the input in which all items are in sorted order.

Also, we were able to prove that any comparison-based sorting algorithm must take $\Omega(n log n)$ time in the worst case. And, we also defined a theorem for this proof. Do you remember what it was? Of course, you don't. Here's the theorem:

Theorem: Any deterministic comparison-based sorting algorithm must perform $\Omega(n log n)$ comparisons to sort an array of $n$ elements in the worst case. Specifically, for any deterministic comparison-based sorting algorithm $\mathcal{A}$, for all $n \geq 2$ there exists an input $I$ of size $n$ such that $\mathcal{A}$ makes at least $log_2(n!) = \Omega(n log n)$ comparisons to sort $I$.

## Average-case lower bounds

You started reading this article, reached this point and wondered, "Wait, we are discussing average-case lower bounds"?... "I know I was told about bounds and worst case, but what the fuck is average-case lower bound?". Well, if you are pondering over that question, once again, well ponder no more! Let's define what an average-case lower bound is. An average-case lower bound is a lower bound on the running time of an algorithm that holds for the average-case input.

Now, what the hell is an average-case input? An average-case input is an input that is chosen at random from the set of all possible inputs of a given size. So, an average-case lower bound is a lower bound on the running time of an algorithm that holds for the average-case input.

Now, we are ready to discuss average-case lower bounds for comparison-based sorting algorithms. In fact, we can generalize the above theorem to show that any comparison-based sorting algorithm must take $\Omega(n log n)$ time on average, not just in the worst case.

Surely, this calls for another theorem, right?

## Another Theorem? What the heck?

Yeah, reality sucks. People are often burdened with things they never asked for. But, you know what? You can't do anything about it. So, let's just get on with it. Here's the theorem:

Theorem: For any deterministic comparison-based sorting algorithm $\mathcal{A}$, the average-case number of comparisons (the number of comparisons on average on a randomly chosen permutation of $n$ distinct elements) is at least $\lfloor log_2(n!) \rfloor$.

Alright, so now that we have the theorem, let's prove it. Let $S$ be the set of all $n!$ possible orderings of $n$ distinct elements. As noted in the previous argument, these each require a different permutation to be produced as output. Let's now build out the entire decision tree for algorithm $\mathcal{A}$ on $S$: the tree we get by looking at all the different question/answer paths we get by running algorithm $\mathcal{A}$ on the inputs in $S$. This tree has $n!$ leaves, where the depth of a leaf is the number of comparisons performed by the sorting algorithm on that input. Our goal is to show that the average depth of the leaves must be at least $\lfloor log_2(n!) \rfloor$ (previously, we only cared about the maximum depth).

If the tree is completely balanced, then each leaf is at depth $\lceil log_2(n!) \rceil$ or $\lfloor log_2(n!) \rfloor$ and we are done (Let us define a tree to be completely balanced if the deepest leaf is at most one level deeper than the shallowest leaf. Everything would be easier if we could somehow assume $n!$ was a power of 2....).

To prove the theorem, we just need to show that out of all binary trees on a given number of leaves, the one that minimizes their average depth is a completely balanced tree. This is not too hard to see: given some unbalanced trees, we take two sibling leaves at the largest depth and move them to be children of the leaf of the smallest depth. Since the difference between the largest depth and the smallest depth is at least 2 (otherwise the tree would be balanced), this operation reduces the average depth of the leaves. Specifically, if the smaller depth is $d$ and the larger depth is $D$, we have removed two leaves of depth $D$ and one of depth $d$, and we have added two leaves of depth $d+ 1$ and one of depth $D - 1$. Since any unbalanced tree can be modified to have a smaller average depth, such a tree cannot be one that minimizes average depth, and therefore the tree of the smallest average depth must in fact be balanced.

In fact, if we are a bit more clever in the proof, we can get rid of the floor in the bound.

## Lower bound for randomized algorithms

Umm, you hate theorems? Do you love them? You don't care? Well, I don't care either. I'm just going to give you another theorem. Here it is:

Theorem: The above bound holds for randomized algorithms too.

Now, let's prove it. The argument here is a bit subtle. The first step is to argue that with respect to counting comparisons, we can think of a randomized algorithm $\mathcal{A}$ as a probability distribution over deterministic algorithms. In particular, we can think of a randomized algorithm $\mathcal{A}$ as a deterministic algorithm with access to a special “random bit tape”: every time $\mathcal{A}$ wants to flip a coin, it just pulls the next bit off that tape. In that case, for any given run of algorithm $\mathcal{A}$, say reading bit-string $s$ from that tape, there is an equivalent deterministic algorithm $\mathcal{A}_s$ with those bits hardwired in. Algorithm $\mathcal{A}$ is then a probability distribution over all those deterministic algorithms $\mathcal{A}_s$.

This means that the expected number of comparisons made by randomized algorithm $\mathcal{A}$ on some input $I$ is just

$$\sum_{s} Pr(s)(\text{Running time of } \mathcal{A}_s \text{ on } I).$$

If you know the definition of expectation, then by the definition of expectation we can say that, the running time of the randomized algorithm is a random variable and the sequences $s$ correspond to the elementary events. And if you don't know the definition of expectation... What!? I'm not gonna spoon-feed you everything! How about you go learn something on your own sometime? Or touch some grass, your choice!

Anyway, the expected running time of the randomized algorithm is just an average over deterministic algorithms. Since each deterministic algorithm has an average-case running time of at least $\lfloor log_2(n!) \rfloor$, any average over them must too. Formally, the average-case running time of the randomized algorithm is

$$\begin{aligned}\mathop{avg}_{inputs I} \sum_{s} Pr(s)(\text{Running time of } \mathcal{A}_s \text{ on } I) &= \sum_{s} \mathop{avg}_{inputs I} [Pr(s)(\text{Running time of } \mathcal{A}_s \text{ on } I)] \\ &= \sum_{s} Pr(s) \mathop{avg}_{inputs I} [(\text{Running time of } \mathcal{A}_s \text{ on } I)] \\ &\geq \sum_{s} Pr(s) \lfloor log_2(n!) \rfloor \\ &= \lfloor log_2(n!) \rfloor \end{aligned}$$

## What does this all mean?

Nothing means anything at all. Death is the only ultimate reality! On the other hand, one way to think of the kinds of bounds we have been proving is to think of a matrix with one row for every possible deterministic comparison-based sorting algorithm (there could be a lot of rows!) and one column for every possible permutation of $n$ given input elements (there are a lot of columns too). Entry $(i,j)$ in this matrix contains the running time of algorithm $i$ on input $j$. The worst-case deterministic lower bound tells us that for each row $i$ there exists a column $j_i$ such that the entry $(i,j_i)$ is large. The average-case deterministic lower bound tells us that for each row $i$, the average of the elements in the row is large. The randomized lower bound says “Well, since the above statement holds for every row, it must also hold for any weighted average of the rows.” In the language of game theory, one could think of this as a two-player game (much like rock-paper-scissors) between an “algorithm player” who gets to pick a row and an adversarial “input player” who gets to pick a column. Each player makes their choice and the entry $(i,j)$ is the cost to the algorithm player which we can think of as how much money the algorithm player has to pay the input player. We have shown that there is a randomized strategy for the input player (namely, pick a column at random) that guarantees it an expected gain of $\Omega(n log n)$ no matter what strategy the algorithm player chooses.

This was a shorter post, wasn't it? But this is all I had to tell you for now. Until next time. Bye!

May 22, 2024