Alright, folks! I know I keep this weblog very personal and there's art flowing all around this website, but let's talk some mathematics today! Specifically, we are here to discuss the notion of lower bounds for sorting algorithms. Now, when I say sorting algorithms, I am talking about comparison-based sorting algorithms. There are other sorting algorithms like counting sort, radix sort, bucket sort, etc. but they are a topic for another day. Now, buckle up for a long text-based post and a vomit load of mathematics, because, by the end of this article, we are going to show that any deterministic comparison-based sorting algorithm must take $\Omega(n \log n)$ time to sort an array of n elements in the worst case.
You started reading this article, and reached this point and wondered, "Wait, we are discussing the notion of lower bounds"? "What is a lower bound"? ... "To think that, what the fuck even is a bound"? Well, if you are pondering over that question, well ponder no more! I promised you a shit-ton of information and a butt-load of theory, so here we go!
BOUNDS? HUH?
So, imagine you want to pack some mangoes and you have 3 boxes. You want to pack the mangoes in such a way that you use the least amount of boxes. Now, you can pack the mangoes in 1 box, 2 boxes or 3 boxes. But, you can't pack the mangoes in 4 boxes, because you only have 3 boxes. So, the number of boxes you can use to pack the mangoes is bounded by 3. This is an example of an upper bound. Similarly, you can't pack the mangoes in -1 boxes, because you can't have negative boxes. So, the number of boxes you can use to pack the mangoes is bounded by 0. This is an example of a lower bound.
To talk definitions, a lower bound is a function that is always less than or equal to the function we are trying to bind. Similarly, an upper bound is a function that is always greater than or equal to the function we are trying to bind. For example, $f(n) = n^2$ is a lower bound for $g(n) = n^2 + n$, because $g(n) \geq f(n)$ for all $n \geq 1$. Similarly, $f(n) = n^2$ is an upper bound for $g(n) = n^2 - n$, because $g(n) \leq f(n)$ for all $n \geq 1$.
WHY LOWER BOUNDS?
Most times in life, we are concerned about the upper bounds of a situation. "Given some problem X, can we solve it while keeping our resources below Y units?" is a question we ask ourselves a lot. And while solving that problem, we often try to keep our resources as low as possible. But, have you ever said to yourself, "I have done the best I could, and no one else would have done better than me"? Well, that's what lower bounds are for. Lower bounds are the best possible solution to a problem. Here, our goal is to keep our resources as high as possible. Lower bounds also help us understand how close we are to the best possible solution. For example, if we have an algorithm that runs in time $O(n \log^2{n})$, and a lower bound of $\Omega(n \log n)$, then we have a $\log(n)$ "gap": the maximum possible savings we could hope to achieve by improving our algorithm.
COMPARISION-BASED SORTING ALGORITHMS
Another, term you encountered during this article was "comparison-based sorting algorithm" and you pondered, "What is a comparison-based sorting algorithm?". Well, once again, ponder no more! I promised you a shit-ton of information and a butt-load of theory, so here we go! Comparison-based sorting algorithms are sorting algorithms that only operate on the input array by comparing pairs of elements and moving elements around based on the results of these comparisons. For example, in the bubble sort algorithm, we compare the first two elements of the array and swap them if they are not in the correct order. Then, we compare the second and third elements of the array and swap them if they are not in the correct order. We continue this process until we reach the end of the array. Then, we repeat the process again, but this time, we stop one element before the end of the array. We continue this process until we reach the beginning of the array. This is a comparison-based sorting algorithm.
Actually, we can go ahead and give a proper definition of these types of algorithms:
Definition: A comparison-based sorting algorithm takes as input an array $[a_1, a_2, a_3, ..., a_n]$ of $n$ items, and can only gain information about the items by comparing pairs of them. Each comparison $($"is $a_i > a_j$?"$)$ returns $YES$ or $NO$ and counts a 1 time-step. The algorithm may also for free reorder items based on the results of comparisons made. In the end, the algorithm must output a permutation of the input in which all items are in sorted order.
Bubble sort, Quicksort, Mergesort, and Insertion-sort are all comparison-based sorting algorithms.
alright, what's the THEOREM?
Now, we are ready to state the theorem:
Theorem: Any deterministic comparison-based sorting algorithm must perform $\Omega(n \log n)$ comparisons to sort an array of $n$ elements in the worst case. Specifically, for any deterministic comparison-based sorting algorithm $\mathcal{A}$, for all $n \geq 2$ there exists an input $I$ of size $n$ such that $\mathcal{A}$ makes at least $\log_2(n!) = \Omega(n \log n)$ comparisons to sort $I$.
Let's talk about the proof now. We know that a sorting algorithm must output a permutation of the input $[a_1, a_2, a_3, ..., a_n]$. The key to the argument is that (a) there are $n!$ different possible permutations the algorithm might output, and (b) for each of these permutations, there exists an input for which that permutation is the only correct answer. For instance, the permutation $[a_3,a_1,a_4,a_2]$ is the only correct answer for sorting the input $[2, 4, 1, 3]$. In fact, if you fix a set of $n$ distinct elements, then there will be a $1-1$ correspondence between the different orderings the elements might be in and the permutations needed to sort them.
Given (a) and (b) above, this means we can fix some set of $n!$ inputs (e.g., all orderings of $\{1, 2,...,n\}$), one for each of the $n!$ output permutations. Let $S$ be the set of these inputs that are consistent with the answers to all comparisons made so far (so, initially, $|S| = n!$). We can think of a new comparison as splitting $S$ into two groups: those inputs for which the answer would be $YES$ and those for which the answer would be $NO$. Now, suppose an adversary always gives the answer to each comparison corresponding to the larger group. Then, each comparison will cut down the size of $S$ by at most a factor of $2$. Since $S$ initially has size $n!$, and by construction, the algorithm at the end must have reduced $|S|$ down to $1$ in order to know which output to produce, the algorithm must make at least $\log_2(n!)$ comparisons before it can halt. We can then solve:
$$\log_2(n!) = \log_2(n) + \log_2(n-1) + ... + \log_2(2) = \Omega(n \log n)$$
Our proof is like a game of 20 Questions in which the responder doesn’t actually decide what he is thinking of until there is only one option left. This is legitimate because we just need to show that there is some input that would cause the algorithm to take a long time. In other words, since the sorting algorithm is deterministic, we can take that final remaining option and then re-run the algorithm on that specific input, and the algorithm will make the same exact sequence of operations.
Let's do an example with $n = 3$, and $S$ as initially consisting of the $6$ possible orderings of $\{1, 2, 3\}$:
$$(123),(132),(213),(231),(312),(321).$$
Suppose the sorting algorithm initially compares the first two elements $a_1$ and $a_2$. Half of the possibilities have $a_1 > a_2$ and half have $a_2 > a_1$. So, the adversary can answer either way and let's say it answers that $a_2 > a_1$. This narrows down the input to three possibilities:
$$(123),(132),(231).$$
Suppose the next comparison is between $a_2$ and $a_3$. In this case, the most popular answer is that $a_2 > a_3$, so the adversary returns that answer which removes just one order, leaving the algorithm with:
$$(132),(231).$$
It now takes one more comparison to finally isolate the input ordering and determine the correct permutation to output.
ALTERNATIVE VIEW OF THE PROOF
Another way of looking at the proof we gave above is as follows. For a deterministic algorithm, the permutation it outputs is solely a function of the series of answers it receives (any two inputs producing the same series of answers will cause the same permutation to be output). So, if an algorithm always made at most $k < lg(n!)$ comparisons, then there are at most $2^k < n!$ different permutations it can possibly output. In other words, there is some permutation it can’t output. So, the algorithm will fail on any input for which that permutation is the only correct answer.
PONDERING TIME...
Suppose we consider the problem: “Order the input array so that the smallest $n/2$ come before the largest $n/2$”? Does our lower bound still hold for that problem, or where does it break down? How fast can you solve that problem?
Answer: No, the proof does not still hold. It breaks down because any given input can have multiple correct answers. E.g., for input $[2, 1, 4, 3]$, we could output any of $[a_1,a_2,a_3,a_4]$, $[a_2,a_1,a_3,a_4]$, $[a_1,a_2,a_4,a_3]$, or $[a_2,a_1,a_4,a_3]$. In fact, not only does the lower bound break down, but we can actually solve this problem in linear time: just run the linear-time median-finding algorithm and then make a second pass putting elements into the first half or second half based on how they compare to the median.
I know the title said, "Why sorting has to be $(n \log n)$ lower bound", but we just talked about comparison-based sorting algorithms. Feeling cheated? Well then, screw you too! Yeah, I cheated you. What are you gonna do about it? Complain about it in comments, Huh?
Alright! See you later, alligator!
EDIT on June 02, 2023: You can read this follow-up next!
See https://en.m.wikipedia.org/wiki/Sorting_algorithm#Non-comparison_sorts
P.S. I love the cat running across the screen that I can click on. Like wtf??