<p><strong><a href="http://mathmondays.com/dehn-invariant">The Dehn Invariant, or, Tangrams In Space</a></strong> (Math Mondays, 2020-03-30)</p>
<div class="mathdefs">
$
\newcommand{\ZZ}{\Bbb Z}
\newcommand{\QQ}{\Bbb Q}
\newcommand{\RR}{\Bbb R}
$
</div>
<p>Fans of wooden children’s toys may remember <a href="https://en.wikipedia.org/wiki/Tangram">tangrams</a>, a puzzle composed of 7 flat pieces that can be rearranged into numerous different configurations.</p>
<p><img src="/assets/dehn/tangrams.svg" alt="Tangrams in square and cat configurations" /></p>
<p>As mathematicians, we’re interested in shapes that are slightly simpler than cats or houses. <!--more--> For example, we might try to design a set of tangrams that can be rearranged into an equilateral triangle. One possibility is shown below.</p>
<p><img src="/assets/dehn/square-to-triangle.svg" alt="Equidecomposition of square and triangle" /></p>
<p>How about a pentagon?</p>
<p><img src="/assets/dehn/square-to-pentagon.svg" alt="Equidecomposition of square and pentagon" /></p>
<p>We don’t have to start with a square. How about a set that can become a star or a triangle?</p>
<p><img src="/assets/dehn/star-to-triangle.svg" alt="Equidecomposition of six-pointed star and triangle" /></p>
<p>What pairs of polygons can we design tangram sets for? One way to reframe this problem is in terms of <em>scissors-congruence</em>, which is pretty much what it sounds like. Two polygons are “scissors-congruent” if we can take the first polygon, make a finite number of straight-line cuts to it, and rearrange the pieces into the second polygon. Clearly, two polygons are scissors-congruent if and only if we can design a set of tangrams that connect the two.</p>
<hr />
<p>Given two polygons, how can we tell if they’re scissors-congruent? One thing we can do is check their areas, since, if they have different areas, there’s no way they can be scissors-congruent. It turns out that this is the <em>only</em> obstacle – if two polygons have the same area, they <em>must</em> be scissors-congruent! This surprising result is known as the Wallace–Bolyai–Gerwien theorem, and was proven in the 1830s. We’ll walk through a proof.</p>
<p>It suffices to show that any polygon of area $A$ is scissors-congruent to an $A \times 1$ rectangle. This is because, if $P_1$ and $P_2$ are scissors-congruent to some third shape $Q$, then we can rearrange $P_1$ into $P_2$ by going through $Q$ as an intermediate step. We start by breaking our polygon into triangles:</p>
<p><img src="/assets/dehn/wbg-1.svg" alt="Triangulation of a polygon" /></p>
<p>Next, we’ll transform each triangle into a rectangle, by cutting it halfway up its height, and folding down the apex:</p>
<p><img src="/assets/dehn/wbg-2.svg" alt="Cutting a triangle into a rectangle" /></p>
<p>Now we need to change the dimensions of this rectangle, but this step requires some creativity. We need the height of the rectangle to be between $1$ and $2$. If it isn’t, we can repeatedly cut it in half until it is. (If the height is less than $1$, then we run this process in reverse to double it instead.)</p>
<p><img src="/assets/dehn/wbg-3.svg" alt="Repeatedly halving a rectangle" /></p>
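<p>As a rough sketch of the halving step, here is the bookkeeping in Python. The function name <code>normalize_height</code> and the half-open target range $1 \le h < 2$ are assumptions made for illustration; the only thing that matters is that each cut-and-stack move halves one side and doubles the other, so the area never changes.</p>

```python
def normalize_height(width, height):
    """Halve or double the height until 1 <= height < 2, preserving area.

    Each loop iteration stands for one cut-and-restack move on the rectangle.
    The half-open range [1, 2) is an assumption; the text only says "between".
    """
    while height >= 2:
        # Cut horizontally through the middle and lay the halves side by side.
        width, height = width * 2, height / 2
    while height < 1:
        # The reverse move: cut vertically and stack the halves.
        width, height = width / 2, height * 2
    return width, height


if __name__ == "__main__":
    for w, h in [(3.0, 8.0), (16.0, 0.25), (5.0, 3.0)]:
        new_w, new_h = normalize_height(w, h)
        print(f"{w} x {h}  ->  {new_w} x {new_h}  (area {new_w * new_h})")
```

<p>The number of moves is roughly $\log_2$ of the starting height, so this step always finishes after finitely many cuts.</p>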
<p>Then, we do a sliding maneuver to convert this rectangle into one with height $1$. Notice that we need $u < 1$, or else $u \ell$ would be greater than $\ell$, and we couldn’t draw this diagram.</p>
<p><img src="/assets/dehn/wbg-4.svg" alt="Minor width adjustment of a rectangle" /></p>
<p>After doing this to all the triangles, the final step is to glue all these rectangles together, end-to-end, to get the desired $A \times 1$ rectangle.</p>
<hr />
<p>The natural question to ask next is: can we generalize this? What about 3D shapes? Are any two polyhedra of equal volume also scissors-congruent?</p>
<p>This is the third of <a href="https://en.wikipedia.org/wiki/Hilbert%27s_problems">Hilbert’s twenty-three problems</a>, and his student, Max Dehn, proved in 1900 that, unlike in two dimensions, the answer is “no”. He did so by constructing a quantity (now known as the “Dehn invariant”) that stays unchanged under scissors-congruence. Two shapes with different Dehn invariants, therefore, cannot be scissors-congruent. For example, a cube and a regular tetrahedron of equal volume are not scissors-congruent.</p>
<p>Unlike area and volume, the Dehn invariant isn’t as simple as a real number, and we’ll need to do a bit of legwork to define it. The key observation to make is that a cut can only do one of three things to an edge:</p>
<ul>
<li>miss it completely</li>
<li>cut it at a point</li>
<li>split it along its entire length</li>
</ul>
<p>By looking at what these operations do to edges, we can cobble together a quantity that stays invariant. The properties of an edge that we care about are its length and its dihedral angle<sup id="ref1"><a href="#fn1">[1]</a></sup>.</p>
<p>In the first situation, the edge stays unchanged. That one’s easy.</p>
<p>In the second situation, one edge is turned into two edges. The new edges have the same dihedral angle as the original, and their lengths sum to the original length.</p>
<p><img src="/assets/dehn/edge-cut-transverse.svg" alt="Cutting an edge transversely" /></p>
<p>In the third situation, we again get two edges, but this time, the length stays the same, and the dihedral angle changes.</p>
<p><img src="/assets/dehn/edge-cut-lengthwise.svg" alt="Cutting an edge along its length" /></p>
<p>Lastly, cuts also create new edges, as they slice through a face. We’d like these to count for nothing; that is, they should contribute zero to our invariant.</p>
<p>Now that we know what cuts do to edges, how do we use this to define an invariant? If an edge is represented by the ordered pair $(\ell_i, \theta_i)$, we want to enforce the following equivalence relations:
\[ (\ell_1 + \ell_2, \theta) \cong (\ell_1, \theta) + (\ell_2, \theta) \qquad (\ell, \theta_1 + \theta_2) \cong (\ell, \theta_1) + (\ell, \theta_2) \]</p>
<p>These two rules imply some further relations. Consider the sum of $n$ copies of $(\ell, \theta)$. Applying the first rule repeatedly gives $(n \ell, \theta)$, and the second rule gives $(\ell, n \theta)$. This can be extended to negative $n$ as well, so for any integer $n$,
\[ n (\ell, \theta) = (n \ell, \theta) = (\ell, n \theta) \]</p>
<p>If you’re familiar with tensors, you might notice that these are exactly the conditions for a tensor product! If not, don’t worry, you can think of these as ordered pairs still, but we’ll use the symbol $\otimes$ instead of a comma. It may make more sense when we go through the examples.</p>
<p>We still have to deal with the new edges created from cuts in the faces, but these almost resolve themselves. The edges we create come in pairs with supplementary angles. So if the edge pair we create has length $\ell$, we get $(\ell, \theta) + (\ell, \pi - \theta) = (\ell, \pi)$. Using the derived relation $n (\ell, \theta) = (n \ell, \theta) = (\ell, n \theta)$ above, we can drag a $2$ from the left to the right, giving us $(\ell/2, 2\pi)$. If we declare that $2\pi$ is equivalent to $0$ (a reasonable demand, given that we’re working with angles), then these edge pairs automatically cancel each other out, as desired.</p>
<p>We can now define the Dehn invariant: it takes values in $\RR \otimes_\ZZ \RR/2 \pi$ (lengths and angles), and it’s equal to the sum of $\ell_i \otimes \theta_i$ over all the edges. Is something that concise truly unchanged by scissors-congruence?</p>
<p>When we make a cut, either it misses an existing edge, and so the corresponding term in the sum does not change, or it intersects it, in which case that term is replaced by two terms that sum to the original. It also creates new edges, by cutting into the faces. But as we saw earlier, these edges come in pairs that sum to zero, and so the total value of the invariant remains unchanged.</p>
<hr />
<p>Armed with this invariant, we can now answer the question: are the cube and the tetrahedron scissors-congruent? Let’s say both have volume 1. The cube has 12 edges, each with dihedral angle $\pi / 2$. To get the volume to be $1$, we need edges of length $1$, so the Dehn invariant of this cube is:
\[ 12 (1 \otimes \frac{\pi}{2}) = 3 (1 \otimes 2 \pi) = (3 \otimes 2 \pi) = 0 \]</p>
<p>A tetrahedron has 6 edges, each with dihedral angle $\arccos(1/3)$. The volume of a tetrahedron with side length $a$ is $a^3 / 6 \sqrt 2$, so the side length of our tetrahedron needs to be $a = (72)^{1/6}$, making the Dehn invariant equal to:
\[ 6 (a \otimes \arccos(1/3)) = 6 a \otimes \arccos(1/3) \]</p>
<p>With some knowledge of modules, one can show that this is non-zero<sup id="ref2"><a href="#fn2">[2]</a></sup>, but the crux of the idea is that $\arccos(1/3)$ is not a rational multiple of $\pi$, so we can never get the right hand side of this tensor to collapse to zero. This shows that no matter how many pieces you cut it into, a cube can never be reassembled into a tetrahedron.</p>
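<p>Here is a small numerical sanity check of the two facts used above, written as a sketch rather than a proof. The helper name <code>cos_multiple</code> is made up for this example. It computes $\cos(n\theta)$ exactly for $\cos\theta = 1/3$ using the recurrence $\cos(n\theta) = 2\cos\theta\cos((n-1)\theta) - \cos((n-2)\theta)$. If $\theta$ were a rational multiple of $\pi$, some $\cos(n\theta)$ with $n \ge 1$ would equal $\pm 1$; instead, the reduced denominators come out as $3^n$. The code only checks finitely many $n$, but the pattern can be proved by induction on the numerators mod $3$.</p>

```python
from fractions import Fraction
import math


def cos_multiple(n, c=Fraction(1, 3)):
    """Exact value of cos(n * theta), given cos(theta) = c (a rational)."""
    prev, cur = Fraction(1), c  # cos(0 * theta), cos(1 * theta)
    if n == 0:
        return prev
    for _ in range(n - 1):
        # cos(m t) = 2 cos(t) cos((m-1) t) - cos((m-2) t)
        prev, cur = cur, 2 * c * cur - prev
    return cur


# Side length of the regular tetrahedron with volume 1, as in the text.
side = 72 ** (1 / 6)
volume = side ** 3 / (6 * math.sqrt(2))

if __name__ == "__main__":
    print("tetrahedron volume:", volume)
    for n in range(1, 6):
        print(n, cos_multiple(n))
```

<p>Since a fraction with denominator $3^n$ is never $\pm 1$, no multiple of $\arccos(1/3)$ lands on a multiple of $\pi$, which is exactly what keeps the right-hand side of the tensor from collapsing.</p>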
<p>One interesting consequence of this: in geometry class, you probably saw some cut-and-paste constructions for proving the area of a parallelogram, or a triangle. This result shows there can never be such a proof for pyramids – calculus is unavoidable!</p>
<hr />
<p>A final note: we’ve shown that there are at least two obstructions to scissors-congruence in 3D: volume and Dehn invariant. Are they the only ones? The answer is yes! In other words, if two polyhedra do have the same volume and Dehn invariant, then they are indeed scissors-congruent. The proof of that is much harder, and a good presentation can be found <a href="http://www.math.brown.edu/~res/MathNotes/jessen.pdf">here</a>.</p>
<ol>
<li>
<p><a id="fn1" href="#ref1">↑</a> The dihedral angle of an edge is the angle between the two faces adjacent to it. You can think of it as a measure of the ‘sharpness’ of an edge; a 90° edge is like the edge of a countertop, but a 15° edge will cut like a knife.</p>
</li>
<li>
<p><a id="fn2" href="#ref2">↑</a> First, note that for any rational $p/q$, we have $\ell \otimes \frac{p}{q} \pi = \frac{\ell}{2q} \otimes 2 p \pi = 0$. This means that $\RR \otimes_\ZZ \RR/2\pi \cong \RR \otimes_\ZZ \RR/(2\pi\QQ)$. Since both of those modules are divisible, this is equal to $\RR \otimes_\QQ \RR/(2 \pi \QQ)$, which, being a tensor product of $\QQ$-vector spaces, is a $\QQ$-vector space itself. In particular, if $\ell \ne 0$ and $\theta \notin 2 \pi \QQ$, then $\ell \otimes \theta$ is a non-zero vector.</p>
</li>
</ol>
<hr />
<p><strong><a href="http://mathmondays.com/hydra">The Mathematical Hydra</a></strong> (2019-09-29)</p>
<p>Imagine you’re tasked with killing a hydra. As usual, the hydra is defeated when all of its heads are cut off, and whenever a head is cut off, the hydra grows new ones.</p>
<p>However, this mathematical hydra is much more frightening than a “traditional” one. It’s got a tree-like structure – heads growing out of its heads – and it can regrow entire groups of heads at once! Can you still win?</p>
<p>Also, this post is the first one with interactivity! Feel free to report bugs on the <a href="https://github.com/HenrySwanson/HenrySwanson.github.io/issues">GitHub issues page</a>.</p>
<!--more-->
<hr />
<p>For the purposes of our game, a hydra is a rooted tree. The root, on the left, is the body, and the leaves are the heads. Intermediate nodes are part of the necks of the hydra, and cannot (yet) be cut off.</p>
<p align="center">
<img src="/assets/hydra/anatomy.svg" width="70%" height="auto" alt="Anatomy of a hydra" />
</p>
<p>You can cut off one head at a time, and when you do, the hydra may grow more heads, according to the following rules:</p>
<ul>
<li>If the head is connected directly to the root, then the hydra does nothing.</li>
<li>Otherwise, look at the parent node (the one directly underneath the one you just cut off). The hydra grows two new copies of that node <em>and all its children</em>, attaching them to the grandparent as appropriate.</li>
</ul>
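<p>The two rules above can be sketched in a few lines of Python. This is an illustrative model, not the code behind the interactive widget: a hydra is represented as a nested list (each node is the list of its children, so a head is <code>[]</code>), and the names <code>cut</code>, <code>some_head</code> and <code>kill</code> are assumptions made for this example.</p>

```python
import copy


def cut(hydra, path, copies=2):
    """Cut off the head at `path` (child indices from the root), in place."""
    if not path:
        raise ValueError("the root cannot be cut off")
    grandparent, parent = None, hydra
    for index in path[:-1]:
        grandparent, parent = parent, parent[index]
    if parent[path[-1]]:
        raise ValueError("only heads (leaves) can be cut off")
    del parent[path[-1]]
    if grandparent is not None:
        # The parent was not the root: regrow copies of the parent,
        # together with everything still attached to it.
        grandparent.extend(copy.deepcopy(parent) for _ in range(copies))


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node)


def count_heads(hydra):
    """Number of leaves, not counting a bare root as a head."""
    def leaves(node):
        return 1 if not node else sum(leaves(child) for child in node)
    return sum(leaves(child) for child in hydra)


def some_head(hydra):
    """Path to one head: keep following the first child until a leaf."""
    path, node = [], hydra
    while node:
        path.append(0)
        node = node[0]
    return path


def kill(hydra, copies=2):
    """Cut heads until none remain; return the number of cuts made."""
    moves = 0
    while hydra:
        cut(hydra, some_head(hydra), copies)
        moves += 1
    return moves


if __name__ == "__main__":
    hydra = [[[], []]]          # body -> one neck -> two heads
    cut(hydra, [0, 0])
    print(count_heads(hydra), "heads,", count_nodes(hydra), "nodes")
    print("moves to kill the original hydra:", kill([[[], []]]))
```

<p>Starting from a body with one neck carrying two heads, a single cut leaves three heads and seven nodes, and the whole hydra falls in 13 cuts. For deeper hydras the number of cuts depends on the order in which heads are chosen, so <code>some_head</code> is just one possible strategy.</p>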
<hr />
<p>This is hard to convey through text, so let’s walk through an example. Let’s start with a pretty simple hydra, and cut off one of the heads. (Purple indicates newly-grown heads.)</p>
<p><img src="/assets/hydra/example-1.svg" alt="First step of killing the hydra" width="100%" height="auto" /></p>
<p>We used to have two heads, and four nodes total, but now we have three, and seven nodes. That’s not good. Let’s try chopping off another one.</p>
<p><img src="/assets/hydra/example-2.svg" alt="Second step of killing the hydra" width="100%" height="auto" /></p>
<p>This increases the total number of heads, but now, we can cut off the three smallest heads, one at a time, without incident.</p>
<p><img src="/assets/hydra/example-3.svg" alt="Third step of killing the hydra" width="100%" height="auto" /></p>
<p>We’ve made some visible progress now. Cutting off one of the remaining heads will reveal three more, but we can extinguish them easily.</p>
<p><img src="/assets/hydra/example-4.svg" alt="Fourth step of killing the hydra" width="100%" height="auto" /></p>
<p>Repeating this process on the last head will kill the hydra.</p>
<p><img src="/assets/hydra/example-5.svg" alt="Fifth step of killing the hydra" width="100%" height="auto" /></p>
<hr />
<p>We managed to defeat this hydra, but it was a pretty small one. What about something a bit larger? Let’s add one more head to that neck.</p>
<p>This time, you can try to kill it yourself: the illustration below is interactive!</p>
<p><button id="reset-button" type="button">Reset</button>
<span id="click-counter" style="float:right;"></span></p>
<div id="hydra-interactive" style="border-style: solid;border-width: 3px;border-radius: 5px;background-color: #fff"></div>
<hr />
<p>Depending on how persistent you are, you might not be surprised to learn that you can indeed kill this hydra, though it’ll take tens of thousands of moves to do so (29528 moves by my count). In fact, you can kill any hydra, though I’ll make no guarantees about how long it will take.</p>
<p>But what may be surprising is that you can’t avoid killing the hydra, even if you try. No matter how large the hydra, or what order you cut off its heads, you will always defeat it in a finite number of moves.</p>
<p>And even better, this holds true even for faster-regenerating hydras. What if, instead of growing back two copies of the subtree, the hydra grows back three copies? Or a hundred? What if, on the $N$th turn of the game, it grows back $N$ copies? $N^2$? $N!$? What if the hydra just gets to pick how many copies to regrow, as many as it wants?</p>
<p>It doesn’t matter.</p>
<p>You always win.</p>
<hr />
<p>The proof here relies on <a href="https://en.wikipedia.org/wiki/Ordinal_number">ordinal numbers</a>. If you’re not familiar, there’s a good <a href="https://www.youtube.com/watch?v=SrU9YDoXE88">video from Vsauce</a> about them. The key property to know is that the ordinals are “well-ordered”; that is, there is no infinitely long strictly descending sequence<sup id="ref1"><a href="#fn1">[1]</a></sup>.</p>
<p>We assign an ordinal number to each hydra, in such a way that cutting off a head produces a hydra with a strictly smaller ordinal. As we play the hydra game, the sequence of hydras we encounter produces a corresponding sequence of ordinals. Since the ordinal sequence is strictly decreasing, it must eventually terminate, and so the hydra sequence must terminate as well. The only way that the hydra sequence can terminate is if we have no more heads to cut off; i.e., we’ve defeated the hydra.</p>
<p>The assignment is done by assigning values to the nodes, and accumulating down to the root:</p>
<ul>
<li>A head is assigned $0$. Similarly, a trivial (dead) hydra is assigned $0$.</li>
<li>If a node has children with ordinals $\alpha_1, \alpha_2, \ldots, \alpha_n$, then we assign the ordinal $\omega^{\alpha_1} + \omega^{\alpha_2} + \cdots + \omega^{\alpha_n}$<sup id="ref2"><a href="#fn2">[2]</a></sup>.</li>
</ul>
<p>What happens when we cut off a head?</p>
<ul>
<li>If it’s directly attached to the body, then it contributes a term of $\omega^0 = 1$ to the whole ordinal. Killing this head removes this term, decreasing the ordinal.</li>
<li>Otherwise, consider the ordinal of that head’s parent and grandparent. Before we cut off the head, the ordinal of the parent must have been of the form $\alpha + 1$. This means the ordinal of the grandparent has a term $\omega^{\alpha + 1}$. When we cut off the head, the parent ordinal decreases to $\alpha$, but there are now two more copies of it. This replaces the $\omega^{\alpha + 1}$ term in the grandparent with $\omega^\alpha \cdot 3$ (three copies of $\omega^\alpha$), which is strictly smaller. And because the rest of the tree remains unchanged, this means the ordinal assigned to the hydra as a whole also decreases.</li>
</ul>
<p>To illustrate this process, let’s look at the ordinals that correspond to the hydras we saw earlier. It may help to read them in reverse order.</p>
<p align="center">
<img src="/assets/hydra/ordinals.svg" width="100%" height="auto" alt="Ordinal sequence for previous hydra" />
</p>
<p>We can also see why the hydra’s regeneration speed doesn’t matter. No matter how large $N$ is, as long as it’s finite, $\omega^{\alpha + 1}$ will be strictly larger than $\omega^{\alpha} \cdot N$ (that is, $N$ copies of $\omega^\alpha$ added together).</p>
<p>One way to think about this is that a neck that forks at height $k+1$ is literally <em>infinitely worse</em> than a neck that forks at height $k$. By cutting off a head, you simplify it at height $k+1$, at the expense of introducing some forking at height $k$, which isn’t as bad.</p>
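<p>The decreasing-ordinal argument can be checked mechanically on small hydras. In the sketch below (all names are illustrative), a hydra is a nested list of children, and its ordinal is stored as the tuple of its children’s ordinals sorted in descending order, i.e. the exponents of its Cantor normal form. Python compares tuples lexicographically, with a proper prefix counting as smaller, which happens to coincide with the usual ordering of such ordinals. The <code>regrow</code> argument is an assumed hook that says how many copies grow back on a given turn.</p>

```python
import copy


def ordinal(node):
    """Exponents of the Cantor normal form, largest first, as nested tuples."""
    return tuple(sorted((ordinal(child) for child in node), reverse=True))


def cut(hydra, path, copies):
    """Cut the head at `path`; regrow `copies` copies of its parent."""
    grandparent, parent = None, hydra
    for index in path[:-1]:
        grandparent, parent = parent, parent[index]
    del parent[path[-1]]
    if grandparent is not None:
        grandparent.extend(copy.deepcopy(parent) for _ in range(copies))


def play(hydra, regrow):
    """Play until the hydra dies; return the ordinal after every turn.

    `regrow(turn)` gives the number of copies grown on that turn.
    """
    history = [ordinal(hydra)]
    turn = 0
    while hydra:
        turn += 1
        # Pick a head by following first children down to a leaf.
        path, node = [], hydra
        while node:
            path.append(0)
            node = node[0]
        cut(hydra, path, regrow(turn))
        history.append(ordinal(hydra))
    return history


if __name__ == "__main__":
    for name, regrow in [("two copies", lambda n: 2),
                         ("N^2 copies", lambda n: n * n)]:
        history = play([[[], []]], regrow)
        decreasing = all(a > b for a, b in zip(history, history[1:]))
        print(name, "->", len(history) - 1, "moves, decreasing:", decreasing)
```

<p>Whatever regrowth rule is plugged in, each recorded ordinal is strictly smaller than the one before it, and the game ends at the empty tuple, the ordinal $0$.</p>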
<hr />
<p>A last interesting fact: this proof relied on ordinal numbers, which have a whole lot of infinities ($\omega$s) tied up in them. But everything in this hydra game is finite; from an initial hydra, there’s only finitely many hydras we can encounter, each of which has only finitely many heads. Is there a proof that avoids any mention of infinity?</p>
<p>In 1982, Laurence Kirby and Jeff Paris proved that there isn’t, in the following sense: any proof technique strong enough to prove the hydra’s eventual demise is strong enough to prove the consistency of Peano arithmetic. In particular, it’s impossible to prove the hydra theorem from within Peano arithmetic.</p>
<hr />
<ol>
<li><a id="fn1" href="#ref1">↑</a> In fact, every well-ordered set is order-isomorphic to exactly one ordinal, and this is what makes them important.</li>
<li><a id="fn2" href="#ref2">↑</a> Without loss of generality, we can relabel the subhydras so that the ordinals are non-strictly descending. This avoids problems coming from the non-commutativity of ordinal addition.</li>
</ol>
<script src="https://cdnjs.cloudflare.com/ajax/libs/svg.js/2.7.1/svg.js"></script>
<script src="/js/hydra_lib.js"></script>
<script src="/js/hydra_main.js"></script>
<hr />
<p><strong><a href="http://mathmondays.com/safes-and-keys">Safes and Keys</a></strong> (2018-11-26)</p>
<p>Here’s a few similar puzzles with a common story:</p>
<blockquote>
<p>I have <em>n</em> safes, each one with a unique key that opens it. Unfortunately, some prankster snuck into my office last night and stole my key ring. It seems they’ve randomly put the keys inside the safes (one key per safe), and locked them.</p>
</blockquote>
<p>We’ll play around with a few different conditions and see what chances we have of getting all safes unlocked, and at what cost.</p>
<!--more-->
<hr />
<p><strong>1) The prankster was a bit sloppy, and forgot to lock one of the safes. What is the probability I can unlock all of my safes?</strong></p>
<p>The key observation here, as with the subsequent problems, is to consider the arrangement of keys and safes as a permutation. Label the safes and keys $1$ to $n$, and define $\pi(i)$ to be the number of the key inside the $i$th safe. So, if we have key $1$, we unlock safe $1$ to reveal key $\pi(1)$.</p>
<p>Under this interpretation, key $i$ lets us unlock all safes in the cycle containing $i$; we open a safe, find a new key, track down the new safe, and repeat until we end up where we started. So, we want to know the probability that a randomly chosen permutation has exactly one cycle.</p>
<p>This isn’t too hard; we can count the number of one-cycle permutations in a straightforward way. Given a permutation of one cycle, we start with element $1$, and write out $\pi(1)$, $\pi(\pi(1))$, etc., until we loop back to $1$. This produces an ordered list of $n$ numbers, starting with $1$, and this uniquely determines the cycle. There are $(n-1)!$ such lists, and so the probability of having exactly one cycle is $(n-1)!/n! = 1/n$.</p>
<hr />
<p><strong>2) Say the prankster is sloppier, and leaves k safes unlocked. Now what is my probability of success?</strong></p>
<p>This one requires a little more thought. It’s tempting to consider permutations with $k$ cycles, but that’s not quite right. If there’s only one cycle, we’re sure to succeed, and furthermore, even if there are $k$ cycles, our success isn’t guaranteed: we could pick two safes in the same cycle.</p>
<p>By symmetry, label our safes so that we’ve picked safes $1$, $2$, …, $k$. We’d like to know how many permutations have a cycle that completely avoids $1$ through $k$. If, and only if, such a cycle is present, we fail to unlock all the safes.</p>
<p>Let $a_i$ be the number of “good” permutations when there are $i$ safes. We will express $a_n$ in terms of smaller $a_i$s, and solve the resulting recurrence relation.</p>
<p>Given a permutation $\pi$, we can split the set $\{ 1, \ldots, n \}$ into two parts: the elements whose cycles intersect $\{ 1, \ldots, k \}$, and those whose cycles do not. (It may help to think of these sets as “reachable” and “unreachable” safes, respectively). Since $\pi$ never sends a reachable safe to an unreachable one, or vice versa, it induces permutations on both these sets. Also, knowing both these subpermutations, we can reconstruct $\pi$. So, let’s count how many possible permutations there are on the reachable and unreachable sets.</p>
<p>If there are $r$ reachable safes, then there are $a_r$ possible permutations induced on the reachable set, and $(n-r)!$ induced on the unreachable one. (The reason we don’t get the full $r!$ on the reachable set is that some permutations would leave a safe unreachable, when it’s supposed to be reachable.) Furthermore, we have a choice of <em>which</em> safes are reachable. The first $k$ safes must be reachable, so beyond that, we have $\binom{n-k}{r-k}$ more choices to make. Our recurrence relation is then:
\[ n! = \sum_{r = k}^n \binom{n-k}{r-k} a_r (n-r)! = \sum_{r = k}^n a_r \frac{(n-k)!}{(r-k)!} \]</p>
<p>Since $(n-k)!$ doesn’t depend on $r$, we can pull it out to get a neater-looking form:
\[ \frac{n!}{(n-k)!} = \sum_{r=k}^n \frac{a_r}{(r-k)!} \]</p>
<p>Now, on the right-hand side, $n$ only shows up as an index, not anywhere in the summand. This lets us collapse our sum; take this equation, and subtract from it the corresponding one for $n-1$:
\begin{align*}
\frac{n!}{(n-k)!} - \frac{(n-1)!}{(n-1-k)!} &= \left( \sum_{r=k}^n \frac{a_r}{(r-k)!} \right) - \left( \sum_{r=k}^{n-1} \frac{a_r}{(r-k)!} \right) \\
\frac{n!}{(n-k)!} - \frac{(n-1)!}{(n-1-k)!} &= \frac{a_n}{(n-k)!} \\
n! - (n-1)!(n-k) &= a_n \\
k \cdot (n-1)! &= a_n
\end{align*}</p>
<p>So there’s $k \cdot (n-1)!$ permutations in which we win. Since there’s $n!$ total, this gives our probability of success at $k/n$.</p>
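<p>For small $n$, the count $k \cdot (n-1)!$ can be confirmed by brute force. The sketch below is only a check, not part of the derivation; it labels safes $0$ to $n-1$ instead of $1$ to $n$, and the function names are made up for this example.</p>

```python
from itertools import permutations
from math import factorial


def can_open_all(perm, k):
    """perm[i] is the key inside safe i; safes 0..k-1 start unlocked."""
    opened = set()
    to_open = list(range(k))
    while to_open:
        safe = to_open.pop()
        if safe in opened:
            continue
        opened.add(safe)
        # The key found inside lets us open the safe with the same label.
        to_open.append(perm[safe])
    return len(opened) == len(perm)


def count_good(n, k):
    """Number of key arrangements in which every safe ends up open."""
    return sum(can_open_all(p, k) for p in permutations(range(n)))


if __name__ == "__main__":
    for n in range(1, 7):
        counts = [count_good(n, k) for k in range(1, n + 1)]
        expected = [k * factorial(n - 1) for k in range(1, n + 1)]
        print(n, counts, counts == expected)
```

<p>Taking $k = 1$ recovers the answer $1/n$ from the first puzzle.</p>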
<hr />
<p><strong>3) If the prankster is careful, and remembers to lock all the safes, then I have no choice but to break some of them open. What’s the expected number of safes I have to crack?</strong></p>
<p>This one’s much easier than 2). The question here is just “how many cycles are there in a random permutation”, and <a href="/linearity-expectation">from a previous post</a>, we know that’s $H_n$, the $n$th harmonic number.</p>
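<p>As a quick check of the harmonic-number claim, here is a brute-force computation of the average number of cycles over all permutations of a small set, using exact fractions. The helper names are illustrative only.</p>

```python
from fractions import Fraction
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of the permutation i -> perm[i]."""
    seen = set()
    cycles = 0
    for start in range(len(perm)):
        if start in seen:
            continue
        cycles += 1
        i = start
        while i not in seen:
            seen.add(i)
            i = perm[i]
    return cycles


def expected_cycles(n):
    """Exact average number of cycles over all n! permutations."""
    perms = list(permutations(range(n)))
    return Fraction(sum(cycle_count(p) for p in perms), len(perms))


def harmonic(n):
    return sum(Fraction(1, i) for i in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, expected_cycles(n), harmonic(n))
```

<p>Each cycle needs exactly one safe to be cracked, so this average is also the expected number of safes to break open.</p>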
<hr />
<p><strong>4) Putting it all together: if we start with $k$ safes unlocked, what’s the expected number of safes I have to crack open?</strong></p>
<p>I haven’t actually put this one on solid ground yet! It’s not coming out pretty.</p>
<hr />
<p><strong><a href="http://mathmondays.com/ax-grothendieck">Ax-Grothendieck Theorem</a></strong> (2018-11-12)</p>
<div class="mathdefs">
$
\newcommand{\CC}{\Bbb C}
\newcommand{\FF}{\Bbb F}
\newcommand{\QQ}{\Bbb Q}
\newcommand{\FFx}[1]{\overline{\FF_{#1}}}
\newcommand{\ACF}{\mathbf{ACF}}
\newcommand{\cL}{\mathcal{L}}
\newcommand{\cT}{\mathcal{T}}
$
</div>
<p>The Ax-Grothendieck theorem is the statement:</p>
<div class="theorem-box">
<div class="theorem-title">Ax-Grothendieck Theorem</div>
Let $f: \CC^n \to \CC^n$ be a polynomial map; that is, each coordinate $f_i: \CC^n \to \CC$ is a polynomial in the $n$ input variables.
Then, if $f$ is injective, it is surjective.
</div>
<p>This… doesn’t seem like a particularly exciting theorem. But it has a really exciting proof.</p>
<!--more-->
<hr />
<p>The idea behind the proof isn’t algebraic, it isn’t topological, it’s not even geometric, it’s <s>DiGiorno</s> model-theoretic!</p>
<p>The spirit of the proof is as follows:</p>
<ul>
<li>if the theorem is false, then there is a disproof (a proof of the negation)</li>
<li>this proof can be written in “first-order logic”, a particularly limited logical language</li>
<li>because this proof is finitely long, and uses only first-order logic, it “can’t tell the difference” between $\CC$ and $\FFx{p}$ for large enough $p$
<ul>
<li>note: $\FFx{p}$ is the algebraic closure of the finite field $\FF_p$</li>
</ul>
</li>
<li>pick a large enough $p$, and transfer our proof to $\FFx{p}$; this won’t affect its structure or validity</li>
<li>show that there is, in fact, no counterexample in $\FFx{p}$</li>
<li>by contradiction, there is no disproof, and the theorem must be true</li>
</ul>
<p>This is an… unusual proof strategy. I don’t usually think about my proofs as mathematical objects unto themselves. But that’s probably because I’m not a model theorist.</p>
<hr />
<p>First, we’ll get the last step out of the way.</p>
<p><em>Proof</em>: Let $f: \FFx{p}^n \to \FFx{p}^n$ be injective. Pick an arbitrary target $y = (y_1, \ldots, y_n) \in \FFx{p}^n$ to hit. Let $K \supseteq \FF_p$ be the field extension generated by the $y_i$ and the coefficients that show up in $f$. Since all of these generators are algebraic over $\FF_p$, and there are finitely many of them, $K$ is finite. Also, since fields are closed under polynomial operations, $f(K^n) \subseteq K^n$. But because $f$ is injective, and $K^n$ is finite, $f(K^n)$ must be all of $K^n$; in particular, there’s some input $x \in K^n$ such that $f(x) = y$. Thus $f$ is surjective.</p>
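<p>The pigeonhole step in this proof is easy to watch in action on a small finite field. The sketch below takes $K = \FF_5$ and $n = 2$ and tests a few hand-picked polynomial maps $K^2 \to K^2$; the maps and helper names are chosen purely for illustration. On a finite set, injectivity and surjectivity always agree, which is all the argument needs.</p>

```python
from itertools import product


def is_injective(f, domain):
    images = [f(v) for v in domain]
    return len(set(images)) == len(images)


def is_surjective(f, domain):
    return {f(v) for v in domain} == set(domain)


p = 5
plane = list(product(range(p), repeat=2))  # the finite set K^2 with K = F_5

# A few polynomial maps (x, y) -> (f_1, f_2), with arithmetic mod p.
maps = {
    "shear": lambda v: ((v[0] + v[1] ** 2) % p, v[1]),   # (x + y^2, y)
    "cube": lambda v: (pow(v[0], 3, p), v[1]),           # (x^3, y)
    "square": lambda v: (v[0] ** 2 % p, v[1]),           # (x^2, y)
}

results = {name: (is_injective(f, plane), is_surjective(f, plane))
           for name, f in maps.items()}

if __name__ == "__main__":
    for name, (inj, sur) in results.items():
        print(f"{name}: injective={inj}, surjective={sur}")
```

<p>The shear and the cube map are bijections of $\FF_5^2$, while the squaring map fails both properties at once; no map can have one property without the other.</p>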
<hr />
<p>Now for the exciting stuff.</p>
<p>We have to figure out a way of taking proofs over $\CC$, and translating them into proofs over $\FFx{p}$. This is daunting, but it’s made easier by the fact that they are both algebraically closed fields, and so they have a shared pool of axioms. Of course, they are very different in other ways: $\CC$ is uncountable while $\FFx{p}$ is countable, they have different characteristic, etc. We have to show that our proof manipulations aren’t affected by these differences.</p>
<p>Since this isn’t an intro to model theory post, I won’t be defining the basic terms. If these look unfamiliar, check out <a href="https://www.lesswrong.com/posts/F6BrJFkqEhh22rFsZ/very-basic-model-theory">this post</a>.</p>
<p>Let $\ACF$ be the theory of algebraically closed fields. We claim that it’s first-order, and it’s <em>almost</em> complete.</p>
<p>This is a theory in the language of rings, which is $\cL_{ring} = \{ +, \times, 0, 1 \}$. Our axioms are:</p>
<ul>
<li>the usual field axioms (these are all first-order)</li>
<li>for each $d \ge 1$, add the sentence $\forall a_0 \forall a_1 \cdots \forall a_d \ (a_d \ne 0 \implies \exists x \ a_0 + a_1 x + \cdots + a_d x^d = 0)$
<ul>
<li>these are first-order sentences, and together, they tell us that every non-constant polynomial has a root</li>
</ul>
</li>
</ul>
<p>So $\ACF$ is a first-order theory. It isn’t complete, of course. For example, the sentence $1 + 1 = 0$ is true in $\FFx{2}$, but not in $\FFx{3}$ or $\CC$. Turns out fields of different characteristic are… different. No surprise there.</p>
<p>So we define extensions of $\ACF$, where we <em>do</em> specify the characteristic. For a prime $p$, define $S_p$ to be the sentence $1 + \cdots + 1 = 0$, where there are $p$ copies of $1$. Then the theory of algebraically closed fields of characteristic $p$ is $\ACF_p = \ACF \cup \{ S_p \}$.</p>
<p>What about characteristic $0$? To force our field to have characteristic zero, we can throw in $\lnot S_p$ for all primes $p$: $\ACF_0 = \ACF \cup \{ \lnot S_2, \lnot S_3, \lnot S_5, \ldots \}$. This nails down exactly the algebraically closed fields of characteristic $0$.</p>
<p>We claim that $\ACF_0$ and $\ACF_p$ are complete theories.</p>
<hr />
<p>If that is indeed the case, then we can prove a stronger form of the Ax-Grothendieck theorem.</p>
<div class="theorem-box">
<div class="theorem-title">Ax-Grothendieck Theorem (Stronger)</div>
Let $k$ be an algebraically closed field. If $f: k^n \to k^n$ is a polynomial map, then if $f$ is injective, it is surjective.
</div>
<p><em>Proof</em>: We start by breaking our claim into a number of first-order sentences. We can’t first-order define an arbitrary polynomial, so we’ll work with all polynomials of bounded degree. For a fixed $d$, the sentence “for all polynomial maps $f$ of degree at most $d$, injectivity of $f$ implies surjectivity of $f$” can be expressed as a first-order sentence.</p>
<p>First, introduce one variable for each coefficient of $f$; there are $n \binom{n+d}{d}$ of them, since each of the $n$ coordinates has one coefficient per monomial of degree at most $d$ in $n$ variables. The sentence “$f$ is injective” can be made first-order by taking $\forall x \forall y \ (f(x) = f(y) \implies x = y)$ and expanding out the coefficients of $f$. Likewise, “$f$ is surjective” can be written as $\forall z \exists x \ f(x) = z$, and expanding $f$.</p>
<p>As an example, if $n = 1, d = 2$, our sentence is:
\[ \forall a_0 \forall a_1 \forall a_2 \ (\forall x \forall y \ a_2 x^2 + a_1 x + a_0 = a_2 y^2 + a_1 y + a_0 \implies x = y) \]
\[ \implies \forall z \exists x \ a_2 x^2 + a_1 x + a_0 = z \]</p>
<p>Since I literally never want to write out that sentence in the general case, let’s just call it $\phi_d$.</p>
<p>We’ll separately tackle the case of characteristic $p$ and characteristic $0$.</p>
<p>Let $p$ be any prime. Because $\ACF_p$ is complete, either there is a proof of $\phi_d$ or a proof of $\lnot \phi_d$. The latter is impossible; if there were such a proof, then it would show that $\phi_d$ is false in $\FFx{p}$, and we’ve proven before that it is true in this field. Therefore, $\ACF_p$ entails a proof of $\phi_d$.</p>
<p>Similarly, because $\ACF_0$ is complete, either it can prove $\phi_d$, or it can prove $\lnot \phi_d$. Again, for the sake of contradiction, we assume the latter. Let $P$ be a proof of $\lnot \phi_d$ from $\ACF_0$. Since $P$ is finite, it can only use finitely many axioms. In particular, it can only use finitely many of the $\lnot S_p$. So there’s some prime $q$ such that $\lnot S_q$ was not used in $P$. Every other $\lnot S_p$ holds in characteristic $q$, so $P$ is also a valid proof in $\ACF_q$. But we already know there are no proofs of $\lnot \phi_d$ from $\ACF_q$, and so we’ve reached a contradiction. Therefore, there must be a proof of $\phi_d$ from $\ACF_0$.</p>
<p>Since $\ACF_p$ can prove $\phi_d$, and $\ACF_0$ can prove $\phi_d$, we know that $\phi_d$ is true in all algebraically closed fields $k$, no matter what the characteristic of $k$ is. And since $\phi_d$ is true for all $d$, we have proved the claim for polynomials of arbitrary degree.</p>
<hr />
<p>This proof is magical in two ways.</p>
<p>One is that, despite there being no homomorphisms between $\FFx{p}$ and $\CC$, we were able to somehow transport a claim between the two. This was possible not by looking at the structure of $\CC$ and $\FFx{p}$ themselves, but by using the structure of their axiomatizations. The reduction to only finitely many axioms is an example of the <a href="https://en.wikipedia.org/wiki/Compactness_theorem">compactness theorem</a>, a very useful logical principle.</p>
<p>The other is that we never actually made use of $\phi_d$! All we knew is that it was a first-order sentence, and that it was true in some model of $\ACF_p$ for each $p$. Generalizing this argument, we get the following principle:</p>
<div class="theorem-box">
<div class="theorem-title">Robinson's Principle</div>
If $\phi$ is a first-order sentence, then the following are equivalent:
<ol>
<li>$\ACF_p$ proves $\phi$ for all but finitely many $p$</li>
<li>$\ACF_p$ proves $\phi$ for infinitely many $p$</li>
<li>$\ACF_0$ proves $\phi$</li>
</ol>
Furthermore, the following are equivalent for $r$ a prime or $0$:
<ol>
<li>$\ACF_r$ proves $\phi$</li>
<li>$\phi$ is true in some algebraically closed field of characteristic $r$</li>
<li>$\phi$ is true in all algebraically closed fields of characteristic $r$</li>
</ol>
</div>
<p>For the first claim, obviously (1) implies (2). The proof that (2) implies (3) is essentially the proof we gave above: if $\phi$ can’t be proved from $\ACF_0$, then $\lnot \phi$ can. This proof can only use finitely many of the $\lnot S_p$, and there’s infinitely many $\ACF_p$ that prove $\phi$, so there’s some $p$ we can transfer the proof to and get our contradiction. The proof that (3) implies (1) is similar: if there’s a proof of $\phi$ from $\ACF_0$, it can be transferred to all but finitely many $\ACF_p$.</p>
<p>The second claim is a direct consequence of completeness of $\ACF_r$.</p>
<p>Combining these two claims gives some very powerful techniques. The way we used it is: to show something is true for all algebraically closed fields, it suffices to show it only for a single example at each prime $p$.</p>
<p>At this point, there is no more spooky magic, and the rest of the article is about justifying the completeness of $\ACF_p$ and $\ACF_0$. Still cool though, IMO.</p>
<hr />
<p>First, we’ll state a popular theorem in model theory:</p>
<div class="theorem-box">
<div class="theorem-title">Löwenheim–Skolem Theorem</div>
Let $\cT$ be a countable theory. If it has an infinite model, then for any infinite cardinal $\kappa$, it has a model of size $\kappa$.
</div>
<p>Essentially, first-order logic is too limited to distinguish between different sizes of infinity; if there’s a model of one infinite size, there’s a model of all infinite sizes. The proof of this theorem is somewhat involved, and we won’t cover it here, but see <a href="http://modeltheory.wikia.com/wiki/L%C3%B6wenheim-Skolem_Theorem">here</a> for a proof.</p>
<p>Using this, we can prove the Łoś–Vaught test:</p>
<div class="theorem-box">
<div class="theorem-title">Łoś–Vaught Test</div>
Let $\cT$ be a theory and $\kappa$ be some infinite cardinal. We say that $\cT$ is $\kappa$-categorical if there is exactly one model of $\cT$ of size $\kappa$, up to isomorphism.
<br /><br />
If $\cT$ is $\kappa$-categorical for some $\kappa$, and has no finite models, then it is a complete theory.
</div>
<p>This is unexpected, at least in my opinion. But then again, model theory isn’t my forte. Maybe there’s some intuition one can use here that I don’t have.</p>
<p><em>Proof</em>: If $\cT$ isn’t complete, then there’s some $\phi$ such that $\cT$ proves neither $\phi$ nor $\lnot \phi$. By the <a href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_completeness_theorem">completeness theorem</a>, this means there’s a model $M$ of $\cT$ in which $\phi$ is true, and a model $M'$ of $\cT$ in which $\lnot \phi$ is true.</p>
<p>Since all models of $\cT$ are infinite, both $M$ and $M'$ are infinite. This means that $M$ is an infinite model of $\cT \cup \{ \phi \}$, thus we can apply Löwenheim–Skolem to get a model $N$ of $\cT \cup \{ \phi \}$ which has size $\kappa$. Likewise, we use $M'$ to get a model $N'$ of $\cT \cup \{ \lnot \phi \}$ which has size $\kappa$. But because $\cT$ is $\kappa$-categorical and both $N$ and $N'$ are models of $\cT$, they must be isomorphic. But because $\phi$ is true in $N$ and false in $N'$, this is a contradiction.</p>
<p>We’d like to apply the Łoś–Vaught test to $\ACF_p$ and $\ACF_0$. Since all algebraically closed fields are infinite, it suffices to show that these theories are $\kappa$-categorical for some $\kappa$.</p>
<p><em>Proof</em>: Let $\kappa$ be an uncountable cardinal and $K$ be an algebraically closed field of size $\kappa$. Let $B$ be a transcendence basis of $K$ over its prime subfield $k$ ($\FF_p$ or $\QQ$). <a href="https://proofwiki.org/wiki/Field_of_Uncountable_Cardinality_K_has_Transcendence_Degree_K">A cardinality argument</a> shows that $|B| = \kappa$ (this is where the uncountability of $\kappa$ is used; for example, $\overline{\QQ}(t_1, \ldots, t_n)$ has transcendence degree $n$, but cardinality $\aleph_0$). So, if $K'$ is another algebraically closed field, with the same cardinality and characteristic, and we pick a transcendence basis $B'$, it will also have cardinality $\kappa$. The bijection between $B$ and $B'$ induces an isomorphism between $k(B)$ and $k(B')$. But since $K$ and $K'$ are algebraically closed, and algebraic over $k(B) \cong k(B')$, they are algebraic closures of the same field, and are thus isomorphic!</p>
<p>This proves that $\ACF_p$ and $\ACF_0$ are $\kappa$-categorical for uncountable cardinals $\kappa$. In particular, they’re $\kappa$-categorical for at least one infinite cardinal, and so via the Łoś–Vaught test, we conclude they are complete.</p>
<hr />
<h1>Wedderburn’s Little Theorem</h1>
<p>2018-11-05, <a href="http://mathmondays.com/wedderburn">http://mathmondays.com/wedderburn</a></p>
<div class="mathdefs">
$
\newcommand{\ZZ}{\Bbb Z}
\newcommand{\QQ}{\Bbb Q}
$
</div>
<p>Some rings are closer to being fields than others. A <strong>domain</strong> is a ring where we can do cancellation: if $ab = ac$ and $a \ne 0$, then $b = c$. Even closer is a <strong>division ring</strong>, a ring in which every non-zero element has a multiplicative inverse. The only distinction between fields and division rings is that the latter may be non-commutative. For this reason, division rings are also called <strong>skew-fields</strong>.</p>
<p>These form a chain of containments, each of which is strict:
fields $\subset$ division rings $\subset$ domains $\subset$ rings</p>
<p>Some examples:</p>
<ul>
<li>$\ZZ$ is a domain</li>
<li>$\ZZ/6\ZZ$ is not a domain</li>
<li>the set of $n \times n$ matrices is not a domain; two non-zero matrices can multiply to zero</li>
<li>$\QQ$ is a field (duh)</li>
<li>the quaternions are a division ring</li>
</ul>
<p>Wedderburn’s theorem states that this hierarchy collapses for finite rings: every finite domain is a field.</p>
<!--more-->
<hr />
<p>First, we show that every finite domain is a division ring.</p>
<p>Let $D$ be a finite domain, and $x \in D$ be non-zero. The map $f : D \to D$ given by $f(d) = xd$ is injective, which we get immediately from the definition of a domain. Because $D$ is finite, $f$ injective implies that $f$ is surjective as well. This means there’s some $y$ such that $f(y) = xy = 1$. This makes $y$ a right-inverse of $x$; is it also a left-inverse? Yes! Since $x = 1x = xyx$, cancellation gives us $1 = yx$.</p>
<hr />
<p>The next step, showing that every finite division ring is a field, is significantly trickier. We’ll continue, knowing that $D$ is a division ring.</p>
<p>Our plan is to re-interpret $D$ as a vector space, to get some information about its size. Then, we’ll drop the additive structure, and apply some group theory to the multiplicative structure. Lastly, our result will be vulnerable to some elementary number theory.</p>
<p>Let $Z$ be the center of $D$; the set of elements that commute multiplicatively with everything in $D$. The distributive law tells us that $Z$ is an abelian group under addition, and by definition, $Z^*$ is an abelian group under multiplication. This makes $Z$ a field, which allows us to apply some linear algebra to the problem.</p>
<p>As with field extensions, a division ring containing a field is a vector space over that field; specifically, $D$ is a vector space over $Z$, where vector addition is addition in $D$, and scalar multiplication is multiplication by an element of $Z$. This gives us some information about the size of $D$. If $Z$ has size $q$, and $D$ has dimension $n$ over $Z$, then $D$ has size $q^n$.</p>
<p>Let’s look at some linear subspaces of $D$ (as a vector space). For an element $x \in D$, let $C(x)$ be the set of all elements that commute with $x$ (this is the <strong>centralizer</strong> of $x$). We claim that this is a subspace of $D$. It’s clearly closed under addition, and we claim it is also closed under scalar multiplication. If $y \in C(x)$ and $z \in Z$, then it follows quickly that $(zy)x = x(zy)$, i.e., $zy \in C(x)$.</p>
<p>Because $C(x)$ is a linear subspace, it has size $q^k$ for some $1 \le k \le n$, where $k$ is its dimension over $Z$. And if $x \notin Z$, we know that both these inequalities are strict. If $k = n$, then $C(x) = D$, and $x$ is in fact in the center. If $k = 1$, then $C(x) = Z$, and since $x \in C(x)$ for sure, $x$ is again in $Z$.</p>
<p>Now we can apply some group theory. The <a href="https://en.wikipedia.org/wiki/Conjugacy_class#Conjugacy_class_equation">class equation</a> is a statement about the conjugacy classes of a group. The details are best saved for another post, but if we have a group $G$ with center $Z(G)$, and $g_1, \ldots, g_r$ are distinct representatives of the non-trivial conjugacy classes, then
\[ |G| = |Z(G)| + \sum_{i=1}^r [G : C(g_i)] \]</p>
<p>Essentially, this comes from the fact that $[G : C(g_i)]$ is the number of conjugates of $g_i$, and that the conjugacy classes partition $G$.</p>
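<p>To see the class equation in action, here is a minimal sketch that computes it by brute force for a small group. The choice of $S_3$ and the helper names (<code>compose</code>, <code>inverse</code>, <code>class_equation</code>) are illustrative assumptions, not part of the argument.</p>

```python
from itertools import permutations


def compose(a, b):
    """Return the permutation a∘b (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(len(a)))


def inverse(a):
    """Return the inverse of the permutation a."""
    inv = [0] * len(a)
    for i, image in enumerate(a):
        inv[image] = i
    return tuple(inv)


def class_equation(group):
    """Return (|Z(G)|, sorted sizes of the non-trivial conjugacy classes)."""
    remaining = set(group)
    center_size = 0
    class_sizes = []
    while remaining:
        g = next(iter(remaining))
        # The conjugacy class of g is { h g h^-1 : h in G }.
        conj_class = {compose(compose(h, g), inverse(h)) for h in group}
        remaining -= conj_class
        if len(conj_class) == 1:
            center_size += 1  # g commutes with everything
        else:
            class_sizes.append(len(conj_class))
    return center_size, sorted(class_sizes)


S3 = list(permutations(range(3)))
center_size, class_sizes = class_equation(S3)
print(len(S3), "=", center_size, "+", " + ".join(map(str, class_sizes)))
# prints "6 = 1 + 2 + 3"
```

<p>Each class size here is the index of a centralizer, so it divides $|G| = 6$, as the formula predicts.</p>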
<p>If we apply this to $D^*$, and remember our observation about the size of $C(x)$, then we get:
\[ q^n - 1 = (q - 1) + \sum_{i=1}^r \frac{q^n - 1}{q^{k_i} - 1}, \, 1 < k_i < n \]</p>
<p>We claim that this can only happen when $n = 1$; i.e., when $Z = D$. This would prove that $D$ is a field! From here on out, it’s all number theory.</p>
<hr />
<p>First, we claim that each $k_i$ divides $n$. Let $n = a k_i + b$ be the result of division with remainder. Since $(q^n - 1)/(q^{k_i} - 1)$ is the index of some $C(x)$, it’s an integer, so $q^{k_i} - 1$ divides $q^n - 1$, or equivalently, $q^n \equiv 1 \pmod{q^{k_i} - 1}$. Substituting $n = a k_i + b$, we get that $q^b \equiv 1 \pmod{q^{k_i} - 1}$. But since $b < k_i$, $q^b - 1 < q^{k_i} - 1$, and so we must have that $q^b - 1 = 0$; i.e., that $b = 0$. (Here, we quietly used the fact that $q > 1$.) Therefore, $k_i$ divides $n$.</p>
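<p>As a sanity check on this step, the following sketch tests numerically that $q^k - 1$ divides $q^n - 1$ exactly when $k$ divides $n$. The bounds and the function name <code>check_divisibility</code> are arbitrary choices made for illustration; a finite search is of course not a proof.</p>

```python
def check_divisibility(max_q=6, max_n=12):
    """Check that (q^k - 1) | (q^n - 1) holds exactly when k | n, for small q, n, k."""
    for q in range(2, max_q + 1):
        for n in range(1, max_n + 1):
            for k in range(1, n + 1):
                powers_divide = (q**n - 1) % (q**k - 1) == 0
                exponents_divide = n % k == 0
                if powers_divide != exponents_divide:
                    return False
    return True


print(check_divisibility())  # prints "True"
```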
<p>For the next step, we’ll need to introduce the <a href="https://en.wikipedia.org/wiki/Cyclotomic_polynomial">cyclotomic polynomials</a> $\Phi_k(x)$. They have three properties in particular that are of interest to us:</p>
<ul>
<li>they are monic and have integer coefficients</li>
<li>for any $m$, the polynomial $x^m - 1$ factors as $\prod_{k \mid m} \Phi_k(x)$</li>
<li>the roots of $\Phi_k(x)$ are exactly the primitive $k$th roots of unity</li>
</ul>
<p>The second fact tells us that $\Phi_n(x)$ is a factor of $x^n - 1$, but also, that it is a factor of $(x^n - 1)/(x^{k_i} - 1)$ – the denominator cancels out some of the $\Phi_k(x)$, but $\Phi_n(x)$ is left intact, since $k_i < n$.</p>
<p>Since the quotients $\frac{x^n - 1}{\Phi_n(x)}$ and $\frac{(x^n - 1)/(x^{k_i} - 1)}{\Phi_n(x)}$ are products of cyclotomic polynomials, each of which is monic with integer coefficients, they are also monic with integer coefficients. Therefore, if we plug in $x = q$, we will get an integer. This means that the integer $\Phi_n(q)$ divides the integers $q^n - 1$ and $(q^n - 1)/(q^{k_i} - 1)$. Note that we had to work for this; it’s not an immediate consequence of divisibility as polynomials. For example, consider $p(x) = x + 3$ and $q(x) = x^3 + 3x^2 - x/4 - 3/4$. While $p(x)$ divides $q(x)$ as polynomials, $p(1) = 4$ does not divide $q(1) = 3$.</p>
<p>Now, returning to the class equation, we’ve shown that most of the terms are divisible by the integer $\Phi_n(q)$, so the only leftover term, $q - 1$, is also divisible by $\Phi_n(q)$. We claim this is only possible if $n = 1$, which would then give us our desired result.</p>
<p>Use the third fact about cyclotomic polynomials: $\Phi_n(q) = \prod (q - \zeta)$, where $\zeta$ ranges over all primitive $n$th roots of unity. Taking the modulus, we get that $|\Phi_n(q)| = \prod |q - \zeta|$. From the triangle inequality, $|q - \zeta| + |\zeta| \ge |q|$, or, rearranged, $|q - \zeta| \ge |q| - |\zeta| = q - 1$. If $n > 1$, then this inequality is strict, because equality only happens when $\zeta = 1$. Furthermore, since $q \ge 2$, we have $|q - \zeta| > q - 1 \ge 1$. Therefore, if $n > 1$, $\Phi_n(q)$ is a product of terms, all of which have absolute value strictly greater than $q - 1$ and $1$, thus, $|\Phi_n(q)| > q - 1$. But this means that $\Phi_n(q)$ cannot divide $q - 1$, and so this is a contradiction!</p>
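<p>The two number-theoretic facts used here can be checked numerically. The sketch below builds $\Phi_n$ from the factorization $x^n - 1 = \prod_{k \mid n} \Phi_k(x)$ by exact polynomial division, then tests that $\Phi_n(q)$ divides $(q^n - 1)/(q^k - 1)$ for proper divisors $k$ of $n$, and that $\Phi_n(q) > q - 1$ when $n > 1$. Polynomials are stored as coefficient lists, lowest degree first; all function names and search bounds are assumptions made for this illustration.</p>

```python
from functools import lru_cache


def poly_divide_exact(num, den):
    """Divide integer polynomials; den must be monic and divide num exactly."""
    num = list(num)
    quotient = [0] * (len(num) - len(den) + 1)
    for i in range(len(quotient) - 1, -1, -1):
        coeff = num[i + len(den) - 1]  # current leading coefficient (den is monic)
        quotient[i] = coeff
        for j, d in enumerate(den):
            num[i + j] -= coeff * d
    assert not any(num), "division was not exact"
    return quotient


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Return Phi_n, using x^n - 1 = product of Phi_d over the divisors d of n."""
    poly = [-1] + [0] * (n - 1) + [1]  # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = poly_divide_exact(poly, cyclotomic(d))
    return poly


def evaluate(poly, x):
    """Evaluate a coefficient list at the integer x."""
    return sum(c * x**i for i, c in enumerate(poly))


def check_divisibility_claims(max_q=5, max_n=12):
    """Test both claims about Phi_n(q) for 2 <= q <= max_q and 2 <= n <= max_n."""
    for q in range(2, max_q + 1):
        for n in range(2, max_n + 1):
            phi = evaluate(cyclotomic(n), q)
            if (q**n - 1) % phi != 0 or phi <= q - 1:
                return False
            for k in range(1, n):
                if n % k == 0 and ((q**n - 1) // (q**k - 1)) % phi != 0:
                    return False
    return True


print(cyclotomic(6))                # prints "[1, -1, 1]", i.e. x^2 - x + 1
print(check_divisibility_claims())  # prints "True"
```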
<p>Therefore, $n = 1$, which forces $Z = D$, and thus $D$ to be commutative; hence, a field. Q.E.D!</p>
<hr />
<h1>Sylow Theorems</h1>
<p>2018-10-29, <a href="http://mathmondays.com/sylow">http://mathmondays.com/sylow</a></p>
<div class="mathdefs">
$
\newcommand{\ZZ}{\Bbb Z}
\DeclareMathOperator{\Stab}{Stab}
\DeclareMathOperator{\Fix}{Fix}
\DeclareMathOperator{\Aut}{Aut}
\DeclareMathOperator{\sgn}{sgn}
$
</div>
<p>In group theory, the Sylow theorems are a triplet of theorems that pin down a surprising amount of information about certain subgroups.</p>
<p>Lagrange’s theorem tells us that if $H$ is a subgroup of $G$, then the size of $H$ divides the size of $G$. The Sylow theorems give us some answers to the converse question: for what divisors of $|G|$ can we find a subgroup of that size?</p>
<!--more-->
<hr />
<p>Let $G$ be a finite group, $p$ a prime, and $n$ the largest integer such that $p^n$ divides $|G|$. A $p$-subgroup of $G$ is a subgroup of order $p^k$ for some $k$, and if it has order $p^n$, then it is called a Sylow $p$-subgroup. Under these definitions, the Sylow theorems are:</p>
<div class="theorem-box">
<div class="theorem-title">Sylow Theorems</div>
<ol>
<li>Every $p$-subgroup is contained in a Sylow $p$-subgroup. As such, Sylow $p$-subgroups exist.</li>
<li>All Sylow $p$-subgroups are conjugate to each other.</li>
<li>Let $n_p$ be the number of Sylow $p$-subgroups, and $m = |G|/p^n$. Then the following hold:
<ul>
<li>$n_p$ divides $m$</li>
<li>$n_p \equiv 1 \bmod p$</li>
<li>$n_p = [G : N(P)]$, where $N(P)$ is the normalizer of any Sylow $p$-subgroup.</li>
</ul>
</li>
</ol>
</div>
<p>These are rather technical and deserve some more thorough digestion. Sylow 1 tells us that maximal $p$-subgroups are as big as possible; there is no obstruction preventing them from being the full $p^n$.</p>
<p>Sylow 2 tells us that all Sylow $p$-subgroups are isomorphic in a very strong way; there is a conjugation of the group sending them to each other. To see how this is a strong criterion, consider a non-example. Let $G = \ZZ_4 \times \ZZ_2$, and pick out the subgroups $H_1 = \{ (0, 0), (2, 0) \}$ and $H_2 = \{ (0, 0), (0, 1) \}$. It’s clear that $H_1$ and $H_2$ are isomorphic, but they are not conjugate. This manifests in $G/H_1 = \ZZ_2 \times \ZZ_2$ and $G/H_2 = \ZZ_4$ not being isomorphic.</p>
<p>Sylow 3 is the easiest to understand; it just puts some arithmetic criteria on $n_p$. For small-ish groups, this is often enough to nail down $n_p$ exactly!</p>
<p>On to the proofs!</p>
<h2 id="lemma">Lemma</h2>
<p>First let’s establish a lemma we’ll use frequently.</p>
<div class="theorem-box">
<div class="theorem-title">Lemma</div>
If $G$ is a $p$-group, and it acts on a set $X$, then $|X| \equiv |\Fix(X)| \bmod p$, where $\Fix(X)$ is the set of points in $X$ that are fixed by every $g \in G$.
</div>
<p>Proof: Let $x_1, \ldots, x_k$ be representatives for the $G$-orbits of $X$. We know that the sum of the sizes of the orbits is $|X|$. If $x_i$ is a fixed point, then the orbit is of size $1$. If it is not, then by orbit-stabilizer, the size of the orbit is $[G : \Stab(x_i)]$, which is divisible by $p$. Thus, mod $p$, every fixed point contributes $1$, and everything else in $X$ contributes $0$.</p>
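<p>Here is a small sketch of the lemma for one concrete action, chosen purely for illustration: the cyclic group $\ZZ_p$ acting on words of length $p$ by rotation. There are $a^p$ words over an alphabet of size $a$, and the fixed points are the $a$ constant words, so the lemma recovers $a^p \equiv a \bmod p$. The function names are hypothetical.</p>

```python
from itertools import product


def rotate(word, shift):
    """Cyclically rotate a tuple by the given shift."""
    return word[shift:] + word[:shift]


def fixed_point_count(alphabet_size, p):
    """Count the length-p words that are fixed by every rotation."""
    words = product(range(alphabet_size), repeat=p)
    return sum(1 for w in words if all(rotate(w, s) == w for s in range(p)))


for p in (2, 3, 5, 7):
    for a in range(1, 5):
        total = a**p
        fixed = fixed_point_count(a, p)
        assert (total - fixed) % p == 0  # |X| ≡ |Fix(X)| mod p
print("lemma holds for all tested cases")
```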
<h2 id="sylow-1">Sylow 1</h2>
<p>Given a $p$-subgroup $H$, we show that, if it is not already maximal, we can find a $p$-subgroup $H' \supset H$ that is $p$ times bigger. Repeating this process gives us a Sylow $p$-subgroup containing our original $H$. Since the trivial subgroup is a $p$-subgroup, this also establishes the existence of Sylow $p$-subgroups!</p>
<p>Let $H$ be a $p$-subgroup that is not maximal, i.e., it has order $p^i$, where $i < n$. There is a natural action of $H$ on the left coset space $G/H$, and since $H$ is a $p$-group, our lemma tells us that $|G/H|$ is equivalent to the number of fixed points mod $p$. But since $i < n$, $G/H$ has order divisible by $p$. So the number of fixed points of this action is also divisible by $p$.</p>
<p>What do fixed points of this action look like? If $gH$ is a coset fixed by $h \in H$, then $hgH = gH$, i.e., $g^{-1} h g \in H$. If this is true for all $h$, then $g$ lies in the normalizer of $H$. The converse is also true, since these implications were all reversible. This means that $N(H)$ is composed of the cosets of $H$ that are fixed points.</p>
<p>Combining the two observations above, we conclude that $[N(H) : H]$ is divisible by $p$. Therefore, by Cauchy’s theorem, there’s some subgroup of order $p$ in $N(H)/H$. Lifting this subgroup to $N(H)$, we get a subgroup of size $p \cdot |H| = p^{i+1}$. This is the $H'$ we were looking for.</p>
<h2 id="sylow-2">Sylow 2</h2>
<p>Let $P$ and $Q$ be two Sylow $p$-subgroups of $G$. We want to show they are conjugate.</p>
<p>There is a natural action of $P$ on $G$ by multiplication, and this descends to an action of $P$ on $G/Q$ (again, left coset space). From our lemma, the number of fixed points of this action is equivalent to $|G/Q|$, mod $p$. But since $Q$ is a Sylow $p$-subgroup, $|G/Q|$ is not divisible by $p$. This means that the number of fixed points cannot be zero; i.e., there is at least one fixed point for this action. This is some $gQ$ such that $pgQ = gQ$ for all $p \in P$. Or, rearranging the terms, a $g$ such that $g^{-1}pg \in Q$ for all $p \in P$. Since $P$ and $Q$ are the same size, being Sylow $p$-subgroups, this means that $g^{-1}Pg = Q$, and so they are indeed conjugate.</p>
<h2 id="sylow-3">Sylow 3</h2>
<p>Let $P$ be a particular Sylow $p$-subgroup, and let it act on the set of <em>all</em> Sylow $p$-subgroups by conjugation. We claim that $P$ is the only fixed point of this action. This would, by our lemma (we’re getting so much mileage out of this baby), instantly tell us that $n_p \equiv 1 \bmod p$.</p>
<p>Consider some fixed point $Q$. Then for any $p \in P$, $p^{-1}Qp = Q$, which means that $P$ lies in the normalizer of $Q$. Since both $P$ and $Q$ are Sylow $p$-subgroups of $G$, they are both Sylow $p$-subgroups of $N(Q)$. By Sylow 2, they must be conjugate, but since $Q$ is normal in $N(Q)$, it’s not going anywhere under conjugation. Thus $Q$ must equal $P$.</p>
<p>Next, we show that $n_p = [G : N(P)]$. Consider the action of $G$ by conjugation on the set of Sylow $p$-subgroups. There’s only one orbit, because of Sylow 2, and by orbit-stabilizer, it has size $[G : \Stab(P)]$. But the stabilizer of $P$ is just the normalizer, so $n_p = [G : N(P)]$, as desired.</p>
<p>Lastly, since $m = [G : P] = [G : N(P)] [N(P) : P]$, we get that $n_p$ divides $m$ for free.</p>
<h2 id="applications">Applications</h2>
<p>Cool! These are nice theorems, how do we put them to use? Let’s look at some example applications.</p>
<hr />
<p><em>Show that $\ZZ_{35}$ is the only group of size $35$.</em></p>
<p>Let $G$ be a group of size $35$. We’ll consider its Sylow $5$ and $7$-subgroups. By Sylow 3, we know that $n_5 \equiv 1 \bmod 5$, and divides $7$. This means it’s gotta be $1$, which means $G$ has a normal subgroup of size $5$. Likewise, $n_7 \equiv 1 \bmod 7$, and divides $5$, so $G$ has a normal subgroup of size $7$ as well. They intersect trivially, since their sizes are relatively prime, so $G$ is a direct product of these groups. Therefore, $G \cong \ZZ_5 \times \ZZ_7$, which is $\ZZ_{35}$.</p>
<hr />
<p><em>Classify all groups of order $105$.</em></p>
<p>Let $G$ be a group of order $105$. First, we show that it has normal Sylow $5$- and $7$-subgroups. Sylow 3 restricts $n_5$ to $1$ or $21$, and $n_7$ to $1$ or $15$.</p>
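<p>The arithmetic conditions from Sylow 3 are easy to mechanize. The sketch below lists every value of $n_p$ that divides $m$ and is $1$ mod $p$; the function name <code>sylow_count_candidates</code> is an assumption made for this example.</p>

```python
def sylow_count_candidates(order, p):
    """Return the values of n_p allowed by Sylow 3 for a group of the given order."""
    m = order
    while m % p == 0:
        m //= p  # strip the full power of p, leaving m = |G| / p^n
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]


print(sylow_count_candidates(35, 5))   # prints "[1]"
print(sylow_count_candidates(35, 7))   # prints "[1]"
print(sylow_count_candidates(105, 5))  # prints "[1, 21]"
print(sylow_count_candidates(105, 7))  # prints "[1, 15]"
```

<p>For order $35$ both lists collapse to a single entry, which is why that case was so quick; for order $105$ there are two candidates each, so more work is needed.</p>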
<p>If $n_5 = 1$, then there’s a unique Sylow $5$-subgroup $N_5$. Picking out some Sylow $7$-subgroup $P_7$, we get a subgroup $H = N_5 P_7$ of size $35$ (the normality of $N_5$ is necessary for this to be a subgroup). But from our previous exercise, we know that this must be isomorphic to $\ZZ_{35}$. Since it’s abelian, $P_7$ must of course be normal in $H$. This means that the normalizer $N(P_7) \supseteq H$. Since $n_7 = [G : N(P_7)] \le [G : H] = 3$, we are forced to conclude that $n_7 = 1$ as well.</p>
<p>Likewise, if $n_7 = 1$, we can construct a subgroup $H = P_5 N_7$ isomorphic to $\ZZ_{35}$, in which $P_5$ is normal. The index of $H$ is again $3$, so $n_5 = [G : N(P_5)] \le 3$, and this pins down $n_5 = 1$.</p>
<p>If neither of these is $1$, then $n_5 = 21$ and $n_7 = 15$, and we run out of elements. Each of these subgroups intersects trivially (because they have prime order), and so we would have $21 \cdot 4$ non-identity elements from the Sylow $5$-subgroups, and $15 \cdot 6$ non-identity elements from the Sylow $7$-subgroups. Adding in the identity, this is a total of $175$ elements, way too many.</p>
<p>So $G$ has normal Sylow $5$- and $7$-subgroups, and their product is a subgroup $H$ of size $35$. As the product of normal subgroups, it is itself normal. Cauchy’s theorem gives us an element $x$ of order $3$, and it generates a subgroup $K$. Since $H$ and $K$ intersect trivially, $HK$ is the whole group, and so $G$ is a semidirect product of $H$ and $K$.</p>
<p>What options do we have for our twisting homomorphism $\phi : K \to \Aut(H)$? All we have to do is specify $\phi(x)$, and all we need is that $\phi(x)^3$ is the identity.</p>
<p>The automorphisms of $\ZZ_n$ are those given by multiplying by some $a$ relatively prime to $n$. As such, the automorphisms of $\ZZ_{35}$ with order dividing $3$ are $(r \mapsto ar)$, where $a^3 \equiv 1 \bmod 35$. The only such solutions are $1, 11, 16$.</p>
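<p>The list of solutions can be confirmed by a brute-force search over the units mod $35$. This is only a sketch, and <code>roots_of_unity</code> is a name made up for the example.</p>

```python
from math import gcd


def roots_of_unity(modulus, exponent):
    """Return the units a modulo `modulus` with a^exponent ≡ 1."""
    return [
        a
        for a in range(1, modulus)
        if gcd(a, modulus) == 1 and pow(a, exponent, modulus) == 1
    ]


print(roots_of_unity(35, 3))  # prints "[1, 11, 16]"
```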
<p>If $a = 1$, then this is the trivial automorphism, and so $G \cong \ZZ_3 \times \ZZ_{35} \cong \ZZ_{105}$.</p>
<p>It turns out that the groups for $a = 11$ and $a = 16$ are isomorphic, but I can’t figure out a clean way to show it at the moment. Stay tuned. <!--TODO--></p>
<hr />
<p><em>Show $A_5$ is the smallest non-abelian simple group.</em></p>
<p>To prove this, we need to eliminate the possibility of a simple non-abelian group of any smaller size. First, we can eliminate primes; any group of size $p$ is cyclic, hence abelian.</p>
<p>We can also eliminate prime powers. Any group of prime power order has a non-trivial center, so it cannot be simple.</p>
<p>Next, we eliminate anything that is $2$ mod $4$. Such a number is equal to $2m$ with $m$ odd. If $G$ is a group of size $2m$, let $G$ act on itself by multiplication. This gives us a map $\phi : G \to S_{2m}$ sending $g$ to the permutation it induces. By Cauchy’s theorem, there’s an element of order $2$. This induces a product of $m$ transpositions, and thus an odd permutation. So the map $\sgn \circ \phi : G \to \{ \pm 1 \}$ is surjective, and so its kernel is a non-trivial proper normal subgroup of $G$. (Unless $G$ has order $2$, but we already handled that case.)</p>
<p>Our last big sweep will be to eliminate groups of size $p^k m$ with $m < p$. Since $n_p$ divides $m$, we have $n_p \le m < p$. But $n_p$ is $1$ mod $p$, and so must be $1$. If there is a single Sylow $p$-subgroup, it must be normal. This eliminates 15, 20, 21, 28, 33, 35, 39, 44, 51, 52, 55, and 57.</p>
<p>This leaves us with 12, 24, 36, 40, 45, 48, and 56.</p>
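<p>The bookkeeping for these sweeps is easy to get wrong by hand, so here is a sketch that applies the three filters (primes and prime powers, orders that are $2$ mod $4$, and orders $p^k m$ with $m < p$) to every order below $60$. The helper names are assumptions made for illustration.</p>

```python
def prime_factorization(n):
    """Return {prime: exponent} for the positive integer n."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def survives_easy_sweeps(order):
    """True if none of the three easy arguments rules out this order."""
    factors = prime_factorization(order)
    if len(factors) <= 1:   # order 1, primes, and prime powers
        return False
    if order % 4 == 2:      # twice an odd number
        return False
    for p, k in factors.items():
        if order // p**k < p:  # order = p^k * m with m < p forces n_p = 1
            return False
    return True


remaining = [n for n in range(1, 60) if survives_easy_sweeps(n)]
print(remaining)  # prints "[12, 24, 36, 40, 45, 48, 56]"
```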
<p>$|G|=40$: From the congruence conditions, we know that $n_5$ is $1$ mod $5$ and divides $8$. But this forces it to be $1$, so there is a unique Sylow $5$-subgroup.</p>
<p>$|G|=45$: Similar to $|G|=40$, the arithmetic restrictions force $n_5$ to be $1$.</p>
<p>$|G| = 12$: We know that $n_3$ is either $1$ or $4$. If it’s not $1$, there’s $4$ Sylow $3$-subgroups, and because they have prime order, they intersect trivially. This gives $8$ elements of order $3$, leaving $4$ other elements to constitute the Sylow $2$-subgroups. But each Sylow $2$-subgroup has $4$ elements, and so there is a unique (hence normal) one.</p>
<p>$|G| = 56$: Similar to the case for $12$. If $n_7$ is not $1$, it is $8$, yielding $48$ elements of order $7$. The leftover $8$ elements form the unique Sylow $2$-subgroup.</p>
<p>For the other three cases we need some stronger stuff.</p>
<p><em>Claim</em>: if $G$ is simple and non-abelian, then for all $p$ dividing $|G|$, we must have $|G|$ divides $n_p!$.</p>
<p><em>Proof</em>: Let $G$ act on the Sylow $p$-subgroups by conjugation. Because there are $n_p$ of them, this gives us a homomorphism $\phi : G \to S_{n_p}$. Since $G$ is simple, $\ker \phi$ is either trivial or all of $G$. Because all Sylow $p$-subgroups are conjugate, the latter situation only occurs when there is only one of them, something impossible if $G$ is simple and non-abelian.</p>
<p>This leaves us with the former case, where the kernel is trivial, and thus $\phi$ is an injection. Identifying $G$ as a subgroup of $S_{n_p}$, we get that $|G|$ divides $n_p!$ as promised.</p>
<p>We can now eliminate the last cases.</p>
<p>$|G|=24$: We know that $n_2$ is either $1$ or $3$, by the usual congruence conditions. But now we have a new tool. If $G$ were simple, then $24$ would divide $n_2!$, which it can’t in either case. So $G$ can’t be simple.</p>
<p>$|G|=36$: We know $n_3$ is $1$ or $4$. If $G$ is simple, then $36$ would divide $n_3!$, which it can’t.</p>
<p>$|G|=48$: Identical to the case for $24$.</p>
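<p>The last three cases all follow the same recipe: list the candidates for $n_p$, and check that each one is either $1$ (giving a normal Sylow subgroup) or has $n_p!$ not divisible by $|G|$. A sketch of that check, with hypothetical function names:</p>

```python
from math import factorial


def sylow_count_candidates(order, p):
    """Return the values of n_p allowed by Sylow 3 for a group of the given order."""
    m = order
    while m % p == 0:
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]


def ruled_out_by_factorial_test(order, p):
    """True if every allowed n_p is 1 or satisfies: |G| does not divide n_p!."""
    return all(
        n_p == 1 or factorial(n_p) % order != 0
        for n_p in sylow_count_candidates(order, p)
    )


print(ruled_out_by_factorial_test(24, 2))  # prints "True"
print(ruled_out_by_factorial_test(36, 3))  # prints "True"
print(ruled_out_by_factorial_test(48, 2))  # prints "True"
print(ruled_out_by_factorial_test(60, 5))  # prints "False"
```

<p>The test fails to rule out order $60$ with $p = 5$, since $n_5 = 6$ is allowed and $60$ divides $6!$. That is consistent with $A_5$ being simple.</p>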
<p>Phew!</p>
<p>This was a lot of work. Back when I was in high school, we had to prove this without the Sylow theorems, and by god we appreciated them. Get off my lawn!</p>
<p>(But actually though, that was an… experience.)</p>
<hr />
<h1>The Heawood Number</h1>
<p>2018-10-22, <a href="http://mathmondays.com/heawood">http://mathmondays.com/heawood</a></p>
<p>The <a href="https://en.wikipedia.org/wiki/Four_color_theorem">four-color theorem</a> tells us that we can color any map using only four colors, such that no adjacent regions have the same color.</p>
<p>This is true for any map of the world, whether it’s on a globe or laid out flat. But what about maps on other surfaces?</p>
<!--more-->
<hr />
<p>The mathematical formalization of the four-color theorem is: “any planar graph is 4-colorable”. Let’s break down what that means.</p>
<p>Graph here refers to a collection of vertices and edges, not a plot or a chart. For our purposes, we’ll only consider <strong>simple</strong> graphs, that is, graphs where a) there is no edge from a point to itself and b) for any pair of points, there’s at most one edge between them. A graph is <strong>planar</strong> if we can embed it in the plane (i.e., draw it on a sheet of paper) without any of the edges crossing.</p>
<p>A <em>coloring</em> of a graph is a way of coloring the vertices of the graph such that no two vertices of the same color are connected. Note that self-loops make a graph impossible to color, and multiple edges between vertices don’t matter. This is why we concentrate only on simple graphs.</p>
<p>We say a graph is $k$-colorable if there exists a coloring with $k$ colors.</p>
<p><img src="/assets/heawood-1.png" alt="TODO tooltip here" width="100%" height="auto" /></p>
<p>So what does this have to do with maps? The problem of coloring a map can be rephrased as a problem about coloring graphs. And since the field is called “graph theory”, and not “map theory”, that’s what we’ll do. Put a vertex for each country, and connect two vertices if the corresponding countries are adjacent. If you can color the map, then the corresponding graph can be colored in the same way. Likewise, if you can color the graph, you can use the same color assignment to color the map.</p>
<p><img src="/assets/heawood-2.png" alt="TODO tooltip here" width="100%" height="auto" /></p>
<p>We’re looking to answer the question: for a surface $S$, how many colors do we need to guarantee we can color any graph embedded in $S$? To do this, we’ll need to make use of an invariant called the “Euler characteristic”.</p>
<h1 id="euler-characteristic">Euler Characteristic</h1>
<p>Euler’s formula for planar graphs says that for any planar graph, $V - E + F = 2$, where $V$ is the number of vertices, $E$ is the number of edges, and $F$ is the number of faces (including the outside face).</p>
<p>This also applies to graphs embedded on the sphere. Imagine taking a pin and poking a hole in the middle of one of the faces. Stretch this hole out until it is wide enough that you can flatten the entire sphere into a disk. Now you have a graph embedded in the plane. (This explains why we like to consider the outside face a legitimate face.)</p>
<p>But this does not apply to graphs embedded on other surfaces! Consider the following graph on the torus:</p>
<p><img src="/assets/heawood-3.png" alt="TODO tooltip here" width="100%" height="auto" /></p>
<p>This has 16 vertices, 32 edges, and 16 faces (count carefully, not all of them are obvious). This has $V - E + F = 0$! Euler’s formula doesn’t work on the torus, but maybe we can salvage it?</p>
<p>Let’s try some examples:</p>
<p><img src="/assets/heawood-4.png" alt="TODO tooltip here" width="100%" height="auto" /></p>
<p>It seems we <em>usually</em> get $0$, but sometimes we do get a $2$, like before. To resolve this, note that in all the examples where we don’t get $0$, some of the faces have “holes”. If you took the face in the $3 - 3 + 1$ example and laid it out flat, it’d look like a ring, not a disk.</p>
<p>So we’ll equip ourselves with another definition: if a graph is embedded in a surface, and none of the resulting faces have holes, we call that embedding <em>honest</em>. (This isn’t standard terminology, but you can’t stop me from naming things whatever I want. Try me.) It turns out that if you honestly embed a graph into the torus, you’ll always get $V - E + F = 0$, no matter which graph you use, or how it’s embedded.</p>
<p>In fact, for any surface $S$, we have a similar result: there’s a fixed integer $\chi(S)$ such that $V - E + F = \chi(S)$, for any honest embedding of any graph. We call this number the <em>Euler characteristic</em> for the surface. For the plane and the sphere, $\chi = 2$. For the torus, $\chi = 0$. Here are some other examples of surfaces and their Euler characteristics:</p>
<p><img src="/assets/heawood-5.png" alt="TODO tooltip here" width="100%" height="auto" /></p>
<h1 id="the-heawood-number">The Heawood Number</h1>
<p>Now we can approach the generalized four-color theorem. Armed with the Euler characteristic, we define the <strong>Heawood number</strong> of a surface with Euler characteristic $\chi$ as:
\[ H(\chi) = \left\lfloor \frac{7 + \sqrt{49 - 24 \chi}}{2} \right\rfloor \]</p>
<p>Yeah. That’s… unmotivated.</p>
<p>We claim that any graph that can be embedded on a surface with characteristic $\chi$, honestly or otherwise, can be colored with at most $H(\chi)$ colors. For the sphere, $H(2) = 4$, so our claim becomes the famous Four-Color Theorem, which is Very Hard To Prove (TM). We’ll deliberately exclude that case, like the cowards we are.</p>
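<p>To get a feel for the formula, here’s a small sketch that evaluates $H(\chi)$ for a few surfaces. The function name <code>heawood_number</code> is made up for illustration; it uses an integer square root so that the floor is computed exactly.</p>

```python
from math import isqrt

def heawood_number(chi: int) -> int:
    """Return floor((7 + sqrt(49 - 24*chi)) / 2) using exact integer arithmetic."""
    if chi > 2:
        raise ValueError("no closed surface has Euler characteristic above 2")
    # floor((7 + s) / 2) equals (7 + floor(s)) // 2 because 7 is an integer,
    # so the integer square root avoids floating-point rounding.
    return (7 + isqrt(49 - 24 * chi)) // 2

# Sphere (2), projective plane (1), torus / Klein bottle (0), and two more.
for chi in (2, 1, 0, -1, -2):
    print(chi, heawood_number(chi))
```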
<hr />
<p>The first step is to prove a lemma about the minimum degree of the graph. That’ll get us most of the way there.</p>
<p>Let $S$ be a surface that isn’t the sphere, and embed a graph $G$ on it, honestly or not. Let $V$, $E$, and $F$ be the usual, and let $\delta$ be the minimum degree of a vertex in $G$. We claim that $\delta \le H(\chi) - 1$.</p>
<p>Proof: First, we can extend this embedding to an honest embedding, by adding extra edges to cut up the faces. This can only make $\delta$ bigger, so if we can prove $\delta \le H(\chi) - 1$ for this new graph, it was also true for the old graph.</p>
<p>Next, consider the following inequalities, the motivations for which are pulled directly from my ass.</p>
<ul>
<li>Since each face has at least three edges, we know that $2E \ge 3F$.</li>
<li>The sum of the degrees for all vertices is $2E$. Thus, $2E \ge \delta V$.</li>
<li>A vertex cannot be connected to more than $V - 1$ other vertices, so $\delta + 1 \le V$.</li>
</ul>
<p>Now, from the definition of Euler characteristic, we have:
\begin{align*}
\chi &= V - E + F \\\<br />
6\chi &= 6V - 6E + 6F \\\<br />
6\chi &\le 6V - 2E \\\<br />
6\chi &\le 6V - \delta V = (6 - \delta) V \\\<br />
\end{align*}</p>
<p>Here we must split into cases, depending on the sign of $\chi$.</p>
<p>If $\chi \le 0$, then $H(\chi) \ge 7$, so we’re already done if $\delta \le 5$. Otherwise $\delta - 6 \ge 0$, and we make both sides positive before making use of our last inequality:
\[ -6\chi \ge (\delta - 6)V \ge (\delta - 6)(\delta + 1) = \delta^2 - 5 \delta - 6 \]</p>
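<p>Now use the handy-dandy quadratic formula; we get that $\delta$ is at most $\frac{5 + \sqrt{49 - 24 \chi}}{2}$. Since $\delta$ is an integer, we can round down: $\delta \le \left\lfloor \frac{5 + \sqrt{49 - 24 \chi}}{2} \right\rfloor = H(\chi) - 1$. Boom.</p>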
<p>Otherwise, $\chi > 0$, and by the <a href="https://en.wikipedia.org/wiki/Surface_%28topology%29#Classification_of_closed_surfaces">classification of compact surfaces</a>, we know $S$ must be the sphere or the projective plane. We’re explicitly excluding the sphere, so $S$ must be the projective plane, which has Euler characteristic 1. Plugging that in, we get that $6 \le (6 - \delta) V$. Since the right side is positive, we must have $\delta < 6$. Because $H(1) = 6$, we can still guarantee that $\delta \le H(\chi) - 1$.</p>
<p>So for any graph $G$ embedded in $S$, honestly or otherwise, there is a vertex with degree at most $H(\chi) - 1$.</p>
<hr />
<p>We’re basically done! We’ll describe an explicit procedure to color graphs on $S$ with $H(\chi)$ colors.</p>
<p>Let $G$ be a graph embedded on $S$. Our base case is the graph with one vertex; it can trivially be colored. Otherwise, consider $G$ with $n \ge 2$ vertices. By our lemma, it has some vertex $v$ with degree at most $H(\chi) - 1$. Apply our procedure to the subgraph $G - v$, coloring it with $H(\chi)$ colors. Since $v$ has strictly fewer than $H(\chi)$ neighbors, there will be at least one color available for us to color $v$ with, and so we can color all of $G$.</p>
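<p>The recursive procedure can be unrolled into a loop: strip off a minimum-degree vertex until nothing is left, then put the vertices back in reverse order, coloring each one as it returns. Here’s a minimal sketch, assuming the graph is given as a dictionary of neighbor sets (the name <code>degeneracy_coloring</code> and the input format are choices made for this example). It knows nothing about the surface; the lemma is what promises that the number of colors stays at most $H(\chi)$.</p>

```python
def degeneracy_coloring(adj):
    """Color a simple graph given as {vertex: set_of_neighbors}.

    Repeatedly strip a minimum-degree vertex, then re-add the vertices in
    reverse order, giving each the smallest color unused by its neighbors.
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    removal_order = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
        removal_order.append(v)

    color = {}
    for v in reversed(removal_order):
        # Only neighbors that were removed after v are colored at this point.
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color


# K_7 embeds on the torus, where H(0) = 7, and it needs all 7 colors.
k7 = {v: {u for u in range(7) if u != v} for v in range(7)}
coloring = degeneracy_coloring(k7)
print(len(set(coloring.values())))
```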
<h1 id="conclusions">Conclusions</h1>
<p>We showed that any graph $G$ embedded in $S$, honestly or otherwise, can be colored with $H(\chi) = \left\lfloor \frac{7 + \sqrt{49 - 24 \chi}}{2} \right\rfloor$ colors. The only case we decided not to handle was when $S$ is the sphere. Unfortunately, that case is much harder. The proof above was discovered in 1890 by Percy John Heawood, after whom the number is named. The Four-Color Theorem wasn’t proven until much later, in 1976, by Kenneth Appel and Wolfgang Haken. And what a controversial proof it was! They managed to reduce the problem to checking a particular property of 1,936 graphs. This wasn’t feasible to do by hand, so they used a computer to check those cases. This was the first computer-aided proof, and it ruffled quite a few feathers.</p>
<p>Secondly, we only established an upper bound on the number of colors we need in our palette. Is there a graph that requires all $H(\chi)$ colors? Or can we lower the bound a bit? The Heawood conjecture is the claim that we can’t; i.e., that this bound is sharp. And it’s mostly true. In 1968, Gerhard Ringel and Ted Youngs showed that, on almost any surface, you can embed the complete graph on $H(\chi)$ vertices. Since that graph requires all $H(\chi)$ colors, that shows the bound is sharp. The only exception is the Klein bottle, where the conjecture predicts $H(0)=7$ colors are needed, but in fact, $6$ colors suffice to color any graph.</p>
<p>A maximal coloring of the Klein bottle is shown below:</p>
<div class="image-container">
<p><img src="/assets/heawood-6.png" alt="TODO tooltip here" height="250px" /></p>
</div>
Linearity of Expectation2018-10-15T00:00:00+00:002018-10-15T00:00:00+00:00http://mathmondays.com/linearity-expectation<p>To introduce this topic, let’s start with an innocuous problem:</p>
<blockquote>
<p>You have $10$ six-sided dice. If you roll all of them, what is the expected sum of the faces?</p>
</blockquote>
<p>Your intuition should tell you that it’s $35$. But what’s really going on here is an example of a slick principle called <strong>linearity of expectation</strong>.</p>
<!--more-->
<hr />
<p>We’re not actually computing the probability of getting $10, 11, \ldots, 60$, and summing it all up. Implicitly, we are making the following line of argument: the expected value of the first die is $3.5$, and so the expected value for $k$ dice is $3.5k$. This relies on the following claim: given two random variables $X$ and $Y$, the expected value of their sum, $E[X + Y]$, is just $E[X] + E[Y]$.</p>
<p>This feels intuitively true, and proving it is straightforward. Let $\Omega$ be the space of possible outcomes. Then
\begin{align*}
E[X + Y] &= \sum_{\omega \in \Omega} p(\omega) (X + Y)(\omega) \\\<br />
&= \sum_{\omega \in \Omega} p(\omega) (X(\omega) + Y(\omega)) \\\<br />
&= \sum_{\omega \in \Omega} p(\omega) X(\omega) + \sum_{\omega \in \Omega} p(\omega) Y(\omega) \\\<br />
&= E[X] + E[Y]
\end{align*}</p>
<p>But interestingly enough, at no point did we require $X$ and $Y$ be independent. This still works even when $X$ and $Y$ are correlated! For some sanity-checking examples, consider $X = Y$ and $X = -Y$.</p>
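<p>As a quick sanity check with variables that are about as dependent as they can be, here’s a sketch that computes both sides exactly for a single die roll, taking $Y = X^2$ (an example chosen here, not one from the problems below).</p>

```python
from fractions import Fraction

outcomes = range(1, 7)                 # faces of one fair die
p = Fraction(1, 6)                     # probability of each face

def expectation(f):
    """Exact expected value of f(face) for a single fair die roll."""
    return sum(p * f(w) for w in outcomes)

X = lambda w: w                        # the face itself
Y = lambda w: w * w                    # fully determined by X, so not independent

lhs = expectation(lambda w: X(w) + Y(w))
rhs = expectation(X) + expectation(Y)
print(lhs, rhs)
```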
<p>This principle, which is rather obvious when $X$ and $Y$ are independent (so much so that we often use it unconsciously), is unexpectedly powerful when applied to dependent variables. We’ll explore the concept through several example problems.</p>
<h1 id="gumballs">Gumballs</h1>
<blockquote>
<p>Imagine a very large gumball machine, with $4$ colors of gumballs in it, evenly distributed. We only have enough money for $6$ gumballs; what’s the expected number of colors we will receive? Assume that the machine has so many gumballs that the ones we take out don’t matter; effectively, we are drawing with replacement.</p>
</blockquote>
<p>Let’s compute this the naive way first. Let’s count the number of ways we can get each number of colors, and do the appropriate weighted sum.</p>
<p>There are $4$ ways we can get only one color.</p>
<p>For any two colors, there’s $2^6 = 64$ ways we can get gumballs using just those colors. Two of those use only a single color, leaving $62$ that use both. There’s $6$ pairs of colors, so there’s $62 \cdot 6 = 372$ ways to get exactly two colors.</p>
<p>Similarly, for any three colors, there’s $3^6 = 729$ ways to get gumballs with just those colors. Of those, $3 \cdot 62 = 186$ use exactly two of the three colors, and $3$ use only one, leaving $729 - 186 - 3 = 540$ that use all three. There’s $4$ possible triplets, giving $540 \cdot 4 = 2160$ ways to get exactly three colors.</p>
<p>All other cases have four colors: $4^6 - 2160 - 372 - 4 = 1560$ possible ways.</p>
<p>Now we do the weighted sum. Each possible sequence of gumballs has probability $1/4^6$ of occurring, so the expected value of the number of colors is:
\[ 1 \frac{4}{4^6} + 2 \frac{372}{4^6} + 3 \frac{2160}{4^6} + 4 \frac{1560}{4^6} = \frac{3367}{1024} \approx 3.288 \]</p>
<p>It’s doable, but one can imagine this is much harder for larger numbers.</p>
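<p>Case-by-case counting like this is easy to get wrong, so a brute-force enumeration is a handy cross-check. The sketch below walks through all $4^6$ equally likely sequences and tallies how many distinct colors each one uses; the variable names are just for illustration.</p>

```python
from collections import Counter
from fractions import Fraction
from itertools import product

COLORS, DRAWS = 4, 6

# Tally how many of the 4^6 equally likely sequences use exactly j colors.
tally = Counter(len(set(seq)) for seq in product(range(COLORS), repeat=DRAWS))

# Weighted sum: each sequence has probability 1 / 4^6.
expected = sum(Fraction(j * count, COLORS ** DRAWS) for j, count in tally.items())
print(sorted(tally.items()), expected)
```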
<hr />
<p>Let’s take another go at it. For the $i$th color, define $X_i$ to be $1$ if we get at least one gumball of that color, and $0$ otherwise. The number of colors we get, $X$, is then the sum of the $X_i$.</p>
<p>The probability of <em>not</em> getting a gumball of a particular color on a particular draw is $3/4$, so the probability of not getting it in $6$ draws is $(3/4)^6$. This means that $E[X_i] = 1 - (3/4)^6 = 3367/4096$.</p>
<p>The $X_i$ are not independent; for example, if we know three of them are $0$, the last one must be $1$ (we must draw a gumball of <strong>some</strong> color). But we can still apply linearity of expectation, even to dependent variables.</p>
<p>Thus, the expected number of colors we get is $E[X] = \sum_{i = 1}^4 E[X_i] = 4 \cdot \frac{3367}{4096} = \frac{3367}{1024}$, just as we got earlier.</p>
<p>Notably, this approach extends gracefully to when we take $k$ gumballs with $n$ available colors. The expected value of each $X_i$ is then $1 - (1 - 1/n)^k$, so the expected value of $X$ is then $n \left(1 - (1 - 1/n)^k\right)$.</p>
<p>(This reveals an interesting approximation: if $n$ and $k$ are equal and large, then $(1 - 1/n)^n \approx 1/e$, so the expected number of colors is $n(1 - 1/e) \approx 0.63n$).</p>
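<p>Here’s a short sketch of the general computation: $n$ times the chance that a fixed color shows up at least once in $k$ draws. The function name <code>expected_colors</code> is invented for this example, and the $n = k = 1000$ case is only there to eyeball the $1 - 1/e$ approximation.</p>

```python
from fractions import Fraction
from math import e

def expected_colors(n, k):
    """Expected number of distinct colors seen in k uniform draws from n colors."""
    miss = (1 - Fraction(1, n)) ** k      # chance a fixed color never appears
    return n * (1 - miss)

print(expected_colors(4, 6))              # the gumball machine from the problem
print(float(expected_colors(1000, 1000)) / 1000, 1 - 1 / e)
```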
<h1 id="number-of-fixed-points">Number of Fixed Points</h1>
<p>These variables we saw earlier, that are $1$ if a condition is true, and $0$ otherwise, are called <strong>indicator variables</strong>, and they are particularly good candidates for linearity of expectation problems.</p>
<blockquote>
<p>After we shuffle a deck of $n$ cards, what is the expected number of cards that have stayed in the same position? Equivalently, given a uniformly random permutation on $n$ objects, how many fixed points does it have on average?</p>
</blockquote>
<p>We have no interest in examining all $n!$ possible outcomes, and summing over the number of fixed points in each. That would be terrible. Instead, we’re going to split our desired variable into several indicator variables, each of which is easier to analyze.</p>
<p>Let $X_k$ be $1$ if the $k$th card is in the $k$th position, and $0$ otherwise. Then the number of fixed points is $\sum_k X_k$.</p>
<p>After shuffling, the $k$th card is equally likely to be in any position in the deck. So the chance of ending up in the same place is $1/n$, which makes $E[X_k] = 1/n$. So by linearity of expectation, $E[X_1 + \cdots + X_n] = n \cdot \frac{1}{n} = 1$. So on average, one card will stay in the same place.</p>
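<p>For small decks we can afford to do the terrible thing after all and enumerate every permutation, just to confirm the answer. A minimal sketch (the helper name is made up):</p>

```python
from fractions import Fraction
from itertools import permutations

def mean_fixed_points(n):
    """Average number of fixed points over all n! permutations of range(n)."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, x in enumerate(p) if i == x) for p in perms)
    return Fraction(total, len(perms))

print([mean_fixed_points(n) for n in range(1, 7)])
```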
<h1 id="number-of-cycles">Number of Cycles</h1>
<p>We don’t have to limit ourselves to indicator variables: sometimes we can use a constant factor to help us avoid overcounting.</p>
<blockquote>
<p>Given a random permutation on $n$ objects, how many cycles does it have on average?</p>
</blockquote>
<p>As a reminder, the cycles of a permutation are the “connected components”. For example, if $\sigma$ sends $1 \to 2$, $2 \to 4$, $3 \to 6$, $4 \to 1$, $5 \to 5$, and $6 \to 3$, then the cycles of $\sigma$ are $(1, 2, 4)$, $(3, 6)$, and $(5)$.</p>
<p>For each $k$, let $X_k = \frac{1}{L}$, where $L$ is the length of the cycle of $\sigma$ containing the number $k$. So for the permutation we described, $X_1 = X_2 = X_4 = 1/3$, $X_3 = X_6 = 1/2$, and $X_5 = 1$. Then the number of cycles is $X_1 + \cdots + X_n$, since each cycle contributes $L$ copies of $1/L$. As usual, these variables are highly dependent (if $X_i = 1/5$, there’d better be four other $X_j$ that equal $1/5$ as well), but we can still apply linearity of expectation.</p>
<p>The probability that $k$ is in a cycle of length $1$ is $1/n$, since $\sigma$ would have to send $k$ to itself.</p>
<p>The probability it is in a cycle of length $2$ is the probability $k$ is sent to some other number, times the probability that the other number is sent back to $k$, i.e. $\frac{n-1}{n} \cdot \frac{1}{n - 1}$, which is $\frac{1}{n}$.</p>
<p>In general, the probability of being in a cycle of length $L$ is $\frac{n-1}{n} \frac{n-2}{n-1} \cdots \frac{n-(L-1)}{n-(L-2)} \cdot \frac{1}{n-(L-1)} = \frac{1}{n}$. Curiously, this is independent of $L$.</p>
<p>So the expected value of $X_k$ is $\frac{1}{n} \sum_{L=1}^n \frac{1}{L} = \frac{H_n}{n}$, where $H_n$ is the $n$th <a href="https://en.wikipedia.org/wiki/Harmonic_number">harmonic number</a>. Then the expected number of cycles is $E[X_1] + \cdots + E[X_n] = H_n$.</p>
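<p>Again, this is cheap to check by brute force for small $n$. The sketch below counts cycles in every permutation and compares the average against $H_n$, using exact fractions; permutations are written 0-indexed, and all the names are illustrative.</p>

```python
from fractions import Fraction
from itertools import permutations

def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple mapping i -> perm[i]."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:       # walk the cycle containing `start`
                seen.add(j)
                j = perm[j]
    return cycles

def mean_cycles(n):
    """Average number of cycles over all n! permutations."""
    perms = list(permutations(range(n)))
    return Fraction(sum(cycle_count(p) for p in perms), len(perms))

def harmonic(n):
    """The n-th harmonic number as an exact fraction."""
    return sum(Fraction(1, L) for L in range(1, n + 1))

print([(mean_cycles(n), harmonic(n)) for n in range(1, 7)])
```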
<h1 id="buffons-needle">Buffon’s Needle</h1>
<p>We’ll finish up with a rather surprising application to Buffon’s needle problem:</p>
<blockquote>
<p>Consider a gigantic piece of lined paper, with the lines spaced one unit apart. If we throw a needle of length $1$ onto the paper, what is the probability it crosses a line?</p>
</blockquote>
<p>Technically, we’re only interested in the probability that the needle crosses a line. But because it can cross at most once, this is equal to the expected number of crossings. So if we let $X_a$ be the number of crossings for a needle of length $a$, we’re interested in $E[X_1]$.</p>
<p>Take a needle of length $a + b$, and paint it, covering the first $a$ units of it red, and the other $b$ units blue. Then throw it on the paper. The expected number of crossings is the expected number of red crossings, plus the expected number of blue crossings. But each segment of the needle is just a smaller needle, so the expected number of red crossings is $E[X_a]$, and the expected number of blue crossings is $E[X_b]$. This lets us conclude, unsurprisingly, that $E[X_{a+b}] = E[X_a] + E[X_b]$. This tells us that $E[X_a]$ is linear in $a$, and so $E[X_a] = Ca$ for some unknown constant $C$. (Well, we’ve gotta assume $E[X_a]$ is continuous in $a$, which it is, but shh…)</p>
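<p>If the additivity feels too slick, a simulation is a reasonable sanity check: doubling the length of a straight needle should double the average number of crossings. This is only a Monte Carlo sketch under an assumed model of “throwing” (uniform center height, uniform angle), with a fixed seed so that it’s repeatable.</p>

```python
import math
import random

def mean_crossings(length, trials=200_000, seed=0):
    """Monte Carlo estimate of the expected number of lines a needle crosses.

    Lines are the horizontal lines y = integer; the needle's center height and
    angle are uniform, which is an assumed model of "throwing" the needle.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        y = rng.random()                       # center height within one strip
        half = 0.5 * length * math.sin(rng.uniform(0.0, math.pi))
        # Count the integers between the needle's lowest and highest points.
        total += math.floor(y + half) - math.floor(y - half)
    return total / trials

one, two = mean_crossings(1.0), mean_crossings(2.0)
print(one, two, two / one)
```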
<p>Furthermore, put a sharp bend in the needle right at the color boundary. Each segment is still a linear needle, so the expected number of red crossings is still $E[X_a]$, and likewise with blue crossings. So the expected number of crossings for this bent needle is <em>still</em> $E[X_{a+b}]$, despite the kink!</p>
<p>By induction, if you put a finite number of sharp bends in a needle, it doesn’t change the expected number of crossings. All that matters is the total length. And by <s>handwaving</s> a continuity argument, this is true for continuous bends as well. So $E[X_a]$ doesn’t just give the expected number of crossings for a needle of length $a$, but for any reasonable curve of length $a$. (Much to my delight, this phenomenon is called “Buffon’s noodle”.) This means that if we throw a rigid noodle of length $a$ on the paper, the expected number of crossings is $E[X_a] = Ca$.</p>
<p>So let’s consider a particular kind of noodle: a circle with diameter $1$. No matter how it’s thrown onto the paper, it will cross the lines exactly twice. It has circumference $\pi$, so $C \pi = 2$, and we can determine that $C = \frac{2}{\pi}$. Thus, for the original needle problem, $p = E[X_1] = \frac{2}{\pi}$.</p>
Expected Density of Pigeons2018-10-08T00:00:00+00:002018-10-08T00:00:00+00:00http://mathmondays.com/pigeons<div class="mathdefs">
$
\DeclareMathOperator{\res}{Res}
$
</div>
<p>This one’s another puzzle from work:</p>
<blockquote>
<p>Consider a pigeon coop with $n$ pigeonholes, arranged in a straight line. When a pigeon arrives at the coop, it will roost in a pigeonhole only if it is empty, and both neighboring pigeonholes are also empty. It selects such a pigeonhole uniformly at random, enters the pigeonhole, and does not leave. At some point, the coop will fill up, but not every pigeonhole will be occupied. What is the expected density of pigeons in the coop, as $n$ grows large?</p>
</blockquote>
<p>If you run a few simulations, you get that it’s about $0.432332\ldots$. But this isn’t any easily recognizable number. What is it in closed form?</p>
<!--more-->
<hr />
<p>This problem illustrates one of the things I find really cool about math: the boundaries between different disciplines are essentially fictitious. This is a combinatorics problem, and so we might expect to be using arguments involving counting, bijections, and other finite tools. But instead we’ll sprint as fast as we can into the realm of analysis and solve the problem there.</p>
<p>Let $a_n$ be the expected number of pigeons for a coop with $n$ holes. Then we can come up with a recurrence relation for $a_n$.</p>
<p>Consider what happens when the first pigeon arrives in an unoccupied coop. If it arrives in the first hole, then we can imagine deleting the first hole and its neighbor from the coop, leaving us with an unoccupied coop of size $n - 2$. If it lands in the last hole, we have the same situation. Otherwise, it lands somewhere in the middle; when a pigeon comes to rest in the $k$th hole (I’m going to $1$-index, by the way), it splits the coop into two smaller coops, one with $k - 2$ holes, and the other with $n - k - 1$ holes. Since each hole is equally likely, we can average over all values of $k$ to get a first draft of our recurrence relation:
\[ a_n = 1 + \frac{1}{n} \left( a_{n-2} + a_{n-2} + \sum_{k=2}^{n-1} (a_{k-2} + a_{n-k-1}) \right) \]</p>
<p>This can be prettied up with some mild re-indexing:
\[ a_n = 1 + \frac{2}{n} \sum_{k=0}^{n-2} a_k \]</p>
<p>We can do even better though! If we consider $n a_n - (n-1) a_{n-1}$, we can collapse most of our terms:
\begin{align*}
n a_n - (n-1) a_{n-1} &= \left( n + 2 \sum_{k=0}^{n-2} a_k \right) - \left( n-1 + 2 \sum_{k=0}^{n-3} a_k \right) \\\<br />
n a_n - (n-1) a_{n-1} &= 1 + 2 a_{n-2} \\\<br />
a_n &= \frac{1}{n} ( 1 + (n-1) a_{n-1} + 2 a_{n-2} )
\end{align*}</p>
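<p>It’s worth double-checking that the collapsed recurrence really agrees with the summation form. Here’s a sketch that computes both with exact fractions, taking $a_0 = 0$ (an empty coop holds no pigeons) and $a_1 = 1$; the function names are invented for this example.</p>

```python
from fractions import Fraction

def pigeons_by_sum(n_max):
    """a_n = 1 + (2/n) * (a_0 + ... + a_{n-2}), with a_0 = 0."""
    a = [Fraction(0)]
    for n in range(1, n_max + 1):
        a.append(1 + Fraction(2, n) * sum(a[: n - 1]))
    return a

def pigeons_three_term(n_max):
    """a_n = (1 + (n-1) a_{n-1} + 2 a_{n-2}) / n, seeded with a_0 = 0, a_1 = 1."""
    a = [Fraction(0), Fraction(1)]
    for n in range(2, n_max + 1):
        a.append((1 + (n - 1) * a[n - 1] + 2 * a[n - 2]) / n)
    return a

print(pigeons_by_sum(5))
print(pigeons_by_sum(40) == pigeons_three_term(40))
```

<p>The small cases are easy to verify by hand: with three holes, the first pigeon leaves room for a second one unless it lands in the middle, which gives $a_3 = 5/3$.</p>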
<hr />
<p>This recurrence is linear, but its coefficients depend on $n$, so the usual linear algebra tricks for constant-coefficient recurrences don’t apply. So we fall back on the Swiss Army knife of recurrence relations: the generating function.</p>
<p>Let $G(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + \cdots$. We don’t know what this function is yet, but we can use the recurrence relation to pin down what it is.
\begin{align*}
G(z) &= \sum_{n=0}^\infty a_n z^n \\\<br />
G'(z) &= \sum_{n=1}^\infty n a_n z^{n-1} \\\<br />
&= a_1 + \sum_{n=2}^\infty n a_n z^{n-1} \\\<br />
&= a_1 + \sum_{n=2}^\infty \left( 1 + (n-1) a_{n-1} + 2 a_{n-2} \right) z^{n-1}
\end{align*}</p>
<p>Dealing with the three pieces separately makes this much easier to read (and also to write *wink*):
\[ \sum_{n=2}^\infty z^{n-1} = \frac{z}{1 - z} \]
\[ \sum_{n=2}^\infty (n-1) a_{n-1} z^{n-1} = \sum_{n=1}^\infty n a_n z^n = z G'(z) \]
\[ \sum_{n=2}^\infty 2 a_{n-2} z^{n-1} = 2 \sum_{n=0}^\infty a_n z^{n+1} = 2z G(z) \]</p>
<p>Putting it all together, we get a differential equation for $G(z)$:
\[ G'(z) = 1 + \frac{z}{1 - z} + z G'(z) + 2z G(z) \]</p>
<p>Cleaning it up a little, we see that it’s first order and linear, so we can put those diff eq skills to use:
\[ G'(z) = \frac{2z}{1 - z} G(z) + \frac{1}{(1 - z)^2} \]</p>
<p>The details aren’t super important, but basically you use an <a href="https://en.wikipedia.org/wiki/Integrating_factor">integrating factor</a> and get:
\[ G(z) = \frac{1 + C e^{-2z}}{2(z-1)^2} \]</p>
<p>What should $C$ be? We’ll have to use our initial conditions, and one of them is particularly straightforward: $G(0) = a_0$, which we know is $0$, and so $C = -1$.</p>
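<p>Since we skipped the details of solving the differential equation, it’s reassuring to check that the series coefficients of $\frac{1 - e^{-2z}}{2(1 - z)^2}$ really do match the recurrence. The sketch below multiplies the two power series by hand with exact fractions; the helper names are made up for illustration.</p>

```python
from fractions import Fraction
from math import factorial

def gf_coefficients(n_max):
    """Power-series coefficients of (1 - exp(-2z)) / (2 (1 - z)^2) up to z^n_max."""
    # (1 - exp(-2z)) / 2 = sum_{m >= 1} -(-2)^m / (2 m!) z^m
    numer = [Fraction(0)] + [Fraction(-((-2) ** m), 2 * factorial(m))
                             for m in range(1, n_max + 1)]
    # 1 / (1 - z)^2 = sum_{j >= 0} (j + 1) z^j, so convolve with (n - m + 1).
    return [sum(numer[m] * (n - m + 1) for m in range(n + 1))
            for n in range(n_max + 1)]

def pigeons(n_max):
    """a_n from the three-term recurrence, with a_0 = 0 and a_1 = 1."""
    a = [Fraction(0), Fraction(1)]
    for n in range(2, n_max + 1):
        a.append((1 + (n - 1) * a[n - 1] + 2 * a[n - 2]) / n)
    return a

print(gf_coefficients(5))
print(gf_coefficients(30) == pigeons(30))
```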
<hr />
<p>At this point, let’s stop and recollect our thoughts. We’ve defined a function $G(z)$ whose power series coefficients are $a_n$, the average number of pigeons in a coop of size $n$. Our solution is now encoded in quite a peculiar way: how fast do the coefficients of $G(z)$ grow?</p>
<p>To figure this out, let’s put the “analytic” in “analytic combinatorics”, and consider some contour integrals. Fix some $R > 1$, and define $I_n$ to be the integral of $G(z)/z^{n+1}$ around the circle of radius $R$ at the origin (taken counter-clockwise).</p>
<p>What is $I_n$? We can evaluate it using the <a href="/residues">residue theorem</a>. There are two poles, one at $z = 0$, and the other at $z = 1$. The former is easy to compute; the residue is the coefficient on the $z^{-1}$ term, which is exactly $a_n$. The second does not admit such a nice description, and so we compute it the usual way:
\begin{align*}
\res\left( \frac{G(z)}{z^{n+1}}, 1\right) &= \lim_{z \to 1} \frac{d}{dz} (z-1)^2 \frac{G(z)}{z^{n+1}} \\\<br />
&= \lim_{z \to 1} \frac{d}{dz} \frac{1 - e^{-2z}}{2 z^{n+1}} \\\<br />
&= \lim_{z \to 1} \frac{2 z e^{-2z} - (n+1)(1 - e^{-2z})}{2 z^{n+2}} \\\<br />
&= \frac{(n+3)e^{-2} - (n+1)}{2}
\end{align*}</p>
<p>So $\frac{1}{2 \pi i} I_n = a_n + \frac{(n+3)e^{-2} - (n+1)}{2}$. What good does this do us?</p>
<p>If you’ve seen this trick before, you know that $I_n$ drops exponentially to $0$ as $n$ increases, but if not, here’s the justification. Let $M$ be the largest value (in terms of absolute value) that $G$ attains on the circle $|z| = R$. Then the triangle inequality tells us:
\[ | I_n | = \left| \int_{C_R} \frac{G(z)}{z^{n+1}}~dz \right| \le \int_{C_R} \left| \frac{G(z)}{z^{n+1}} \right|~dz \le \int_{C_R} \frac{M}{R^{n+1}}~dz = \frac{2 \pi M}{R^n} \]</p>
<p>So as $n \to \infty$, $I_n$ drops to $0$, and so $a_n$ approaches $\frac{(n+1)-(n+3)e^{-2}}{2}$. Therefore, the expected density of pigeons, $a_n/n$, approaches $(1 - e^{-2})/2$, or about $0.432332$.</p>
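<p>To see how quickly $a_n$ settles onto this approximation, here’s a small numerical sketch. It recomputes $a_n$ from the three-term recurrence with exact fractions and compares against $\frac{(n+1)-(n+3)e^{-2}}{2}$; the choice of sample values of $n$ is arbitrary.</p>

```python
from fractions import Fraction
from math import exp

def pigeons(n_max):
    """a_n from the three-term recurrence, with a_0 = 0 and a_1 = 1."""
    a = [Fraction(0), Fraction(1)]
    for n in range(2, n_max + 1):
        a.append((1 + (n - 1) * a[n - 1] + 2 * a[n - 2]) / n)
    return a

def asymptotic(n):
    """The approximation ((n + 1) - (n + 3) e^{-2}) / 2 from the residue at z = 1."""
    return ((n + 1) - (n + 3) * exp(-2)) / 2

a = pigeons(200)
for n in (10, 50, 200):
    # Columns: n, exact a_n, approximation, density a_n / n.
    print(n, float(a[n]), asymptotic(n), float(a[n]) / n)
print((1 - exp(-2)) / 2)    # limiting density
```

<p>The density $a_n/n$ converges more slowly than $a_n$ converges to its approximation, since the approximation still carries a constant term that only dies off like $1/n$ after dividing through.</p>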
<hr />
<p>There were other solutions that people came up with for this problem, but what I really like about this one is that it demonstrates a way to approach these problems in general, and (at least IMO) it’s a pretty unexpected one. If someone asked me to figure out how fast the coefficients of a power series grow, the residue theorem would not be the first thing on my mind. And yet, not only does it get the job done, it works for many other similar problems, in essentially the same way. I’m not much of an analysis person, but my understanding is that this kind of trick is common in analytic combinatorics, and I think that’s pretty cool!</p>
Cauchy Residue Theorem2018-10-01T00:00:00+00:002018-10-01T00:00:00+00:00http://mathmondays.com/residues<div class="mathdefs">
$
\DeclareMathOperator{\res}{Res}
$
</div>
<p>The Cauchy Residue Theorem is a remarkable tool for evaluating contour integrals. Essentially, it says that, instead of computing an integral along a curve $\gamma$, you can replace it with a sum of “residues” at some special points $a_k$:
\[ \oint_\gamma f(z)~dz = 2 \pi i \sum_k \res(f, a_k) \]</p>
<p>But what is a residue? What are the $a_k$? What’s really going on here?</p>
<!--more-->
<h1 id="residues">Residues</h1>
<p>Since this is a post on some blog, not a rigorous complex analysis text, we’ll gloss over some of the technicalities, such as verifying convergence, or checking that holomorphic functions are analytic. All we need is some imagination, and the following fact:</p>
<div class="theorem-box">
<div class="theorem-title">Path Independence</div>
Let $D$ be a region of the complex plane and $f$ be a function holomorphic (complex-differentiable) on $D$. If you take a curve $\gamma$, and continuously deform it into a curve $\gamma'$, staying inside $D$, then
\[ \int_\gamma f(z)~dz = \int_{\gamma'} f(z)~dz \]
Also, we say two such curves are "homotopic".
</div>
<p>For example, if the blue dashed area is $D$, the curves in the first picture are homotopic, but not the curves in the second picture. There is no way to deform one of the curves into the other, without leaving the domain.</p>
<div class="image-container">
<p><img src="/assets/residues-contours-1.svg" alt="Homotopic curves" height="250px" /></p>
<p><img src="/assets/residues-contours-2.svg" alt="Non-homotopic curves" height="250px" /></p>
</div>
<p>If you’re comfortable with multivariable calculus, compare this to the Fundamental Theorem of Calculus for line integrals. How does complex-differentiability encode the “curl-free” condition?</p>
<p>This means that if $\gamma$ is a closed loop and $f$ is holomorphic on the region enclosed by $\gamma$, then $\gamma$ is homotopic to a point, which tells us that $\int_\gamma f~dz$ must be zero. Where things get interesting is when there are points in $D$ at which $f$ is not holomorphic.</p>
<hr />
<p>So let’s approach the theorem.</p>
<p>Let $f$ be a function holomorphic on $D$, except at a set of points $a_k$, and $\gamma$ a closed curve in $D$, avoiding the points $a_k$. Without loss of generality, we can assume all of the $a_k$ lie within the region enclosed by $\gamma$ (if not, we just make $D$ smaller). We can use the path-independence of contour integrals to deform $\gamma$, without changing the value of the integral:</p>
<div class="image-container">
<p><img src="/assets/residues-deform-1.svg" alt="A contour around several a_k" height="250px" /></p>
<p><img src="/assets/residues-deform-2.svg" alt="Deformed into several circles with sections between them" height="250px" /></p>
</div>
<p>These corridors between the circles can be moved so they lie on top of each other, and cancel out. This leaves us with circles $C_k$, one for each point $a_k$.
\[ \oint_\gamma f(z)~dz = \sum_k \oint_{C_k} f(z)~dz \]</p>
<div class="image-container">
<p><img src="/assets/residues-deform-3.svg" alt="A few circular contours" height="250px" /></p>
</div>
<p>So all we need to do now is determine what the integral of $f$ on each circle is.</p>
<div class="theorem-box">
<div class="theorem-title">Residue Definition #1</div>
The residue of $f$ at $a$ is $\displaystyle \frac{1}{2 \pi i} \oint_{C} f(z)~dz$, where $C$ is a small circle around $a$.
<br /><br />
From path-independence, we know we can shrink the circles as much as we like without changing the value of the integral, which tells us this definition is well-defined (just make sure $f$ is holomorphic everywhere else in your circle!).
</div>
<p>“But wait,” you complain, “This definition is ridiculous; you set it up in such a way that the residue theorem is trivial! What gives?”</p>
<p>Well, there are other, equivalent definitions of residue that are much easier to compute, and those are what give the residue theorem its power. Sometimes people will use these computational definitions of residue as the primary definition, but this obscures what’s going on. When you think of what the residue <em>means</em>, in a spiritual sense, you should think of it as “the integral of a small loop around a point”.</p>
<hr />
<p>A point at which $f$ is not holomorphic is called a “singularity”, and there are a few types. The most manageable of these is the pole, where $f(z)$ “behaves like” $\frac{1}{(z-a)^n}$. To be more concrete, $f$ has a pole (of order $n$) at $a$ if $(z - a)^n f(z)$ is holomorphic and non-zero at $a$. In other words, a zero of order $n$ cancels out a pole of order $n$.</p>
<p>For example, $\frac{1}{\sin z}$ has a pole of order $1$ at $z = 0$, as evidenced by the fact that $\frac{z}{\sin z}$ approaches $1$ as $z \to 0$. The rational function $\frac{z-2}{z^2 + 1}$ has poles at $\pm i$, also of order $1$. And the function $\frac{1}{\cos z - 1}$ has a pole of order $2$ at zero.</p>
<p>There are other kinds of singularities, but nothing good comes from them, so we will henceforth only consider singularities that are poles.</p>
<p>If $f$ has a pole of order $n$ at $a$, then $(z-a)^n f(z)$ has a Taylor series centered at $z = a$, with non-zero constant term:
\[ (z-a)^n f(z) = b_0 + b_1 (z - a) + b_2 (z - a)^2 + b_3 (z - a)^3 + \cdots \]</p>
<p>Letting $c_k = b_{k+n}$, we can define a series for $f(z)$ itself, called the <strong>Laurent series</strong>:
\[ f(z) = \frac{c_{-n}}{(z-a)^n} + \frac{c_{-n+1}}{(z - a)^{n-1}} + \cdots + \frac{c_{-1}}{z - a} + c_0 + c_1 (z - a) + \cdots \]</p>
<p>It’s almost a Taylor series, but we allow (finitely many) negative terms as well. This expansion will allow us to compute the residue at $a$.</p>
<p>Let’s just take a single term, $(z - a)^n$, and we’ll recombine our results at the end, because integrals are linear. What happens when we integrate around a circle centered at $a$ with radius $R$? Substitute $z = a + R e^{it}$ for the contour:
\[ \oint (z - a)^n~dz = \int_0^{2\pi} (R e^{it})^n~d(R e^{it}) = i R^{n+1} \int_0^{2\pi} e^{(n+1) it}~dt = i R^{n+1} \left[ \frac{e^{(n+1)it}}{(n+1)i} \right]^{2\pi}_0 \]</p>
<p>Since $n$ is an integer, $e^{(n+1)2 \pi i} = 1$, and $e^{0} = 1$, so this integral should be zero. But that doesn’t make any sense; that would suggest that the integral of <em>any</em> function around a circle is zero. But that’s not true.</p>
<p>We actually made a mistake in the last step; the antiderivative of $e^{kt}$ is $e^{kt} / k$ <em>unless</em> $k = 0$. For that to happen, we need $n = -1$, and in that case:
\[ \oint \frac{1}{z - a}~dz = \int_0^{2\pi} \frac{d(R e^{it})}{R e^{it}} = \int_0^{2\pi} i~dt = 2 \pi i \]</p>
<p>Therefore, when we integrate $f(z) = \sum_{k = -n}^\infty c_k (z - a)^k$, all the terms vanish, except for the $k = -1$ term, which pops out a $2 \pi i \cdot c_{-1}$. This gives us another definition for the residue!</p>
<div class="theorem-box">
<div class="theorem-title">Residue Definition #2</div>
If $f$ has a pole at $a$, and a Laurent series $f(z) = \sum c_k (z - a)^k$, then the residue of $f$ at $a$ is $c_{-1}$.
</div>
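<p>To see this definition in action, consider $f(z) = \frac{\sin z}{z^4}$ at $a = 0$. Dividing the Taylor series of $\sin z$ by $z^4$ gives
\[ \frac{\sin z}{z^4} = \frac{1}{z^3} - \frac{1}{6z} + \frac{z}{120} - \cdots \]
so the pole has order $3$, and reading off the coefficient of $\frac{1}{z}$ gives $\res(f, 0) = -\frac{1}{6}$.</p>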
<hr />
<p>If this were all we knew, it would still be a pretty good theorem. Finding power series instead of taking integrals? Not too shabby. But we can take it one step further.</p>
<p>Finding power series can be frustrating; how many people know the power series for $\tan z$ off the top of their head? Besides, we don’t need the whole thing, just a specific coefficient.</p>
<p>Instead, we’ll assume the existence of a power series, and use some tricks to extract $c_{-1}$.</p>
<p>Say we’ve got a simple pole (a pole of order $1$). By multiplying by $(z - a)$, we can get a Taylor series:
\[ (z - a) f(z) = c_{-1} + c_0 (z - a) + c_1 (z - a)^2 + \cdots \]</p>
<p>If we plug in $z = a$, then we’ll get $c_{-1}$. Well, technically, we can’t plug in $z = a$ directly, because $f(z)$ isn’t defined at $a$. But if we take a limit, that’s okay.</p>
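<p>For instance, $f(z) = \frac{z - 2}{z^2 + 1}$ has a simple pole at $z = i$. Factoring the denominator as $(z - i)(z + i)$, the limit is:
\[ \lim_{z \to i} (z - i) \frac{z - 2}{(z - i)(z + i)} = \frac{i - 2}{2i} = \frac{1}{2} + i \]</p>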
<p>How about a pole of order $2$? Our trick won’t work the same way; if we apply it naively, we’ll just get $c_{-2}$, which we don’t care about at all.
\[ (z - a)^2 f(z) = c_{-2} + c_{-1} (z - a) + c_0 (z - a)^2 + c_1 (z - a)^3 + \cdots \]</p>
<p>But if we take the derivative, we can knock off the constant term $c_{-2}$, and <em>then</em> we can take the limit as $z \to a$.
\[ \frac{d}{dz} (z - a)^2 f(z) = c_{-1} + 2 c_0 (z - a) + 3 c_1 (z - a)^2 + \cdots \]</p>
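<p>As a quick check of this trick, take $f(z) = \frac{1}{z^2 (z - 1)}$, which has a pole of order $2$ at $a = 0$. Then $z^2 f(z) = \frac{1}{z - 1}$, and
\[ \lim_{z \to 0} \frac{d}{dz} \frac{1}{z - 1} = \lim_{z \to 0} \frac{-1}{(z - 1)^2} = -1 \]
This agrees with the Laurent series $f(z) = -\frac{1}{z^2} - \frac{1}{z} - 1 - z - \cdots$, obtained by expanding $\frac{1}{z - 1} = -(1 + z + z^2 + \cdots)$ near $z = 0$.</p>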
<p>For $n = 3$, there’s a slight wrinkle; we end up with an extra factor of $2$ that we have to divide out:
\[ \frac{d^2}{dz^2} (z - a)^3 f(z) = 2 c_{-1} + 6 c_0 (z - a) + 12 c_1 (z - a)^2 + \cdots \]</p>
<p>The pattern for higher-order poles is similar:</p>
<ul>
<li>multiply by $(z - a)^n$; this changes our term of interest to $c_{-1} (z - a)^{n-1}$</li>
<li>take $n-1$ derivatives; the important term is now $(n-1)! c_{-1}$</li>
<li>divide by $(n-1)!$; the important term is now $c_{-1}$</li>
<li>take the limit as $z \to a$; all higher order terms vanish, and we are left with $c_{-1}$</li>
</ul>
<p>We now have our last, and most computationally accessible, definition of residue:</p>
<div class="theorem-box">
<div class="theorem-title">Residue Definition #3</div>
If $f$ has a pole at $a$ of order $n$, then the residue of $f$ at $a$ is:
\[ \res(f, a) = \lim_{z \to a} \frac{1}{(n-1)!} \frac{d^{n-1}}{dz^{n-1}} (z - a)^n f(z) \]
</div>
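<p>One way to gain confidence in this formula is to compare its output against the contour-integral picture: integrate $f$ numerically around a small circle about $a$ and divide by $2 \pi i$. The code below is a rough sketch under stated assumptions. The helper name <code>residue_by_contour</code>, the radius, and the sample count are all illustrative, and the circle is assumed to enclose no singularity other than $a$.</p>

```python
import cmath
import math


def residue_by_contour(f, a, radius=0.5, samples=4000):
    """Estimate Res(f, a) as (1 / (2*pi*i)) times the integral of f around |z - a| = radius.

    Assumes a is the only singularity of f on or inside the circle.
    """
    dt = 2 * math.pi / samples
    total = 0j
    for k in range(samples):
        w = radius * cmath.exp(1j * k * dt)  # w = z - a on the circle
        total += f(a + w) * 1j * w * dt      # f(z) dz, with dz = i w dt
    return total / (2j * math.pi)


# Pole of order 2 at 0: the limit formula gives lim d/dz (e^z) = 1.
print(residue_by_contour(lambda z: cmath.exp(z) / z**2, 0))

# Simple pole at i: the limit formula gives (i - 2) / (2i) = 1/2 + i.
print(residue_by_contour(lambda z: (z - 2) / (z**2 + 1), 1j))
```

<p>The numerical values should agree with the hand computations up to floating-point error, though the limit formula is usually far less work than the integral.</p>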
<p>This formula is often presented as “the” definition of the residue, but doing so hides where the residue theorem comes from, and why residues are defined the way they are.</p>
<h1 id="winding-number">Winding Number</h1>
<p>As a final note, we can add a tiny bit more generality to the theorem.</p>
<p>Technically, we’ve been a little sloppy with our curve $\gamma$. What if it goes the other way? Or loops around some points multiple times?</p>
<p>To fix this, we introduce $W(\gamma, a)$, the <strong>winding number</strong> of $\gamma$ around $a$. It means exactly what the name suggests: it indicates how many times (and in what direction) $\gamma$ loops around $a$. Counter-clockwise is positive, and clockwise is negative. Two examples are pictured below:</p>
<div class="image-container">
<p><img src="/assets/residues-winding-1.svg" alt="A limacon" height="250px" /></p>
<p><img src="/assets/residues-winding-3.svg" alt="A lemniscate" height="250px" /></p>
</div>
<p>In the first picture, the specified points have winding numbers +1 and +2, and in the second, they have -1 and +1. The only thing this changes about our proof is that when we deform our $\gamma$ into circles, we may get multiple loops around the same point:</p>
<div class="image-container">
<p><img src="/assets/residues-winding-2.svg" alt="A limacon" height="250px" /></p>
<p><img src="/assets/residues-winding-4.svg" alt="A lemniscate" height="250px" /></p>
</div>
<p>But by definition, the number of loops is exactly the winding number, and if the loop runs clockwise, we pick up a negative sign. So after accounting for multiplicity and direction, we get:
\[ \oint_\gamma f(z)~dz = 2 \pi i \sum_k W(\gamma, a_k) \res(f, a_k) \]</p>
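<p>Winding numbers can also be computed numerically. The sketch below is illustrative rather than anything the argument depends on: it samples a closed curve and adds up the small changes in the argument of $\gamma(t) - a$. The names <code>winding_number</code> and <code>limacon</code>, and the particular limaçon $r = 1 + 2 \cos t$, are assumptions chosen for the example and need not match the curves in the figures.</p>

```python
import cmath
import math


def winding_number(curve, a, samples=4000):
    """Count signed turns of a closed curve around a.

    `curve` maps t in [0, 2*pi] to a complex point and must not pass through a.
    The total change in arg(curve(t) - a) is accumulated in small steps.
    """
    total_angle = 0.0
    prev = curve(0.0) - a
    for k in range(1, samples + 1):
        cur = curve(2 * math.pi * k / samples) - a
        total_angle += cmath.phase(cur / prev)  # small signed angle step
        prev = cur
    return round(total_angle / (2 * math.pi))


def limacon(t):
    """Limacon with an inner loop, r = 1 + 2 cos t, traversed counter-clockwise."""
    return (1 + 2 * math.cos(t)) * cmath.exp(1j * t)


print(winding_number(limacon, 0.5))   # point inside the inner loop
print(winding_number(limacon, 2.0))   # point inside the outer loop only
print(winding_number(limacon, -1.0))  # point outside the curve
print(winding_number(lambda t: limacon(-t), 2.0))  # same curve, run clockwise
```

<p>For this curve the four calls give $2$, $1$, $0$ and $-1$ respectively: each loop around the point adds one turn, and reversing the direction flips the sign.</p>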