<h1>A Cooperative Hat Game</h1>
<p>Henry Swanson, Math Mondays, 2021-02-15</p>
<p><span class="mathdefs">
<span class="math">\(\newcommand{W}{\square}
\newcommand{B}{\blacksquare}\)</span>
</span></p>
<p>Hat puzzles are super popular among mathematicians. Most of them have cute and clever solutions. Here’s one that, at the time of writing, is still an open problem.
<!-- TODO consider 0-indexing the darn thing --></p>
<p>Alice and Bob sit facing each other, each with an infinite tower of hats on their heads. Each hat is either black or white, with equal probability. Alice can see all of Bob’s hats, but not her own, and vice versa. On the count of three, both players must name a natural number, which is used to index into their own hat tower. If the two hats match, then the players win, otherwise they lose. (Also, they’re not allowed to talk, cough, wink, or otherwise communicate.)</p>
<p>As an example, say Alice’s hats are <span class="math">\(\W\W\B\W\W\B\cdots\)</span> and Bob’s hats are <span class="math">\(\B\W\B\W\W\B\cdots\)</span>. If Alice says 3 and Bob says 1, then since Alice’s third hat and Bob’s first hat are both black, they win. If they both say 1, their first hats do not match, so they lose.
<!-- TODO diagram! --></p>
<p>What’s the best possible strategy, and how often does it win? No one knows! I have some conjectures here, and some (probably unoriginal) strategies that do pretty well.
<!-- more --></p>
<h1>Simplest Strategy</h1>
<p>The simplest strategy is for both players to ignore any information they have and just pick the first hat. Unsurprisingly, this doesn’t go very well. The outcomes <span class="math">\(\W/\W\)</span>, <span class="math">\(\W/\B\)</span>, <span class="math">\(\B/\W\)</span>, and <span class="math">\(\B/\B\)</span> are all equally likely, so the chance of winning is <span class="math">\(1/2\)</span>.</p>
<p>It’s not at all obvious that you can do any better than this. Since there’s no communication, neither player can learn anything about their own hats, and so both players are equally likely to pick a white hat or a black hat. How can you squeeze out any additional advantage?</p>
<h1>First-White Strategy</h1>
<p>Here’s a strategy that does better. Both players look for the first white hat on their partner’s head, and guess the corresponding number. For example, if Bob is wearing <span class="math">\(\B\B\W\B\W\W\cdots\)</span>, Alice would say “3”. If he’s wearing <span class="math">\(\W\W\W\B\W\B\cdots\)</span>, Alice would say “1”. Call Alice’s guess <span class="math">\(a\)</span> and Bob’s guess <span class="math">\(b\)</span>. What’s the probability of success?</p>
<ul>
<li>Case <span class="math">\(a = b\)</span>: they’re both pointing at white hats, so they win.</li>
<li>Case <span class="math">\(a < b\)</span>: Bob’s guess means that every one of Alice’s hats before <span class="math">\(b\)</span> was black, including the one at <span class="math">\(a\)</span>. Alice stopped looking at Bob’s hats at <span class="math">\(a\)</span>, so Bob’s <span class="math">\(b\)</span>th hat could be either color. They win with probability <span class="math">\(1/2\)</span>.</li>
<li>Case <span class="math">\(a > b\)</span>: Symmetric to the previous case.</li>
</ul>
<p>So if <span class="math">\(p\)</span> is the probability that <span class="math">\(a = b\)</span>, then the chance of success is <span class="math">\(p + (1 - p) / 2 = 1/2 + p/2\)</span>. Even before we know <span class="math">\(p\)</span>, we can already tell that we’re going to do better than <span class="math">\(1/2\)</span>!</p>
<p>To find <span class="math">\(p\)</span>, we sum up the probability that both players say “1”, the probability that they both say “2”, that they both say “3”, and so on. Note that the chance that Alice says “<span class="math">\(k\)</span>” is the chance that Bob’s <span class="math">\(k\)</span>th hat is white, <em>and</em> that none of the previous ones were. Likewise for Bob. Summing up the resulting geometric series, we get</p>
<div class="math">$$
p = \sum_{k = 1}^\infty \left[ \left(\frac{1}{2} \right) \left(\frac{1}{2} \right)^{k-1} \right]^2 = \sum_{k = 1}^\infty \frac{1}{4^k} = \frac{1/4}{1 - 1/4} = \frac{1}{3}
$$</div>
<p>So by following this strategy, Alice and Bob can win with probability <span class="math">\(2/3\)</span>. Much better!</p>
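<p>As a quick sanity check (my own, not part of the original argument), the <span class="math">\(2/3\)</span> figure is easy to reproduce with a short Monte Carlo simulation. The function name and the 64-hat truncation below are arbitrary choices; the truncation costs nothing measurable, since an all-black 64-hat prefix has probability <span class="math">\(2^{-64}\)</span>.</p>

```python
import random

def simulate_first_white(trials=200_000, depth=64, seed=0):
    """Monte Carlo estimate of the first-white strategy's win rate.
    True = white, False = black; `depth` truncates the infinite stacks."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        alice = [rng.random() < 0.5 for _ in range(depth)]
        bob = [rng.random() < 0.5 for _ in range(depth)]
        a = bob.index(True)    # Alice points at Bob's first white hat
        b = alice.index(True)  # Bob points at Alice's first white hat
        wins += alice[a] == bob[b]
    return wins / trials

print(simulate_first_white())  # close to 2/3
```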
<h1>Finite Strategies</h1>
<p>Here’s another approach: what if we focus only on the first <span class="math">\(N\)</span> hats, reducing it to a finite problem?</p>
<p>If <span class="math">\(N = 1\)</span> obviously there’s nothing interesting we can do, so let’s look at <span class="math">\(N = 2\)</span>. If Alice sees only black hats on Bob’s head, then she knows that strategizing is hopeless – Bob will pick a black hat for sure, and she’ll pick a black hat with probability only 50%. Same thing goes if she sees only white, and same thing from Bob’s point of view. So the only interesting cases are when both players have non-monochromatic hat stacks.</p>
<p>There are four possible situations: <span class="math">\(\W\B / \W\B\)</span>, <span class="math">\(\W\B / \B\W\)</span>, <span class="math">\(\B\W / \W\B\)</span>, and <span class="math">\(\B\W / \B\W\)</span>. We could brute-force all possible strategies (there are only four for each player, and half of those are constant strategies). But let’s think this one through. Let’s say, arbitrarily, that Alice guesses “1” if she sees <span class="math">\(\W\B\)</span>, and “2” if she sees <span class="math">\(\B\W\)</span>. If Bob sees <span class="math">\(\W\B\)</span> on Alice’s head, what should he do?</p>
<ul>
<li>If he has <span class="math">\(\W\B\)</span>, then Alice will pick “1”, selecting her white hat. Bob should select his white hat by saying “1”.</li>
<li>If he has <span class="math">\(\B\W\)</span>, then Alice will pick “2”, selecting her black hat. Bob should select his black hat by saying “1”.</li>
</ul>
<p>In both situations, saying “1” guarantees a win. Similarly, if he sees <span class="math">\(\B\W\)</span> on Alice’s head, he wins by saying “2”. So in the “neither player is monochrome” situation, they can win 100% of the time! For the monochrome cases, no strategy is possible, and so that’s just 50%. There are 4 non-monochrome cases and 12 monochrome ones, so that gives a win rate of <span class="math">\(10/16 = 62.5\%\)</span>.</p>
<!-- TODO diagram -->
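<p>The <span class="math">\(2\)</span>-hat game is small enough to brute-force outright, which confirms that <span class="math">\(10/16\)</span> is the best possible score. Here’s a sketch (mine, not from the post; hats are encoded 0 = black, 1 = white, and guesses are 0-based):</p>

```python
from itertools import product

# A strategy is a map from the 4 visible stacks to a guess in {0, 1},
# so there are 2^4 = 16 strategies per player and 256 pairs to try.
stacks = list(product([0, 1], repeat=2))

best = 0
for alice in product(range(2), repeat=4):
    for bob in product(range(2), repeat=4):
        # count winning hat configurations: each player indexes their OWN
        # stack using a guess computed from their PARTNER's stack
        wins = sum(
            a[alice[stacks.index(b)]] == b[bob[stacks.index(a)]]
            for a, b in product(stacks, repeat=2)
        )
        best = max(best, wins)

print(best, "/ 16")
```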
<p>How about <span class="math">\(N = 3\)</span>? We could just ignore the third hat, giving us a win rate of at least <span class="math">\(10/16\)</span>, but we can do better. Consider the following (asymmetric) strategy:</p>
<ul>
<li>If a player sees a monochromatic stack, they pick an arbitrary hat. Doesn’t matter.</li>
<li>If a player sees only one white hat, they pick the index corresponding to that hat.</li>
<li>If Alice sees one black hat, she picks the hat <em>after</em> that one (with wraparound, so <span class="math">\(\B\W\W \to 2\)</span>, <span class="math">\(\W\B\W \to 3\)</span>, <span class="math">\(\W\W\B \to 1\)</span>).</li>
<li>If Bob sees one black hat, he picks the hat <em>before</em> that one (again, with wraparound).</li>
</ul>
<p>How does this strategy do? Note that the strategy is unchanged by cyclic shifting of the hats, which reduces the amount of casework we have to do.</p>
<p>If either player has a monochromatic stack, then they win only 50% of the time, as usual.</p>
<p>If they both have a one-white-hat stack, then they have a guaranteed win.
</p>
<div class="math">$$
\begin{matrix}
\\
\textrm{Alice's hats} \\
\\
\textrm{Bob's hats}
\end{matrix}
\qquad
\begin{matrix}
\downarrow\hphantom{\B\B} \\
\W\B\B \\
\downarrow\hphantom{\B\B} \\
\W\B\B
\end{matrix}
\qquad
\begin{matrix}
\downarrow \\
\W\B\B \\
\downarrow\hphantom{\B\B} \\
\B\W\B
\end{matrix}
\qquad
\begin{matrix}
\hphantom{\B\B}\downarrow \\
\W\B\B \\
\downarrow\hphantom{\B\B} \\
\B\B\W
\end{matrix}
$$</div>
<p>If they both have one-black-hat stacks, then they also have a guaranteed win, though it’s less obvious why.
</p>
<div class="math">$$
\begin{matrix}
\\
\textrm{Alice's hats} \\
\\
\textrm{Bob's hats}
\end{matrix}
\qquad
\begin{matrix}
\downarrow \\
\B\W\W \\
\hphantom{\B\B}\downarrow \\
\B\W\W
\end{matrix}
\qquad
\begin{matrix}
\hphantom{\B\B}\downarrow \\
\B\W\W \\
\hphantom{\B\B}\downarrow \\
\W\B\W
\end{matrix}
\qquad
\begin{matrix}
\downarrow\hphantom{\B\B} \\
\B\W\W \\
\hphantom{\B\B}\downarrow \\
\W\W\B
\end{matrix}
$$</div>
<p>The only remaining case is when one player has a one-white stack, and the other has a one-black stack. We can’t win every matchup here, but we can get a solid <span class="math">\(4/6\)</span>. (Note: what happens if you change the one-black strategy to “pick the black hat”?).
</p>
<div class="math">$$
\begin{matrix}
\\
\textrm{Alice's hats} \\
\\
\textrm{Bob's hats}
\end{matrix}
\qquad
\begin{matrix}
\downarrow \\
\W\B\B \\
\downarrow \hphantom{\B\B} \\
\B\W\W
\end{matrix}
\qquad
\begin{matrix}
\hphantom{\B\B} \downarrow \\
\W\B\B \\
\downarrow \hphantom{\B\B} \\
\W\B\W
\end{matrix}
\qquad
\begin{matrix}
\downarrow \hphantom{\B\B} \\
\W\B\B \\
\downarrow \hphantom{\B\B} \\
\W\W\B
\end{matrix}
$$</div>
<div class="math">$$
\begin{matrix}
\\
\textrm{Alice's hats} \\
\\
\textrm{Bob's hats}
\end{matrix}
\qquad
\begin{matrix}
\downarrow \hphantom{\B\B} \\
\B\W\W \\
\hphantom{\B\B} \downarrow \\
\W\B\B \\
\end{matrix}
\qquad
\begin{matrix}
\downarrow \\
\B\W\W \\
\hphantom{\B\B} \downarrow \\
\B\W\B \\
\end{matrix}
\qquad
\begin{matrix}
\hphantom{\B\B} \downarrow \\
\B\W\W \\
\hphantom{\B\B} \downarrow \\
\B\B\W \\
\end{matrix}
$$</div>
<p>This totals up to a winning probability of <span class="math">\(44/64 = 68.75\%\)</span>. Better than <span class="math">\(N = 2\)</span>, and also better than our “first-white” strategy!</p>
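<p>We can double-check that count by exhaustively scoring the strategy above. This sketch is my own; hats are encoded 0 = black, 1 = white, and indices are 0-based, so “pick the hat after” becomes <code>(j + 1) % 3</code>:</p>

```python
from itertools import product

def guess(seen, after):
    """The asymmetric 3-hat strategy: point at a lone white hat; for a
    lone black hat, point just after it (Alice) or just before it (Bob)."""
    whites = [i for i, h in enumerate(seen) if h == 1]
    if len(whites) in (0, 3):   # monochromatic: the choice is irrelevant
        return 0
    if len(whites) == 1:        # exactly one white hat: pick it
        return whites[0]
    j = seen.index(0)           # exactly one black hat
    return (j + 1) % 3 if after else (j - 1) % 3

wins = sum(
    a[guess(b, after=True)] == b[guess(a, after=False)]
    for a, b in product(product([0, 1], repeat=3), repeat=2)
)
print(wins, "/ 64")
```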
<p>The casework becomes worse and worse for <span class="math">\(N \ge 4\)</span>, so we’ll stop here for now.</p>
<h1>Stronger Together</h1>
<p>We’ve seen two kinds of strategies so far: first-white, and finite strategies. These can be combined, in a pretty simple way, into a strategy better than either of them alone!</p>
<p>Given an <span class="math">\(N\)</span>-hat strategy, the augmented strategy goes as follows:</p>
<ul>
<li>Each player looks at the first <span class="math">\(N\)</span> hats on their partner’s head.</li>
<li>If they’re not monochromatic, then apply the finite strategy as usual.</li>
<li>Otherwise, skip those <span class="math">\(N\)</span> hats, and look at hats <span class="math">\(N+1\)</span> to <span class="math">\(2N\)</span>.</li>
<li>If those are non-monochromatic, apply the finite strategy, but increase all your answers by <span class="math">\(N\)</span>.</li>
<li>Otherwise, look at the next block of <span class="math">\(N\)</span> hats, and repeat.</li>
</ul>
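<p>The scanning procedure can be sketched in a few lines of Python (my own illustration, not from the original; the example strategy table and 0-based indexing are assumptions):</p>

```python
def augmented_guess(seen, finite_strategy, n):
    """Scan the partner's visible hats in blocks of n; answer using the
    finite strategy on the first non-monochromatic block, shifted by the
    number of skipped blocks. `seen` is a finite prefix of the partner's
    stack (1 = white, 0 = black); all indices here are 0-based."""
    for k in range(len(seen) // n):
        block = tuple(seen[k * n : (k + 1) * n])
        if len(set(block)) > 1:              # non-monochromatic block found
            return k * n + finite_strategy[block]
    return 0  # every visible block was monochromatic (vanishingly rare)

# example: the 2-hat strategy from earlier (WB -> 1, BW -> 2), written
# 0-based; monochromatic blocks are skipped, so they need no entry
strat2 = {(1, 0): 0, (0, 1): 1}
print(augmented_guess([0, 0, 1, 0, 1, 1], strat2, 2))  # skips the BB block -> 2
```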
<p>The finite strategies perform worst when facing a monochromatic block of hats. By using the “scan upwards and focus on the first non-monochromatic block” trick, we can sometimes salvage situations where the finite strategy would have to accept the 50-50 guess.</p>
<p>Say that the <span class="math">\(N\)</span>-hat strategy has win rate <span class="math">\(q\)</span>. We’d first like to find <span class="math">\(q^\ast\)</span>, the conditional win rate for scenarios where neither player has a monochromatic stack. Let <span class="math">\(W\)</span> be the event “we win”, and <span class="math">\(E\)</span> be the event “neither player has a monochromatic stack”. The number of situations where Alice has a non-monochromatic stack is <span class="math">\(2^N - 2\)</span>, and same for Bob. So the probability of <span class="math">\(E\)</span> is <span class="math">\((2^N - 2)^2/4^N\)</span>. Thus,</p>
<div class="math">$$
\begin{align*}
Pr(W) &= Pr(W | E) Pr(E) + Pr(W | \lnot E) Pr(\lnot E) \\
q &= q^\ast \frac{(2^N - 2)^2}{4^N} + \frac{1}{2} \left( 1 - \frac{(2^N - 2)^2}{4^N} \right) \\
q &= q^\ast \frac{(2^N - 2)^2}{4^N} + \frac{2^{N+1} - 2}{4^N} \\
q \frac{4^N}{(2^N - 2)^2} &= q^\ast + \frac{2^{N+1} - 2}{(2^N - 2)^2} \\
\frac{4^N q - 2^{N+1} + 2}{(2^N - 2)^2} &= q^\ast \\
\end{align*}
$$</div>
<p>Next, we want to find <span class="math">\(r\)</span>, the probability that both players will select the same block of <span class="math">\(N\)</span> hats. The chance an individual block is monochromatic is <span class="math">\(2/2^N\)</span>, and so the chance that Alice (or Bob) picks the <span class="math">\(k\)</span>th block is “probability the <span class="math">\(k\)</span>th block is non-monochromatic” times “probability the first <span class="math">\(k-1\)</span> were monochromatic”. This is quite similar to the setup we had for the original first-white strategy.</p>
<div class="math">$$
\begin{align*}
r &= \sum_{k=1}^\infty \left( \frac{2^N - 2}{2^N} \cdot \left( \frac{2}{2^N} \right)^{k-1} \right)^2 \\
&= \frac{(2^N - 2)^2}{4^N} \sum_{k=1}^\infty \left( \frac{4}{4^N} \right)^{k-1} \\
&= \frac{(2^N - 2)^2}{4^N} \frac{1}{1 - 4/4^N} \\
&= \frac{(2^N - 2)^2}{4^N - 4} \\
&= \frac{2^N - 2}{2^N + 2}
\end{align*}
$$</div>
<p>So now we can find <span class="math">\(q'\)</span>, the win rate of the augmented strategy. If they pick the same block, then they win with probability <span class="math">\(q^\ast\)</span> (remember that these blocks are necessarily non-monochromatic). If they don’t, then someone is picking into a monochromatic block, and so we’re fated to get only <span class="math">\(1/2\)</span> success.</p>
<div class="math">$$
\begin{align*}
q' &= r q^\ast + (1 - r) \frac{1}{2} \\
&= \frac{2^N - 2}{2^N + 2} \frac{4^N q - 2^{N+1} + 2}{(2^N - 2)^2} + \frac{4}{2^N + 2} \frac{1}{2} \\
&= \frac{4^N q - 2^{N+1} + 2}{4^N - 4} + \frac{2(2^N - 2)}{4^N - 4} \\
&= \frac{4^N q - 2}{4^N - 4}
\end{align*}
$$</div>
<p>Since <span class="math">\(q \ge 1/2\)</span>, we have <span class="math">\(q' \ge q\)</span>, and when the first inequality is strict, so is the second. So, perhaps unsurprisingly, augmenting a finite strategy makes it work better. How much better? Let’s take our <span class="math">\(N = 3\)</span> strategy:</p>
<div class="math">$$ \frac{4^3 (44/64) - 2}{4^3 - 4} = \frac{42}{60} = \frac{7}{10} $$</div>
<p>We’ve nudged our 68.75% chance of winning to a 70% chance. That’s small, but it’s not nothing. Unfortunately, it’s as far as we can go – this is conjectured to be an optimal strategy. No one’s found or ruled out anything better yet.</p>
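<p>The arithmetic here is easy to get wrong by hand, so here’s the augmentation formula as exact rational arithmetic (a sketch of mine using Python’s <code>fractions</code>). As a bonus, augmenting the <span class="math">\(2\)</span>-hat strategy gives exactly <span class="math">\(2/3\)</span>, tying the first-white strategy.</p>

```python
from fractions import Fraction

def augmented(q, n):
    """Win rate after augmenting an n-hat strategy with win rate q:
    q' = (4^n q - 2) / (4^n - 4)."""
    f = Fraction(4) ** n
    return (f * q - 2) / (f - 4)

print(augmented(Fraction(10, 16), 2))  # 2/3: the augmented 2-hat strategy
print(augmented(Fraction(44, 64), 3))  # 7/10: the augmented 3-hat strategy
```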
<h1>Observations</h1>
<p>Now that we’ve seen some strategies, we can look for some patterns.</p>
<p>In the simplest strategy, we’re equally likely to get any pair of hats. With the “first-white” strategy, what are the odds of each outcome? The only way to get <span class="math">\(\W/\W\)</span> is for both players to guess the same index, which happens with probability <span class="math">\(1/3\)</span>. The other <span class="math">\(2/3\)</span> of the time, Alice guesses the higher number half the time, and Bob the other half. In the former case, Bob’s hat is guaranteed black, and Alice’s hat is random. In the latter case, it’s the other way around. So that adds up to <span class="math">\(\B/\B\)</span> with probability <span class="math">\(1/3\)</span>, <span class="math">\(\W/\B\)</span> with probability <span class="math">\(1/6\)</span>, and <span class="math">\(\B/\W\)</span> with <span class="math">\(1/6\)</span>.</p>
<p>Similarly, if you work through the strategies given in the “Finite Strategies” section, the probability of <span class="math">\(\W/\W\)</span> and <span class="math">\(\B/\B\)</span> outcomes are equal, as are <span class="math">\(\B/\W\)</span> and <span class="math">\(\W/\B\)</span> outcomes. This is no coincidence.</p>
<p>Since Alice is equally likely to pick a white or black hat (remember, she never learns anything about her own hat stack), <span class="math">\(Pr(\W/\W) + Pr(\W/\B)\)</span> has to equal <span class="math">\(Pr(\B/\B) + Pr(\B/\W)\)</span>. Similarly, Bob has to be equally likely to pick white or black, meaning <span class="math">\(Pr(\W/\W) + Pr(\B/\W)\)</span> equals <span class="math">\(Pr(\W/\B) + Pr(\B/\B)\)</span>. Subtracting one equation from the other gives <span class="math">\(Pr(\B/\W) = Pr(\W/\B)\)</span>, and substituting back in gives <span class="math">\(Pr(\W/\W) = Pr(\B/\B)\)</span> as well.</p>
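<p>These outcome frequencies are easy to spot-check by simulation; the sketch below is my own, with a 64-hat truncation standing in for the infinite stacks:</p>

```python
import random
from collections import Counter

def outcome_tally(trials=100_000, depth=64, seed=1):
    """Tally (Alice's hat, Bob's hat) outcomes under the first-white
    strategy; expect WW and BB near 1/3, WB and BW near 1/6."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        alice = [rng.random() < 0.5 for _ in range(depth)]
        bob = [rng.random() < 0.5 for _ in range(depth)]
        a, b = bob.index(True), alice.index(True)
        tally["WB"[not alice[a]] + "WB"[not bob[b]]] += 1
    return {k: v / trials for k, v in tally.items()}

print(outcome_tally())
```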
<p>This tells us something interesting – changing the win condition to “both players pick white hats” doesn’t change the nature of the game at all. Maximizing the probability of matching pairs is the same as maximizing the number of white pairs (and in fact, this is how the problem is usually presented.)</p>
<hr>
<p>Another thing we can look at is the relationship between the finite and infinite game. Let <span class="math">\(p_\infty\)</span> denote the best possible winning probability for the infinite game, and <span class="math">\(p_N\)</span> for the game with just <span class="math">\(N\)</span> hats. How are these related to each other?</p>
<p>Since an <span class="math">\(N\)</span>-hat strategy works just as well for a <span class="math">\((N+1)\)</span>-hat game (by just ignoring the last hat), we know that <span class="math">\(p_{N+1}\)</span> is at least <span class="math">\(p_N\)</span>. Similarly, <span class="math">\(p_\infty \ge p_N\)</span> for all <span class="math">\(N\)</span>. This gives us a chain of inequalities:</p>
<div class="math">$$ p_1 \le p_2 \le p_3 \le \cdots \le p_\infty $$</div>
<p>Also, from augmenting a strategy, we know that <span class="math">\(p_\infty \ge \frac{4^N p_N - 2}{4^N - 4}\)</span> for all <span class="math">\(N\)</span>.</p>
<h1>Upper Bounds, Infinite</h1>
<p>Well, we know that <span class="math">\(p_\infty\)</span> is at least <span class="math">\(0.7\)</span>; can we put an upper bound on it too?</p>
<p>Let’s say Alice and Bob have already decided on a strategy, one that has win rate <span class="math">\(p\)</span>. Now, imagine that, right before the game starts, we split the game into two parallel games: in one game, things proceed as normal, and in the other game, all of Alice’s hats are swapped with their opposites. Every black hat becomes a white hat, and vice versa. We’ll refer to these players as “Alice” and “nega-Alice”. Let <span class="math">\(X\)</span> be the random variable “how many games are won” (so it is either <span class="math">\(0\)</span>, <span class="math">\(1\)</span>, or <span class="math">\(2\)</span>).</p>
<p>Then clearly the expected value of <span class="math">\(X\)</span> is just <span class="math">\(p + p\)</span> – each game has probability <span class="math">\(p\)</span> of being won, and expected value is linear. But we can also bound it in an interesting way. Let <span class="math">\(S\)</span> be the event “Bob picks the same color hat in both games”. In that situation, exactly one of the two games is won: both Alices see the same hats on Bob, so they say the same number, but their stacks have opposite colors at that index, and only one of those colors matches Bob’s hat. If we let <span class="math">\(q\)</span> denote the probability of <span class="math">\(S\)</span> under the chosen strategy:</p>
<div class="math">$$ E[X] = Pr(S) E[X|S] + Pr(\lnot S) E[X | \lnot S] \le q + (1 - q) 2 = 2 - q $$</div>
<p>Rearranging, we get that <span class="math">\(p \le 1 - q/2\)</span>. What do we know about <span class="math">\(q\)</span>? If Bob picks the same index in both games, then he’s guaranteed to pick the same color hat too, and if he doesn’t, then the hats are uncorrelated, and there’s a 50-50 chance he picks the same hat. So this means <span class="math">\(q \ge 1/2\)</span>, and thus <span class="math">\(p \le 3/4\)</span>.</p>
<p>So we know that the optimal <span class="math">\(p_\infty\)</span> is between <span class="math">\(0.7\)</span> and <span class="math">\(0.75\)</span>. This is the best I’ve been able to prove, but apparently, there is a proof that <span class="math">\(p_\infty < \frac{81}{112} \approx 0.723\)</span>, as mentioned in <a href="https://arxiv.org/abs/1407.4711">this paper</a>. Doesn’t seem to be published though, unfortunately.</p>
<h1>Upper Bounds, Finite</h1>
<p>Let’s, for the moment, assume that <span class="math">\(p_\infty\)</span> is indeed <span class="math">\(7/10\)</span>, and try to put some upper bounds on <span class="math">\(p_N\)</span>.</p>
<p>In this section, it’ll be easier to work with “number of winning outcomes” than “probability of winning”. For a strategy on <span class="math">\(N\)</span> hats, we’ll call the number of winning outcomes its “score”, which is <span class="math">\(4^N\)</span> times the win rate. We’ll denote the optimal score for an <span class="math">\(N\)</span>-hat strategy by <span class="math">\(s_N\)</span>, which is of course equal to <span class="math">\(4^N p_N\)</span>.</p>
<p>We’ll start with the inequality we learned about from augmenting finite strategies: <span class="math">\(p_\infty \ge \frac{4^N p_N - 2}{4^N - 4}\)</span>. Rearranging it, we get that <span class="math">\(s_N = 4^N p_N \le \frac{7}{10} (4^N - 4) + 2\)</span>. Let <span class="math">\(B_N\)</span> be the floor of the RHS, so that <span class="math">\(s_N \le B_N\)</span>. Later, we’ll show that these bounds are sharp, and so <span class="math">\(s_N\)</span> actually equals <span class="math">\(B_N\)</span>, but for now it’s easier to call them different names.</p>
<p>Computing some values of <span class="math">\(B_N\)</span>, we can see a pattern forming:</p>
<!-- TODO fix table margins and top border -->
<table>
<thead>
<tr>
<th><span class="math">\(N\)</span></th>
<th><span class="math">\(1\)</span></th>
<th><span class="math">\(2\)</span></th>
<th><span class="math">\(3\)</span></th>
<th><span class="math">\(4\)</span></th>
<th><span class="math">\(5\)</span></th>
<th><span class="math">\(6\)</span></th>
<th><span class="math">\(7\)</span></th>
<th><span class="math">\(8\)</span></th>
<th><span class="math">\(9\)</span></th>
<th><span class="math">\(10\)</span></th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="math">\(B_N\)</span></td>
<td><span class="math">\(2\)</span></td>
<td><span class="math">\(10\)</span></td>
<td><span class="math">\(44\)</span></td>
<td><span class="math">\(178\)</span></td>
<td><span class="math">\(716\)</span></td>
<td><span class="math">\(2866\)</span></td>
<td><span class="math">\(11468\)</span></td>
<td><span class="math">\(45874\)</span></td>
<td><span class="math">\(183500\)</span></td>
<td><span class="math">\(734002\)</span></td>
</tr>
</tbody>
</table>
<p>They seem to follow an almost-geometric recurrence relation:</p>
<ul>
<li><span class="math">\(B_1 = 2\)</span></li>
<li>for even <span class="math">\(N\)</span>, <span class="math">\(B_N = 4 B_{N-1} + 2\)</span></li>
<li>for odd <span class="math">\(N\)</span>, <span class="math">\(B_N = 4 B_{N-1} + 4\)</span></li>
</ul>
<p>Proof: Let <span class="math">\(e_N\)</span> be the amount removed by flooring, i.e., <span class="math">\(\left( \frac{7}{10} (4^N - 4) + 2 \right) - B_N\)</span>. We’d like to find <span class="math">\(e_N\)</span>, since it will make our lives easier.</p>
<p>For odd <span class="math">\(N\)</span>, this is easy: <span class="math">\(4^N - 4\)</span> is divisible by <span class="math">\(10\)</span>, so the flooring is unnecessary, which makes <span class="math">\(e_N = 0\)</span>.</p>
<p>For even <span class="math">\(N\)</span>, <span class="math">\(4^N - 4\)</span> is <span class="math">\(2\)</span> mod <span class="math">\(10\)</span>, and so <span class="math">\(\frac{7}{10} (4^N - 4)\)</span> is of the form “integer <span class="math">\(+ \frac{7 \cdot 2}{10}\)</span>”. This makes <span class="math">\(e_N = 2/5\)</span>.</p>
<p>Now, we can find the difference between <span class="math">\(B_N\)</span> and <span class="math">\(4 B_{N-1}\)</span>:
</p>
<div class="math">$$
\begin{align*}
B_N - 4 B_{N-1} &= \left( \frac{7}{10} (4^N - 4) + 2 - e_N \right) - 4 \left( \frac{7}{10} (4^{N-1} - 4) + 2 - e_{N-1} \right) \\
&= \left( \frac{7}{10} (4^N - 4) + 2 - e_N \right) + \left( \frac{7}{10} (16 - 4^N) - 8 + 4 e_{N-1} \right) \\
&= \frac{7}{10} (16 - 4) - 6 + 4 e_{N-1} - e_N \\
&= \frac{12}{5} + 4 e_{N-1} - e_N
\end{align*}
$$</div>
<p>For odd <span class="math">\(N\)</span>, this is <span class="math">\(\frac{12}{5} + \frac{8}{5} - 0 = 4\)</span>. For even <span class="math">\(N\)</span>, this is <span class="math">\(\frac{12}{5} + 0 - \frac{2}{5} = 2\)</span>. Check.</p>
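<p>Both the floor formula and the recurrence are easy to check numerically; here’s a throwaway sketch of mine:</p>

```python
def B(n):
    """Floor of (7/10)(4^n - 4) + 2, the conjectured optimal score."""
    return 7 * (4 ** n - 4) // 10 + 2

print([B(n) for n in range(1, 11)])
# [2, 10, 44, 178, 716, 2866, 11468, 45874, 183500, 734002]

# the almost-geometric recurrence: +2 at even steps, +4 at odd ones
for n in range(2, 30):
    assert B(n) == 4 * B(n - 1) + (2 if n % 2 == 0 else 4)
```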
<p>Now, we don’t know for sure that these <span class="math">\(B_N\)</span> are upper bounds on our score. That proof relied on <span class="math">\(p_\infty\)</span> actually being <span class="math">\(7/10\)</span>. But when I took a computer and searched for good strategies, I found lots of strategies that achieve <span class="math">\(B_N\)</span>, and none that surpass it. That’s pretty suggestive that this conjecture is right.</p>
<p>But computer-generated strategies don’t give good intuition, and my program starts to struggle at about <span class="math">\(N = 11\)</span>. Can we come up with a way to construct strategies that hit <span class="math">\(B_N\)</span>?</p>
<h1>Finite Strategies, Part II</h1>
<p>We’ll start with the following <span class="math">\(3\)</span>-hat strategy, and build it up into <span class="math">\(4\)</span>-hat and <span class="math">\(5\)</span>-hat strategies. (I’ve picked a symmetric one, for ease of presentation). It has score <span class="math">\(44\)</span>:
<!-- TODO can i show a grid giving the score? --></p>
<table>
<thead>
<tr>
<th>Hats</th>
<th><span class="math">\(\B\B\B\)</span></th>
<th><span class="math">\(\W\B\B\)</span></th>
<th><span class="math">\(\B\W\B\)</span></th>
<th><span class="math">\(\W\W\B\)</span></th>
<th><span class="math">\(\B\B\W\)</span></th>
<th><span class="math">\(\W\B\W\)</span></th>
<th><span class="math">\(\B\W\W\)</span></th>
<th><span class="math">\(\W\W\W\)</span></th>
</tr>
</thead>
<tbody>
<tr>
<td>Choice</td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(2\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(3\)</span></td>
<td><span class="math">\(2\)</span></td>
<td><span class="math">\(3\)</span></td>
<td><span class="math">\(1\)</span></td>
</tr>
</tbody>
</table>
<p>It’s easy to extend to a <span class="math">\(4\)</span>-hat strategy, by just ignoring the last hat and applying the original strategy. But obviously this doesn’t improve the probability of winning, and it just increases the score to <span class="math">\(4 \cdot 44 = 176\)</span>, which is a little less than <span class="math">\(B_4 = 178\)</span>. Somehow we need to squeeze out an additional two points.</p>
<p>The key observation is that when we designed the <span class="math">\(3\)</span>-hat strategy, it didn’t matter what our decision was when seeing <span class="math">\(\B\B\B\)</span> or <span class="math">\(\W\W\W\)</span>. When you see your partner with a monochromatic stack of hats, you know that your choice doesn’t matter. But when we extended this to a <span class="math">\(4\)</span>-hat strategy, those decisions were copied over to <span class="math">\(\B\B\B\W\)</span> and <span class="math">\(\W\W\W\B\)</span>, where now they might matter! (They still won’t matter for <span class="math">\(\B\B\B\B\)</span> and <span class="math">\(\W\W\W\W\)</span> of course.)</p>
<p>Let’s just focus on the case where both Alice and Bob have one of these “almost monochromatic” stacks. Right now, they’ll both say “1”, and will only win when their stacks are identical. If they change their strategy so that <span class="math">\(\B\B\B\W \to 4\)</span>, then they’ll win all four possible matchups.
</p>
<div class="math">$$
\begin{matrix}
\\
\textrm{Alice's hats} \\
\\
\textrm{Bob's hats}
\end{matrix}
\qquad
\begin{matrix}
\hphantom{\B\B\B} \downarrow \\
\B\B\B\W \\
\hphantom{\B\B\B} \downarrow \\
\B\B\B\W \\
\end{matrix}
\qquad
\begin{matrix}
\downarrow \hphantom{\B\B\B} \\
\B\B\B\W \\
\hphantom{\B\B\B} \downarrow \\
\W\W\W\B \\
\end{matrix}
\qquad
\begin{matrix}
\hphantom{\B\B\B} \downarrow \\
\W\W\W\B \\
\downarrow \hphantom{\B\B\B} \\
\B\B\B\W \\
\end{matrix}
\qquad
\begin{matrix}
\downarrow \hphantom{\B\B\B} \\
\W\W\W\B \\
\downarrow \hphantom{\B\B\B} \\
\W\W\W\B \\
\end{matrix}
$$</div>
<p>That could be our extra two points we need. We just need to confirm that this tweak didn’t have a negative effect elsewhere.</p>
<p>If the matchup doesn’t involve <span class="math">\(\B\B\B\W\)</span>, then obviously the result is unaffected. So all we have to look at are matchups of the form “<span class="math">\(\B\B\B\W\)</span> vs ‘anything other than <span class="math">\(\B\B\B\W\)</span> and <span class="math">\(\W\W\W\B\)</span>’”. Before our tweak, we won exactly half of these matchups. Afterwards, the first player will answer “1”, “2”, or “3”, and the second player will answer “4”. The first player is guaranteed to pick a black hat, and since the second player is equally likely to pick a white or black hat, we still win exactly half of our matchups. So we have a score of <span class="math">\(178\)</span>, as desired!</p>
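<p>The whole construction fits in a few lines of code that reproduce the counts above (my own sketch; 0 = black, 1 = white, guesses are 0-based, and <code>base</code> transcribes the symmetric <span class="math">\(3\)</span>-hat table from before):</p>

```python
from itertools import product

# the symmetric 3-hat strategy table, as {visible stack: 0-based guess}
base = {
    (0, 0, 0): 0, (1, 0, 0): 1, (0, 1, 0): 0, (1, 1, 0): 0,
    (0, 0, 1): 2, (1, 0, 1): 1, (0, 1, 1): 2, (1, 1, 1): 0,
}

def score(strategy, n):
    """Count winning configurations when both players use `strategy`."""
    return sum(
        a[strategy[b]] == b[strategy[a]]
        for a, b in product(product([0, 1], repeat=n), repeat=2)
    )

# extend to 4 hats by ignoring the last hat...
ext = {h: base[h[:3]] for h in product([0, 1], repeat=4)}
print(score(ext, 4), "/ 256")  # 4 * 44 = 176

# ...then tweak the almost-monochromatic stack: seeing BBBW, answer "4"
ext[(0, 0, 0, 1)] = 3
print(score(ext, 4), "/ 256")  # 178, matching B_4
```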
<hr>
<p>How about <span class="math">\(N = 5\)</span>? We could try the same approach – extend and tweak the <span class="math">\(\B\B\B\B\W\)</span> state – but that would only get us to <span class="math">\(4 \cdot 178 + 2 = 714\)</span>, which is still two points away from our target of <span class="math">\(B_5 = 716\)</span>.</p>
<p>The key is to think about why tweaking <span class="math">\(\B\B\B\W\)</span> was a strict improvement on the old strategy. It didn’t affect the outcome against most other hat configurations. You can reframe our <span class="math">\(4\)</span>-hat strategy as similar to our augmented “first-white” strategy:</p>
<ul>
<li>Split the <span class="math">\(4\)</span> hats you see into a block of <span class="math">\(3\)</span> and a block of <span class="math">\(1\)</span>.</li>
<li>If the first block is not monochromatic, apply the <span class="math">\(3\)</span>-hat strategy.</li>
<li>If it is, apply the following strategy:</li>
</ul>
<table>
<thead>
<tr>
<th>Hats</th>
<th><span class="math">\((\B\B\B)\B\)</span></th>
<th><span class="math">\((\W\W\W)\B\)</span></th>
<th><span class="math">\((\B\B\B)\W\)</span></th>
<th><span class="math">\((\W\W\W)\W\)</span></th>
</tr>
</thead>
<tbody>
<tr>
<td>Choice</td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(4\)</span></td>
<td><span class="math">\(1\)</span></td>
</tr>
</tbody>
</table>
<p>That table should look familiar; it’s essentially our <span class="math">\(2\)</span>-hat strategy from earlier on, but using the monochromatic block as a single hat!</p>
<p>This provides an interesting way to build strategies. If we have an <span class="math">\(N\)</span>-hat strategy <span class="math">\(S\)</span>, and an <span class="math">\(M\)</span>-hat strategy <span class="math">\(T\)</span>, then we can combine them into an <span class="math">\((N+M-1)\)</span>-hat strategy that has a potentially better score.</p>
<p>Let <span class="math">\(p\)</span> be the win rate of <span class="math">\(S\)</span>, and <span class="math">\(q\)</span> the win rate of <span class="math">\(T\)</span>. Then we can find the win rate of this new strategy.</p>
<ul>
<li>If both players have a non-monochromatic first block, then the conditional win rate here is <span class="math">\(p^\ast\)</span>, which we know how to compute.</li>
<li>If both players have a monochromatic first block, then the conditional win rate is just <span class="math">\(q\)</span>.</li>
<li>If only one player has a monochromatic first block, then I claim they can only win half the time.<ul>
<li>Say Alice has the monochromatic first block, and Bob doesn’t. Then Alice will only ever answer a number between <span class="math">\(1\)</span> and <span class="math">\(N\)</span>.</li>
<li>Imagine flipping all of Bob’s hats; since Alice will still pick from her first block, it won’t change the color of the hat she picks. But it does flip the color of Bob’s choice.</li>
<li>This pairs every win with a loss, and vice versa, so they must be equal in number.</li>
</ul>
</li>
</ul>
<p>This means the total win rate of this strategy is:
</p>
<div class="math">$$
\begin{align*}
p_{new} &= p^\ast \frac{(2^N - 2)^2}{4^N} + \frac{1}{2} \frac{2 \cdot 2 \cdot (2^N - 2)}{4^N} + q \frac{2^2}{4^N} \\
&= \frac{4^N p - 2^{N+1} + 2}{4^N} + \frac{2^{N+1} - 4}{4^N} + \frac{4q}{4^N} \\
&= \frac{4^N p + 4q - 2}{4^N} \\
&= p + \frac{4q - 2}{4^N}
\end{align*}
$$</div>
<p>Interestingly enough, this doesn’t depend on the particular strategy chosen, only its win rate. Converting this into a score-based equation, where <span class="math">\(s\)</span> is the score of <span class="math">\(S\)</span>, and <span class="math">\(t\)</span> the score of <span class="math">\(T\)</span>, we get:
</p>
<div class="math">$$ s_{new} = 4^{M-1} s + (t - 4^M / 2) $$</div>
<p>That last term can be interpreted as “score above halfway”. I don’t know if that’s meaningful, but it’s crisp.</p>
<p>Let’s try to make a good <span class="math">\(5\)</span>-hat strategy with this. We know that combining a <span class="math">\(4\)</span> and <span class="math">\(2\)</span> hat strategy doesn’t work (we get a score of <span class="math">\(4 \cdot 178 + (10 - 8) = 714\)</span>). How about <span class="math">\(3\)</span> and <span class="math">\(3\)</span>? We’d get <span class="math">\(16 \cdot 44 + (44 - 32) = 716\)</span>. That works!</p>
<p>For completeness’s sake, let’s check out <span class="math">\((2, 4)\)</span>. The score would be <span class="math">\(64 \cdot 10 + (178 - 128) = 690\)</span>. Not great, which kind of makes sense. Front-loading the <span class="math">\(2\)</span>-hat strategy, which is worse than the <span class="math">\(4\)</span>-hat strategy, is a bad idea.</p>
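<p>The score-combination formula is small enough to write down in code. Here’s a quick sketch (the function name is mine), which reproduces the combination scores computed above:</p>

```python
def combine(s, t, m):
    """Score of the combined strategy: an n-hat strategy with score s,
    extended by an m-hat strategy with score t, giving an (n+m-1)-hat
    strategy.  Notably, the result depends on n only through s."""
    return 4 ** (m - 1) * s + (t - 4 ** m // 2)
```

<p>For example, <code>combine(178, 10, 2)</code> gives 714, and <code>combine(44, 44, 3)</code> gives 716, matching the calculations above.</p>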
<hr>
<p>Using this idea, we can construct strategies with scores of <span class="math">\(B_N\)</span> for all <span class="math">\(N\)</span>.</p>
<ul>
<li>For <span class="math">\(N = 1, 2, 3\)</span> we have explicit examples.</li>
<li>For even <span class="math">\(N\)</span>, extend an <span class="math">\((N-1)\)</span>-hat strategy by the optimal <span class="math">\(2\)</span>-hat strategy. This has a score of <span class="math">\(4 B_{N-1} + (10 - 8) = B_N\)</span>.</li>
<li>For odd <span class="math">\(N\)</span>, extend an <span class="math">\((N-2)\)</span>-hat strategy by the optimal <span class="math">\(3\)</span>-hat strategy. This has a score of <span class="math">\(16 B_{N-2} + (44 - 32) = 16 B_{N-2} + 12\)</span>. This is <span class="math">\(4 B_{N-1} + 4 = B_N\)</span>.</li>
</ul>
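<p>This construction translates directly into a recurrence. Here’s a sketch in code (function name mine), built from the base cases and the two extension steps above:</p>

```python
def B(n):
    """Score of the n-hat strategy built by the construction above."""
    if n <= 3:
        return {1: 2, 2: 10, 3: 44}[n]  # the explicit small cases
    if n % 2 == 0:
        # even: extend an (n-1)-hat strategy by the optimal 2-hat strategy
        return 4 * B(n - 1) + (10 - 8)
    # odd: extend an (n-2)-hat strategy by the optimal 3-hat strategy
    return 16 * B(n - 2) + (44 - 32)
```

<p>The first few values are 2, 10, 44, 178, 716, matching the scores worked out earlier.</p>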
<h1>Final Thoughts</h1>
<p>Okay, we’ve defined a series <span class="math">\(B_N\)</span>, and shown we can construct strategies for <span class="math">\(N\)</span>-hat games with a score of <span class="math">\(B_N\)</span>.</p>
<p>If <span class="math">\(p_\infty = 7/10\)</span>, then we know that <span class="math">\(B_N\)</span> is an upper bound on our possible scores, which makes the strategies described above optimal. And conversely, if these finite strategies are optimal, then we can prove <span class="math">\(p_\infty = 7/10\)</span>.</p>
<!-- TODO figure this out lol -->
<p>I don’t quite have a proof figured out, because there’s some measurability criterion I’m missing, but the gist of it is: it should be the case that an infinite strategy can be approximated arbitrarily well by an <span class="math">\(N\)</span>-hat strategy, as long as we allow <span class="math">\(N\)</span> to be large. If <span class="math">\(p_\infty\)</span> were larger than <span class="math">\(7/10\)</span>, we’d be able to find a finite strategy with success rate higher than <span class="math">\(7/10\)</span>. But <span class="math">\(B_N / 4^N\)</span> is always less than <span class="math">\(7/10\)</span>:</p>
<div class="math">$$ B_N = \left \lfloor \frac{7}{10} (4^N - 4) + 2 \right \rfloor = \left \lfloor \frac{7}{10} 4^N - \frac{8}{10} \right \rfloor $$</div>
<p>So proving one of these two claims is sufficient for proving the other. Unfortunately, I can’t prove either one of them.</p>
<p>I’ve verified via computer search that the finite strategies described above are optimal up to <span class="math">\(N = 8\)</span>. That’s reassuring, but certainly not a proof of the general claim.</p>
<hr>
<p>Interestingly enough, it doesn’t seem to matter if we restrict ourselves to symmetric strategies; we find equally successful strategies even with that limitation.</p>
<hr>
<p>One possible way to prove an upper bound is to show some kind of relation between a given <span class="math">\(N\)</span>-hat strategy, and an <span class="math">\((N-1)\)</span>-hat strategy derived from it. The difficult part here is that unless you remove a hat from both players at once, you end up in a situation where players have different numbers of hats, which I really don’t want to think about.</p>
<p>But I do want to throw a computer at it!</p>
<p>I wrote up a “relaxation” algorithm that starts with a random strategy for Alice, computes Bob’s best response to it, then Alice’s best response to that, and so on, until we hit a fixed point. Repeating this over and over again gave the following table of scores:</p>
<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(2\)</span></td>
<td><span class="math">\(3\)</span></td>
<td><span class="math">\(4\)</span></td>
<td><span class="math">\(5\)</span></td>
<td><span class="math">\(6\)</span></td>
<td><span class="math">\(7\)</span></td>
<td><span class="math">\(8\)</span></td>
<td><span class="math">\(9\)</span></td>
<td><span class="math">\(10\)</span></td>
</tr>
<tr>
<td><span class="math">\(1\)</span></td>
<td></td>
<td><span class="math">\(2\)</span></td>
<td><span class="math">\(4\)</span></td>
<td><span class="math">\(8\)</span></td>
<td><span class="math">\(16\)</span></td>
<td><span class="math">\(32\)</span></td>
<td><span class="math">\(64\)</span></td>
<td><span class="math">\(128\)</span></td>
<td><span class="math">\(256\)</span></td>
<td><span class="math">\(512\)</span></td>
<td><span class="math">\(1024\)</span></td>
</tr>
<tr>
<td><span class="math">\(2\)</span></td>
<td></td>
<td><span class="math">\(4\)</span></td>
<td><span class="math">\(10\)</span></td>
<td><span class="math">\(20\)</span></td>
<td><span class="math">\(40\)</span></td>
<td><span class="math">\(80\)</span></td>
<td><span class="math">\(160\)</span></td>
<td><span class="math">\(320\)</span></td>
<td><span class="math">\(640\)</span></td>
<td><span class="math">\(1280\)</span></td>
<td><span class="math">\(2560\)</span></td>
</tr>
<tr>
<td><span class="math">\(3\)</span></td>
<td></td>
<td><span class="math">\(8\)</span></td>
<td><span class="math">\(20\)</span></td>
<td><span class="math">\(44\)</span></td>
<td><span class="math">\(88\)</span></td>
<td><span class="math">\(176\)</span></td>
<td><span class="math">\(352\)</span></td>
<td><span class="math">\(704\)</span></td>
<td><span class="math">\(1408\)</span></td>
<td><span class="math">\(2816\)</span></td>
<td><span class="math">\(5632\)</span></td>
</tr>
<tr>
<td><span class="math">\(4\)</span></td>
<td></td>
<td><span class="math">\(16\)</span></td>
<td><span class="math">\(40\)</span></td>
<td><span class="math">\(88\)</span></td>
<td><span class="math">\(178\)</span></td>
<td><span class="math">\(356\)</span></td>
<td><span class="math">\(712\)</span></td>
<td><span class="math">\(1424\)</span></td>
<td><span class="math">\(2848\)</span></td>
<td><span class="math">\(5696\)</span></td>
<td><span class="math">\(11392\)</span></td>
</tr>
<tr>
<td><span class="math">\(5\)</span></td>
<td></td>
<td><span class="math">\(32\)</span></td>
<td><span class="math">\(80\)</span></td>
<td><span class="math">\(176\)</span></td>
<td><span class="math">\(356\)</span></td>
<td><span class="math">\(716\)</span></td>
<td><span class="math">\(1432\)</span></td>
<td><span class="math">\(2864\)</span></td>
<td><span class="math">\(5728\)</span></td>
<td><span class="math">\(11456\)</span></td>
<td><span class="math">\(22912\)</span></td>
</tr>
<tr>
<td><span class="math">\(6\)</span></td>
<td></td>
<td><span class="math">\(64\)</span></td>
<td><span class="math">\(160\)</span></td>
<td><span class="math">\(352\)</span></td>
<td><span class="math">\(712\)</span></td>
<td><span class="math">\(1432\)</span></td>
<td><span class="math">\(2866\)</span></td>
<td><span class="math">\(5732\)</span></td>
<td><span class="math">\(11464\)</span></td>
<td><span class="math">\(22928\)</span></td>
<td><span class="math">\(45856\)</span></td>
</tr>
<tr>
<td><span class="math">\(7\)</span></td>
<td></td>
<td><span class="math">\(128\)</span></td>
<td><span class="math">\(320\)</span></td>
<td><span class="math">\(704\)</span></td>
<td><span class="math">\(1424\)</span></td>
<td><span class="math">\(2864\)</span></td>
<td><span class="math">\(5732\)</span></td>
<td><span class="math">\(11468\)</span></td>
<td><span class="math">\(22936\)</span></td>
<td><span class="math">\(45872\)</span></td>
<td><span class="math">\(91744\)</span></td>
</tr>
<tr>
<td><span class="math">\(8\)</span></td>
<td></td>
<td><span class="math">\(256\)</span></td>
<td><span class="math">\(640\)</span></td>
<td><span class="math">\(1408\)</span></td>
<td><span class="math">\(2848\)</span></td>
<td><span class="math">\(5728\)</span></td>
<td><span class="math">\(11464\)</span></td>
<td><span class="math">\(22936\)</span></td>
<td><span class="math">\(45874\)</span></td>
<td><span class="math">\(91748\)</span></td>
<td><span class="math">\(183496\)</span></td>
</tr>
<tr>
<td><span class="math">\(9\)</span></td>
<td></td>
<td><span class="math">\(512\)</span></td>
<td><span class="math">\(1280\)</span></td>
<td><span class="math">\(2816\)</span></td>
<td><span class="math">\(5696\)</span></td>
<td><span class="math">\(11456\)</span></td>
<td><span class="math">\(22928\)</span></td>
<td><span class="math">\(45872\)</span></td>
<td><span class="math">\(91748\)</span></td>
<td><span class="math">\(183500\)</span></td>
<td><span class="math">\(367000\)</span></td>
</tr>
<tr>
<td><span class="math">\(10\)</span></td>
<td></td>
<td><span class="math">\(1024\)</span></td>
<td><span class="math">\(2560\)</span></td>
<td><span class="math">\(5632\)</span></td>
<td><span class="math">\(11392\)</span></td>
<td><span class="math">\(22912\)</span></td>
<td><span class="math">\(45856\)</span></td>
<td><span class="math">\(91744\)</span></td>
<td><span class="math">\(183496\)</span></td>
<td><span class="math">\(367000\)</span></td>
<td><span class="math">\(734002\)</span></td>
</tr>
</tbody>
</table>
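<p>For the curious, the relaxation loop can be sketched in a few lines of Python. This is my own reconstruction, not the exact code that generated the table; it uses 0-based hat indices, and only handles the case where both players have the same number of hats.</p>

```python
from itertools import product
import random

N = 2  # hats per player; raise this (slowly!) for bigger games
CONFIGS = list(product([0, 1], repeat=N))  # all 2^N visible hat stacks

def score(f, g):
    """Winning matchups, out of 4^N: Alice answers f[b] after seeing Bob's
    stack b, and Bob answers g[a] after seeing Alice's stack a."""
    return sum(a[f[b]] == b[g[a]] for a in CONFIGS for b in CONFIGS)

def best_response(f):
    """The optimal reply to strategy f.  The win condition is symmetric in
    the two roles, so the same function serves for either player."""
    return {
        a: max(range(N), key=lambda j: sum(a[f[b]] == b[j] for b in CONFIGS))
        for a in CONFIGS
    }

def relax(f):
    """Alternate best responses until the score stops improving."""
    g = best_response(f)
    while True:
        f2 = best_response(g)
        if score(f2, g) <= score(f, g):
            return f, g
        f, g = f2, best_response(f2)

random.seed(0)
best = 0
for _ in range(50):
    start = {b: random.randrange(N) for b in CONFIGS}
    best = max(best, score(*relax(start)))
print(best)
```

<p>Each best response can only increase the score, so the loop terminates; the random restarts guard (imperfectly) against getting stuck in bad local optima.</p>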
<p>It seems to follow a… pattern? Not a nice pattern, but a pattern. Say you have <span class="math">\(a_{m,n}\)</span>, where <span class="math">\(m > n\)</span>. Then:</p>
<ul>
<li>To step “away from the diagonal”, i.e., to <span class="math">\(a_{m+1,n}\)</span>, you just double the score.</li>
<li>To step “toward the diagonal”, i.e., to <span class="math">\(a_{m, n+1}\)</span>, you double and add <span class="math">\(2^k\)</span>, where <span class="math">\(k\)</span> is <span class="math">\(m - 1 - 2 \lfloor (n-1) / 2 \rfloor\)</span>.<ul>
<li>In other words, <span class="math">\(k\)</span> goes <span class="math">\(m-1, m-1, m-3, m-3, m-5, m-5, \ldots\)</span>, until it ends in either <span class="math">\(2, 2\)</span> or <span class="math">\(3, 3, 1\)</span>, at which point we arrive at the diagonal itself.</li>
</ul>
</li>
</ul>
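<p>One way to encode the stepping rule is as a recurrence starting from the first column, whose entries are <span class="math">\(2^m\)</span>. This is my own encoding, with <span class="math">\(k\)</span> measured from the column being stepped from; the point is that it reproduces the whole table:</p>

```python
def a(m, n):
    """Entry (m, n) of the score table, reconstructed from the pattern:
    column 1 holds 2^m, and each step toward the diagonal doubles and
    adds 2^k, with k running m-1, m-1, m-3, m-3, ..."""
    if m < n:
        m, n = n, m  # the table is symmetric
    if n == 1:
        return 2 ** m
    k = m - 1 - 2 * ((n - 2) // 2)  # the step from column n-1 to column n
    return 2 * a(m, n - 1) + 2 ** k
```
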
<p>No idea if that’s helpful, or can be cleaned up into anything nice.</p>
<!-- Clean up python code and link to it? -->
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>The Dehn Invariant, or, Tangrams In Space2020-03-30T00:00:00-07:002020-03-30T00:00:00-07:00Henry Swansontag:mathmondays.com,2020-03-30:/dehn-invariant<p><span class="mathdefs">
<span class="math">\(\newcommand{\ZZ}{\Bbb Z}
\newcommand{\QQ}{\Bbb Q}
\newcommand{\RR}{\Bbb R}\)</span>
</span></p>
<p>Fans of wooden children’s toys may remember <a href="https://en.wikipedia.org/wiki/Tangram">tangrams</a>, a puzzle composed of 7 flat pieces that can be rearranged into numerous different configurations.</p>
<p><img alt="Tangrams in square and cat configurations" src="/images/dehn/tangrams.svg"></p>
<p>As mathematicians, we’re interested in shapes that are slightly simpler than cats or houses.
<!-- more -->
For example, we might try to design a set of tangrams that can be rearranged into an equilateral triangle. One possibility is shown below.</p>
<p><img alt="Equidecomposition of square and triangle" src="/images/dehn/square-to-triangle.svg"></p>
<p>How about a pentagon?</p>
<p><img alt="Equidecomposition of square and pentagon" src="/images/dehn/square-to-pentagon.svg"></p>
<p>We don’t have to start with a square, how about a set that can become a star or a triangle?</p>
<p><img alt="Equidecomposition of six-pointed star and triangle" src="/images/dehn/star-to-triangle.svg"></p>
<p>What pairs of polygons can we design tangram sets for? One way to reframe this problem is in terms of <em>scissors-congruence</em>, which is pretty much what it sounds like. Two polygons are “scissors-congruent” if we can take the first polygon, make a finite number of straight-line cuts to it, and rearrange the pieces into the second polygon. Clearly, two polygons are scissors-congruent if and only if we can design a set of tangrams that connect the two.</p>
<hr>
<p>Given two polygons, how can we tell if they’re scissors-congruent? One thing we can do is check their areas, since, if they have different areas, there’s no way they can be scissors-congruent. It turns out that this is the <em>only</em> obstacle – if two polygons have the same area, they <em>must</em> be scissors-congruent! This surprising result is known as the Wallace–Bolyai–Gerwien theorem, and was proven in the 1830s. We’ll walk through a proof.</p>
<p>It suffices to show that any polygon of area <span class="math">\(A\)</span> is scissors-congruent to an <span class="math">\(A \times 1\)</span> rectangle. This is because, if <span class="math">\(P_1\)</span> and <span class="math">\(P_2\)</span> are scissors-congruent to some third shape <span class="math">\(Q\)</span>, then we can rearrange <span class="math">\(P_1\)</span> into <span class="math">\(P_2\)</span> by going through <span class="math">\(Q\)</span> as an intermediate step. We start by breaking our polygon into triangles:</p>
<p><img alt="Triangulation of a polygon" src="/images/dehn/wbg-1.svg"></p>
<p>Next, we’ll transform each triangle into a rectangle, by cutting it halfway up its height, and folding down the apex:</p>
<p><img alt="Cutting a triangle into a rectangle" src="/images/dehn/wbg-2.svg"></p>
<p>Now we need to change the dimensions of this rectangle, but this step requires some creativity. We need the height of the rectangle to be between <span class="math">\(1\)</span> and <span class="math">\(2\)</span>. If it isn’t, we can repeatedly cut it in half until it is. (If the height is less than <span class="math">\(1\)</span>, then we run this process in reverse to double it instead.)</p>
<p><img alt="Repeatedly halving a rectangle" src="/images/dehn/wbg-3.svg"></p>
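<p>The halving step is mechanical enough to write down directly. Here’s a sketch (names mine), tracking the rectangle’s dimensions; note that each operation preserves area:</p>

```python
def normalize_height(width, height):
    """Halve (or, running in reverse, double) a rectangle's height until
    it lies in [1, 2), keeping the area fixed throughout."""
    while height >= 2:
        width, height = width * 2, height / 2  # cut in half, lay side by side
    while height < 1:
        width, height = width / 2, height * 2  # the reverse maneuver
    return width, height
```
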
<p>Then, we do a sliding maneuver to convert this rectangle into one with height <span class="math">\(1\)</span>. Notice that we need <span class="math">\(u < 1\)</span>, or else <span class="math">\(u \ell\)</span> would be greater than <span class="math">\(\ell\)</span>, and we couldn’t draw this diagram.</p>
<p><img alt="Minor width adjustment of a rectangle" src="/images/dehn/wbg-4.svg"></p>
<p>After doing this to all the triangles, the final step is to glue all these rectangles together, end-to-end, to get the desired <span class="math">\(A \times 1\)</span> rectangle.</p>
<hr>
<p>The natural question to ask next is: can we generalize this? What about 3D shapes? Are any two polyhedra of equal volume also scissors-congruent?</p>
<p>This is the third of <a href="https://en.wikipedia.org/wiki/Hilbert%27s_problems">Hilbert’s twenty-three problems</a>, and his student, Max Dehn, proved in 1903 that, unlike in two dimensions, the answer is “no”. He did so by constructing a quantity (now known as the “Dehn invariant”) that stays unchanged under scissors-congruence. Two shapes with different Dehn invariants, therefore, cannot be scissors-congruent. For example, a cube and a tetrahedron of equal volume are not scissors-congruent.</p>
<p>Unlike area and volume, the Dehn invariant isn’t as simple as a real number, and we’ll need to do a bit of legwork to define it. The key observation to make is that a cut can only do one of three things to an edge:</p>
<ul>
<li>miss it completely</li>
<li>cut it at a point</li>
<li>split it along its entire length</li>
</ul>
<p>By looking at what these operations do to edges, we can cobble together a quantity that stays invariant. The properties of an edge that we care about are its length and its dihedral angle<sup id="ref1"><a href="#fn1">[1]</a></sup>.</p>
<p>In the first situation, the edge stays unchanged. That one’s easy.</p>
<p>In the second situation, one edge is turned into two edges. The new edges have the same dihedral angle as the original, and their lengths sum to the original length.</p>
<p><img alt="Cutting an edge transversely" src="/images/dehn/edge-cut-transverse.svg"></p>
<p>In the third situation, we again get two edges, but this time, the length stays the same, and the dihedral angle changes.</p>
<p><img alt="Cutting an edge along its length" src="/images/dehn/edge-cut-lengthwise.svg"></p>
<p>Lastly, cuts also create new edges as they slice through a face. We’d like these new edges to count for nothing, i.e., to contribute zero.</p>
<p>Now that we know what cuts do to edges, how do we use this to define an invariant? If an edge is represented by the ordered pair <span class="math">\((\ell_i, \theta_i)\)</span>, we want to enforce the following equivalence relations:
</p>
<div class="math">$$ (\ell_1 + \ell_2, \theta) \cong (\ell_1, \theta) + (\ell_2, \theta) \qquad (\ell, \theta_1 + \theta_2) \cong (\ell, \theta_1) + (\ell, \theta_2) $$</div>
<p>These two rules imply some further relations. Consider the sum of <span class="math">\(n\)</span> copies of <span class="math">\((\ell, \theta)\)</span>. Applying the first rule repeatedly gives <span class="math">\((n \ell, \theta)\)</span>, and the second rule gives <span class="math">\((\ell, n \theta)\)</span>. This can be extended to negative <span class="math">\(n\)</span> as well, so for any integer <span class="math">\(n\)</span>,
</p>
<div class="math">$$ n (\ell, \theta) = (n \ell, \theta) = (\ell, n \theta) $$</div>
<p>If you’re familiar with tensors, you might notice that these are exactly the conditions for a tensor product! If not, don’t worry; you can still think of these as ordered pairs, but we’ll use the symbol <span class="math">\(\otimes\)</span> instead of a comma. It may make more sense when we go through the examples.</p>
<p>We still have to deal with the new edges created from cuts in the faces, but these almost resolve themselves. The edges we create come in pairs with supplementary angles. So if the edge pair we create has length <span class="math">\(\ell\)</span>, we get <span class="math">\((\ell, \theta) + (\ell, \pi - \theta) = (\ell, \pi)\)</span>. Using the third rule above, we can drag a <span class="math">\(2\)</span> from the left to the right, giving us <span class="math">\((\ell/2, 2\pi)\)</span>. If we declare that <span class="math">\(2\pi\)</span> is equivalent to <span class="math">\(0\)</span> (a reasonable demand, given that we’re working with angles), then these edge pairs automatically cancel each other out, as desired.</p>
<p>We can now define the Dehn invariant: it takes values in <span class="math">\(\RR \otimes_\ZZ \RR/2 \pi\)</span> (lengths and angles), and it’s equal to the sum of <span class="math">\(\ell_i \otimes \theta_i\)</span> over all the edges. Is something that concise truly unchanged by scissors-congruence?</p>
<p>When we make a cut, either it misses an existing edge, and so the corresponding term in the sum does not change, or it intersects it, in which case that term is replaced by two terms that sum to the original. It also creates new edges, by cutting into the faces. But as we saw earlier, these edges come in pairs that sum to zero, and so the total value of the invariant remains unchanged.</p>
<hr>
<p>Armed with this invariant, we can now answer the question: are the cube and the tetrahedron scissors-congruent? Let’s say both have volume 1. The cube has 12 edges, each with dihedral angle <span class="math">\(\pi / 2\)</span>. To get the volume to be <span class="math">\(1\)</span>, we need edges of length <span class="math">\(1\)</span>, so the Dehn invariant of this cube is:
</p>
<div class="math">$$ 12 (1 \otimes \frac{\pi}{2}) = 3 (1 \otimes 2 \pi) = (3 \otimes 2 \pi) = 0 $$</div>
<p>A regular tetrahedron has 6 edges, each with dihedral angle <span class="math">\(\arccos(1/3)\)</span>. The volume of a regular tetrahedron with side length <span class="math">\(a\)</span> is <span class="math">\(a^3 / (6 \sqrt 2)\)</span>, so the side length of our tetrahedron needs to be <span class="math">\(a = (72)^{1/6}\)</span>, making the Dehn invariant equal to:
</p>
<div class="math">$$ 6 (a \otimes \arccos(1/3)) = 6 a \otimes \arccos(1/3) $$</div>
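<p>A quick numerical check of that side length (just arithmetic, not part of the invariance argument):</p>

```python
import math

a = 72 ** (1 / 6)                      # proposed side length
volume = a ** 3 / (6 * math.sqrt(2))   # V = a^3 / (6 sqrt 2); should be 1
dihedral = math.acos(1 / 3)            # about 70.53 degrees, not a "nice" angle
```
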
<p>With some knowledge of modules, one can show that this is non-zero<sup id="ref2"><a href="#fn2">[2]</a></sup>, but the crux of the idea is that <span class="math">\(\arccos(1/3)\)</span> is not a rational multiple of <span class="math">\(\pi\)</span>, so we can never get the right hand side of this tensor to collapse to zero. This shows that no matter how many pieces you cut it into, a cube can never be reassembled into a tetrahedron.</p>
<p>One interesting consequence of this: in geometry class, you probably saw some cut-and-paste constructions for proving the area of a parallelogram, or a triangle. This result shows there can never be such a proof for pyramids – calculus is unavoidable!</p>
<hr>
<p>A final note: we’ve shown that there are at least two obstructions to scissors-congruence in 3D: volume and the Dehn invariant. Are they the only ones? The answer is yes! In other words, if two polyhedra do have the same volume and Dehn invariant, then they are indeed scissors-congruent. The proof of that is much harder, and a good presentation can be found <a href="http://www.math.brown.edu/~res/MathNotes/jessen.pdf">here</a>.</p>
<ol>
<li>
<p><a id="fn1" href="#ref1">↑</a> The dihedral angle of an edge is the angle between the two faces adjacent to it. You can think of it as a measure of the ‘sharpness’ of an edge; a 90° edge is like the edge of a countertop, but a 15° edge will cut like a knife.</p>
</li>
<li>
<p><a id="fn2" href="#ref2">↑</a> First, note that for any rational <span class="math">\(p/q\)</span>, we have <span class="math">\(\ell \otimes \frac{p}{q} \pi = \frac{\ell}{2q} \otimes 2 p \pi = 0\)</span>. This means that <span class="math">\(\RR \otimes_\ZZ \RR/2\pi \cong \RR \otimes_\ZZ \RR/(2\pi\QQ)\)</span>. Since both of those modules are divisible, this is equal to <span class="math">\(\RR \otimes_\QQ \RR/(2 \pi \QQ)\)</span>, which, being a tensor product of <span class="math">\(\QQ\)</span>-vector spaces, is a <span class="math">\(\QQ\)</span>-vector space itself. In particular, if <span class="math">\(\ell \ne 0\)</span> and <span class="math">\(\theta \notin 2 \pi \QQ\)</span>, then <span class="math">\(\ell \otimes \theta\)</span> is a non-zero vector.</p>
</li>
</ol>
The Mathematical Hydra2019-09-29T00:00:00-07:002019-09-29T00:00:00-07:00Henry Swansontag:mathmondays.com,2019-09-29:/hydra<p>Imagine you’re tasked with killing a hydra. As usual, the hydra is defeated when all of its heads are cut off, and whenever a head is cut off, the hydra grows new ones.</p>
<p>However, this mathematical hydra is much more frightening than a “traditional” one. It’s got a tree-like structure – heads growing out of its heads – and it can regrow entire groups of heads at once! Can you still win?</p>
<p>Also, this post is the first one with interactivity! Feel free to report bugs on the <a href="https://github.com/HenrySwanson/HenrySwanson.github.io/issues">GitHub issues page</a>.</p>
<!-- more -->
<hr>
<p>For the purposes of our game, a hydra is a rooted tree. The root, on the left, is the body, and the leaves are the heads. Intermediate nodes are part of the necks of the hydra, and cannot (yet) be cut off.</p>
<p align="center">
<img src="/images/hydra/anatomy.svg" width="70%" height="auto" alt="Anatomy of a hydra">
</p>
<p>You can cut off one head at a time, and when you do, the hydra may grow more heads, according to the following rules:</p>
<ul>
<li>If the head is connected directly to the root, then the hydra does nothing.</li>
<li>Otherwise, look at the parent node (the one directly underneath the one you just cut off). The hydra grows two new copies of that node <em>and all its children</em>, attaching them to the grandparent as appropriate.</li>
</ul>
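<p>These rules can be sketched as surgery on nested lists — a minimal model of my own, not the post’s interactive code: a node is the list of its children, so a head is the empty list <code>[]</code>.</p>

```python
import copy

# A hydra is a nested list: each node is the list of its children,
# so a head is [] and the outermost list is the body (root).

def chop(hydra, path):
    """Cut off the head at `path` (a list of child indices from the root).

    If the head's parent is not the root, the parent (with the head
    already removed) is copied twice and attached to the grandparent.
    """
    head = hydra
    for i in path:
        head = head[i]
    assert head == [], "can only cut off heads (leaves)"

    if len(path) == 1:          # head attached directly to the body
        del hydra[path[0]]
        return hydra

    grandparent = hydra
    for i in path[:-2]:
        grandparent = grandparent[i]
    parent = grandparent[path[-2]]
    del parent[path[-1]]        # remove the head from its parent...
    # ...then the hydra regrows two copies of the parent subtree
    grandparent.extend(copy.deepcopy(parent) for _ in range(2))
    return hydra

# Example: a body, one neck, two heads (4 nodes); chop one deep head.
h = [[[], []]]
chop(h, [0, 0])
print(h)   # three shallower heads now, 7 nodes total
```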
<hr>
<p>This is hard to convey through text, so let’s walk through an example. Let’s start with a pretty simple hydra, and cut off one of the heads. (Purple indicates newly-grown heads.)</p>
<p><img alt="First step of killing the hydra" height="auto" src="/images/hydra/example-1.svg" width="100%"></p>
<p>We used to have two heads, and four nodes total, but now we have three, and seven nodes. That’s not good. Let’s try chopping off another one.</p>
<p><img alt="Second step of killing the hydra" height="auto" src="/images/hydra/example-2.svg" width="100%"></p>
<p>This increases the total number of heads, but now, we can cut off the three smallest heads, one at a time, without incident.</p>
<p><img alt="Third step of killing the hydra" height="auto" src="/images/hydra/example-3.svg" width="100%"></p>
<p>We’ve made some visible progress now. Cutting off one of the remaining heads will reveal three more, but we can extinguish them easily.</p>
<p><img alt="Fourth step of killing the hydra" height="auto" src="/images/hydra/example-4.svg" width="100%"></p>
<p>Repeating this process on the last head will kill the hydra.</p>
<p><img alt="Fifth step of killing the hydra" height="auto" src="/images/hydra/example-5.svg" width="100%"></p>
<hr>
<p>We managed to defeat this hydra, but it was a pretty small one. What about something a bit larger? Let’s add one more head to that neck.</p>
<p>This time, you can try to kill it yourself: the illustration below is interactive!</p>
<p><button id="reset-button" type="button">Reset</button>
<span id="click-counter" style="float:right;"></span>
<div id="hydra-interactive" style="border-style: solid;border-width: 3px;border-radius: 5px;background-color: #fff"></div></p>
<hr>
<p>Depending on how persistent you are, you might not be surprised to learn that you can indeed kill this hydra, though it’ll take tens of thousands of moves to do so (29528 moves by my count). In fact, you can kill any hydra, though I’ll make no guarantees about how long it will take.</p>
<p>But what may be surprising is that you can’t avoid killing the hydra, even if you try. No matter how large the hydra, or what order you cut off its heads, you will always defeat it in a finite number of moves.</p>
<p>And even better, this holds true even for faster-regenerating hydras. What if, instead of growing back two copies of the subtree, the hydra grows back three copies? Or a hundred? What if, on the <span class="math">\(N\)</span>th turn of the game, it grows back <span class="math">\(N\)</span> copies? <span class="math">\(N^2\)</span>? <span class="math">\(N!\)</span>? What if the hydra just gets to pick how many copies to regrow, as many as it wants?</p>
<p>It doesn’t matter.</p>
<p>You always win.</p>
<hr>
<p>The proof here relies on <a href="https://en.wikipedia.org/wiki/Ordinal_number">ordinal numbers</a>. If you’re not familiar, there’s a good <a href="https://www.youtube.com/watch?v=SrU9YDoXE88">video from Vsauce</a> about them. The key property to know is that the ordinals are “well-ordered”; that is, there is no infinitely long descending sequence.<sup id="ref1"><a href="#fn1">[1]</a></sup></p>
<p>We assign an ordinal number to each hydra, in such a way that cutting off a head produces a hydra with a strictly smaller ordinal. As we play the hydra game, the sequence of hydras we encounter produces a corresponding sequence of ordinals. Since the ordinal sequence is strictly decreasing, it must eventually terminate, and so the hydra sequence must terminate as well. The only way that the hydra sequence can terminate is if we have no more heads to cut off; i.e., we’ve defeated the hydra.</p>
<p>The assignment is done by assigning values to the nodes, and accumulating down to the root:</p>
<ul>
<li>A head is assigned <span class="math">\(0\)</span>. Similarly, a trivial (dead) hydra is assigned <span class="math">\(0\)</span>.</li>
<li>If a node has children with ordinals <span class="math">\(\alpha_1, \alpha_2, \ldots, \alpha_n\)</span>, then we assign the ordinal <span class="math">\(\omega^{\alpha_1} + \omega^{\alpha_2} + \cdots + \omega^{\alpha_n}\)</span><sup id="ref2"><a href="#fn2">[2]</a></sup>.</li>
</ul>
<p>What happens when we cut off a head?</p>
<ul>
<li>If it’s directly attached to the body, then it contributes a term of <span class="math">\(\omega^0 = 1\)</span> to the whole ordinal. Killing this head removes this term, decreasing the ordinal.</li>
<li>Otherwise, consider the ordinal of that head’s parent and grandparent. Before we cut off the head, the ordinal of the parent must have been of the form <span class="math">\(\alpha + 1\)</span>. This means the ordinal of the grandparent has a term <span class="math">\(\omega^{\alpha + 1}\)</span>. When we cut off the head, the parent ordinal decreases to <span class="math">\(\alpha\)</span>, but there are now two more copies of it. This replaces the <span class="math">\(\omega^{\alpha + 1}\)</span> term in the grandparent with <span class="math">\(\omega^\alpha \cdot 3\)</span> (three copies of <span class="math">\(\omega^\alpha\)</span>), which is strictly smaller. And because the rest of the tree remains unchanged, this means the ordinal assigned to the hydra as a whole also decreases.</li>
</ul>
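<p>This assignment can be sketched in a few lines — an illustration of my own, not the post’s code. Encode an ordinal <span class="math">\(\omega^{\alpha_1} + \cdots + \omega^{\alpha_n}\)</span> (exponents descending) as the tuple of its exponents, with <span class="math">\(0\)</span> as the empty tuple; then Python’s lexicographic tuple comparison agrees with ordinal comparison.</p>

```python
def ordinal(node):
    """Cantor-normal-form ordinal of a hydra node.

    A node is the list of its children; its ordinal is
    w^a1 + w^a2 + ... with exponents sorted descending.  We encode
    that sum as the tuple of its exponents (0 is the empty tuple),
    so lexicographic tuple comparison matches ordinal comparison.
    """
    return tuple(sorted((ordinal(c) for c in node), reverse=True))

# The hydra from the worked example: one neck carrying two heads.
before = [[[], []]]             # ordinal w^2
after = [[[]], [[]], [[]]]      # ordinal w*3, after one chop
assert ordinal(after) < ordinal(before)   # the ordinal strictly drops
```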
<p>To illustrate this process, let’s look at the ordinals that correspond to the hydras we saw earlier. It may help to read them in reverse order.</p>
<p align="center">
<img src="/images/hydra/ordinals.svg" width="100%" height="auto" alt="Ordinal sequence for previous hydra">
</p>
<p>We can also see why the hydra’s regeneration speed doesn’t matter. No matter how large <span class="math">\(N\)</span> is, as long as it’s finite, <span class="math">\(\omega^{\alpha + 1}\)</span> will be strictly larger than <span class="math">\(\omega^{\alpha} \cdot N\)</span>.</p>
<p>One way to think about this is that a neck that forks at height <span class="math">\(k+1\)</span> is literally <em>infinitely worse</em> than a neck that forks at height <span class="math">\(k\)</span>. By cutting off a head, you simplify it at height <span class="math">\(k+1\)</span>, at the expense of introducing some forking at height <span class="math">\(k\)</span>, which isn’t as bad.</p>
<hr>
<p>A last interesting fact: this proof relied on ordinal numbers, which have a whole lot of infinities (<span class="math">\(\omega\)</span>s) tied up in them. But everything in this hydra game is finite; from an initial hydra, there are only finitely many hydras we can encounter, each of which has only finitely many heads. Is there a proof that avoids any mention of infinity?</p>
<p>In 1982, Laurence Kirby and Jeff Paris proved that there isn’t, in the following sense: any proof technique strong enough to prove the hydra’s eventual demise is strong enough to prove the consistency of Peano arithmetic. In particular, it’s impossible to prove the hydra theorem from within Peano arithmetic.</p>
<hr>
<ol>
<li><a id="fn1" href="#ref1">↑</a> In fact, the ordinals are the prototype of every well-founded set, and this is what makes them important.</li>
<li><a id="fn2" href="#ref2">↑</a> Without loss of generality, we can relabel the subhydras so that the ordinals are non-strictly descending. This avoids problems coming from the non-commutativity of ordinal addition.</li>
</ol>
<script src="https://cdnjs.cloudflare.com/ajax/libs/svg.js/2.7.1/svg.js"></script>
<script src="/js/hydra_lib.js"></script>
<script src="/js/hydra_main.js"></script>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Safes and Keys2018-11-16T00:00:00-08:002018-11-16T00:00:00-08:00Henry Swansontag:mathmondays.com,2018-11-16:/safes-and-keys<p>Here’s a few similar puzzles with a common story:</p>
<blockquote>
<p>I have <em>n</em> safes, each one with a unique key that opens it. Unfortunately, some prankster snuck into my office last night and stole my key ring. It seems they’ve randomly put the keys inside the safes (one key per safe), and locked them.</p>
</blockquote>
<p>We’ll play around with a few different conditions and see what chances we have of getting all safes unlocked, and at what cost.</p>
<!-- more -->
<hr>
<p><strong>1) The prankster was a bit sloppy, and forgot to lock one of the safes. What is the probability I can unlock all of my safes?</strong></p>
<p>The key observation here, as with the subsequent problems, is to consider the arrangement of keys and safes as a permutation. Label the safes and keys <span class="math">\(1\)</span> to <span class="math">\(n\)</span>, and define <span class="math">\(\pi(i)\)</span> to be the number of the key inside the <span class="math">\(i\)</span>th safe. So, if we have key <span class="math">\(1\)</span>, we unlock safe <span class="math">\(1\)</span> to reveal key <span class="math">\(\pi(1)\)</span>.</p>
<p>Under this interpretation, key <span class="math">\(i\)</span> lets us unlock all safes in the cycle containing <span class="math">\(i\)</span>; we open a safe, find a new key, track down the new safe, and repeat until we end up where we started. So, we want to know the probability that a randomly chosen permutation has exactly one cycle.</p>
<p>This isn’t too hard; we can count the number of one-cycle permutations in a straightforward way. Given a one-cycle permutation, we start with element <span class="math">\(1\)</span> and write out <span class="math">\(\pi(1)\)</span>, <span class="math">\(\pi(\pi(1))\)</span>, etc., until we loop back to <span class="math">\(1\)</span>. This produces an ordered list of <span class="math">\(n\)</span> numbers, starting with <span class="math">\(1\)</span>, and this uniquely determines the cycle. There are <span class="math">\((n-1)!\)</span> such lists, and so the probability of having exactly one cycle is <span class="math">\((n-1)!/n! = 1/n\)</span>.</p>
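<p>For small <span class="math">\(n\)</span>, this count can be checked by brute force — a quick sketch of my own (indices here are 0-based, unlike the post):</p>

```python
from itertools import permutations
from math import factorial

def cycles(perm):
    """Count the cycles of a permutation given in one-line notation."""
    seen, count = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1          # found the start of a new cycle
            i = start
            while i not in seen:
                seen.add(i)
                i = perm[i]     # follow the key found in safe i
    return count

# Exactly (n-1)! of the n! permutations consist of a single cycle,
# so the success probability is (n-1)!/n! = 1/n.
for n in range(1, 7):
    single = sum(cycles(p) == 1 for p in permutations(range(n)))
    assert single == factorial(n - 1)
```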
<hr>
<p><strong>2) Say the prankster is sloppier, and leaves k safes unlocked. Now what is my probability of success?</strong></p>
<p>This one requires a little more thought. It’s tempting to consider permutations with <span class="math">\(k\)</span> cycles, but that’s not quite right. If there’s only one cycle, we’re sure to succeed, and furthermore, even if there are <span class="math">\(k\)</span> cycles, our success isn’t guaranteed: we could pick two safes in the same cycle.</p>
<p>By symmetry, label our safes so that we’ve picked safes <span class="math">\(1\)</span>, <span class="math">\(2\)</span>, …, <span class="math">\(k\)</span>. We’d like to know how many permutations have a cycle that completely avoids <span class="math">\(1\)</span> through <span class="math">\(k\)</span>. If, and only if, such a cycle is present, we fail to unlock all the safes.</p>
<p>Let <span class="math">\(a_i\)</span> be the number of “good” permutations when there are <span class="math">\(i\)</span> safes. We will express <span class="math">\(a_n\)</span> in terms of smaller <span class="math">\(a_i\)</span>s, and solve the resulting recurrence relation.</p>
<p>Given a permutation <span class="math">\(\pi\)</span>, we can split the set <span class="math">\(\{ 1, \ldots n \}\)</span> into two parts: those that have cycles intersecting <span class="math">\(\{ 1, \ldots, k \}\)</span>, and those that do not. (It may help to think of these sets as “reachable” and “unreachable” safes, respectively). Since <span class="math">\(\pi\)</span> never sends a reachable safe to an unreachable one, or vice versa, it induces permutations on both these sets. Also, knowing both these subpermutations, we can reconstruct <span class="math">\(\pi\)</span>. So, let’s count how many possible permutations there are on the reachable and unreachable sets.</p>
<p>If there are <span class="math">\(r\)</span> reachable safes, then there are <span class="math">\(a_r\)</span> possible permutations induced on the reachable set, and <span class="math">\((n-r)!\)</span> induced on the unreachable one. (The reason we don’t get the full <span class="math">\(r!\)</span> on the reachable set is that some permutations would leave a safe unreachable, when it’s supposed to be reachable.) Furthermore, we have a choice of <em>which</em> safes are reachable. The first <span class="math">\(k\)</span> safes must be reachable, so beyond that, we have <span class="math">\(\binom{n-k}{r-k}\)</span> more choices to make. Our recurrence relation is then:
</p>
<div class="math">$$ n! = \sum_{r = k}^n \binom{n-k}{r-k} a_r (n-r)! = \sum_{r = k}^n a_r \frac{(n-k)!}{(r-k)!} $$</div>
<p>Since <span class="math">\((n-k)!\)</span> doesn’t depend on <span class="math">\(r\)</span>, we can pull it out to get a neater-looking form:
</p>
<div class="math">$$ \frac{n!}{(n-k)!} = \sum_{r=k}^n \frac{a_r}{(r-k)!} $$</div>
<p>Now <span class="math">\(n\)</span> only shows up as an index, not anywhere in the summand. This lets us collapse our sum; take this equation, and subtract from it the corresponding one for <span class="math">\(n-1\)</span>:
</p>
<div class="math">$$
\begin{align*}
\frac{n!}{(n-k)!} - \frac{(n-1)!}{(n-1-k)!} &= \left( \sum_{r=k}^n \frac{a_r}{(r-k)!} \right) - \left( \sum_{r=k}^{n-1} \frac{a_r}{(r-k)!} \right) \\
\frac{n!}{(n-k)!} - \frac{(n-1)!}{(n-1-k)!} &= \frac{a_n}{(n-k)!} \\
n! - (n-1)!(n-k) &= a_n \\
k \cdot (n-1)! &= a_n
\end{align*}
$$</div>
<p>So there’s <span class="math">\(k \cdot (n-1)!\)</span> permutations in which we win. Since there’s <span class="math">\(n!\)</span> total, this gives our probability of success at <span class="math">\(k/n\)</span>.</p>
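<p>The <span class="math">\(k/n\)</span> answer can be confirmed by brute force for small <span class="math">\(n\)</span> — a sketch of my own, with 0-based labels, where safes <code>0</code> through <code>k-1</code> start unlocked:</p>

```python
from itertools import permutations
from math import factorial

def wins(perm, k):
    """With safes 0..k-1 open, can we eventually unlock every safe?"""
    reachable = set(range(k))
    frontier = list(reachable)
    while frontier:
        key = perm[frontier.pop()]   # the key found inside an open safe
        if key not in reachable:
            reachable.add(key)
            frontier.append(key)
    return len(reachable) == len(perm)

# Exactly k * (n-1)! of the n! permutations are winning ones.
n = 6
for k in range(1, n + 1):
    count = sum(wins(p, k) for p in permutations(range(n)))
    assert count == k * factorial(n - 1)   # success probability k/n
```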
<hr>
<p><strong>3) If the prankster is careful, and remembers to lock all the safes, then I have no choice but to break some of them open. What’s the expected number of safes I have to crack?</strong></p>
<p>This one’s much easier than 2). The question here is just “how many cycles are there in a random permutation”, and <a href="/linearity-expectation">from a previous post</a>, we know that’s <span class="math">\(H_n\)</span>, the <span class="math">\(n\)</span>th harmonic number.</p>
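<p>A brute-force check of this expectation for small <span class="math">\(n\)</span> (my own sketch; <code>cycles</code> counts the cycles of a permutation in one-line notation):</p>

```python
from fractions import Fraction
from itertools import permutations

def cycles(perm):
    """Count the cycles of a permutation given in one-line notation."""
    seen, count = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return count

# The average number of cycles over all n! permutations is exactly
# the nth harmonic number H_n = 1 + 1/2 + ... + 1/n.
for n in range(1, 7):
    perms = list(permutations(range(n)))
    avg = Fraction(sum(cycles(p) for p in perms), len(perms))
    harmonic = sum(Fraction(1, i) for i in range(1, n + 1))
    assert avg == harmonic
```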
<hr>
<p><strong>4) Putting it all together: if we start with <span class="math">\(k\)</span> safes unlocked, what’s the expected number of safes I have to crack open?</strong></p>
<p>I haven’t actually put this one on solid ground yet! It’s not coming out pretty.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Ax-Grothendieck Theorem2018-11-12T00:00:00-08:002018-11-12T00:00:00-08:00Henry Swansontag:mathmondays.com,2018-11-12:/ax-grothendieck<p><span class="mathdefs">
<span class="math">\(\newcommand{\CC}{\Bbb C}
\newcommand{\FF}{\Bbb F}
\newcommand{\QQ}{\Bbb Q}
\newcommand{\FFx}[1]{\overline{\FF_{#1}}}
\newcommand{\ACF}{\mathbf{ACF}}
\newcommand{\cL}{\mathcal{L}}
\newcommand{\cT}{\mathcal{T}}\)</span>
</span></p>
<p>The Ax-Grothendieck theorem is the statement:
<div class="theorem-box">
<div class="theorem-title">Ax-Grothendieck Theorem</div>
Let <span class="math">\(f: \CC^n \to \CC^n\)</span> be a polynomial map; that is, each coordinate <span class="math">\(f_i: \CC^n \to \CC\)</span> is a polynomial in the <span class="math">\(n\)</span> input variables.
Then, if <span class="math">\(f\)</span> is injective, it is surjective.
</div></p>
<p>This… doesn’t seem like a particularly exciting theorem. But it has a really exciting proof.</p>
<!-- more -->
<hr>
<p>The idea behind the proof isn’t algebraic, it isn’t topological, it’s not even geometric, it’s <s>DiGiorno</s> model-theoretic!</p>
<p>The spirit of the proof is as follows:</p>
<ul>
<li>if the theorem is false, then there is a disproof (a proof of the negation)</li>
<li>this proof can be written in “first-order logic”, a particularly limited set of axioms</li>
<li>because this proof is finitely long, and uses only first-order logic, it “can’t tell the difference” between <span class="math">\(\CC\)</span> and <span class="math">\(\FFx{p}\)</span> for large enough <span class="math">\(p\)</span><ul>
<li>note: <span class="math">\(\FFx{p}\)</span> is the algebraic closure of the finite field <span class="math">\(\FF_p\)</span></li>
</ul>
</li>
<li>pick a large enough <span class="math">\(p\)</span>, and transfer our proof to <span class="math">\(\FFx{p}\)</span>; this won’t affect its structure or validity</li>
<li>show that there is, in fact, no counterexample in <span class="math">\(\FFx{p}\)</span></li>
<li>by contradiction, there is no disproof, and the theorem must be true</li>
</ul>
<p>This is an… unusual proof strategy. I don’t usually think about my proofs as mathematical objects unto themselves. But that’s probably because I’m not a model theorist.</p>
<hr>
<p>First, we’ll get the last step out of the way.</p>
<p><em>Proof</em>: Let <span class="math">\(f: \FFx{p}^n \to \FFx{p}^n\)</span> be injective. Pick an arbitrary target <span class="math">\(y = (y_1, \ldots, y_n) \in \FFx{p}^n\)</span> to hit. Let <span class="math">\(K \supseteq \FF_p\)</span> be the field extension generated by the <span class="math">\(y_i\)</span> and the coefficients that show up in <span class="math">\(f\)</span>. Since all of these generators are algebraic over <span class="math">\(\FF_p\)</span>, and there are finitely many of them, <span class="math">\(K\)</span> is finite. Also, since fields are closed under polynomial operations, <span class="math">\(f(K^n) \subseteq K^n\)</span>. But because <span class="math">\(f\)</span> is injective, and <span class="math">\(K^n\)</span> is finite, <span class="math">\(f(K^n)\)</span> must be all of <span class="math">\(K^n\)</span>; i.e., there’s some input <span class="math">\(x \in K^n\)</span> such that <span class="math">\(f(x) = y\)</span>. Thus <span class="math">\(f\)</span> is surjective.</p>
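<p>The pigeonhole heart of this step can be illustrated by brute force over a small finite field — a sketch of my own, not part of the original proof:</p>

```python
from itertools import product

# Over F_p (p prime), an injective map from a finite set to itself is
# automatically a bijection.  Check this for every one-variable
# quadratic polynomial map a0 + a1*x + a2*x^2 over F_5.
p = 5

def poly(coeffs):
    """Return the map x -> a0 + a1*x + ... + ad*x^d over F_p."""
    return lambda x: sum(c * x**i for i, c in enumerate(coeffs)) % p

for coeffs in product(range(p), repeat=3):
    f = poly(coeffs)
    image = {f(x) for x in range(p)}
    injective = len(image) == p          # no two inputs collide
    surjective = image == set(range(p))  # every target is hit
    assert not injective or surjective   # injective => surjective
```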
<hr>
<p>Now for the exciting stuff.</p>
<p>We have to figure out a way of taking proofs over <span class="math">\(\CC\)</span>, and translating them into proofs over <span class="math">\(\FFx{p}\)</span>. This is daunting, but it’s made easier by the fact that they are both algebraically closed fields, and so they have a shared pool of axioms. Of course, they are very different in other ways: <span class="math">\(\CC\)</span> is uncountable while <span class="math">\(\FFx{p}\)</span> is countable, they have different characteristic, etc. We have to show that our proof manipulations aren’t affected by these differences.</p>
<p>Since this isn’t an intro to model theory post, I won’t be defining the basic terms. If these look unfamiliar, check out <a href="https://www.lesswrong.com/posts/F6BrJFkqEhh22rFsZ/very-basic-model-theory">this post</a>.</p>
<p>Let <span class="math">\(\ACF\)</span> be the theory of algebraically closed fields. We claim that it’s first-order, and it’s <em>almost</em> complete.</p>
<p>This is a theory in the language of rings, which is <span class="math">\(\cL_{ring} = \{ +, \times, 0, 1 \}\)</span>. Our axioms are:</p>
<ul>
<li>the usual field axioms (these are all first-order)</li>
<li>for each <span class="math">\(d \ge 1\)</span>, add the sentence <span class="math">\(\forall a_0 \forall a_1 \cdots \forall a_d \ (a_d \ne 0 \implies \exists x \ a_0 + a_1 x + \cdots + a_d x^d = 0)\)</span><ul>
<li>these are first-order sentences, and together, they tell us that every non-constant polynomial has a root</li>
</ul>
</li>
</ul>
<p>So <span class="math">\(\ACF\)</span> is a first-order theory. It isn’t complete, of course. For example, the sentence <span class="math">\(1 + 1 = 0\)</span> is true in <span class="math">\(\FFx{2}\)</span>, but not in <span class="math">\(\FFx{3}\)</span> or <span class="math">\(\CC\)</span>. Turns out fields of different characteristic are… different. No surprise there.</p>
<p>So we define extensions of <span class="math">\(\ACF\)</span>, where we <em>do</em> specify the characteristic. For a prime <span class="math">\(p\)</span>, define <span class="math">\(S_p\)</span> to be the sentence <span class="math">\(1 + \cdots + 1 = 0\)</span>, where there are <span class="math">\(p\)</span> copies of <span class="math">\(1\)</span>. Then the theory of algebraically closed fields of characteristic <span class="math">\(p\)</span> is <span class="math">\(\ACF_p = \ACF \cup \{ S_p \}\)</span>.</p>
<p>What about characteristic <span class="math">\(0\)</span>? To force our field to have characteristic zero, we can throw in <span class="math">\(\lnot S_p\)</span> for all primes <span class="math">\(p\)</span>: <span class="math">\(\ACF_0 = \ACF \cup \{ \lnot S_2, \lnot S_3, \lnot S_5, \ldots \}\)</span>. This nails down exactly the algebraically closed fields of characteristic <span class="math">\(0\)</span>.</p>
<p>We claim that <span class="math">\(\ACF_0\)</span> and <span class="math">\(\ACF_p\)</span> are complete theories.</p>
<hr>
<p>If that is indeed the case, then we can prove a stronger form of the Ax-Grothendieck theorem.</p>
<div class="theorem-box">
<div class="theorem-title">Ax-Grothendieck Theorem (Stronger)</div>
<p>Let <span class="math">\(k\)</span> be an algebraically closed field. If <span class="math">\(f: k^n \to k^n\)</span> is a polynomial map, then if <span class="math">\(f\)</span> is injective, it is surjective.</p>
</div>
<p><em>Proof</em>: We start by breaking our claim into a number of first-order sentences. We can’t first-order define an arbitrary polynomial, so we’ll work with all polynomials of bounded degree. For a fixed <span class="math">\(d\)</span>, the sentence “for all polynomial maps <span class="math">\(f\)</span> of degree at most <span class="math">\(d\)</span>, injectivity of <span class="math">\(f\)</span> implies surjectivity of <span class="math">\(f\)</span>” can be expressed as a first-order sentence.</p>
<p>First, introduce a variable for each coefficient of <span class="math">\(f\)</span> (each coordinate <span class="math">\(f_i\)</span> is a polynomial of degree at most <span class="math">\(d\)</span> in <span class="math">\(n\)</span> variables, so it has <span class="math">\(\binom{n+d}{d}\)</span> coefficients). The sentence “<span class="math">\(f\)</span> is injective” can be made first-order by taking <span class="math">\(f(x) = f(y) \implies x = y\)</span> and expanding out the coefficients of <span class="math">\(f\)</span>. Likewise, “<span class="math">\(f\)</span> is surjective” can be written as <span class="math">\(\forall z \exists x \ f(x) = z\)</span>, and expanding <span class="math">\(f\)</span>.</p>
<p>As an example, if <span class="math">\(n = 1, d = 2\)</span>, our sentence is:
</p>
<div class="math">$$ \forall a_0 \forall a_1 \forall a_2 \ (\forall x \forall y \ a_2 x^2 + a_1 x + a_0 = a_2 y^2 + a_1 y + a_0 \implies x = y) $$</div>
<div class="math">$$ \implies \forall z \exists x \ a_2 x^2 + a_1 x + a_0 = z $$</div>
<p>Since I literally never want to write out that sentence in the general case, let’s just call it <span class="math">\(\phi_d\)</span>.</p>
<p>We’ll separately tackle the case of characteristic <span class="math">\(p\)</span> and characteristic <span class="math">\(0\)</span>.</p>
<p>Let <span class="math">\(p\)</span> be any prime. Because <span class="math">\(\ACF_p\)</span> is complete, either there is a proof of <span class="math">\(\phi_d\)</span> or a proof of <span class="math">\(\lnot \phi_d\)</span>. The latter is impossible; if there were such a proof, then it would show that <span class="math">\(\phi_d\)</span> is false in <span class="math">\(\FFx{p}\)</span>, and we’ve proven before that it is true in this field. Therefore, <span class="math">\(\ACF_p\)</span> proves <span class="math">\(\phi_d\)</span>.</p>
<p>Similarly, because <span class="math">\(\ACF_0\)</span> is complete, either it can prove <span class="math">\(\phi_d\)</span>, or it can prove <span class="math">\(\lnot \phi_d\)</span>. Again, for the sake of contradiction, we assume the latter. Let <span class="math">\(P\)</span> be a proof of <span class="math">\(\lnot \phi_d\)</span> from <span class="math">\(\ACF_0\)</span>. Since <span class="math">\(P\)</span> is finite, it can only use finitely many axioms. In particular, it can only use finitely many of the <span class="math">\(\lnot S_p\)</span>. So there’s some prime <span class="math">\(q\)</span> such that <span class="math">\(\lnot S_q\)</span> was not used in <span class="math">\(P\)</span>. Therefore, <span class="math">\(P\)</span> is also a valid proof in <span class="math">\(\ACF_q\)</span>. But we already know there are no proofs of <span class="math">\(\lnot \phi_d\)</span> from <span class="math">\(\ACF_q\)</span>, and so we’ve reached a contradiction. Therefore, there must be a proof of <span class="math">\(\phi_d\)</span> from <span class="math">\(\ACF_0\)</span>.</p>
<p>Since <span class="math">\(\ACF_p\)</span> can prove <span class="math">\(\phi_d\)</span>, and <span class="math">\(\ACF_0\)</span> can prove <span class="math">\(\phi_d\)</span>, we know that <span class="math">\(\phi_d\)</span> is true in all algebraically closed fields <span class="math">\(k\)</span>, no matter what the characteristic of <span class="math">\(k\)</span> is. And since <span class="math">\(\phi_d\)</span> is true for all <span class="math">\(d\)</span>, we have proved the claim for polynomials of arbitrary degree.</p>
<hr>
<p>This proof is magical in two ways.</p>
<p>One is that, despite there being no homomorphisms between <span class="math">\(\FFx{p}\)</span> and <span class="math">\(\CC\)</span>, we were able to somehow transport a claim between the two. This was possible not by looking at the structure of <span class="math">\(\CC\)</span> and <span class="math">\(\FFx{p}\)</span> themselves, but by using the structure of their axiomatizations. The reduction to only finitely many axioms is an example of the <a href="https://en.wikipedia.org/wiki/Compactness_theorem">compactness theorem</a>, a very useful logical principle.</p>
<p>The other is that we never actually made use of <span class="math">\(\phi_d\)</span>! All we knew is that it was a first-order sentence, and that it was true in some model of <span class="math">\(\ACF_p\)</span> for each <span class="math">\(p\)</span>. Generalizing this argument, we get the following principle:</p>
<div class="theorem-box">
<div class="theorem-title">Robinson's Principle</div>
<p>If <span class="math">\(\phi\)</span> is a first-order sentence, then the following are equivalent:</p>
<ol>
<li><span class="math">\(\ACF_p\)</span> proves <span class="math">\(\phi\)</span> for all but finitely many <span class="math">\(p\)</span></li>
<li><span class="math">\(\ACF_p\)</span> proves <span class="math">\(\phi\)</span> for infinitely many <span class="math">\(p\)</span></li>
<li><span class="math">\(\ACF_0\)</span> proves <span class="math">\(\phi\)</span></li>
</ol>
<p>Furthermore, the following are equivalent for <span class="math">\(r\)</span> a prime or <span class="math">\(0\)</span>:</p>
<ol>
<li><span class="math">\(\ACF_r\)</span> proves <span class="math">\(\phi\)</span></li>
<li><span class="math">\(\phi\)</span> is true in some algebraically closed field of characteristic <span class="math">\(r\)</span></li>
<li><span class="math">\(\phi\)</span> is true in all algebraically closed fields of characteristic <span class="math">\(r\)</span></li>
</ol>
</div>
<p>For the first claim, obviously (1) implies (2). The proof that (2) implies (3) is essentially the proof we gave above: if <span class="math">\(\phi\)</span> can’t be proved from <span class="math">\(\ACF_0\)</span>, then <span class="math">\(\lnot \phi\)</span> can. This proof can only use finitely many of the <span class="math">\(\lnot S_p\)</span>, and there are infinitely many <span class="math">\(\ACF_p\)</span> that prove <span class="math">\(\phi\)</span>, so there’s some <span class="math">\(p\)</span> we can transfer the proof to and get our contradiction. The proof that (3) implies (1) is similar: if there’s a proof of <span class="math">\(\phi\)</span> from <span class="math">\(\ACF_0\)</span>, it can be transferred to all but finitely many <span class="math">\(\ACF_p\)</span>.</p>
<p>The second claim is a direct consequence of completeness of <span class="math">\(\ACF_r\)</span>.</p>
<p>Combining these two claims gives some very powerful techniques. The way we used it is: to show something is true for all algebraically closed fields, it suffices to show it only for a single example at each prime <span class="math">\(p\)</span>.</p>
<p>At this point, there is no more spooky magic, and the rest of the article is about justifying the completeness of <span class="math">\(\ACF_p\)</span> and <span class="math">\(\ACF_0\)</span>. Still cool though, IMO.</p>
<hr>
<p>First, we’ll state a popular theorem in model theory:
<div class="theorem-box">
<div class="theorem-title">Löwenheim–Skolem Theorem</div>
Let <span class="math">\(\cT\)</span> be a countable theory. If it has an infinite model, then for any infinite cardinal <span class="math">\(\kappa\)</span>, it has a model of size <span class="math">\(\kappa\)</span>.
</div></p>
<p>Essentially, first-order logic is too limited to distinguish between different sizes of infinity; if there’s a model of one infinite size, there’s a model of all infinite sizes. The proof of this theorem is somewhat involved, and we won’t cover it here, but see <a href="http://modeltheory.wikia.com/wiki/L%C3%B6wenheim-Skolem_Theorem">here</a> for a proof.</p>
<p>Using this, we can prove the Łoś–Vaught test:
<div class="theorem-box">
<div class="theorem-title">Łoś–Vaught Test</div>
Let <span class="math">\(\cT\)</span> be a theory and <span class="math">\(\kappa\)</span> be some infinite cardinal. We say that <span class="math">\(\cT\)</span> is <span class="math">\(\kappa\)</span>-categorical if there is exactly one model of <span class="math">\(\cT\)</span> of size <span class="math">\(\kappa\)</span>, up to isomorphism.
<br><br>
If <span class="math">\(\cT\)</span> is <span class="math">\(\kappa\)</span>-categorical for some <span class="math">\(\kappa\)</span>, and has no finite models, then it is a complete theory.
</div></p>
<p>This is unexpected, at least in my opinion. But then again, model theory isn’t my forte. Maybe there’s some intuition one can use here that I don’t have.</p>
<p><em>Proof</em>: If <span class="math">\(\cT\)</span> isn’t complete, then there’s some <span class="math">\(\phi\)</span> such that <span class="math">\(\cT\)</span> proves neither <span class="math">\(\phi\)</span> nor <span class="math">\(\lnot \phi\)</span>. By the <a href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_completeness_theorem">completeness theorem</a>, this means there’s a model <span class="math">\(M\)</span> of <span class="math">\(\cT\)</span> in which <span class="math">\(\phi\)</span> is true, and a model <span class="math">\(M'\)</span> of <span class="math">\(\cT\)</span> in which <span class="math">\(\lnot \phi\)</span> is true.</p>
<p>Since all models of <span class="math">\(\cT\)</span> are infinite, both <span class="math">\(M\)</span> and <span class="math">\(M'\)</span> are infinite. This means that <span class="math">\(M\)</span> is an infinite model of <span class="math">\(\cT \cup \{ \phi \}\)</span>, thus we can apply Löwenheim–Skolem to get a model <span class="math">\(N\)</span> of <span class="math">\(\cT \cup \{ \phi \}\)</span> which has size <span class="math">\(\kappa\)</span>. Likewise, we use <span class="math">\(M'\)</span> to get a model <span class="math">\(N'\)</span> of <span class="math">\(\cT \cup \{ \lnot \phi \}\)</span> which has size <span class="math">\(\kappa\)</span>. But because <span class="math">\(\cT\)</span> is <span class="math">\(\kappa\)</span>-categorical and both <span class="math">\(N\)</span> and <span class="math">\(N'\)</span> are models of <span class="math">\(\cT\)</span>, they must be isomorphic. But because <span class="math">\(\phi\)</span> is true in <span class="math">\(N\)</span> and false in <span class="math">\(N'\)</span>, this is a contradiction.</p>
<p>We’d like to apply the Łoś–Vaught test to <span class="math">\(\ACF_p\)</span> and <span class="math">\(\ACF_0\)</span>. Since all algebraically closed fields are infinite, it suffices to show that these theories are <span class="math">\(\kappa\)</span>-categorical for some <span class="math">\(\kappa\)</span>.</p>
<p><em>Proof</em>: Let <span class="math">\(\kappa\)</span> be an uncountable cardinal and <span class="math">\(K\)</span> be an algebraically closed field of size <span class="math">\(\kappa\)</span>. Let <span class="math">\(B\)</span> be a transcendence basis of <span class="math">\(K\)</span> over its prime subfield <span class="math">\(k\)</span> (<span class="math">\(\FF_p\)</span> or <span class="math">\(\QQ\)</span>). <a href="https://proofwiki.org/wiki/Field_of_Uncountable_Cardinality_K_has_Transcendence_Degree_K">A cardinality argument</a> shows that <span class="math">\(|B| = \kappa\)</span> (this is where the uncountability of <span class="math">\(\kappa\)</span> is used; for example, <span class="math">\(\overline{\QQ}(t_1, \ldots, t_n)\)</span> has transcendence degree <span class="math">\(n\)</span>, but cardinality <span class="math">\(\aleph_0\)</span>). So, if <span class="math">\(K'\)</span> is another algebraically closed field, with the same cardinality and characteristic, and we pick a transcendence basis <span class="math">\(B'\)</span>, it will also have cardinality <span class="math">\(\kappa\)</span>. The bijection between <span class="math">\(B\)</span> and <span class="math">\(B'\)</span> induces an isomorphism between <span class="math">\(k(B)\)</span> and <span class="math">\(k(B')\)</span>. But since <span class="math">\(K\)</span> and <span class="math">\(K'\)</span> are algebraically closed, and algebraic over <span class="math">\(k(B) \cong k(B')\)</span>, they are algebraic closures of the same field, and are thus isomorphic!</p>
<p>This proves that <span class="math">\(\ACF_p\)</span> and <span class="math">\(\ACF_0\)</span> are <span class="math">\(\kappa\)</span>-categorical for uncountable cardinals <span class="math">\(\kappa\)</span>. In particular, they’re <span class="math">\(\kappa\)</span>-categorical for at least one infinite cardinal, and so via the Łoś–Vaught test, we conclude they are complete.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Wedderburn's Little Theorem2018-11-05T00:00:00-08:002018-11-05T00:00:00-08:00Henry Swansontag:mathmondays.com,2018-11-05:/wedderburn<p><span class="mathdefs">
<span class="math">\(\newcommand{\ZZ}{\Bbb Z}
\newcommand{\QQ}{\Bbb Q}\)</span>
</span></p>
<p>Some rings are closer to being fields than others. A <strong>domain</strong> is a ring where we can do cancellation: if <span class="math">\(ab = ac\)</span> and <span class="math">\(a \ne 0\)</span>, then <span class="math">\(b = c\)</span>. Even closer is a <strong>division ring</strong>, a ring in which every non-zero element has a multiplicative inverse. The only distinction between fields and division rings is that the latter may be non-commutative. For this reason, division rings are also called <strong>skew-fields</strong>.</p>
<p>These form a chain of containments, each of which is strict:
fields <span class="math">\(\subset\)</span> division rings <span class="math">\(\subset\)</span> domains <span class="math">\(\subset\)</span> rings</p>
<p>Some examples:</p>
<ul>
<li><span class="math">\(\ZZ\)</span> is a domain</li>
<li><span class="math">\(\ZZ/6\ZZ\)</span> is not a domain</li>
<li>the set of <span class="math">\(n \times n\)</span> matrices is not a domain; two non-zero matrices can multiply to zero</li>
<li><span class="math">\(\QQ\)</span> is a field (duh)</li>
<li>the quaternions are a division ring</li>
</ul>
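These examples are small enough to check by machine. Here is a quick Python sanity check of two of them; the helper `zero_divisors` is just for illustration, not anything standard.

```python
# Sanity-check two of the examples above with modular arithmetic.

def zero_divisors(n):
    """Pairs (a, b) of nonzero elements of Z/nZ with a*b = 0."""
    return [(a, b) for a in range(1, n) for b in range(1, n) if (a * b) % n == 0]

# Z/6Z is not a domain: 2 * 3 = 0 even though neither factor is 0.
print(zero_divisors(6))  # [(2, 3), (3, 2), (3, 4), (4, 3)]

# Z/7Z is a field: every nonzero element has a multiplicative inverse.
inverses = {a: pow(a, -1, 7) for a in range(1, 7)}
print(inverses)
```

(The three-argument `pow` with exponent `-1` computes modular inverses in Python 3.8+.)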
<p>Wedderburn’s theorem states that this hierarchy collapses for finite rings: every finite domain is a field.</p>
<!-- more -->
<hr>
<p>First, we show that every finite domain is a division ring.</p>
<p>Let <span class="math">\(D\)</span> be a finite domain, and <span class="math">\(x \in D\)</span> be non-zero. The map <span class="math">\(f : D \to D\)</span> given by <span class="math">\(f(d) = xd\)</span> is injective, which we get immediately from the definition of a domain. Because <span class="math">\(D\)</span> is finite, <span class="math">\(f\)</span> injective implies that <span class="math">\(f\)</span> is surjective as well. This means there’s some <span class="math">\(y\)</span> such that <span class="math">\(f(y) = xy = 1\)</span>. This makes <span class="math">\(y\)</span> a right-inverse of <span class="math">\(x\)</span>; is it also a left-inverse? Yes! Since <span class="math">\(x = 1x = xyx\)</span>, cancellation gives us <span class="math">\(1 = yx\)</span>.</p>
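The pigeonhole step above is concrete enough to run. Here is a small Python sketch using Z/7Z as a stand-in for the finite domain D: multiplication by any nonzero x is injective, hence a bijection, and the y that hits 1 is a two-sided inverse.

```python
# Finite-domain inverse hunt, illustrated in Z/7Z (a stand-in for D):
# f(d) = x*d is injective for nonzero x, hence surjective on a finite set,
# so some y satisfies x*y = 1; cancellation makes y a two-sided inverse.
p = 7
for x in range(1, p):
    image = {(x * d) % p for d in range(p)}
    assert len(image) == p                       # f is a bijection
    y = next(d for d in range(p) if (x * d) % p == 1)
    assert (y * x) % p == 1                      # left inverse too
print("every nonzero element of Z/7Z is invertible")
```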
<hr>
<p>The next step, showing that every finite division ring is a field, is significantly trickier. We’ll continue, knowing that <span class="math">\(D\)</span> is a division ring.</p>
<p>Our plan is to re-interpret <span class="math">\(D\)</span> as a vector space, to get some information about its size. Then, we’ll drop the additive structure, and apply some group theory to the multiplicative structure. Lastly, our result will be vulnerable to some elementary number theory.</p>
<p>Let <span class="math">\(Z\)</span> be the center of <span class="math">\(D\)</span>: the set of elements that commute multiplicatively with everything in <span class="math">\(D\)</span>. The distributive law tells us that <span class="math">\(Z\)</span> is an abelian group under addition, and by definition, <span class="math">\(Z^*\)</span> is an abelian group under multiplication. This makes <span class="math">\(Z\)</span> a field, which allows us to apply some linear algebra to the problem.</p>
<p>As with field extensions, a division ring containing a field is a vector space over that field; specifically, <span class="math">\(D\)</span> is a vector space over <span class="math">\(Z\)</span>, where vector addition is addition in <span class="math">\(D\)</span>, and scalar multiplication is multiplication by an element of <span class="math">\(Z\)</span>. This gives us some information about the size of <span class="math">\(D\)</span>. If <span class="math">\(Z\)</span> has size <span class="math">\(q\)</span>, and <span class="math">\(D\)</span> has dimension <span class="math">\(n\)</span> over <span class="math">\(Z\)</span>, then <span class="math">\(D\)</span> has size <span class="math">\(q^n\)</span>.</p>
<p>Let’s look at some linear subspaces of <span class="math">\(D\)</span> (as a vector space). For an element <span class="math">\(x \in D\)</span>, let <span class="math">\(C(x)\)</span> be the set of all elements that commute with <span class="math">\(x\)</span> (this is the <strong>centralizer</strong> of <span class="math">\(x\)</span>). We claim that this is a subspace of <span class="math">\(D\)</span>. It’s clearly closed under addition, and we claim it is also closed under scalar multiplication. If <span class="math">\(y \in C(x)\)</span> and <span class="math">\(z \in Z\)</span>, then it follows quickly that <span class="math">\((zy)x = x(zy)\)</span>, i.e., <span class="math">\(zy \in C(x)\)</span>.</p>
<p>Because <span class="math">\(C(x)\)</span> is a linear subspace, it has size <span class="math">\(q^k\)</span>, where <span class="math">\(k\)</span> is its dimension and <span class="math">\(1 \le k \le n\)</span>. And if <span class="math">\(x \notin Z\)</span>, we know that both these inequalities are strict. If <span class="math">\(k = n\)</span>, then <span class="math">\(C(x) = D\)</span>, and <span class="math">\(x\)</span> is in fact in the center. If <span class="math">\(k = 1\)</span>, then <span class="math">\(C(x) = Z\)</span>, and since <span class="math">\(x \in C(x)\)</span> for sure, <span class="math">\(x\)</span> is again in <span class="math">\(Z\)</span>.</p>
<p>Now we can apply some group theory. The <a href="https://en.wikipedia.org/wiki/Conjugacy_class#Conjugacy_class_equation">class equation</a> is a statement about the conjugacy classes of a group. The details are best saved for another post, but if we have a group <span class="math">\(G\)</span> with center <span class="math">\(Z(G)\)</span>, and <span class="math">\(g_1, \ldots, g_r\)</span> are distinct representatives of the non-trivial conjugacy classes, then
</p>
<div class="math">$$ |G| = |Z(G)| + \sum_{i=1}^r [G : C(g_i)] $$</div>
<p>Essentially, this comes from the fact that <span class="math">\([G : C(g_i)]\)</span> is the number of conjugates of <span class="math">\(g_i\)</span>, and that the conjugacy classes partition <span class="math">\(G\)</span>.</p>
<p>If we apply this to <span class="math">\(D^*\)</span>, and remember our observation about the size of <span class="math">\(C(x)\)</span>, then we get:
</p>
<div class="math">$$ q^n - 1 = (q - 1) + \sum_{i=1}^r \frac{q^n - 1}{q^{k_i} - 1}, \, 1 < k_i < n $$</div>
<p>We claim that this can only happen when <span class="math">\(n = 1\)</span>; i.e., when <span class="math">\(Z = D\)</span>. This would prove that <span class="math">\(D\)</span> is a field! From here on out, it’s all number theory.</p>
<hr>
<p>First, we claim that each <span class="math">\(k_i\)</span> divides <span class="math">\(n\)</span>. Let <span class="math">\(n = a k_i + b\)</span> be the result of division with remainder. Since <span class="math">\((q^n - 1)/(q^{k_i} - 1)\)</span> is the index of some <span class="math">\(C(x)\)</span>, it’s an integer, so <span class="math">\(q^{k_i} - 1\)</span> divides <span class="math">\(q^n - 1\)</span>, or equivalently, <span class="math">\(q^n \equiv 1 \pmod{q^{k_i} - 1}\)</span>. Substituting <span class="math">\(n = a k_i + b\)</span>, we get that <span class="math">\(q^b \equiv 1 \pmod{q^{k_i} - 1}\)</span>. But since <span class="math">\(b < k_i\)</span>, we have <span class="math">\(q^b - 1 < q^{k_i} - 1\)</span>, so we must have <span class="math">\(q^b - 1 = 0\)</span>; i.e., <span class="math">\(b = 0\)</span>. (Here, we quietly used the fact that <span class="math">\(q > 1\)</span>.) Therefore, <span class="math">\(k_i\)</span> divides <span class="math">\(n\)</span>.</p>
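If you'd like to double-check this divisibility fact numerically before moving on, a brute-force Python loop confirms that for q > 1, q^k - 1 divides q^n - 1 exactly when k divides n:

```python
# Numerical check: for q > 1, (q^k - 1) divides (q^n - 1) iff k divides n.
for q in (2, 3, 5):
    for n in range(1, 13):
        for k in range(1, n + 1):
            divides = (q**n - 1) % (q**k - 1) == 0
            assert divides == (n % k == 0)
print("q^k - 1 | q^n - 1  <=>  k | n, for all tested q, k, n")
```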
<p>For the next step, we’ll need to introduce the <a href="https://en.wikipedia.org/wiki/Cyclotomic_polynomial">cyclotomic polynomials</a> <span class="math">\(\Phi_k(x)\)</span>. They have three properties in particular that are of interest to us:</p>
<ul>
<li>they are monic and have integer coefficients</li>
<li>for any <span class="math">\(m\)</span>, the polynomial <span class="math">\(x^m - 1\)</span> factors as <span class="math">\(\prod_{k \mid m} \Phi_k(x)\)</span></li>
<li>the roots of <span class="math">\(\Phi_k(x)\)</span> are exactly the primitive <span class="math">\(k\)</span>th roots of unity</li>
</ul>
<p>The second fact tells us that <span class="math">\(\Phi_n(x)\)</span> is a factor of <span class="math">\(x^n - 1\)</span>, but also, that it is a factor of <span class="math">\((x^n - 1)/(x^{k_i} - 1)\)</span> – the denominator cancels out some of the <span class="math">\(\Phi_k(x)\)</span>, but <span class="math">\(\Phi_n(x)\)</span> is left intact, since <span class="math">\(k_i < n\)</span>.</p>
<p>Since the quotients <span class="math">\(\frac{x^n - 1}{\Phi_n(x)}\)</span> and <span class="math">\(\frac{(x^n - 1)/(x^{k_i} - 1)}{\Phi_n(x)}\)</span> are products of cyclotomic polynomials, each of which is monic with integer coefficients, then they are also monic with integer coefficients. Therefore, if we plug in <span class="math">\(x = q\)</span>, we will get an integer. This means that the integer <span class="math">\(\Phi_n(q)\)</span> divides the integers <span class="math">\(q^n - 1\)</span> and <span class="math">\((q^n - 1)/(q^{k_i} - 1)\)</span>. Note that we had to work for this; it’s not an immediate consequence of divisibility as polynomials. For example, consider <span class="math">\(p(x) = x + 3\)</span> and <span class="math">\(q(x) = x^3 + 3x^2 - x/4 - 3/4\)</span>. While <span class="math">\(p(x)\)</span> divides <span class="math">\(q(x)\)</span> as polynomials, <span class="math">\(p(1) = 4\)</span> does not divide <span class="math">\(q(1) = 3\)</span>.</p>
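These integer divisibilities can also be verified computationally. Below is a stdlib-only Python sketch (the coefficient-list representation and helper names are my own, not from any library) that builds the cyclotomic polynomials by exact integer polynomial division and checks the case q = 2, n = 6, k = 2:

```python
# Cyclotomic polynomials Phi_k over Z, as coefficient lists (lowest degree
# first), via Phi_n(x) = (x^n - 1) / prod of Phi_d(x) over proper divisors d.

def polydiv(num, den):
    """Exact division of integer polynomials (lists, lowest degree first)."""
    num = num[:]
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        c = num[i + len(den) - 1] // den[-1]
        quot[i] = c
        for j, d in enumerate(den):
            num[i + j] -= c * d
    assert all(r == 0 for r in num[:len(den) - 1])  # remainder must vanish
    return quot

def cyclotomic(n):
    poly = [-1] + [0] * (n - 1) + [1]          # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = polydiv(poly, cyclotomic(d))
    return poly

def evaluate(poly, x):
    return sum(c * x**i for i, c in enumerate(poly))

print(cyclotomic(6))                  # [1, -1, 1], i.e. x^2 - x + 1
q, n, k = 2, 6, 2
phi = evaluate(cyclotomic(n), q)      # Phi_6(2) = 3
assert (q**n - 1) % phi == 0                     # 3 divides 63
assert ((q**n - 1) // (q**k - 1)) % phi == 0     # 3 divides 21
```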
<p>Now, returning to the class equation, we’ve shown that most of the terms are divisible by the integer <span class="math">\(\Phi_n(q)\)</span>, so the only leftover term, <span class="math">\(q - 1\)</span>, is also divisible by <span class="math">\(\Phi_n(q)\)</span>. We claim this is only possible if <span class="math">\(n = 1\)</span>, which would then give us our desired result.</p>
<p>Use the third fact about cyclotomic polynomials: <span class="math">\(\Phi_n(q) = \prod (q - \zeta)\)</span>, where <span class="math">\(\zeta\)</span> ranges over all primitive <span class="math">\(n\)</span>th roots of unity. Taking the modulus, we get that <span class="math">\(|\Phi_n(q)| = \prod |q - \zeta|\)</span>. From the triangle inequality, <span class="math">\(|q - \zeta| + |\zeta| \ge |q|\)</span>, or, rearranged, <span class="math">\(|q - \zeta| \ge |q| - |\zeta| = q - 1\)</span>. If <span class="math">\(n > 1\)</span>, then this inequality is strict, because equality only happens when <span class="math">\(\zeta = 1\)</span>. Furthermore, since <span class="math">\(q \ge 2\)</span>, we have <span class="math">\(|q - \zeta| > q - 1 \ge 1\)</span>. Therefore, if <span class="math">\(n > 1\)</span>, <span class="math">\(\Phi_n(q)\)</span> is a product of terms, each with absolute value strictly greater than both <span class="math">\(q - 1\)</span> and <span class="math">\(1\)</span>; thus <span class="math">\(|\Phi_n(q)| > q - 1\)</span>. But then <span class="math">\(\Phi_n(q)\)</span> cannot divide <span class="math">\(q - 1\)</span>, and we have our contradiction!</p>
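A quick numerical check of this bound: multiply |q - zeta| over the primitive nth roots of unity and compare against q - 1. (Floating point, so this is an illustration of the inequality, not a proof.)

```python
import cmath
import math

def phi_abs(n, q):
    """|Phi_n(q)| as a product of |q - zeta| over primitive n-th roots zeta."""
    prod = 1.0
    for a in range(1, n + 1):
        if math.gcd(a, n) == 1:
            zeta = cmath.exp(2 * math.pi * 1j * a / n)
            prod *= abs(q - zeta)
    return prod

for q in (2, 3, 5):
    for n in range(2, 12):
        assert phi_abs(n, q) > q - 1
print("|Phi_n(q)| > q - 1 holds for all tested n > 1")
```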
<p>Therefore, <span class="math">\(n = 1\)</span>, which forces <span class="math">\(Z = D\)</span>, and thus <span class="math">\(D\)</span> to be commutative; hence, a field. Q.E.D!</p>
Sylow Theorems2018-10-29T00:00:00-07:002018-10-29T00:00:00-07:00Henry Swansontag:mathmondays.com,2018-10-29:/sylow<p><span class="mathdefs">
<span class="math">\(\newcommand{\ZZ}{\Bbb Z}
\DeclareMathOperator{\Stab}{Stab}
\DeclareMathOperator{\Fix}{Fix}
\DeclareMathOperator{\Aut}{Aut}
\DeclareMathOperator{\sgn}{sgn}\)</span>
</span></p>
<p>In group theory, the Sylow theorems are a triplet of theorems that pin down a surprising amount of information about certain subgroups.</p>
<p>Lagrange’s theorem tells us that if <span class="math">\(H\)</span> is a subgroup of <span class="math">\(G\)</span>, then the size of <span class="math">\(H\)</span> divides the size of <span class="math">\(G\)</span>. The Sylow theorems give us some answers to the converse question: for what divisors of <span class="math">\(|G|\)</span> can we find a subgroup of that size?</p>
<!-- more -->
<hr>
<p>For a group <span class="math">\(G\)</span> and a prime <span class="math">\(p\)</span>, let <span class="math">\(n\)</span> be the largest integer such that <span class="math">\(p^n\)</span> divides <span class="math">\(|G|\)</span>. A <span class="math">\(p\)</span>-subgroup of <span class="math">\(G\)</span> is a subgroup of order <span class="math">\(p^k\)</span> for some <span class="math">\(k\)</span>, and if it has order <span class="math">\(p^n\)</span>, then it is called a Sylow <span class="math">\(p\)</span>-subgroup. Under these definitions, the Sylow theorems are:</p>
<div class="theorem-box">
<div class="theorem-title">Sylow Theorems</div>
<ol>
<li>Every <span class="math">\(p\)</span>-subgroup is contained in a Sylow <span class="math">\(p\)</span>-subgroup. As such, Sylow <span class="math">\(p\)</span>-subgroups exist.</li>
<li>All Sylow <span class="math">\(p\)</span>-subgroups are conjugate to each other.</li>
<li>Let <span class="math">\(n_p\)</span> be the number of Sylow <span class="math">\(p\)</span>-subgroups, and <span class="math">\(m = |G|/p^n\)</span>. Then the following hold:<ul>
<li><span class="math">\(n_p\)</span> divides <span class="math">\(m\)</span></li>
<li><span class="math">\(n_p \equiv 1 \bmod p\)</span></li>
<li><span class="math">\(n_p = [G : N(P)]\)</span>, where <span class="math">\(N(P)\)</span> is the normalizer of any Sylow <span class="math">\(p\)</span>-subgroup.</li>
</ul>
</li>
</ol>
</div>
<p>These are rather technical and deserve some more thorough digestion. Sylow 1 tells us that maximal <span class="math">\(p\)</span>-subgroups are as big as possible; there is no obstruction preventing them from being the full <span class="math">\(p^n\)</span>.</p>
<p>Sylow 2 tells us that all Sylow <span class="math">\(p\)</span>-subgroups are isomorphic in a very strong way; there is a conjugation of the group sending them to each other. To see how this is a strong criterion, consider a non-example. Let <span class="math">\(G = \ZZ_4 \times \ZZ_2\)</span>, and pick out the subgroups <span class="math">\(H_1 = \{ (0, 0), (2, 0) \}\)</span> and <span class="math">\(H_2 = \{ (0, 0), (0, 1) \}\)</span>. It’s clear that <span class="math">\(H_1\)</span> and <span class="math">\(H_2\)</span> are isomorphic, but they are not conjugate. This manifests in <span class="math">\(G/H_1 = \ZZ_2 \times \ZZ_2\)</span> and <span class="math">\(G/H_2 = \ZZ_4\)</span> not being isomorphic.</p>
<p>Sylow 3 is the easiest to understand; it just puts some arithmetic criteria on <span class="math">\(n_p\)</span>. For small-ish groups, this is often enough to nail down <span class="math">\(n_p\)</span> exactly!</p>
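As an illustration of how much Sylow 3's arithmetic pins down, here's a short Python helper (my own, just for this post) that lists the values of n_p allowed by the first two conditions. For a group of order 15, both n_3 and n_5 are forced to be 1 — the standard first step in showing every group of order 15 is cyclic.

```python
# For |G| = p^a * m with p not dividing m, Sylow 3 forces n_p | m and
# n_p = 1 (mod p).  Enumerate the candidates for a few small group orders.
def sylow_candidates(order, p):
    m = order
    while m % p == 0:
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

print(sylow_candidates(15, 3))   # [1]  -> the Sylow 3-subgroup is normal
print(sylow_candidates(15, 5))   # [1]  -> and so is the Sylow 5-subgroup
print(sylow_candidates(12, 2))   # [1, 3]
```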
<p>On to the proofs!</p>
<h2>Lemma</h2>
<p>First let’s establish a lemma we’ll use frequently.
<div class="theorem-box">
<div class="theorem-title">Lemma</div>
If <span class="math">\(G\)</span> is a <span class="math">\(p\)</span>-group, and it acts on a set <span class="math">\(X\)</span>, then <span class="math">\(|X| \equiv |\Fix(X)| \bmod p\)</span>, where <span class="math">\(\Fix(X)\)</span> is the set of points in <span class="math">\(X\)</span> that are fixed by every <span class="math">\(g \in G\)</span>.
</div></p>
<p>Proof: Let <span class="math">\(x_1, \ldots, x_k\)</span> be representatives for the <span class="math">\(G\)</span>-orbits of <span class="math">\(X\)</span>. We know that the sum of the sizes of the orbits is <span class="math">\(|X|\)</span>. If <span class="math">\(x_i\)</span> is a fixed point, then the orbit is of size <span class="math">\(1\)</span>. If it is not, then by orbit-stabilizer, the size of the orbit is <span class="math">\([G : \Stab(x_i)]\)</span>, which is divisible by <span class="math">\(p\)</span>. Thus, mod <span class="math">\(p\)</span>, every fixed point contributes <span class="math">\(1\)</span>, and everything else in <span class="math">\(X\)</span> contributes <span class="math">\(0\)</span>.</p>
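<p>As a sanity check of the lemma, here’s a toy example in Python (my own setup, not part of the proof): the cyclic group of order <span class="math">\(5\)</span> acts on binary strings of length <span class="math">\(5\)</span> by rotation, and the total count of strings should agree with the count of fixed strings mod <span class="math">\(5\)</span>:</p>

```python
from itertools import product

p = 5
# X = all binary strings of length 5; Z_5 acts by cyclic rotation
X = list(product([0, 1], repeat=p))

def rotate(x, k):
    return x[k:] + x[:k]

# fixed points: strings unchanged by every rotation (the constant strings)
fixed = [x for x in X if all(rotate(x, k) == x for k in range(p))]

print(len(X), len(fixed))   # 32 2, and indeed 32 ≡ 2 (mod 5)
```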
<h2>Sylow 1</h2>
<p>Given a <span class="math">\(p\)</span>-subgroup <span class="math">\(H\)</span>, we show that, if it is not already maximal, we can find a <span class="math">\(p\)</span>-subgroup <span class="math">\(H' \supset H\)</span> that is <span class="math">\(p\)</span> times bigger. Repeating this process gives us a Sylow <span class="math">\(p\)</span>-subgroup containing our original <span class="math">\(H\)</span>. Since the trivial subgroup is a <span class="math">\(p\)</span>-subgroup, this also establishes the existence of Sylow <span class="math">\(p\)</span>-subgroups!</p>
<p>Let <span class="math">\(H\)</span> be a <span class="math">\(p\)</span>-subgroup that is not maximal, i.e., it has order <span class="math">\(p^i\)</span>, where <span class="math">\(i < n\)</span>. There is a natural action of <span class="math">\(H\)</span> on the left coset space <span class="math">\(G/H\)</span>, and since <span class="math">\(H\)</span> is a <span class="math">\(p\)</span>-group, our lemma tells us that <span class="math">\(|G/H|\)</span> is equivalent to the number of fixed points mod <span class="math">\(p\)</span>. But since <span class="math">\(i < n\)</span>, <span class="math">\(G/H\)</span> has order divisible by <span class="math">\(p\)</span>. So the number of fixed points of this action is also divisible by <span class="math">\(p\)</span>.</p>
<p>What do fixed points of this action look like? If <span class="math">\(gH\)</span> is a coset fixed by <span class="math">\(h \in H\)</span>, then <span class="math">\(hgH = gH\)</span>, i.e., <span class="math">\(g^{-1} h g \in H\)</span>. If this is true for all <span class="math">\(h\)</span>, then <span class="math">\(g\)</span> lies in the normalizer of <span class="math">\(H\)</span>. The converse is also true, since these implications were all reversible. This means that <span class="math">\(N(H)\)</span> is composed of the cosets of <span class="math">\(H\)</span> that are fixed points.</p>
<p>Combining the two observations above, we conclude that <span class="math">\([N(H) : H]\)</span> is divisible by <span class="math">\(p\)</span>. Therefore, by Cauchy’s theorem, there’s some subgroup of order <span class="math">\(p\)</span> in <span class="math">\(N(H)/H\)</span>. Lifting this subgroup to <span class="math">\(N(H)\)</span>, we get a subgroup of size <span class="math">\(p \cdot |H| = p^{i+1}\)</span>. This is the <span class="math">\(H'\)</span> we were looking for.</p>
<h2>Sylow 2</h2>
<p>Let <span class="math">\(P\)</span> and <span class="math">\(Q\)</span> be two Sylow <span class="math">\(p\)</span>-subgroups of <span class="math">\(G\)</span>. We want to show they are conjugate.</p>
<p>There is a natural action of <span class="math">\(P\)</span> on <span class="math">\(G\)</span> by multiplication, and this descends to an action of <span class="math">\(P\)</span> on <span class="math">\(G/Q\)</span> (again, left coset space). From our lemma, the number of fixed points of this action is equivalent to <span class="math">\(|G/Q|\)</span>, mod <span class="math">\(p\)</span>. But since <span class="math">\(Q\)</span> is a Sylow <span class="math">\(p\)</span>-subgroup, <span class="math">\(|G/Q|\)</span> is not divisible by <span class="math">\(p\)</span>. This means that the number of fixed points cannot be zero; i.e., there is at least one fixed point for this action. This is some <span class="math">\(gQ\)</span> such that <span class="math">\(pgQ = gQ\)</span> for all <span class="math">\(p \in P\)</span>. Or, rearranging the terms, a <span class="math">\(g\)</span> such that <span class="math">\(g^{-1}pg \in Q\)</span> for all <span class="math">\(p \in P\)</span>. Since <span class="math">\(P\)</span> and <span class="math">\(Q\)</span> are the same size, being Sylow <span class="math">\(p\)</span>-subgroups, this means that <span class="math">\(g^{-1}Pg = Q\)</span>, and so they are indeed conjugate.</p>
<h2>Sylow 3</h2>
<p>Let <span class="math">\(P\)</span> be a particular Sylow <span class="math">\(p\)</span>-subgroup, and let it act on the set of <em>all</em> Sylow <span class="math">\(p\)</span>-subgroups by conjugation. We claim that <span class="math">\(P\)</span> is the only fixed point of this action. This would, by our lemma (we’re getting so much mileage out of this baby), instantly tell us that <span class="math">\(n_p \equiv 1 \bmod p\)</span>.</p>
<p>Consider some fixed point <span class="math">\(Q\)</span>. Then for any <span class="math">\(p \in P\)</span>, <span class="math">\(p^{-1}Qp = Q\)</span>, which means that <span class="math">\(P\)</span> lies in the normalizer of <span class="math">\(Q\)</span>. Since both <span class="math">\(P\)</span> and <span class="math">\(Q\)</span> are Sylow <span class="math">\(p\)</span>-subgroups of <span class="math">\(G\)</span>, they are both Sylow <span class="math">\(p\)</span>-subgroups of <span class="math">\(N(Q)\)</span>. By Sylow 2, they must be conjugate, but since <span class="math">\(Q\)</span> is normal in <span class="math">\(N(Q)\)</span>, it’s not going anywhere under conjugation. Thus <span class="math">\(Q\)</span> must equal <span class="math">\(P\)</span>.</p>
<p>Next, we show that <span class="math">\(n_p = [G : N(P)]\)</span>. Consider the action of <span class="math">\(G\)</span> by conjugation on the set of Sylow <span class="math">\(p\)</span>-subgroups. There’s only one orbit, because of Sylow 2, and by orbit-stabilizer, it has size <span class="math">\([G : \Stab(P)]\)</span>. But the stabilizer of <span class="math">\(P\)</span> is just the normalizer, so <span class="math">\(n_p = [G : N(P)]\)</span>, as desired.</p>
<p>Lastly, since <span class="math">\(m = [G : P] = [G : N(P)] [N(P) : P]\)</span>, we get that <span class="math">\(n_p\)</span> divides <span class="math">\(m\)</span> for free.</p>
<h2>Applications</h2>
<p>Cool! These are nice theorems, how do we put them to use? Let’s look at some example applications.</p>
<hr>
<p><em>Show that <span class="math">\(\ZZ_{35}\)</span> is the only group of size <span class="math">\(35\)</span>.</em></p>
<p>Let <span class="math">\(G\)</span> be a group of size <span class="math">\(35\)</span>. We’ll consider its Sylow <span class="math">\(5\)</span> and <span class="math">\(7\)</span>-subgroups. By Sylow 3, we know that <span class="math">\(n_5 \equiv 1 \bmod 5\)</span>, and divides <span class="math">\(7\)</span>. This means it’s gotta be <span class="math">\(1\)</span>, which means <span class="math">\(G\)</span> has a normal subgroup of size <span class="math">\(5\)</span>. Likewise, <span class="math">\(n_7 \equiv 1 \bmod 7\)</span>, and divides <span class="math">\(5\)</span>, so <span class="math">\(G\)</span> has a normal subgroup of size <span class="math">\(7\)</span> as well. They intersect trivially, since their sizes are relatively prime, so <span class="math">\(G\)</span> is a direct product of these groups. Therefore, <span class="math">\(G \cong \ZZ_5 \times \ZZ_7\)</span>, which is <span class="math">\(\ZZ_{35}\)</span>.</p>
<hr>
<p><em>Classify all groups of order <span class="math">\(105\)</span>.</em></p>
<p>Let <span class="math">\(G\)</span> be a group of order <span class="math">\(105\)</span>. First, we show that it has normal Sylow <span class="math">\(5\)</span>- and <span class="math">\(7\)</span>-subgroups. Sylow 3 restricts <span class="math">\(n_5\)</span> to <span class="math">\(1\)</span> or <span class="math">\(21\)</span>, and <span class="math">\(n_7\)</span> to <span class="math">\(1\)</span> or <span class="math">\(15\)</span>.</p>
<p>If <span class="math">\(n_5 = 1\)</span>, then there’s a unique Sylow <span class="math">\(5\)</span>-subgroup <span class="math">\(N_5\)</span>. Picking out some Sylow <span class="math">\(7\)</span>-subgroup <span class="math">\(P_7\)</span>, we get a subgroup <span class="math">\(H = N_5 P_7\)</span> of size <span class="math">\(35\)</span> (the normality of <span class="math">\(N_5\)</span> is necessary for this to be a subgroup). But from our previous exercise, we know that this must be isomorphic to <span class="math">\(\ZZ_{35}\)</span>. Since it’s abelian, <span class="math">\(P_7\)</span> must of course be normal in <span class="math">\(H\)</span>. This means that the normalizer <span class="math">\(N(P_7) \supseteq H\)</span>. Since <span class="math">\(n_7 = [G : N(P_7)] \le [G : H] = 3\)</span>, we are forced to conclude that <span class="math">\(n_7 = 1\)</span> as well.</p>
<p>Likewise, if <span class="math">\(n_7 = 1\)</span>, we can construct a subgroup <span class="math">\(H = P_5 N_7\)</span> isomorphic to <span class="math">\(\ZZ_{35}\)</span>, in which <span class="math">\(P_5\)</span> is normal. The index of <span class="math">\(H\)</span> here is <span class="math">\(7\)</span>, and this also pins down <span class="math">\(n_5 = 1\)</span>.</p>
<p>If neither of these is <span class="math">\(1\)</span>, then we run out of elements. These subgroups intersect pairwise trivially (because they have prime order), and so we would have <span class="math">\(21 \cdot 4\)</span> non-identity elements from the Sylow <span class="math">\(5\)</span>-subgroups, and <span class="math">\(15 \cdot 6\)</span> non-identity elements from the Sylow <span class="math">\(7\)</span>-subgroups. Adding in the identity, this is a total of <span class="math">\(175\)</span> elements, way too many.</p>
<p>So <span class="math">\(G\)</span> has normal Sylow <span class="math">\(5\)</span>- and <span class="math">\(7\)</span>-subgroups, and their product is a subgroup <span class="math">\(H\)</span> of size <span class="math">\(35\)</span>. As the product of normal subgroups, it is itself normal. Cauchy’s theorem gives us an element <span class="math">\(x\)</span> of order <span class="math">\(3\)</span>, and it generates a subgroup <span class="math">\(K\)</span>. Since <span class="math">\(H\)</span> and <span class="math">\(K\)</span> intersect trivially, <span class="math">\(HK\)</span> is the whole group, and so <span class="math">\(G\)</span> is a semidirect product of <span class="math">\(H\)</span> and <span class="math">\(K\)</span>.</p>
<p>What options do we have for our twisting homomorphism <span class="math">\(\phi : K \to \Aut(H)\)</span>? All we have to do is specify <span class="math">\(\phi(x)\)</span>, and all we need is that <span class="math">\(\phi(x)^3\)</span> is the identity.</p>
<p>The automorphisms of <span class="math">\(\ZZ_n\)</span> are those given by multiplying by some <span class="math">\(a\)</span> relatively prime to <span class="math">\(n\)</span>. As such, the automorphisms of <span class="math">\(\ZZ_{35}\)</span> with order dividing <span class="math">\(3\)</span> are <span class="math">\((r \mapsto ar)\)</span>, where <span class="math">\(a^3 \equiv 1 \bmod 35\)</span>. The only such solutions are <span class="math">\(1, 11, 16\)</span>.</p>
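<p>It’s easy to confirm this list of solutions by brute force; a couple of lines of Python will do:</p>

```python
from math import gcd

# units of Z_35 whose cube is 1, i.e. automorphisms of order dividing 3
roots = [a for a in range(1, 35) if gcd(a, 35) == 1 and pow(a, 3, 35) == 1]
print(roots)   # [1, 11, 16]
```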
<p>If <span class="math">\(a = 1\)</span>, then this is the trivial automorphism, and so <span class="math">\(G \cong \ZZ_3 \times \ZZ_{35} \cong \ZZ_{105}\)</span>.</p>
<p>It turns out that the groups for <span class="math">\(a = 11\)</span> and <span class="math">\(a = 16\)</span> are isomorphic, but I can’t figure out a clean way to show it at the moment. Stay tuned. <!--TODO--></p>
<hr>
<p><em>Show <span class="math">\(A_5\)</span> is the smallest non-abelian simple group.</em></p>
<p>To prove this, we need to eliminate the possibility of a simple non-abelian group of any smaller size. First, we can eliminate primes; any group of size <span class="math">\(p\)</span> is cyclic, hence abelian.</p>
<p>We can also eliminate prime powers. Any group of prime power order has a non-trivial center, so it cannot be simple.</p>
<p>Next, we eliminate anything that is <span class="math">\(2\)</span> mod <span class="math">\(4\)</span>. Such a number is equal to <span class="math">\(2m\)</span> with <span class="math">\(m\)</span> odd. If <span class="math">\(G\)</span> is a group of size <span class="math">\(2m\)</span>, let <span class="math">\(G\)</span> act on itself by multiplication. This gives us a map <span class="math">\(\phi : G \to S_{2m}\)</span> sending <span class="math">\(g\)</span> to the permutation it induces. By Cauchy’s theorem, there’s an element of order <span class="math">\(2\)</span>. This induces a product of <span class="math">\(m\)</span> transpositions, and thus an odd permutation. So the map <span class="math">\(\sgn \circ \phi : G \to \{ \pm 1 \}\)</span> is surjective, and so its kernel is a non-trivial proper subgroup of <span class="math">\(G\)</span>. (Unless <span class="math">\(G\)</span> has order <span class="math">\(2\)</span>, but we already handled that case.)</p>
<p>Our last big sweep will be to eliminate groups of size <span class="math">\(p^k m\)</span>, where <span class="math">\(p \nmid m\)</span> and <span class="math">\(m < p\)</span>. Since <span class="math">\(n_p\)</span> divides <span class="math">\(m\)</span>, we have <span class="math">\(n_p \le m < p\)</span>. But <span class="math">\(n_p\)</span> is <span class="math">\(1\)</span> mod <span class="math">\(p\)</span>, and so must be <span class="math">\(1\)</span>. If there is a single Sylow <span class="math">\(p\)</span>-subgroup, it must be normal. This eliminates 15, 20, 21, 28, 33, 35, 39, 44, 51, 52, 55, and 57.</p>
<p>This leaves us with 12, 24, 36, 40, 45, 48, and 56.</p>
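<p>We can double-check these sweeps mechanically. Here’s a Python sketch (the helper names are mine) that applies all three eliminations to every order below <span class="math">\(60\)</span>:</p>

```python
def prime_factors(n):
    """Return {prime: exponent} for n by trial division."""
    fs, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            fs[d] = fs.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        fs[n] = fs.get(n, 0) + 1
    return fs

def survives(n):
    """Orders not ruled out by the sweeps above."""
    fs = prime_factors(n)
    if len(fs) == 1:            # prime or prime power: nontrivial center
        return False
    if n % 4 == 2:              # 2m with m odd: the sign-map argument
        return False
    for p, k in fs.items():     # some p with m = n / p^k < p
        if n // p**k < p:
            return False
    return True

print([n for n in range(2, 60) if survives(n)])
# [12, 24, 36, 40, 45, 48, 56]
```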
<p><span class="math">\(|G|=40\)</span>: From the congruence conditions, we know that <span class="math">\(n_5\)</span> is <span class="math">\(1\)</span> mod <span class="math">\(5\)</span> and divides <span class="math">\(8\)</span>. But this forces it to be <span class="math">\(1\)</span>, so there is a unique, hence normal, Sylow <span class="math">\(5\)</span>-subgroup.</p>
<p><span class="math">\(|G|=45\)</span>: Similar to <span class="math">\(|G|=40\)</span>, the arithmetic restrictions force <span class="math">\(n_5\)</span> to be <span class="math">\(1\)</span>.</p>
<p><span class="math">\(|G| = 12\)</span>: We know that <span class="math">\(n_3\)</span> is either <span class="math">\(1\)</span> or <span class="math">\(4\)</span>. If it’s not <span class="math">\(1\)</span>, there’s <span class="math">\(4\)</span> Sylow <span class="math">\(3\)</span>-subgroups, and because they have prime order, they intersect trivially. This gives <span class="math">\(8\)</span> elements of order <span class="math">\(3\)</span>, leaving <span class="math">\(4\)</span> other elements to constitute the Sylow <span class="math">\(2\)</span>-subgroups. But each Sylow <span class="math">\(2\)</span>-subgroup has <span class="math">\(4\)</span> elements, and so there is a unique (hence normal) one.</p>
<p><span class="math">\(|G| = 56\)</span>: Similar to the case for <span class="math">\(12\)</span>. If <span class="math">\(n_7\)</span> is not <span class="math">\(1\)</span>, it is <span class="math">\(8\)</span>, yielding <span class="math">\(48\)</span> elements of order <span class="math">\(7\)</span>. The leftover <span class="math">\(8\)</span> elements form the unique Sylow <span class="math">\(2\)</span>-subgroup.</p>
<p>For the other three cases we need some stronger stuff.</p>
<p><em>Claim</em>: if <span class="math">\(G\)</span> is simple and non-abelian, then <span class="math">\(|G|\)</span> must divide <span class="math">\(n_p!\)</span> for every prime <span class="math">\(p\)</span> dividing <span class="math">\(|G|\)</span>.</p>
<p><em>Proof</em>: Let <span class="math">\(G\)</span> act on the Sylow <span class="math">\(p\)</span>-subgroups by conjugation. Because there are <span class="math">\(n_p\)</span> of them, this gives us a homomorphism <span class="math">\(\phi : G \to S_{n_p}\)</span>. Since <span class="math">\(G\)</span> is simple, <span class="math">\(\ker \phi\)</span> is either trivial or all of <span class="math">\(G\)</span>. Because all Sylow <span class="math">\(p\)</span>-subgroups are conjugate, the latter situation only occurs when there is only one of them, something impossible if <span class="math">\(G\)</span> is simple and non-abelian.
<!-- TODO you don't need abelian-ness, having no other conjugates means you're normal! --></p>
<p>This leaves us with the former case, where the kernel is trivial, and thus <span class="math">\(\phi\)</span> is an injection. Identifying <span class="math">\(G\)</span> as a subgroup of <span class="math">\(S_{n_p}\)</span>, we get that <span class="math">\(|G|\)</span> divides <span class="math">\(n_p!\)</span> as promised.</p>
<p>We can now eliminate the last cases.</p>
<p><span class="math">\(|G|=24\)</span>: We know that <span class="math">\(n_2\)</span> is either <span class="math">\(1\)</span> or <span class="math">\(3\)</span>, by the usual congruence conditions. But now we have a new tool. If <span class="math">\(G\)</span> were simple, then <span class="math">\(24\)</span> would divide <span class="math">\(n_2!\)</span>, which it can’t in either case. So <span class="math">\(G\)</span> can’t be simple.</p>
<p><span class="math">\(|G|=36\)</span>: We know <span class="math">\(n_3\)</span> is <span class="math">\(1\)</span> or <span class="math">\(4\)</span>. If <span class="math">\(G\)</span> is simple, then <span class="math">\(36\)</span> would divide <span class="math">\(n_3!\)</span>, which it can’t.</p>
<p><span class="math">\(|G|=48\)</span>: Identical to the case for <span class="math">\(24\)</span>.</p>
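<p>The same arithmetic can be scripted; this sketch (function names are mine) confirms that for <span class="math">\(24\)</span>, <span class="math">\(36\)</span>, and <span class="math">\(48\)</span>, no allowed <span class="math">\(n_p > 1\)</span> has <span class="math">\(|G|\)</span> dividing <span class="math">\(n_p!\)</span>:</p>

```python
from math import factorial

def sylow_candidates(order, p):
    # divisors of order / p^k that are congruent to 1 mod p (Sylow 3)
    m = order
    while m % p == 0:
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

def rules_out_simple(order, p):
    """True when every allowed n_p kills simplicity: n_p = 1 gives a normal
    Sylow p-subgroup, and n_p > 1 needs order | n_p! to embed G in S_{n_p}."""
    return all(factorial(n) % order != 0
               for n in sylow_candidates(order, p) if n > 1)

for order, p in [(24, 2), (36, 3), (48, 2)]:
    print(order, rules_out_simple(order, p))   # True for each
```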
<p>Phew!</p>
<p>This was a lot of work. Back when I was in high school, we had to prove this without the Sylow theorems, and by god we appreciated them. Get off my lawn!</p>
<p>(But actually though, that was an… experience.)</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>The Heawood Number2018-10-22T00:00:00-07:002018-10-22T00:00:00-07:00Henry Swansontag:mathmondays.com,2018-10-22:/heawood<p>The <a href="https://en.wikipedia.org/wiki/Four_color_theorem">four-color theorem</a> tells us that we can color any map using only four colors, such that no adjacent regions have the same color.</p>
<p>This is true for any map of the world, whether it’s on a globe or laid out flat. But what about maps on other surfaces?</p>
<!-- more -->
<hr>
<p>The mathematical formalization of the four-color theorem is: “any planar graph is 4-colorable”. Let’s break down what that means.</p>
<p>Graph here refers to a collection of vertices and edges, not a plot or a chart. For our purposes, we’ll only consider <strong>simple</strong> graphs, that is, graphs where a) there is no edge from a point to itself and b) for any pair of points, there’s at most one edge between them. A graph is <strong>planar</strong> if we can embed it in the plane (i.e., draw it on a sheet of paper) without any of the edges crossing.</p>
<p>A <em>coloring</em> of a graph is a way of coloring the vertices of the graph such that no two vertices of the same color are connected. Note that self-loops make a graph impossible to color, and multiple edges between vertices don’t matter. This is why we concentrate only on simple graphs.</p>
<p>We say a graph is <span class="math">\(k\)</span>-colorable if there exists a coloring with <span class="math">\(k\)</span> colors.</p>
<p><img alt="TODO tooltip here" height="auto" src="/images/heawood/1.png" width="100%"></p>
<p>So what does this have to do with maps? The problem of coloring a map can be rephrased as a problem about coloring graphs. And since the field is called “graph theory”, and not “map theory”, that’s what we’ll do. Put a vertex for each country, and connect two vertices if the corresponding countries are adjacent. If you can color the map, then the corresponding graph can be colored in the same way. Likewise, if you can color the graph, you can use the same color assignment to color the map.</p>
<p><img alt="TODO tooltip here" height="auto" src="/images/heawood/2.png" width="100%"></p>
<p>We’re looking to answer the question: for a surface <span class="math">\(S\)</span>, how many colors do we need to guarantee we can color any graph embedded in <span class="math">\(S\)</span>? To do this, we’ll need to make use of an invariant called the “Euler characteristic”.</p>
<h1>Euler Characteristic</h1>
<p>Euler’s formula for planar graphs says that for any planar graph, <span class="math">\(V - E + F = 2\)</span>, where <span class="math">\(V\)</span> is the number of vertices, <span class="math">\(E\)</span> is the number of edges, and <span class="math">\(F\)</span> is the number of faces (including the outside face).</p>
<p>This also applies to graphs embedded on the sphere. Imagine taking a pin and poking a hole in the middle of one of the faces. Stretch this hole out until it is wide enough that you can flatten the entire sphere into a disk. Now you have a graph embedded in the plane. (This explains why we like to consider the outside face a legitimate face.)</p>
<p>But this does not apply to graphs embedded on other surfaces! Consider the following graph on the torus:</p>
<p><img alt="TODO tooltip here" height="auto" src="/images/heawood/3.png" width="100%"></p>
<p>This has 16 vertices, 32 edges, and 16 faces (count carefully, not all of them are obvious), giving <span class="math">\(V - E + F = 0\)</span>! Euler’s formula doesn’t work on the torus, but maybe we can salvage it?</p>
<p>Let’s try some examples:</p>
<p><img alt="TODO tooltip here" height="auto" src="/images/heawood/4.png" width="100%"></p>
<p>It seems we <em>usually</em> get <span class="math">\(0\)</span>, but sometimes we do get a <span class="math">\(2\)</span>, like before. To resolve this, note that in all the examples where we don’t get <span class="math">\(0\)</span>, some of the faces have “holes”. If you took the face in the <span class="math">\(3 - 3 + 1\)</span> example and laid it out flat, it’d look like a ring, not a disk.</p>
<p>So we’ll equip ourselves with another definition: if a graph is embedded in a surface, and none of the resulting faces have holes, we call that embedding <em>honest</em>. (This isn’t standard terminology, but you can’t stop me from naming things whatever I want. Try me.) It turns out that if you honestly embed a graph into the torus, you’ll always get <span class="math">\(V - E + F = 0\)</span>, no matter which graph you use, or how it’s embedded.</p>
<p>In fact, for any surface <span class="math">\(S\)</span>, we have a similar result: there’s a fixed integer <span class="math">\(\chi(S)\)</span> such that <span class="math">\(V - E + F = \chi(S)\)</span>, for any honest embedding of any graph. We call this number the <em>Euler characteristic</em> for the surface. For the plane and the sphere, <span class="math">\(\chi = 2\)</span>. For the torus, <span class="math">\(\chi = 0\)</span>. Here’s some other examples of surfaces and their Euler characteristics:</p>
<p><img alt="TODO tooltip here" height="auto" src="/images/heawood/5.png" width="100%"></p>
<h1>The Heawood Number</h1>
<p>Now we can approach the generalized four-color theorem. Armed with the Euler characteristic, we define the <strong>Heawood number</strong> of a surface with Euler characteristic <span class="math">\(\chi\)</span> as:
</p>
<div class="math">$$ H(\chi) = \left\lfloor \frac{7 + \sqrt{49 - 24 \chi}}{2} \right\rfloor $$</div>
<p>Yeah. That’s… unmotivated.</p>
<p>We claim that any graph that can be embedded on a surface with characteristic <span class="math">\(\chi\)</span>, honestly or otherwise, can be colored with at most <span class="math">\(H(\chi)\)</span> colors. For the sphere, <span class="math">\(H(2) = 4\)</span>, so our claim becomes the famous Four-Color Theorem, which is Very Hard To Prove (TM). We’ll deliberately exclude that case, like the cowards we are.</p>
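<p>Computing <span class="math">\(H(\chi)\)</span> is straightforward. A tiny Python sketch, using the fact that flooring the square root before halving gives the same result when <span class="math">\(49 - 24\chi\)</span> is a nonnegative integer:</p>

```python
from math import isqrt

def heawood(chi):
    """Upper bound on colors needed on a surface of Euler characteristic chi.
    Valid for chi <= 2, so that 49 - 24*chi is nonnegative."""
    return (7 + isqrt(49 - 24 * chi)) // 2

print(heawood(2), heawood(1), heawood(0), heawood(-2))   # 4 6 7 8
```

So the sphere gets <span class="math">\(4\)</span>, the projective plane <span class="math">\(6\)</span>, and the torus and Klein bottle <span class="math">\(7\)</span>.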
<hr>
<p>The first step is to prove a lemma about the minimum degree of the graph. That’ll get us most of the way there.</p>
<p>Let <span class="math">\(S\)</span> be a surface that isn’t the sphere, and embed a graph <span class="math">\(G\)</span> on it, honestly or not. Let <span class="math">\(V\)</span>, <span class="math">\(E\)</span>, and <span class="math">\(F\)</span> be the usual, and let <span class="math">\(\delta\)</span> be the minimum degree of a vertex in <span class="math">\(G\)</span>. We claim that <span class="math">\(\delta \le H(\chi) - 1\)</span>.</p>
<p>Proof: First, we can extend this embedding to an honest embedding, by adding extra edges to cut up the faces. This can only make <span class="math">\(\delta\)</span> bigger, so if we can prove <span class="math">\(\delta \le H(\chi) - 1\)</span> for this new graph, it was also true for the old graph.</p>
<p>Next, consider the following inequalities, the motivations for which are pulled directly from my ass.</p>
<ul>
<li>Since each face has at least three edges, we know that <span class="math">\(2E \ge 3F\)</span>.</li>
<li>The sum of the degrees for all vertices is <span class="math">\(2E\)</span>. Thus, <span class="math">\(2E \ge \delta V\)</span>.</li>
<li>A vertex cannot be connected to more than <span class="math">\(V - 1\)</span> other vertices, so <span class="math">\(\delta + 1 \le V\)</span>.</li>
</ul>
<p>Now, from the definition of Euler characteristic, we have:
</p>
<div class="math">$$
\begin{align*}
\chi &= V - E + F \\
6\chi &= 6V - 6E + 6F \\
6\chi &\le 6V - 2E \\
6\chi &\le 6V - \delta V = (6 - \delta) V \\
\end{align*}
$$</div>
<p>Here we must split into cases, depending on the sign of <span class="math">\(\chi\)</span>.</p>
<p>If <span class="math">\(\chi \le 0\)</span>, then we make both sides positive before making use of our last inequality:
</p>
<div class="math">$$ -6\chi \ge (\delta - 6)V \ge (\delta - 6)(\delta + 1) = \delta^2 - 5 \delta - 6 $$</div>
<p>Now use the handy-dandy quadratic formula; we get that <span class="math">\(\delta\)</span> is at most <span class="math">\(\frac{5 + \sqrt{49 - 24 \chi}}{2} = H(\chi) - 1\)</span>. Boom.</p>
<p>Otherwise, <span class="math">\(\chi > 0\)</span>, and by the <a href="https://en.wikipedia.org/wiki/Surface_%28topology%29#Classification_of_closed_surfaces">classification of compact surfaces</a>, we know <span class="math">\(S\)</span> must be the sphere or the projective plane. We’re explicitly excluding the sphere, so <span class="math">\(S\)</span> must be the projective plane, which has Euler characteristic 1. Plugging that in, we get that <span class="math">\(6 \le (6 - \delta) V\)</span>. Since the right side is positive, we must have <span class="math">\(\delta < 6\)</span>. Because <span class="math">\(H(1) = 6\)</span>, we can still guarantee that <span class="math">\(\delta \le H(\chi) - 1\)</span>.</p>
<p>So for any graph <span class="math">\(G\)</span> embedded in <span class="math">\(S\)</span>, honestly or otherwise, there is a vertex with degree at most <span class="math">\(H(\chi) - 1\)</span>.</p>
<hr>
<p>We’re basically done! We’ll describe an explicit procedure to color graphs on <span class="math">\(S\)</span> with <span class="math">\(H(\chi)\)</span> colors.</p>
<p>Let <span class="math">\(G\)</span> be a graph embedded on <span class="math">\(S\)</span>. Our base case is the graph with one vertex; it can trivially be colored. Otherwise, consider <span class="math">\(G\)</span> with <span class="math">\(n \ge 2\)</span> vertices. By our lemma, it has some vertex <span class="math">\(v\)</span> with degree at most <span class="math">\(H(\chi) - 1\)</span>. Apply our procedure to the subgraph <span class="math">\(G - v\)</span>, coloring it with <span class="math">\(H(\chi)\)</span> colors. Since <span class="math">\(v\)</span> has strictly fewer than <span class="math">\(H(\chi)\)</span> neighbors, there will be at least one color available for us to color <span class="math">\(v\)</span> with, and so we can color all of <span class="math">\(G\)</span>.</p>
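<p>The procedure above translates directly into code. Here’s a Python sketch (my own naming, not tied to any particular surface) that peels off minimum-degree vertices and then colors them back greedily:</p>

```python
def color_graph(adj, palette):
    """Greedy coloring following the proof: repeatedly remove a minimum-degree
    vertex, then color vertices in the reverse order of removal.
    adj: dict vertex -> set of neighbors. Returns dict vertex -> color index."""
    remaining = {v: set(ns) for v, ns in adj.items()}
    order = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))  # min-degree vertex
        order.append(v)
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    coloring = {}
    for v in reversed(order):                                # add vertices back
        used = {coloring[u] for u in adj[v] if u in coloring}
        coloring[v] = next(c for c in range(palette) if c not in used)
    return coloring

# K4 embeds in the sphere, where H(2) = 4 colors always suffice
k4 = {a: {b for b in range(4) if b != a} for a in range(4)}
c = color_graph(k4, 4)
assert all(c[u] != c[v] for u in k4 for v in k4[u])
```

For a graph embedded on a surface of characteristic <span class="math">\(\chi\)</span>, the lemma guarantees that the minimum degree at every step is below <span class="math">\(H(\chi)\)</span>, so a palette of <span class="math">\(H(\chi)\)</span> colors never runs out.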
<h1>Conclusions</h1>
<p>We showed that any graph <span class="math">\(G\)</span> embedded in <span class="math">\(S\)</span>, honestly or otherwise, can be colored with <span class="math">\(H(\chi) = \left\lfloor \frac{7 + \sqrt{49 - 24 \chi}}{2} \right\rfloor\)</span> colors. The only case we decided not to handle was when <span class="math">\(S\)</span> is the sphere. Unfortunately, that case is much harder. The proof above was discovered in 1890 by Percy John Heawood, after whom the number is named. The Four-Color Theorem wasn’t proven until much later, in 1976, by Kenneth Appel and Wolfgang Haken. And what a controversial proof it was! They managed to reduce the problem to checking a particular property of 1,936 graphs. This wasn’t feasible to do by hand, so they used a computer to check those cases. This was the first computer-aided proof, and it ruffled quite a few feathers.</p>
<p>Secondly, we only established an upper bound on the number of colors we need in our palette. Is there a graph that requires all <span class="math">\(H(\chi)\)</span> colors? Or can we lower the bound a bit? The Heawood conjecture is the claim that we can’t; i.e., that this bound is sharp. And it’s mostly true. In 1968, Gerhard Ringel and Ted Youngs showed that, on almost any surface, you can embed the complete graph on <span class="math">\(H(\chi)\)</span> vertices. Since that graph requires all <span class="math">\(H(\chi)\)</span> colors, that shows the bound is sharp. The only exception is the Klein bottle, where the conjecture predicts <span class="math">\(H(0)=7\)</span> colors are needed, but in fact, <span class="math">\(6\)</span> colors suffice to color any graph.</p>
<p>A maximal coloring of the Klein bottle is shown below:</p>
<div class="image-container">
<p><img alt="A maximal coloring of the Klein bottle" height="250px" src="/images/heawood/6.png"></p>
</div>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Linearity of Expectation2018-10-15T00:00:00-07:002018-10-15T00:00:00-07:00Henry Swansontag:mathmondays.com,2018-10-15:/linearity-expectation<p>To introduce this topic, let’s start with an innocuous problem:</p>
<blockquote>
<p>You have <span class="math">\(10\)</span> six-sided dice. If you roll all of them, what is the expected sum of the faces?</p>
</blockquote>
<p>Your intuition should tell you that it’s <span class="math">\(35\)</span>. But what’s really going on here is an example of a slick principle called <strong>linearity of expectation</strong>.</p>
<!-- more -->
<hr>
<p>We’re not actually computing the probability of getting <span class="math">\(10, 11, \ldots, 60\)</span>, and summing it all up. Implicitly, we are making the following line of argument: the expected value of the first die is <span class="math">\(3.5\)</span>, and so the expected value for <span class="math">\(k\)</span> dice is <span class="math">\(3.5k\)</span>. This relies on the following claim: given two random variables <span class="math">\(X\)</span> and <span class="math">\(Y\)</span>, the expected value of their sum, <span class="math">\(E[X + Y]\)</span>, is just <span class="math">\(E[X] + E[Y]\)</span>.</p>
<p>This feels intuitively true, and proving it is straightforward. Let <span class="math">\(\Omega\)</span> be the space of possible outcomes. Then
</p>
<div class="math">$$
\begin{align*}
E[X + Y] &= \sum_{\omega \in \Omega} p(\omega) (X + Y)(\omega) \\
&= \sum_{\omega \in \Omega} p(\omega) (X(\omega) + Y(\omega)) \\
&= \sum_{\omega \in \Omega} p(\omega) X(\omega) + \sum_{\omega \in \Omega} p(\omega) Y(\omega) \\
&= E[X] + E[Y]
\end{align*}
$$</div>
<p>But interestingly enough, at no point did we require <span class="math">\(X\)</span> and <span class="math">\(Y\)</span> to be independent. This still works even when <span class="math">\(X\)</span> and <span class="math">\(Y\)</span> are correlated! For some sanity-checking examples, consider <span class="math">\(X = Y\)</span> and <span class="math">\(X = -Y\)</span>.</p>
<p>This principle, which is rather obvious when <span class="math">\(X\)</span> and <span class="math">\(Y\)</span> are independent (so much so that we often use it unconsciously), is unexpectedly powerful when applied to dependent variables. We’ll explore the concept through several example problems.</p>
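<p>Here’s a tiny numerical illustration of that claim (my example, not from the post): let <span class="math">\(X\)</span> be the value of a die roll and <span class="math">\(Y\)</span> the indicator that the roll is even. These are clearly dependent, yet exact enumeration confirms <span class="math">\(E[X + Y] = E[X] + E[Y]\)</span>.</p>

```python
from fractions import Fraction

# Roll one die: X is its value, Y is 1 if the roll is even, else 0.
# X and Y are dependent, but linearity of expectation still holds.
outcomes = range(1, 7)
p = Fraction(1, 6)  # each face is equally likely

E_X = sum(p * d for d in outcomes)
E_Y = sum(p * (1 if d % 2 == 0 else 0) for d in outcomes)
E_sum = sum(p * (d + (1 if d % 2 == 0 else 0)) for d in outcomes)

assert E_sum == E_X + E_Y  # 4 == 7/2 + 1/2, despite the correlation
```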
<h1>Gumballs</h1>
<blockquote>
<p>Imagine a very large gumball machine, with <span class="math">\(4\)</span> colors of gumballs in it, evenly distributed. We only have enough money for <span class="math">\(6\)</span> gumballs; what’s the expected number of colors we will receive? Assume that the machine has so many gumballs that the ones we take out don’t matter; effectively, we are drawing with replacement.</p>
</blockquote>
<p>Let’s compute this the naive way first. Let’s count the number of ways we can get each number of colors, and do the appropriate weighted sum.</p>
<p>There are <span class="math">\(4\)</span> ways we can get only one color.</p>
<p>For any two colors, there are <span class="math">\(2^6 = 64\)</span> sequences of gumballs using just those colors, and <span class="math">\(2\)</span> of those use only a single color. So each of the <span class="math">\(6\)</span> pairs of colors contributes <span class="math">\(64 - 2 = 62\)</span> sequences, giving <span class="math">\(62 \cdot 6 = 372\)</span> ways to get exactly two colors.</p>
<p>Similarly, for any three colors, there are <span class="math">\(3^6 = 729\)</span> sequences using just those colors. Of these, <span class="math">\(3 \cdot 62 = 186\)</span> use exactly two of the colors and <span class="math">\(3\)</span> use only one, leaving <span class="math">\(540\)</span> sequences per triplet. There are <span class="math">\(4\)</span> possible triplets, giving <span class="math">\(540 \cdot 4 = 2160\)</span> ways to get exactly three colors.</p>
<p>All other cases have four colors: <span class="math">\(4^6 - 2160 - 372 - 4 = 1560\)</span> possible ways.</p>
<p>Now we do the weighted sum. Each possible sequence of gumballs has probability <span class="math">\(1/4^6\)</span> of occurring, so the expected value of the number of colors is:
</p>
<div class="math">$$ 1 \cdot \frac{4}{4^6} + 2 \cdot \frac{372}{4^6} + 3 \cdot \frac{2160}{4^6} + 4 \cdot \frac{1560}{4^6} = \frac{3367}{1024} \approx 3.288 $$</div>
<p>It’s doable, but one can imagine this is much harder for larger numbers.</p>
<hr>
<p>Let’s take another go at it. For the <span class="math">\(i\)</span>th color, define <span class="math">\(X_i\)</span> to be <span class="math">\(1\)</span> if we get at least one gumball of that color, and <span class="math">\(0\)</span> otherwise. The number of colors we get, <span class="math">\(X\)</span>, is then the sum of the <span class="math">\(X_i\)</span>.</p>
<p>The probability of <em>not</em> getting a gumball of a particular color on a particular draw is <span class="math">\(3/4\)</span>, so the probability of not getting it in <span class="math">\(6\)</span> draws is <span class="math">\((3/4)^6\)</span>. This means that <span class="math">\(E[X_i] = 1 - (3/4)^6 = 3367/4096\)</span>.</p>
<p>The <span class="math">\(X_i\)</span> are not independent; for example, if we know three of them are <span class="math">\(0\)</span>, the last one must be <span class="math">\(1\)</span> (we must draw a gumball of <strong>some</strong> color). But we can still apply linearity of expectation, even to dependent variables.</p>
<p>Thus, the expected number of colors we get is <span class="math">\(E[X] = \sum_{i = 1}^4 E[X_i] = 4 \cdot \frac{3367}{4096} = \frac{3367}{1024}\)</span>, just as we got earlier.</p>
<p>Notably, this approach extends gracefully to when we take <span class="math">\(k\)</span> gumballs with <span class="math">\(n\)</span> available colors. The expected value of each <span class="math">\(X_i\)</span> is then <span class="math">\(1 - (1 - 1/n)^k\)</span>, so the expected value of <span class="math">\(X\)</span> is <span class="math">\(n \left( 1 - (1 - 1/n)^k \right)\)</span>.</p>
<p>(This reveals an interesting approximation: if <span class="math">\(n\)</span> and <span class="math">\(k\)</span> are equal and large, then <span class="math">\((1 - 1/n)^n \approx 1/e\)</span>, so the expected number of colors is <span class="math">\(n(1 - 1/e) \approx 0.63n\)</span>).</p>
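<p>Both calculations are small enough to verify by machine. This quick Python check (mine, not from the post) brute-forces all <span class="math">\(4^6\)</span> draw sequences and compares against the linearity-of-expectation formula:</p>

```python
from fractions import Fraction
from itertools import product

n_colors, k_draws = 4, 6

# Brute force: average the number of distinct colors over all 4^6
# equally likely sequences of draws.
total = sum(len(set(seq)) for seq in product(range(n_colors), repeat=k_draws))
brute = Fraction(total, n_colors ** k_draws)

# Linearity of expectation: n * (1 - (1 - 1/n)^k).
formula = n_colors * (1 - (1 - Fraction(1, n_colors)) ** k_draws)

assert brute == formula == Fraction(3367, 1024)
```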
<h1>Number of Fixed Points</h1>
<p>The variables we saw earlier, which are <span class="math">\(1\)</span> if a condition is true and <span class="math">\(0\)</span> otherwise, are called <strong>indicator variables</strong>, and they are particularly good candidates for linearity of expectation problems.</p>
<blockquote>
<p>After we shuffle a deck of <span class="math">\(n\)</span> cards, what is the expected number of cards that stay in the same position? Equivalently, given a random permutation of <span class="math">\(n\)</span> objects, how many fixed points does it have on average?</p>
</blockquote>
<p>We have no interest in examining all <span class="math">\(n!\)</span> possible outcomes, and summing over the number of fixed points in each. That would be terrible. Instead, we’re going to split our desired variable into several indicator variables, each of which is easier to analyze.</p>
<p>Let <span class="math">\(X_k\)</span> be <span class="math">\(1\)</span> if the <span class="math">\(k\)</span>th card is in the <span class="math">\(k\)</span>th position, and <span class="math">\(0\)</span> otherwise. Then the number of fixed points is <span class="math">\(\sum_k X_k\)</span>.</p>
<p>After shuffling, the <span class="math">\(k\)</span>th card is equally likely to be in any position in the deck. So the chance of ending up in the same place is <span class="math">\(1/n\)</span>, which makes <span class="math">\(E[X_k] = 1/n\)</span>. So by linearity of expectation, <span class="math">\(E[X_1 + \cdots + X_n] = n \cdot \frac{1}{n} = 1\)</span>. So on average, one card will stay in the same place.</p>
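<p>For small decks, this is easy to confirm by brute force. The following check (my own) enumerates every permutation and verifies that the total number of fixed points over all <span class="math">\(n!\)</span> permutations is exactly <span class="math">\(n!\)</span>, so the average is exactly <span class="math">\(1\)</span>:</p>

```python
from itertools import permutations
from math import factorial

# Total fixed points over all permutations of n objects equals n!,
# so the average number of fixed points is exactly 1 for every n.
for n in range(1, 7):
    total = sum(sum(1 for i, x in enumerate(p) if i == x)
                for p in permutations(range(n)))
    assert total == factorial(n)
```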
<h1>Number of Cycles</h1>
<p>We don’t have to limit ourselves to indicator variables: sometimes we can use fractional weights to avoid overcounting.</p>
<blockquote>
<p>Given a random permutation on <span class="math">\(n\)</span> objects, how many cycles does it have, on average?</p>
</blockquote>
<p>As a reminder, the cycles of a permutation are the “connected components”. For example, if <span class="math">\(\sigma\)</span> sends <span class="math">\(1 \to 2\)</span>, <span class="math">\(2 \to 4\)</span>, <span class="math">\(3 \to 6\)</span>, <span class="math">\(4 \to 1\)</span>, <span class="math">\(5 \to 5\)</span>, and <span class="math">\(6 \to 3\)</span>, then the cycles of <span class="math">\(\sigma\)</span> are <span class="math">\((1, 2, 4)\)</span>, <span class="math">\((3, 6)\)</span>, and <span class="math">\((5)\)</span>.</p>
<p>For each <span class="math">\(k\)</span>, let <span class="math">\(X_k = \frac{1}{L}\)</span>, where <span class="math">\(L\)</span> is the length of the cycle of <span class="math">\(\sigma\)</span> containing the number <span class="math">\(k\)</span>. So for the permutation we described, <span class="math">\(X_1 = X_2 = X_4 = 1/3\)</span>, <span class="math">\(X_3 = X_6 = 1/2\)</span>, and <span class="math">\(X_5 = 1\)</span>. Then the number of cycles is <span class="math">\(X_1 + \cdots + X_n\)</span>, since each cycle contributes <span class="math">\(L\)</span> copies of <span class="math">\(1/L\)</span>. As usual, these variables are highly dependent (if <span class="math">\(X_i = 1/5\)</span>, there’d better be four other <span class="math">\(X_j\)</span> that equal <span class="math">\(1/5\)</span> as well), but we can still apply linearity of expectation.</p>
<p>The probability that <span class="math">\(k\)</span> is in a cycle of length <span class="math">\(1\)</span> is <span class="math">\(1/n\)</span>, since <span class="math">\(\sigma\)</span> would have to send <span class="math">\(k\)</span> to itself.</p>
<p>The probability it is in a cycle of length <span class="math">\(2\)</span> is the probability <span class="math">\(k\)</span> is sent to some other number, times the probability that the other number is sent back to <span class="math">\(k\)</span>, i.e. <span class="math">\(\frac{n-1}{n} \cdot \frac{1}{n - 1}\)</span>, which is <span class="math">\(\frac{1}{n}\)</span>.</p>
<p>In general, the probability of being in a cycle of length <span class="math">\(L\)</span> is <span class="math">\(\frac{n-1}{n} \frac{n-2}{n-1} \cdots \frac{n-(L-1)}{n-(L-2)} \cdot \frac{1}{n-(L-1)} = \frac{1}{n}\)</span>. Curiously, this is independent of <span class="math">\(L\)</span>.</p>
<p>So the expected value of <span class="math">\(X_k\)</span> is <span class="math">\(\frac{1}{n} \sum_{L=1}^n \frac{1}{L} = \frac{H_n}{n}\)</span>, where <span class="math">\(H_n\)</span> is the <span class="math">\(n\)</span>th <a href="https://en.wikipedia.org/wiki/Harmonic_number">harmonic number</a>. Then the expected number of cycles is <span class="math">\(E[X_1] + \cdots + E[X_n] = H_n\)</span>.</p>
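<p>Again, small cases are cheap to check exactly. This snippet (mine, not from the post) counts cycles of every permutation and compares the average against the harmonic number <span class="math">\(H_n\)</span>:</p>

```python
from fractions import Fraction
from itertools import permutations

def cycle_count(perm):
    """Number of cycles of a permutation given as a 0-indexed tuple."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles

# Average cycle count over all permutations of n equals H_n exactly.
for n in range(1, 7):
    perms = list(permutations(range(n)))
    avg = Fraction(sum(cycle_count(p) for p in perms), len(perms))
    H_n = sum(Fraction(1, L) for L in range(1, n + 1))
    assert avg == H_n
```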
<h1>Buffon’s Needle</h1>
<p>We’ll finish up with a rather surprising application to the Buffon’s needle problem:</p>
<blockquote>
<p>Consider a gigantic piece of lined paper, with the lines spaced one unit apart. If we throw a needle of length <span class="math">\(1\)</span> onto the paper, what is the probability it crosses a line?</p>
</blockquote>
<p>Technically, we’re only interested in the probability that the needle crosses a line. But because it can cross at most once, this is equal to the expected number of crossings. So if we let <span class="math">\(X_a\)</span> be the number of crossings for a needle of length <span class="math">\(a\)</span>, we’re interested in <span class="math">\(E[X_1]\)</span>.</p>
<p>Take a needle of length <span class="math">\(a + b\)</span>, and paint it, covering the first <span class="math">\(a\)</span> units of it red, and the other <span class="math">\(b\)</span> units blue. Then throw it on the paper. The expected number of crossings is the expected number of red crossings, plus the expected number of blue crossings. But each segment of the needle is just a smaller needle, so the expected number of red crossings is <span class="math">\(E[X_a]\)</span>, and the expected number of blue crossings is <span class="math">\(E[X_b]\)</span>. This lets us conclude, unsurprisingly, that <span class="math">\(E[X_{a+b}] = E[X_a] + E[X_b]\)</span>. This tells us that <span class="math">\(E[X_a]\)</span> is linear in <span class="math">\(a\)</span>, and so <span class="math">\(E[X_a] = Ca\)</span> for some unknown constant <span class="math">\(C\)</span>. (Well, we’ve gotta assume <span class="math">\(E[X_a]\)</span> is continuous in <span class="math">\(a\)</span>, which it is, but shh…)</p>
<p>Furthermore, put a sharp bend in the needle right at the color boundary. Each segment is still a linear needle, so the number of red crossings is still <span class="math">\(E[X_a]\)</span>, and likewise with blue crossings. So the expected number of crossings for this bent needle is <em>still</em> <span class="math">\(E[X_{a+b}]\)</span>, despite the kink!</p>
<p>By induction, if you put a finite number of sharp bends in a needle, it doesn’t change the expected number of crossings. All that matters is the total length. And by <s>handwaving</s> a continuity argument, this is true for continuous bends as well. So <span class="math">\(X_a\)</span> doesn’t just measure the expected number of crossings for a needle of length <span class="math">\(a\)</span>, but any reasonable curve of length <span class="math">\(a\)</span>. (Much to my delight, this phenomenon is called “Buffon’s noodle”.) This means that if we throw a rigid noodle of length <span class="math">\(a\)</span> on the paper, the expected number of crossings is <span class="math">\(E[X_a] = Ca\)</span>.</p>
<p>So let’s consider a particular kind of noodle: a circle with diameter <span class="math">\(1\)</span>. No matter how it’s thrown onto the paper, it will cross the lines exactly twice. It has circumference <span class="math">\(\pi\)</span>, and so we can determine that <span class="math">\(C = \frac{2}{\pi}\)</span>. Thus, for the original needle problem, <span class="math">\(p = E[X_1] = \frac{2}{\pi}\)</span>.</p>
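<p>The answer <span class="math">\(2/\pi \approx 0.6366\)</span> is easy to spot-check with a quick Monte Carlo simulation. This sketch is my own; the setup (horizontal lines one unit apart, needle angle uniform in <span class="math">\([0, \pi)\)</span>, center offset uniform within a strip) is one standard parametrization of the drop:</p>

```python
import random
from math import pi, sin

def buffon(trials, seed=0):
    """Estimate the probability that a unit needle crosses one of the
    horizontal lines y = 0, 1, 2, ... when dropped at random."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = rng.random()           # needle center's offset within its strip
        theta = rng.random() * pi  # angle between the needle and the lines
        # The needle spans sin(theta) vertically, so it crosses a line
        # iff the nearer line is within sin(theta)/2 of the center.
        if min(y, 1 - y) <= sin(theta) / 2:
            hits += 1
    return hits / trials

print(buffon(200_000), 2 / pi)  # should agree to about two decimal places
```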
Expected Density of Pigeons2018-10-08T00:00:00-07:002018-10-08T00:00:00-07:00Henry Swansontag:mathmondays.com,2018-10-08:/pigeons<p><span class="mathdefs">
<span class="math">\(\DeclareMathOperator{\res}{Res}\)</span>
</span></p>
<p>This one’s another puzzle from work:</p>
<blockquote>
<p>Consider a pigeon coop with <span class="math">\(n\)</span> pigeonholes, arranged in a straight line. When a pigeon arrives at the coop, it will roost in a pigeonhole only if it is empty, and both neighboring pigeonholes are also empty. It selects such a pigeonhole uniformly at random, enters the pigeonhole, and does not leave. At some point, the coop will fill up, but not every pigeonhole will be occupied. What is the expected density of pigeons in the coop, as <span class="math">\(n\)</span> grows large?</p>
</blockquote>
<p>If you run a few simulations, you get that it’s about <span class="math">\(0.432332\ldots\)</span>. But this isn’t any easily recognizable number. What is it in closed form?</p>
<!-- more -->
<hr>
<p>This problem illustrates one of the things I find really cool about math: the boundaries between different disciplines are essentially fictitious. This is a combinatorics problem, and so we might expect to be using arguments involving counting, bijections, and other finite tools. But instead we’ll sprint as fast as we can into the realm of analysis and solve the problem there.</p>
<p>Let <span class="math">\(a_n\)</span> be the expected number of pigeons for a coop with <span class="math">\(n\)</span> holes. Then we can come up with a recurrence relation for <span class="math">\(a_n\)</span>.</p>
<p>Consider what happens when the first pigeon arrives in an unoccupied coop. If it arrives in the first hole, then we can imagine deleting the first hole and its neighbor from the coop, leaving us with an unoccupied coop of size <span class="math">\(n - 2\)</span>. If it lands in the last hole, we have the same situation. Otherwise, it lands somewhere in the middle; when a pigeon comes to rest in the <span class="math">\(k\)</span>th hole (I’m going to <span class="math">\(1\)</span>-index, by the way), it splits the coop into two smaller coops, one with <span class="math">\(k - 2\)</span> holes, and the other with <span class="math">\(n - k - 1\)</span> holes. Since each hole is equally likely, we can average over all values of <span class="math">\(k\)</span> to get a first draft of our recurrence relation:
</p>
<div class="math">$$ a_n = 1 + \frac{1}{n} \left( a_{n-2} + a_{n-2} + \sum_{k=2}^{n-1} (a_{k-2} + a_{n-k-1}) \right) $$</div>
<p>This can be prettied up with some mild re-indexing:
</p>
<div class="math">$$ a_n = 1 + \frac{2}{n} \sum_{k=0}^{n-2} a_k $$</div>
<p>We can do even better though! If we consider <span class="math">\(n a_n - (n-1) a_{n-1}\)</span>, we can collapse most of our terms:
</p>
<div class="math">$$
\begin{align*}
n a_n - (n-1) a_{n-1} &= \left( n + 2 \sum_{k=0}^{n-2} a_k \right) - \left( n-1 + 2 \sum_{k=0}^{n-3} a_k \right) \\
n a_n - (n-1) a_{n-1} &= 1 + 2 a_{n-2} \\
a_n &= \frac{1}{n} ( 1 + (n-1) a_{n-1} + 2 a_{n-2} )
\end{align*}
$$</div>
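<p>As a sanity check (mine, not from the original post), here’s a short Python script that computes <span class="math">\(a_n\)</span> from this recurrence and compares it against a direct simulation of roosting pigeons; the function names are my own:</p>

```python
import random

def pigeon_recurrence(n):
    """a_n via a_n = (1 + (n-1) a_{n-1} + 2 a_{n-2}) / n, with a_0=0, a_1=1."""
    a = [0.0, 1.0]
    for m in range(2, n + 1):
        a.append((1 + (m - 1) * a[m - 1] + 2 * a[m - 2]) / m)
    return a[n]

def pigeon_simulation(n, trials, seed=0):
    """Directly simulate filling a coop of n holes; return the average count."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        occupied = [False] * (n + 2)  # padding so holes 1..n have neighbors
        while True:
            open_holes = [k for k in range(1, n + 1)
                          if not (occupied[k - 1] or occupied[k] or occupied[k + 1])]
            if not open_holes:
                break
            occupied[rng.choice(open_holes)] = True
        total += sum(occupied)
    return total / trials

print(pigeon_recurrence(10), pigeon_simulation(10, 20_000))  # close agreement
```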
<hr>
<p>This isn’t a linear recurrence relation, so we can’t apply linear algebra tricks to it. So we fall back on the Swiss Army knife of recurrence relations: the generating function.</p>
<p>Let <span class="math">\(G(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + \cdots\)</span>. We don’t know what this function is yet, but we can use the recurrence relation to pin down what it is.
</p>
<div class="math">\begin{align*}
G(z) &= \sum_{n=0}^\infty a_n z^n \\
G'(z) &= \sum_{n=1}^\infty n a_n z^{n-1} \\
&= a_1 + \sum_{n=2}^\infty n a_n z^{n-1} \\
&= a_1 + \sum_{n=2}^\infty \left( 1 + (n-1) a_{n-1} + 2 a_{n-2} \right) z^{n-1}
\end{align*}</div>
<p>Dealing with the three pieces separately makes this much easier to read (and also to write *wink*):
</p>
<div class="math">$$ \sum_{n=2}^\infty z^{n-1} = \frac{z}{1 - z} $$</div>
<div class="math">$$ \sum_{n=2}^\infty (n-1) a_{n-1} z^{n-1} = \sum_{n=1}^\infty n a_n z^n = z G'(z) $$</div>
<div class="math">$$ \sum_{n=2}^\infty 2 a_{n-2} z^{n-1} = 2 \sum_{n=0}^\infty a_n z^{n+1} = 2z G(z) $$</div>
<p>Putting it all together, we get a differential equation for <span class="math">\(G(z)\)</span>:
</p>
<div class="math">$$ G'(z) = 1 + \frac{z}{1 - z} + z G'(z) + 2z G(z) $$</div>
<p>Cleaning it up a little, we see that it’s first order and linear, so we can put those diff eq skills to use:
</p>
<div class="math">$$ G'(z) = \frac{2z}{1 - z} G(z) + \frac{1}{(1 - z)^2} $$</div>
<p>The details aren’t super important, but basically you use an <a href="https://en.wikipedia.org/wiki/Integrating_factor">integrating factor</a> and get:
</p>
<div class="math">$$ G(z) = \frac{1 + C e^{-2z}}{2(z-1)^2} $$</div>
<p>What should <span class="math">\(C\)</span> be? We’ll have to use our initial conditions, and one of them is particularly straightforward: <span class="math">\(G(0) = a_0 = 0\)</span>. Since <span class="math">\(G(0) = (1 + C)/2\)</span>, this forces <span class="math">\(C = -1\)</span>.</p>
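<p>Before moving on, we can sanity-check this closed form against the recurrence (a check of my own, in Python): <code>a_series</code> convolves the power series of <span class="math">\(1 - e^{-2z}\)</span> with that of <span class="math">\(1/(1-z)^2 = \sum_m (m+1) z^m\)</span>, all in exact arithmetic.</p>

```python
from fractions import Fraction
from math import factorial

def a_recurrence(n):
    """a_n from the recurrence, in exact arithmetic."""
    a = [Fraction(0), Fraction(1)]
    for m in range(2, n + 1):
        a.append((1 + (m - 1) * a[m - 1] + 2 * a[m - 2]) / m)
    return a[n]

def a_series(n):
    """Coefficient of z^n in G(z) = (1 - e^{-2z}) / (2 (1-z)^2).

    The z^j coefficient of 1 - e^{-2z} is -(-2)^j / j! for j >= 1
    (and 0 for j = 0); convolve with 1/(1-z)^2 = sum (m+1) z^m.
    """
    c = [Fraction(0)] + [Fraction(-((-2) ** j), factorial(j))
                         for j in range(1, n + 1)]
    return sum(c[j] * (n - j + 1) for j in range(n + 1)) / 2

# The generating function's coefficients match the recurrence exactly.
for n in range(12):
    assert a_recurrence(n) == a_series(n)
```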
<hr>
<p>At this point, let’s stop and recollect our thoughts. We’ve defined a function <span class="math">\(G(z)\)</span> whose power series coefficients are <span class="math">\(a_n\)</span>, the average number of pigeons in a coop of size <span class="math">\(n\)</span>. Our solution is now encoded in quite a peculiar way: how fast do the coefficients of <span class="math">\(G(z)\)</span> grow?</p>
<!-- TODO smart quotes -->
<p>To figure this out, let’s put the “analytic” in “analytic combinatorics”, and consider some contour integrals. Fix some <span class="math">\(R > 1\)</span>, and define <span class="math">\(I_n\)</span> to be the integral of <span class="math">\(G(z)/z^{n+1}\)</span> around the circle of radius <span class="math">\(R\)</span> at the origin (taken counter-clockwise).</p>
<p>What is <span class="math">\(I_n\)</span>? We can evaluate it using the <a href="https://mathmondays.com/residues">residue theorem</a>. There are two poles, one at <span class="math">\(z = 0\)</span>, and the other at <span class="math">\(z = 1\)</span>. The former is easy to compute; the residue is the coefficient on the <span class="math">\(z^{-1}\)</span> term, which is exactly <span class="math">\(a_n\)</span>. The second does not admit such a nice description, and so we compute it the usual way:
</p>
<div class="math">\begin{align*}
\res\left( \frac{G(z)}{z^{n+1}}, 1\right) &= \lim_{z \to 1} \frac{d}{dz} (z-1)^2 \frac{G(z)}{z^{n+1}} \\
&= \lim_{z \to 1} \frac{d}{dz} \frac{1 - e^{-2z}}{2 z^{n+1}} \\
&= \lim_{z \to 1} \frac{2 z e^{-2z} - (n+1)(1 - e^{-2z})}{2 z^{n+2}} \\
&= \frac{(n+3)e^{-2} - (n+1)}{2}
\end{align*}</div>
<p>So <span class="math">\(\frac{1}{2 \pi i} I_n = a_n + \frac{(n+3)e^{-2} - (n+1)}{2}\)</span>. What good does this do us?</p>
<p>If you’ve seen this trick before, you know that <span class="math">\(I_n\)</span> drops exponentially to <span class="math">\(0\)</span> as <span class="math">\(n\)</span> increases, but if not, here’s the justification. Let <span class="math">\(M\)</span> be the largest value (in terms of absolute value) that <span class="math">\(G\)</span> attains on the circle <span class="math">\(|z| = R\)</span>. Then the triangle inequality tells us:
</p>
<div class="math">$$ | I_n | = \left| \int_{C_R} \frac{G(z)}{z^{n+1}}~dz \right| \le \int_{C_R} \left| \frac{G(z)}{z^{n+1}} \right|~|dz| \le \int_{C_R} \frac{M}{R^{n+1}}~|dz| = \frac{2 \pi M}{R^n} $$</div>
<p>So as <span class="math">\(n \to \infty\)</span>, <span class="math">\(I_n\)</span> drops to <span class="math">\(0\)</span>, and so <span class="math">\(a_n\)</span> approaches <span class="math">\(\frac{(n+1)-(n+3)e^{-2}}{2}\)</span>. Therefore, the expected density of pigeons, <span class="math">\(a_n/n\)</span>, approaches <span class="math">\((1 - e^{-2})/2\)</span>, or about <span class="math">\(0.432332\)</span>.</p>
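<p>We can watch this convergence happen numerically (a quick check of mine, reusing the recurrence we derived earlier). Since the correction term vanishes rapidly, <span class="math">\(a_n\)</span> hugs the asymptote almost immediately:</p>

```python
from math import exp

def pigeon_recurrence(n):
    """a_n from the recurrence a_n = (1 + (n-1) a_{n-1} + 2 a_{n-2}) / n."""
    a = [0.0, 1.0]
    for m in range(2, n + 1):
        a.append((1 + (m - 1) * a[m - 1] + 2 * a[m - 2]) / m)
    return a[n]

# a_n vs. its asymptote ((n+1) - (n+3) e^{-2}) / 2.
for n in (5, 10, 20):
    asymptote = ((n + 1) - (n + 3) * exp(-2)) / 2
    print(n, pigeon_recurrence(n), asymptote)

# The density a_n / n tends to (1 - e^{-2}) / 2 = 0.432332...
print(pigeon_recurrence(2000) / 2000, (1 - exp(-2)) / 2)
```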
<hr>
<p>There were other solutions that people came up with for this problem, but what I really like about this one is that it demonstrates a way to approach these problems in general, and (at least IMO) it’s a pretty unexpected one. If someone asked me to figure out how fast the coefficients of a power series grow, the residue theorem would not be the first thing on my mind. And yet, not only does it get the job done, it works for many other similar problems, in essentially the same way. I’m not much of an analysis person, but my understanding is that this kind of trick is common in analytic combinatorics, and I think that’s pretty cool!</p>
Cauchy Residue Theorem2018-10-01T00:00:00-07:002018-10-01T00:00:00-07:00Henry Swansontag:mathmondays.com,2018-10-01:/residues<p><span class="mathdefs">
<span class="math">\(\DeclareMathOperator{\res}{Res}\)</span>
</span></p>
<p>The Cauchy Residue Theorem is a remarkable tool for evaluating contour integrals. Essentially, it says that, instead of computing an integral along a curve <span class="math">\(\gamma\)</span>, you can replace it with a sum of “residues” at some special points <span class="math">\(a_k\)</span>:
</p>
<div class="math">$$ \oint_\gamma f(z)~dz = 2 \pi i \sum_k \res(f, a_k) $$</div>
<p>But what is a residue? What are the <span class="math">\(a_k\)</span>? What’s really going on here?</p>
<!-- more -->
<h1>Residues</h1>
<p>Since this isn’t a rigorous complex analysis text, just a post on some blog, we’ll gloss over some of the technicalities, such as verifying convergence, or checking that holomorphic functions are analytic. All we need is some imagination, and the following fact:</p>
<div class="theorem-box">
<div class="theorem-title">Path Independence</div>
<p>Let <span class="math">\(D\)</span> be a region of the complex plane and <span class="math">\(f\)</span> be a function holomorphic (complex-differentiable) on <span class="math">\(D\)</span>. If you take a curve <span class="math">\(\gamma\)</span>, and continuously deform it into a curve <span class="math">\(\gamma'\)</span>, staying inside <span class="math">\(D\)</span>, then
<div class="math">$$ \int_\gamma f(z)~dz = \int_{\gamma'} f(z)~dz $$</div>
</p>
<p>Also, we say two such curves are “homotopic”.</p>
</div>
<p>For example, if the blue dashed area is <span class="math">\(D\)</span>, the curves in the first picture are homotopic, but not the curves in the second picture. There is no way to deform one of the curves into the other, without leaving the domain.</p>
<div class="image-container">
<p><img alt="Homotopic curves" height="250px" src="/images/residues/contours-1.svg"></p>
<p><img alt="Non-homotopic curves" height="250px" src="/images/residues/contours-2.svg"></p>
</div>
<p>If you’re comfortable with multivariable calculus, compare this to the Fundamental Theorem of Calculus for line integrals. How does complex-differentiability encode the “curl-free” condition?</p>
<p>This means that if <span class="math">\(\gamma\)</span> is a closed loop and <span class="math">\(f\)</span> is holomorphic on the region enclosed by <span class="math">\(\gamma\)</span>, then <span class="math">\(\gamma\)</span> is homotopic to a point, which tells us that <span class="math">\(\int_\gamma f~dz\)</span> must be zero. Where things get interesting is when there are points in <span class="math">\(D\)</span> at which <span class="math">\(f\)</span> is not holomorphic.</p>
<hr>
<p>So let’s approach the theorem.</p>
<p>Let <span class="math">\(f\)</span> be a function holomorphic on <span class="math">\(D\)</span>, except at a set of points <span class="math">\(a_k\)</span>, and <span class="math">\(\gamma\)</span> a closed curve in <span class="math">\(D\)</span>, avoiding the points <span class="math">\(a_k\)</span>. Without loss of generality, we can assume all of the <span class="math">\(a_k\)</span> lie within the region enclosed by <span class="math">\(\gamma\)</span> (if not, we just make <span class="math">\(D\)</span> smaller). We can use the path-independence of contour integrals to deform <span class="math">\(\gamma\)</span>, without changing the value of the integral:</p>
<div class="image-container">
<p><img alt="A contour around several a_k" height="250px" src="/images/residues/deform-1.svg"></p>
<p><img alt="Deformed into several circles with sections between them" height="250px" src="/images/residues/deform-2.svg"></p>
</div>
<p>These corridors between the circles can be moved so they lie on top of each other, and cancel out. This leaves us with circles <span class="math">\(C_k\)</span>, one for each point <span class="math">\(a_k\)</span>.
</p>
<div class="math">$$ \oint_\gamma f(z)~dz = \sum_k \oint_{C_k} f(z)~dz $$</div>
<div class="image-container">
<p><img alt="A few circular contours" height="250px" src="/images/residues/deform-3.svg"></p>
</div>
<p>So all we need to do to now is determine what the integral of <span class="math">\(f\)</span> on each circle is.</p>
<div class="theorem-box">
<div class="theorem-title">Residue Definition #1</div>
<p>The residue of <span class="math">\(f\)</span> at <span class="math">\(a\)</span> is <span class="math">\(\displaystyle \frac{1}{2 \pi i} \oint_{C} f(z)~dz\)</span>, where <span class="math">\(C\)</span> is a small circle around <span class="math">\(a\)</span>.
<br><br>
From path-independence, we know we can shrink the circles as much as we like without changing the value of the integral, which tells us this definition is well-defined (just make sure <span class="math">\(f\)</span> is holomorphic everywhere else in your circle!).</p>
</div>
<p>“But wait,” you complain, “This definition is ridiculous; you set it up in such a way that the residue theorem is trivial! What gives?”</p>
<p>Well, there are other, equivalent definitions of residue that are much easier to compute, and those are what give the residue theorem its power. Sometimes people will use these computational definitions of residue as the primary definition, but this obscures what’s going on. When you think of what the residue <em>means</em>, in a spiritual sense, you should think of it as “the integral of a small loop around a point”.</p>
<hr>
<p>A point at which <span class="math">\(f\)</span> is not holomorphic is called a “singularity”, and there are a few types. The most manageable of these is the pole, where <span class="math">\(f(z)\)</span> “behaves like” <span class="math">\(\frac{1}{(z-a)^n}\)</span>. To be more concrete, <span class="math">\(f\)</span> has a pole (of order <span class="math">\(n\)</span>) at <span class="math">\(a\)</span> if <span class="math">\((z - a)^n f(z)\)</span> is holomorphic and non-zero at <span class="math">\(a\)</span>. In other words, a zero of order <span class="math">\(n\)</span> cancels out a pole of order <span class="math">\(n\)</span>.</p>
<p>For example, <span class="math">\(\frac{1}{\sin z}\)</span> has a pole of order <span class="math">\(1\)</span> at <span class="math">\(z = 0\)</span>, as evidenced by the fact that <span class="math">\(\frac{z}{\sin z}\)</span> approaches <span class="math">\(1\)</span> as <span class="math">\(z \to 0\)</span>. The rational function <span class="math">\(\frac{z-2}{z^2 + 1}\)</span> has poles at <span class="math">\(\pm i\)</span>, also of order <span class="math">\(1\)</span>. And the function <span class="math">\(\frac{1}{\cos z - 1}\)</span> has a pole of order <span class="math">\(2\)</span> at zero.</p>
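<p>These pole orders are easy to double-check with a computer algebra system. Here is a quick SymPy sketch (my addition, not part of the original argument): multiplying by <span class="math">\(z^n\)</span> and taking the limit gives a finite, non-zero value exactly when <span class="math">\(n\)</span> is the order of the pole.</p>

```python
from sympy import symbols, sin, cos, limit

z = symbols("z")

# 1/sin(z) has an order-1 pole at 0: z/sin(z) -> 1 (finite, nonzero)
assert limit(z / sin(z), z, 0) == 1

# 1/(cos(z)-1) has an order-2 pole at 0: cos(z)-1 ~ -z^2/2,
# so z^2/(cos(z)-1) -> -2 (finite, nonzero)
assert limit(z**2 / (cos(z) - 1), z, 0) == -2
```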
<p>There are other kinds of singularities, but nothing good comes from them, so we will henceforth only consider singularities that are poles.</p>
<p>If <span class="math">\(f\)</span> has a pole of order <span class="math">\(n\)</span> at <span class="math">\(a\)</span>, then <span class="math">\((z-a)^n f(z)\)</span> has a Taylor series centered at <span class="math">\(z = a\)</span>, with non-zero constant term:
</p>
<div class="math">$$ (z-a)^n f(z) = b_0 + b_1 (z - a) + b_2 (z - a)^2 + b_3 (z - a)^3 + \cdots $$</div>
<p>Letting <span class="math">\(c_k = b_{k+n}\)</span>, we can define a series for <span class="math">\(f(z)\)</span> itself, called the <strong>Laurent series</strong>:
</p>
<div class="math">$$ f(z) = \frac{c_{-n}}{(z-a)^n} + \frac{c_{-n+1}}{(z - a)^{n-1}} + \cdots + \frac{c_{-1}}{z - a} + c_0 + c_1 (z - a) + \cdots $$</div>
<p>It’s almost a Taylor series, but we allow (finitely many) negative terms as well. This expansion will allow us to compute the residue at <span class="math">\(a\)</span>.</p>
<p>Let’s just take a single term, <span class="math">\((z - a)^n\)</span>, and we’ll recombine our results at the end, because integrals are linear. What happens when we integrate around a circle centered at <span class="math">\(a\)</span> with radius <span class="math">\(R\)</span>? Substitute <span class="math">\(z = a + R e^{it}\)</span> for the contour:
</p>
<div class="math">$$ \oint (z - a)^n~dz = \int_0^{2\pi} (R e^{it})^n~d(R e^{it}) = i R^{n+1} \int_0^{2\pi} e^{(n+1) it}~dt = i R^{n+1} \left[ \frac{e^{(n+1)it}}{(n+1)i} \right]^{2\pi}_0 $$</div>
<p>Since <span class="math">\(n\)</span> is an integer, <span class="math">\(e^{(n+1)2 \pi i} = 1\)</span>, and <span class="math">\(e^{0} = 1\)</span>, so this integral should be zero. But that doesn’t make any sense; that would suggest that the integral of <em>any</em> function around a circle is zero. But that’s not true.</p>
<p>We actually made a mistake in the last step; the antiderivative of <span class="math">\(e^{kt}\)</span> is <span class="math">\(e^{kt} / k\)</span> <em>unless</em> <span class="math">\(k = 0\)</span>. For that to happen, we need <span class="math">\(n = -1\)</span>, and in that case:
</p>
<div class="math">$$ \oint \frac{1}{z - a}~dz = \int_0^{2\pi} \frac{d(R e^{it})}{R e^{it}} = \int_0^{2\pi} i~dt = 2 \pi i $$</div>
<p>Therefore, when we integrate <span class="math">\(f(z) = \sum_{k = -n}^\infty c_k (z - a)^k\)</span>, all the terms vanish, except for the <span class="math">\(k = -1\)</span> term, which pops out a <span class="math">\(2 \pi i \cdot c_{-1}\)</span>. This gives us another definition for the residue!</p>
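<p>These two integrals are easy to sanity-check numerically. Here is a quick sketch (my addition, not from the original post); <code>circle_integral</code> is a hypothetical helper that discretizes the parametrization <span class="math">\(z = a + R e^{it}\)</span>:</p>

```python
import cmath
import math

def circle_integral(f, a, R, N=4000):
    """Approximate the contour integral of f around a circle of radius R
    centered at a, using the parametrization z = a + R*exp(it)."""
    total = 0.0
    dt = 2 * math.pi / N
    for k in range(N):
        z = a + R * cmath.exp(1j * k * dt)
        dz = 1j * R * cmath.exp(1j * k * dt) * dt  # dz = iRe^{it} dt
        total += f(z) * dz
    return total

a = 0.3 + 0.7j
# Only n = -1 contributes: it gives 2*pi*i, every other integer power gives 0
for n in (-3, -2, -1, 0, 1, 2):
    val = circle_integral(lambda z: (z - a) ** n, a, R=2.0)
    expected = 2j * math.pi if n == -1 else 0.0
    assert abs(val - expected) < 1e-6, (n, val)
```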
<div class="theorem-box">
<div class="theorem-title">Residue Definition #2</div>
<p>If <span class="math">\(f\)</span> has a pole at <span class="math">\(a\)</span>, and a Laurent series <span class="math">\(f(z) = \sum c_k (z - a)^k\)</span>, then the residue of <span class="math">\(f\)</span> at <span class="math">\(a\)</span> is <span class="math">\(c_{-1}\)</span>.</p>
</div>
<hr>
<p>If this were all we knew, it would still be a pretty good theorem. Finding power series instead of taking integrals? Not too shabby. But we can take it one step more.</p>
<p>Finding power series can be frustrating; how many people know the power series for <span class="math">\(\tan z\)</span> off the top of their head? Besides, we don’t need the whole thing, just a specific coefficient.</p>
<p>Instead, we’ll assume the existence of a power series, and use some tricks to extract <span class="math">\(c_{-1}\)</span>.</p>
<p>Say we’ve got a simple pole (a pole of order <span class="math">\(1\)</span>). By multiplying by <span class="math">\((z - a)\)</span>, we can get a Taylor series:
</p>
<div class="math">$$ (z - a) f(z) = c_{-1} + c_0 (z - a) + c_1 (z - a)^2 + \cdots $$</div>
<p>If we plug in <span class="math">\(z = a\)</span>, then we’ll get <span class="math">\(c_{-1}\)</span>. Well, technically, we can’t plug in <span class="math">\(z = a\)</span> directly, because <span class="math">\(f(z)\)</span> isn’t defined at <span class="math">\(a\)</span>. But if we take a limit, that’s okay.</p>
<p>How about a pole of order <span class="math">\(2\)</span>? Our trick won’t work the same way; if we apply it naively, we’ll just get <span class="math">\(c_{-2}\)</span>, which we don’t care about at all.
</p>
<div class="math">$$ (z - a)^2 f(z) = c_{-2} + c_{-1} (z - a) + c_0 (z - a)^2 + c_1 (z - a)^3 + \cdots $$</div>
<p>But if we take the derivative, we can knock off a term from the end, and <em>then</em> we can take the limit as <span class="math">\(z \to a\)</span>.
</p>
<div class="math">$$ \frac{d}{dz} (z - a)^2 f(z) = c_{-1} + 2 c_0 (z - a) + 3 c_1 (z - a)^2 + \cdots $$</div>
<p>For <span class="math">\(n = 3\)</span>, there’s a slight wrinkle; we end up with an extra factor of <span class="math">\(2\)</span> that we have to divide out:
</p>
<div class="math">$$ \frac{d^2}{dz^2} (z - a)^3 f(z) = 2 c_{-1} + 6 c_0 (z - a) + 12 c_1 (z - a)^2 + \cdots $$</div>
<p>The pattern for higher-order poles is similar:</p>
<ul>
<li>multiply by <span class="math">\((z - a)^n\)</span>; this changes our term of interest to <span class="math">\(c_{-1} (z - a)^{n-1}\)</span></li>
<li>take <span class="math">\(n-1\)</span> derivatives; the important term is now <span class="math">\((n-1)! c_{-1}\)</span></li>
<li>divide by <span class="math">\((n-1)!\)</span>; the important term is now <span class="math">\(c_{-1}\)</span></li>
<li>take the limit as <span class="math">\(z \to a\)</span>; all higher order terms vanish, and we are left with <span class="math">\(c_{-1}\)</span></li>
</ul>
<p>We now have our last, and most computationally accessible, definition of residue:</p>
<div class="theorem-box">
<div class="theorem-title">Residue Definition #3</div>
<p>If <span class="math">\(f\)</span> has a pole at <span class="math">\(a\)</span> of order <span class="math">\(n\)</span>, then the residue of <span class="math">\(f\)</span> at <span class="math">\(a\)</span> is:
<div class="math">$$ \res(f, a) = \lim_{z \to a} \frac{1}{(n-1)!} \frac{d^{n-1}}{dz^{n-1}} (z - a)^n f(z) $$</div>
</p>
</div>
<p>This is the definition often presented as “the” definition of residue, but this hides where the residue theorem comes from, and why residues are defined the way they are.</p>
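<p>SymPy implements exactly this computation. As a sanity check (my addition, not part of the original post), we can confirm the residues of the earlier examples:</p>

```python
from sympy import symbols, sin, cos, residue, simplify, Rational, I

z = symbols("z")

# Simple pole of 1/sin(z) at 0: residue is lim z/sin(z) = 1
assert residue(1 / sin(z), z, 0) == 1

# (z-2)/(z^2+1) has a simple pole at z = i, with residue (i-2)/(2i) = 1/2 + i
assert simplify(residue((z - 2) / (z**2 + 1), z, I) - (Rational(1, 2) + I)) == 0

# 1/(cos(z)-1) has an order-2 pole at 0; its Laurent series is
# -2/z^2 - 1/6 - ..., so the z^{-1} coefficient (the residue) is 0
assert residue(1 / (cos(z) - 1), z, 0) == 0
```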
<h1>Winding Number</h1>
<p>As a final note, we can add a tiny bit more generality to the theorem.</p>
<p>Technically, we’ve been a little sloppy with our curve <span class="math">\(\gamma\)</span>. What if it goes the other way? Or loops around some points multiple times?</p>
<p>To fix this, we introduce <span class="math">\(W(\gamma, a)\)</span>, the <strong>winding number</strong> of <span class="math">\(\gamma\)</span> around <span class="math">\(a\)</span>. It means exactly what the name suggests: it indicates how many times (and in what direction) <span class="math">\(\gamma\)</span> loops around <span class="math">\(a\)</span>. Counter-clockwise is positive, and clockwise is negative. Two examples are pictured below:</p>
<div class="image-container">
<p><img alt="A limacon" height="250px" src="/images/residues/winding-1.svg"></p>
<p><img alt="A lemniscate" height="250px" src="/images/residues/winding-3.svg"></p>
</div>
<p>In the first picture, the specified points have winding number +1 and +2, and in the second, they have -1 and +1. The only thing this changes about our proof is that when we deform our <span class="math">\(\gamma\)</span> into circles, we may get multiple loops around the same point:</p>
<div class="image-container">
<p><img alt="A limacon" height="250px" src="/images/residues/winding-2.svg"></p>
<p><img alt="A lemniscate" height="250px" src="/images/residues/winding-4.svg"></p>
</div>
<p>But by definition, the number of loops is exactly the winding number, and if the loop runs clockwise, we pick up a negative sign. So after accounting for multiplicity and direction, we get:
</p>
<div class="math">$$ \oint_\gamma f(z)~dz = 2 \pi i \sum_k W(\gamma, a_k) \res(f, a_k) $$</div>
Monsky's Theorem2018-09-24T00:00:00-07:002018-09-24T00:00:00-07:00Henry Swansontag:mathmondays.com,2018-09-24:/monskys-theorem<p><span class="mathdefs">
<span class="math">\(\newcommand{\RR}{\Bbb R}
\newcommand{\QQ}{\Bbb Q}
\newcommand{\ZZ}{\Bbb Z}\)</span>
</span></p>
<p>For which <span class="math">\(n\)</span> can you cut a square into <span class="math">\(n\)</span> triangles of equal area?</p>
<p>This question appears quite simple; it could have been posed to the Ancient Greeks. But like many good puzzles, it is a remarkably stubborn one.</p>
<p>It was first solved in 1970, by Paul Monsky. Despite the completely geometric nature of the question, his proof relies primarily on number theory and combinatorics! And for all the machinery involved, the proof is quite accessible, and we will describe it below.</p>
<!-- more -->
<hr>
<p>If you have a napkin on hand, it should be straightforward to come up with a solution for <span class="math">\(n = 2\)</span> and <span class="math">\(4\)</span>. A little more thought should yield solutions for any even <span class="math">\(n\)</span>. One such scheme is depicted below:</p>
<p><img alt="Equidissection when n is even" height="auto" src="/images/monsky/even-equidissections.svg" width="100%"></p>
<p>But when <span class="math">\(n\)</span> is odd, you will have considerably more trouble. Monsky’s theorem states that such a task is, in fact, impossible.</p>
<!-- TODO make a markdown plugin to make this easier -->
<div class="theorem-box">
<div class="theorem-title">Monsky's Theorem</div>
<p>The unit square cannot be dissected into an odd number of triangles of equal area.</p>
</div>
<p>The result clearly extends to squares of any size, and in fact, arbitrary parallelograms.</p>
<p>There are two key ingredients here:</p>
<ol>
<li>Sperner’s Lemma</li>
<li>2-adic valuations</li>
</ol>
<p>Proof sketch:</p>
<ul>
<li>Color the vertices of the dissection using three colors</li>
<li>Find a triangle with exactly one vertex of each color</li>
<li>Show that such a triangle cannot have area <span class="math">\(1/n\)</span></li>
</ul>
<p>If the last step seems ridiculous to you, don’t worry. It’s completely non-obvious that the coloring of a triangle’s vertices could at all be related to its area. But once you see the trick, it will (hopefully) seem less mysterious. Just hang in there.</p>
<h1>Sperner’s Lemma</h1>
<p>Consider a polygon <span class="math">\(P\)</span> in the plane, and some dissection of it into triangles <span class="math">\(T_i\)</span>. As promised in the previous section, color the vertices with three colors; we’ll use red, green, and blue. We will call a segment <strong>purple</strong> if it has one red and one blue endpoint. A triangle with exactly one corner of each color will be called <strong>trichromatic</strong>. (Great terminology, eh?)</p>
<p>A <strong>Sperner coloring</strong> is a coloring of the vertices of <span class="math">\(T_i\)</span>, using three colors, with the following properties:</p>
<ul>
<li>no face of <span class="math">\(P\)</span>, nor any face of one of the <span class="math">\(T_i\)</span>, contains vertices of all three colors</li>
<li>there are an odd number of purple segments on the boundary of <span class="math">\(P\)</span></li>
</ul>
<p>For example, the following are Sperner colorings:</p>
<div class="image-container">
<p><img alt="Sperner Coloring 1" height="200px" src="/images/monsky/sperner-1.svg"></p>
<p><img alt="Sperner Coloring 2" height="200px" src="/images/monsky/sperner-2.svg"></p>
</div>
<p>But these are not – the first has lines of more than two colors, and the second has an even number of purple boundary segments:</p>
<div class="image-container">
<p><img alt="Non-Sperner Coloring 1" height="200px" src="/images/monsky/sperner-3.svg"></p>
<p><img alt="Non-Sperner Coloring 2" height="200px" src="/images/monsky/sperner-4.svg"></p>
</div>
<p>In this format, Sperner’s lemma can be stated as:</p>
<div class="theorem-box">
<div class="theorem-title">Sperner's Lemma</div>
<p>Given a Sperner coloring of <span class="math">\((P, T_i)\)</span>, there is at least one trichromatic triangle.
<br>
<br>
Check the examples above, both Sperner colorings have trichromatic triangles. The first non-Sperner coloring has one, but the other does not.</p>
</div>
<p><em>Proof</em>: First, we establish a lemma: a triangle <span class="math">\(T\)</span> is trichromatic iff its faces have an odd number of purple segments.</p>
<p>This is easy to see if there are no vertices lying on the faces of <span class="math">\(T\)</span>: a trichromatic triangle has exactly one purple segment, and otherwise, it has zero or two.</p>
<p>We can reduce to this case by deleting vertices that lie on the faces of <span class="math">\(T\)</span>. We claim that this won’t change whether the number of purple segments is even or odd. And of course, since we aren’t touching the corners, it can’t change whether or not the triangle is trichromatic. Consider some vertex on a face of <span class="math">\(T\)</span>. If that face contains green at all, then by the first property of Sperner colorings, it can’t ever have purple segments, as it must omit either red or blue vertices. Monochromatic faces also present no concern, because they also cannot have purple segments. The remaining cases are shown below:</p>
<p><img alt="Illustration of the cases" height="auto" src="/images/monsky/delete-purple.svg" width="100%"></p>
<hr>
<p>Cool. How does this help us?</p>
<p>Let’s do some counting mod <span class="math">\(2\)</span>. Let <span class="math">\(f(T)\)</span> be the number of purple segments in a triangle <span class="math">\(T\)</span>. What is the sum of all <span class="math">\(f(T)\)</span>, mod <span class="math">\(2\)</span>?</p>
<p>On one hand, it’s simply the number of trichromatic triangles; <span class="math">\(f(T) \not\equiv 0 \pmod 2\)</span> exactly when <span class="math">\(T\)</span> is trichromatic. But also, it’s the number of purple segments on the boundary. Each purple segment in the interior of <span class="math">\(P\)</span> gets counted twice, and so contributes nothing, but boundary segments contribute exactly once.</p>
<p>Since there are an odd number of purple segments on the boundary of <span class="math">\(P\)</span>, there are an odd number of trichromatic triangles. In particular, there’s at least one of them.</p>
<p>(This illustrates a common trick among combinatorialists: if you want to show that an object <span class="math">\(X\)</span> exists, show that the number of <span class="math">\(X\)</span>s is odd. Cheeky!)</p>
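<p>The counting argument is easy to verify exhaustively on a small example. The sketch below (my addition; all names are mine) triangulates a square into four triangles through its center and checks, for every one of the <span class="math">\(3^5\)</span> colorings, that the number of trichromatic triangles has the same parity as the number of purple boundary segments. Note that this parity identity holds for <em>any</em> coloring; the Sperner conditions are what force the boundary count to be odd.</p>

```python
import itertools
from collections import Counter

# Square with corners 0-3 and a center vertex 4, cut into four triangles
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]

# Boundary segments are the ones that belong to exactly one triangle
edge_count = Counter()
for a, b, c in triangles:
    for edge in ({a, b}, {b, c}, {a, c}):
        edge_count[frozenset(edge)] += 1
boundary = [e for e, n in edge_count.items() if n == 1]

for coloring in itertools.product("RGB", repeat=5):
    trichromatic = sum(
        1 for t in triangles if {coloring[v] for v in t} == {"R", "G", "B"}
    )
    purple = sum(1 for e in boundary if {coloring[v] for v in e} == {"R", "B"})
    # The mod-2 identity from the proof above
    assert trichromatic % 2 == purple % 2
```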
<h1><span class="math">\(2\)</span>-adic valuations</h1>
<p>Before we describe our coloring, we’ll take an unexpected detour into the land of valuations.</p>
<p>A <strong>valuation</strong> is a function that assigns a notion of “value” or “size” to numbers. There are multiple conventions, but the one we’ll use is that a valuation on a ring <span class="math">\(R\)</span> is a function <span class="math">\(\nu\)</span> from <span class="math">\(R\)</span> to <span class="math">\(\RR \cup \{ \infty \}\)</span> such that:</p>
<ul>
<li><span class="math">\(\nu(x) = \infty\)</span> if and only if <span class="math">\(x = 0\)</span></li>
<li><span class="math">\(\nu(xy) = \nu(x) + \nu(y)\)</span></li>
<li><span class="math">\(\nu(x + y) \ge \min(\nu(x), \nu(y))\)</span></li>
</ul>
<p>We’ll assign the obvious rules to <span class="math">\(\infty\)</span>, such as, <span class="math">\(a + \infty = \infty\)</span>, and <span class="math">\(\min(a, \infty) = a\)</span>.</p>
<p>One example of a valuation, that might help guide your intuition, is the “multiplicity of a root”. For some polynomial <span class="math">\(p(x) = a_0 + a_1 x + \cdots + a_n x^n\)</span>, let <span class="math">\(\nu(p)\)</span> be the index of the first non-zero coefficient. For example, <span class="math">\(\nu(3x^4 - x^5 + 7x^8) = 4\)</span>, and <span class="math">\(\nu(1 + x - x^2) = 0\)</span>. If all coefficients are zero, define <span class="math">\(\nu(p) = \infty\)</span>. In essence, <span class="math">\(\nu(p)\)</span> is “how many” roots <span class="math">\(p\)</span> has at <span class="math">\(0\)</span>; e.g., is <span class="math">\(0\)</span> a single root? A double root? Not a root at all?</p>
<p>Is this a valuation?</p>
<p>Well, we satisfied the first property by fiat. The second one is pretty easy to see: when you multiply two polynomials, the lowest-degree term of the product has degree equal to the sum of the lowest degrees of the factors. And the third one ain’t too bad either. If both <span class="math">\(p\)</span> and <span class="math">\(q\)</span> have zero coefficients on <span class="math">\(x^k\)</span>, <span class="math">\(p+q\)</span> certainly will too. The converse isn’t true, though: it’s possible that the low-degree terms in <span class="math">\(p\)</span> and <span class="math">\(q\)</span> could cancel, and so <span class="math">\(\nu(p+q)\)</span> could be larger than either <span class="math">\(\nu(p)\)</span> or <span class="math">\(\nu(q)\)</span>. This is why we have an inequality, instead of an equality.</p>
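<p>This root-multiplicity valuation is simple enough to check directly. A minimal sketch (my addition; <code>nu</code> and <code>poly_mul</code> are names of my choosing), representing a polynomial by its coefficient list <code>[a0, a1, ...]</code>:</p>

```python
import math
import random

def nu(p):
    """Index of the first nonzero coefficient, or infinity for the zero polynomial."""
    for i, a in enumerate(p):
        if a != 0:
            return i
    return math.inf

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

# The examples from the text: 3x^4 - x^5 + 7x^8 and 1 + x - x^2
assert nu([0, 0, 0, 0, 3, -1, 0, 0, 7]) == 4
assert nu([1, 1, -1]) == 0

# nu(pq) = nu(p) + nu(q) over the integers, checked on random polynomials
for _ in range(200):
    p = [random.randint(-3, 3) for _ in range(5)]
    q = [random.randint(-3, 3) for _ in range(5)]
    assert nu(poly_mul(p, q)) == nu(p) + nu(q)
```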
<hr>
<p>The particular valuation we’re interested in is the <span class="math">\(2\)</span>-adic valuation, which measures how divisible by two a number is. The more factors of <span class="math">\(2\)</span> a number has, the bigger its valuation is.</p>
<p>For example, <span class="math">\(\nu_2(2) = \nu_2(6) = \nu_2(-22) = 1\)</span>, since they all have a single factor of <span class="math">\(2\)</span>. Odd integers have <span class="math">\(\nu_2\)</span> of <span class="math">\(0\)</span>, since they have no factors of <span class="math">\(2\)</span> at all. And because <span class="math">\(0\)</span> can be factored as <span class="math">\(2^k \cdot 0\)</span> for any <span class="math">\(k\)</span>, no matter how big, it makes sense to say <span class="math">\(\nu_2(0) = \infty\)</span>.</p>
<p>To extend this to rational numbers, we consider <span class="math">\(2\)</span>s in the denominator to count as negative. Consider the following examples until they make sense:
</p>
<div class="math">$$ \nu_2(1/4) = -2 \qquad \nu_2(1/3) = 0 \qquad \nu_2(2/3) = 1 \qquad \nu_2(3/8) = -3 \qquad \nu_2(12/5) = 2 $$</div>
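<p>To make the examples concrete, here is a small implementation of <span class="math">\(\nu_2\)</span> on the rationals (my addition, using Python’s <code>Fraction</code>, which keeps numbers in lowest terms, so at most one of the numerator and denominator is even):</p>

```python
import math
from fractions import Fraction

def nu2(q):
    """2-adic valuation of a rational number: factors of 2 in the numerator
    count positively, factors of 2 in the denominator count negatively."""
    q = Fraction(q)
    if q == 0:
        return math.inf
    v, n, d = 0, q.numerator, q.denominator
    while n % 2 == 0:
        n, v = n // 2, v + 1
    while d % 2 == 0:
        d, v = d // 2, v - 1
    return v

# The example values from the text
assert nu2(Fraction(1, 4)) == -2
assert nu2(Fraction(1, 3)) == 0
assert nu2(Fraction(2, 3)) == 1
assert nu2(Fraction(3, 8)) == -3
assert nu2(Fraction(12, 5)) == 2
assert nu2(0) == math.inf
```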
<p>We claim this is also a valuation.</p>
<p>Again, we get the first property simply because we defined it to be so. The second one is also easy to verify, but the third one needs some work.</p>
<p>Let <span class="math">\(x\)</span> and <span class="math">\(y\)</span> be rational numbers. By pulling out all the factors of <span class="math">\(2\)</span> from numerator and denominator, they can be written as <span class="math">\(x = 2^n \frac{a}{b}\)</span> and <span class="math">\(y = 2^m \frac{c}{d}\)</span>, where <span class="math">\(a\)</span>, <span class="math">\(b\)</span>, <span class="math">\(c\)</span>, and <span class="math">\(d\)</span> are odd. (Note that any of these, including <span class="math">\(n\)</span> and <span class="math">\(m\)</span>, may be negative.) Without loss of generality, let <span class="math">\(n \ge m\)</span>. We’d like to show that <span class="math">\(\nu_2(x + y)\)</span> is at least <span class="math">\(\min(\nu_2(x), \nu_2(y)) = m\)</span>.
</p>
<div class="math">$$ x + y = 2^n \frac{a}{b} + 2^m \frac{c}{d} = 2^m \left( \frac{2^{n-m} a}{b} + \frac{c}{d} \right) = 2^m \frac{2^{n-m} ad + bc}{bd} $$</div>
<p>Since <span class="math">\(2^{n-m} ad + bc\)</span> is an integer, and <span class="math">\(bd\)</span> is odd, <span class="math">\(x + y\)</span> has at least <span class="math">\(m\)</span> factors of <span class="math">\(2\)</span>, and so <span class="math">\(\nu_2(x + y) \ge m\)</span>, as desired. Notably, if <span class="math">\(n\)</span> is strictly larger than <span class="math">\(m\)</span>, i.e., <span class="math">\(\nu_2(x) > \nu_2(y)\)</span>, then <span class="math">\(2^{n-m} ad + bc\)</span> is odd, and we can guarantee that <span class="math">\(\nu_2(x+y)\)</span> is exactly <span class="math">\(\nu_2(y)\)</span>. This is actually a property true of all valuations, so we’ll state it again:</p>
<ul>
<li><span class="math">\(\nu(x + y) \ge \min(\nu(x), \nu(y))\)</span>, <strong>and if <span class="math">\(\nu(x) \ne \nu(y)\)</span> this is an equality</strong></li>
</ul>
<p>So <span class="math">\(\nu_2\)</span> is an honest-to-god valuation on <span class="math">\(\QQ\)</span>. By a theorem of Chevalley, we can extend this to a valuation on <span class="math">\(\RR\)</span>. The details are not particularly important, and the curious reader can find them at the end of this post.</p>
<h1>Coloring The Plane</h1>
<p>Our coloring of the dissection will use the (extended) <span class="math">\(2\)</span>-adic valuation. Our choice of coloring is peculiar enough that it deserves its own section though.</p>
<p>Given a point <span class="math">\((x,y)\)</span> in the plane, we’ll color it:</p>
<ul>
<li>red if <span class="math">\(\nu_2(x) > 0\)</span> and <span class="math">\(\nu_2(y) > 0\)</span></li>
<li>green if <span class="math">\(\nu_2(x) \le 0\)</span> and <span class="math">\(\nu_2(x) \le \nu_2(y)\)</span></li>
<li>blue if <span class="math">\(\nu_2(y) \le 0\)</span> and <span class="math">\(\nu_2(y) < \nu_2(x)\)</span></li>
</ul>
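<p>To make the rules concrete, here is a short Python sketch (helper names are my own) that classifies rational points, treating <span class="math">\(\nu_2(0)\)</span> as <span class="math">\(+\infty\)</span>:</p>

```python
from fractions import Fraction
from math import inf

def nu2(x):
    """2-adic valuation of a rational; nu2(0) = +infinity."""
    x = Fraction(x)
    if x == 0:
        return inf
    n, d, v = x.numerator, x.denominator, 0
    while n % 2 == 0:
        n //= 2
        v += 1
    while d % 2 == 0:
        d //= 2
        v -= 1
    return v

def color(x, y):
    """Color of the point (x, y) under the rules above."""
    vx, vy = nu2(x), nu2(y)
    if vx > 0 and vy > 0:
        return "red"
    if vx <= 0 and vx <= vy:
        return "green"
    return "blue"  # remaining case: vy <= 0 and vy < vx

# the four corners of the unit square
print(color(0, 0), color(1, 0), color(0, 1), color(1, 1))
# → red green blue green
```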
<p>This coloring has some interesting properties, which we’ll establish quickly.</p>
<div class="theorem-box">
<div class="theorem-title">Claim</div>
<p>If <span class="math">\(P\)</span> is a red point, then <span class="math">\(Q\)</span> and <span class="math">\(Q-P\)</span> have the same color.</p>
</div>
<p><em>Proof</em>: This is a good exercise for the reader. Make use of the fact that, if <span class="math">\(\nu_2(a) > 0\)</span> and <span class="math">\(\nu_2(x) \le 0\)</span>, then the two valuations differ, so <span class="math">\(\nu_2(x - a) = \min(\nu_2(x), \nu_2(a)) = \nu_2(x)\)</span>. On the other hand, if <span class="math">\(\nu_2(x) > 0\)</span>, then <span class="math">\(\nu_2(x - a) > 0\)</span> as well.</p>
<div class="theorem-box">
<div class="theorem-title">Claim</div>
<p>If we forget the dissection for a second, and pick <em>any</em> three collinear points in the plane, they cannot all be different colors.</p>
</div>
<p><em>Proof</em>: Let <span class="math">\(P_r\)</span>, <span class="math">\(P_g\)</span>, and <span class="math">\(P_b\)</span> be three points, colored red, green, and blue, respectively. We must show they can’t be collinear; equivalently, the vectors <span class="math">\(P_g - P_r\)</span> and <span class="math">\(P_b - P_r\)</span> are not parallel. This is a question about linear independence, so we’d better take a determinant. Let <span class="math">\(P_g - P_r = (x_g, y_g)\)</span>, and <span class="math">\(P_b - P_r = (x_b, y_b)\)</span>.</p>
<div class="math">$$
\det M =
\det \begin{pmatrix}
x_g & x_b \\
y_g & y_b
\end{pmatrix}
=
x_g y_b - x_b y_g
$$</div>
<p>To show that <span class="math">\(\det M\)</span> is non-zero, we can show that its <span class="math">\(2\)</span>-adic valuation is finite; in fact, at most <span class="math">\(0\)</span>. This might seem harder, but since the only thing we know about these points is their valuations, it’s the only shot we have!</p>
<p>By the previous claim, <span class="math">\(P_g - P_r\)</span> is green, and <span class="math">\(P_b - P_r\)</span> is blue. From the coloring rules, we then know that <span class="math">\(\nu_2(y_b) < \nu_2(x_b)\)</span> and <span class="math">\(\nu_2(x_g) \le \nu_2(y_g)\)</span>. Adding these inequalities, <span class="math">\(\nu_2(x_g y_b)\)</span> is strictly less than <span class="math">\(\nu_2(x_b y_g)\)</span>. The third property then tells us that <span class="math">\(\nu_2(\det M) = \nu_2(x_g y_b) = \nu_2(x_g) + \nu_2(y_b) \le 0\)</span>, since both <span class="math">\(\nu_2(x_g)\)</span> and <span class="math">\(\nu_2(y_b)\)</span> are at most <span class="math">\(0\)</span>. Therefore, <span class="math">\(\det M \ne 0\)</span>, and so <span class="math">\(P_r\)</span>, <span class="math">\(P_g\)</span>, and <span class="math">\(P_b\)</span> cannot be collinear.</p>
<h1>Putting it Together</h1>
<p>Now we’re ready. Let <span class="math">\(n\)</span> be odd, and consider a dissection of the unit square into <span class="math">\(n\)</span> triangles of equal area.</p>
<p>Using the coloring rule above, we claim we get a Sperner coloring. The time we invested in the previous section pays off handsomely, as both required properties become almost trivial.</p>
<ul>
<li>No face of the square, nor of a triangle in the dissection, can contain vertices of all three colors, because no line <em>anywhere</em> in the plane can have vertices of all three colors!</li>
<li>Again, we use the fact that there are no trichromatic lines. Consider the corners of the square and their colors:</li>
</ul>
<div class="math">$$ (0, 0) \textrm{ is red} \qquad (1, 0) \textrm{ is green} \qquad (0, 1) \textrm{ is blue} \qquad (1, 1) \textrm{ is green} $$</div>
<p>
<!--TODO diagram instead?--></p>
<p>The only segments that could be purple lie between <span class="math">\((0, 0)\)</span> and <span class="math">\((0, 1)\)</span>. And because one endpoint is red, and the other blue, there must be an odd number of purple segments. (Remember our exercise about deleting vertices on faces…?)</p>
<p>Therefore, this coloring is a Sperner coloring, and so somewhere, there is a trichromatic triangle. To finish the proof, we must show that this triangle can’t have area <span class="math">\(1/n\)</span>.</p>
<p>Let’s revisit our second claim. Strong as it is, we can squeeze just a tiny bit more out of it. Using the same notation as before, basic coordinate geometry tells us that the area of the triangle formed by <span class="math">\(P_r\)</span>, <span class="math">\(P_g\)</span>, and <span class="math">\(P_b\)</span> is <span class="math">\(K = \frac{1}{2} \lvert \det M \rvert\)</span>. By showing that <span class="math">\(\det M \ne 0\)</span>, we showed that this triangle was not degenerate, i.e., the three points were not collinear. But we actually showed a little more than that; we showed that <span class="math">\(\nu_2(\det M) \le 0\)</span>. Since the sign does not affect the valuation, if a trichromatic triangle has area <span class="math">\(K\)</span>, then <span class="math">\(\nu_2(K) = \nu_2(\det M) - 1 \le -1\)</span>.</p>
<p>But every triangle in the dissection has area <span class="math">\(1/n\)</span>, and because <span class="math">\(n\)</span> is odd, <span class="math">\(\nu_2(1/n) = 0\)</span>. Contradiction.</p>
<h1>Appendix</h1>
<p>We promised a proof that a valuation on <span class="math">\(\QQ\)</span> can be extended to a valuation on <span class="math">\(\RR\)</span>. More generally, for a field extension <span class="math">\(L/K\)</span>, a valuation <span class="math">\(\nu\)</span> on <span class="math">\(K\)</span> can be extended to a valuation on <span class="math">\(L\)</span>.</p>
<p>Unfortunately, I’ve got diagrams to finish making before Monday ends, so I’ll amend this later ;)</p>
<!--
*Proof*: We'll extend one element at a time. If we have a valuation on $K$, we'll extend it to a valuation on $K(\alpha)$, where $\alpha \in L$.
If $\alpha$ is transcendental over $K$, then we will write it as $t$ instead. First we extend the valuation to the polynomial ring $K[t]$, by defining $\nu(\sum_i a_i t^i)$ to be $\max(\nu(a_i))$. After that, we'll extend it to the fraction field $K(t)$ by defining $\nu(p/q) = \nu(p) / \nu(q)$, which will be a valuation for the same reason we could extend from $\ZZ$ to $\QQ$ earlier.
To show that what we defined on $K[t]$ is a valuation, let $p = \sum_i a_i t^i$ and $q = \sum_i b_i t^i$. If $\nu(p) = \max(\nu(a_i))$ is zero, then all the $\nu(a_i)$ are zero. But then all the $a_i$ must have been zero, giving $p = 0$.
Showing multiplicativity is an exercise to the reader cause I'm actually stuck on it lol TODO
If $\alpha$ is algebraic, then let $f(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ be the minimal polynomial of $\alpha$. We define
-->
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>Doubling Loaves, in Two Ways2018-09-17T00:00:00-07:002018-09-17T00:00:00-07:00Henry Swansontag:mathmondays.com,2018-09-17:/doubling-loaves<p>This one comes from a puzzle that a coworker gave me.</p>
<p>There’s a miracle in the Gospels in which Jesus feeds a crowd of 5000, using only a few loaves of bread and some fish. As he breaks the food apart and hands it out, it does not diminish, and eventually the entire crowd is fed.</p>
<p>In our puzzle, we have a prophet who is not quite so saintly. He starts with a single loaf of bread, and has to feed a crowd of <span class="math">\(N\)</span> people. But he also wants to be able to feed himself. Furthermore, our guy’s got a bit of a gambling problem: at each step, he flips a fair, unbiased coin.</p>
<ul>
<li>If it comes up heads, he duplicates one of his loaves.</li>
<li>Otherwise, he hands out a loaf of bread to someone in the crowd.</li>
</ul>
<p>He only stops when he runs out of bread, or he creates <span class="math">\(N\)</span> new loaves (at which point, the entire crowd can be fed, and he can eat the original loaf).</p>
<p>The question is: what is the probability that he can successfully feed everyone?</p>
<!-- more -->
<hr>
<p>For small values of <span class="math">\(N\)</span>, we can manage this by hand:</p>
<ul>
<li><span class="math">\(N = 0\)</span>: He can always feed himself, so the probability of success, <span class="math">\(p\)</span>, is <span class="math">\(1\)</span>. </li>
<li><span class="math">\(N = 1\)</span>: Everything depends on the first coin toss. If it is heads, then he has two loaves, and can feed himself and someone else. Otherwise, he’s just handed away his only loaf, and the game ends. So <span class="math">\(p = 1/2\)</span>.</li>
<li><span class="math">\(N = 2\)</span>: As before, he must flip heads on the first toss. Consider the second toss. If it is heads, then he has created two loaves, plus the original, and so everyone can be fed. Otherwise, he hands out a loaf, leaving him with one loaf, and two people to feed. This reduces to the previous case, in which there is a <span class="math">\(1/2\)</span> chance of success. So if the first toss comes up heads, he has a <span class="math">\(3/4\)</span> chance of success, giving us <span class="math">\(p = 3/8\)</span> for the whole process.</li>
</ul>
<p>Clearly, this gets tedious quickly. We need a more systematic approach.</p>
<h2>First Approach</h2>
<p>One approach is to rephrase this as a problem about lattice walks.</p>
<p>Let the point <span class="math">\((x, y)\)</span> represent the state where we have created <span class="math">\(x\)</span> new loaves (not counting the original loaf), and fed <span class="math">\(y\)</span> people (not counting himself). Then duplicating a loaf is a step to the right, and handing out a loaf is a step upward. On this grid, the prophet starts at <span class="math">\((0, 0)\)</span>, and randomly chooses to walk right or up. He wins if he touches the line <span class="math">\(x = N\)</span>, and loses if he crosses the diagonal <span class="math">\(x = y\)</span>. (Touching the diagonal is okay, at that point, he still has one loaf left.)</p>
<p>Let <span class="math">\(p(a, b)\)</span> be the probability that the prophet reaches the point <span class="math">\((a, b)\)</span> on his random walk. It’s only possible to reach the region <span class="math">\(0 \le b \le a\)</span>, so we will set <span class="math">\(p(a, b) = 0\)</span> outside this range. Since it’s our starting point, <span class="math">\(p(0, 0)\)</span> is clearly <span class="math">\(1\)</span>. For all other points, we can state our probability recursively; if the prophet gets to the point <span class="math">\((a, b)\)</span>, then he must have come from <span class="math">\((a-1, b)\)</span> or <span class="math">\((a, b-1)\)</span>. From either of those points, he has a <span class="math">\(1/2\)</span> chance of getting to <span class="math">\((a, b)\)</span>, so <span class="math">\(p(a, b) = \frac{1}{2}(p(a-1, b) + p(a, b-1))\)</span>.</p>
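<p>This recurrence is mechanical enough to compute directly; here is a quick Python sketch (my own code, not part of the original puzzle) using exact rational arithmetic:</p>

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def p(a, b):
    """Probability the walk reaches (a, b): a loaves created, b handed out."""
    if a == 0 and b == 0:
        return Fraction(1)
    if not (0 <= b <= a):
        return Fraction(0)  # unreachable region
    return Fraction(1, 2) * (p(a - 1, b) + p(a, b - 1))

print(p(1, 1), p(3, 2), p(4, 2))
# → 1/4 5/32 9/64
```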
<p>If you write these numbers out in a grid, you’ll quickly get tired of seeing powers of <span class="math">\(2\)</span> in the denominator:</p>
<table>
<tbody>
<tr>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(7/128\)</span></td>
<td><span class="math">\(21/256\)</span></td>
<td><span class="math">\(45/512\)</span></td>
</tr>
<tr>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(5/64\)</span></td>
<td><span class="math">\(7/64\)</span></td>
<td><span class="math">\(7/64\)</span></td>
<td><span class="math">\(3/32\)</span></td>
</tr>
<tr>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(1/8\)</span></td>
<td><span class="math">\(5/32\)</span></td>
<td><span class="math">\(9/64\)</span></td>
<td><span class="math">\(7/64\)</span></td>
<td><span class="math">\(5/64\)</span></td>
</tr>
<tr>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(1/4\)</span></td>
<td><span class="math">\(1/4\)</span></td>
<td><span class="math">\(3/16\)</span></td>
<td><span class="math">\(1/8\)</span></td>
<td><span class="math">\(5/64\)</span></td>
<td><span class="math">\(3/64\)</span></td>
</tr>
<tr>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1/2\)</span></td>
<td><span class="math">\(1/4\)</span></td>
<td><span class="math">\(1/8\)</span></td>
<td><span class="math">\(1/16\)</span></td>
<td><span class="math">\(1/32\)</span></td>
<td><span class="math">\(1/64\)</span></td>
</tr>
</tbody>
</table>
<p>So we’ll define an auxiliary function <span class="math">\(q(a, b) = 2^{a+b} p(a, b)\)</span>, leaving us with nice clean integers. The recurrence relation for <span class="math">\(q\)</span> is:
</p>
<div class="math">$$ q(0, 0) = 1 \qquad q(a, b) = 0 \textrm{ if } b > a \qquad q(a, b) = q(a-1, b) + q(a, b-1) \textrm{ otherwise} $$</div>
<table>
<tbody>
<tr>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(14\)</span></td>
<td><span class="math">\(42\)</span></td>
<td><span class="math">\(90\)</span></td>
</tr>
<tr>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(5\)</span></td>
<td><span class="math">\(14\)</span></td>
<td><span class="math">\(28\)</span></td>
<td><span class="math">\(48\)</span></td>
</tr>
<tr>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(2\)</span></td>
<td><span class="math">\(5\)</span></td>
<td><span class="math">\(9\)</span></td>
<td><span class="math">\(14\)</span></td>
<td><span class="math">\(20\)</span></td>
</tr>
<tr>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(2\)</span></td>
<td><span class="math">\(3\)</span></td>
<td><span class="math">\(4\)</span></td>
<td><span class="math">\(5\)</span></td>
<td><span class="math">\(6\)</span></td>
</tr>
<tr>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
</tr>
</tbody>
</table>
<p>This table, and the recurrence relation, feel somewhat like Pascal’s triangle, with the apex in the lower left, and each counter-diagonal forming a row.</p>
<table>
<tbody>
<tr>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(5\)</span></td>
<td><span class="math">\(15\)</span></td>
<td><span class="math">\(35\)</span></td>
<td><span class="math">\(70\)</span></td>
<td><span class="math">\(126\)</span></td>
<td><span class="math">\(210\)</span></td>
</tr>
<tr>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(4\)</span></td>
<td><span class="math">\(10\)</span></td>
<td><span class="math">\(20\)</span></td>
<td><span class="math">\(35\)</span></td>
<td><span class="math">\(56\)</span></td>
<td><span class="math">\(84\)</span></td>
</tr>
<tr>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(3\)</span></td>
<td><span class="math">\(6\)</span></td>
<td><span class="math">\(10\)</span></td>
<td><span class="math">\(15\)</span></td>
<td><span class="math">\(21\)</span></td>
<td><span class="math">\(28\)</span></td>
</tr>
<tr>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(2\)</span></td>
<td><span class="math">\(3\)</span></td>
<td><span class="math">\(4\)</span></td>
<td><span class="math">\(5\)</span></td>
<td><span class="math">\(6\)</span></td>
<td><span class="math">\(7\)</span></td>
</tr>
<tr>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
</tr>
</tbody>
</table>
<p>But since we’re forcing the region above the diagonal to be <span class="math">\(0\)</span>, this causes a defect. Subtracting the relevant parts of our grid from Pascal’s triangle, we get:</p>
<table>
<tbody>
<tr>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(56\)</span></td>
<td><span class="math">\(84\)</span></td>
<td><span class="math">\(120\)</span></td>
</tr>
<tr>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(15\)</span></td>
<td><span class="math">\(21\)</span></td>
<td><span class="math">\(28\)</span></td>
<td><span class="math">\(36\)</span></td>
</tr>
<tr>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(4\)</span></td>
<td><span class="math">\(5\)</span></td>
<td><span class="math">\(6\)</span></td>
<td><span class="math">\(7\)</span></td>
<td><span class="math">\(8\)</span></td>
</tr>
<tr>
<td><span class="math">\(-\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
<td><span class="math">\(1\)</span></td>
</tr>
<tr>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
<td><span class="math">\(0\)</span></td>
</tr>
</tbody>
</table>
<p>This is just a piece of Pascal’s triangle, shifted by one! It appears that <span class="math">\(q(a, b) = \binom{a+b}{b} - \binom{a+b}{b-1}\)</span>, a suspicion that is easy to confirm via the recurrence relations.</p>
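<p>It’s also easy to confirm by machine; this Python snippet (my own) compares the closed form against the recurrence:</p>

```python
from math import comb

def q_closed(a, b):
    """Conjectured closed form: C(a+b, b) - C(a+b, b-1)."""
    return comb(a + b, b) - (comb(a + b, b - 1) if b >= 1 else 0)

def q_rec(a, b):
    """The recurrence from the post, computed naively."""
    if b > a:
        return 0
    if a == 0 and b == 0:
        return 1
    total = 0
    if a >= 1:
        total += q_rec(a - 1, b)
    if b >= 1:
        total += q_rec(a, b - 1)
    return total

assert all(q_closed(a, b) == q_rec(a, b) for a in range(8) for b in range(a + 1))
print(q_closed(6, 4))  # → 90, matching the table
```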
<hr>
<p>Now we know <span class="math">\(q(a, b)\)</span> in closed form, and thus <span class="math">\(p(a, b)\)</span> as well. But we can’t just sum over <span class="math">\(p(N, 0)\)</span>, <span class="math">\(p(N, 1)\)</span>, …, <span class="math">\(p(N, N)\)</span>, because these don’t correspond to disjoint events. In fact, to reach <span class="math">\((N, N)\)</span> safely, the prophet must have passed through the point <span class="math">\((N, N-1)\)</span> first!</p>
<p>We’ll have to do something a little silly. Consider the counter-diagonal from <span class="math">\((N, N)\)</span> to <span class="math">\((2N, 0)\)</span>, and note that a path can touch at most one of those points. Furthermore, if there is a path of length <span class="math">\(2N\)</span>, and it touched the line <span class="math">\(x = N\)</span>, then it must end on one of these points.</p>
<!-- TODO add diagram here -->
<p>So we’ll change the rules of the game a little bit. The prophet still loses if he runs out of bread, but otherwise, he must keep flipping until the coin is flipped <span class="math">\(2N\)</span> times. This doesn’t affect the end conditions: after the coin has been flipped <span class="math">\(2N\)</span> times, he’s either run out of loaves, or he’s flipped at least <span class="math">\(N\)</span> heads. And clearly, this doesn’t affect his chances of success (once <span class="math">\(N\)</span> new loaves have been created, it is impossible to fail). But it does change where the “finish line” for our walk is. The prophet succeeds exactly when his walk ends on the counter-diagonal from <span class="math">\((N, N)\)</span> to <span class="math">\((2N, 0)\)</span>!</p>
<p>This telescopes easily into a clean expression:
</p>
<div class="math">$$ \sum_{k = 0}^N p(2N-k, k) = \sum_{k = 0}^N \frac{1}{2^{2N}} \left( \binom{2N}{k} - \binom{2N}{k-1} \right) = \frac{1}{2^{2N}} \binom{2N}{N} $$</div>
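<p>As a sanity check on this answer, we can simulate the game directly. The sketch below (function names are mine) compares the closed form <span class="math">\(\frac{1}{2^{2N}}\binom{2N}{N}\)</span> against a Monte Carlo estimate:</p>

```python
from math import comb
from random import Random

def exact(N):
    """Closed-form success probability: C(2N, N) / 4^N."""
    return comb(2 * N, N) / 4 ** N

def simulate(N, trials, seed=0):
    """Estimate the success probability by playing the game many times."""
    rng = Random(seed)
    wins = 0
    for _ in range(trials):
        loaves, created = 1, 0
        while loaves > 0 and created < N:
            if rng.random() < 0.5:  # heads: duplicate a loaf
                loaves += 1
                created += 1
            else:                   # tails: hand one out
                loaves -= 1
        wins += created >= N
    return wins / trials

for N in (1, 2, 5):
    print(N, exact(N), simulate(N, 100_000))
```

For <span class="math">\(N = 2\)</span> this gives <span class="math">\(6/16 = 3/8\)</span>, matching the hand computation at the top of the post.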
<hr>
<h2>Second Approach</h2>
<p>As suggested by the title of this post, I’ll also describe a second solution to this puzzle, using generating functions. Sure, this will involve some slightly heavier machinery than the previous approach, which was rather elementary, but there is a certain elegance to it.</p>
<p>Let <span class="math">\(a_n\)</span> be the probability that the prophet ended up with exactly <span class="math">\(n\)</span> loaves, including the original loaf. The only way to end up with exactly one loaf is to flip tails immediately, so <span class="math">\(a_1 = 1/2\)</span>.</p>
<p>For <span class="math">\(n > 1\)</span>, he must flip heads first, giving two loaves. If he ended up with exactly <span class="math">\(n\)</span> loaves total, he must have gotten <span class="math">\(k\)</span> from the first loaf, and <span class="math">\(n-k\)</span> from the second loaf. Since the loaves act independently, this has probability <span class="math">\(\sum_{k=1}^{n-1} a_k a_{n-k}\)</span>. Factoring in the fact that he needs to flip heads the first time, we deduce <span class="math">\(a_n = \frac{1}{2} \sum_{k=1}^{n-1} a_k a_{n-k}\)</span>.</p>
<p>If we take the bold (and intuitive!) step of defining <span class="math">\(a_0 = 0\)</span>, we can change the bounds on that sum to be <span class="math">\(0\)</span> through <span class="math">\(n\)</span>, which will make our lives easier.</p>
<p>Let <span class="math">\(G(x) = a_0 + a_1 x + a_2 x^2 + \cdots\)</span> be the generating function for <span class="math">\(a_n\)</span>. We can tease out a very nice expression for <span class="math">\(G(x)\)</span>:
</p>
<div class="math">$$
\begin{align*}
G(x) &= a_0 + a_1 x + \sum_{n=2}^\infty a_n x^n \\
&= \frac{1}{2} x + \sum_{n=2}^\infty a_n x^n \\
&= \frac{1}{2} x + \sum_{n=2}^\infty \left( \frac{1}{2} \sum_{k=0}^n a_k a_{n-k} \right) x^n \\
G(x) &= \frac{1}{2} x + \frac{1}{2} \sum_{n=2}^\infty \sum_{k=0}^n a_k a_{n-k} x^n \\
2 G(x) &= x + \sum_{n=2}^\infty \sum_{k=0}^n a_k a_{n-k} x^n
\end{align*}
$$</div>
<p>Since each product <span class="math">\(a_k a_{n-k}\)</span> contains a factor of <span class="math">\(a_0 = 0\)</span> when <span class="math">\(n < 2\)</span>, we can lower the bound on the outer sum to <span class="math">\(n = 0\)</span> without changing anything. After that, set <span class="math">\(\ell = n - k\)</span>:
</p>
<div class="math">$$
\begin{align*}
2 G(x) &= x + \sum_{n=0}^\infty \sum_{k=0}^n a_k a_{n-k} x^n \\
&= x + \sum_{k=0}^\infty \sum_{\ell=0}^\infty a_k a_\ell x^{k+\ell} \\
&= x + \left( \sum_{k=0}^\infty a_k x^k \right) \left( \sum_{\ell=0}^\infty a_\ell x^\ell \right) \\
2G(x) &= x + G(x)^2
\end{align*}
$$</div>
<p>At first blush it looks hard to isolate <span class="math">\(G(x)\)</span>, but once we see this as the quadratic it is, we can apply the handy-dandy quadratic formula:
</p>
<div class="math">$$ G(x) = \frac{2 \pm \sqrt{4 - 4x}}{2} = 1 \pm \sqrt{1 - x} $$</div>
<p>Since <span class="math">\(G(0) = a_0 = 0\)</span>, we know we should take the negative square root.</p>
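<p>We can spot-check this square root numerically: the Taylor coefficients of <span class="math">\(1 - \sqrt{1-x}\)</span> should reproduce the recurrence for <span class="math">\(a_n\)</span>. A Python sketch (names are mine):</p>

```python
from fractions import Fraction

def a_rec(n_max):
    """a_n from the recurrence: a_0 = 0, a_1 = 1/2, a_n = (1/2) sum a_k a_{n-k}."""
    a = [Fraction(0), Fraction(1, 2)]
    for n in range(2, n_max + 1):
        a.append(Fraction(1, 2) * sum(a[k] * a[n - k] for k in range(1, n)))
    return a

def a_series(n_max):
    """Taylor coefficients of G(x) = 1 - sqrt(1 - x), via the binomial series."""
    coeffs = [Fraction(0)]
    binom = Fraction(1)  # running value of C(1/2, n)
    for n in range(1, n_max + 1):
        binom *= (Fraction(1, 2) - (n - 1)) / n
        coeffs.append(-binom * (-1) ** n)  # -[x^n] sqrt(1 - x)
    return coeffs

assert a_rec(10) == a_series(10)
print(a_series(4))
# → [Fraction(0, 1), Fraction(1, 2), Fraction(1, 8), Fraction(1, 16), Fraction(5, 128)]
```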
<hr>
<p>We could at this point find a closed-form expression for <span class="math">\(a_n\)</span>, but that’s not what we’re going to do. Remember that we’re not interested in the probability of getting exactly <span class="math">\(N+1\)</span> loaves, but the probability of getting <span class="math">\(N+1\)</span> or more loaves. In other words, we’d like to know <span class="math">\(b_{N+1}\)</span>, where <span class="math">\(b_n = 1 - \sum_{k=0}^{n-1} a_k\)</span>. [Note: we’re not certain that this is the same as <span class="math">\(\sum_{k=n}^\infty a_k\)</span>, since we haven’t ruled out the possibility that this process goes on forever with positive probability. It’s possible that the <span class="math">\(a_k\)</span> sum to less than <span class="math">\(1\)</span>.]</p>
<p>Let <span class="math">\(F(x) = b_0 + b_1 x + b_2 x^2 + \cdots\)</span> be the generating function for the <span class="math">\(b_n\)</span>. We’ll set <span class="math">\(b_0 = 1\)</span>, since <span class="math">\(1\)</span> minus the empty sum should be <span class="math">\(1\)</span>. If you’re familiar with generating functions, you’ll know that <span class="math">\(F(x) = \frac{1}{1 - x} - \frac{x}{1 - x} G(x)\)</span>, but for the newcomers, we’ll do it in slow motion:</p>
<p>To sum the terms of the series, we’ll multiply by the geometric series <span class="math">\(\frac{1}{1-x} = 1 + x + x^2 + \cdots\)</span>. The coefficient for the <span class="math">\(x^n\)</span> term will then be <span class="math">\(a_0 + \cdots + a_n\)</span>.
</p>
<div class="math">$$ \frac{G(x)}{1 - x} = \sum_{n=0}^\infty \sum_{k=0}^n a_k x^n $$</div>
<p>Multiplying by <span class="math">\(x\)</span> knocks our exponents up by one, equivalently, moves our coefficients down by one.
</p>
<div class="math">$$ \frac{x}{1- x} G(x) = \sum_{n=0}^\infty \sum_{k=0}^{n} a_k x^{n+1} = \sum_{n=1}^\infty \sum_{k=0}^{n-1} a_k x^n $$</div>
<p>Lastly, we want to subtract every coefficient (except the first) from <span class="math">\(1\)</span>. Fortunately, we already know what <span class="math">\(1 + x + x^2 + \cdots\)</span> is:
</p>
<div class="math">$$ \frac{1}{1 - x} - \frac{x}{1 - x}G(x) = 1 + \sum_{n=1}^\infty \left( 1 - \sum_{k=0}^{n-1} a_k \right) x^n $$</div>
<p>The coefficients on the right are exactly <span class="math">\(b_n\)</span>, so we get <span class="math">\(F(x) = \frac{1}{1 - x} - \frac{x}{1 - x} G(x)\)</span>, as promised. This cleans up to:
</p>
<div class="math">$$ F(x) = 1 + \frac{x}{\sqrt{1 - x}} $$</div>
<p>Using the <a href="https://en.wikipedia.org/wiki/Binomial_theorem#Newton's_generalized_binomial_theorem">generalized binomial theorem</a>, we can arrive at a closed form for <span class="math">\(b_n\)</span>.
</p>
<div class="math">$$
\begin{align*}
F(x) &= 1 + x (1 - x)^{-1/2} \\
&= 1 + x \sum_{n=0}^\infty \frac{(-1/2)(-3/2)\cdots(-1/2 - (n-1))}{n!} 1^{-1/2 - n} (-x)^n \\
&= 1 + x \sum_{n=0}^\infty \frac{(1/2)(3/2)\cdots((2n-1)/2)}{n!} x^n \\
&= 1 + x \sum_{n=0}^\infty \frac{(1/2) \cdot 1 \cdot (3/2) \cdot 2 \cdots ((2n-1)/2) \cdot n}{n! \cdot n!} x^n \\
&= 1 + x \sum_{n=0}^\infty \frac{1}{2^{2n}} \frac{1 \cdot 2 \cdot 3 \cdot 4 \cdots (2n-1) \cdot 2n}{n! \cdot n!} x^n \\
&= 1 + x \sum_{n=0}^\infty \frac{1}{2^{2n}} \binom{2n}{n} x^n \\
&= 1 + \sum_{n=0}^\infty \frac{1}{2^{2n}} \binom{2n}{n} x^{n+1}
\end{align*}
$$</div>
<p>So, the probability of getting <span class="math">\(N+1\)</span> or more loaves is <span class="math">\(b_{N+1} = \frac{1}{2^{2N}} \binom{2N}{N}\)</span>, which matches the answer we got before. Thank goodness!</p>
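<p>As one final check, we can compute <span class="math">\(b_{N+1} = 1 - \sum_{k=0}^{N} a_k\)</span> directly from the recurrence and compare it to the closed form. A Python sketch (my own):</p>

```python
from fractions import Fraction
from math import comb

def a_list(n_max):
    """a_n from the recurrence: a_0 = 0, a_1 = 1/2, a_n = (1/2) sum a_k a_{n-k}."""
    a = [Fraction(0), Fraction(1, 2)]
    for n in range(2, n_max + 1):
        a.append(Fraction(1, 2) * sum(a[k] * a[n - k] for k in range(1, n)))
    return a

a = a_list(12)
for N in range(6):
    b = 1 - sum(a[: N + 1])  # b_{N+1} = 1 - (a_0 + ... + a_N)
    assert b == Fraction(comb(2 * N, N), 4 ** N)
print("all checks pass")
```

In particular <span class="math">\(b_3 = 3/8\)</span>, matching the hand computation for <span class="math">\(N = 2\)</span>.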
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';
var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>The Multiplicative Structure of \( \Bbb Z / n \Bbb Z \)2018-09-10T00:00:00-07:002018-09-10T00:00:00-07:00Henry Swansontag:mathmondays.com,2018-09-10:/units-mod-n<p><span class="mathdefs">
<span class="math">\(\newcommand{\ZZ}{\Bbb Z}
\newcommand{\ZZn}[1]{\ZZ / {#1} \ZZ}\)</span>
</span></p>
<!--TODO i would like to make this more introductory-->
<p>One of the most familiar rings is the ring of integers modulo <span class="math">\(n\)</span>, often denoted <span class="math">\(\ZZn{n}\)</span>. Like all rings, it has an additive structure and a multiplicative one. The additive structure is straightforward: <span class="math">\(\ZZn{n}\)</span> is cyclic, generated by <span class="math">\(1\)</span>. In fact, every integer <span class="math">\(a\)</span> coprime to <span class="math">\(n\)</span> is a generator for this group, giving a total of <span class="math">\(\phi(n)\)</span> generators. The multiplicative structure, on the other hand, is far less apparent.</p>
<p>Not all elements of <span class="math">\(\ZZn{n}\)</span> can participate in the multiplicative group, because not all of them have inverses. For example, 4 has no inverse in <span class="math">\(\ZZn{6}\)</span>; there’s no integer <span class="math">\(a\)</span> such that <span class="math">\(4a \equiv 1 \pmod 6\)</span>. Elements that do have inverses are called <em>units</em>, and we’ll denote the group of units in <span class="math">\(\ZZn{n}\)</span> as <span class="math">\(U_n\)</span>.</p>
<p>Since an element <span class="math">\(a \in \ZZn{n}\)</span> is a unit iff <span class="math">\(a\)</span> and <span class="math">\(n\)</span> are coprime, there are <span class="math">\(\phi(n)\)</span> units, where <span class="math">\(\phi\)</span> is the <a href="https://en.wikipedia.org/wiki/Euler%27s_totient_function">totient function</a>. But the size of the group alone doesn’t nail down the group structure. </p>
<p>For example:</p>
<ul>
<li><span class="math">\(U_5 = \{ 1, 2, 3, 4 \}\)</span>:<ul>
<li>generated by <span class="math">\(2\)</span>: <span class="math">\(2^0 = 1\)</span>, <span class="math">\(2^1 = 2\)</span>, <span class="math">\(2^2 = 4\)</span>, <span class="math">\(2^3 = 8 = 3\)</span></li>
<li>also generated by <span class="math">\(3\)</span>: <span class="math">\(3^0 = 1\)</span>, <span class="math">\(3^1 = 3\)</span>, <span class="math">\(3^2 = 4\)</span>, <span class="math">\(3^3 = 2\)</span></li>
<li>this group is isomorphic to <span class="math">\(\ZZn{4}\)</span></li>
</ul>
</li>
<li><span class="math">\(U_8 = \{ 1, 3, 5, 7 \}\)</span><ul>
<li>every element squares to <span class="math">\(1\)</span></li>
<li>this group is isomorphic to <span class="math">\(\ZZn{2} \times \ZZn{2}\)</span></li>
</ul>
</li>
</ul>
<p>Is there a way to find the structure of <span class="math">\(U_n\)</span>?</p>
<!-- more -->
<hr>
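<p>Before diving in, here’s a quick brute-force look at the two examples above (a small Python sketch, not part of the original argument), confirming that <span class="math">\(U_5\)</span> is cyclic while <span class="math">\(U_8\)</span> is not:</p>

```python
from math import gcd

def units(n):
    """Residues mod n that are coprime to n, i.e. the elements of U_n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def order(a, n):
    """Multiplicative order of a mod n (assumes a is a unit)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

# U_5 is cyclic of order 4: exactly 2 and 3 generate it
assert [a for a in units(5) if order(a, 5) == len(units(5))] == [2, 3]

# U_8 is not cyclic: every element squares to 1
assert all(a * a % 8 == 1 for a in units(8))
```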
<p>A versatile theorem from ring theory is the Chinese Remainder Theorem, which (as a special case) says that, for <span class="math">\(m\)</span>, <span class="math">\(n\)</span> coprime, the rings <span class="math">\(\ZZn{m} \times \ZZn{n}\)</span> and <span class="math">\(\ZZn{mn}\)</span> are isomorphic. This induces an isomorphism on the units as well (can you see why?).</p>
<p>This means that in order to understand the structure of <span class="math">\(U_n\)</span>, we only need to understand <span class="math">\(U_{p^k}\)</span> for all primes <span class="math">\(p\)</span> and positive integers <span class="math">\(k\)</span>.</p>
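<p>For a concrete instance of this reduction (again a sketch of my own, with <span class="math">\(n = 15\)</span>), the map <span class="math">\(a \mapsto (a \bmod 3, a \bmod 5)\)</span> carries the units mod 15 bijectively onto pairs of units, and it respects multiplication:</p>

```python
from math import gcd

def units(n):
    return [a for a in range(1, n) if gcd(a, n) == 1]

# Reduction mod 3 and mod 5 maps U_15 into U_3 x U_5; the CRT says this
# map is a bijection, and it clearly respects multiplication.
image = {(a % 3, a % 5) for a in units(15)}
assert image == {(u, v) for u in units(3) for v in units(5)}
assert len(units(15)) == len(units(3)) * len(units(5))
```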
<p>We claim that <span class="math">\(U_{p^k}\)</span> is always cyclic for odd <span class="math">\(p\)</span>, but for <span class="math">\(p = 2\)</span>, it’s only cyclic when <span class="math">\(k = 1, 2\)</span>.</p>
<hr>
<p>Let <span class="math">\(p\)</span> be an odd prime.</p>
<p>Of course, we start with the simplest case, <span class="math">\(U_p\)</span>. Because <span class="math">\(\ZZn{p}\)</span> is a field, its multiplicative group is cyclic (see <a href="https://math.stackexchange.com/a/59911/55540">here</a> for a slick proof).</p>
<p>It is tempting to use this as the base case for an induction on <span class="math">\(k\)</span>, but for technical reasons, we need to start our induction at <span class="math">\(k = 2\)</span>.</p>
<p><em>Technical Reasons</em>: Assuming that our claim is true and that <span class="math">\(U_{p^k}\)</span> is indeed cyclic, let’s consider the number of generators as <span class="math">\(k\)</span> increases. Since the number of generators in a cyclic group of size <span class="math">\(m\)</span> is <span class="math">\(\phi(m)\)</span>, we have:</p>
<table>
<thead>
<tr>
<th>Group</th>
<th align="center"># of elements</th>
<th align="center"># of generators</th>
</tr>
</thead>
<tbody>
<tr>
<td><span class="math">\(U_p\)</span></td>
<td align="center"><span class="math">\(p-1\)</span></td>
<td align="center"><span class="math">\(\phi(p-1)\)</span></td>
</tr>
<tr>
<td><span class="math">\(U_{p^2}\)</span></td>
<td align="center"><span class="math">\(p(p-1)\)</span></td>
<td align="center"><span class="math">\((p-1) \phi(p-1)\)</span></td>
</tr>
<tr>
<td><span class="math">\(U_{p^3}\)</span></td>
<td align="center"><span class="math">\(p^2(p-1)\)</span></td>
<td align="center"><span class="math">\(p (p-1) \phi(p-1)\)</span></td>
</tr>
<tr>
<td><span class="math">\(U_{p^4}\)</span></td>
<td align="center"><span class="math">\(p^3(p-1)\)</span></td>
<td align="center"><span class="math">\(p^2 (p-1) \phi(p-1)\)</span></td>
</tr>
<tr>
<td><span class="math">\(U_{p^k}\)</span></td>
<td align="center"><span class="math">\(p^{k-1} (p-1)\)</span></td>
<td align="center"><span class="math">\( p^{k-2}(p-1) \phi(p-1)\)</span></td>
</tr>
</tbody>
</table>
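<p>The rows of this table can be checked by brute force for a small prime. Here’s a quick Python verification (my addition) with <span class="math">\(p = 5\)</span>, counting elements whose order equals the group size:</p>

```python
from math import gcd

def units(n):
    return [a for a in range(1, n) if gcd(a, n) == 1]

def order(a, n):
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

def num_generators(n):
    """Count elements of U_n whose order is |U_n| (brute force)."""
    m = len(units(n))
    return sum(1 for a in units(n) if order(a, n) == m)

def phi(n):
    return len(units(n))

p = 5
assert num_generators(p) == phi(p - 1)                     # row U_p
assert num_generators(p ** 2) == (p - 1) * phi(p - 1)      # row U_{p^2}
assert num_generators(p ** 3) == p * (p - 1) * phi(p - 1)  # row U_{p^3}
```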
<hr>
<p>From the evidence above, we suspect that if <span class="math">\(g\)</span> is a generator mod <span class="math">\(p\)</span>, only <span class="math">\(p-1\)</span> of the <span class="math">\(p\)</span> possible lifts to <span class="math">\(U_{p^2}\)</span> will be generators. This suggests there is one “bad” lift for each generator.</p>
<p>Fortunately, we can find this bad lift explicitly: we claim it’s <span class="math">\(g^p\)</span>.</p>
<p>We know that <span class="math">\(g^p \equiv g \pmod{p}\)</span>, so <span class="math">\(g^p\)</span> really is a lift of <span class="math">\(g\)</span>. And since <span class="math">\(g^{p(p-1)} \equiv 1 \pmod{p^2}\)</span>, we know that the order of <span class="math">\(g^p\)</span> is at most <span class="math">\(p-1\)</span> – too small to generate all of <span class="math">\(U_{p^2}\)</span>.</p>
<p>If our hunch is true, then this is the <em>only</em> bad lift of <span class="math">\(g\)</span>, and so, guided by our suspicions, we make the following claim:</p>
<p><strong>Claim</strong>: if <span class="math">\(a\)</span> is not a multiple of <span class="math">\(p\)</span>, then <span class="math">\(g^p + ap\)</span> is a generator of <span class="math">\(U_{p^2}\)</span>.</p>
<p><em>Proof</em>: Since <span class="math">\(g^p + ap \equiv g \pmod{p}\)</span>, it has order <span class="math">\(p-1\)</span> in <span class="math">\(U_p\)</span>. This means its order in <span class="math">\(U_{p^2}\)</span> must be a multiple of <span class="math">\(p-1\)</span>. Its order must also divide the size of the group, narrowing the possibilities to <span class="math">\(p-1\)</span> and <span class="math">\(p(p-1)\)</span>. Thus, to prove that <span class="math">\(g^p + ap\)</span> is a generator, we just have to show it doesn’t have order <span class="math">\(p-1\)</span>.</p>
<p>Assume it does, and expand <span class="math">\(1 \equiv (g^p + ap)^{p-1}\)</span> by the binomial theorem:
</p>
<div class="math">$$ 1 \equiv (g^p + ap)^{p-1} \equiv \sum_{i = 0}^{p - 1} \binom{p - 1}{i} (g^p)^{p-1-i} (ap)^i \pmod{p^2} $$</div>
<p>The terms for <span class="math">\(i \ge 2\)</span> have two or more factors of <span class="math">\(p\)</span> in them, so they get killed, leaving us with
</p>
<div class="math">$$ 1 \equiv g^{p(p-1)} + (p-1) g^{p(p-2)} ap \pmod{p^2} $$</div>
<p>Recalling that <span class="math">\(g^{p(p-1)} \equiv 1\)</span>, we get:
</p>
<div class="math">$$ 0 \equiv (p-1) g^{p(p-2)} ap \pmod{p^2} $$</div>
<p>For this to be true, we would need to find two factors of <span class="math">\(p\)</span> in <span class="math">\((p-1) g^{p(p-2)} ap\)</span>. There’s clearly one factor, from the <span class="math">\(p\)</span>, but none of the other factors can provide the second: <span class="math">\(p-1\)</span>, <span class="math">\(g\)</span>, and <span class="math">\(a\)</span> are all coprime to <span class="math">\(p\)</span>. By contradiction, <span class="math">\(g^p + ap\)</span> must have order <span class="math">\(p(p-1)\)</span> and thus generate <span class="math">\(U_{p^2}\)</span>.</p>
<p>Note that there are <span class="math">\(p-1\)</span> choices of <span class="math">\(a\)</span>, and so we’ve confirmed our suspicion that every generator mod <span class="math">\(p\)</span> has <span class="math">\(p-1\)</span> good lifts and one bad lift mod <span class="math">\(p^2\)</span>.</p>
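<p>We can watch this happen for a small prime. Here’s a Python check (not from the original argument, just a sanity test) with <span class="math">\(p = 7\)</span> and the generator <span class="math">\(g = 3\)</span>:</p>

```python
def order(a, n):
    """Multiplicative order of a mod n (assumes a is a unit)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

p, g = 7, 3                       # 3 is a generator of U_7
assert order(g, p) == p - 1

bad = pow(g, p, p * p)            # the claimed "bad lift" of g
assert order(bad, p * p) == p - 1            # order too small to generate U_49

# every other lift g^p + a*p (with p not dividing a) does generate U_{p^2}
for a in range(1, p):
    assert order(bad + a * p, p * p) == p * (p - 1)
```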
<hr>
<p>Now we’re ready for the inductive step.</p>
<p>Let <span class="math">\(k \ge 2\)</span> and <span class="math">\(g\)</span> be a generator of <span class="math">\(U_{p^k}\)</span>. We claim <span class="math">\(g\)</span> is also a generator for <span class="math">\(U_{p^{k+1}}\)</span>.</p>
<p>Since it’s a generator, it has order <span class="math">\(p^{k-1} (p-1)\)</span> in <span class="math">\(U_{p^k}\)</span>, and so its order in <span class="math">\(U_{p^{k+1}}\)</span> must be a multiple of that. This means it is either <span class="math">\(p^{k-1} (p - 1)\)</span> or <span class="math">\(p^k (p - 1)\)</span>. We just need to show it isn’t the former.</p>
<p>Let’s try to do a binomial expansion like before. We know that <span class="math">\(g^{p^{k-1} (p-1)} = (g^{p^{k-2} (p-1)})^p\)</span>, and that <span class="math">\(g^{p^{k-2} (p-1)} = a p^{k-1} + b\)</span> for some <span class="math">\(b < p^{k-1}\)</span>.
By Euler’s theorem, <span class="math">\(g^{p^{k-2} (p-1)} \equiv 1 \pmod{p^{k-1}}\)</span> (consider the size of <span class="math">\(U_{p^{k-1}}\)</span>). This means that <span class="math">\(b = 1\)</span>. Furthermore, because <span class="math">\(g\)</span> is a generator in <span class="math">\(U_{p^k}\)</span>, we know that <span class="math">\(p \nmid a\)</span>. So <span class="math">\(g^{p^{k-2} (p-1)} = 1 + a p^{k-1}\)</span> where <span class="math">\(p\)</span> and <span class="math">\(a\)</span> are coprime.</p>
<p>Now we can do our binomial business:
</p>
<div class="math">$$ g^{p^{k-1} (p-1)} = \sum_{i = 0}^p \binom{p}{i} (a p^{k-1})^i $$</div>
<p>How many factors of <span class="math">\(p\)</span> are in each term?</p>
<ul>
<li><span class="math">\(i = 0, 1\)</span>: don’t care.</li>
<li><span class="math">\(i \ge 2, i \ne p\)</span>: <span class="math">\(1\)</span> from the binomial coefficient, and at least <span class="math">\(2(k-1)\)</span> from the power, for a total of at least <span class="math">\(2k-1\)</span>. Since <span class="math">\(k \ge 2\)</span>, we have <span class="math">\(2k-1 \ge k+1\)</span>, and these terms vanish mod <span class="math">\(p^{k+1}\)</span>.</li>
<li><span class="math">\(i = p\)</span>: we lose the factor from the binomial coefficient, so we have exactly <span class="math">\(p(k-1)\)</span> factors of <span class="math">\(p\)</span>. Since <span class="math">\(p\)</span> is odd, <span class="math">\(p \ge 3\)</span>, and for <span class="math">\(k \ge 2\)</span>, <span class="math">\(3k-3 \ge k+1\)</span>, so this term also vanishes.</li>
</ul>
<p>So we are left with <span class="math">\(g^{p^{k-1} (p-1)} \equiv 1 + a p^k \pmod{p^{k+1}}\)</span>, and since <span class="math">\(a\)</span> isn’t a multiple of <span class="math">\(p\)</span>, this shows that <span class="math">\(g\)</span> does not have order <span class="math">\(p^{k-1} (p - 1)\)</span>. Thus, <span class="math">\(g\)</span> must be a generator mod <span class="math">\(p^{k+1}\)</span>.</p>
<p>By induction, this shows that <span class="math">\(U_{p^k}\)</span> is cyclic for all <span class="math">\(k\)</span>.</p>
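<p>Concretely (my own check, with <span class="math">\(p = 5\)</span> and <span class="math">\(g = 2\)</span>): once <span class="math">\(2\)</span> is a generator mod <span class="math">\(25\)</span>, it stays one for every higher power of <span class="math">\(5\)</span>:</p>

```python
def order(a, n):
    """Multiplicative order of a mod n (assumes a is a unit)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

p, g = 5, 2                       # 2 generates U_25 (it has order 20)
for k in range(2, 5):
    # a generator mod p^2 stays a generator mod every higher power of p
    assert order(g, p ** k) == p ** (k - 1) * (p - 1)
```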
<hr>
<p>Note that the above argument <em>almost</em> works for <span class="math">\(p = 2\)</span>; the base case goes through, and the inductive step fails only when we look at the last term: when <span class="math">\(p = 2\)</span>, we can’t conclude <span class="math">\(p(k-1) \ge k+1\)</span>. But this fails only at <span class="math">\(k = 2\)</span>; it continues to work for <span class="math">\(k \ge 3\)</span>. So if there <em>were</em> generators for <span class="math">\(U_8\)</span>, then they would lift to generators for <span class="math">\(U_{16}\)</span>, and those to <span class="math">\(U_{32}\)</span>, and so on. But we just barely fail the jump from <span class="math">\(k = 2\)</span> to <span class="math">\(k = 3\)</span>, and this is why <span class="math">\(p = 2\)</span> is different from its odd peers.</p>
<p>Still though, we can modify our argument slightly to derive the structure of <span class="math">\(U_{2^k}\)</span> for <span class="math">\(k \ge 3\)</span>. Since <span class="math">\(U_8\)</span> is non-cyclic, there is no chance for any higher <span class="math">\(U_{2^k}\)</span> to be cyclic. But we will show they’re pretty darn close.</p>
<p>We will call <span class="math">\(g\)</span> a “near-generator” of <span class="math">\(U_{2^k}\)</span> if <span class="math">\(g\)</span> generates half the group, and multiplying by <span class="math">\(-1\)</span> gives the other half. Our base case is <span class="math">\(U_8 = \{ 1, 3, 5, 7 \}\)</span>, for which <span class="math">\(3\)</span> and <span class="math">\(5\)</span> are near-generators.</p>
<p>Say that <span class="math">\(g\)</span> is a near-generator of <span class="math">\(U_{2^k}\)</span>. We claim that it is also a near-generator of <span class="math">\(U_{2^{k+1}}\)</span>.</p>
<p>As before, we show the possible orders for <span class="math">\(g\)</span>, and eliminate all but one possibility. Since <span class="math">\(g\)</span> is a near-generator mod <span class="math">\(2^k\)</span>, it has order <span class="math">\(2^{k-2}\)</span> in <span class="math">\(U_{2^k}\)</span>. Thus its order in <span class="math">\(U_{2^{k+1}}\)</span> must be a multiple of <span class="math">\(2^{k-2}\)</span>. This leaves possibilities <span class="math">\(2^{k-2}\)</span>, <span class="math">\(2^{k-1}\)</span>, and <span class="math">\(2^k\)</span>. It cannot be <span class="math">\(2^k\)</span>, because that would imply that <span class="math">\(U_{2^{k+1}}\)</span> is cyclic, and that is impossible. So it remains to eliminate <span class="math">\(2^{k-2}\)</span>.</p>
<p>A similar argument to the odd <span class="math">\(p\)</span> case can be used to tell us that <span class="math">\(g^{2^{k-3}} = 1 + a 2^{k-1}\)</span> for some odd <span class="math">\(a\)</span>. Then:
</p>
<div class="math">$$ g^{2^{k-2}} = (g^{2^{k-3}})^2 = (1 + a 2^{k-1})^2 = 1 + 2 \cdot a 2^{k-1} + a^2 2^{2k-2} $$</div>
<p>Taken mod <span class="math">\(2^{k+1}\)</span>, this tells us that <span class="math">\(g^{2^{k-2}} \equiv 1 + a 2^k \pmod{2^{k+1}}\)</span>, eliminating the possibility of <span class="math">\(2^{k-2}\)</span> as the order. Thus, <span class="math">\(g\)</span> must have order <span class="math">\(2^{k-1}\)</span> in <span class="math">\(U_{2^{k+1}}\)</span>.</p>
<p>To show that <span class="math">\(-1\)</span> gives the rest of the group, it suffices to show that <span class="math">\(-1\)</span> is not in the half generated by <span class="math">\(g\)</span>. But if <span class="math">\(g^r \equiv -1 \pmod{2^{k+1}}\)</span>, then surely this would also be true mod <span class="math">\(2^k\)</span>, and so this situation does not arise.</p>
<p>Therefore, for <span class="math">\(k \ge 3\)</span>, <span class="math">\(U_{2^k} = \{ \pm g^r \mid r = 0, 1, \ldots 2^{k-2} - 1 \} \cong \ZZn{2} \times \ZZn{2^{k-2}}\)</span>. The cases of <span class="math">\(U_2\)</span> and <span class="math">\(U_4\)</span> are easily computed to be the trivial group and <span class="math">\(\ZZn{2}\)</span>, respectively.</p>
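<p>Here’s a small check of this structure (my addition) for <span class="math">\(U_{16}\)</span>, using the near-generator <span class="math">\(3\)</span>: its powers give half the group, and negating them gives the other half.</p>

```python
def order(a, n):
    """Multiplicative order of a mod n (assumes a is a unit)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

m = 16                            # U_{2^k} with k = 4
g = 3                             # near-generator, lifted from U_8

assert order(g, m) == m // 4      # order 2^{k-2}, half of |U_{2^k}|
half = {pow(g, r, m) for r in range(m // 4)}
other = {(-x) % m for x in half}
assert half | other == set(range(1, m, 2))   # +/- powers of 3 cover all of U_16
assert half.isdisjoint(other)                # in particular, -1 is not a power of 3
```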
<hr>
<p>Now we are finally ready to understand <span class="math">\(U_n\)</span> in general: factor <span class="math">\(n\)</span> into primes, and apply the results we learned above.</p>
<p>Specifically, we can answer the question of exactly when <span class="math">\(U_n\)</span> is cyclic.</p>
<p>If <span class="math">\(n\)</span> has two odd prime factors <span class="math">\(p\)</span> and <span class="math">\(q\)</span>, then <span class="math">\(n = p^k q^\ell n'\)</span> with <span class="math">\(n'\)</span> coprime to <span class="math">\(p\)</span> and <span class="math">\(q\)</span>. So <span class="math">\(U_n \cong U_{p^k} \times U_{q^\ell} \times U_{n'}\)</span>. The first two factors in this product have even size, i.e., their sizes have a common factor. This makes it impossible for <span class="math">\(U_{p^k} \times U_{q^\ell}\)</span> to be cyclic, and therefore, <span class="math">\(U_n\)</span> can’t be cyclic either.</p>
<p>If <span class="math">\(8\)</span> divides <span class="math">\(n\)</span>, then <span class="math">\(n = 2^k m\)</span> for some odd <span class="math">\(m\)</span> and some <span class="math">\(k \ge 3\)</span>, and <span class="math">\(U_n = U_{2^k} \times U_m\)</span>. But <span class="math">\(U_{2^k}\)</span> is not cyclic, and this also disqualifies <span class="math">\(n\)</span>.</p>
<p>So we are left with <span class="math">\(n = 1\)</span>, <span class="math">\(2\)</span>, <span class="math">\(4\)</span>, <span class="math">\(p^k\)</span>, <span class="math">\(2p^k\)</span>, and <span class="math">\(4p^k\)</span>. The first three can be checked by hand; they’re all cyclic. We showed earlier that <span class="math">\(U_{p^k}\)</span> is cyclic, and the Chinese Remainder Theorem tells us that <span class="math">\(U_{2p^k} \cong U_2 \times U_{p^k}\)</span> is too (note that <span class="math">\(U_2\)</span> is the trivial group). But <span class="math">\(U_{4p^k} \cong U_4 \times U_{p^k}\)</span>, and both groups have even size, and so <span class="math">\(U_{4p^k}\)</span> is not cyclic.</p>
<hr>
<p>To summarize:</p>
<ul>
<li><span class="math">\(U_n\)</span> is cyclic exactly when <span class="math">\(n = 1\)</span>, <span class="math">\(2\)</span>, <span class="math">\(4\)</span>, <span class="math">\(p^k\)</span>, or <span class="math">\(2p^k\)</span></li>
<li><span class="math">\(U_{p^k} \cong \ZZn{p^{k-1} (p-1)}\)</span> for odd <span class="math">\(p\)</span></li>
<li><span class="math">\(U_{2^k} \cong \ZZn{2} \times \ZZn{2^{k-2}}\)</span> for <span class="math">\(k \ge 3\)</span></li>
<li>lifting a generator always produces another generator, except potentially from <span class="math">\(p\)</span> to <span class="math">\(p^2\)</span> (but the “bad lift” is known explicitly)</li>
</ul>
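<p>As a final sanity check (my addition), we can test the cyclicity classification exhaustively for small <span class="math">\(n\)</span>, comparing a brute-force search for generators against the “<span class="math">\(1, 2, 4, p^k, 2p^k\)</span>” rule:</p>

```python
from math import gcd

def units(n):
    return [a for a in range(1, n) if gcd(a, n) == 1]

def order(a, n):
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k

def is_cyclic_Un(n):
    """Brute force: does any unit have order |U_n|?"""
    m = len(units(n))
    return any(order(a, n) == m for a in units(n))

def classification_says_cyclic(n):
    """True iff n = 1, 2, 4, p^k, or 2 p^k for an odd prime p."""
    if n in (1, 2, 4):
        return True
    if n % 2 == 0:
        n //= 2
    if n % 2 == 0:
        return False       # 4 or 8 divided the original n
    p = next(d for d in range(3, n + 1) if n % d == 0)  # smallest prime factor
    while n % p == 0:
        n //= p
    return n == 1          # cyclic iff n was a power of that single prime

for n in range(2, 100):
    assert is_cyclic_Un(n) == classification_says_cyclic(n)
```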