Math Mondays

Wirefly Hive Problem

2023-02-06T00:00:00-08:00

This puzzle comes from a video about Magic the Gathering that my brother sent me, which you can watch here.

The video is more about the specific rules of Magic, but this isn’t a Magic blog, so let’s get to the math as soon as possible.

I’ll try to summarize what’s going on, but I don’t actually play Magic, so please forgive the inevitable inaccuracies.

It’s your turn, and on your side of the board, you have some infinite source of mana, a Wirefly Hive, and Filigree Sages. Meanwhile, your opponent has a Leonin Elder.

Basically, what this means is that, due to the effect of Filigree Sages, you can activate Wirefly Hive as many times as you want this turn. Each time you do so, you flip a coin.

If it comes up heads, you get a Wirefly with 2 power (i.e., attack strength), and your opponent gains 1 life from the effect of Leonin Elder.
However, if it’s tails, all your Wireflies are destroyed. Plus, your opponent keeps their increased life!

This means that if you flip several heads in a row, you can gain a bunch of attack, potentially enough to kill your opponent. However, the more you flip tails, the harder your task becomes, as your opponent’s life steadily climbs higher.

The question is: if your opponent starts with $L$ life, what is the probability that you eventually amass enough Wireflies to win?
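Before doing any math, the game is easy enough to simulate. Here is a rough Monte Carlo sketch in Python. The function names and the `life_cap` cutoff (give up once the opponent's life is hopelessly high) are assumptions made for illustration; they are not part of the puzzle.

```python
import random

def wins(start_life, rng, life_cap=40):
    """Play one game; return True if the Wireflies ever reach lethal damage."""
    life, flies = start_life, 0
    if life <= 0:
        return True
    while life <= life_cap:
        if rng.random() < 0.5:       # heads: new Wirefly, opponent gains 1 life
            flies += 1
            life += 1
            if 2 * flies >= life:    # each Wirefly has 2 power
                return True
        else:                        # tails: every Wirefly is destroyed
            flies = 0
    return False                     # assumption: past the cap, count it as a loss

def estimate(start_life, trials=20_000, seed=0):
    """Fraction of simulated games won when the opponent starts at start_life."""
    rng = random.Random(seed)
    return sum(wins(start_life, rng) for _ in range(trials)) / trials
```

Calling `estimate(2)` gives a quick sanity check on any exact answer derived later. The cap only discards games that would need dozens of consecutive heads, so it barely affects the estimate.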

Small $L$

Obviously if $L = 0$ you have already won, so let’s look at $L = 1$.

If we flip heads, then we get 1 Wirefly, and our opponent increases to 2 life. Because our Wirefly has attack 2, that’s enough for us to win.

But what if we flip tails?

All zero of our Wireflies get destroyed, and our opponent remains at 1 life, so this is actually a no-op. Nothing has happened, and we are free to keep flipping until we get a heads.

In other words, the probability of winning with $L = 1$ is 100%!

How about $L = 2$?

Recurrence Relation

Just like before, we can keep flipping tails at the start, and nothing will happen. So eventually, we will flip a head, and get one Wirefly, bringing our opponent to 3 life. But now it gets harder.

If we get another heads, we get a second Wirefly, and our opponent heals to 4 life. We can now attack with our two Wireflies for 4 damage and win.
If we get tails, we lose our Wirefly, and have to start all over, but our opponent starts with 3 life this time.

If we let $p(L)$ be the probability of winning, when our opponent starts with $L$ life, then we have

$$p(2) = \frac{1}{2} + \frac{1}{2} p(3)$$

We can do something similar for higher $L$. If our opponent starts with $L$ life, then we flip the coin until we get our first head, giving our opponent $L+1$ life, and us one Wirefly. After that, if we get a streak of $L$ heads, our opponent will have $2L$ life, but we will have $L$ Wireflies total, and that is just enough to win. However, if our streak only lasts $k < L$ heads, we have to start over, with our opponent at $L+k$ life.

The probability of getting a streak of $L$ heads (remember, the first is “free”), is $1/2^{L-1}$. The probability of getting $k$ heads followed by a tail is $1/2^{k-1+1}$. Putting it all together, we get:

$$ p(L) = \frac{1}{2^{L-1}} + \sum_{k = 1}^{L-1} \frac{1}{2^k} p(L+k) $$

Unlike normal recurrence relations, this grows towards higher values of $L$, and we have no base case to start with (try expanding this formula for $L=1$!).

Additionally, this relation is under-determined. For example, it is satisfied by $p(L) = 1$ always, which could be the answer, but as we discover below, it is not. So we’ve lost some information somewhere, but I don’t know what it is.

I was not able to get anywhere with this, but maybe there’s some fancy generating function tricks one can do.

Random Walk

To make some progress, let’s transform the problem a bit.

If instead, we weaken our Wireflies to have only 1 attack, and change the effect on Leonin Elder to be “Whenever an artifact is destroyed, gain 1 life”, then we get the same puzzle, but it’s easier to reason about geometrically!

(If our opponent has $L$ life, we still have to flip $L$ heads in a row to get lethal damage. And when we break a streak of $k$ heads, we still have to start over with our opponent at $L+k$ life.)

Let’s draw a grid, where the x-axis is “opponent’s life”, and the y-axis is “number of Wireflies”. Each time we flip heads, we move up one step, and when we flip tails, we slide diagonally down to the x-axis. Our goal is to touch the diagonal $x=y$, at which point we have lethal damage.

Now our problem takes the form of a random walk!

In order to figure out the probability of touching the diagonal, let’s figure out the probability of touching it at each point.

If our opponent starts with $L$ life, and we touch the diagonal at $(N, N)$, then we must have:

Traversed some path from $(L, 0)$ to $(N, 0)$, without touching the diagonal.
Then we flipped $N$ consecutive heads.

The probability of the latter is easy to compute; we’ve done it several times already: $1/2^{N-1}$. The former is trickier.

Any particular path from $(L, 0)$ to $(N, 0)$ comes from a specific sequence of heads streaks, punctuated by arbitrary runs of tails. For example, the sequence shown in the diagram above has streaks of $(1, 2, 1)$, and could have been produced by:

HTHHTHT
HTTTTHHTTTTTHTTTTTT
TTTTTHTHHTTHT
and so on

What’s the probability of getting this particular sequence?

We have to get a streak of one head: $1/2$
Then a streak of two heads: $1/4$
And lastly, a streak of one head again: $1/2$

In total, that’s $1/16$.

In general, the probability of getting a streak pattern of $(k_1, k_2, \ldots, k_i)$ is $1/2^{k_1} \cdot 1/2^{k_2} \cdots 1/2^{k_i}$, or in other words, $1/2^{k_1 + k_2 + \cdots + k_i}$. And since we know we need $N - L$ heads to get from $(L, 0)$ to $(N, 0)$, that means each path has the exact same probability: $1/2^{N - L}$.

The only thing we’re missing is the number of paths from $(L, 0)$ to $(N, 0)$. If we call that quantity $a_{L, N}$, then the probability that we win is:

$$ p(L) = \sum_{N = L}^\infty p(\textrm{walked to $(N, 0)$}) \cdot \frac{1}{2^{N-1}} = \sum_{N = L}^\infty \frac{a_{L, N}}{2^{N - L}} \cdot \frac{1}{2^{N-1}} $$

The only thing left to do is find the values of $a_{L, N}$.

Counting Paths

Counting these paths turns out to be quite an ordeal, and I never ended up finding a closed form for them. Still though, there’s a clean recurrence relation that one can use to compute these things, so let’s dive in.

If we’re looking for a path from $(L, 0)$ to $(N, 0)$, our last step has to be a diagonal. There’s only a few possible places we can slide down the diagonal and hit $(N, 0)$.

Specifically, we can be on any point $(N - i, i)$, as long as $i < N - i$; otherwise we’d be on (or past) the diagonal, and this would not be a valid path.

The only way to get from $(L, 0)$ to $(N - i, i)$ is to make it to $(N - i, 0)$, and then go straight up. So each path to $(N - i, 0)$ gives us exactly one path to $(N, 0)$, meaning the number of paths is just:

$$ a_{L, N} = \sum_{i} a_{L, N - i} $$

where the sum is taken over all $i \ge 1$ such that $i < N - i$. Also, we can skip all $i$ such that $N - i < L$, because there are no paths going backwards.

For a specific example, consider $a_{2, 6}$.

We can slide down to $(6, 0)$ from $(5, 1)$ or $(4, 2)$, but not $(3, 3)$. So we have that $a_{2, 6} = a_{2, 5} + a_{2, 4}$. The possible paths are shown below, albeit, kind of cluttered:

We see that there are 3 possible paths, two of which come from $a_{2, 5}$ (blue and purple), and one of which comes from $a_{2, 4}$ (cyan).

Simpler Recurrence

This is a pretty simple recurrence relation, but we can actually do better!

Consider $a_{2, 7}$. From our existing knowledge, we have that $a_{2, 7} = a_{2, 6} + a_{2, 5} + a_{2, 4}$. But if we replace $a_{2, 5} + a_{2, 4}$, then we get that $a_{2, 7} = 2 a_{2, 6}$!

Similarly, $a_{2, 8} = a_{2, 7} + a_{2, 6} + a_{2, 5}$, but here we are not quite so lucky – we do not have all the terms we need. The best we can do is $a_{2, 8} = 2 a_{2, 7} - a_{2, 4}$.

And in general, for all $n$ and $k > n$, we have:

$$ a_{n, 2k+1} = 2 a_{n, 2k} $$

$$ a_{n, 2k} = 2 a_{n, 2k-1} - a_{n, k} $$
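These two identities are easy to spot-check by machine. The sketch below counts paths with the direct sum (a slide into $(N, 0)$ starts at $(N - i, i)$, which must sit strictly below the diagonal) and compares against the simplified forms. `path_counts` and `check_simplified` are hypothetical helper names.

```python
def path_counts(start, max_n):
    """a[N] = number of diagonal-avoiding paths from (start, 0) to (N, 0)."""
    a = {start: 1}
    for n in range(start + 1, max_n + 1):
        # The last slide begins at (n - i, i); require i < n - i and n - i >= start.
        a[n] = sum(a[n - i] for i in range(1, n) if i < n - i and n - i >= start)
    return a

def check_simplified(start, max_k):
    """Compare the direct sum with the two simplified recurrences for start < k <= max_k."""
    a = path_counts(start, 2 * max_k + 1)
    return all(
        a[2 * k + 1] == 2 * a[2 * k] and a[2 * k] == 2 * a[2 * k - 1] - a[k]
        for k in range(start + 1, max_k + 1)
    )
```

For instance, `path_counts(2, 8)` yields the counts 1, 1, 1, 2, 3, 6, 11 for $N = 2, \ldots, 8$, matching the $a_{2, 6} = 3$ example above.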

Conclusion?

Unfortunately, I was not able to find any closed-form solution here.

I threw the recurrence relation into Python, and got the following results:

$L$	$p(L)$
0	N/A
1	1
2	0.68332
3	0.36663
4	0.18645
5	0.093619
6	0.046859
7	0.023435
8	0.011718
9	0.0058593
10	0.0029297

Additionally, I found an OEIS sequence that counts $a_{2, n}$, and I was able to find other references to this sequence, but not the specific sum we’re looking for. Wolfram Alpha doesn’t produce any reasonable guesses for a closed form either. Can’t win them all, I guess.
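The script behind the table isn't reproduced here, but a computation along these lines is short. What follows is a sketch under assumptions, not the original code: it counts paths with the direct recurrence, truncates the infinite sum at a hypothetical cutoff `max_n`, and uses exact fractions to avoid rounding trouble.

```python
from fractions import Fraction

def path_counts(start, max_n):
    """a[N] = number of diagonal-avoiding paths from (start, 0) to (N, 0)."""
    a = {start: 1}
    for n in range(start + 1, max_n + 1):
        a[n] = sum(a[n - i] for i in range(1, n) if i < n - i and n - i >= start)
    return a

def win_probability(start, max_n=80):
    """Truncated version of p(L) = sum over N of a_{L,N} / 2^(N-L) * 1 / 2^(N-1)."""
    a = path_counts(start, max_n)
    return sum(
        Fraction(a[n], 2 ** (n - start)) * Fraction(1, 2 ** (n - 1))
        for n in range(start, max_n + 1)
    )

for life in range(1, 11):
    print(life, float(win_probability(life)))
```

Each term is at most $1/2^{N-1}$, so cutting the sum off at $N = 80$ changes nothing visible at the table's precision.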

Circular Prison of Unknown Size

2022-03-28T00:00:00-07:00

“Prisoner puzzles” are a popular kind of mathematical puzzle, in which a large group of cooperative players (“prisoners”) play a game against an adversarial supervisor (often “the warden”), with limited communication. Some classic examples are here and here (there’s frequent overlap with “hat problems”).

Recently, I ran across a very difficult prisoner puzzle, which required an intricate solution from the prisoners to win. I’ve rephrased the problem and a few solutions below, along with an interactive demonstration of the strategies.

The inevitable has happened – all the mathematicians in the world have been gathered up and arrested for being huge nerds.

The mathematicians are housed in a custom prison, which has $n$ identical, isolated cells, arranged in a large circle, each containing a single occupant (no empty cells). Inside each cell is a light switch and a light bulb, but the electrical wiring is unusual. If the light switch in a cell is on at noon, the bulb in the adjacent cell will briefly flash. Otherwise, and at all other times, the light bulb is off¹.

In order to prevent communication, every midnight, the warden fills the cells with knockout gas, flips all the switches to “off”, and rearranges the prisoners however he wants. (Still only one prisoner per cell though.)

One day, the warden enters your cell and issues you a challenge to win your freedom, and that of your colleagues. At any point, any one of the prisoners can announce “There are $n$ prisoners!”. If they are correct, then everyone is free. Otherwise, everyone will be executed. He allows you to send a message to all of your colleagues, describing the game and the plan, to which they are not allowed to reply. The warden, of course, will read your message, and shuffle everyone to thwart your strategy.

What plan would you devise?

The StackExchange link above describes a few solutions, and there are two that I find particularly interesting. In both cases, it’s important that you are singled out by the warden, because this breaks the symmetry between you and everyone else, and allows you to act as the captain for the group.

Upper Bound

For both strategies, the first step is to establish an upper bound, which can be done as follows:

Finding an Upper Bound

We perform a sequence of rounds, i.e., Round 1, Round 2, etc., each consisting of a waxing phase and a waning phase.
Within each round, prisoners are either “active” (will flip their switch today) or “inactive” (will not).
The waxing phase of round $k$ lasts for $k$ days:
- At the start of this phase, the captain starts off active, and everyone else starts inactive.
- During this phase, anyone who sees a light becomes active, and stays that way for the rest of the phase.
The waning phase of round $k$ lasts for $2^k$ days:
- At the start of this phase, prisoners carry over their active status from the end of the waxing phase.
- If a prisoner does not see a light, they immediately become inactive and remain that way for the rest of the phase.

We claim that, at the end of a round, either everyone is active or everyone is inactive. If everyone is inactive, we move on to the next round; otherwise, we stop, and claim that $n \le 2^k$.

Consider the number of active prisoners at the end of the waxing phase. Because each prisoner can activate only one other prisoner per day, the number of active prisoners can at most double each day. This means that, at the end of the waxing phase, there are at most $2^k$ active prisoners.

Now, if everyone is active, the waning phase consists of everyone flipping the switch every day, and the round ends with everyone still active. Otherwise, some inactive prisoner borders some active prisoner, and the number of active prisoners decreases by at least one per day. Since we started the phase with at most $2^k$ active prisoners, after $2^k$ days, everyone will be inactive.

Note: this process must eventually terminate, because during the $k$th round, there are at least $k$ active prisoners at the end of the waxing phase, and so, worst case scenario, we’ll finish at round $n$.
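The rounds are simple enough to simulate. In the sketch below, the warden is modeled as a random shuffle, which is an assumption: the real warden is adversarial, but the argument above doesn't depend on how he shuffles. Only the active/inactive flags are tracked, and the name `find_upper_bound` is made up for illustration.

```python
import random

def find_upper_bound(n, rng):
    """Run waxing/waning rounds on n cells; return (k, 2**k) for the round that succeeds."""
    k = 0
    while True:
        k += 1
        active = [True] + [False] * (n - 1)      # only the captain starts active
        for _ in range(k):                       # waxing phase: k days
            rng.shuffle(active)                  # the warden rearranges everyone
            # cell c sees a light if the prisoner in cell c - 1 is active
            active = [active[c] or active[c - 1] for c in range(n)]
        for _ in range(2 ** k):                  # waning phase: 2**k days
            rng.shuffle(active)
            active = [active[c] and active[c - 1] for c in range(n)]
        if all(active):
            return k, 2 ** k
        assert not any(active)                   # the all-or-nothing claim
```

The `assert` inside the loop is the claim from the text: a round never ends with a mix of active and inactive prisoners.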

Solution by Flipping Coins

At this point, the solutions take different approaches. One of the solutions, given in this answer, relies on giving the prisoners the ability to flip coins, allowing them to make decisions that the warden can’t predict ahead of time.

In order to communicate the results of these coin flips, we build an “announcement” subprocedure.

Announcements

For a predicate $P$, an announcement for $P$ is a procedure that makes it common knowledge whether there exists some prisoner satisfying $P$.

The announcement period lasts $B$ days, where $B$ is an upper bound for the number of prisoners.

Every prisoner satisfying $P$ is always active (and always flips the switch).
Other prisoners become active when they see a light, and remain active afterwards, much like the waxing phase.

At the end of the announcement period, if someone satisfied $P$, everyone is active, otherwise everyone is inactive. This makes the (non-)satisfaction of $P$ common knowledge.
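As a sketch of the announcement procedure, again assuming a randomly shuffling warden: `satisfies[p]` says whether prisoner `p` satisfies $P$, and `bound` plays the role of $B$. Both names are illustrative.

```python
import random

def announce(satisfies, bound, rng):
    """Return each prisoner's active flag after a bound-day announcement."""
    n = len(satisfies)
    active = list(satisfies)                     # prisoners satisfying P start (and stay) active
    for _ in range(bound):
        order = list(range(n))
        rng.shuffle(order)                       # order[c] = prisoner placed in cell c today
        # a prisoner sees a light if the occupant of the previous cell is active
        saw_light = {order[c] for c in range(n) if active[order[c - 1]]}
        active = [active[p] or p in saw_light for p in range(n)]
    return active
```

With `bound` at least the number of prisoners, the result is either all `True` or all `False`, which is what makes the outcome common knowledge.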

Now we can describe the strategy. The goal is to assign each prisoner a number from $1$ to $n$. At each point, the numbers $1$ through $d$ will be assigned, and $d$ is common knowledge. The captain starts out numbered $1$, and everyone else starts off unnumbered (i.e., $d = 1$).

Then, they repeat the following procedure:

Strategy

($B$ Days) Perform an announcement for “is unnumbered”.
- If we get a negative result, then everyone is numbered, and everyone knows $n$.
- Otherwise, proceed.
($1$ Day) Candidate Selection Day
- This day is the crucial day of the strategy.
- Every numbered prisoner simultaneously flips a coin. If they flip heads, they flip the switch today, otherwise they don’t.
- Unnumbered prisoners do not flip the switch today.
- Prisoners who see a light today are called candidates.
($d \cdot B$ Days) For each $i$ from $1$ to $d$, the prisoner numbered $i$ makes an announcement for “flipped heads”.
- After this, everyone knows exactly how many heads were flipped, i.e., how many candidates there are.
($B$ Days) Finally, an announcement is made for “is unnumbered candidate”.
- If this announcement is positive, and exactly one head was flipped, then there is a unique unnumbered candidate, and so they should assign themselves the number $d+1$, and everyone increments $d$.

Repeated enough times, this procedure will eventually² complete. Imagine that we’ve numbered all but one prisoner. In order to number the last prisoner, every numbered prisoner needs to flip tails, except the one adjacent to the unnumbered one. This is unlikely to happen, but it will eventually occur. The same is also true of all prior attempts to number a prisoner.

Also, the warden can rearrange the prisoners how he wishes, but because every numbered prisoner is equally capable of nominating a candidate, his efforts cannot actually impede the procedure.

To see how this algorithm plays out in practice, here is an interactive simulation. The coins are weighted to come up heads $1/d$ of the time, which I think is optimal.

The prisoners are named $A$ through $E$, but this is just so that you can track their identities between shuffles. Their numbered labels are in the lower left, the coins they flip are in the lower right, and their candidacy status is in the upper right.
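Separately from the interactive demo, here is a stripped-down Python sketch of the numbering loop. It makes several simplifying assumptions: the announcements are treated as black boxes that always report correctly, the warden shuffles at random, and the coins use the $1/d$ weighting mentioned above. It only counts candidate-selection days, not total days.

```python
import random

def number_everyone(n, rng):
    """Return (d, attempts): the final count and the number of candidate-selection days used."""
    numbered = [True] + [False] * (n - 1)        # prisoner 0 is the captain, numbered 1
    d, attempts = 1, 0
    while d < n:                                 # stands in for the "is unnumbered" announcement
        attempts += 1
        order = list(range(n))
        rng.shuffle(order)                       # order[c] = prisoner in cell c today
        heads = {p for p in range(n) if numbered[p] and rng.random() < 1 / d}
        candidates = [order[c] for c in range(n) if order[c - 1] in heads]
        # succeed only if exactly one head was flipped and its neighbour is unnumbered
        if len(heads) == 1 and not numbered[candidates[0]]:
            numbered[candidates[0]] = True
            d += 1
    return d, attempts
```

The loop ends with `d == n`, at which point a real run would see the “is unnumbered” announcement come back negative.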

Solution by Linear Algebra

The other approach (described here and here) is more complicated, but does not use randomness, and is guaranteed to finish in a certain number of days. It also leverages some linear algebra, so it’s a good thing our prisoners are mathematicians!

It begins the same way as the first solution, by establishing an upper bound $B$, and it uses the same announcement procedure. But instead of numbering individual prisoners, the goal is to partition them into subsets, and, after a certain point, deduce the size of these subsets.

Confused? Me too. It only really clicked for me after writing out this section.

At each point, there will be a partition of prisoners into subsets $S_1, \ldots, S_k$; the prisoners will all know $k$, and which subset they themselves are in. The sets are initialized with you, the captain, in $S_1$, and everyone else in $S_2$.

To refine this partition further, they attempt the following procedure repeatedly:

Strategy

For each subset $I \subseteq \{1, 2, \ldots, k\}$, other than the empty set and the whole set,
- ($1$ Day) All prisoners in $\bigcup_{i \in I} S_i$ flip their switches today. Let $T$ be the set of prisoners that see lights today.
- For each individual $j = 1, 2, \ldots, k$, check how $T$ and $S_j$ overlap:
  - ($B$ Days) Perform an announcement for prisoners in $S_j \cap T$
  - ($B$ Days) Perform an announcement for prisoners in $S_j \setminus T$
  - If both announcements were positive, replace $S_j$ with $S_j \cap T$, and add $S_j \setminus T$ as a new set, incrementing $k$. Then, abort this procedure and try again.
- Basically, $T$ is used to “cut” an $S_j$ into two smaller subsets, if possible.
If we get to this point, stop. We claim it is now possible to announce the correct number of mathematicians.

The loop must eventually stop, because in the worst case, $k = n$ and all subsets are size one. We can even put an upper bound on how long it will take. We can run the procedure at most $n-2$ times before all our subsets are size one. If we delay the splitting as long as possible, we’ll have to go through $2^n$-ish subsets, each of which takes up to $2nB+1$ days (one day of signaling, then two $B$-day announcements for each of at most $n$ sets $S_j$). Since the worst case for our upper bound is $B = 2^n$, this gives $(n-2)2^n(2n\cdot 2^n+1) \approx 2n^2 \cdot 4^n$ days.

But how does this help them calculate $n$?

Fix one of the subsets $I$. Because $k$ did not increase on the final attempt, we know $T$ was not able to cut any of the $S_j$ this time. In other words, for all $j$, either every prisoner in $S_j$ saw a light, or none of them did. Let $I'$ be the set of $j$ such that prisoners in $S_j$ saw lights. Since the number of prisoners seeing a light must equal the number of prisoners flipping a switch, this gives us an equation:

$$ \sum_{i \in I} |S_i| = \sum_{j \in I'} |S_j| $$

Note that $I'$ cannot be a subset of $I$. The circular nature of the prison means that, unless $I$ is the whole set or the empty set, there must be someone outside of $I$ that saw a light, and we excluded those subsets from the procedure.

So, letting $x_i$ be the size of $S_i$, we now have a system of equations about the variables $x_1, \ldots, x_k$. Importantly, the prisoners know these equations as well! Because of the announcements made when attempting to cut with $T$, the prisoners know whether $S_j \cap T$ or $S_j \setminus T$ is empty or not, and this tells them the contents of $I'$. (They already know $I$ because they agree on the order of iteration.) Lastly, they know that $S_1$ is just the captain, and so $x_1 = 1$.

We claim that, under these constraints, there is exactly one possible solution, and once the prisoners find it, they know the size of every subset exactly. They can win by simply guessing that $n = \sum x_i$.

Proof:

Let $x_1, \ldots, x_k$ and $y_1, \ldots, y_k$ be two such solutions. Let $r$ be the minimum value of $y_i/x_i$ over all $i$, and let $z_i = y_i - r x_i$. We note three things:

Each $z_i \ge 0$, because $r \le y_i/x_i$
At least one $z_i = 0$, because there is some $i$ for which $r = y_i/x_i$
As a linear combination of solutions, the $z_i$ are also a solution

Assume for the sake of contradiction that some of the $z_i$ are non-zero. Let $Z$ be the set of $i$ for which $z_i = 0$, which is by our assumption, not the whole set. Consider the equation corresponding to that subset; there is some subset $Z' \not\subseteq Z$ such that:

$$ \sum_{i \in Z} z_i = \sum_{j \in Z'} z_j $$

The left hand side must be zero, by definition, but because $Z' \setminus Z$ is non-empty, there is some non-zero $z_j$ on the right hand side. All the $z_j$s are non-negative, so we have reached a contradiction. Thus, all $z_i$ are zero, meaning that $y_i = r x_i$. And because $x_1 = y_1 = 1$, this forces $r$ to be $1$, and so the two solutions are identical.

This approach is simulated below, but with one minor caveat. Once the system of equations has a unique solution, the mathematicians will just blurt it out immediately, instead of waiting for all $2^n - 2$ equations to come in.

Note: Experimentally, it seems that it’s extremely common for the prisoners to all get partitioned into sets of size $1$, but that’s not necessarily the case every time. If you were able to rearrange the prisoners as you wished, you could set up such a situation: once you have $4$ partitions, you can ensure that partition $4$ is never split up.

But I haven’t yet added the ability to drag-and-drop prisoners around, so you have to approximate this by just hitting “Undo” and “Next” until you get the result you want. Maybe I’ll add that once I stop seething at JavaScript.
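For readers who prefer code to drag-and-drop, here is a compact Python sketch of the refinement loop and the final deduction. It rests on several assumptions: the warden shuffles at random, the announcements are treated as oracles, and the "linear algebra" is replaced by a brute-force search over positive integer sizes up to a hypothetical `bound`. All names are made up for illustration.

```python
import itertools
import random

def one_pass(label, k, rng):
    """Try every proper non-empty I. Return (k + 1, None) after a split, else (k, equations)."""
    n = len(label)
    equations = []
    for size in range(1, k):
        for I in itertools.combinations(range(1, k + 1), size):
            order = list(range(n))
            rng.shuffle(order)                   # the warden's arrangement for the day
            # T = prisoners whose neighbour (cell c - 1) belongs to a subset indexed by I
            T = {order[c] for c in range(n) if label[order[c - 1]] in I}
            for j in range(1, k + 1):
                members = [p for p in range(n) if label[p] == j]
                outside = [p for p in members if p not in T]
                if outside and len(outside) < len(members):
                    for p in outside:            # S_j \ T becomes a new subset
                        label[p] = k + 1
                    return k + 1, None
            equations.append((set(I), {label[p] for p in T}))
    return k, equations

def solve_sizes(n, bound, rng):
    """Refine until a full pass makes no cut, then list all size vectors fitting the equations."""
    label = [1] + [2] * (n - 1)                  # the captain alone in S_1 (requires n >= 2)
    k, equations = 2, None
    while equations is None:
        k, equations = one_pass(label, k, rng)
    solutions = []
    for rest in itertools.product(range(1, bound + 1), repeat=k - 1):
        x = (1,) + rest                          # x_1 = 1
        if all(sum(x[i - 1] for i in I) == sum(x[j - 1] for j in Ip) for I, Ip in equations):
            solutions.append(x)
    true_sizes = tuple(label.count(j) for j in range(1, k + 1))
    return solutions, true_sizes
```

In every run, `solutions` should contain exactly one vector, equal to `true_sizes`, which is the uniqueness claim proved above.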

¹ Essentially, every prisoner gets to send a bit to the cell next door. The noon thing is just to rule out prisoners trying to send multiple signals.
² As in, almost surely.

A Cooperative Hat Game

2021-02-15T00:00:00-08:00

$\newcommand{W}{\square} \newcommand{B}{\blacksquare}$

Hat puzzles are super popular among mathematicians. Most of them have cute and clever solutions. Here’s one that, at the time of writing, is still an open problem.

Alice and Bob sit facing each other, each with an infinite tower of hats on their heads. Each hat is either black or white, with equal probability. Alice can see all of Bob’s hats, but not her own, and vice versa. On the count of three, both players must name a natural number, which is used to index into their own hat tower. If the two hats match, then the players win, otherwise they lose. (Also, they’re not allowed to talk, cough, wink, or otherwise communicate.)

As an example, say Alice’s hats are $\W\W\B\W\W\B\cdots$ and Bob’s hats are $\B\W\B\W\W\B\cdots$. If Alice says 3 and Bob says 1, then they win, since Alice’s third hat and Bob’s first hat are both black. If they both say 1, their first hats do not match, so they lose.

What’s the best possible strategy, and how often does it win? No one knows! I have some conjectures here, and some (probably unoriginal) strategies that do pretty well.

Simplest Strategy

The simplest strategy is for both players to ignore any information they have and just pick the first hat. Unsurprisingly, this doesn’t go very well. The outcomes $\W/\W$, $\W/\B$, $\B/\W$, and $\B/\B$ are all equally likely, so the chance of winning is $1/2$.

It’s not at all obvious that you can do any better than this. Since there’s no communication, neither player can learn anything about their own hats, and so both players are equally likely to pick a white hat or a black hat. How can you squeeze out any additional advantage?

First-White Strategy

Here’s a strategy that does better. Both players look for the first white hat on their partner’s head, and guess the corresponding number. For example, if Bob is wearing $\B\B\W\B\W\W\cdots$, Alice would say “3”. If he’s wearing $\W\W\W\B\W\B\cdots$, Alice would say “1”. Call Alice’s guess $a$ and Bob’s guess $b$. What’s the probability of success?

Case $a = b$: they’re both pointing at white hats, so they win.
Case $a < b$: Bob’s guess means that every one of Alice’s hats before $b$ was black, including the one at $a$. Alice stopped looking at Bob’s hats at $a$, so Bob’s $b$th hat could be either color. They win with probability $1/2$.
Case $a > b$: Symmetric to the previous case.

So if $p$ is the probability that $a = b$, then the chance of success is $p + (1 - p) / 2 = 1/2 + p/2$. Even before we know $p$, we can already tell that we’re going to do better than $1/2$!

To find $p$, we sum up the probabilities both players say “1”, that they both say “2”, that they both say “3”, etc. Note that the chance that Alice says “$k$” is the chance that Bob’s $k$th hat is white, and that none of the previous ones were. Likewise for Bob. Summing up the resulting geometric series, we get

$$ p = \sum_{k = 1}^\infty \left[ \left(\frac{1}{2} \right) \left(\frac{1}{2} \right)^{k-1} \right]^2 = \sum_{k = 1}^\infty \frac{1}{4^k} = \frac{1/4}{1 - 1/4} = \frac{1}{3} $$

So by following this strategy, Alice and Bob can win with probability $2/3$. Much better!
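A quick simulation agrees with this. The sketch below only generates the first `depth` hats of each tower, which is an assumption that's harmless in practice: an all-black prefix of that length essentially never happens, and if it does, the game is counted as a loss.

```python
import random

def first_white(hats):
    """1-based index of the first white hat (True means white)."""
    return hats.index(True) + 1

def play_first_white(rng, depth=32):
    """Play one game with the first-white strategy; return True on a win."""
    alice = [rng.random() < 0.5 for _ in range(depth)]
    bob = [rng.random() < 0.5 for _ in range(depth)]
    if True not in alice or True not in bob:
        return False                             # assumption: treat this freak case as a loss
    a = first_white(bob)                         # Alice looks at Bob's hats
    b = first_white(alice)                       # Bob looks at Alice's hats
    return alice[a - 1] == bob[b - 1]

def estimate_win_rate(trials=20_000, seed=0):
    rng = random.Random(seed)
    return sum(play_first_white(rng) for _ in range(trials)) / trials
```

With these defaults, `estimate_win_rate()` should land close to $2/3$, up to sampling noise.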

Finite Strategies

Here’s another approach: what if we focus only on the first $N$ hats, reducing it to a finite problem?

If $N = 1$ obviously there’s nothing interesting we can do, so let’s look at $N = 2$. If Alice sees only black hats on Bob’s head, then she knows that strategizing is hopeless – Bob will pick a black hat for sure, and she’ll pick a black hat with probability only 50%. Same thing goes if she sees only white, and same thing from Bob’s point of view. So the only interesting cases are when both players have non-monochromatic hat stacks.

There’s four possible situations: $\W\B / \W\B$, $\W\B / \B\W$, $\B\W / \W\B$, and $\B\W / \B\W$. We could brute-force all possible strategies (there’s only four possible for each player, and half of those are constant strategies). But let’s think this one through. Let’s say, arbitrarily, that Alice guesses “1” if she sees $\W\B$, and “2” if she sees $\B\W$. If Bob sees $\W\B$ on Alice’s head, what should he do?

If he has $\W\B$, then Alice will pick “1”, selecting her white hat. Bob should select his white hat by saying “1”.
If he has $\B\W$, then Alice will pick “2”, selecting her black hat. Bob should select his black hat by saying “1”.

In both situations, saying “1” guarantees a win. Similarly, if he sees $\B\W$ on Alice’s head, he wins by saying “2”. So in the “neither player is monochrome” situation, they can win 100% of the time! For the monochrome cases, no strategy is possible, and so that’s just 50%. There’s 4 non-monochrome cases, and 12 monochrome ones, so that gives a win rate of $10/16 = 62.5\%$.

How about $N = 3$? We could just ignore the third hat, giving us a win rate of at least $10/16$, but we can do better. Consider the following (asymmetric) strategy:

If a player sees a monochromatic stack, they pick an arbitrary hat. Doesn’t matter.
If a player sees only one white hat, they pick the index corresponding to that hat.
If Alice sees one black hat, she picks the hat after that one (with wraparound, so $\B\W\W \to 2$, $\W\B\W \to 3$, $\W\W\B \to 1$).
If Bob sees one black hat, he picks the hat before that one (again, with wraparound).

How does this strategy do? Note that the strategy is unchanged by cyclic shifting of the hats, which reduces the amount of casework we have to do.

If either player has a monochromatic stack, then they win only 50% of the time, as usual.

If they both have a one-white-hat stack, then they have a guaranteed win.

$$ \begin{matrix} \\ \textrm{Alice's hats} \\ \\ \textrm{Bob's hats} \end{matrix} \qquad \begin{matrix} \downarrow\hphantom{\B\B} \\ \W\B\B \\ \downarrow\hphantom{\B\B} \\ \W\B\B \end{matrix} \qquad \begin{matrix} \downarrow \\ \W\B\B \\ \downarrow\hphantom{\B\B} \\ \B\W\B \end{matrix} \qquad \begin{matrix} \hphantom{\B\B}\downarrow \\ \W\B\B \\ \downarrow\hphantom{\B\B} \\ \B\B\W \end{matrix} $$

If they both have one-black-hat stacks, then they also have a guaranteed win, though it’s less obvious why.

$$ \begin{matrix} \\ \textrm{Alice's hats} \\ \\ \textrm{Bob's hats} \end{matrix} \qquad \begin{matrix} \downarrow \\ \B\W\W \\ \hphantom{\B\B}\downarrow \\ \B\W\W \end{matrix} \qquad \begin{matrix} \hphantom{\B\B}\downarrow \\ \B\W\W \\ \hphantom{\B\B}\downarrow \\ \W\B\W \end{matrix} \qquad \begin{matrix} \downarrow\hphantom{\B\B} \\ \B\W\W \\ \hphantom{\B\B}\downarrow \\ \W\W\B \end{matrix} $$

The only remaining case is when one player has a one-white stack, and the other has a one-black stack. We can’t win every matchup here, but we can get a solid $4/6$. (Note: what happens if you change the one-black strategy to “pick the black hat”?).

$$ \begin{matrix} \\ \textrm{Alice's hats} \\ \\ \textrm{Bob's hats} \end{matrix} \qquad \begin{matrix} \downarrow \\ \W\B\B \\ \downarrow \hphantom{\B\B} \\ \B\W\W \end{matrix} \qquad \begin{matrix} \hphantom{\B\B} \downarrow \\ \W\B\B \\ \downarrow \hphantom{\B\B} \\ \W\B\W \end{matrix} \qquad \begin{matrix} \downarrow \hphantom{\B\B} \\ \W\B\B \\ \downarrow \hphantom{\B\B} \\ \W\W\B \end{matrix} $$

$$ \begin{matrix} \\ \textrm{Alice's hats} \\ \\ \textrm{Bob's hats} \end{matrix} \qquad \begin{matrix} \downarrow \hphantom{\B\B} \\ \B\W\W \\ \hphantom{\B\B} \downarrow \\ \W\B\B \\ \end{matrix} \qquad \begin{matrix} \downarrow \\ \B\W\W \\ \hphantom{\B\B} \downarrow \\ \B\W\B \\ \end{matrix} \qquad \begin{matrix} \hphantom{\B\B} \downarrow \\ \B\W\W \\ \hphantom{\B\B} \downarrow \\ \B\B\W \\ \end{matrix} $$

This totals up to a winning probability of $44/64 = 68.75\%$. Better than $N = 2$, but also better than our “first-white” strategy!
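Casework like this is easy to get wrong, so here is a brute-force check of the $N = 2$ and $N = 3$ strategies described above. It enumerates every pair of stacks and counts wins exactly. The function names are illustrative, indices are 0-based in the code, and the "arbitrary" pick for a monochromatic stack is assumed to be the first hat.

```python
from fractions import Fraction
from itertools import product

def win_rate(pick_alice, pick_bob, n):
    """Exact win rate over all 4**n pairs of n-hat stacks."""
    stacks = list(product("WB", repeat=n))
    wins = sum(
        alice[pick_alice(bob)] == bob[pick_bob(alice)]
        for alice in stacks
        for bob in stacks
    )
    return Fraction(wins, len(stacks) ** 2)

def pick_two(seen):
    """N = 2: say "1" on seeing WB, "2" on seeing BW, and "1" on a monochrome stack."""
    return 1 if seen == ("B", "W") else 0

def pick_three(seen, step):
    """N = 3: lone white hat -> that index; lone black hat -> neighbouring index (step = +1 or -1)."""
    whites = [i for i, hat in enumerate(seen) if hat == "W"]
    blacks = [i for i, hat in enumerate(seen) if hat == "B"]
    if len(whites) == 1:
        return whites[0]
    if len(blacks) == 1:
        return (blacks[0] + step) % 3
    return 0                                     # monochromatic: any pick will do

rate_two = win_rate(pick_two, pick_two, 2)
rate_three = win_rate(lambda s: pick_three(s, +1), lambda s: pick_three(s, -1), 3)
print(rate_two, rate_three)                      # prints "5/8 11/16"
```

Here `5/8` is $10/16$ and `11/16` is $44/64$, in lowest terms.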

The casework becomes worse and worse for $N \ge 4$, so we’ll stop here for now.

Stronger Together

We’ve seen two kinds of strategies so far: first-white, and finite strategies. These can be combined, in a pretty simple way, into a strategy better than either of them alone!

With an $N$-hat strategy, the augmented strategy goes as follows:

Each player looks at the first $N$ hats on their partner’s head.
If they’re not monochromatic, then apply the finite strategy as usual.
Otherwise, skip those $N$ hats, and look at hats $N+1$ to $2N$.
If those are non-monochromatic, apply the finite strategy, but increase all your answers by $N$.
Otherwise, look at the next block of $N$ hats, and repeat.

The finite strategies perform worst when facing a monochromatic block of hats. By using the “scan upwards and focus on the first non-monochromatic block” trick, we can sometimes salvage situations where the finite strategy would have to accept the 50-50 guess.

Say that the $N$-hat strategy has win rate $q$. We’d first like to find $q^\ast$, the conditional win rate for scenarios where neither player has a monochromatic stack. Let $W$ be the event “we win”, and $E$ be the event “neither player has a monochromatic stack”. The number of situations where Alice has a non-monochromatic stack is $2^N - 2$, and same for Bob. So the probability of $E$ is $(2^N - 2)^2/4^N$. Thus,

$$ \begin{align*} Pr(W) &= Pr(W | E) Pr(E) + Pr(W | \lnot E) Pr(\lnot E) \\ q &= q^\ast \frac{(2^N - 2)^2}{4^N} + \frac{1}{2} \left( 1 - \frac{(2^N - 2)^2}{4^N} \right) \\ q &= q^\ast \frac{(2^N - 2)^2}{4^N} + \frac{2^{N+1} - 2}{4^N} \\ q \frac{4^N}{(2^N - 2)^2} &= q^\ast + \frac{2^{N+1} - 2}{(2^N - 2)^2} \\ \frac{4^N q - 2^{N+1} + 2}{(2^N - 2)^2} &= q^\ast \\ \end{align*} $$

Next, we want to find $r$, the probability that both players will select the same block of $N$ hats. The chance an individual block is monochromatic is $2/2^N$, and so the chance that Alice (or Bob) picks the $k$th block is “probability the $k$th block is non-monochromatic” times “probability the first $k-1$ were monochromatic”. This is quite similar to the setup we had for the original first-white strategy.

$$ \begin{align*} r &= \sum_{k=1}^\infty \left( \frac{2^N - 2}{2^N} \cdot \left( \frac{2}{2^N} \right)^{k-1} \right)^2 \\ &= \frac{(2^N - 2)^2}{4^N} \sum_{k=1}^\infty \left( \frac{4}{4^N} \right)^{k-1} \\ &= \frac{(2^N - 2)^2}{4^N} \frac{1}{1 - 4/4^N} \\ &= \frac{(2^N - 2)^2}{4^N - 4} \\ &= \frac{2^N - 2}{2^N + 2} \end{align*} $$
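If you'd rather not trust the geometric series manipulation, the closed form can be checked with exact rational arithmetic. The sketch below (function names are mine) compares a partial sum against $\frac{2^N - 2}{2^N + 2}$; the gap after $K$ terms should be exactly the closed form times $(4/4^N)^K$.

```python
from fractions import Fraction

def same_block_probability(n: int) -> Fraction:
    """Closed form for r."""
    return Fraction(2**n - 2, 2**n + 2)

def partial_sum(n: int, terms: int) -> Fraction:
    """First `terms` terms of the series defining r."""
    nonmono = Fraction(2**n - 2, 2**n)  # a given block is non-monochromatic
    mono = Fraction(2, 2**n)            # a given block is monochromatic
    return sum((nonmono * mono ** (k - 1)) ** 2 for k in range(1, terms + 1))

print(same_block_probability(3))  # 3/5
```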

So now we can find $q'$, the win rate of the augmented strategy. If they pick the same block, then they win with probability $q^\ast$ (remember that these blocks are necessarily non-monochromatic). If they don’t, then someone is picking into a monochromatic block, and so we’re fated to get only $1/2$ success.

$$ \begin{align*} q' &= r q^\ast + (1 - r) \frac{1}{2} \\ &= \frac{2^N - 2}{2^N + 2} \frac{4^N q - 2^{N+1} + 2}{(2^N - 2)^2} + \frac{4}{2^N + 2} \frac{1}{2} \\ &= \frac{4^N q - 2^{N+1} + 2}{4^N - 4} + \frac{2(2^N - 2)}{4^N - 4} \\ &= \frac{4^N q - 2}{4^N - 4} \end{align*} $$

Since $q \ge 1/2$, we have $q' \ge q$, and when the first inequality is strict, so is the second. So, perhaps unsurprisingly, augmenting a finite strategy makes it work better. How much better? Let’s take our $N = 3$ strategy:

$$ \frac{4^3 (44/64) - 2}{4^3 - 4} = \frac{42}{60} = \frac{7}{10} $$

We’ve nudged our 68.75% chance of winning to a 70% chance. That’s small, but it’s not nothing. Unfortunately, it’s as far as we know how to go – this is conjectured to be an optimal strategy, and no one has found anything better or proved that nothing better exists.
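The formula for $q'$ is easy to play with in code. A minimal sketch, using exact fractions (the function name is just for illustration):

```python
from fractions import Fraction

def augmented_win_rate(n: int, q: Fraction) -> Fraction:
    """Win rate q' after augmenting an n-hat strategy with win rate q."""
    return (4**n * q - 2) / (4**n - 4)

q = Fraction(44, 64)
print(augmented_win_rate(3, q))      # 7/10
print(augmented_win_rate(3, q) - q)  # 1/80
```

The improvement is $(4q - 2)/(4^N - 4)$, so it shrinks quickly as $N$ grows.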

Observations

Now that we’ve seen some strategies, we can look for some patterns.

In the simplest strategy, we’re equally likely to get any pair of hats. With the “first-white” strategy, what are the odds of each outcome? The only way to get $\W/\W$ is for both players to guess the same index, which happens with probability $1/3$. In the other $2/3$ of the time, half the time Alice guesses the higher number, and half the time it’s Bob. In the former case, Bob’s hat is guaranteed black, and Alice’s hat is random. In the latter case, it’s the other way around. So that adds up to $\B/\B$ with probability $1/3$, $\W/\B$ with probability $1/6$, and $\B/\W$ with $1/6$.

Similarly, if you work through the strategies given in the “Finite Strategies” section, the probability of $\W/\W$ and $\B/\B$ outcomes are equal, as are $\B/\W$ and $\W/\B$ outcomes. This is no coincidence.

Since Alice is equally likely to pick a white or black hat (remember, she never learns anything about her own hat stack), $Pr(\W/\W) + Pr(\W/\B)$ has to equal $Pr(\B/\B) + Pr(\B/\W)$. Similarly, Bob has to be equally likely to pick white or black, meaning $Pr(\W/\W) + Pr(\B/\W)$ equals $Pr(\W/\B) + Pr(\B/\B)$. Subtracting one equation from the other gives $Pr(\B/\W) = Pr(\W/\B)$, and some quick algebra gives $Pr(\W/\W) = Pr(\B/\B)$ as well.
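Spelled out, with the shorthand $a = Pr(\W/\W)$, $b = Pr(\W/\B)$, $c = Pr(\B/\W)$, $d = Pr(\B/\B)$ (labels introduced here just for brevity), the algebra is:

$$ \begin{align*} a + b &= c + d \\ a + c &= b + d \\ b - c &= c - b \implies b = c \\ a + b &= b + d \implies a = d \end{align*} $$

The third line is the first equation minus the second, and the fourth is the first equation with $c$ replaced by $b$.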

This tells us something interesting – changing the win condition to “both players pick white hats” doesn’t change the nature of the game at all. Maximizing the probability of matching pairs is the same as maximizing the probability of white pairs (and in fact, this is how the problem is usually presented).

Another thing we can look at is the relationship between the finite and infinite game. Let $p_\infty$ denote the best possible winning probability for the infinite game, and $p_N$ for the game with just $N$ hats. How are these related to each other?

Since an $N$-hat strategy works just as well for an $(N+1)$-hat game (by just ignoring the last hat), we know that $p_{N+1}$ is at least $p_N$. Similarly, $p_\infty \ge p_N$ for all $N$. This gives us a chain of inequalities:

$$ p_1 \le p_2 \le p_3 \le \cdots \le p_\infty $$

Also, from augmenting a strategy, we know that $p_\infty \ge \frac{4^N p_N - 2}{4^N - 4}$ for all $N$.

Upper Bounds, Infinite

Well, we know that $p_\infty$ is at least $0.7$; can we put an upper bound on it too?

Let’s say Alice and Bob have already decided on a strategy, one that has win rate $p$. Now, imagine that, right before the game starts, we split the game into two identical games: in one game, things proceed as normal, and in the other game, all of Alice’s hats are swapped with their opposites. Every black hat becomes a white hat, and vice versa. We’ll refer to these players as “Alice” and “nega-Alice”. Let $X$ be the random variable “how many games are won” (so it is either $0$, $1$, or $2$).

Then clearly the expected value of $X$ is just $p + p$ – each game has probability $p$ of being won, and expected value is linear. But we can also bound it in an interesting way. Let $S$ be the event “Bob picks the same color hat in both games”. In such a situation, exactly one of the two games is won. Both Alices will see the same hats on Bob, and will say the same number. But the hats they pick always have opposite colors, while Bob’s pick has the same color in both games, so it matches exactly one of them. If we let $q$ denote the probability of $S$ under the chosen strategy:

$$ E[X] = Pr(S) E[X|S] + Pr(\lnot S) E[X | \lnot S] \le q + (1 - q) 2 = 2 - q $$

Rearranging, we get that $p \le 1 - q/2$. What do we know about $q$? If Bob picks the same index in both games, then he’s guaranteed to pick the same color hat too, and if he doesn’t, then the hats are uncorrelated, and there’s a 50-50 chance he picks the same color. So this means $q \ge 1/2$, and thus $p \le 3/4$.

So we know that the optimal $p_\infty$ is between $0.7$ and $0.75$. This is the best I’ve been able to prove, but apparently, there is a proof that $p_\infty < \frac{81}{112} \approx 0.723$, as mentioned in this paper. Doesn’t seem to be published though, unfortunately.

Upper Bounds, Finite

Let’s, for the moment, assume that $p_\infty$ is indeed $7/10$, and try to put some upper bounds on $p_N$.

In this section, it’ll be easier to work with “number of winning outcomes” than “probability of winning”, so for a strategy on $N$ hats, we’ll call the number of winning outcomes the “score” of a strategy, which is equal to $4^N$ times the win rate. The optimal score for an $N$-hat strategy we’ll denote $s_N$, which is of course equal to $4^N p_N$.

We’ll start with the inequality we learned about from augmenting finite strategies: $p_\infty \ge \frac{4^N p_N - 2}{4^N - 4}$. Rearranging it, we get that $s_N = 4^N p_N \le \frac{7}{10} (4^N - 4) + 2$. Let $B_N$ be the floor of the RHS, so that $s_N \le B_N$. Later, we’ll show that these bounds are sharp, and so $s_N$ actually equals $B_N$, but for now it’s easier to call them different names.

Computing some values of $B_N$, we can see a pattern forming:


$N$	$1$	$2$	$3$	$4$	$5$	$6$	$7$	$8$	$9$	$10$
$B_N$	$2$	$10$	$44$	$178$	$716$	$2866$	$11468$	$45874$	$183500$	$734002$

They seem to follow an almost-geometric recurrence relation:

$B_1 = 2$
for even $N$, $B_N = 4 B_{N-1} + 2$
for odd $N$, $B_N = 4 B_{N-1} + 4$

Proof: Let $e_N$ be the amount removed by flooring, i.e., $\left( \frac{7}{10} (4^N - 4) + 2 \right) - B_N$. We’d like to find $e_N$, since it will make our lives easier.

For odd $N$, this is easy: $4^N - 4$ is divisible by $10$, so the flooring is unnecessary, which makes $e_N = 0$.

For even $N$, $4^N - 4$ is $2$ mod $10$, and so $\frac{7}{10} (4^N - 4)$ is of the form “integer $+ \frac{7 \cdot 2}{10}$”. This makes $e_N = 2/5$.

Now, we can find the difference between $B_N$ and $4 B_{N-1}$:

$$ \begin{align*} B_N - 4 B_{N-1} &= \left( \frac{7}{10} (4^N - 4) + 2 - e_N \right) - 4 \left( \frac{7}{10} (4^{N-1} - 4) + 2 - e_{N-1} \right) \\ &= \left( \frac{7}{10} (4^N - 4) + 2 - e_N \right) + \left( \frac{7}{10} (16 - 4^N) - 8 + 4 e_{N-1} \right) \\ &= \frac{7}{10} (16 - 4) - 6 + 4 e_{N-1} - e_N \\ &= \frac{12}{5} + 4 e_{N-1} - e_N \end{align*} $$

For odd $N$, this is $\frac{12}{5} + \frac{8}{5} - 0 = 4$. For even $N$, this is $\frac{12}{5} + 0 - \frac{2}{5} = 2$. Check.
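The table and the recurrence are also easy to confirm numerically. Here is a small sketch (the function name is mine) that computes $B_N$ with integer arithmetic, to avoid any floating-point trouble with the floor, and checks the recurrence for the first ten values.

```python
def upper_bound(n: int) -> int:
    """B_N = floor(7/10 * (4^N - 4) + 2), computed in integers."""
    return 7 * (4**n - 4) // 10 + 2

values = [upper_bound(n) for n in range(1, 11)]
print(values)
# [2, 10, 44, 178, 716, 2866, 11468, 45874, 183500, 734002]

for n in range(2, 11):
    step = 2 if n % 2 == 0 else 4  # +2 for even N, +4 for odd N
    assert upper_bound(n) == 4 * upper_bound(n - 1) + step
```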

Now, we don’t know for sure that these $B_N$ are upper bounds on our score. That proof relied on $p_\infty$ actually being $7/10$. But when I had a computer search for good strategies, it found lots of strategies that achieve $B_N$, and none that surpass it. That’s pretty suggestive that this conjecture is right.

But computer-generated strategies don’t give good intuition, and my program starts to struggle at about $N = 11$. Can we come up with a way to construct strategies that hit $B_N$?

Finite Strategies, Part II

We’ll start with the following $3$-hat strategy, and build it up into $4$-hat and $5$-hat strategies. (I’ve picked a symmetric one, for ease of presentation). It has score $44$:


Hats	$\B\B\B$	$\W\B\B$	$\B\W\B$	$\W\W\B$	$\B\B\W$	$\W\B\W$	$\B\W\W$	$\W\W\W$
Choice	$1$	$2$	$1$	$1$	$3$	$2$	$3$	$1$
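The score of $44$ can be confirmed by brute force over all $64$ pairs of stacks. A minimal sketch, assuming stacks are written as strings with hat 1 leftmost (as in the table) and that both players use the same strategy; the names `STRATEGY_3` and `score` are mine.

```python
from itertools import product

# The table above: partner's stack -> index to pick in your own stack.
STRATEGY_3 = {
    "BBB": 1, "WBB": 2, "BWB": 1, "WWB": 1,
    "BBW": 3, "WBW": 2, "BWW": 3, "WWW": 1,
}

def score(strategy: dict) -> int:
    """Count the (Alice, Bob) stack pairs where the picked hats match."""
    wins = 0
    for alice, bob in product(strategy, repeat=2):
        alice_pick = alice[strategy[bob] - 1]  # Alice decides from Bob's hats
        bob_pick = bob[strategy[alice] - 1]    # Bob decides from Alice's hats
        wins += alice_pick == bob_pick
    return wins

print(score(STRATEGY_3))  # 44
```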

It’s easy to extend to a $4$-hat strategy, by just ignoring the last hat and applying the original strategy. But obviously this doesn’t improve the probability of winning, and it just increases the score to $4 \cdot 44 = 176$, which is a little less than $B_4 = 178$. Somehow we need to squeeze out an additional two points.

The key observation is that when we designed the $3$-hat strategy, it didn’t matter what our decision was when seeing $\B\B\B$ or $\W\W\W$. When you see your partner with a monochromatic stack of hats, you know that your choice doesn’t matter. But when we extended this to a $4$-hat strategy, those decisions were copied over to $\B\B\B\W$ and $\W\W\W\B$, where now they might matter! (They still won’t matter for $\B\B\B\B$ and $\W\W\W\W$ of course.)

Let’s just focus on the case where both Alice and Bob have one of these “almost monochromatic” stacks. Right now, they’ll both say “1”, and will only win when their stacks are identical. If they change their strategy so that $\B\B\B\W \to 4$, then they’ll win all four possible matchups.

$$ \begin{matrix} \\ \textrm{Alice's hats} \\ \\ \textrm{Bob's hats} \end{matrix} \qquad \begin{matrix} \hphantom{\B\B\B} \downarrow \\ \B\B\B\W \\ \hphantom{\B\B\B} \downarrow \\ \B\B\B\W \\ \end{matrix} \qquad \begin{matrix} \downarrow \hphantom{\B\B\B} \\ \B\B\B\W \\ \hphantom{\B\B\B} \downarrow \\ \W\W\W\B \\ \end{matrix} \qquad \begin{matrix} \hphantom{\B\B\B} \downarrow \\ \W\W\W\B \\ \downarrow \hphantom{\B\B\B} \\ \B\B\B\W \\ \end{matrix} \qquad \begin{matrix} \downarrow \hphantom{\B\B\B} \\ \W\W\W\B \\ \downarrow \hphantom{\B\B\B} \\ \W\W\W\B \\ \end{matrix} $$

That could be our extra two points we need. We just need to confirm that this tweak didn’t have a negative effect elsewhere.

If the matchup doesn’t involve $\B\B\B\W$, then obviously the result is unaffected. So all we have to look at are matchups of the form “$\B\B\B\W$ vs ‘anything other than $\B\B\B\W$ and $\W\W\W\B$’”. Before our tweak, we won exactly half of these matchups. Afterwards, the first player will answer “1”, “2”, or “3”, and the second player will answer “4”. The first player is guaranteed to pick a black hat, and since the second player is equally likely to pick a white or black hat, we still win exactly half of our matchups. So we have a score of $178$, as desired!
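Both numbers can be double-checked by brute force. The sketch below extends the $3$-hat strategy by ignoring the last hat, applies the single tweak, and scores both versions. It uses the same assumed encoding as before (strings with hat 1 leftmost, both players using the same strategy), and the helper names are mine.

```python
from itertools import product

STRATEGY_3 = {
    "BBB": 1, "WBB": 2, "BWB": 1, "WWB": 1,
    "BBW": 3, "WBW": 2, "BWW": 3, "WWW": 1,
}

def score(strategy: dict) -> int:
    """Count the (Alice, Bob) stack pairs where the picked hats match."""
    return sum(
        alice[strategy[bob] - 1] == bob[strategy[alice] - 1]
        for alice, bob in product(strategy, repeat=2)
    )

def extend_ignoring_last(strategy: dict) -> dict:
    """Add one more hat to every stack, and never look at it."""
    return {stack + last: choice
            for stack, choice in strategy.items() for last in "BW"}

extended = extend_ignoring_last(STRATEGY_3)
tweaked = dict(extended, BBBW=4)  # the one changed decision

print(score(extended), score(tweaked))  # 176 178
```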

How about $N = 5$? We could try the same approach – extend and tweak the $\B\B\B\B\W$ state – but that would only get us to $4 \cdot 178 + 2 = 714$, which is still two points away from our target of $B_5 = 716$.

The key is to think about why tweaking $\B\B\B\W$ was a strict improvement on the old strategy. It didn’t affect the outcome against most other hat configurations. You can reframe our $4$-hat strategy as similar to our augmented “first-white” strategy:

Split the $4$ hats you see into a block of $3$ and a block of $1$.
If the first block is not monochromatic, apply the $3$-hat strategy.
If it is, apply the following strategy:


Hats	$(\B\B\B)\B$	$(\W\W\W)\B$	$(\B\B\B)\W$	$(\W\W\W)\W$
Choice	$1$	$1$	$4$	$1$

That table should look familiar; it’s essentially our $2$-hat strategy from earlier on, but using the monochromatic block as a single hat!

This provides an interesting way to build strategies. If we have an $N$-hat strategy $S$, and an $M$-hat strategy $T$, then we can combine them into an $(N+M-1)$-hat strategy that has a potentially better score.

Let $p$ be the win rate of $S$, and $q$ the win rate of $T$. Then we can find the win rate of this new strategy.

If both players have a non-monochromatic first block, then the conditional win rate here is $p^\ast$, which we know how to compute.
If both players have a monochromatic first block, then the conditional win rate is just $q$.
If only one player has a monochromatic first block, then I claim they can only win half the time.
- Say Alice has the monochromatic first block, and Bob doesn’t. Then Alice will only ever answer a number between $1$ and $N$.
- Imagine flipping all of Bob’s hats; since Alice will still pick into her first block, it won’t change the color of the hat she picks. But it does flip the color of Bob’s choice.
- This pairs every win with a loss, and vice versa, so they must be equal in number.

This means the total win rate of this strategy is:

$$ \begin{align*} p_{new} &= p^\ast \frac{(2^N - 2)^2}{4^N} + \frac{1}{2} \frac{2 \cdot 2 \cdot (2^N - 2)}{4^N} + q \frac{2^2}{4^N} \\ &= \frac{4^N p - 2^{N+1} + 2}{4^N} + \frac{2^{N+1} - 4}{4^N} + \frac{4q}{4^N} \\ &= \frac{4^N p + 4q - 2}{4^N} \\ &= p + \frac{4q - 2}{4^N} \end{align*} $$

Interestingly enough, this doesn’t depend on the particular strategy chosen, only its win rate. Converting this into a score-based equation, where $s$ is the score of $S$, and $t$ the score of $T$, we get:

$$ s_{new} = 4^{M-1} s + (t - 4^M / 2) $$

That last term can be interpreted as “score above halfway”. I don’t know if that’s meaningful, but it’s crisp.
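As a sanity check on the formula, here is a minimal Python sketch that builds the combined strategy explicitly and scores it by brute force. The helper names (`combine`, `score`) and the string encoding of stacks are illustrative choices, and `T2` is the $2$-hat strategy as read off from the block table above (with the monochromatic block collapsed to a single hat).

```python
from itertools import product

S3 = {
    "BBB": 1, "WBB": 2, "BWB": 1, "WWB": 1,
    "BBW": 3, "WBW": 2, "BWW": 3, "WWW": 1,
}
T2 = {"BB": 1, "WB": 1, "BW": 2, "WW": 1}

def score(strategy: dict) -> int:
    """Count the (Alice, Bob) stack pairs where the picked hats match."""
    return sum(
        alice[strategy[bob] - 1] == bob[strategy[alice] - 1]
        for alice, bob in product(strategy, repeat=2)
    )

def combine(s: dict, t: dict) -> dict:
    """Combine an N-hat strategy s and an M-hat strategy t into N+M-1 hats."""
    n = len(next(iter(s)))
    m = len(next(iter(t)))
    combined = {}
    for hats in map("".join, product("BW", repeat=n + m - 1)):
        first, rest = hats[:n], hats[n:]
        if len(set(first)) > 1:
            combined[hats] = s[first]  # non-monochromatic block: use s
        else:
            # treat the monochromatic block as one hat and use t
            choice = t[first[0] + rest]
            combined[hats] = 1 if choice == 1 else n + choice - 1
    return combined

print(score(combine(S3, T2)))  # 178 = 4 * 44 + (10 - 8)
print(score(combine(S3, S3)))  # 716 = 16 * 44 + (44 - 32)
```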

Let’s try to make a good $5$-hat strategy with this. We know that combining a $4$ and $2$ hat strategy doesn’t work (we get a score of $4 \cdot 178 + (10 - 8) = 714$). How about $3$ and $3$? We’d get $16 \cdot 44 + (44 - 32) = 716$. That works!

For completeness’s sake, let’s check out $(2, 4)$. The score would be $64 \cdot 10 + (178 - 128) = 690$. Not great, which kind of makes sense. Front-loading the $2$-hat strategy, which is worse than the $4$-hat strategy, is a bad idea.

Using this idea, we can construct strategies with scores of $B_N$ for all $N$.

For $N = 1, 2, 3$ we have explicit examples.
For even $N$, extend an $(N-1)$-hat strategy by the optimal $2$-hat strategy. This has a score of $4 B_{N-1} + (10 - 8) = B_N$.
For odd $N$, extend an $(N-2)$-hat strategy by the optimal $3$-hat strategy. This has a score of $16 B_{N-2} + (44 - 32) = 16 B_{N-2} + 12$. Since $N-1$ is even, $B_{N-1} = 4 B_{N-2} + 2$, so this equals $4 B_{N-1} + 4 = B_N$.
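The recipe can be run mechanically on scores alone, using the score formula $s_{new} = 4^{M-1} s + (t - 4^M/2)$. A short sketch (function names are mine) that follows the three rules and compares against the floor formula for $B_N$:

```python
def combined_score(s: int, t: int, m: int) -> int:
    """Score after extending a strategy of score s by an m-hat strategy of score t."""
    return 4 ** (m - 1) * s + (t - 4**m // 2)

def upper_bound(n: int) -> int:
    """B_N = floor(7/10 * (4^N - 4) + 2), computed in integers."""
    return 7 * (4**n - 4) // 10 + 2

scores = {1: 2, 2: 10, 3: 44}  # the explicit examples
for n in range(4, 11):
    if n % 2 == 0:
        scores[n] = combined_score(scores[n - 1], 10, 2)  # extend by 2-hat
    else:
        scores[n] = combined_score(scores[n - 2], 44, 3)  # extend by 3-hat

assert all(scores[n] == upper_bound(n) for n in scores)
print(scores[10])  # 734002
```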

Final Thoughts

Okay, we’ve defined a sequence $B_N$, and shown we can construct strategies for $N$-hat games with a score of $B_N$.

If $p_\infty = 7/10$, then we know that $B_N$ is an upper bound on our possible scores, which makes the strategies described above optimal. And conversely, if these finite strategies are optimal, then we can prove $p_\infty = 7/10$.

I don’t quite have a proof figured out, because there’s some measurability criterion I’m missing, but the gist of it is: it should be the case that an infinite strategy can be approximated arbitrarily well by an $N$-hat strategy, as long as we allow $N$ to be large. If $p_\infty$ were larger than $7/10$, we’d be able to find a finite strategy with success rate higher than $7/10$. But $B_N / 4^N$ is always less than $7/10$:

$$ B_N = \left \lfloor \frac{7}{10} (4^N - 4) + 2 \right \rfloor = \left \lfloor \frac{7}{10} 4^N - \frac{8}{10} \right \rfloor $$

So proving one of these two claims is sufficient for proving the other. Unfortunately, I can’t prove either one of them.

I’ve verified via computer search that the finite strategies described are optimal up to $N = 8$, which is reassuring, but certainly not a proof of the general claim.

Interestingly enough, it doesn’t seem to matter if we restrict ourselves to symmetric strategies. We seem to get just as successful strategies even when we’re limited like that.

One possible way to prove an upper bound is to show some kind of relation between a given $N$-hat strategy, and an $(N-1)$-hat strategy derived from it. The difficult part here is that unless you remove a hat from both players at once, you end up in a situation where players have different numbers of hats, which I really don’t want to think about.

But I do want to throw a computer at it!

I wrote up a “relaxation” algorithm, that starts with a random strategy for Alice, computes Bob’s best response to it, then Alice’s best response to that, and so on, until we hit a fixed point. Repeating this over and over again gave the following table of scores:
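For the curious, here is a rough sketch of what such a relaxation loop might look like. It is not the program that produced the table: it only handles the case where both players have the same number of hats, indices are 0-based, and all names are illustrative.

```python
import random
from itertools import product

def all_stacks(n: int) -> list:
    return ["".join(p) for p in product("BW", repeat=n)]

def score(alice: dict, bob: dict, stacks: list) -> int:
    """alice[b] is Alice's index into her own stack after seeing Bob's stack b."""
    return sum(a[alice[b]] == b[bob[a]] for a in stacks for b in stacks)

def best_response(other: dict, stacks: list, n: int) -> dict:
    """Best reply to the partner's strategy `other`.

    For each partner stack `theirs` that I might see, pick the index into my
    own stack that wins against the largest number of my possible stacks.
    """
    response = {}
    for theirs in stacks:
        def wins(i: int) -> int:
            return sum(mine[i] == theirs[other[mine]] for mine in stacks)
        response[theirs] = max(range(n), key=wins)
    return response

def relax(n: int, rng: random.Random) -> int:
    """Alternate best responses from a random start until the score stalls."""
    stacks = all_stacks(n)
    alice = {s: rng.randrange(n) for s in stacks}
    bob = best_response(alice, stacks, n)
    best = score(alice, bob, stacks)
    while True:
        alice = best_response(bob, stacks, n)
        bob = best_response(alice, stacks, n)
        new = score(alice, bob, stacks)
        if new <= best:  # no further improvement: fixed point
            return best
        best = new

rng = random.Random(0)
print(max(relax(3, rng) for _ in range(20)))
```

Each best response can only raise the score, so the loop always terminates, but it can get stuck at a local optimum. That is why the search is repeated from many random starting points.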


	$1$	$2$	$3$	$4$	$5$	$6$	$7$	$8$	$9$	$10$
$1$	$2$	$4$	$8$	$16$	$32$	$64$	$128$	$256$	$512$	$1024$
$2$	$4$	$10$	$20$	$40$	$80$	$160$	$320$	$640$	$1280$	$2560$
$3$	$8$	$20$	$44$	$88$	$176$	$352$	$704$	$1408$	$2816$	$5632$
$4$	$16$	$40$	$88$	$178$	$356$	$712$	$1424$	$2848$	$5696$	$11392$
$5$	$32$	$80$	$176$	$356$	$716$	$1432$	$2864$	$5728$	$11456$	$22912$
$6$	$64$	$160$	$352$	$712$	$1432$	$2866$	$5732$	$11464$	$22928$	$45856$
$7$	$128$	$320$	$704$	$1424$	$2864$	$5732$	$11468$	$22936$	$45872$	$91744$
$8$	$256$	$640$	$1408$	$2848$	$5728$	$11464$	$22936$	$45874$	$91748$	$183496$
$9$	$512$	$1280$	$2816$	$5696$	$11456$	$22928$	$45872$	$91748$	$183500$	$367000$
$10$	$1024$	$2560$	$5632$	$11392$	$22912$	$45856$	$91744$	$183496$	$367000$	$734002$

It seems to follow a… pattern? Not a nice pattern, but a pattern. Say you have $a_{m,n}$, where $m > n$. Then:

To step “away from the diagonal”, i.e., to $a_{m+1,n}$, you just double the score.
To step “toward the diagonal”, i.e., to $a_{m, n+1}$, you double and add $2^k$, where $k$ is $m - 1 - 2 \lfloor (n-1) / 2 \rfloor$.
- In other words, $k$ goes $m-1, m-1, m-3, m-3, m-5, m-5, \ldots$, until it ends in either $2, 2$ or $3, 3, 1$, at which point we arrive at the diagonal itself.
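Here is a small sketch that regenerates the lower triangle of the table from this pattern, as a check that the description is right. It takes $a_{m,1} = 2^m$ from the first column of the table, and uses $k = m - 1 - 2 \lfloor (n-1)/2 \rfloor$, which produces the sequence $m-1, m-1, m-3, m-3, \ldots$ as $n$ runs from $1$ upward. The function name is mine.

```python
def pattern_table(size: int) -> dict:
    """Entries a[m, n] for m >= n, built by stepping toward the diagonal."""
    a = {}
    for m in range(1, size + 1):
        a[m, 1] = 2**m  # first column of the table
        for n in range(1, m):
            k = m - 1 - 2 * ((n - 1) // 2)
            a[m, n + 1] = 2 * a[m, n] + 2**k
    return a

table = pattern_table(10)
print([table[10, n] for n in range(1, 11)])
# [1024, 2560, 5632, 11392, 22912, 45856, 91744, 183496, 367000, 734002]
```

The last row matches the computed table, and the diagonal entries come out equal to $B_N$.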

No idea if that’s helpful, or can be cleaned up into anything nice.

The Dehn Invariant, or, Tangrams In Space

2020-03-30T00:00:00-07:00

$\newcommand{\ZZ}{\Bbb Z} \newcommand{\QQ}{\Bbb Q} \newcommand{\RR}{\Bbb R}$

Fans of wooden children’s toys may remember tangrams, a puzzle composed of 7 flat pieces that can be rearranged into numerous different configurations.

As mathematicians, we’re interested in shapes that are slightly simpler than cats or houses.

For example, we might try to design a set of tangrams that can be rearranged into an equilateral triangle. One possibility is shown below.

How about a pentagon?

We don’t have to start with a square. How about a set that can become a star or a triangle?

What pairs of polygons can we design tangram sets for? One way to reframe this problem is in terms of scissors-congruence, which is pretty much what it sounds like. Two polygons are “scissors-congruent” if we can take the first polygon, make a finite number of straight-line cuts to it, and rearrange the pieces into the second polygon. Clearly, two polygons are scissors-congruent if and only if we can design a set of tangrams that connect the two.

Given two polygons, how can we tell if they’re scissors-congruent?

One thing we can do is check their areas, since, if they have different areas, there’s no way they can be scissors-congruent. It turns out that this is the only obstacle – if two polygons have the same area, they must be scissors-congruent!

This surprising result is known as the Wallace–Bolyai–Gerwien theorem, and was proven in the 1830s. We’ll walk through a proof.

It suffices to show that any polygon of area $A$ is scissors-congruent to an $A \times 1$ rectangle. This is because, if $P_1$ and $P_2$ are scissors-congruent to some third shape $Q$, then we can rearrange $P_1$ into $P_2$ by going through $Q$ as an intermediate step. We start by breaking our polygon into triangles:

Next, we’ll transform each triangle into a rectangle, by cutting it halfway up its height, and folding down the apex:

Now we need to change the dimensions of this rectangle, but this step requires some creativity. We need the height of the rectangle to be between $1$ and $2$. If it isn’t, we can repeatedly cut it in half until it is. (If the height is less than $1$, then we run this process in reverse to double it instead.)

Then, we do a sliding maneuver to convert this rectangle into one with height $1$. Notice that we need $u < 1$, or else $u \ell$ would be greater than $\ell$, and we couldn’t draw this diagram.

After doing this to all the triangles, the final step is to glue all these rectangles together, end-to-end, to get the desired $A \times 1$ rectangle.

The natural question to ask next is: can we generalize this? What about 3D shapes? Are any two polyhedra of equal volume also scissors-congruent?

This is the third of Hilbert’s twenty-three problems, and his student, Max Dehn, proved in 1900 that, unlike in two dimensions, the answer is “no”. He did so by constructing a quantity (now known as the “Dehn invariant”) that stays unchanged under scissors-congruence. Two shapes with different Dehn invariants, therefore, cannot be scissors-congruent. For example, a cube and a tetrahedron of equal volume are not scissors-congruent.

Unlike area and volume, the Dehn invariant isn’t as simple as a real number, and we’ll need to do a bit of legwork to define it. The key observation to make is that a cut can only do one of three things to an edge:

miss it completely
cut it at a point
split it along its entire length

By looking at what these operations do to edges, we can cobble together a quantity that stays invariant.

In the first situation, the edge stays unchanged. That one’s easy.

In the second situation, one edge is turned into two edges. The new edges have the same dihedral angle¹ as the original, and their lengths sum to the original length.

In the third situation, we again get two edges, but this time, the length stays the same, and the dihedral angle changes.

Lastly, cuts also create new edges, as they slice through a face. We’d like these to count for nothing, i.e., as zero.

Now that we know what cuts do to edges, how do we use this to define an invariant? If an edge is represented by the ordered pair $(\ell_i, \theta_i)$, we want to enforce the following equivalence relations:

$$ (\ell_1 + \ell_2, \theta) \cong (\ell_1, \theta) + (\ell_2, \theta) \qquad (\ell, \theta_1 + \theta_2) \cong (\ell, \theta_1) + (\ell, \theta_2) $$

These two rules imply some further relations. Consider the sum of $n$ copies of $(\ell, \theta)$. Applying the first rule repeatedly gives $(n \ell, \theta)$, and the second rule gives $(\ell, n \theta)$. This can be extended to negative $n$ as well, so for any integer $n$,

$$ n (\ell, \theta) = (n \ell, \theta) = (\ell, n \theta) $$

If you’re familiar with tensors, you might notice that these are exactly the conditions for a tensor product! If not, don’t worry, you can think of these as ordered pairs still, but we’ll use the symbol $\otimes$ instead of a comma. It may make more sense when we go through the examples.

We still have to deal with the new edges created from cuts in the faces, but these almost resolve themselves. The edges we create come in pairs with supplementary angles. So if the edge pair we create has length $\ell$, we get $(\ell, \theta) + (\ell, \pi - \theta) = (\ell, \pi)$. Using the third rule above, we can drag a $2$ from the left to the right, giving us $(\ell/2, 2\pi)$. If we declare that $2\pi$ is equivalent to $0$ (a reasonable demand, given that we’re working with angles), then these edge pairs automatically cancel each other out, as desired.

We can now define the Dehn invariant: it takes values in $\RR \otimes_\ZZ \RR/2 \pi$ (lengths and angles), and it’s equal to the sum of $\ell_i \otimes \theta_i$ over all the edges. Is something that concise truly unchanged by scissors-congruence?

When we make a cut, either it misses an existing edge, and so the corresponding term in the sum does not change, or it intersects it, in which case that term is replaced by two terms that sum to the original. It also creates new edges, by cutting into the faces. But as we saw earlier, these edges come in pairs that sum to zero, and so the total value of the invariant remains unchanged.

Armed with this invariant, we can now answer the question: are the cube and the tetrahedron scissors-congruent? Let’s say both have volume 1. The cube has 12 edges, each with dihedral angle $\pi / 2$. To get the volume to be $1$, we need edges of length $1$, so the Dehn invariant of this cube is:

$$ 12 (1 \otimes \frac{\pi}{2}) = 3 (1 \otimes 2 \pi) = (3 \otimes 2 \pi) = 0 $$

A tetrahedron has 6 edges, each with dihedral angle $\arccos(1/3)$. The volume of a tetrahedron with side length $a$ is $a^3 / 6 \sqrt 2$, so the side length of our tetrahedron needs to be $a = (72)^{1/6}$, making the Dehn invariant equal to:

$$ 6 (a \otimes \arccos(1/3)) = 6 a \otimes \arccos(1/3) $$
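The specific numbers here are easy to sanity-check numerically. This sketch only verifies the edge length, the dihedral angle (about $70.53°$), and the cube's total angle; it says nothing about the tensor product itself. Variable names are mine.

```python
import math
from fractions import Fraction

edge = 72 ** (1 / 6)                    # claimed edge length a
volume = edge**3 / (6 * math.sqrt(2))   # should come out to 1
dihedral_deg = math.degrees(math.acos(1 / 3))

# Cube: 12 edges, each a quarter turn, measured in units of 2*pi.
cube_turns = 12 * Fraction(1, 4)

print(volume, dihedral_deg, cube_turns)
```

The cube's angles add up to a whole number of full turns, which is why its invariant collapses to zero, while the tetrahedron's angle is not a nice fraction of a turn.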

With some knowledge of modules, one can show that this is non-zero,² but the crux of the idea is that $\arccos(1/3)$ is not a rational multiple of $\pi$, so we can never get the right hand side of this tensor to collapse to zero. This shows that no matter how many pieces you cut it into, a cube can never be reassembled into a tetrahedron.

One interesting consequence of this: in geometry class, you probably saw some cut-and-paste constructions for proving the area of a parallelogram, or a triangle. This result shows there can never be such a proof for pyramids – calculus is unavoidable!

A final note: we’ve shown that there are at least two obstructions to scissors-congruence in 3D: volume and Dehn invariant. Are they the only ones? The answer is yes! In other words, if two polyhedra do have the same volume and Dehn invariant, then they are indeed scissors-congruent. The proof of that is much harder, and a good presentation can be found here.

The dihedral angle of an edge is the angle between the two faces adjacent to it. You can think of it as a measure of the ‘sharpness’ of an edge; a 90° edge is like the edge of a countertop, but a 15° edge will cut like a knife. ↩
First, note that for any rational $p/q$, we have $\ell \otimes \frac{p}{q} \pi = \frac{\ell}{2q} \otimes 2 p \pi = 0$. This means that $\RR \otimes_\ZZ \RR/2\pi \cong \RR \otimes_\ZZ \RR/(2\pi\QQ)$. Since both of those modules are divisible, this is equal to $\RR \otimes_\QQ \RR/(2 \pi \QQ)$, which, being a tensor product of $\QQ$-vector spaces, is a $\QQ$-vector space itself. In particular, if $\ell \ne 0$ and $\theta \notin 2 \pi \QQ$, then $\ell \otimes \theta$ is a non-zero vector. ↩

The Mathematical Hydra

2019-09-29T00:00:00-07:00

Imagine you’re tasked with killing a hydra. As usual, the hydra is defeated when all of its heads are cut off, and whenever a head is cut off, the hydra grows new ones.

However, this mathematical hydra is much more frightening than a “traditional” one. It’s got a tree-like structure – heads growing out of its heads – and it can regrow entire groups of heads at once! Can you still win?

Also, this post is the first one with interactivity! Feel free to report bugs on the GitHub issues page.

For the purposes of our game, a hydra is a rooted tree. The root, on the left, is the body, and the leaves are the heads. Intermediate nodes are part of the necks of the hydra, and cannot (yet) be cut off.

You can cut off one head at a time, and when you do, the hydra may grow more heads, according to the following rules:

If the head is connected directly to the root, then the hydra does nothing.
Otherwise, look at the parent node (the one directly underneath the one you just cut off). The hydra grows two new copies of that node and all its children, attaching them to the grandparent as appropriate.
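If it helps, the rules can be written as a short Python sketch. The representation is an assumption of mine: a node is the list of its children, so a head is the empty list `[]`, and the whole hydra is the root's list. The function names are illustrative too.

```python
import copy

def cut(hydra: list, path: list, copies: int = 2) -> None:
    """Cut the head reached by following `path` (child indices from the root).

    The hydra is modified in place.
    """
    grandparent, parent = None, hydra
    for index in path[:-1]:
        grandparent, parent = parent, parent[index]
    if parent[path[-1]]:
        raise ValueError("only heads (leaves) can be cut")
    del parent[path[-1]]
    if grandparent is not None:
        # Regrow: attach copies of the shrunken parent to the grandparent.
        grandparent.extend(copy.deepcopy(parent) for _ in range(copies))

def first_head(hydra: list):
    """Path to the head found by always taking the first child, or None if dead."""
    if not hydra:
        return None
    path, node = [0], hydra[0]
    while node:
        path.append(0)
        node = node[0]
    return path

hydra = [[[], []]]  # a single neck carrying two heads
moves = 0
while (path := first_head(hydra)) is not None:
    cut(hydra, path)
    moves += 1
print(moves)  # 13
```

Passing a different `copies` value models the faster-regenerating hydras; the loop still terminates, it just takes longer.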

This is hard to convey through text, so let’s walk through an example. Let’s start with a pretty simple hydra, and cut off one of the heads. (Purple indicates newly-grown heads.)

We used to have two heads, and four nodes total, but now we have three, and seven nodes. That’s not good. Let’s try chopping off another one.

This increases the total number of heads, but now, we can cut off the three smallest heads, one at a time, without incident.

We’ve made some visible progress now. Cutting off one of the remaining heads will reveal three more, but we can extinguish them easily.

Repeating this process on the last head will kill the hydra.

We managed to defeat this hydra, but it was a pretty small one. What about something a bit larger? Let’s add one more head to that neck.

This time, you can try to kill it yourself: the illustration below is interactive!

Depending on how persistent you are, you might not be surprised to learn that you can indeed kill this hydra, though it’ll take tens of thousands of moves to do so (29528 moves by my count). In fact, you can kill any hydra, though I’ll make no guarantees about how long it will take.

But what may be surprising is that you can’t avoid killing the hydra, even if you try. No matter how large the hydra, or what order you cut off its heads, you will always defeat it in a finite number of moves.

And even better, this holds true even for faster-regenerating hydras. What if, instead of growing back two copies of the subtree, the hydra grows back three copies? Or a hundred? What if, on the $N$th turn of the game, it grows back $N$ copies? $N^2$? $N!$? What if the hydra just gets to pick how many copies to regrow, as many as it wants?

It doesn’t matter.

You always win.

The proof here relies on ordinal numbers. If you’re not familiar, there’s a good video from Vsauce about them. The key property to know is that the ordinals are “well-ordered”; that is, there is no infinitely long descending sequence¹.

We assign an ordinal number to each hydra, in such a way that cutting off a head produces a hydra with a strictly smaller ordinal. As we play the hydra game, the sequence of hydras we encounter produces a corresponding sequence of ordinals. Since the ordinal sequence is strictly decreasing, it must eventually terminate, and so the hydra sequence must terminate as well. The only way that the hydra sequence can terminate is if we have no more heads to cut off; i.e., we’ve defeated the hydra.

The assignment is done by assigning values to the nodes, and accumulating down to the root:

A head is assigned $0$. Similarly, a trivial (dead) hydra is assigned $0$.
If a node has children with ordinals $\alpha_1, \alpha_2, \ldots, \alpha_n$, then we assign the ordinal $\omega^{\alpha_1} + \omega^{\alpha_2} + \cdots + \omega^{\alpha_n}$.²

What happens when we cut off a head?

If it’s directly attached to the body, then it contributes a term of $\omega^0 = 1$ to the whole ordinal. Killing this head removes this term, decreasing the ordinal.
Otherwise, consider the ordinal of that head’s parent and grandparent. Before we cut off the head, the ordinal of the parent must have been of the form $\alpha + 1$. This means the ordinal of the grandparent has a term $\omega^{\alpha + 1}$. When we cut off the head, the parent ordinal decreases to $\alpha$, but there’s now two more copies of it. This replaces the $\omega^{\alpha + 1}$ term in the grandparent with $3 \omega^\alpha$, which is strictly smaller. And because the rest of the tree remains unchanged, this means the ordinal assigned to the hydra as a whole also decreases.

To illustrate this process, let’s look at the ordinals that correspond to the hydras we saw earlier. It may help to read them in reverse order.
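As a worked example, assume the first hydra from the walk-through is a single neck carrying two heads (which is what the node counts there suggest). The neck gets $\omega^0 + \omega^0 = 2$, so the hydra starts at $\omega^2$. Skipping the moves that just remove a head attached to the body, the ordinals go:

$$ \omega^2 \;>\; \omega \cdot 3 \;>\; \omega \cdot 2 + 3 \;>\; \omega \cdot 2 \;>\; \omega + 3 \;>\; \omega \;>\; 3 \;>\; 0 $$

Each cut of a neck's only head trades one $\omega$ for three $1$s, and each cut at the body removes a single $1$.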

We can also see why the hydra’s regeneration speed doesn’t matter. No matter how large $N$ is, as long as it’s finite, $\omega^{\alpha + 1}$ will be strictly larger than $N \omega^{\alpha}$.

One way to think about this is that a neck that forks at height $k+1$ is literally infinitely worse than a neck that forks at height $k$. By cutting off a head, you simplify it at height $k+1$, at the expense of introducing some forking at height $k$, which isn’t as bad.

A last interesting fact: this proof relied on ordinal numbers, which have a whole lot of infinities ($\omega$s) tied up in them. But everything in this hydra game is finite; from an initial hydra, there’s only finitely many hydras we can encounter, each of which has only finitely many heads. Is there a proof that avoids any mention of infinity?

In 1982, Laurence Kirby and Jeff Paris proved that there isn’t, in the following sense: any proof technique strong enough to prove the hydra’s eventual demise is strong enough to prove the consistency of Peano arithmetic. In particular, it’s impossible to prove the hydra theorem from within Peano arithmetic.

In fact, the ordinals are the prototype of every well-founded set, and this is what makes them important. ↩
Without loss of generality, we can relabel the subhydras so that the ordinals are non-strictly descending. This avoids problems coming from the non-commutativity of ordinal addition. ↩

Safes and Keys

2018-11-16T00:00:00-08:00

Here’s a few similar puzzles with a common story:

I have n safes, each one with a unique key that opens it. Unfortunately, some prankster snuck into my office last night and stole my key ring. It seems they’ve randomly put the keys inside the safes (one key per safe), and locked them.

We’ll play around with a few different conditions and see what chances we have of getting all safes unlocked, and at what cost.

1) The prankster was a bit sloppy, and forgot to lock one of the safes. What is the probability I can unlock all of my safes?

The key observation here, as with the subsequent problems, is to consider the arrangement of keys and safes as a permutation. Label the safes and keys $1$ to $n$, and define $\pi(i)$ to be the number of the key inside the $i$th safe. So, if we have key $1$, we unlock safe $1$ to reveal key $\pi(1)$.

Under this interpretation, key $i$ lets us unlock all safes in the cycle containing $i$; we open a safe, find a new key, track down the new safe, and repeat until we end up where we started. So, we want to know the probability that a randomly chosen permutation has exactly one cycle.

This isn’t too hard; we can count the number of one-cycle permutations in a straightforward way. Given a permutation of one cycle, we start with element $1$, and write out $\pi(1)$, $\pi(\pi(1))$, etc., until we loop back to $1$. This produces an ordered list of $n$ numbers, starting with $1$, and this uniquely determines the cycle. There are $(n-1)!$ such lists, and so the probability of having exactly one cycle is $(n-1)!/n! = 1/n$.

2) Say the prankster is sloppier, and leaves k safes unlocked. Now what is my probability of success?

This one requires a little more thought. It’s tempting to consider permutations with $k$ cycles, but that’s not quite right. If there’s only one cycle, we’re sure to succeed, and furthermore, even if there are $k$ cycles, our success isn’t guaranteed: we could pick two safes in the same cycle.

By symmetry, label our safes so that we’ve picked safes $1$, $2$, …, $k$. We’d like to know how many permutations have a cycle that completely avoids $1$ through $k$. If, and only if, such a cycle is present, we fail to unlock all the safes.

Let $a_i$ be the number of “good” permutations when there are $i$ safes. We will express $a_n$ in terms of smaller $a_i$s, and solve the resulting recurrence relation.

Given a permutation $\pi$, we can split the set $\{ 1, \ldots n \}$ into two parts: those that have cycles intersecting $\{ 1, \ldots, k \}$, and those that do not. (It may help to think of these sets as “reachable” and “unreachable” safes, respectively). Since $\pi$ never sends a reachable safe to an unreachable one, or vice versa, it induces permutations on both these sets. Also, knowing both these subpermutations, we can reconstruct $\pi$. So, let’s count how many possible permutations there are on the reachable and unreachable sets.

If there are $r$ reachable safes, then there are $a_r$ possible permutations induced on the reachable set, and $(n-r)!$ induced on the unreachable one. (The reason we don’t get the full $r!$ on the reachable set is that some permutations would leave a safe unreachable, when it’s supposed to be reachable.) Furthermore, we have a choice of which safes are reachable. The first $k$ safes must be reachable, so beyond that, we have $\binom{n-k}{r-k}$ more choices to make. Our recurrence relation is then:

$$ n! = \sum_{r = k}^n \binom{n-k}{r-k} a_r (n-r)! = \sum_{r = k}^n a_r \frac{(n-k)!}{(r-k)!} $$

Since $(n-k)!$ doesn’t depend on $r$, we can pull it out to get a neater-looking form:

$$ \frac{n!}{(n-k)!} = \sum_{r=k}^n \frac{a_r}{(r-k)!} $$

Now $n$ only shows up as an index, not anywhere in the summand. This lets us collapse our sum; take this equation, and subtract from it the corresponding one for $n-1$:

$$ \begin{align*} \frac{n!}{(n-k)!} - \frac{(n-1)!}{(n-1-k)!} &= \left( \sum_{r=k}^n \frac{a_r}{(r-k)!} \right) - \left( \sum_{r=k}^{n-1} \frac{a_r}{(r-k)!} \right) \\ \frac{n!}{(n-k)!} - \frac{(n-1)!}{(n-1-k)!} &= \frac{a_n}{(n-k)!} \\ n! - (n-1)!(n-k) &= a_n \\ k \cdot (n-1)! &= a_n \end{align*} $$

So there’s $k \cdot (n-1)!$ permutations in which we win. Since there’s $n!$ total, this gives our probability of success at $k/n$.
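The count $k \cdot (n-1)!$ is easy to sanity-check by brute force for small $n$. The sketch below is only an illustration: the helper names `opens_everything` and `count_good` are made up, and safes are numbered from $0$ so that safes $0$ through $k-1$ are the unlocked ones.

```python
from itertools import permutations
from math import factorial

def opens_everything(perm, k):
    """perm[i] is the key inside safe i; safes 0..k-1 start unlocked."""
    opened = set(range(k))
    frontier = list(opened)
    while frontier:
        safe = frontier.pop()
        key = perm[safe]              # this key opens safe number `key`
        if key not in opened:
            opened.add(key)
            frontier.append(key)
    return len(opened) == len(perm)

def count_good(n, k):
    """Number of key arrangements in which every safe can be unlocked."""
    return sum(opens_everything(p, k) for p in permutations(range(n)))

for n in range(1, 7):
    for k in range(1, n + 1):
        assert count_good(n, k) == k * factorial(n - 1)
print("a_n = k * (n-1)! holds for all n <= 6")
```

Taking $k = 1$ recovers the answer to the first puzzle as a special case.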

3) If the prankster is careful, and remembers to lock all the safes, then I have no choice but to break some of them open. What’s the expected number of safes I have to crack?

This one’s much easier than 2). The question here is just “how many cycles are there in a random permutation”, and from a previous post, we know that’s $H_n$, the $n$th harmonic number.
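If you don’t have that post handy, the claim can be checked exactly for small $n$ by averaging over all permutations. This is a brute-force sketch with hypothetical helper names (`cycle_count`, `average_cycles`, `harmonic`), not a proof.

```python
from fractions import Fraction
from itertools import permutations

def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple (i maps to perm[i])."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return cycles

def average_cycles(n):
    """Exact expected number of cycles of a uniformly random permutation of n items."""
    perms = list(permutations(range(n)))
    return Fraction(sum(cycle_count(p) for p in perms), len(perms))

def harmonic(n):
    """The n-th harmonic number, as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

for n in range(1, 7):
    assert average_cycles(n) == harmonic(n)
print(average_cycles(3))  # 11/6
```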

4) Putting it all together: if we start with $k$ safes unlocked, what’s the expected number of safes I have to crack open?

I haven’t actually put this one on solid ground yet! It’s not coming out pretty.

Ax-Grothendieck Theorem

2018-11-12T00:00:00-08:00

$\newcommand{\CC}{\Bbb C} \newcommand{\FF}{\Bbb F} \newcommand{\QQ}{\Bbb Q} \newcommand{\FFx}[1]{\overline{\FF_{#1}}} \newcommand{\ACF}{\mathbf{ACF}} \newcommand{\cL}{\mathcal{L}} \newcommand{\cT}{\mathcal{T}}$

The Ax-Grothendieck theorem is the statement:

Ax-Grothendieck Theorem

Let $f: \CC^n \to \CC^n$ be a polynomial map; that is, each coordinate $f_i: \CC^n \to \CC$ is a polynomial in the $n$ input variables. Then, if $f$ is injective, it is surjective.

This… doesn’t seem like a particularly exciting theorem. But it has a really exciting proof.

The idea behind the proof isn’t algebraic, it isn’t topological, it’s not even geometric, it’s ~~DiGiorno~~ model-theoretic!

The spirit of the proof is as follows:

if the theorem is false, then there is a disproof (a proof of the negation)
this proof can be written in “first-order logic”, a particularly limited set of axioms
because this proof is finitely long, and uses only first-order logic, it “can’t tell the difference” between $\CC$ and $\FFx{p}$ for large enough $p$
- note: $\FFx{p}$ is the algebraic closure of the finite field $\FF_p$
pick a large enough $p$, and transfer our proof to $\FFx{p}$; this won’t affect its structure or validity
show that there is, in fact, no counterexample in $\FFx{p}$
by contradiction, there is no disproof, and the theorem must be true

This is an… unusual proof strategy. I don’t usually think about my proofs as mathematical objects unto themselves. But that’s probably because I’m not a model theorist.

First, we’ll get the last step out of the way.

Proof: Let $f: \FFx{p}^n \to \FFx{p}^n$ be injective. Pick an arbitrary target $y = (y_1, \ldots, y_n) \in \FFx{p}^n$ to hit. Let $K \supseteq \FF_p$ be the field extension generated by the $y_i$ and the coefficients that show up in $f$. Since all of these generators are algebraic over $\FF_p$, and there’s finitely many of them, $K$ is finite. Also, since fields are closed under polynomial operations, $f(K^n) \subseteq K^n$. But because $f$ is injective, and $K^n$ is finite, $f(K^n)$ must be all of $K^n$. Since $y \in K^n$, there’s some input $x \in K^n$ such that $f(x) = y$. Thus $f$ is surjective.
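The pigeonhole step is the heart of this argument, and it’s easy to watch it happen in the simplest case $K = \FF_5$, $n = 2$. The two maps below are examples I picked for illustration (they don’t come from anywhere in particular), and `image_of` is a made-up helper; the real proof of course needs larger finite fields $K$ as well.

```python
from itertools import product

P = 5  # we work in the prime field F_5, represented as integers mod 5

def image_of(f, p=P, n=2):
    """Image of a map F_p^n -> F_p^n."""
    return {f(*point) for point in product(range(p), repeat=n)}

def f(x, y):
    # Injective: y is visible in the output, and then x can be recovered.
    return ((x + y * y) % P, y % P)

def g(x, y):
    # Not injective: x and -x have the same square.
    return ((x * x) % P, y % P)

assert len(image_of(f)) == P ** 2   # injective, so it hits all 25 points
assert len(image_of(g)) < P ** 2    # collisions force it to miss some points
print(len(image_of(f)), len(image_of(g)))
```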

Now for the exciting stuff.

We have to figure out a way of taking proofs over $\CC$, and translating them into proofs over $\FFx{p}$. This is daunting, but it’s made easier by the fact that they are both algebraically closed fields, and so they have a shared pool of axioms. Of course, they are very different in other ways: $\CC$ is uncountable while $\FFx{p}$ is countable, they have different characteristic, etc. We have to show that our proof manipulations aren’t affected by these differences.

Since this isn’t an intro to model theory post, I won’t be defining the basic terms. If these look unfamiliar, check out this post.

Let $\ACF$ be the theory of algebraically closed fields. We claim that it’s first-order, and it’s almost complete.

This is a theory in the language of rings, which is $\cL_{ring} = \{ +, \times, 0, 1 \}$. Our axioms are:

the usual field axioms (these are all first-order)
for each $d \ge 1$, add the sentence $\forall a_0 \forall a_1 \cdots \forall a_d \ (a_d \ne 0 \implies \exists x \ a_0 + a_1 x + \cdots + a_d x^d = 0)$
- these are first-order sentences, and together, they tell us that every non-constant polynomial has a root

So $\ACF$ is a first-order theory. It isn’t complete, of course. For example, the sentence $1 + 1 = 0$ is true in $\FFx{2}$, but not in $\FFx{3}$ or $\CC$. Turns out fields of different characteristic are… different. No surprise there.

So we define extensions of $\ACF$, where we do specify the characteristic. For a prime $p$, define $S_p$ to be the sentence $1 + \cdots + 1 = 0$, where there are $p$ copies of $1$. Then the theory of algebraically closed fields of characteristic $p$ is $\ACF_p = \ACF \cup \{ S_p \}$.

What about characteristic $0$? To force our field to have characteristic zero, we can throw in $\lnot S_p$ for all primes $p$: $\ACF_0 = \ACF \cup \{ \lnot S_2, \lnot S_3, \lnot S_5, \ldots \}$. This nails down exactly the algebraically closed fields of characteristic $0$.

We claim that $\ACF_0$ and $\ACF_p$ are complete theories.

If that is indeed the case, then we can prove a stronger form of the Ax-Grothendieck theorem.

Ax-Grothendieck Theorem (Stronger)

Let $k$ be an algebraically closed field. If $f: k^n \to k^n$ is a polynomial map, then if $f$ is injective, it is surjective.

Proof: We start by breaking our claim into a number of first-order sentences. We can’t first-order define an arbitrary polynomial, so we’ll work with all polynomials of bounded degree. For a fixed $d$, the sentence “for all polynomial maps $f$ of degree at most $d$, injectivity of $f$ implies surjectivity of $f$” can be expressed as a first-order sentence.

First, introduce variables for the coefficients of $f$; there are finitely many, since each of the $n$ coordinates of $f$ has one coefficient for each of the $\binom{n+d}{d}$ monomials of degree at most $d$. The sentence “$f$ is injective” can be made first-order by taking $f(x) = f(y) \implies x = y$ and expanding out the coefficients of $f$. Likewise, “$f$ is surjective” can be written as $\forall z \exists x \ f(x) = z$, and expanding $f$.

As an example, if $n = 1, d = 2$, our sentence is:

$$ \forall a_0 \forall a_1 \forall a_2 \ (\forall x \forall y \ a_2 x^2 + a_1 x + a_0 = a_2 y^2 + a_1 y + a_0 \implies x = y) $$

$$ \implies \forall z \exists x \ a_2 x^2 + a_1 x + a_0 = z $$

Since I literally never want to write out that sentence in the general case, let’s just call it $\phi_d$.

We’ll separately tackle the case of characteristic $p$ and characteristic $0$.

Let $p$ be any prime. Because $\ACF_p$ is complete, either there is a proof of $\phi_d$ or a proof of $\lnot \phi_d$. The latter is impossible; if there were such a proof, then it would show that $\phi_d$ is false in $\FFx{p}$, and we’ve proven before that it is true in this field. Therefore, $\ACF_p$ entails a proof of $\phi_d$.

Similarly, because $\ACF_0$ is complete, either it can prove $\phi_d$, or it can prove $\lnot \phi_d$. Again, for the sake of contradiction, we assume the latter. Let $P$ be a proof of $\lnot \phi_d$ from $\ACF_0$. Since $P$ is finite, it can only use finitely many axioms. In particular, it can only use finitely many of the $\lnot S_p$. So there’s some prime $q$ such that $\lnot S_q$ was not used in $P$. Each $\lnot S_p$ that was used has $p \ne q$, and so is a theorem of $\ACF_q$ (a field of characteristic $q$ can’t also have characteristic $p$). Therefore, $P$ can be turned into a valid proof from $\ACF_q$. But we already know there are no proofs of $\lnot \phi_d$ from $\ACF_q$, and so we’ve reached a contradiction. Therefore, there must be a proof of $\phi_d$ from $\ACF_0$.

Since $\ACF_p$ can prove $\phi_d$, and $\ACF_0$ can prove $\phi_d$, we know that $\phi_d$ is true in all algebraically closed fields $k$, no matter what the characteristic of $k$ is. And since $\phi_d$ is true for all $d$, we have proved the claim for polynomials of arbitrary degree.

This proof is magical in two ways.

One is that, despite there being no homomorphisms between $\FFx{p}$ and $\CC$, we were able to somehow transport a claim between the two. This was possible not by looking at the structure of $\CC$ and $\FFx{p}$ themselves, but by using the structure of their axiomatizations. The reduction to only finitely many axioms is an example of the compactness theorem, a very useful logical principle.

The other is that we never actually made use of $\phi_d$! All we knew is that it was a first-order sentence, and that it was true in some model of $\ACF_p$ for each $p$. Generalizing this argument, we get the following principle:

Robinson's Principle

If $\phi$ is a first-order sentence, then the following are equivalent:

$\ACF_p$ proves $\phi$ for all but finitely many $p$
$\ACF_p$ proves $\phi$ for infinitely many $p$
$\ACF_0$ proves $\phi$

Furthermore, the following are equivalent for $r$ a prime or $0$:

$\ACF_r$ proves $\phi$
$\phi$ is true in some algebraically closed field of characteristic $r$
$\phi$ is true in all algebraically closed fields of characteristic $r$

For the first claim, obviously (1) implies (2). The proof that (2) implies (3) is essentially the proof we gave above: if $\phi$ can’t be proved from $\ACF_0$, then $\lnot \phi$ can. This proof can only use finitely many of the $\lnot S_p$, and there’s infinitely many $\ACF_p$ that prove $\phi$, so there’s some $p$ we can transfer the proof to and get our contradiction. The proof that (3) implies (1) is similar: if there’s a proof of $\phi$ from $\ACF_0$, it can be transferred to all but finitely many $\ACF_p$.

The second claim is a direct consequence of completeness of $\ACF_r$.

Combining these two claims gives some very powerful techniques. The way we used it is: to show something is true for all algebraically closed fields, it suffices to show it only for a single example at each prime $p$.

At this point, there is no more spooky magic, and the rest of the article is about justifying the completeness of $\ACF_p$ and $\ACF_0$. Still cool though, IMO.

First, we’ll state a popular theorem in model theory:

Löwenheim–Skolem Theorem

Let $\cT$ be a countable theory. If it has an infinite model, then for any infinite cardinal $\kappa$, it has a model of size $\kappa$.

Essentially, first-order logic is too limited to distinguish between different sizes of infinity; if there’s a model of one infinite size, there’s a model of all infinite sizes. The proof of this theorem is somewhat involved, and we won’t cover it here, but see here for a proof.

Using this, we can prove the Łoś–Vaught test:

Łoś–Vaught Test

Let $\cT$ be a theory and $\kappa$ be some infinite cardinal. We say that $\cT$ is $\kappa$-categorical if there is exactly one model of $\cT$ of size $\kappa$, up to isomorphism.

If $\cT$ is $\kappa$-categorical for some $\kappa$, and has no finite models, then it is a complete theory.

This is unexpected, at least in my opinion. But then again, model theory isn’t my forte. Maybe there’s some intuition one can use here that I don’t have.

Proof: If $\cT$ isn’t complete, then there’s some $\phi$ such that $\cT$ proves neither $\phi$ nor $\lnot \phi$. By the completeness theorem, this means there’s a model $M$ of $\cT$ in which $\phi$ is true, and a model $M'$ of $\cT$ in which $\lnot \phi$ is true.

Since all models of $\cT$ are infinite, both $M$ and $M'$ are infinite. This means that $M$ is an infinite model of $\cT \cup \{ \phi \}$, thus we can apply Löwenheim–Skolem to get a model $N$ of $\cT \cup \{ \phi \}$ which has size $\kappa$. Likewise, we use $M'$ to get a model $N'$ of $\cT \cup \{ \lnot \phi \}$ which has size $\kappa$. But because $\cT$ is $\kappa$-categorical and both $N$ and $N'$ are models of $\cT$, they must be isomorphic. But because $\phi$ is true in $N$ and false in $N'$, this is a contradiction.

We’d like to apply the Łoś–Vaught test to $\ACF_p$ and $\ACF_0$. Since all algebraically closed fields are infinite, it suffices to show that these theories are $\kappa$-categorical for some $\kappa$.

Proof: Let $\kappa$ be an uncountable cardinal and $K$ be an algebraically closed field of size $\kappa$. Let $B$ be a transcendence basis of $K$ over its prime subfield $k$ ($\FF_p$ or $\QQ$). A cardinality argument shows that $|B| = \kappa$ (this is where the uncountability of $\kappa$ is used; for example, $\overline{\QQ}(t_1, \ldots, t_n)$ has transcendence degree $n$, but cardinality $\aleph_0$). So, if $K'$ is another algebraically closed field, with the same cardinality and characteristic, and we pick a transcendence basis $B'$, it will also have cardinality $\kappa$. The bijection between $B$ and $B'$ induces an isomorphism between $k(B)$ and $k(B')$. But since $K$ and $K'$ are algebraically closed, and algebraic over $k(B) \cong k(B')$, they are algebraic closures of the same field, and are thus isomorphic!

This proves that $\ACF_p$ and $\ACF_0$ are $\kappa$-categorical for uncountable cardinals $\kappa$. In particular, they’re $\kappa$-categorical for at least one infinite cardinal, and so via the Łoś–Vaught test, we conclude they are complete.

Wedderburn's Little Theorem

2018-11-05T00:00:00-08:00

$\newcommand{\ZZ}{\Bbb Z} \newcommand{\QQ}{\Bbb Q}$

Some rings are closer to being fields than others. A domain is a ring where we can do cancellation: if $ab = ac$ and $a \ne 0$, then $b = c$. Even closer is a division ring, a ring in which every non-zero element has a multiplicative inverse. The only distinction between fields and division rings is that the latter may be non-commutative. For this reason, division rings are also called skew-fields.

These form a chain of containments, each of which is strict: fields $\subset$ division rings $\subset$ domains $\subset$ rings

Some examples:

$\ZZ$ is a domain
$\ZZ/6\ZZ$ is not a domain
the set of $n \times n$ matrices is not a domain; two non-zero matrices can multiply to zero
$\QQ$ is a field (duh)
the quaternions are a division ring

Wedderburn’s theorem states that this hierarchy collapses for finite rings: every finite domain is a field.

First, we show that every finite domain is a division ring.

Let $D$ be a finite domain, and $x \in D$ be non-zero. The map $f : D \to D$ given by $f(d) = xd$ is injective, which we get immediately from the definition of a domain. Because $D$ is finite, $f$ injective implies that $f$ is surjective as well. This means there’s some $y$ such that $f(y) = xy = 1$. This makes $y$ a right-inverse of $x$; is it also a left-inverse? Yes! Since $x = 1x = xyx$, cancellation gives us $1 = yx$.

The next step, showing that every finite division ring is a field, is significantly trickier. We’ll continue, knowing that $D$ is a division ring.

Our plan is to re-interpret $D$ as a vector space, to get some information about its size. Then, we’ll drop the additive structure, and apply some group theory to the multiplicative structure. Lastly, our result will be vulnerable to some elementary number theory.

Let $Z$ be the center of $D$; the set of elements that commute multiplicatively with everything in $D$. The distributive law tells us that $Z$ is an abelian group under addition, and by definition, $Z^*$ is an abelian group under multiplication. This makes $Z$ a field, which allows us to apply some linear algebra to the problem.

As with field extensions, a division ring containing a field is a vector space over that field; specifically, $D$ is a vector space over $Z$, where vector addition is addition in $D$, and scalar multiplication is multiplication by an element of $Z$. This gives us some information about the size of $D$. If $Z$ has size $q$, and $D$ has dimension $n$ over $Z$, then $D$ has size $q^n$.

Let’s look at some linear subspaces of $D$ (as a vector space). For an element $x \in D$, let $C(x)$ be the set of all elements that commute with $x$ (this is the centralizer of $x$). We claim that this is a subspace of $D$. It’s clearly closed under addition, and we claim it is also closed under scalar multiplication. If $y \in C(x)$ and $z \in Z$, then it follows quickly that $(zy)x = x(zy)$, i.e., $zy \in C(x)$.

Because $C(x)$ is a linear subspace, it has size $q^k$ for some $1 \le k \le n$, where $k$ is its dimension. And if $x \notin Z$, we know that both these inequalities are strict. If $k = n$, then $C(x) = D$, and $x$ is in fact in the center. If $k = 1$, then $C(x) = Z$, and since $x \in C(x)$ for sure, $x$ is again in $Z$.

Now we can apply some group theory. The class equation is a statement about the conjugacy classes of a group. The details are best saved for another post, but if we have a group $G$ with center $Z(G)$, and $g_1, \ldots, g_r$ are distinct representatives of the non-trivial conjugacy classes, then

$$ |G| = |Z(G)| + \sum_{i=1}^r [G : C(g_i)] $$

Essentially, this comes from the fact that $[G : C(g_i)]$ is the number of conjugates of $g_i$, and that the conjugacy classes partition $G$.
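To see the class equation in action, we can verify it by brute force on a small group. I’ve picked $S_4$ as the example, with permutations stored as tuples; the helper names below are just for this sketch.

```python
from itertools import permutations

def compose(a, b):
    """(a * b)(i) = a(b(i)) for permutations stored as tuples."""
    return tuple(a[b[i]] for i in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for i, image in enumerate(a):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(4)))   # the symmetric group S_4

def conjugacy_class(g):
    return frozenset(compose(compose(h, g), inverse(h)) for h in G)

def centralizer(g):
    return [h for h in G if compose(h, g) == compose(g, h)]

classes = {conjugacy_class(g) for g in G}
center = [g for g in G if len(conjugacy_class(g)) == 1]
nontrivial = [c for c in classes if len(c) > 1]

# Each non-trivial class has size [G : C(g)] for any representative g.
for c in nontrivial:
    g = next(iter(c))
    assert len(c) == len(G) // len(centralizer(g))

# The class equation: |G| = |Z(G)| + sum of the non-trivial class sizes.
assert len(G) == len(center) + sum(len(c) for c in nontrivial)
print(len(center), sorted(len(c) for c in nontrivial))
```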

If we apply this to $D^*$, and remember our observation about the size of $C(x)$, then we get:

$$ q^n - 1 = (q - 1) + \sum_{i=1}^r \frac{q^n - 1}{q^{k_i} - 1}, \, 1 < k_i < n $$

We claim that this can only happen when $n = 1$; i.e., when $Z = D$. This would prove that $D$ is a field! From here on out, it’s all number theory.

First, we claim that each $k_i$ divides $n$. Let $n = a k_i + b$ be the result of division with remainder. Since $(q^n - 1)/(q^{k_i} - 1)$ is the index of some $C(x)$, it’s an integer, so $q^{k_i} - 1$ divides $q^n - 1$, or equivalently, $q^n \equiv 1 \pmod{q^{k_i} - 1}$. Substituting $n = a k_i + b$, we get that $q^b \equiv 1 \pmod{q^{k_i} - 1}$. But since $b < k_i$, $q^b - 1 < q^{k_i} - 1$, and so we must have that $q^b - 1 = 0$; i.e., that $b = 0$. (Here, we quietly used the fact that $q > 1$.) Therefore, $k_i$ divides $n$.

For the next step, we’ll need to introduce the cyclotomic polynomials $\Phi_k(x)$. They have three properties in particular that are of interest to us:

they are monic and have integer coefficients
for any $m$, the polynomial $x^m - 1$ factors as $\prod_{k \mid m} \Phi_k(x)$
the roots of $\Phi_k(x)$ are exactly the primitive $k$th roots of unity

The second fact tells us that $\Phi_n(x)$ is a factor of $x^n - 1$, but also, that it is a factor of $(x^n - 1)/(x^{k_i} - 1)$ – the denominator cancels out some of the $\Phi_k(x)$, but $\Phi_n(x)$ is left intact, since $k_i < n$.

Since the quotients $\frac{x^n - 1}{\Phi_n(x)}$ and $\frac{(x^n - 1)/(x^{k_i} - 1)}{\Phi_n(x)}$ are products of cyclotomic polynomials, each of which is monic with integer coefficients, they are also monic with integer coefficients. Therefore, if we plug in $x = q$, we will get an integer. This means that the integer $\Phi_n(q)$ divides the integers $q^n - 1$ and $(q^n - 1)/(q^{k_i} - 1)$. Note that we had to work for this; it’s not an immediate consequence of divisibility as polynomials. For example, consider $p(x) = x + 3$ and $q(x) = x^3 + 3x^2 - x/4 - 3/4$. While $p(x)$ divides $q(x)$ as polynomials, $p(1) = 4$ does not divide $q(1) = 3$.

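Now, returning to the class equation, we’ve shown that most of the terms are divisible by the integer $\Phi_n(q)$, so the only leftover term, $q - 1$, is also divisible by $\Phi_n(q)$. We claim this is only possible if $n = 1$, which would then give us our desired result.

Suppose $n > 1$. By the third property, $\Phi_n(q) = \prod_\zeta (q - \zeta)$, where $\zeta$ ranges over the primitive $n$th roots of unity. Each such $\zeta$ lies on the unit circle and is not equal to $1$, so $|q - \zeta| > q - 1 \ge 1$. Multiplying these inequalities together gives $|\Phi_n(q)| > q - 1$, and so $\Phi_n(q)$ can’t divide the positive integer $q - 1$.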

Therefore, $n = 1$, which forces $Z = D$, and thus $D$ to be commutative; hence, a field. Q.E.D!
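The number theory here can be spot-checked numerically. The sketch below computes $\Phi_n(q)$ as an integer straight from the factorization $x^n - 1 = \prod_{k \mid n} \Phi_k(x)$, then tests the two divisibility facts used in the proof for small $q$ and $n$. The function name `cyclotomic_at` and the ranges tested are my own choices, and this is a sanity check, not a proof.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cyclotomic_at(n, q):
    """Phi_n(q), computed from q^n - 1 = product of Phi_d(q) over divisors d of n."""
    value = q ** n - 1
    for d in range(1, n):
        if n % d == 0:
            value //= cyclotomic_at(d, q)   # exact division by a smaller factor
    return value

for q in range(2, 8):
    for n in range(2, 13):
        phi = cyclotomic_at(n, q)
        # Phi_n(q) never divides q - 1 when n > 1 ...
        assert (q - 1) % phi != 0
        # ... but it does divide (q^n - 1)/(q^k - 1) for proper divisors k of n.
        for k in range(1, n):
            if n % k == 0:
                assert ((q ** n - 1) // (q ** k - 1)) % phi == 0

print(cyclotomic_at(4, 2), cyclotomic_at(6, 2))  # 5 3
```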

Sylow Theorems

2018-10-29T00:00:00-07:00

$\newcommand{\ZZ}{\Bbb Z} \DeclareMathOperator{\Stab}{Stab} \DeclareMathOperator{\Fix}{Fix} \DeclareMathOperator{\Aut}{Aut} \DeclareMathOperator{\sgn}{sgn}$

In group theory, the Sylow theorems are a triplet of theorems that pin down a surprising amount of information about certain subgroups.

Lagrange’s theorem tells us that if $H$ is a subgroup of $G$, then the size of $H$ divides the size of $G$. The Sylow theorems give us some answers to the converse question: for what divisors of $|G|$ can we find a subgroup of that size?

Fix a group $G$ and a prime $p$, and let $n$ be the largest integer such that $p^n$ divides $|G|$. A $p$-subgroup of $G$ is a subgroup of order $p^k$, and if it has order $p^n$, then it is called a Sylow $p$-subgroup. Under these definitions, the Sylow theorems are:

Sylow Theorems

Every $p$-subgroup is contained in a Sylow $p$-subgroup. As such, Sylow $p$-subgroups exist.
All Sylow $p$-subgroups are conjugate to each other.
Let $n_p$ be the number of Sylow $p$-subgroups, and $m = |G|/p^n$. Then the following hold:
- $n_p$ divides $m$
- $n_p \equiv 1 \bmod p$
- $n_p = [G : N(P)]$, where $N(P)$ is the normalizer of any Sylow $p$-subgroup.

These are rather technical and deserve some more thorough digestion. Sylow 1 tells us that maximal $p$-subgroups are as big as possible; there is no obstruction preventing them from being the full $p^n$.

Sylow 2 tells us that all Sylow $p$-subgroups are isomorphic in a very strong way; there is a conjugation of the group sending them to each other. To see how this is a strong criterion, consider a non-example. Let $G = \ZZ_4 \times \ZZ_2$, and pick out the subgroups $H_1 = \{ (0, 0), (2, 0) \}$ and $H_2 = \{ (0, 0), (0, 1) \}$. It’s clear that $H_1$ and $H_2$ are isomorphic, but they are not conjugate. This manifests in $G/H_1 = \ZZ_2 \times \ZZ_2$ and $G/H_2 = \ZZ_4$ not being isomorphic.

Sylow 3 is the easiest to understand; it just puts some arithmetic criteria on $n_p$. For small-ish groups, this is often enough to nail down $n_p$ exactly!

On to the proofs!

Lemma

First let’s establish a lemma we’ll use frequently.

Lemma

If $G$ is a $p$-group, and it acts on a set $X$, then $|X| \equiv |\Fix(X)| \bmod p$, where $\Fix(X)$ is the set of points in $X$ that are fixed by every $g \in G$.

Proof: Let $x_1, \ldots, x_k$ be representatives for the $G$-orbits of $X$. We know that the sum of the sizes of the orbits is $|X|$. If $x_i$ is a fixed point, then the orbit is of size $1$. If it is not, then by orbit-stabilizer, the size of the orbit is $[G : \Stab(x_i)]$, which is divisible by $p$. Thus, mod $p$, every fixed point contributes $1$, and everything else in $X$ contributes $0$.

Sylow 1

Given a $p$-subgroup $H$, we show that, if it is not already maximal, we can find a $p$-subgroup $H' \supset H$ that is $p$ times bigger. Repeating this process gives us a Sylow $p$-subgroup containing our original $H$. Since the trivial subgroup is a $p$-subgroup, this also establishes the existence of Sylow $p$-subgroups!

Let $H$ be a $p$-group that is not maximal, i.e., it has order $p^i$, where $i < n$. There is a natural action of $H$ on the left coset space $G/H$, and since $H$ is a $p$-group, our lemma tells us that $|G/H|$ is equivalent to the number of fixed points mod $p$. But since $i < n$, $G/H$ has order divisible by $p$. So the number of fixed points of this action is also divisible by $p$.

What do fixed points of this action look like? If $gH$ is a coset fixed by $h \in H$, then $hgH = gH$, i.e., $g^{-1} h g \in H$. If this is true for all $h$, then $g$ lies in the normalizer of $H$. The converse is also true, since these implications were all reversible. This means that $N(H)$ is composed of the cosets of $H$ that are fixed points.

Combining the two observations above, we conclude that $[N(H) : H]$ is divisible by $p$. Therefore, by Cauchy’s theorem, there’s some subgroup of order $p$ in $N(H)/H$. Lifting this subgroup to $N(H)$, we get a subgroup of size $p \cdot |H| = p^{i+1}$. This is the $H'$ we were looking for.

Sylow 2

Let $P$ and $Q$ be two Sylow $p$-subgroups of $G$. We want to show they are conjugate.

There is a natural action of $P$ on $G$ by multiplication, and this descends to an action of $P$ on $G/Q$ (again, left coset space). From our lemma, the number of fixed points of this action is equivalent to $|G/Q|$, mod $p$. But since $Q$ is a Sylow $p$-subgroup, $|G/Q|$ is not divisible by $p$. This means that the number of fixed points cannot be zero; i.e., there is at least one fixed point for this action. This is some $gQ$ such that $pgQ = gQ$ for all $p \in P$. Or, rearranging the terms, a $g$ such that $g^{-1}pg \in Q$ for all $p \in P$. Since $P$ and $Q$ are the same size, being Sylow $p$-subgroups, this means that $g^{-1}Pg = Q$, and so they are indeed conjugate.

Sylow 3

Let $P$ be a particular Sylow $p$-subgroup, and let it act on the set of all Sylow $p$-subgroups by conjugation. We claim that $P$ is the only fixed point of this action. This would, by our lemma (we’re getting so much mileage out of this baby), instantly tell us that $n_p \equiv 1 \bmod p$.

Consider some fixed point $Q$. Then for any $p \in P$, $p^{-1}Qp = Q$, which means that $P$ lies in the normalizer of $Q$. Since both $P$ and $Q$ are Sylow $p$-subgroups of $G$, they are both Sylow $p$-subgroups of $N(Q)$. By Sylow 2, they must be conjugate, but since $Q$ is normal in $N(Q)$, it’s not going anywhere under conjugation. Thus $Q$ must equal $P$.

Next, we show that $n_p = [G : N(P)]$. Consider the action of $G$ by conjugation on the set of Sylow $p$-subgroups. There’s only one orbit, because of Sylow 2, and by orbit-stabilizer, it has size $[G : \Stab(P)]$. But the stabilizer of $P$ is just the normalizer, so $n_p = [G : N(P)]$, as desired.

Lastly, since $m = [G : P] = [G : N(P)] [N(P) : P]$, we get that $n_p$ divides $m$ for free.

Applications

Cool! These are nice theorems, how do we put them to use? Let’s look at some example applications.

Show that $\ZZ_{35}$ is the only group of size $35$.

Let $G$ be a group of size $35$. We’ll consider its Sylow $5$ and $7$-subgroups. By Sylow 3, we know that $n_5 \equiv 1 \bmod 5$, and divides $7$. This means it’s gotta be $1$, which means $G$ has a normal subgroup of size $5$. Likewise, $n_7 \equiv 1 \bmod 7$, and divides $5$, so $G$ has a normal subgroup of size $7$ as well. They intersect trivially, since their sizes are relatively prime, so $G$ is a direct product of these groups. Therefore, $G \cong \ZZ_5 \times \ZZ_7$, which is $\ZZ_{35}$.

Classify all groups of order $105$.

Let $G$ be a group of order $105$. First, we show that it has normal Sylow $5$- and $7$-subgroups. Sylow 3 restricts $n_5 \in \{1, 21\}$ and $n_7 \in \{1, 15\}$.
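The arithmetic conditions from Sylow 3 are mechanical enough to enumerate. Here’s a small sketch (the function name `sylow_candidates` is made up) that lists every value of $n_p$ that divides $m$ and is $1$ mod $p$.

```python
def sylow_candidates(order, p):
    """Values of n_p allowed by Sylow 3 for a group of the given order."""
    m = order
    while m % p == 0:      # strip off the full power of p, leaving m = |G| / p^n
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

print(sylow_candidates(35, 5), sylow_candidates(35, 7))     # [1] [1]
print(sylow_candidates(105, 5), sylow_candidates(105, 7))   # [1, 21] [1, 15]
```

This only narrows down the possibilities; ruling out the larger candidates still takes an actual argument.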

If $n_5 = 1$, then there’s a unique Sylow $5$-subgroup $N_5$. Picking out some Sylow $7$-subgroup $P_7$, we get a subgroup $H = N_5 P_7$ of size $35$ (the normality of $N_5$ is necessary for this to be a subgroup). But from our previous exercise, we know that this must be isomorphic to $\ZZ_{35}$. Since it’s abelian, $P_7$ must of course be normal in $H$. This means that the normalizer $N(P_7) \supseteq H$. Since $n_7 = [G : N(P_7)] \le [G : H] = 3$, we are forced to conclude that $n_7 = 1$ as well.

Likewise, if $n_7 = 1$, we can construct a subgroup $H = P_5 N_7$ isomorphic to $\ZZ_{35}$, in which $P_5$ is normal. The index of $H$ is again $3$, so $n_5 = [G : N(P_5)] \le 3$, and this pins down $n_5 = 1$.

If neither of these is $1$, then $n_5 = 21$ and $n_7 = 15$, and we run out of elements. Each of these subgroups intersects trivially (because they have prime order), and so we would have $21 \cdot 4$ non-identity elements from the Sylow $5$-subgroups, and $15 \cdot 6$ non-identity elements from the Sylow $7$-subgroups. Adding in the identity, this is a total of $175$ elements, way too many.

So $G$ has normal Sylow $5$- and $7$-subgroups, and their product is a subgroup $H$ of size $35$. As the product of normal subgroups, it is itself normal. Cauchy’s theorem gives us an element $x$ of order $3$, and it generates a subgroup $K$. Since $H$ and $K$ intersect trivially, $HK$ is the whole group, and so $G$ is a semidirect product of $H$ and $K$.

What options do we have for our twisting homomorphism $\phi : K \to \Aut(H)$? All we have to do is specify $\phi(x)$, and all we need is that $\phi(x)^3$ is the identity.

The automorphisms of $\ZZ_n$ are those given by multiplying by some $a$ relatively prime to $n$. As such, the automorphisms of $\ZZ_{35}$ with order dividing $3$ are $(r \mapsto ar)$, where $a^3 \equiv 1 \bmod 35$. The only such solutions are $1, 11, 16$.
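If you don’t trust my arithmetic, a brute-force search settles it. Here’s a quick Python sketch (the helper name `cube_roots_of_unity` is mine, not anything standard) that lists every unit $a$ modulo $n$ with $a^3 \equiv 1$:

```python
from math import gcd


def cube_roots_of_unity(n):
    """Return every a in 1..n-1 that is coprime to n and satisfies a^3 = 1 (mod n)."""
    return [a for a in range(1, n) if gcd(a, n) == 1 and pow(a, 3, n) == 1]


print(cube_roots_of_unity(35))  # [1, 11, 16]
```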

If $a = 1$, then this is the trivial automorphism, and so $G \cong \ZZ_3 \times \ZZ_{35} \cong \ZZ_{105}$.

It turns out that the groups for $a = 11$ and $a = 16$ are isomorphic, but I can’t figure out a clean way to show it at the moment. Stay tuned.

Show $A_5$ is the smallest non-abelian simple group.

To prove this, we need to eliminate the possibility of a simple non-abelian group of any smaller size. First, we can eliminate primes; any group of size $p$ is cyclic, hence abelian.

We can also eliminate prime powers. Any group of prime power order has a non-trivial center, so it cannot be simple.

Next, we eliminate anything that is $2$ mod $4$. Such a number is equal to $2m$ with $m$ odd. If $G$ is a group of size $2m$, let $G$ act on itself by multiplication. This gives us a map $\phi : G \to S_{2m}$ sending $g$ to the permutation it induces. By Cauchy’s theorem, there’s an element of order $2$. This induces a product of $m$ transpositions, and thus an odd permutation. So the map $\sgn \circ \phi : G \to \{ \pm 1 \}$ is surjective, and so its kernel is a non-trivial proper subgroup of $G$. (Unless $G$ has order $2$, but we already handled that case.)

Our last big sweep will be to eliminate groups of size $p^k m$ with $m < p$. Since $n_p$ divides $m$, we have $n_p \le m < p$. But $n_p$ is $1$ mod $p$, and so must be $1$. If there is a single Sylow $p$-subgroup, it must be normal. This eliminates 15, 20, 21, 28, 33, 35, 39, 44, 51, 52, 55, and 57.

This leaves us with 12, 24, 36, 40, 45, 48, and 56.
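The bookkeeping in these sweeps is easy to fumble, so here’s a small Python sketch that replays them for every order below $60$. The function names are made up for illustration, and the script only checks the arithmetic of the sweeps, not the group theory behind them.

```python
def prime_factorization(n):
    """Return {prime: exponent} for n >= 1, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def survives_sweeps(n):
    """True if none of the sweeps above eliminates the order n."""
    factors = prime_factorization(n)
    if len(factors) <= 1:  # primes and prime powers
        return False
    if n % 4 == 2:  # orders that are 2 mod 4
        return False
    for p, k in factors.items():  # n = p^k * m with m < p
        if n // p**k < p:
            return False
    return True


remaining = [n for n in range(2, 60) if survives_sweeps(n)]
print(remaining)  # [12, 24, 36, 40, 45, 48, 56]
```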

$|G|=40$: From the congruence conditions, we know that $n_5$ is $1$ mod $5$ and divides $8$. But this forces it to be $1$, so there is a unique Sylow $5$-subgroup.

$|G|=45$: Similar to $|G|=40$, the arithmetic restrictions force $n_5$ to be $1$.

$|G| = 12$: We know that $n_3$ is either $1$ or $4$. If it’s not $1$, there’s $4$ Sylow $3$-subgroups, and because they have prime order, they intersect trivially. This gives $8$ elements of order $3$, leaving $4$ other elements to constitute the Sylow $2$-subgroups. But each Sylow $2$-subgroup has $4$ elements, and so there is a unique (hence normal) one.

$|G| = 56$: Similar to the case for $12$. If $n_7$ is not $1$, it is $8$, yielding $48$ elements of order $7$. The leftover $8$ elements form the unique Sylow $2$-subgroup.

For the other three cases we need some stronger stuff.

Claim: if $G$ is simple and non-abelian, then for all $p$ dividing $|G|$, we must have $|G|$ divides $n_p!$.

Proof: Let $G$ act on the Sylow $p$-subgroups by conjugation. Because there are $n_p$ of them, this gives us a homomorphism $\phi : G \to S_{n_p}$. Since $G$ is simple, $\ker \phi$ is either trivial or all of $G$. Because all Sylow $p$-subgroups are conjugate, the latter situation only occurs when there is only one of them, something impossible if $G$ is simple and non-abelian.

This leaves us with the former case, where the kernel is trivial, and thus $\phi$ is an injection. Identifying $G$ as a subgroup of $S_{n_p}$, we get that $|G|$ divides $n_p!$ as promised.

We can now eliminate the last cases.

$|G|=24$: We know that $n_2$ is either $1$ or $3$, by the usual congruence conditions. But now we have a new tool. If $G$ were simple, then $24$ would divide $n_2!$, which it can’t in either case. So $G$ can’t be simple.

$|G|=36$: We know $n_3$ is $1$ or $4$. If $G$ is simple, then $36$ would divide $n_3!$, which it can’t.

$|G|=48$: Identical to the case for $24$.
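The same arithmetic can be automated. The sketch below (hypothetical helper names again) lists the values of $n_p$ allowed by the congruence conditions, and checks whether any of them satisfies $|G| \mid n_p!$. Passing this check does not prove a simple group of that order exists; it only means this particular argument fails to rule one out.

```python
from math import factorial


def sylow_counts(order, p):
    """Values of n_p allowed by Sylow: divisors of m that are 1 mod p, where order = p^k * m."""
    m = order
    while m % p == 0:
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]


def ruled_out_by_claim(order, p):
    """True if no allowed n_p satisfies: order divides n_p!."""
    return all(factorial(n_p) % order != 0 for n_p in sylow_counts(order, p))


for order, p in [(24, 2), (36, 3), (48, 2), (60, 5)]:
    print(order, p, sylow_counts(order, p), ruled_out_by_claim(order, p))
```

The first three orders come back ruled out. For $60$ with $p = 5$, the option $n_5 = 6$ survives, since $60$ divides $6! = 720$, which is reassuring given that $A_5$ exists.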

Phew!

This was a lot of work. Back when I was in high school, we had to prove this without the Sylow theorems, and by god we appreciated them. Get off my lawn!

(But actually though, that was an… experience.)

The Heawood Number

2018-10-22T00:00:00-07:00

The four-color theorem tells us that we can color any map using only four colors, such that no adjacent regions have the same color.

This is true for any map of the world, whether it’s on a globe or laid out flat. But what about maps on other surfaces?

The mathematical formalization of the four-color theorem is: “any planar graph is 4-colorable”. Let’s break down what that means.

Graph here refers to a collection of vertices and edges, not a plot or a chart. For our purposes, we’ll only consider simple graphs, that is, graphs where a) there is no edge from a point to itself and b) for any pair of points, there’s at most one edge between them. A graph is planar if we can embed it in the plane (i.e., draw it on a sheet of paper) without any of the edges crossing.

A coloring of a graph is a way of coloring the vertices of the graph such that no two vertices of the same color are connected. Note that self-loops make a graph impossible to color, and multiple edges between vertices don’t matter. This is why we concentrate only on simple graphs.

We say a graph is $k$-colorable if there exists a coloring with $k$ colors.

So what does this have to do with maps? The problem of coloring a map can be rephrased as a problem about coloring graphs. And since the field is called “graph theory”, and not “map theory”, that’s what we’ll do. Put a vertex for each country, and connect two vertices if the corresponding countries are adjacent. If you can color the map, then the corresponding graph can be colored in the same way. Likewise, if you can color the graph, you can use the same color assignment to color the map.

We’re looking to answer the question: for a surface $S$, how many colors do we need to guarantee we can color any graph embedded in $S$? To do this, we’ll need to make use of an invariant called the “Euler characteristic”.

Euler Characteristic

Euler’s formula for planar graphs says that for any planar graph, $V - E + F = 2$, where $V$ is the number of vertices, $E$ is the number of edges, and $F$ is the number of faces (including the outside face).

This also applies to graphs embedded on the sphere. Imagine taking a pin and poking a hole in the middle of one of the faces. Stretch this hole out until it is wide enough that you can flatten the entire sphere into a disk. Now you have a graph embedded in the plane. (This explains why we like to consider the outside face a legitimate face.)

But this does not apply to graphs embedded on other surfaces! Consider the following graph on the torus:

This has 16 vertices, 32 edges, and 16 faces (count carefully, not all of them are obvious). This has $V - E + F = 0$! Euler’s formula doesn’t work on the torus, but maybe we can salvage it?

Let’s try some examples:

It seems we usually get $0$, but sometimes we do get a $2$, like before. To resolve this, note that in all the examples where we don’t get $0$, some of the faces have “holes”. If you took the face in the $3 - 3 + 1$ example and laid it out flat, it’d look like a ring, not a disk.

So we’ll equip ourselves with another definition: if a graph is embedded in a surface, and none of the resulting faces have holes, we call that embedding honest. (This isn’t standard terminology, but you can’t stop me from naming things whatever I want. Try me.) It turns out that if you honestly embed a graph into the torus, you’ll always get $V - E + F = 0$, no matter which graph you use, or how it’s embedded.

In fact, for any surface $S$, we have a similar result: there’s a fixed integer $\chi(S)$ such that $V - E + F = \chi(S)$, for any honest embedding of any graph. We call this number the Euler characteristic for the surface. For the plane and the sphere, $\chi = 2$. For the torus, $\chi = 0$. Here’s some other examples of surfaces and their Euler characteristics:

The Heawood Number

Now we can approach the generalized four-color theorem. Armed with the Euler characteristic, we define the Heawood number of a surface with Euler characteristic $\chi$ as:

Heawood Number

$$ H(\chi) = \left\lfloor \frac{7 + \sqrt{49 - 24 \chi}}{2} \right\rfloor $$

Yeah. That’s… unmotivated.

We claim that any graph that can be embedded on a surface with characteristic $\chi$, honestly or otherwise, can be colored with at most $H(\chi)$ colors. For the sphere, $H(2) = 4$, so our claim becomes the famous Four-Color Theorem, which is Very Hard To Prove (TM). We’ll deliberately exclude that case, like the cowards we are.
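To get a feel for the formula, here’s a short Python sketch that evaluates $H(\chi)$ using integer arithmetic only. The function name `heawood` is my own choice; it relies on the fact that flooring the square root first doesn’t change the final floor.

```python
from math import isqrt


def heawood(chi):
    """Heawood number floor((7 + sqrt(49 - 24*chi)) / 2) for Euler characteristic chi <= 2."""
    return (7 + isqrt(49 - 24 * chi)) // 2


for name, chi in [("sphere", 2), ("projective plane", 1), ("torus", 0), ("double torus", -2)]:
    print(name, chi, heawood(chi))
# sphere 2 4
# projective plane 1 6
# torus 0 7
# double torus -2 8
```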

The first step is to prove a lemma about the minimum degree of the graph. That’ll get us most of the way there.

Let $S$ be a surface that isn’t the sphere, and embed a graph $G$ on it, honestly or not. Let $V$, $E$, and $F$ be the usual, and let $\delta$ be the minimum degree of a vertex in $G$. We claim that $\delta \le H(\chi) - 1$.

Proof: First, we can extend this embedding to an honest embedding, by adding extra edges to cut up the faces. This can only make $\delta$ bigger, so if we can prove $\delta \le H(\chi) - 1$ for this new graph, it was also true for the old graph.

Next, consider the following inequalities, the motivations for which are pulled directly from my ass.

Since each face has at least three edges, we know that $2E \ge 3F$.
The sum of the degrees for all vertices is $2E$. Thus, $2E \ge \delta V$.
A vertex cannot be connected to more than $V - 1$ other vertices, so $\delta + 1 \le V$.

Now, from the definition of Euler characteristic, we have:

$$ \begin{align*} \chi &= V - E + F \\ 6\chi &= 6V - 6E + 6F \\ 6\chi &\le 6V - 2E \\ 6\chi &\le 6V - \delta V = (6 - \delta) V \\ \end{align*} $$

Here we must split into cases, depending on the sign of $\chi$.

If $\chi \le 0$, then $H(\chi) \ge 7$, so the claim is immediate when $\delta \le 6$. Otherwise $\delta - 6 > 0$, and we make both sides positive before making use of our last inequality:

$$ -6\chi \ge (\delta - 6)V \ge (\delta - 6)(\delta + 1) = \delta^2 - 5 \delta - 6 $$

Now use the handy-dandy quadratic formula; we get that $\delta$ is at most $\frac{5 + \sqrt{49 - 24 \chi}}{2}$. Since $\delta$ is an integer, we can take the floor of that bound, which is exactly $H(\chi) - 1$. Boom.

Otherwise, $\chi > 0$, and by the classification of compact surfaces, we know $S$ must be the sphere or the projective plane. We’re explicitly excluding the sphere, so $S$ must be the projective plane, which has Euler characteristic 1. Plugging that in, we get that $6 \le (6 - \delta) V$. Since the right side is positive, we must have $\delta < 6$. Because $H(1) = 6$, we can still guarantee that $\delta \le H(\chi) - 1$.

So for any graph $G$ embedded in $S$, honestly or otherwise, there is a vertex with degree at most $H(\chi) - 1$.

We’re basically done! We’ll describe an explicit procedure to color graphs on $S$ with $H(\chi)$ colors.

Let $G$ be a graph embedded on $S$. Our base case is the graph with one vertex; it can trivially be colored. Otherwise, consider $G$ with $n \ge 2$ vertices. By our lemma, it has some vertex $v$ with degree at most $H(\chi) - 1$. Apply our procedure to the subgraph $G - v$, coloring it with $H(\chi)$ colors. Since $v$ has strictly fewer than $H(\chi)$ neighbors, there will be at least one color available for us to color $v$ with, and so we can color all of $G$.
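Here’s one way that procedure might look in Python. This is a sketch under my own assumptions: the graph is a dict mapping each vertex to its set of neighbors, the recursion is unrolled into a loop that always peels off a minimum-degree vertex, and the names `min_degree_coloring` and `is_proper` are invented for this post. The code knows nothing about surfaces; the lemma is what guarantees the peeled vertex has small degree.

```python
def min_degree_coloring(adjacency):
    """Color a simple graph given as {vertex: set_of_neighbors}.

    Repeatedly remove a vertex of minimum degree, then color the vertices
    in reverse order of removal with the smallest color not yet used by
    an already-colored neighbor.
    """
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    removal_order = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        removal_order.append(v)
        for u in remaining.pop(v):
            remaining[u].discard(v)
    colors = {}
    for v in reversed(removal_order):
        used = {colors[u] for u in adjacency[v] if u in colors}
        colors[v] = next(c for c in range(len(adjacency)) if c not in used)
    return colors


def is_proper(adjacency, colors):
    """True if no edge joins two vertices of the same color."""
    return all(colors[u] != colors[v] for u in adjacency for v in adjacency[u])


# K_7 embeds on the torus and needs all H(0) = 7 colors.
k7 = {v: {u for u in range(7) if u != v} for v in range(7)}
k7_colors = min_degree_coloring(k7)
print(len(set(k7_colors.values())), is_proper(k7, k7_colors))  # 7 True
```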

Conclusions

We showed that any graph $G$ embedded in $S$, honestly or otherwise, can be colored with $H(\chi) = \left\lfloor \frac{7 + \sqrt{49 - 24 \chi}}{2} \right\rfloor$ colors. The only case we decided not to handle was when $S$ is the sphere. Unfortunately, that case is much harder. The proof above was discovered in 1890 by Percy John Heawood, after whom the number is named. The Four-Color Theorem wasn’t proven until much later, in 1976, by Kenneth Appel and Wolfgang Haken. And what a controversial proof it was! They managed to reduce the problem to checking a particular property of 1,936 graphs. This wasn’t feasible to do by hand, so they used a computer to check those cases. This was the first computer-aided proof, and it ruffled quite a few feathers.

Secondly, we only established an upper bound on the number of colors we need in our palette. Is there a graph that requires all $H(\chi)$ colors? Or can we lower the bound a bit? The Heawood conjecture is the claim that we can’t; i.e., that this bound is sharp. And it’s mostly true. In 1968, Gerhard Ringel and Ted Youngs showed that, on almost any surface, you can embed the complete graph on $H(\chi)$ vertices. Since that graph requires all $H(\chi)$ colors, that shows the bound is sharp. The only exception is the Klein bottle, where the conjecture predicts $H(0)=7$ colors are needed, but in fact, $6$ colors suffice to color any graph.

A maximal coloring of the Klein bottle is shown below:

Linearity of Expectation

2018-10-15T00:00:00-07:00

To introduce this topic, let’s start with an innocuous problem:

You have $10$ six-sided dice. If you roll all of them, what is the expected sum of the faces?

Your intuition should tell you that it’s $35$. But what’s really going on here is an example of a slick principle called linearity of expectation.

We’re not actually computing the probability of getting $10, 11, \ldots, 60$, and summing it all up. Implicitly, we are making the following line of argument: the expected value of the first die is $3.5$, and so the expected value for $k$ dice is $3.5k$. This relies on the following claim: given two random variables $X$ and $Y$, the expected value of their sum, $E[X + Y]$, is just $E[X] + E[Y]$.

This feels intuitively true, and proving it is straightforward. Let $\Omega$ be the space of possible outcomes. Then

$$ \begin{align*} E[X + Y] &= \sum_{\omega \in \Omega} p(\omega) (X + Y)(\omega) \\ &= \sum_{\omega \in \Omega} p(\omega) (X(\omega) + Y(\omega)) \\ &= \sum_{\omega \in \Omega} p(\omega) X(\omega) + \sum_{\omega \in \Omega} p(\omega) Y(\omega) \\ &= E[X] + E[Y] \end{align*} $$

But interestingly enough, at no point did we require $X$ and $Y$ be independent. This still works even when $X$ and $Y$ are correlated! For some sanity-checking examples, consider $X = Y$ and $X = -Y$.

This principle, which is rather obvious when $X$ and $Y$ are independent (so much so that we often use it unconsciously), is unexpectedly powerful when applied to dependent variables. We’ll explore the concept through several example problems.

Gumballs

Imagine a very large gumball machine, with $4$ colors of gumballs in it, evenly distributed. We only have enough money for $6$ gumballs; what’s the expected number of colors we will receive? Assume that the machine has so many gumballs that the ones we take out don’t matter; effectively, we are drawing with replacement.

Let’s compute this the naive way first. Let’s count the number of ways we can get each number of colors, and do the appropriate weighted sum.

There are $4$ ways we can get only one color.

For any two colors, there’s $2^6 = 64$ ways we can get gumballs using just those colors. Two of those sequences use only one of the colors, leaving $62$ that use both. There’s $6$ pairs of colors, so there’s $62 \cdot 6 = 372$ ways to get exactly two colors.

Similarly, for any three colors, there’s $3^6 = 729$ ways to get gumballs with just those colors. Of those, $3 \cdot 62 = 186$ use exactly two of the colors and $3$ use only one, leaving $540$ that use all three. There’s $4$ possible triplets, giving $540 \cdot 4 = 2160$ ways to get exactly three colors.

All other cases have four colors: $4^6 - 2160 - 372 - 4 = 1560$ possible ways.

Now we do the weighted sum. Each possible sequence of gumballs has probability $1/4^6$ of occurring, so the expected value of the number of colors is:

$$ 1 \frac{4}{4^6} + 2 \frac{372}{4^6} + 3 \frac{2160}{4^6} + 4 \frac{1560}{4^6} = \frac{3367}{1024} \approx 3.288 $$

It’s doable, but one can imagine this is much harder for larger numbers.

Let’s take another go at it. For the $i$th color, define $X_i$ to be $1$ if we get at least one gumball of that color, and $0$ otherwise. The number of colors we get, $X$, is then the sum of the $X_i$.

The probability of not getting a gumball of a particular color on a particular draw is $3/4$, so the probability of not getting it in $6$ draws is $(3/4)^6$. This means that $E[X_i] = 1 - (3/4)^6 = 3367/4096$.

The $X_i$ are not independent; for example, if we know three of them are $0$, the last one must be $1$ (we must draw a gumball of some color). But we can still apply linearity of expectation, even to dependent variables.

Thus, the expected number of colors we get is $E[X] = \sum_{i = 1}^4 E[X_i] = 4 \cdot \frac{3367}{4096} = \frac{3367}{1024}$, just as we got earlier.
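Both approaches are easy to check by machine. The Python sketch below enumerates every sequence of draws and compares the exact average against the indicator-variable formula, using `Fraction` to avoid rounding. The function names are mine, and the brute force is only practical for small inputs.

```python
from fractions import Fraction
from itertools import product


def expected_colors_bruteforce(n, k):
    """Average number of distinct colors over all n^k equally likely sequences of k draws."""
    total = sum(len(set(seq)) for seq in product(range(n), repeat=k))
    return Fraction(total, n**k)


def expected_colors_indicator(n, k):
    """Sum of the n indicator expectations, each equal to 1 - ((n-1)/n)^k."""
    miss = Fraction(n - 1, n) ** k
    return n * (1 - miss)


print(expected_colors_bruteforce(4, 6))  # 3367/1024
print(expected_colors_indicator(4, 6))   # 3367/1024
```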

Notably, this approach extends gracefully to when we take $k$ gumballs with $n$ available colors. The expected value of each $X_i$ is then $1 - (1 - 1/n)^k$, so the expected value of $X$ is then $n \left( 1 - (1 - 1/n)^k \right)$.

(This reveals an interesting approximation: if $n$ and $k$ are equal and large, then $(1 - 1/n)^n \approx 1/e$, so the expected number of colors is $n(1 - 1/e) \approx 0.63n$).

Number of Fixed Points

These variables we saw earlier, that are $1$ if a condition is true, and $0$ otherwise, are called indicator variables, and they are particularly good candidates for linearity of expectation problems.

After we shuffle a deck of $n$ cards, what is the expected number of cards that have stayed in the same position? Equivalently, given a random permutation on $n$ objects, how many fixed points does it have on average?

We have no interest in examining all $n!$ possible outcomes, and summing over the number of fixed points in each. That would be terrible. Instead, we’re going to split our desired variable into several indicator variables, each of which is easier to analyze.

Let $X_k$ be $1$ if the $k$th card is in the $k$th position, and $0$ otherwise. Then the number of fixed points is $\sum_k X_k$.

After shuffling, the $k$th card is equally likely to be in any position in the deck. So the chance of ending up in the same place is $1/n$, which makes $E[X_k] = 1/n$. So by linearity of expectation, $E[X_1 + \cdots + X_n] = n \cdot \frac{1}{n} = 1$. So on average, one card will stay in the same place.
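For small decks we can afford the terrible approach after all, which makes for a nice sanity check. This Python sketch (with a made-up function name) averages the number of fixed points over all $n!$ permutations:

```python
from fractions import Fraction
from itertools import permutations


def average_fixed_points(n):
    """Exact average number of fixed points over all permutations of range(n)."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, x in enumerate(p) if i == x) for p in perms)
    return Fraction(total, len(perms))


print([average_fixed_points(n) for n in range(1, 7)])
```

Every entry comes out to exactly $1$, as the indicator argument predicts.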

Number of Cycles

We don’t have to limit ourselves to indicator variables: sometimes we can use a constant factor to help us avoid overcounting.

Given a random permutation on $n$ objects, how many cycles does it have?

As a reminder, the cycles of a permutation are the “connected components”. For example, if $\sigma$ sends $1 \to 2$, $2 \to 4$, $3 \to 6$, $4 \to 1$, $5 \to 5$, and $6 \to 3$, then the cycles of $\sigma$ are $(1, 2, 4)$, $(3, 6)$, and $(5)$.

For each $k$, let $X_k = \frac{1}{L}$, where $L$ is the length of the cycle of $\sigma$ containing the number $k$. So for the permutation we described, $X_1 = X_2 = X_4 = 1/3$, $X_3 = X_6 = 1/2$, and $X_5 = 1$. Then the number of cycles is $X_1 + \cdots + X_n$, since each cycle contributes $L$ copies of $1/L$. As usual, these variables are highly dependent (if $X_i = 1/5$, there’d better be four other $X_j$ that equal $1/5$ as well), but we can still apply linearity of expectation.

The probability that $k$ is in a cycle of length $1$ is $1/n$, since $\sigma$ would have to send $k$ to itself.

The probability it is in a cycle of length $2$ is the probability $k$ is sent to some other number, times the probability that the other number is sent back to $k$, i.e. $\frac{n-1}{n} \cdot \frac{1}{n - 1}$, which is $\frac{1}{n}$.

In general, the probability of being in a cycle of length $L$ is $\frac{n-1}{n} \frac{n-2}{n-1} \cdots \frac{n-(L-1)}{n-(L-2)} \cdot \frac{1}{n-(L-1)} = \frac{1}{n}$. Curiously, this is independent of $L$.

So the expected value of $X_k$ is $\frac{1}{n} \sum_{L=1}^n \frac{1}{L} = \frac{H_n}{n}$, where $H_n$ is the $n$th harmonic number. Then the expected number of cycles is $E[X_1] + \cdots + E[X_n] = H_n$.
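As before, small cases can be checked exhaustively. The sketch below counts cycles directly and compares the exact average with $H_n$; the helper names are my own, and permutations are $0$-indexed lists where `perm[i]` is the image of `i`.

```python
from fractions import Fraction
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of a permutation given as a sequence with perm[i] = image of i."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles


def average_cycles(n):
    """Exact average number of cycles over all permutations of range(n)."""
    perms = list(permutations(range(n)))
    return Fraction(sum(cycle_count(p) for p in perms), len(perms))


def harmonic(n):
    """The n-th harmonic number as an exact fraction."""
    return sum(Fraction(1, L) for L in range(1, n + 1))


# The example permutation from above, shifted to 0-indexing.
print(cycle_count([1, 3, 5, 0, 4, 2]))   # 3
print(average_cycles(5), harmonic(5))    # 137/60 137/60
```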

Buffon’s Needle

We’ll finish up with a rather surprising application to the Buffon’s needle problem:

Consider a gigantic piece of lined paper, with the lines spaced one unit apart. If we throw a needle of length $1$ onto the paper, what is the probability it crosses a line?

Technically, we’re only interested in the probability that the needle crosses a line. But because it can cross at most once, this is equal to the expected number of crossings. So if we let $X_a$ be the number of crossings for a needle of length $a$, we’re interested in $E[X_1]$.

Take a needle of length $a + b$, and paint it, covering the first $a$ units of it red, and the other $b$ units blue. Then throw it on the paper. The expected number of crossings is the expected number of red crossings, plus the expected number of blue crossings. But each segment of the needle is just a smaller needle, so the expected number of red crossings is $E[X_a]$, and the expected number of blue crossings is $E[X_b]$. This lets us conclude, unsurprisingly, that $E[X_{a+b}] = E[X_a] + E[X_b]$. This tells us that $E[X_a]$ is linear in $a$, and so $E[X_a] = Ca$ for some unknown constant $C$. (Well, we’ve gotta assume $E[X_a]$ is continuous in $a$, which it is, but shh…)

Furthermore, put a sharp bend in the needle right at the color boundary. Each segment is still a straight needle, so the expected number of red crossings is still $E[X_a]$, and likewise with blue crossings. So the expected number of crossings for this bent needle is still $E[X_{a+b}]$, despite the kink!

By induction, if you put a finite number of sharp bends in a needle, it doesn’t change the expected number of crossings. All that matters is the total length. And by ~~handwaving~~ a continuity argument, this is true for continuous bends as well. So $E[X_a]$ doesn’t just measure the expected number of crossings for a needle of length $a$, but any reasonable curve of length $a$. (Much to my delight, this phenomenon is called “Buffon’s noodle”.) This means that if we throw a rigid noodle of length $a$ on the paper, the expected number of crossings is $E[X_a] = Ca$.

So let’s consider a particular kind of noodle: a circle with diameter $1$. No matter how it’s thrown onto the paper, it will cross the lines exactly twice. It has circumference $\pi$, and so we can determine that $C = \frac{2}{\pi}$. Thus, for the original needle problem, $p = E[X_1] = \frac{2}{\pi}$.
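A Monte Carlo experiment is a cheap way to see the $Ca$ law in action. The sketch below is my own setup, not part of the original puzzle: lines sit at integer heights, the needle’s center height is uniform in $[0, 1)$, its angle is uniform in $[0, \pi)$, and the seed is fixed so runs are repeatable. It counts crossings for a straight needle of any length, so it also covers needles longer than the line spacing.

```python
import math
import random


def mean_crossings(length, trials=200_000, seed=0):
    """Estimate the expected number of line crossings for a straight needle of the given length."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        y = rng.random()                     # height of the needle's center
        theta = rng.uniform(0.0, math.pi)    # angle against the lines
        half = 0.5 * length * math.sin(theta)
        # Number of integer heights between the two endpoints.
        total += math.floor(y + half) - math.floor(y - half)
    return total / trials


print(mean_crossings(1.0), 2 / math.pi)
print(mean_crossings(2.5), 2 * 2.5 / math.pi)
```

Each estimate should land close to $2a/\pi$, up to sampling noise.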

Expected Density of Pigeons

2018-10-08T00:00:00-07:00

$\DeclareMathOperator{\res}{Res}$

This one’s another puzzle from work:

Consider a pigeon coop with $n$ pigeonholes, arranged in a straight line. When a pigeon arrives at the coop, it will roost in a pigeonhole only if it is empty, and both neighboring pigeonholes are also empty. It selects such a pigeonhole uniformly at random, enters the pigeonhole, and does not leave. At some point, the coop will fill up, but not every pigeonhole will be occupied. What is the expected density of pigeons in the coop, as $n$ grows large?

If you run a few simulations, you get that it’s about $0.432332\ldots$. But this isn’t any easily recognizable number. What is it in closed form?

This problem illustrates one of the things I find really cool about math: the boundaries between different disciplines are essentially fictitious. This is a combinatorics problem, and so we might expect to be using arguments involving counting, bijections, and other finite tools. But instead we’ll sprint as fast as we can into the realm of analysis and solve the problem there.

Let $a_n$ be the expected number of pigeons for a coop with $n$ holes. Then we can come up with a recurrence relation for $a_n$.

Consider what happens when the first pigeon arrives in an unoccupied coop. If it arrives in the first hole, then we can imagine deleting the first hole and its neighbor from the coop, leaving us with an unoccupied coop of size $n - 2$. If it lands in the last hole, we have the same situation. Otherwise, it lands somewhere in the middle; when a pigeon comes to rest in the $k$th hole (I’m going to $1$-index, by the way), it splits the coop into two smaller coops, one with $k - 2$ holes, and the other with $n - k - 1$ holes. Since each hole is equally likely, we can average over all values of $k$ to get a first draft of our recurrence relation:

$$ a_n = 1 + \frac{1}{n} \left( a_{n-2} + a_{n-2} + \sum_{k=2}^{n-1} (a_{k-2} + a_{n-k-1}) \right) $$

This can be prettied up with some mild re-indexing:

$$ a_n = 1 + \frac{2}{n} \sum_{k=0}^{n-2} a_k $$

We can do even better though! If we consider $n a_n - (n-1) a_{n-1}$, we can collapse most of our terms:

$$ \begin{align*} n a_n - (n-1) a_{n-1} &= \left( n + 2 \sum_{k=0}^{n-2} a_k \right) - \left( n-1 + 2 \sum_{k=0}^{n-3} a_k \right) \\ n a_n - (n-1) a_{n-1} &= 1 + 2 a_{n-2} \\ a_n &= \frac{1}{n} ( 1 + (n-1) a_{n-1} + 2 a_{n-2} ) \end{align*} $$

This recurrence is linear, but its coefficients depend on $n$, so the usual tricks for constant-coefficient recurrences don’t apply. So we fall back on the Swiss Army knife of recurrence relations: the generating function.

Let $G(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + \cdots$. We don’t know what this function is yet, but we can use the recurrence relation to pin down what it is.

\begin{align*} G(z) &= \sum_{n=0}^\infty a_n z^n \\ G'(z) &= \sum_{n=1}^\infty n a_n z^{n-1} \\ &= a_1 + \sum_{n=2}^\infty n a_n z^{n-1} \\ &= a_1 + \sum_{n=2}^\infty \left( 1 + (n-1) a_{n-1} + 2 a_{n-2} \right) z^{n-1} \end{align*}

Dealing with the three pieces separately makes this much easier to read (and also to write *wink*):

$$ \sum_{n=2}^\infty z^{n-1} = \frac{z}{1 - z} $$

$$ \sum_{n=2}^\infty (n-1) a_{n-1} z^{n-1} = \sum_{n=1}^\infty n a_n z^n = z G'(z) $$

$$ \sum_{n=2}^\infty 2 a_{n-2} z^{n-1} = 2 \sum_{n=0}^\infty a_n z^{n+1} = 2z G(z) $$

Putting it all together, we get a differential equation for $G(z)$:

$$ G'(z) = 1 + \frac{z}{1 - z} + z G'(z) + 2z G(z) $$

Cleaning it up a little, we see that it’s first order and linear, so we can put those diff eq skills to use:

$$ G'(z) = \frac{2z}{1 - z} G(z) + \frac{1}{(1 - z)^2} $$

The details aren’t super important, but basically you use an integrating factor and get:

$$ G(z) = \frac{1 + C e^{-2z}}{2(z-1)^2} $$
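For the curious, here’s roughly how that integrating-factor computation goes. This is my own reconstruction of the skipped step, so treat it as a sketch; $\mu$ is a name I’m introducing for the integrating factor. Moving the $G$ term to the left gives $G'(z) - \frac{2z}{1 - z} G(z) = \frac{1}{(1 - z)^2}$, and then:

$$ \begin{align*} \mu(z) &= \exp\left( -\int \frac{2z}{1 - z}~dz \right) = \exp\left( 2z + 2 \ln(1 - z) \right) = (1 - z)^2 e^{2z} \\ \frac{d}{dz} \left( \mu(z) G(z) \right) &= \frac{\mu(z)}{(1 - z)^2} = e^{2z} \\ \mu(z) G(z) &= \frac{e^{2z} + C}{2} \\ G(z) &= \frac{1 + C e^{-2z}}{2 (1 - z)^2} \end{align*} $$

Since $(1 - z)^2 = (z - 1)^2$, this matches the formula above.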

What should $C$ be? We’ll have to use our initial conditions, and one of them is particularly straightforward: $G(0) = a_0$, which we know is $0$, and so $C = -1$.

At this point, let’s stop and recollect our thoughts. We’ve defined a function $G(z)$ whose power series coefficients are $a_n$, the average number of pigeons in a coop of size $n$. Our solution is now encoded in quite a peculiar way: how fast do the coefficients of $G(z)$ grow?

To figure this out, let’s put the “analytic” in “analytic combinatorics”, and consider some contour integrals. Fix some $R > 1$, and define $I_n$ to be the integral of $G(z)/z^{n+1}$ around the circle of radius $R$ at the origin (taken counter-clockwise).

What is $I_n$? We can evaluate it using the residue theorem. There are two poles, one at $z = 0$, and the other at $z = 1$. The former is easy to compute; the residue is the coefficient on the $z^{-1}$ term, which is exactly $a_n$. The second does not admit such a nice description, and so we compute it the usual way:

\begin{align*} \res\left( \frac{G(z)}{z^{n+1}}, 1\right) &= \lim_{z \to 1} \frac{d}{dz} (z-1)^2 \frac{G(z)}{z^{n+1}} \\ &= \lim_{z \to 1} \frac{d}{dz} \frac{1 - e^{-2z}}{2 z^{n+1}} \\ &= \lim_{z \to 1} \frac{2 z e^{-2z} - (n+1)(1 - e^{-2z})}{2 z^{n+2}} \\ &= \frac{(n+3)e^{-2} - (n+1)}{2} \end{align*}

So $\frac{1}{2 \pi i} I_n = a_n + \frac{(n+3)e^{-2} - (n+1)}{2}$. What good does this do us?

If you’ve seen this trick before, you know that $I_n$ drops exponentially to $0$ as $n$ increases, but if not, here’s the justification. Let $M$ be the largest value (in terms of absolute value) that $G$ attains on the circle $|z| = R$. Then the triangle inequality tells us:

$$ | I_n | = \left| \int_{C_R} \frac{G(z)}{z^{n+1}}~dz \right| \le \int_{C_R} \left| \frac{G(z)}{z^{n+1}} \right|~dz \le \int_{C_R} \frac{M}{R^{n+1}}~dz = \frac{2 \pi M}{R^n} $$

So as $n \to \infty$, $I_n$ drops to $0$, and so $a_n$ approaches $\frac{(n+1)-(n+3)e^{-2}}{2}$. Therefore, the expected density of pigeons, $a_n/n$, approaches $(1 - e^{-2})/2$, or about $0.432332$.
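None of this needs to be taken on faith, since the recurrences are easy to run. The Python sketch below (function names are mine) computes $a_n$ exactly from both forms of the recurrence, with $a_0 = 0$ and $a_1 = 1$, and compares against the asymptotic formula.

```python
import math
from fractions import Fraction


def pigeons_by_sum(n_max):
    """a_0..a_{n_max} from a_n = 1 + (2/n) * (a_0 + ... + a_{n-2})."""
    a = [Fraction(0)]
    for n in range(1, n_max + 1):
        a.append(1 + Fraction(2, n) * sum(a[: n - 1], Fraction(0)))
    return a


def pigeons_by_two_term(n_max):
    """a_0..a_{n_max} from a_n = (1 + (n-1) a_{n-1} + 2 a_{n-2}) / n."""
    a = [Fraction(0), Fraction(1)]
    for n in range(2, n_max + 1):
        a.append((1 + (n - 1) * a[n - 1] + 2 * a[n - 2]) / n)
    return a[: n_max + 1]


def asymptote(n):
    """The residue-based approximation ((n+1) - (n+3) e^{-2}) / 2."""
    return ((n + 1) - (n + 3) * math.exp(-2)) / 2


a = pigeons_by_two_term(40)
print(a[:4])                         # [Fraction(0, 1), Fraction(1, 1), Fraction(1, 1), Fraction(5, 3)]
print(float(a[40]), asymptote(40))   # these agree to many decimal places
print((1 - math.exp(-2)) / 2)        # the limiting density
```

The agreement at $n = 40$ is far better than the crude bound $2 \pi M / R^n$ might suggest for any single $R$, because we’re free to take $R$ as large as we like.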

There were other solutions that people came up with for this problem, but what I really like about this one is that it demonstrates a way to approach these problems in general, and (at least IMO) it’s a pretty unexpected one. If someone asked me to figure out how fast the coefficients of a power series grow, the residue theorem would not be the first thing on my mind. And yet, not only does it get the job done, it works for many other similar problems, in essentially the same way. I’m not much of an analysis person, but my understanding is that this kind of trick is common in analytic combinatorics, and I think that’s pretty cool!

Cauchy Residue Theorem

2018-10-01T00:00:00-07:00

$\DeclareMathOperator{\res}{Res}$

The Cauchy Residue Theorem is a remarkable tool for evaluating contour integrals. Essentially, it says that, instead of computing an integral along a curve $\gamma$, you can replace it with a sum of “residues” at some special points $a_k$:

$$ \oint_\gamma f(z)~dz = 2 \pi i \sum_k \res(f, a_k) $$

But what is a residue? What are the $a_k$? What’s really going on here?

Residues

Since this is a post on some blog and not a rigorous complex analysis text, we’ll gloss over some of the technicalities, such as verifying convergence, or checking that holomorphic functions are analytic. All we need is some imagination, and the following fact:

Path Independence

Let $D$ be a region of the complex plane and $f$ be a function holomorphic (complex-differentiable) on $D$. If you take a curve $\gamma$, and continuously deform it into a curve $\gamma'$, staying inside $D$, then

$$ \int_\gamma f(z)~dz = \int_{\gamma'} f(z)~dz $$

Also, we say two such curves are “homotopic”.

For example, if the blue dashed area is $D$, the curves in the first picture are homotopic, but not the curves in the second picture. There is no way to deform one of the curves into the other, without leaving the domain.

If you’re comfortable with multivariable calculus, compare this to the Fundamental Theorem of Calculus for line integrals. How does complex-differentiability encode the “curl-free” condition?

This means that if $\gamma$ is a closed loop and $f$ is holomorphic on the region enclosed by $\gamma$, then $\gamma$ is homotopic to a point, which tells us that $\int_\gamma f~dz$ must be zero. Where things get interesting is when there are points in $D$ at which $f$ is not holomorphic.

So let’s approach the theorem.

Let $f$ be a function holomorphic on $D$, except at a set of points $a_k$, and $\gamma$ a closed curve in $D$, avoiding the points $a_k$. Without loss of generality, we can assume all of the $a_k$ lie within the region enclosed by $\gamma$ (if not, we just make $D$ smaller). We can use the path-independence of contour integrals to deform $\gamma$, without changing the value of the integral:

These corridors between the circles can be moved so they lie on top of each other, and cancel out. This leaves us with circles $C_k$, one for each point $a_k$.

$$ \oint_\gamma f(z)~dz = \sum_k \oint_{C_k} f(z)~dz $$

So all we need to do now is determine what the integral of $f$ on each circle is.

Residue Definition #1

The residue of $f$ at $a$ is $\displaystyle \frac{1}{2 \pi i} \oint_{C} f(z)~dz$, where $C$ is a small circle around $a$.

From path-independence, we know we can shrink the circles as much as we like without changing the value of the integral, which tells us this definition is well-defined (just make sure $f$ is holomorphic everywhere else in your circle!).

“But wait,” you complain, “This definition is ridiculous; you set it up in such a way that the residue theorem is trivial! What gives?”

Well, there are other, equivalent definitions of residue that are much easier to compute, and those are what give the residue theorem its power. Sometimes people will use these computational definitions of residue as the primary definition, but this obscures what’s going on. When you think of what the residue means, in a spiritual sense, you should think of it as “the integral of a small loop around a point”.

A point at which $f$ is not holomorphic is called a “singularity”, and there are a few types. The most manageable of these is the pole, where $f(z)$ “behaves like” $\frac{1}{(z-a)^n}$. To be more concrete, $f$ has a pole (of order $n$) at $a$ if $(z - a)^n f(z)$ is holomorphic and non-zero at $a$. In other words, a zero of order $n$ cancels out a pole of order $n$.

For example, $\frac{1}{\sin z}$ has a pole of order $1$ at $z = 0$, as evidenced by the fact that $\frac{z}{\sin z}$ approaches $1$ as $z \to 0$. The rational function $\frac{z-2}{z^2 + 1}$ has poles at $\pm i$, also of order $1$. And the function $\frac{1}{\cos z - 1}$ has a pole of order $2$ at zero.

There are other kinds of singularities, but nothing good comes from them, so we will henceforth only consider singularities that are poles.

If $f$ has a pole of order $n$ at $a$, then $(z-a)^n f(z)$ has a Taylor series centered at $z = a$, with non-zero constant term:

$$ (z-a)^n f(z) = b_0 + b_1 (z - a) + b_2 (z - a)^2 + b_3 (z - a)^3 + \cdots $$

Letting $c_k = b_{k+n}$, we can define a series for $f(z)$ itself, called the Laurent series:

$$ f(z) = \frac{c_{-n}}{(z-a)^n} + \frac{c_{-n+1}}{(z - a)^{n-1}} + \cdots + \frac{c_{-1}}{z - a} + c_0 + c_1 (z - a) + \cdots $$

It’s almost a Taylor series, but we allow (finitely many) terms with negative exponents as well. This expansion will allow us to compute the residue at $a$.

Let’s just take a single term, $(z - a)^n$, and we’ll recombine our results at the end, because integrals are linear. What happens when we integrate around a circle centered at $a$ with radius $R$? Substitute $z = a + R e^{it}$ for the contour:

$$ \oint (z - a)^n~dz = \int_0^{2\pi} (R e^{it})^n~d(R e^{it}) = i R^{n+1} \int_0^{2\pi} e^{(n+1) it}~dt = i R^{n+1} \left[ \frac{e^{(n+1)it}}{(n+1)i} \right]^{2\pi}_0 $$

Since $n$ is an integer, $e^{(n+1)2 \pi i} = 1$, and $e^{0} = 1$, so this integral should be zero. But that doesn’t make any sense; that would suggest that the integral of any function around a circle is zero. But that’s not true.

We actually made a mistake in the last step; the antiderivative of $e^{kt}$ is $e^{kt} / k$ unless $k = 0$. For that to happen, we need $n = -1$, and in that case:

$$ \oint \frac{1}{z - a}~dz = \int_0^{2\pi} \frac{d(R e^{it})}{R e^{it}} = \int_0^{2\pi} i~dt = 2 \pi i $$

Therefore, when we integrate $f(z) = \sum_{k = -n}^\infty c_k (z - a)^k$, all the terms vanish, except for the $k = -1$ term, which pops out a $2 \pi i \cdot c_{-1}$. This gives us another definition for the residue!
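The computation above is easy to sanity-check numerically. The sketch below is not part of the original derivation: it approximates the contour integral by a Riemann sum over the parametrization $z = a + R e^{it}$, and the helper name `circle_integral`, the center, and the radius are arbitrary choices made for illustration.

```python
import cmath
import math


def circle_integral(f, center, radius, steps=2000):
    """Approximate the integral of f over a counter-clockwise circle."""
    total = 0j
    dt = 2 * math.pi / steps
    for j in range(steps):
        t = j * dt
        z = center + radius * cmath.exp(1j * t)
        dz = 1j * radius * cmath.exp(1j * t) * dt  # d(R e^{it}) = i R e^{it} dt
        total += f(z) * dz
    return total


a = 0.3 + 0.2j  # arbitrary center, chosen only for the demo
# Integrate (z - a)^n for a few integer exponents n.
results = {
    n: circle_integral(lambda z, n=n: (z - a) ** n, a, 0.5)
    for n in range(-3, 3)
}
```

Only the $n = -1$ entry of `results` should come out near $2 \pi i$; the others should vanish up to rounding error.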

Residue Definition #2

If $f$ has a pole at $a$, and a Laurent series $f(z) = \sum c_k (z - a)^k$, then the residue of $f$ at $a$ is $c_{-1}$.

If this were all we knew, it would still be a pretty good theorem. Finding power series instead of taking integrals? Not too shabby. But we can take it one step more.

Finding power series can be frustrating; how many people know the power series for $\tan z$ off the top of their head? Besides, we don’t need the whole thing, just a specific coefficient.

Instead, we’ll assume the existence of a power series, and use some tricks to extract $c_{-1}$.

Say we’ve got a simple pole (a pole of order $1$). By multiplying by $(z - a)$, we can get a Taylor series:

$$ (z - a) f(z) = c_{-1} + c_0 (z - a) + c_1 (z - a)^2 + \cdots $$

If we plug in $z = a$, then we’ll get $c_{-1}$. Well, technically, we can’t plug in $z = a$ directly, because $f(z)$ isn’t defined at $a$. But if we take a limit, that’s okay.

How about a pole of order $2$? Our trick won’t work the same way; if we apply it naively, we’ll just get $c_{-2}$, which we don’t care about at all.

$$ (z - a)^2 f(z) = c_{-2} + c_{-1} (z - a) + c_0 (z - a)^2 + c_1 (z - a)^3 \cdots $$

But if we take the derivative, we can knock off the constant term $c_{-2}$, and then we can take the limit as $z \to a$.

$$ \frac{d}{dz} (z - a)^2 f(z) = c_{-1} + 2 c_0 (z - a) + 3 c_1 (z - a)^2 \cdots $$

For $n = 3$, there’s a slight wrinkle; we end up with an extra factor of $2$ that we have to divide out:

$$ \frac{d^2}{dz^2} (z - a)^3 f(z) = 2 c_{-1} + 6 c_0 (z - a) + 12 c_1 (z - a)^2 \cdots $$

The pattern for higher-order poles is similar:

multiply by $(z - a)^n$; this changes our term of interest to $c_{-1} (z - a)^{n-1}$
take $n-1$ derivatives; the important term is now $(n-1)! c_{-1}$
divide by $(n-1)!$; the important term is now $c_{-1}$
take the limit as $z \to a$; all higher order terms vanish, and we are left with $c_{-1}$

We now have our last, and most computationally accessible, definition of residue:

Residue Definition #3

If $f$ has a pole at $a$ of order $n$, then the residue of $f$ at $a$ is:

$$ \res(f, a) = \lim_{z \to a} \frac{1}{(n-1)!} \frac{d^{n-1}}{dz^{n-1}} (z - a)^n f(z) $$

This is the definition often presented as “the” definition of residue, but this hides where the residue theorem comes from, and why residues are defined the way they are.
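As a quick sanity check (an example of our own choosing, not one from the discussion above), take $f(z) = e^z / z^2$, which has a pole of order $2$ at $a = 0$. Definition #3 with $n = 2$ gives:

$$ \res(f, 0) = \lim_{z \to 0} \frac{1}{1!} \frac{d}{dz} \left( z^2 \cdot \frac{e^z}{z^2} \right) = \lim_{z \to 0} e^z = 1 $$

This agrees with Definition #2, since the Laurent series can be read off from the series for $e^z$:

$$ \frac{e^z}{z^2} = \frac{1}{z^2} + \frac{1}{z} + \frac{1}{2} + \frac{z}{6} + \cdots $$

and the coefficient $c_{-1}$ is indeed $1$.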

Winding Number

As a final note, we can add a tiny bit more generality to the theorem.

Technically, we’ve been a little sloppy with our curve $\gamma$. What if it goes the other way? Or loops around some points multiple times?

To fix this, we introduce $W(\gamma, a)$, the winding number of $\gamma$ around $a$. It means exactly what the name suggests: it indicates how many times (and in what direction) $\gamma$ loops around $a$. Counter-clockwise is positive, and clockwise is negative. Two examples are pictured below:

In the first picture, the specified points have winding number +1 and +2, and in the second, they have -1 and +1. The only thing this changes about our proof is that when we deform our $\gamma$ into circles, we may get multiple loops around the same point:

But by definition, the number of loops is exactly the winding number, and if the loop runs clockwise, we pick up a negative sign. So after accounting for multiplicity and direction, we get:

$$ \oint_\gamma f(z)~dz = 2 \pi i \sum_k W(\gamma, a_k) \res(f, a_k) $$
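Since $\frac{1}{z - a}$ has residue $1$ at $a$, integrating it around $\gamma$ and dividing by $2 \pi i$ should recover the winding number itself. Here is a rough numerical sketch of that idea; the function name `winding_number`, the midpoint-rule discretization, and the two sample curves are illustrative assumptions, not anything from the proof.

```python
import cmath
import math


def winding_number(curve, a, steps=4000):
    """Estimate W(curve, a) as (1 / 2 pi i) times the integral of dz / (z - a).

    `curve` maps t in [0, 1] to a point on a closed curve that avoids a.
    """
    total = 0j
    for j in range(steps):
        z0 = curve(j / steps)
        z1 = curve((j + 1) / steps)
        # Midpoint rule on each short chord of the curve.
        total += (z1 - z0) / ((z0 + z1) / 2 - a)
    return round((total / (2j * math.pi)).real)


def double_loop(t):
    # Unit circle traversed twice, counter-clockwise.
    return cmath.exp(4j * math.pi * t)


def clockwise_loop(t):
    # Unit circle traversed once, clockwise.
    return cmath.exp(-2j * math.pi * t)


inside_twice = winding_number(double_loop, 0)
inside_clockwise = winding_number(clockwise_loop, 0.5)
outside = winding_number(double_loop, 3)
```

The rounding step leans on the fact that the true answer is an integer, so this is only reliable when the curve is sampled finely enough and stays away from $a$.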

Monsky's Theorem

2018-09-24T00:00:00-07:00

$\newcommand{\RR}{\Bbb R} \newcommand{\QQ}{\Bbb Q} \newcommand{\ZZ}{\Bbb Z}$

For which $n$ can you cut a square into $n$ triangles of equal area?

This question appears quite simple; it could have been posed to the Ancient Greeks. But like many good puzzles, it is a remarkably stubborn one.

It was first solved in 1970, by Paul Monsky. Despite the completely geometric nature of the question, his proof relies primarily on number theory and combinatorics! There is a small amount of algebraic machinery involved, but his proof is quite accessible, and we will describe it below.

If you have a napkin on hand, it should be straightforward to come up with a solution for $n = 2$ and $4$. A little more thought should yield solutions for any even $n$. One such scheme is depicted below:

But when $n$ is odd, you will have considerably more trouble. Monsky’s theorem states that such a task is, in fact, impossible.

Monsky's Theorem

The unit square cannot be dissected into an odd number of triangles of equal area.

The result clearly extends to squares of any size, and in fact, arbitrary parallelograms.

There are two key ingredients here:

Sperner’s Lemma
2-adic valuations

Proof sketch:

Color the vertices of the dissection using three colors
Find a triangle with exactly one vertex of each color
Show that such a triangle cannot have area $1/n$

If the last step seems ridiculous to you, don’t worry. It’s completely non-obvious that the coloring of a triangle’s vertices could at all be related to its area. But once you see the trick, it will (hopefully) seem less mysterious. Just hang in there.

Sperner’s Lemma

Consider a polygon $P$ in the plane, and some dissection of it into triangles $T_i$. As promised in the previous section, color the vertices with three colors; we’ll use red, green, and blue. We will call a segment purple if it has one red and one blue endpoint. A triangle with exactly one corner of each color will be called trichromatic. (Great terminology, eh?)

A Sperner coloring is a coloring of the vertices of $T_i$, using three colors, with the following properties:

no face of $P$, nor any face of one of the $T_i$, contains vertices of all three colors
there are an odd number of purple segments on the boundary of $P$

For example, the following are Sperner colorings:

But these are not – the first has lines of more than two colors, and the second has an even number of purple boundary segments:

In this format, Sperner’s lemma can be stated as:

Sperner's Lemma

Given a Sperner coloring of $(P, T_i)$, there is at least one trichromatic triangle.

Check the examples above: both Sperner colorings have trichromatic triangles. The first non-Sperner coloring has one, but the other does not.

Proof: First, we establish a lemma: a triangle $T$ is trichromatic iff its faces have an odd number of purple segments.

This is easy to see if there are no vertices lying on the faces of $T$: a trichromatic triangle has exactly one purple segment, and otherwise, it has zero or two.

We can reduce to this case by deleting vertices that lie on the faces of $T$. We claim that this won’t change whether the number of purple segments is even or odd. And of course, since we aren’t touching the corners, it can’t change whether or not the triangle is trichromatic. Consider some vertex on a face of $T$. If that face contains green at all, then by the first property of Sperner colorings, it can’t ever have purple segments, as it must omit either red or blue vertices. Monochromatic faces also present no concern, because they also cannot have purple segments. The remaining cases are shown below:

Cool. How does this help us?

Let’s do some counting mod $2$. Let $f(T)$ be the number of purple segments in a triangle $T$. What is the sum of all $f(T)$, mod $2$?

On one hand, it’s simply the number of trichromatic triangles; $f(T) \not\equiv 0 \pmod 2$ exactly when $T$ is trichromatic. But also, it’s the number of purple segments on the boundary. Each purple segment in the interior of $P$ gets counted twice, and so contributes nothing, but boundary segments contribute exactly once.

Since there are an odd number of purple segments on the boundary of $P$, there are an odd number of trichromatic triangles. In particular, there’s at least one of them.

(This illustrates a common trick among combinatorialists: if you want to show that an object $X$ exists, show that the number of $X$s is odd. Cheeky!)
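To see the parity argument in action, here is a small sketch on a made-up dissection: a square cut into four triangles by joining its corners to one interior point. The vertex labels, the colors, and the helper names are all hypothetical choices for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical example: corners 0..3 of a square plus an interior vertex 4.
colors = {0: "R", 1: "G", 2: "G", 3: "B", 4: "G"}
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]


def edges(tri):
    """The three sides of a triangle, as sorted vertex pairs."""
    return [tuple(sorted(e)) for e in combinations(tri, 2)]


def is_purple(u, v):
    return {colors[u], colors[v]} == {"R", "B"}


def is_trichromatic(tri):
    return {colors[v] for v in tri} == {"R", "G", "B"}


# A side that belongs to exactly one triangle lies on the boundary.
edge_count = Counter(e for t in triangles for e in edges(t))
boundary_purple = sum(
    1 for e, count in edge_count.items() if count == 1 and is_purple(*e)
)

# f(T): number of purple sides of each triangle.
purple_per_triangle = [sum(is_purple(*e) for e in edges(t)) for t in triangles]
trichromatic = [t for t in triangles if is_trichromatic(t)]
```

In this example the sum of `purple_per_triangle`, the number of trichromatic triangles, and `boundary_purple` all have the same parity, exactly as the double-counting argument predicts.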

$2$-adic valuations

Before we describe our coloring, we’ll take an unexpected detour into the land of valuations.

A valuation is a function that assigns a notion of “value” or “size” to numbers. There are multiple conventions, but the one we’ll use is that a valuation on a ring $R$ is a function $\nu$ from $R$ to $\RR \cup \{ \infty \}$ such that:

$\nu(x) = \infty$ if and only if $x = 0$
$\nu(xy) = \nu(x) + \nu(y)$
$\nu(x + y) \ge \min(\nu(x), \nu(y))$

We’ll assign the obvious rules to $\infty$, such as, $a + \infty = \infty$, and $\min(a, \infty) = a$.

One example of a valuation, that might help guide your intuition, is the “multiplicity of a root”. For some polynomial $p(x) = a_0 + a_1 x + \cdots + a_n x^n$, let $\nu(p)$ be the index of the first non-zero coefficient. For example, $\nu(3x^4 - x^5 + 7x^8) = 4$, and $\nu(1 + x - x^2) = 0$. If all coefficients are zero, define $\nu(p) = \infty$. In essence, $\nu(p)$ is “how many” roots $p$ has at $0$; e.g., is $0$ a single root? A double root? Not a root at all?

Is this a valuation?

Well we satisfied the first property by fiat. The second one is pretty easy to see; when you multiply two polynomials, the lowest term has the sum of the degrees. And the third one ain’t too bad either. If both $p$ and $q$ have zero coefficients on $x^k$, $p+q$ certainly will too. The converse isn’t true though, it’s possible that the low-degree terms in $p$ and $q$ could cancel, and so $\nu(p+q)$ could be larger than either $\nu(p)$ or $\nu(q)$. This is why we have an inequality, instead of an equality.

The particular valuation we’re interested in is the $2$-adic valuation, which measures how divisible by two a number is. The more factors of $2$ a number has, the bigger its valuation is.

For example, $\nu_2(2) = \nu_2(6) = \nu_2(-22) = 1$, since they all have a single factor of $2$. Odd integers have $\nu_2$ of $0$, since they have no factors of $2$ at all. And because $0$ can be factored as $2^k \cdot 0$ for any $k$, no matter how big, it makes sense to say $\nu_2(0) = \infty$.

To extend this to rational numbers, we consider $2$s in the denominator to count as negative. Consider the following examples until they make sense:

$$ \nu_2(1/4) = -2 \qquad \nu_2(1/3) = 0 \qquad \nu_2(2/3) = 1 \qquad \nu_2(3/8) = -3 \qquad \nu_2(12/5) = 2 $$
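If you'd rather let a computer do the considering, the examples above can be checked with a few lines of Python. This is a minimal sketch for rational inputs only; the function name `nu2` and the use of `math.inf` to stand in for $\infty$ are our own choices.

```python
import math
from fractions import Fraction


def nu2(q):
    """2-adic valuation of a rational number; math.inf for zero."""
    q = Fraction(q)
    if q == 0:
        return math.inf
    valuation = 0
    num, den = q.numerator, q.denominator
    while num % 2 == 0:  # each factor of 2 upstairs counts +1
        num //= 2
        valuation += 1
    while den % 2 == 0:  # each factor of 2 downstairs counts -1
        den //= 2
        valuation -= 1
    return valuation


examples = {
    q: nu2(q)
    for q in (Fraction(1, 4), Fraction(1, 3), Fraction(2, 3),
              Fraction(3, 8), Fraction(12, 5))
}
```

Because `Fraction` always stores numbers in lowest terms, at most one of the two loops does any work for a given input.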

We claim this is also a valuation.

Again, we get the first property simply because we defined it to be so. The second one is also easy to verify, but the third one needs some work.

Let $x$ and $y$ be rational numbers. By pulling out all the factors of $2$ from numerator and denominator, they can be written as $x = 2^n \frac{a}{b}$ and $y = 2^m \frac{c}{d}$, where $a$, $b$, $c$, and $d$ are odd. (Note that any of these, including $n$ and $m$, may be negative.) Without loss of generality, let $n \ge m$. We’d like to show that $\nu_2(x + y)$ is at least $\min(\nu_2(x), \nu_2(y)) = m$.

$$ x + y = 2^n \frac{a}{b} + 2^m \frac{c}{d} = 2^m \left( \frac{2^{n-m} a}{b} + \frac{c}{d} \right) = 2^m \frac{2^{n-m} ad + bc}{bd} $$

Since $2^{n-m} ad + bc$ is an integer, and $bd$ is odd, $x + y$ has at least $m$ factors of $2$, and so $\nu_2(x + y) \ge m$, as desired. Notably, if $n$ is strictly larger than $m$, i.e., $\nu_2(x) > \nu_2(y)$, then $2^{n-m} ad + bc$ is odd, and we can guarantee that $\nu_2(x+y)$ is exactly $\nu_2(y)$. This is actually a property true of all valuations, so we’ll state it again:

$\nu(x + y) \ge \min(\nu(x), \nu(y))$, and if $\nu(x) \ne \nu(y)$ this is an equality
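Two small examples (ours, chosen for illustration) show both halves of this statement. When the valuations differ, we get equality:

$$ \nu_2\left( \tfrac{1}{4} + \tfrac{2}{3} \right) = \nu_2\left( \tfrac{11}{12} \right) = -2 = \min(-2, 1) $$

But when the valuations agree, cancellation can push the valuation of the sum strictly higher:

$$ \nu_2\left( \tfrac{1}{2} + \tfrac{3}{2} \right) = \nu_2(2) = 1 > -1 = \min\left( \nu_2\left(\tfrac{1}{2}\right), \nu_2\left(\tfrac{3}{2}\right) \right) $$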

So $\nu_2$ is an honest-to-god valuation on $\QQ$. By a theorem of Chevalley, we can extend this to a valuation on $\RR$. The details are not particularly important, and the curious reader can find them at the end of this post.

Coloring The Plane

Our coloring of the dissection will use the (extended) $2$-adic valuation. Our choice of coloring is peculiar enough that it deserves its own section though.

Given a point $(x,y)$ in the plane, we’ll color it:

red if $\nu_2(x) > 0$ and $\nu_2(y) > 0$
green if $\nu_2(x) \le 0$ and $\nu_2(x) \le \nu_2(y)$
blue if $\nu_2(y) \le 0$ and $\nu_2(y) < \nu_2(x)$
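For points with rational coordinates, the coloring rule can be transcribed directly. The sketch below assumes rational inputs (it says nothing about the extended valuation on $\RR$), and the names `nu2` and `color` are our own.

```python
import math
from fractions import Fraction


def nu2(q):
    """2-adic valuation of a rational number; math.inf for zero."""
    q = Fraction(q)
    if q == 0:
        return math.inf
    valuation = 0
    num, den = q.numerator, q.denominator
    while num % 2 == 0:
        num //= 2
        valuation += 1
    while den % 2 == 0:
        den //= 2
        valuation -= 1
    return valuation


def color(x, y):
    """Color of the rational point (x, y) under the three rules above."""
    vx, vy = nu2(x), nu2(y)
    if vx > 0 and vy > 0:
        return "red"
    if vx <= 0 and vx <= vy:
        return "green"
    # Not red and not green forces nu2(y) <= 0 and nu2(y) < nu2(x).
    return "blue"


corners = {p: color(*p) for p in [(0, 0), (1, 0), (0, 1), (1, 1)]}
```

The final `return` doubles as a check that the three rules cover every point: if a point is neither red nor green, the blue condition must hold.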

This coloring has some interesting properties, which we’ll establish quickly.

Claim

If $P$ is a red point, then $Q$ and $Q-P$ have the same color.

Proof: This is a good exercise for the reader. Make use of the fact that, if $\nu_2(a) > 0$ and $\nu_2(x) \le 0$, then $\nu_2(x - a) = \min(\nu_2(x), \nu_2(a)) = \nu_2(x)$, with equality because the two valuations differ. On the other hand, if $\nu_2(x) > 0$, then $\nu_2(x - a) > 0$ as well.

Claim

If we forget the dissection for a second, and pick any three collinear points in the plane, they cannot all be different colors.

Proof: Let $P_r$, $P_g$, and $P_b$ be three points, colored red, green, and blue, respectively. We must show they can’t be collinear; equivalently, the vectors $P_g - P_r$ and $P_b - P_r$ are not parallel. This is a question about linear independence, so we’d better take a determinant. Let $P_g - P_r = (x_g, y_g)$, and $P_b - P_r = (x_b, y_b)$.

$$ \det M = \det \begin{pmatrix} x_g & x_b \\ y_g & y_b \end{pmatrix} = x_g y_b - x_b y_g $$

To show that $\det M$ is non-zero, we can show that its $2$-adic valuation is finite (recall that $\nu_2(x) = \infty$ only when $x = 0$). This might seem harder, but since the only thing we know about these points is their valuations, it’s the only shot we have!

By the previous claim, $P_g - P_r$ is green, and $P_b - P_r$ is blue. From the coloring rules, we then know that $\nu_2(y_b) < \nu_2(x_b)$ and $\nu_2(x_g) \le \nu_2(y_g)$. So $\nu_2(x_g y_b)$ is strictly less than $\nu_2(x_b y_g)$. Since these two valuations differ, the third property holds with equality, so $\nu_2(\det M) = \nu_2(x_g y_b) = \nu_2(x_g) + \nu_2(y_b) \le 0$, because the coloring rules also give $\nu_2(x_g) \le 0$ and $\nu_2(y_b) \le 0$. Therefore, $\det M \ne 0$, and so $P_r$, $P_g$, and $P_b$ cannot be collinear.

Putting it Together

Now we’re ready. Let $n$ be odd, and consider a dissection of the unit square into $n$ triangles of equal area.

Using the coloring rule above, we claim we get a Sperner coloring. The time we invested in the previous section pays off handsomely, as both required properties become almost trivial.

No face of the square, nor of a triangle in the dissection, can contain vertices of all three colors, because no line anywhere in the plane can have vertices of all three colors!
Again, we use the fact that there are no trichromatic lines. Consider the corners of the square and their colors:

$$ (0, 0) \textrm{ is red} \qquad (1, 0) \textrm{ is green} \qquad (0, 1) \textrm{ is blue} \qquad (1, 1) \textrm{ is green} $$

The only segments that could be purple lie between $(0, 0)$ and $(0, 1)$. And because one endpoint is red, and the other blue, there must be an odd number of purple segments. (Remember our exercise about deleting vertices on faces…?)

Therefore, this coloring is a Sperner coloring, and so somewhere, there is a trichromatic triangle. To finish the proof, we must show that this triangle can’t have area $1/n$.

Let’s revisit our second claim. Strong as it is, we can squeeze just a tiny bit more out of it. Using the same notation as before, basic coordinate geometry tells us that the area of the triangle formed by $P_r$, $P_g$, and $P_b$ is $K = \frac{1}{2} |\det M|$. By showing that $\det M \ne 0$, we showed that this triangle was not degenerate, i.e., the three points were not collinear. But we actually showed a little more than that; we showed that $\nu_2(\det M) \le 0$. Since the sign does not affect the valuation, if a trichromatic triangle has area $K$, then $\nu_2(K) = \nu_2(\frac{1}{2} \det M) \le -1$.

But because $n$ is odd, $\nu_2(1/n) = 0$. Contradiction.
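For a concrete instance of the area bound (an example of our own, not part of the proof), take the trichromatic triangle with corners $P_r = (0, 0)$, $P_g = (1, 0)$, and $P_b = (0, 1)$. Then $(x_g, y_g) = (1, 0)$ and $(x_b, y_b) = (0, 1)$, so:

$$ \det M = x_g y_b - x_b y_g = 1 \cdot 1 - 0 \cdot 0 = 1 \qquad \nu_2(\det M) = 0 \qquad \nu_2(K) = \nu_2\left( \tfrac{1}{2} \right) = -1 $$

The area $\frac{1}{2}$ has a $2$ in its denominator, which is something $\frac{1}{n}$ can never have when $n$ is odd.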

Appendix

We promised a proof that a valuation on $\QQ$ can be extended to a valuation on $\RR$. More generally, for a field extension $L/K$, a valuation $\nu$ on $K$ can be extended to a valuation on $L$.

Unfortunately, I’ve got diagrams to finish making before Monday ends, so I’ll amend this later ;)

Doubling Loaves, in Two Ways

2018-09-17T00:00:00-07:00

This one comes from a puzzle that a coworker gave me.

There’s a miracle in the Gospels in which Jesus feeds a crowd of 5000, using only a few loaves of bread and some fish. As he breaks the food apart and hands it out, it does not diminish, and eventually the entire crowd is fed.

In our puzzle, we have a prophet who is not quite so saintly. He starts with a single loaf of bread, and has to feed a crowd of $N$ people. But he also wants to be able to feed himself. Furthermore, our guy’s got a bit of a gambling problem: at each step, he flips a fair, unbiased coin.

If it comes up heads, he duplicates one of his loaves.
Otherwise, he hands out a loaf of bread to someone in the crowd.

He only stops when he runs out of bread, or he creates $N$ new loaves (at which point, the entire crowd can be fed, and he can eat the original loaf).

The question is: what is the probability that he can successfully feed everyone?

For small values of $N$, we can manage this by hand:

$N = 0$: He can always feed himself, so the probability of success, $p$, is $1$.
$N = 1$: Everything depends on the first coin toss. If it is heads, then he has two loaves, and can feed himself and someone else. Otherwise, he’s just handed away his only loaf, and the game ends. So $p = 1/2$.
$N = 2$: As before, he must flip heads on the first toss. Consider the second toss. If it is heads, then he has created two loaves, plus the original, and so everyone can be fed. Otherwise, he hands out a loaf, leaving him with one loaf, and two people to feed. This reduces to the previous case, in which there is a $1/2$ chance of success. So if he makes the first toss, he has a $3/4$ chance of success, giving us $p = 3/8$ for the whole process.

Clearly, this gets tedious quickly. We need a more systematic approach.

First Approach

One approach is to rephrase this as a problem about lattice walks.

Let the point $(x, y)$ represent the state where the prophet has created $x$ new loaves (not counting the original loaf), and fed $y$ people (not counting himself). Then duplicating a loaf is a step to the right, and handing out a loaf is a step upward. On this grid, the prophet starts at $(0, 0)$, and randomly chooses to walk right or up. He wins if he touches the line $x = N$, and loses if he crosses the diagonal $x = y$. (Touching the diagonal is okay; at that point, he still has one loaf left.)

Let $p(a, b)$ be the probability that the prophet reaches the point $(a, b)$ on his random walk. It’s only possible to reach the region $0 \le b \le a$, so we will set $p(a, b) = 0$ outside this range. Since it’s our starting point, $p(0, 0)$ is clearly $1$. For all other points, we can state our probability recursively; if the prophet gets to the point $(a, b)$, then he must have come from $(a-1, b)$ or $(a, b-1)$. From either of those points, he has a $1/2$ chance of getting to $(a, b)$, so $p(a, b) = \frac{1}{2}(p(a-1, b) + p(a, b-1))$.

If you write these numbers out in a grid, you’ll quickly get tired of seeing powers of $2$ in the denominator:


$0$	$0$	$0$	$0$	$7/128$	$21/256$	$45/512$
$0$	$0$	$0$	$5/64$	$7/64$	$7/64$	$3/32$
$0$	$0$	$1/8$	$5/32$	$9/64$	$7/64$	$5/64$
$0$	$1/4$	$1/4$	$3/16$	$1/8$	$5/64$	$3/64$
$1$	$1/2$	$1/4$	$1/8$	$1/16$	$1/32$	$1/64$

So we’ll define an auxiliary function $q(a, b) = 2^{a+b} p(a, b)$, leaving us with nice clean integers. The recurrence relation for $q$ is:

$$ q(0, 0) = 1 \qquad q(a, b) = 0 \textrm{ if } b > a \qquad q(a, b) = q(a-1, b) + q(a, b-1) \textrm{ otherwise} $$


$0$	$0$	$0$	$0$	$14$	$42$	$90$
$0$	$0$	$0$	$5$	$14$	$28$	$48$
$0$	$0$	$2$	$5$	$9$	$14$	$20$
$0$	$1$	$2$	$3$	$4$	$5$	$6$
$1$	$1$	$1$	$1$	$1$	$1$	$1$

This table, and the recurrence relation, feel somewhat like Pascal’s triangle, with the apex in the lower left, and each counter-diagonal forming a row.


$1$	$5$	$15$	$35$	$70$	$126$	$210$
$1$	$4$	$10$	$20$	$35$	$56$	$84$
$1$	$3$	$6$	$10$	$15$	$21$	$28$
$1$	$2$	$3$	$4$	$5$	$6$	$7$
$1$	$1$	$1$	$1$	$1$	$1$	$1$

But since we’re forcing the region above the diagonal to be $0$, this causes a defect. Subtracting the relevant parts of our grid from Pascal’s triangle, we get:


$-$	$-$	$-$	$-$	$56$	$84$	$120$
$-$	$-$	$-$	$15$	$21$	$28$	$36$
$-$	$-$	$4$	$5$	$6$	$7$	$8$
$-$	$1$	$1$	$1$	$1$	$1$	$1$
$0$	$0$	$0$	$0$	$0$	$0$	$0$

This is just a piece of Pascal’s triangle, shifted by one! It appears that $q(a, b) = \binom{a+b}{b} - \binom{a+b}{b-1}$, a suspicion that is easy to confirm via the recurrence relations.
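If you don't feel like pushing binomial identities around, the suspicion can at least be spot-checked by machine. A small sketch follows; the function names `q`, `q_closed`, and `p` mirror the notation above, but the code itself is our own illustration.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def q(a, b):
    """q(a, b) computed straight from the recurrence."""
    if a < 0 or b < 0 or b > a:
        return 0
    if a == 0 and b == 0:
        return 1
    return q(a - 1, b) + q(a, b - 1)


def q_closed(a, b):
    """Conjectured closed form, intended for 0 <= b <= a."""
    shifted = comb(a + b, b - 1) if b >= 1 else 0  # binom(n, -1) is taken as 0
    return comb(a + b, b) - shifted


def p(a, b):
    """Probability of reaching (a, b), recovered from q."""
    return Fraction(q(a, b), 2 ** (a + b))


matches = all(
    q(a, b) == q_closed(a, b) for a in range(12) for b in range(a + 1)
)
```

Of course, agreement on a finite grid is evidence rather than proof; the recurrence argument is still what settles it.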

Now we know $q(a, b)$ in closed form, and thus $p(a, b)$ as well. But we can’t just sum over $p(N, 0)$, $p(N, 1)$, …, $p(N, N)$, because these don’t correspond to disjoint events. In fact, to reach $(N, N)$ safely, the prophet must have passed through the point $(N, N-1)$ first!

We’ll have to do something a little silly. Consider the counter-diagonal from $(N, N)$ to $(2N, 0)$, and note that a path can touch at most one of those points. Furthermore, if there is a path of length $2N$, and it touched the line $x = N$, then it must end on one of these points.

So we’ll change the rules of the game a little bit. The prophet still loses if he runs out of bread, but otherwise, he must keep flipping until the coin is flipped $2N$ times. This doesn’t affect the end conditions: after the coin has been flipped $2N$ times, he’s either run out of loaves, or he’s flipped at least $N$ heads. And clearly, this doesn’t affect his chances of success (once $N$ new loaves have been created, it is impossible to fail). But it does change where the “finish line” for our walk is. The prophet succeeds exactly when his walk ends on the counter-diagonal from $(N, N)$ to $(2N, 0)$!

This telescopes easily into a clean expression:

$$ \sum_{k = 0}^N p(2N-k, k) = \sum_{k = 0}^N \frac{1}{2^{2N}} \left( \binom{2N}{k} - \binom{2N}{k-1} \right) = \frac{1}{2^{2N}} \binom{2N}{N} $$
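As a cross-check, we can also compute the success probability directly from the original rules of the game (stop on running out of bread, or on creating $N$ new loaves) and compare it with this formula. The following is a sketch using exact rational arithmetic; the function name `success_probability` and the state encoding are our own choices.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def success_probability(n):
    """Exact chance of creating n new loaves before running out of bread."""

    @lru_cache(maxsize=None)
    def win(created, loaves):
        if created == n:   # everyone (prophet included) can be fed
            return Fraction(1)
        if loaves == 0:    # out of bread
            return Fraction(0)
        heads = win(created + 1, loaves + 1)  # duplicate a loaf
        tails = win(created, loaves - 1)      # hand a loaf out
        return Fraction(1, 2) * (heads + tails)

    return win(0, 1)


direct = [success_probability(n) for n in range(11)]
formula = [Fraction(comb(2 * n, n), 4 ** n) for n in range(11)]
```

The two lists should agree term by term, and the first few entries reproduce the hand computations for small $N$.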

Second Approach

As suggested by the title of this post, I’ll also describe a second solution to this puzzle, using generating functions. Sure, this will involve some slightly heavier machinery than the previous approach, which was rather elementary, but there is a certain elegance to it.

Let $a_n$ be the probability that the prophet ended up with exactly $n$ loaves, including the original loaf. The only way to end up with exactly one loaf is to flip tails immediately, so $a_1 = 1/2$.

For $n > 1$, he must flip heads first, giving two loaves. If he ended up with exactly $n$ loaves total, he must have gotten $k$ from the first loaf, and $n-k$ from the second loaf. Since the loaves act independently, this has probability $\sum_{k=1}^{n-1} a_k a_{n-k}$. Factoring in the fact that he needs to flip heads the first time, we deduce $a_n = \frac{1}{2} \sum_{k=1}^{n-1} a_k a_{n-k}$.

If we take the bold (and intuitive!) step of defining $a_0 = 0$, we can change the bounds on that sum to be $0$ through $n$, which will make our lives easier.

Let $G(x) = a_0 + a_1 x + a_2 x^2 + \cdots$ be the generating function for $a_n$. We can tease out a very nice expression for $G(x)$:

$$ \begin{align*} G(x) &= a_0 + a_1 x + \sum_{n=2}^\infty a_n x^n \\ &= \frac{1}{2} x + \sum_{n=2}^\infty a_n x^n \\ &= \frac{1}{2} x + \sum_{n=2}^\infty \left( \frac{1}{2} \sum_{k=0}^n a_k a_{n-k} \right) x^n \\ G(x) &= \frac{1}{2} x + \frac{1}{2} \sum_{n=2}^\infty \sum_{k=0}^n a_k a_{n-k} x^n \\ 2 G(x) &= x + \sum_{n=2}^\infty \sum_{k=0}^n a_k a_{n-k} x^n \end{align*} $$

Since either $a_k$ or $a_{n-k}$ will be $0$ for $n < 2$, we can lower the bound on our outer sum to $n = 0$ without changing anything. After that, set $\ell = n - k$:

$$ \begin{align*} 2 G(x) &= x + \sum_{n=0}^\infty \sum_{k=0}^n a_k a_{n-k} x^n \\ &= x + \sum_{k=0}^\infty \sum_{\ell=0}^\infty a_k a_\ell x^{k+\ell} \\ &= x + \left( \sum_{k=0}^\infty a_k x^k \right) \left( \sum_{\ell=0}^\infty a_\ell x^\ell \right) \\ 2G(x) &= x + G(x)^2 \end{align*} $$

At first blush it looks hard to isolate $G(x)$, but once we see this as the quadratic it is, we can apply the handy-dandy quadratic formula:

$$ G(x) = \frac{2 \pm \sqrt{4 - 4x}}{2} = 1 \pm \sqrt{1 - x} $$

Since $G(0) = a_0 = 0$, we know we should take the negative square root.

We could at this point find a closed-form expression for $a_n$, but that’s not what we’re going to do. Remember that we’re not interested in the probability of getting exactly $N+1$ loaves, but the probability of getting $N+1$ or more loaves. In other words, we’d like to know $b_{N+1}$, where $b_n = 1 - \sum_{k=0}^{n-1} a_k$. [Note: we’re not certain that this is the same as $\sum_{k=n}^\infty a_k$; since we haven’t ruled out the possibility that this process goes on forever with positive probability. It’s possible that the $a_k$ sum to $<1$.]

Let $F(x) = b_0 + b_1 x + b_2 x^2 + \cdots$ be the generating function for the $b_n$. We’ll set $b_0 = 1$, since $1$ minus the empty sum should be $1$. If you’re familiar with generating functions, you’ll know that $F(x) = \frac{1}{1 - x} - \frac{x}{1 - x} G(x)$, but for the newcomers, we’ll do it in slow motion:

To sum the terms of the series, we’ll multiply by the geometric series $\frac{1}{1-x} = 1 + x + x^2 + \cdots$. The coefficient for the $x^n$ term will then be $a_0 + \cdots + a_n$.

$$ \frac{G(x)}{1 - x} = \sum_{n=0}^\infty \sum_{k=0}^n a_k x^n $$

Multiplying by $x$ knocks our exponents up by one, equivalently, moves our coefficients down by one.

$$ \frac{x}{1- x} G(x) = \sum_{n=0}^\infty \sum_{k=0}^{n} a_k x^{n+1} = \sum_{n=1}^\infty \sum_{k=0}^{n-1} a_k x^n $$

Lastly, we want to subtract every coefficient (except the first) from $1$. Fortunately, we already know what $1 + x + x^2 + \cdots$ is:

$$ \frac{1}{1 - x} - \frac{x}{1 - x}G(x) = 1 + \sum_{n=1}^\infty \left( 1 - \sum_{k=0}^{n-1} a_k \right) x^n $$

The coefficients on the right are exactly $b_n$, so we get $F(x) = \frac{1}{1 - x} - \frac{x}{1 - x} G(x)$, as promised. This cleans up to:

$$ F(x) = 1 + \frac{x}{\sqrt{1 - x}} $$

Using the generalized binomial theorem, we can arrive at a closed form for $b_n$.

$$ \begin{align*} F(x) &= 1 + x (1 - x)^{-1/2} \\ &= 1 + x \sum_{n=0}^\infty \frac{(-1/2)(-3/2)\cdots(-1/2 - (n-1))}{n!} 1^{-1/2 - n} (-x)^n \\ &= 1 + x \sum_{n=0}^\infty \frac{(1/2)(3/2)\cdots((2n-1)/2)}{n!} x^n \\ &= 1 + x \sum_{n=0}^\infty \frac{(1/2) \cdot 1 \cdot (3/2) \cdot 2 \cdots ((2n-1)/2) \cdot n}{n! \cdot n!} x^n \\ &= 1 + x \sum_{n=0}^\infty \frac{1}{2^{2n}} \frac{1 \cdot 2 \cdot 3 \cdot 4 \cdots (2n-1) \cdot 2n}{n! \cdot n!} x^n \\ &= 1 + x \sum_{n=0}^\infty \frac{1}{2^{2n}} \binom{2n}{n} x^n \\ &= 1 + \sum_{n=0}^\infty \frac{1}{2^{2n}} \binom{2n}{n} x^{n+1} \end{align*} $$

So, the probability of getting $N+1$ or more loaves is $b_{N+1} = \frac{1}{2^{2N}} \binom{2N}{N}$, which matches the answer we got before. Thank goodness!
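For the skeptical, the whole second approach can be replayed numerically without any generating functions: build the $a_n$ from their recurrence, form the $b_n$, and compare. This is a sketch with exact fractions; the helper name `loaf_distribution` and the cutoff of $12$ are arbitrary.

```python
from fractions import Fraction
from math import comb


def loaf_distribution(max_n):
    """a_0..a_max_n from a_1 = 1/2 and a_n = (1/2) * sum_k a_k * a_{n-k}."""
    a = [Fraction(0), Fraction(1, 2)]
    for n in range(2, max_n + 1):
        convolution = sum(a[k] * a[n - k] for k in range(1, n))
        a.append(Fraction(1, 2) * convolution)
    return a


a = loaf_distribution(12)
# b_n = 1 - (a_0 + ... + a_{n-1}); the empty sum gives b_0 = 1.
b = [1 - sum(a[:n]) for n in range(len(a) + 1)]
agrees = all(b[N + 1] == Fraction(comb(2 * N, N), 4 ** N) for N in range(12))
```

This only checks finitely many coefficients, but it is a reassuring way to catch sign or indexing slips in the algebra.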

The Multiplicative Structure of $ \Bbb Z / n \Bbb Z $

2018-09-10T00:00:00-07:00

$\newcommand{\ZZ}{\Bbb Z} \newcommand{\ZZn}[1]{\ZZ / {#1} \ZZ}$

One of the most familiar rings is the ring of integers modulo $n$, often denoted $\ZZn{n}$. Like all rings, it has an additive structure and a multiplicative one. The additive structure is straightforward: $\ZZn{n}$ is cyclic, generated by $1$. In fact, every integer $a$ coprime to $n$ is a generator for this group, giving a total of $\phi(n)$ generators. The multiplicative structure, on the other hand, is far less apparent.

Not all elements of $\ZZn{n}$ can participate in the multiplicative group, because not all of them have inverses. For example, 4 has no inverse in $\ZZn{6}$; there’s no integer $a$ such that $4a \equiv 1 \pmod 6$. Elements that do have inverses are called units, and we’ll denote the group of units in $\ZZn{n}$ as $U_n$.

Since an element $a \in \ZZn{n}$ is a unit iff $a$ and $n$ are coprime, there are $\phi(n)$ units, where $\phi$ is the totient function. But the size of the group alone doesn’t nail down the group structure.

For example:

$U_5 = \{ 1, 2, 3, 4 \}$:
- generated by $2$: $2^0 = 1$, $2^1 = 2$, $2^2 = 4$, $2^3 = 8 \equiv 3$
- also generated by $3$: $3^0 = 1$, $3^1 = 3$, $3^2 = 4$, $3^3 = 2$
- this group is isomorphic to $\ZZn{4}$
$U_8 = \{ 1, 3, 5, 7 \}$
- every element squares to $1$
- this group is isomorphic to $\ZZn{2} \times \ZZn{2}$
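These two examples are small enough to check by brute force. Here is a sketch; the helper names `units`, `order`, and `is_cyclic` are our own, and the approach simply looks for an element whose order equals the size of the group.

```python
from math import gcd


def units(n):
    """Elements of Z/nZ that have a multiplicative inverse."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def order(a, n):
    """Multiplicative order of a unit a modulo n."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k


def is_cyclic(n):
    """U_n is cyclic exactly when some unit has order |U_n|."""
    group = units(n)
    return any(order(a, n) == len(group) for a in group)


orders_mod_5 = {a: order(a, 5) for a in units(5)}
orders_mod_8 = {a: order(a, 8) for a in units(8)}
```

In `orders_mod_5`, both $2$ and $3$ show up with order $4$, while no entry of `orders_mod_8` exceeds $2$.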

Is there a way to find the structure of $U_n$?

A versatile theorem from ring theory is the Chinese Remainder Theorem, which (as a special case) says that, for $m$, $n$ coprime, the rings $\ZZn{m} \times \ZZn{n}$ and $\ZZn{mn}$ are isomorphic. This induces an isomorphism on the units as well (can you see why?).
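To get a feel for the induced map on units, here is a small sketch that reduces each unit of $\ZZn{mn}$ modulo $m$ and modulo $n$, and checks that the result is a bijection onto pairs of units. The function names are hypothetical, and the code assumes $m$ and $n$ are coprime.

```python
from math import gcd


def units(n):
    """Set of units of Z/nZ."""
    return {a for a in range(n) if gcd(a, n) == 1}


def crt_unit_map(m, n):
    """Send each unit a of Z/(mn)Z to the pair (a mod m, a mod n)."""
    return {a: (a % m, a % n) for a in units(m * n)}


mapping = crt_unit_map(3, 5)
image = set(mapping.values())
expected = {(u, v) for u in units(3) for v in units(5)}

# The map respects multiplication, checked here on every pair of units.
multiplicative = all(
    mapping[(a * b) % 15] == ((mapping[a][0] * mapping[b][0]) % 3,
                              (mapping[a][1] * mapping[b][1]) % 5)
    for a in mapping for b in mapping
)
```

With $m = 3$ and $n = 5$, the eight units of $\ZZn{15}$ land on the $2 \times 4$ pairs of units, each exactly once.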

This means that in order to understand the structure of $U_n$, we only need to understand $U_{p^k}$ for all primes $p$ and positive integers $k$.

We claim that $U_{p^k}$ is always cyclic for odd $p$, but for $p = 2$, it’s only cyclic when $k = 1, 2$.

Let $p$ be an odd prime.

Of course, we start with the simplest case, $U_p$. Because $\ZZn{p}$ is a field, its multiplicative group is cyclic (see here for a slick proof).

It is tempting to use this as the base case for an induction on $k$, but for technical reasons, we need to start our induction at $k = 2$.

Technical Reasons: Assuming that our claim is true and that $U_{p^k}$ is indeed cyclic, let’s consider the number of generators as $k$ increases. Since the number of generators in a cyclic group of size $m$ is $\phi(m)$, we have:

Group	# of elements	# of generators
$U_p$	$p-1$	$\phi(p-1)$
$U_{p^2}$	$p(p-1)$	$(p-1) \phi(p-1)$
$U_{p^3}$	$p^2(p-1)$	$p (p-1) \phi(p-1)$
$U_{p^4}$	$p^3(p-1)$	$p^2 (p-1) \phi(p-1)$
$U_{p^k}$	$p^{k-1} (p-1)$	$ p^{k-2}(p-1) \phi(p-1)$

From the evidence above, we suspect that if $g$ is a generator mod $p$, only $p-1$ of the $p$ possible lifts to $U_{p^2}$ will be generators. This suggests there is one “bad” lift for each generator.

Fortunately, we can find this bad lift explicitly: we claim it’s $g^p$.

We know that $g^p \equiv g \pmod{p}$, so $g^p$ really is a lift of $g$. And since $g^{p(p-1)} \equiv 1 \pmod{p^2}$, we know that the order of $g^p$ is at most $p-1$ – too small to generate all of $U_{p^2}$.

If our hunch is true, then this is the only bad lift of $g$, and so, guided by our suspicions, we make the following claim:

Claim: if $a$ is not a multiple of $p$, then $g^p + ap$ is a generator of $U_{p^2}$.

Proof: Since $g^p + ap \equiv g \pmod{p}$, it has order $p-1$ in $U_p$. This means its order in $U_{p^2}$ must be a multiple of $p-1$. Its order must also divide the size of the group, narrowing the possibilities to $p-1$ and $p(p-1)$. Thus, to prove that $g^p + ap$ is a generator, we just have to show it doesn’t have order $p-1$.

Assume it does, and expand $1 \equiv (g^p + ap)^{p-1}$ by the binomial theorem:

$$ 1 \equiv (g^p + ap)^{p-1} \equiv \sum_{i = 0}^{p - 1} \binom{p - 1}{i} (g^p)^{p-1-i} (ap)^i \pmod{p^2} $$

The terms for $i \ge 2$ have two or more factors of $p$ in them, so they get killed, leaving us with

$$ 1 \equiv g^{p(p-1)} + (p-1) g^{p(p-2)} ap \pmod{p^2} $$

Recalling that $g^{p(p-1)} \equiv 1$, we get:

$$ 0 \equiv (p-1) g^{p(p-2)} ap \pmod{p^2} $$

For this to be true, we would need to find two factors of $p$ in $(p-1) g^{p(p-2)} ap$. There’s clearly one factor, from the $p$, but none of the other factors can provide the second: $p - 1$, $g$, and $a$ are all coprime to $p$. By contradiction, $g^p + ap$ must have order $p(p-1)$ and thus generate $U_{p^2}$.

Note that there are $p-1$ choices of $a$, and so we’ve confirmed our suspicion that every generator mod $p$ has $p-1$ good lifts and one bad lift mod $p^2$.
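Here is a small experiment along these lines, written as a sketch rather than a proof. The helpers `order` and `bad_lifts` are names we made up, and the primes tested are just a few small ones.

```python
def order(g, n):
    """Multiplicative order of g modulo n (assumes g is coprime to n, n > 1)."""
    x, k = g % n, 1
    while x != 1:
        x = x * g % n
        k += 1
    return k

def bad_lifts(g, p):
    """Lifts of g (mod p) to residues mod p^2 that fail to generate U_{p^2}."""
    return [h for h in range(g % p, p * p, p) if order(h, p * p) != p * (p - 1)]

for p in (3, 5, 7, 11):
    for g in range(1, p):
        if order(g, p) == p - 1:  # g is a generator mod p
            # Compare the bad lifts found by search with the single residue g^p mod p^2.
            print(p, g, bad_lifts(g, p), pow(g, p, p * p))
```

For each generator $g$, the list of bad lifts should contain exactly one residue, and it should match $g^p$ reduced mod $p^2$.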

Now we’re ready for the inductive step.

Let $k \ge 2$ and $g$ be a generator of $U_{p^k}$. We claim $g$ is also a generator for $U_{p^{k+1}}$.

Since it’s a generator, it has order $p^{k-1} (p-1)$ in $U_{p^k}$, and so its order in $U_{p^{k+1}}$ must be a multiple of that. It must also divide the size of $U_{p^{k+1}}$, which is $p^k (p-1)$. This means it is either $p^{k-1} (p - 1)$ or $p^k (p - 1)$. We just need to show it isn’t the former.

Let’s try to do a binomial expansion like before. We know that $g^{p^{k-1} (p-1)} = (g^{p^{k-2} (p-1)})^p$, and that $g^{p^{k-2} (p-1)} = a p^{k-1} + b$ for some $0 \le b < p^{k-1}$. By Euler’s theorem, $g^{p^{k-2} (p-1)} \equiv 1 \pmod{p^{k-1}}$ (consider the size of $U_{p^{k-1}}$). This means that $b = 1$. Furthermore, because $g$ is a generator in $U_{p^k}$, we know that $p \nmid a$: otherwise $g^{p^{k-2} (p-1)}$ would be $1$ mod $p^k$, and the order of $g$ would be too small. So $g^{p^{k-2} (p-1)} = 1 + a p^{k-1}$ where $p$ and $a$ are coprime.

Now we can do our binomial business:

$$ g^{p^{k-1} (p-1)} = \sum_{i = 0}^p \binom{p}{i} (a p^{k-1})^i $$

How many factors of $p$ are in each term?

- $i = 0, 1$: don’t care.
- $i \ge 2, i \ne p$: $1$ from the binomial, and at least $2(k-1)$ from the power, for a total of at least $2k-1$. Since $k \ge 2$, we have $2k-1 \ge k+1$, and these terms vanish mod $p^{k+1}$.
- $i = p$: we lose the factor from the binomial, so we have exactly $p(k-1)$ factors of $p$. Since $p$ is odd, $p \ge 3$, and for $k \ge 2$, $3k-3 \ge k+1$, and this term also vanishes.

So we are left with $g^{p^{k-1} (p-1)} \equiv 1 + a p^k \pmod{p^{k+1}}$, and since $a$ isn’t a multiple of $p$, this shows that $g$ does not have order $p^{k-1} (p - 1)$. Thus, $g$ must be a generator mod $p^{k+1}$.

By induction, this shows that $U_{p^k}$ is cyclic for all $k$.
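The inductive step is easy to test numerically for small cases. The following is a brute-force sketch with helper names of our own (`order`, `generators`, `lifts_stay_generators`); it only samples a handful of $(p, k)$ pairs and is no substitute for the argument above.

```python
def order(g, n):
    """Multiplicative order of g modulo n (assumes g is coprime to n, n > 1)."""
    x, k = g % n, 1
    while x != 1:
        x = x * g % n
        k += 1
    return k

def generators(p, k):
    """All generators of U_{p^k}, found by brute force."""
    n = p ** k
    size = p ** (k - 1) * (p - 1)
    return [g for g in range(1, n) if g % p != 0 and order(g, n) == size]

def lifts_stay_generators(p, k):
    """True if every generator mod p^k also has full order mod p^(k+1)."""
    full = p ** k * (p - 1)
    return all(order(g, p ** (k + 1)) == full for g in generators(p, k))

for p, k in [(3, 2), (3, 3), (3, 4), (5, 2), (5, 3), (7, 2)]:
    print(p, k, lifts_stay_generators(p, k))

# The step from k = 1 to k = 2 is the one that can fail:
# 7 generates U_5 but has the same small order in U_25.
print(order(7, 5), order(7, 25))
```

The last line is a reminder of why the induction starts at $k = 2$: $7 \equiv 2^5 \pmod{25}$ is the bad lift of the generator $2$ mod $5$.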

Note that the above argument almost works for $p = 2$. The base case goes through, and the inductive step fails only when we look at the last term: when $p = 2$, we can’t conclude $p(k-1) \ge k+1$. But this only fails at $k = 2$; it continues to work for $k \ge 3$. So if there were generators for $U_8$, then they would lift to generators for $U_{16}$, and those to $U_{32}$, and so on. But we just barely fail the jump from $k = 2$ to $k = 3$, and this is why $p = 2$ is different from its odd peers.

Still though, we can modify our argument slightly to derive the structure of $U_{2^k}$ for $k \ge 3$. Since $U_8$ is non-cyclic, there is no chance for any higher $U_{2^k}$ to be cyclic. But we will show they’re pretty darn close.

We will call $g$ a “near-generator” of $U_{2^k}$ if $g$ generates half the group, and multiplying by $-1$ gives the other half. Our base case is $U_8 = \{ 1, 3, 5, 7 \}$, for which $3$ and $5$ are near-generators.

Say that $g$ is a near-generator of $U_{2^k}$. We claim that it is also a near-generator of $U_{2^{k+1}}$.

As before, we show the possible orders for $g$, and eliminate all but one possibility. Since $g$ is a near-generator mod $2^k$, it has order $2^{k-2}$ in $U_{2^k}$. Thus its order in $U_{2^{k+1}}$ must be a multiple of $2^{k-2}$. This leaves possibilities $2^{k-2}$, $2^{k-1}$, and $2^k$. It cannot be $2^k$, because that would imply that $U_{2^{k+1}}$ is cyclic, and that is impossible. So it remains to eliminate $2^{k-2}$.

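A similar argument to the odd $p$ case can be used to tell us that $g^{2^{k-3}} = 1 + a 2^{k-1}$ for some odd $a$. (The one exception is $k = 3$ with $g \equiv 3 \pmod 8$, where $g = 3 + 8t$ is not of this form; there, squaring directly gives $g^2 = 9 + 48t + 64t^2 \equiv 9 \pmod{16}$, so the conclusion below still holds with $a = 1$.) Then: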

$$ g^{2^{k-2}} = (g^{2^{k-3}})^2 = (1 + a 2^{k-1})^2 = 1 + 2 \cdot a 2^{k-1} + a^2 2^{2k-2} $$

Taken mod $2^{k+1}$, this tells us that $g^{2^{k-2}} \equiv 1 + a 2^k \pmod{2^{k+1}}$, eliminating the possibility of $2^{k-2}$ as the order. Thus, $g$ must have order $2^{k-1}$ in $U_{2^{k+1}}$.

To show that $-1$ gives the rest of the group, it suffices to show that $-1$ is not in the half generated by $g$. But if $g^r \equiv -1 \pmod{2^{k+1}}$, then surely this would also be true mod $2^k$, which is ruled out because $g$ is a near-generator of $U_{2^k}$. So this situation does not arise.

Therefore, for $k \ge 3$, $U_{2^k} = \{ \pm g^r \mid r = 0, 1, \ldots, 2^{k-2} - 1 \} \cong \ZZn{2} \times \ZZn{2^{k-2}}$. The cases of $U_2$ and $U_4$ are easily computed to be the trivial group and $\ZZn{2}$, respectively.
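As a quick check of this description, here is a sketch that builds the set $\{ \pm g^r \}$ for the two near-generators of $U_8$ and compares it with all odd residues. The function name `near_generated` and the range of $k$ tested are our own choices.

```python
def near_generated(g, k):
    """The set { ±g^r mod 2^k : 0 <= r < 2^(k-2) }."""
    n = 2 ** k
    powers = {pow(g, r, n) for r in range(2 ** (k - 2))}
    return powers | {(-x) % n for x in powers}

for k in range(3, 11):
    odd_residues = set(range(1, 2 ** k, 2))  # U_{2^k} is exactly the odd residues
    print(k, near_generated(3, k) == odd_residues, near_generated(5, k) == odd_residues)
```

Each row should report that both $3$ and $5$ sweep out the whole group once we allow the sign.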

Now we are finally ready to understand $U_n$ in general: factor $n$ into primes, and apply the results we learned above.

Specifically, we can answer the question of exactly when $U_n$ is cyclic.

If $n$ has two distinct odd prime factors $p$ and $q$, then $n = p^k q^\ell n'$ with $n'$ coprime to $p$ and $q$. So $U_n \cong U_{p^k} \times U_{q^\ell} \times U_{n'}$. The first two factors in this product have even size, i.e., their sizes have a common factor. This makes it impossible for $U_{p^k} \times U_{q^\ell}$ to be cyclic (a product of two cyclic groups is cyclic only when their sizes are coprime), and therefore, $U_n$ can’t be cyclic either.

If $8$ divides $n$, then $n = 2^k m$ for some odd $m$ and some $k \ge 3$, and $U_n \cong U_{2^k} \times U_m$. But $U_{2^k}$ is not cyclic, and this also disqualifies $n$.

So we are left with $n = 1$, $2$, $4$, $p^k$, $2p^k$, and $4p^k$. The first three can be checked by hand; they’re all cyclic. We showed earlier that $U_{p^k}$ is cyclic, and the Chinese Remainder Theorem tells us that $U_{2p^k} \cong U_2 \times U_{p^k}$ is too (note that $U_2$ is the trivial group). But $U_{4p^k} \cong U_4 \times U_{p^k}$, and both groups have even size, and so $U_{4p^k}$ is not cyclic.

To summarize:

- $U_n$ is cyclic exactly when $n = 1$, $2$, $4$, $p^k$ or $2p^k$ for an odd prime $p$
- $U_{p^k} \cong \ZZn{p^{k-1} (p-1)}$ for odd $p$
- $U_{2^k} \cong \ZZn{2} \times \ZZn{2^{k-2}}$ for $k \ge 3$
- for odd $p$, lifting a generator always produces another generator, except potentially from $p$ to $p^2$ (but the “bad lift” is known explicitly)
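The first point of the summary can be compared against a direct search. This is a sketch under our own naming (`order`, `is_cyclic`, `predicted_cyclic`), and the cutoff of $150$ is arbitrary.

```python
from math import gcd

def order(g, n):
    """Multiplicative order of g modulo n (assumes gcd(g, n) == 1)."""
    if n == 1:
        return 1  # U_1 is trivial
    x, k = g % n, 1
    while x != 1:
        x = x * g % n
        k += 1
    return k

def is_cyclic(n):
    """Brute force: does some unit mod n have order equal to the size of U_n?"""
    us = [x for x in range(1, n + 1) if gcd(x, n) == 1]
    return any(order(g, n) == len(us) for g in us)

def predicted_cyclic(n):
    """True when n is 1, 2, 4, p^k, or 2p^k for an odd prime p."""
    if n in (1, 2, 4):
        return True
    if n % 2 == 0:
        n //= 2          # strip at most one factor of 2
    if n % 2 == 0:
        return False     # divisible by 4 (and not equal to 4)
    # n is now odd and at least 3; its smallest odd divisor >= 3 is prime.
    p = next(d for d in range(3, n + 1, 2) if n % d == 0)
    while n % p == 0:
        n //= p
    return n == 1        # nothing left means n was a power of p

mismatches = [n for n in range(1, 151) if is_cyclic(n) != predicted_cyclic(n)]
print(mismatches)
```

An empty list of mismatches means the classification agrees with the search on every $n$ we tried.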



