Fermat’s theorem on sums of two squares says that an odd prime $p$ can be written as $p = a^2 + b^2$ with integers $a, b$ if and only if $p \equiv 1 \pmod 4$. The proofs of Heath-Brown and Zagier are striking because they prove this theorem by counting fixed points of involutions. The guiding principle is simple: when an involution acts on a finite set, the non-fixed points pair off, so the size of the set has the same parity as the number of fixed points. Heath-Brown found his proof while studying Liouville’s papers on identities for parity functions, and he applied this idea to the set of triples $(x, y, z)$ satisfying $4xy + z^2 = p$. Zagier later compressed the same argument into an extremely short proof.
Before giving that proof, we first prove the standard preliminary fact that $-1$ is a square modulo $p$ exactly when $p \equiv 1 \pmod 4$. We prove it in the same spirit, using a finite group action and fixed-point counting.
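Let $p$ be an odd prime and let $X = \mathbb{F}_p^\times$. Let $G$ be the Klein four-group generated by two commuting involutions $\sigma$ and $\tau$, acting on $X$ by $\sigma(x) = -x$ and $\tau(x) = x^{-1}$; its four elements are the identity, $\sigma$, $\tau$, and $\sigma\tau$. By Burnside’s lemma, $|G| \cdot |X/G| = \sum_{g \in G} |\mathrm{Fix}(g)|$. The identity has $p - 1$ fixed points; $\sigma$ has none, since $x = -x$ would force $2x = 0$ and hence $x = 0$; and $\tau$ has exactly the two fixed points $x = \pm 1$. Finally, the fixed points of $\sigma\tau : x \mapsto -x^{-1}$ are precisely the solutions of $x^2 = -1$ in $\mathbb{F}_p$. Since $\mathbb{F}_p$ is a field, this equation has either $0$ or $2$ solutions: if $x$ is one solution, then $-x$ is the other, and they are distinct because $p \neq 2$. Writing $N \in \{0, 2\}$ for the number of solutions, Burnside therefore gives $4 \cdot |X/G| = (p - 1) + 0 + 2 + N = p + 1 + N$. The left side is divisible by $4$, so the right side is. If $p \equiv 1 \pmod 4$, then $p + 1 + N \equiv 2 + N \pmod 4$, forcing $N = 2$; if $p \equiv 3 \pmod 4$, then $p + 1 + N \equiv N \pmod 4$, forcing $N = 0$. Hence $x^2 \equiv -1 \pmod p$ is solvable exactly when $p \equiv 1 \pmod 4$. This is the first instance of the fixed-point philosophy behind the proofs of Heath-Brown and Zagier: instead of constructing the desired object directly, one embeds it in a finite symmetry and forces it to appear by counting fixed points.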
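The Burnside count above can be checked numerically for small primes. The following is a minimal sketch, not part of the original argument: the function name `fixed_point_counts` is hypothetical, and the modular inverse is computed with Fermat's little theorem, which assumes `p` is an odd prime.

```python
def fixed_point_counts(p):
    """Fixed-point counts of 1, x -> -x, x -> 1/x, x -> -1/x on nonzero residues mod p."""
    units = range(1, p)
    # Assumption: p is an odd prime, so x^(p-2) is the inverse of x mod p.
    inverse = {x: pow(x, p - 2, p) for x in units}
    identity = len(units)
    negate = sum(1 for x in units if (-x) % p == x)
    invert = sum(1 for x in units if inverse[x] == x)
    negate_invert = sum(1 for x in units if (-inverse[x]) % p == x)
    return identity, negate, invert, negate_invert


for p in (5, 7, 13, 19):
    counts = fixed_point_counts(p)
    # Burnside: the total is 4 times the number of orbits.
    assert sum(counts) % 4 == 0
    # The last count is the number of solutions of x^2 = -1 mod p.
    assert (counts[3] == 2) == (p % 4 == 1)
    print(p, counts)
```

For each prime in the loop, the last entry of the tuple is 2 when `p % 4 == 1` and 0 otherwise, matching the parity argument.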
Heath-Brown’s Proof
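Assume $p \equiv 1 \pmod 4$, say $p = 4k + 1$. Heath-Brown considers the finite set

$$S = \{(x, y, z) \in \mathbb{Z}^3 : 4xy + z^2 = p,\ x > 0,\ y > 0\}.$$

Define three involutions of $\mathbb{Z}^3$ preserving the equation $4xy + z^2 = p$:

$$f(x, y, z) = (y, x, -z), \qquad g(x, y, z) = (x - y + z,\ y,\ 2y - z), \qquad h(x, y, z) = (y, x, z).$$

Let $T = \{(x, y, z) \in S : z > 0\}$ and $U = \{(x, y, z) \in S : x - y + z > 0\}$. Since $z = 0$ would give $p = 4xy$, impossible, every element of $S$ has either $z > 0$ or $z < 0$. The map $f$ changes $z$ to $-z$, so $S$ is the disjoint union of $T$ and $f(T)$. Hence $|S| = 2|T|$. Similarly, $S$ is the disjoint union of $U$ and $f(U)$. The only boundary case would be $x - y + z = 0$, but then $p = 4xy + (x - y)^2 = (x + y)^2$, impossible since $p$ is prime. Thus every point lies either in $U$ or outside it, and applying $f$, which changes $x - y + z$ to $-(x - y + z)$, interchanges these two parts. Therefore $|S| = 2|U|$, so $|T| = |U|$.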
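Now $g(x, y, z) = (x - y + z,\ y,\ 2y - z)$ maps $U$ to itself: if $x - y + z > 0$, then the new first coordinate $x - y + z$ is positive, and also $(x - y + z) - y + (2y - z) > 0$ because this is just $x > 0$. Since $g^2 = \mathrm{id}$, its orbits on $U$ have size $1$ or $2$. A fixed point of $g$ satisfies $2y - z = z$, hence $y = z$. Then $p = 4xy + y^2 = y(4x + y)$. Since $p$ is prime and $y < 4x + y$, we get $y = 1$ and $x = k$. Thus $g$ has exactly one fixed point on $U$, namely $(k, 1, 1)$. Hence $|U|$ is odd, and therefore $|T|$ is odd.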
Finally, $h(x, y, z) = (y, x, z)$ maps $T$ to itself. Since $h$ is an involution on the odd finite set $T$, it has a fixed point. For such a point, $x = y$, and therefore

$$p = 4x^2 + z^2 = (2x)^2 + z^2.$$

Thus $p$ is a sum of two squares. This is Heath-Brown’s proof: two decompositions of the same finite set show that $|T| = |U|$, the involution $g$ shows that this common size is odd, and the final involution $h$ then forces the required fixed point.
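The counting steps of this section can be replayed by brute force for a small prime. The sketch below is only an illustration under stated assumptions: it enumerates integer triples with $4xy + z^2 = p$ and $x, y > 0$, uses the maps $(x, y, z) \mapsto (x - y + z, y, 2y - z)$ and $(x, y, z) \mapsto (y, x, z)$, and the function names are hypothetical. It is not an efficient way to find the representation.

```python
def heath_brown_set(p):
    """All integer triples (x, y, z) with 4*x*y + z*z == p and x, y > 0."""
    triples = []
    for z in range(-p, p + 1):
        rest = p - z * z
        if rest <= 0 or rest % 4 != 0:
            continue
        n = rest // 4  # we need x * y == n
        for x in range(1, n + 1):
            if n % x == 0:
                triples.append((x, n // x, z))
    return triples


def g(t):
    x, y, z = t
    return (x - y + z, y, 2 * y - z)


def h(t):
    x, y, z = t
    return (y, x, z)


def two_squares_by_counting(p):
    """Assumes p is a prime with p % 4 == 1; returns (a, b) with a*a + b*b == p."""
    S = heath_brown_set(p)
    T = [t for t in S if t[2] > 0]
    U = [t for t in S if t[0] - t[1] + t[2] > 0]
    assert len(S) == 2 * len(T) == 2 * len(U)
    # The only fixed point of g on U should be (k, 1, 1) with p = 4k + 1.
    assert [t for t in U if g(t) == t] == [((p - 1) // 4, 1, 1)]
    x, _, z = next(t for t in T if h(t) == t)  # fixed point of h has x == y
    return 2 * x, z


print(two_squares_by_counting(13))  # (2, 3)
```

For $p = 13$ the set has six elements, both halves have three, and the fixed point $(1, 1, 3)$ of the swap gives $13 = 2^2 + 3^2$.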
Zagier’s Proof
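Zagier’s proof compresses the same fixed-point idea into a single involution. Again let $p = 4k + 1$ be prime, and consider the finite set of triples of positive integers

$$S = \{(x, y, z) \in \mathbb{N}^3 : x^2 + 4yz = p\}.$$

Define $\varphi : S \to S$ by

$$\varphi(x, y, z) = \begin{cases} (x + 2z,\ z,\ y - x - z) & \text{if } x < y - z, \\ (2y - x,\ y,\ x - y + z) & \text{if } y - z < x < 2y, \\ (x - 2y,\ x - y + z,\ y) & \text{if } x > 2y. \end{cases}$$

The boundary cases cannot occur: if $x = y - z$, then $p = (y - z)^2 + 4yz = (y + z)^2$; if $x = 2y$, then $p = 4y^2 + 4yz = 4y(y + z)$. Hence every element of $S$ lies in exactly one case. The formulas preserve $x^2 + 4yz$ and keep the coordinates positive. Moreover, $\varphi$ is an involution: the first and third cases undo each other, while in the middle case the image again lies in the middle case and applying the same formula twice returns $(x, y, z)$.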
The fixed points are immediate. The first case would force $z = 0$, and the third would force $y = 0$, so a fixed point must lie in the middle case. There $2y - x = x$, so $x = y$. Hence $p = x^2 + 4xz = x(x + 4z)$. Since $p$ is prime, this forces $x = y = 1$ and $z = k$. Thus $\varphi$ has exactly one fixed point, namely $(1, 1, k)$. Therefore $|S|$ is odd. Finally, the swap $\psi(x, y, z) = (x, z, y)$ is also an involution of $S$; since $S$ has odd size, it has a fixed point. This fixed point has $y = z$, and so $p = x^2 + 4y^2 = x^2 + (2y)^2$. Thus $p$ is a sum of two squares.
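The claims about the three-case map can be verified exhaustively for a small prime. This is a hedged sketch: the function names `zagier_set` and `zagier` are hypothetical, the set is taken to be all positive integer triples with $x^2 + 4yz = p$, and the case split relies on $p$ being prime so that the boundary cases $x = y - z$ and $x = 2y$ never occur.

```python
def zagier_set(p):
    """All positive integer triples (x, y, z) with x*x + 4*y*z == p."""
    triples = []
    x = 1
    while x * x < p:
        rest = p - x * x
        if rest % 4 == 0:
            n = rest // 4  # we need y * z == n
            for y in range(1, n + 1):
                if n % y == 0:
                    triples.append((x, y, n // y))
        x += 1
    return triples


def zagier(t):
    """Three-case involution; assumes x != y - z and x != 2*y (true for prime p)."""
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)


p = 13
S = zagier_set(p)
assert all(zagier(t) in S for t in S)          # maps the set to itself
assert all(zagier(zagier(t)) == t for t in S)  # is an involution
assert [t for t in S if zagier(t) == t] == [(1, 1, (p - 1) // 4)]
assert len(S) % 2 == 1                         # so the set has odd size
```

For $p = 13$ the set is $\{(1, 1, 3), (1, 3, 1), (3, 1, 1)\}$: the first triple is fixed and the other two are exchanged.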
Zagier’s argument proves existence but does not immediately explain how to find the representation. However, by using the two involutions together one obtains a constructive procedure.
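We know that $\varphi$ has exactly one fixed point, namely $(1, 1, k)$. Starting from this point, alternately apply the swap $\psi(x, y, z) = (x, z, y)$ and $\varphi$:

$$P_0 = (1, 1, k) \xrightarrow{\ \psi\ } P_1 \xrightarrow{\ \varphi\ } P_2 \xrightarrow{\ \psi\ } P_3 \xrightarrow{\ \varphi\ } \cdots$$

The set $S$ is finite, and both $\varphi$ and $\psi$ are bijections. Hence this process must eventually repeat; moreover, because every step can be reversed, there is no initial tail before the repetition. Thus the points form a cycle. The cycle starts at the unique fixed point of $\varphi$ and leaves it by applying $\psi$. When the cycle first returns to $(1, 1, k)$, it must return by another application of $\psi$: a return by $\varphi$ from a point $Q$ would mean $\varphi(Q) = (1, 1, k) = \varphi(1, 1, k)$, hence $Q = (1, 1, k)$, which is an earlier return. Therefore the cycle has a mirror symmetry. At the point opposite $(1, 1, k)$ in this cycle, the map $\psi$ must fix the point, since otherwise we would have found a second fixed point of $\varphi$. But being fixed by $\psi$ means exactly that $y = z$. Hence this opposite point gives $p = x^2 + (2y)^2$. So by repeatedly applying the two explicit maps $\psi$ and $\varphi$, starting from $(1, 1, k)$, one actually reaches a triple with $y = z$, and this triple gives the desired representation of $p$ as a sum of two squares.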
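The alternating procedure translates directly into a short loop. The following is a minimal sketch under the assumption that `p` is a prime with `p % 4 == 1`; the function names are hypothetical, and no claim is made about efficiency, since the number of steps can grow with `p`.

```python
def zagier(t):
    """Three-case involution on triples with x*x + 4*y*z == p (p prime)."""
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)


def two_squares(p):
    """Assumes p is prime and p % 4 == 1; returns (a, b) with a*a + b*b == p."""
    x, y, z = 1, 1, (p - 1) // 4   # the unique fixed point of the involution
    while y != z:                  # stop at a fixed point of the swap
        x, y, z = zagier((x, z, y))  # swap y and z, then apply the involution
    return x, 2 * y


for p in (5, 13, 17, 29):
    a, b = two_squares(p)
    assert a * a + b * b == p
    print(p, "=", a, "^2 +", b, "^2")
```

For $p = 17$ the walk is $(1, 1, 4) \to (1, 4, 1) \to (3, 1, 2) \to (3, 2, 1) \to (1, 2, 2)$, which stops with $17 = 1^2 + 4^2$.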
Reduction of Indefinite Quadratic Forms
Zagier’s proof can be read as a reduction argument for indefinite quadratic forms, compressed into a single involution. Let $p = 4k + 1$ be prime, and consider the finite set $S = \{(x, y, z) \in \mathbb{N}^3 : x^2 + 4yz = p\}$. To a triple $(x, y, z) \in S$ we associate the indefinite binary quadratic form

$$-y X^2 + x XY + z Y^2.$$

Its discriminant is

$$x^2 - 4(-y)z = x^2 + 4yz = p.$$

Thus the triples in $S$ are not arbitrary: they are a convenient normalization of certain indefinite forms of discriminant $p$, with the coefficient of $X^2$ negative, the coefficient of $Y^2$ positive, and the middle coefficient $x$ positive. From this point of view, Zagier’s map is a reduction move written directly in the coordinates $(x, y, z)$.
The final symmetry is simple. Define

$$\psi(x, y, z) = (x, z, y).$$

If $|S|$ were already known to be odd, then the involution $\psi$ would have a fixed point. Such a fixed point would satisfy $y = z$, and therefore $p = x^2 + 4y^2 = x^2 + (2y)^2$. So the real task is to prove that $|S|$ is odd.
The most natural symmetry is to flip the sign of the middle coefficient, replacing $x$ by $-x$. This preserves the discriminant, but it leaves the normalized region $x > 0$. So the idea is: flip $x$, then reduce it back to a positive representative by changing the middle coefficient by a multiple of one of the outer coefficients, while adjusting the remaining coordinate so that the discriminant stays equal to $p$. In the most natural case, after flipping to $-x$, we add $2y$ to make the new middle coefficient positive. This gives $2y - x$. To keep the discriminant unchanged, the new last coordinate must be $x - y + z$, because $(2y - x)^2 + 4y(x - y + z) = x^2 + 4yz$. Thus the central reduction move is $(x, y, z) \mapsto (2y - x,\ y,\ x - y + z)$. This stays inside $S$ precisely when the two new coordinates $2y - x$ and $x - y + z$ are positive, that is, when

$$y - z < x < 2y.$$

So this explains the middle branch of Zagier’s map.
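The central move can also be computed as an honest change of variables on the form. The sketch below rests on assumptions that should be read as such: the triple $(x, y, z)$ is identified with the coefficient list $(-y, x, z)$ of $aX^2 + bXY + cY^2$, and the helper names `shift`, `discriminant`, and `central_move` are hypothetical. Under that convention, flipping the middle sign and substituting $X \mapsto X - Y$ adds $2y$ to the middle coefficient and produces the forced last coordinate automatically.

```python
def shift(form, n):
    """Substitute X -> X + n*Y in a*X^2 + b*X*Y + c*Y^2; returns the new (a, b, c)."""
    a, b, c = form
    return (a, b + 2 * a * n, a * n * n + b * n + c)


def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c


def central_move(triple):
    """Flip the middle coefficient, then shift by n = -1 (assumed convention)."""
    x, y, z = triple
    flipped = (-y, -x, z)         # form of (x, y, z) with the middle sign flipped
    a, b, c = shift(flipped, -1)  # middle becomes -x + 2y, last becomes x - y + z
    return (b, -a, c)             # back to triple coordinates (x, y, z)


t = (3, 2, 1)                      # 3^2 + 4*2*1 == 17, and y - z < x < 2y
assert discriminant((-2, 3, 1)) == 17
assert central_move(t) == (1, 2, 2)
assert central_move(central_move(t)) == t
```

Because `shift` is a substitution, it cannot change the discriminant, which is why the last coordinate never has to be solved for separately in this formulation.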
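If $x > 2y$, this central move fails because $2y - x < 0$. In that case we reduce in the other direction, keeping $y$ fixed but replacing $x$ by the positive representative $x - 2y$. Again the last coordinate is forced by the discriminant:

$$(x - 2y)^2 + 4y(x - y + z) = x^2 + 4yz.$$

Listing $y$ as the last coordinate and the forced coordinate as the second, the right-hand branch is

$$(x, y, z) \mapsto (x - 2y,\ x - y + z,\ y).$$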
If $x < y - z$, the central move fails for the other reason: the forced coordinate $x - y + z$ is not positive. So we cross to the adjacent representative on the other side, using the old $z$ as the new second coordinate. We then choose the positive middle coefficient obtained by adding $2z$: $x + 2z$. The remaining coordinate is again forced:

$$(x + 2z)^2 + 4z(y - x - z) = x^2 + 4yz.$$

This is positive exactly when $x < y - z$. Hence the left-hand branch is

$$(x, y, z) \mapsto (x + 2z,\ z,\ y - x - z).$$
In this reading, Zagier’s proof is a reduction picture in disguise: one tries to flip the middle coefficient, uses reduction to bring the form back into the positive normalized region, obtains an involution with one fixed point, and then the final symmetry forces the symmetric form that gives the two-square representation.