Fermat’s Two-Squares Theorem: Involutions, Indefinite Forms

Fermat’s theorem on sums of two squares says that an odd prime p can be written as p=a^2+b^2 if and only if p\equiv 1\pmod 4 . The proofs of Heath-Brown and Zagier are striking because they prove this theorem by counting fixed points of involutions. The guiding principle is simple: when an involution acts on a finite set, all non-fixed points pair off, so the parity of the set is controlled by the fixed points. Heath-Brown found his proof while studying Liouville’s papers on identities for parity functions, and he applied this idea to the set of triples (x,y,z)\in\mathbb N^3 satisfying x^2+4yz=p . Zagier later compressed the same argument into an extremely short proof.

Before giving that proof, we first prove the standard preliminary fact that -1 is a square modulo p exactly when p\equiv 1\pmod 4 . We prove it in the same spirit, using a finite group action and fixed-point counting.

Let p\ne 2 be prime and let X=\mathbb F_p^\ast . Let G be the group generated by the two commuting involutions x\mapsto -x and x\mapsto 1/x ; thus G has four elements: the identity, x\mapsto -x , x\mapsto 1/x , and g\mapsto -1/x . By Burnside’s lemma, 4|X/G|=\sum_{h\in G}|{\text{Fix}}(h)| . The identity has p-1 fixed points; x\mapsto -x has none, since x=-x would force x=0 ; and x\mapsto 1/x has exactly the two fixed points 1,-1 . Finally, the fixed points of g\mapsto -1/x are precisely the solutions of x^2=-1 in \mathbb F_p^\ast . Since \mathbb F_p is a field, this equation has either 0 or 2 solutions: if \alpha is one solution, then -\alpha is the other, and they are distinct because p\ne 2 . Therefore Burnside gives 4|X/G|=(p-1)+0+2+|{\text{Fix}}(g)|=(p+1)+|{\text{Fix}}(g)| . The left side is divisible by 4 , so the right side is. If p\equiv 3\pmod 4 , then p+1\equiv 0\pmod 4 , forcing |{\text{Fix}}(g)|=0 ; if p\equiv 1\pmod 4 , then p+1\equiv 2\pmod 4 , forcing |{\text{Fix}}(g)|=2 . Hence x^2\equiv -1\pmod p is solvable exactly when p\equiv 1\pmod 4 . This is the first instance of the fixed-point philosophy behind the Heath-Brown–Zagier proof: instead of constructing the desired object directly, one embeds it in a finite symmetry and forces it to appear by counting fixed points.

Heath-Brown’s Proof

Assume p\equiv 1\pmod 4 , say p=4k+1 . Heath-Brown considers the finite set

S=\{(x,y,z)\in\mathbb Z^3:4xy+z^2=p,\ x,y>0\}.

Define three involutions preserving the equation p=4xy+z^2 :

A(x,y,z)=(y,x,-z), ~~ \\ B(x,y,z)=(y,x,z), ~~ \\ C(x,y,z)=(x-y+z,\ y,\ 2y-z).

Let T=\{(x,y,z)\in S:z>0\} and U=\{(x,y,z)\in S:x+z>y\} . Since z=0 would give p=4xy , impossible, every element of S has either z>0 or z<0 . The map A changes z to -z , so S is the disjoint union of T and AT . Hence |S|=2|T| . Similarly, S is the disjoint union of U and AU . The only boundary case would be x+z=y , but then p=4xy+z^2=4x(x+z)+z^2=(2x+z)^2, impossible since p is prime. Thus every point lies either in U or outside it, and applying A(x,y,z)=(y,x,-z) interchanges these two parts. Therefore |S|=2|U| , so |T|=|U| .

Now C maps U to itself: if x+z>y , then x-y+z>0 , and also (x-y+z)+(2y-z)>y because this is just x>0 . Since C^2=1 , its orbits on U have size 1 or 2 . A fixed point of C satisfies (x-y+z,y,2y-z)=(x,y,z) , hence y=z . Then p=4xy+y^2=y(4x+y). . Since p is prime and x,y>0 , we get y=1 and x=(p-1)/4=k . Thus C has exactly one fixed point on U , namely (k,1,1) . Hence |U| is odd, and therefore |T| is odd.

Finally, B(x,y,z)=(y,x,z) maps T to itself. Since B is an involution on the odd finite set T , it has a fixed point. For such a point, x=y , and therefore

p=4x^2+z^2=(2x)^2+z^2.

Thus p is a sum of two squares. This is Heath-Brown’s proof: two decompositions of the same finite set show that T is odd, and the final involution B then forces the required fixed point.

Zagier’s Proof

Zagier’s proof compresses the same fixed-point idea into a single involution. Again let p=4k+1 , and consider the finite set

S=\{(x,y,z)\in\mathbb N^3\mid  x^2+4yz=p\}.

Define T\to S by

T(x,y,z)= \begin{cases} (x+2z,z,y-x-z),& x<y-z,\\ (2y-x,y,x-y+z),& y-z<x<2y,\\ (x-2y,x-y+z,y),& x>2y. \end{cases}

The boundary cases cannot occur: if x=y-z , then p=(y+z)^2 ; if x=2y , then p=4y(y+z) . Hence every element of S lies in exactly one case. The formulas preserve x^2+4yz and keep the coordinates positive. Moreover, T is an involution: the first and third cases undo each other, while in the middle case the image again lies in the middle case and applying the same formula twice returns (x,y,z) .

The fixed points are immediate. The first case would force z=0 , and the third would force y=0 , so a fixed point must lie in the middle case. There 2y-x=x , so x=y . Hence p=x^2+4xz=x(x+4z). . Since p is prime, this forces x=y=1 and z=k . Thus T has exactly one fixed point, namely (1,1,k) . Therefore |S| is odd. Finally, the swap R: (x,y,z)\mapsto (x,z,y) is also an involution of S ; since S has odd size, it has a fixed point. This fixed point has y=z , and so p=x^2+4y^2=x^2+(2y)^2. . Thus p is a sum of two squares.

Zagier’s argument proves existence but does not immediately explain how to find the representation. However, by using the two involutions together one obtains a constructive procedure.

We know that T has exactly one fixed point, namely (1,1,k) . Starting from this point, alternately apply R and T :

(1,1,k)\xrightarrow{R}(1,k,1)\xrightarrow{T}(3,1,k-2)\xrightarrow{R}\cdots .

The set S is finite, and both R and T are bijections. Hence this process must eventually repeat; moreover, because every step can be reversed, there is no initial tail before the repetition. Thus the points form a cycle. The cycle starts at the unique fixed point of T and leaves it by applying R . When the cycle returns to (1,1,k) , it must return by another application of R ; otherwise we would have found a second fixed point of T . Therefore the cycle has a mirror symmetry. At the point opposite (1,1,k) in this cycle, the map R must fix the point. But being fixed by R means exactly that y=z . Hence this opposite point gives p=x^2+4y^2=x^2+(2y)^2. . So by repeatedly applying the two explicit maps R and T , starting from (1,1,k) , one actually reaches a triple with y=z , and this triple gives the desired representation of p as a sum of two squares.

Reduction of Indefinite Quadratic Forms

Zagier’s proof can be read as a reduction argument for indefinite quadratic forms, compressed into a single involution. Let p=4k+1 be prime, and consider the finite set S={(x,y,z)\in\mathbb N^3 \mid x^2+4yz=p}. . To a triple (x,y,z)\in S we associate the indefinite binary quadratic form

-yu^2+xuv+zv^2.

Its discriminant is x^2-4(-y)z=x^2+4yz=p.

Thus the triples in S are not arbitrary: they are a convenient normalization of certain indefinite forms of discriminant p , with the coefficient of u^2 negative, the coefficient of v^2 positive, and the middle coefficient x positive. From this point of view, Zagier’s map is a reduction move written directly in the coordinates (x,y,z) .

The final symmetry is simple. Define R(x,y,z)=(x,z,y). If |S| were already known to be odd, then the involution R would have a fixed point. Such a fixed point would satisfy y=z , and therefore p=x^2+4y^2=x^2+(2y)^2. . So the real task is to prove that |S| is odd.

The most natural symmetry is to flip the sign of the middle coefficient, replacing x by -x . This preserves the discriminant, but it leaves the normalized region x>0 . So the idea is: flip x , then reduce it back to a positive representative by changing the middle coefficient by a multiple of one of the outer coefficients, while adjusting the remaining coordinate so that the discriminant stays equal to p

In the most natural case, after flipping x to -x , we add 2y to make the new middle coefficient positive. This gives x'=2y-x. . To keep the discriminant unchanged, the new last coordinate must be z'=x-y+z , because (2y-x)^2+4y(x-y+z)=x^2+4yz. . Thus the central reduction move is (x,y,z)\mapsto(2y-x,\ y,\ x-y+z). . This stays inside S precisely when the two new coordinates 2y-x and x-y+z are positive, that is, when y-z<x<2y. So this explains the middle branch of Zagier’s map.

If x>2y , this central move fails because 2y-x<0 . In that case we reduce in the other direction, keeping y fixed but replacing x by the positive representative x'=x-2y. Again the last coordinate is forced by the discriminant: z'=\frac{p-(x-2y)^2}{4y}=x-y+z. Thus the right-hand branch is (x,y,z)\mapsto(x-2y,\ x-y+z,\ y).

If x<y-z , the central move fails for the other reason: the forced coordinate x-y+z is not positive. So we cross to the adjacent representative on the other side, using the old z as the new second coordinate. We then choose the positive middle coefficient obtained by adding 2z : x'=x+2z. . The remaining coordinate is again forced: z'=\frac{p-(x+2z)^2}{4z}=y-x-z. This is positive exactly when x<y-z . Hence the left-hand branch is (x,y,z)\mapsto(x+2z,\ z,\ y-x-z).

In this reading, Zagier’s proof is a reduction picture in disguise: one tries to flip the middle coefficient, uses reduction to bring the form back into the positive normalized region, obtains an involution with one fixed point, and then the final symmetry R forces the symmetric form that gives the two-square representation.

Leave a comment