Fermat’s Sum of Two Squares: Reduction Proofs

We prove Fermat’s two-squares theorem for primes. The theorem says that whenever a prime is congruent to one modulo four, it is a sum of two squares.

p\equiv 1\pmod 4\quad\Longrightarrow\quad p=a^2+b^2

The purpose of this exposition is not merely to collect five proofs. It is to explain how the first family of proofs is really organized around one object and one problem. The object is created by a square root of minus one modulo the prime. The problem is to turn that local congruence into an actual element of norm equal to the prime. In elementary number theory, this process appears as descent. In Gaussian integers, it appears as the Euclidean algorithm. In quadratic forms, it appears as reduction of a form of discriminant minus four. In ideal theory, it appears as principalization of an ideal. In geometry of numbers, it appears as finding a short vector in the same ideal viewed as a lattice. These are not five unrelated miracles. They are five coordinate systems for the same underlying phenomenon: a local splitting gives an object above the prime, and reduction forces that object to become principal.

Assume throughout that the prime satisfies the congruence condition.

p\equiv 1\pmod 4

By Euler’s criterion, minus one is a quadratic residue modulo this prime. Hence there exists an integer whose square is congruent to minus one modulo the prime. We choose such an integer and call it t .

t^2\equiv -1\pmod p

Equivalently, there is a positive integer k such that

t^2+1=kp

This equation is the seed of the entire argument. It says that a multiple of the prime is already a sum of two squares.

kp=t^2+1^2

What remains is to remove the extra multiplier. This is the whole problem. The descent proof lowers the multiplier directly. The Gaussian proof lowers it by norm-Euclidean division. The quadratic-form proof lowers it by basis reduction. The ideal proof says that the obstruction to lowering it vanishes because the class number is one. The geometry-of-numbers proof finds, in the same ideal lattice, a short vector whose norm is forced to be exactly the prime.

1. Elementary descent Proof:

The elementary proof begins from a general representation kp=x^2+y^2 . At the start, one may take x=t,\quad y=1 because t^2+1=kp.

The descent step proves the following: if the multiplier k is greater than one, then one can construct a new representation

mp=X^2+Y^2 with 0<m<k .

Once this is known, the proof is finished, because a strictly decreasing sequence of positive integers cannot continue forever. Eventually the multiplier must become one, and then the desired representation appears.

p=a^2+b^2

To perform the descent, choose least absolute residues of x and y modulo k . Thus choose integers r and s such that

r\equiv x\pmod k,\qquad s\equiv y\pmod k and |r|\leq \frac{k}{2},\qquad |s|\leq \frac{k}{2}

Since x^2+y^2=kp we have x^2+y^2\equiv 0\pmod k . Passing to the chosen residues gives r^2+s^2\equiv 0\pmod k . Therefore there is an integer m such that r^2+s^2=km .

The smallness of r and s is the key point. It gives r^2+s^2\leq \frac{k^2}{4}+\frac{k^2}{4}=\frac{k^2}{2} . Since r^2+s^2=km we obtain m\leq \frac{k}{2}<k .

Thus the new multiplier will be smaller, provided we can produce a new sum-of-two-squares representation of mp . This is where the multiplicative identity for sums of two squares enters.

(r^2+s^2)(x^2+y^2)=(rx+sy)^2+(ry-sx)^2 .

The left-hand side is (km)(kp)=mk^2p .

Now we check that the two terms on the right are divisible by k . Since r\equiv x\pmod k and s\equiv y\pmod k , we have

rx+sy\equiv r^2+s^2\equiv 0\pmod k and ry-sx\equiv rs-sr\equiv 0\pmod k .

Hence the following are integers: X=\frac{rx+sy}{k},\qquad Y=\frac{ry-sx}{k} .

Dividing the two-squares identity by k^2 gives

mp=X^2+Y^2 with 0<m<k.

This is the descent. It is important to see exactly what the proof has done. The congruence condition produced a multiple of the prime represented as a sum of two squares. Then, by choosing small residues modulo the multiplier and using the norm identity, we divided out the multiplier in a controlled way and replaced it by a smaller one. The proof terminates because the multiplier is a positive integer. The endpoint is precisely the disappearance of the multiplier.

kp=x^2+y^2\quad\Rightarrow\quad mp=X^2+Y^2\quad\Rightarrow\quad\cdots\quad\Rightarrow\quad p=a^2+b^2

This proof is elementary, but it already contains the whole story. The two-squares identity is norm multiplicativity in disguise. The least-residue choice is Euclidean division in disguise. The decreasing multiplier is the norm being reduced in disguise.

2. Gaussian integers: Euclidean Algorithm

Now rewrite the previous descent inside the Gaussian integers.

Let \alpha=x+iy . Then N(\alpha)=x^2+y^2=kp .

The descent proof chose residues r and s of x and y modulo k . In Gaussian language, combine them into one Gaussian integer. \rho=r+is .

The congruences r\equiv x\pmod k,\quad s\equiv y\pmod k become the single congruence \rho\equiv \alpha\pmod k . The equation r^2+s^2=km becomes N(\rho)=km and the inequality m<k is just the assertion that the residue \rho was chosen with norm N(\rho)<k^2 , smaller than k^2 .

Now define \beta=\frac{\rho\overline{\alpha}}{k} . This is a Gaussian integer. Indeed, since \rho\equiv \alpha\pmod k we get

\rho\overline{\alpha}\equiv \alpha\overline{\alpha}=N(\alpha)=kp\equiv 0\pmod k.

Therefore k divides \rho\overline{\alpha} in the Gaussian integers, so \beta belongs to \mathbb Z[i] . Taking norms gives N(\beta)=\frac{N(\rho)N(\alpha)}{k^2} . Using N(\rho)=km,\quad N(\alpha)=kp we obtain N(\beta)=\frac{(km)(kp)}{k^2}=mp .

This is exactly the descent step. To see that it is literally the same formula, expand the numerator.

\rho\overline{\alpha}=(r+is)(x-iy)=(rx+sy)+i(sx-ry)

Thus \beta=\frac{rx+sy}{k}+i\frac{sx-ry}{k} The real and imaginary parts are exactly the integers obtained in the elementary descent proof, up to the harmless sign difference between ry-sx and sx-ry . So the elementary descent proof is not merely analogous to the Gaussian proof. It is the Gaussian norm-reduction step written in real and imaginary coordinates.

The Gaussian proof is often stated in the language of greatest common divisors. From t^2+1=kp we get (t+i)(t-i)=kp . Thus p\mid (t+i)(t-i) inside \mathbb Z[i] . But p cannot divide t+i , since that would force p to divide both t and 1 . Similarly, p cannot divide t-i . Therefore p is not prime in \mathbb Z[i] .

More constructively, compute the Gaussian greatest common divisor \delta=\gcd(p,t+i) . This greatest common divisor is not a unit, because p has a nontrivial common factor with t+i . It is also not associated to p , because p does not divide t+i . Therefore \delta is a proper nontrivial divisor of p in \mathbb Z[i] . Taking norms, we know that N(\delta)\mid N(p)=p^2 . The only possible norms are 1,~ p,~ p^2 . The first would make \delta a unit, and the last would make \delta associated to p . Hence N(\delta)=p . Writing \delta=a+bi we obtain p=N(\delta)=a^2+b^2 .

This is the Gaussian integer proof. Its relationship with descent is exact: the Euclidean algorithm in \mathbb Z[i] repeatedly chooses small remainders, and the elementary descent proof is the same remainder-reduction process after unpacking the real and imaginary parts. The multiplier k in the descent proof is the visible shadow of the Gaussian norm being reduced.

3. Quadratic forms: Reduction

Now we pass to binary quadratic forms. From the seed equation t^2+1=kp define the form

Q(X,Y)=pX^2+2tXY+kY^2 .

Its discriminant is (2t)^2-4pk=4t^2-4(t^2+1)=-4

Thus Q is a positive definite binary quadratic form of discriminant -4 . Also, Q(1,0)=p . So p is represented by a form of discriminant -4 . If this form is equivalent to the principal form X^2+Y^2 then p is represented by X^2+Y^2 , and the theorem follows.

The crucial point is that this form has not been pulled out of thin air. It is exactly the normalized norm form of the Gaussian ideal attached to the congruence. Define I=(p,t+i)\subset \mathbb Z[i] As a free abelian group of rank two, this ideal has basis \{p, t+i \}. A general element of the ideal is pX+(t+i)Y

Compute its norm: N(pX+(t+i)Y)=N((pX+tY)+iY) . Therefore N(pX+(t+i)Y)=(pX+tY)^2+Y^2 . Expanding gives p^2X^2+2ptXY+(t^2+1)Y^2

Using t^2+1=kp this becomes p^2X^2+2ptXY+kpY^2 . Factoring out p , we obtain N(pX+(t+i)Y)=p(pX^2+2tXY+kY^2) . Hence Q(X,Y)=\frac{N(pX+(t+i)Y)}{p}

The ideal I has norm p . One way to see this is to look at the quotient. In the quotient by I , we have p=0 and i=-t , and because t^2\equiv -1\pmod p , this gives exactly a quotient of size p . Thus N(I)=p

So we can write the form more conceptually as Q(X,Y)=\frac{N(pX+(t+i)Y)}{N(I)}

This identity is the bridge. It says that the quadratic form Q is the norm form of the ideal I , expressed in the particular basis p,t+i and normalized by the ideal norm.

Now consider what equivalence of quadratic forms means. A change of variables

X=aX'+bY',\quad Y=cX'+dY' \qquad \text{with} \qquad \begin{pmatrix}a&b\\ c&d\end{pmatrix}\in SL_2(\mathbb Z)

does not change the set of represented integers. It only changes the coordinates. In the ideal language, this is exactly a change of \mathbb Z -basis.

If the old basis is e_1=p,e_2=t+i then a unimodular change gives a new basis e_1'=ae_1+ce_2, e_2'=be_1+de_2 . The determinant condition ensures that the new pair is still a basis of the same ideal. Therefore equivalent quadratic forms are not different objects. They are the same ideal written in different bases. Form reduction is therefore basis reduction. One changes the basis of the ideal so that the coefficients of the norm form become smaller and better balanced. This is the quadratic-form version of Euclidean reduction. The Gaussian proof chooses smaller remainders. The form proof chooses a better basis. These are two presentations of the same act.

For discriminant -4 , the reduction theory is especially simple. Suppose [A,B,C] is a reduced positive definite form of discriminant -4 . Then B^2-4AC=-4 and reduction means |B|\leq A\leq C . From the discriminant equation, 4AC=B^2+4 . Since |B|\leq A we have B^2\leq A^2 and since A\leq C we have A^2\leq AC . Thus 4A^2\leq 4AC=B^2+4\leq A^2+4 . Hence 3A^2\leq 4 so A=1 . Substituting back gives B^2-4C=-4 Since |B|\leq 1 , we must have B=0 and then C=1

Thus the only reduced positive definite form of discriminant -4 is [1,0,1] which is the form X^2+Y^2 .

Therefore the form [p,2t,k] reduces to [1,0,1] .

Since [p,2t,k] represents p , the equivalent principal form also represents p . Hence p=a^2+b^2 .

This is the quadratic-form proof. But now its meaning is clear: it is the same Gaussian ideal being rewritten in a better basis until its norm form becomes the principal form.

4. Ideals: class number one

The ideal-theoretic proof is the cleanest conceptual version. The congruence t^2\equiv -1\pmod p creates the ideal I=(p,t+i)\subset \mathbb Z[i]

This ideal lies above p and has norm N(I)=p . The theorem is now reduced to a single question: is this ideal principal? If I=(\alpha) for some Gaussian integer \alpha=a+bi then N(\alpha)=N(I)=p and therefore p=a^2+b^2 .

Thus the whole theorem becomes the principalization statement (p,t+i)=(a+bi) .

The reason this is true is that \mathbb Z[i] has class number one. In fact, it is Euclidean with respect to the norm N(a+bi)=a^2+b^2 and every Euclidean domain is a principal ideal domain. Therefore every ideal of \mathbb Z[i] is principal. In particular, the ideal I=(p,t+i) must be generated by one Gaussian integer. Because its norm is p , that generator has norm p .

This is not a different proof in spirit from descent. It is what descent is proving by hand. The descent proof shows directly that the multiplier can be lowered until an element of norm p appears. The Gaussian Euclidean proof computes a generator by a greatest-common-divisor algorithm. The quadratic-form proof changes the basis until the norm form becomes principal. The ideal proof says that such a generator must exist because the ideal class group is trivial.

The dictionary is exact. The elementary representation kp=x^2+y^2 is the same as the Gaussian norm equation N(x+iy)=kp . The starting congruence produces the ideal I=(p,t+i) and the quadratic form [p,2t,k] . The descent endpoint k=1 is the Gaussian endpoint N(a+bi)=p which is the form endpoint [p,2t,k]\sim [1,0,1] which is the ideal endpoint I=(a+bi) .

This is the central point of the first family of proofs. They are not merely similar. They are the same principalization process written in progressively more invariant languages. Descent is the coordinate computation. Gaussian integers reveal the norm structure. Quadratic forms record the norm structure as a basis-dependent form. Ideals remove the basis and identify the true invariant object.

5. Geometry of numbers:

The geometry-of-numbers proof belongs to the same family because it uses the same ideal, but it produces the generator by a different reduction principle. Instead of iteratively reducing a multiplier or applying Euclidean division, it views the ideal as a lattice in the plane and finds a short vector.

Consider a lattice

\Lambda=\{(x,y)\in\mathbb Z^2:x\equiv ty\pmod p\} .

This lattice is not merely analogous to the ideal I=(p,t+i) . It is exactly that ideal written in real coordinates. Indeed, an element of I has the form pA+(t+i)B=(pA+tB)+iB . So if x=pA+tB,\qquad y=B then x\equiv ty\pmod p . Conversely, if x\equiv ty\pmod p then x-ty=pA for some integer A , and hence x+iy=pA+(t+i)y\in I .

Therefore I=(p,t+i) as a lattice in the complex plane is precisely \Lambda . The covolume of this lattice is p . This follows because \Lambda is the kernel of the surjective homomorphism \mathbb Z^2\to \mathbb Z/p\mathbb Z,\qquad (x,y)\mapsto x-ty Thus the index of \Lambda in \mathbb Z^2 is p , and so \det(\Lambda)=p .

Now observe the arithmetic property of this lattice. If (x,y)\in\Lambda then x\equiv ty\pmod p . Therefore x^2+y^2\equiv t^2y^2+y^2=(t^2+1)y^2\equiv 0\pmod p So every vector in the lattice has squared length divisible by p . p\mid x^2+y^2 . But x^2+y^2=N(x+iy) .

Thus the geometry-of-numbers proof is still a norm proof. It is looking for an element of the same ideal whose norm is exactly p .

Minkowski’s convex body theorem now supplies the reduction step. In a lattice of covolume p , every centrally symmetric convex region in the plane whose area is greater than 4p contains a nonzero lattice point. Consider the disk x^2+y^2<2p . Its radius is \sqrt{2p} and its area is 2\pi p . Since 2\pi p>4p Minkowski’s theorem gives a nonzero vector (x,y)\in\Lambda inside this disk. Hence 0<x^2+y^2<2p .

But we already know that p\mid x^2+y^2 . The only positive multiple of p strictly smaller than 2p is p itself. Therefore x^2+y^2=p . This proves the theorem.

The geometry proof is sometimes described as different from descent, and in a narrow algorithmic sense that is true: it does not visibly run the sequence kp\to mp\to\cdots\to p . But conceptually it is still principalizing the same ideal. The lattice \Lambda is the ideal I=(p,t+i) . Minkowski finds a nonzero element of this ideal with norm less than 2p . Since all norms in this ideal are divisible by p , this short element must have norm exactly p . If \alpha=x+iy\in I and N(\alpha)=p=N(I) then I=(\alpha) .

So geometry of numbers proves principalization by shortest-vector reduction (the existence of which here is proved by a pigeonhole style argument). Descent principalizes by repeated local improvements. Euclidean algorithm principalizes by small remainders. Form reduction principalizes by changing the basis. Geometry principalizes by finding the shortest possible nonzero vector in the ideal lattice.



Each proof finds the same hidden generator by some form of reduction. Elementary descent reduces the extra multiplier in a norm equation until the multiplier becomes 1 . Gaussian integers reveal this as Euclidean norm reduction in \mathbb Z[i] , where the generator appears as \gcd(p,t+i) . Quadratic forms express the same object as the normalized norm form of the ideal; reducing [p,2t,k] to [1,0,1] is exactly the passage to a principal basis. Geometry of numbers performs the same search geometrically: it draws the ideal as a lattice and uses short-vector reduction to find an element whose norm, being positive and divisible by p , must equal p .

The conceptual reason all these reductions succeed is class number one. The congruence t^2\equiv -1\pmod p creates an ideal I=(p,t+i) with N(I)=p . The task is to reduce this ideal, form, lattice, or norm equation until it becomes principal. In a ring with nontrivial class group, such a reduction could stop in a nonprincipal class; the local congruence would create an object above p , but not necessarily a global element of norm p . For \mathbb Z[i] , however, the class number is one, so there is no such obstruction. Every reduction route must end at a principal generator. Thus I=(a+bi) and taking norms gives p=N(a+bi)=a^2+b^2 . The theorem is therefore the passage from local splitting to global principality: the square root of -1 modulo p creates the ideal, and class-number-one reduction forces that ideal to produce the desired element of norm p .

Leave a comment