The Four-Square Theorem using Hurwitz Quaternions

The familiar proof of Fermat’s two-square theorem through Gaussian integers has a very satisfying shape. For a prime p\equiv1\pmod 4 , one first finds a solution of r^2\equiv-1\pmod p . This produces the Gaussian ideal

\displaystyle (p,r+i)\subset \mathbb Z[i].

The fact that \mathbb Z[i] has class number one, or more concretely that it is a principal ideal domain, turns this ideal into one generated by a single Gaussian integer \pi=a+bi . Taking norms then gives p=a^2+b^2 .

Hurwitz’s proof of the four-square theorem is the same strategy in a richer ring. The Gaussian integers are replaced by a ring of integral quaternions; the binary norm a^2+b^2 is replaced by the quaternary norm a^2+b^2+c^2+d^2 ; and a commutative ideal is replaced by a left ideal. The point is not merely that quaternions provide Euler’s four-square identity. They provide an arithmetic setting in which a prime p can be forced to factor, and that factorization produces a quaternion of norm p .

The theorem we want is:

Every positive integer is a sum of four integer squares.

As with the two-square theorem, it is enough to prove the result for primes. Indeed, if m and n are norms of integral quaternions, then mn is also such a norm. This is the four-square identity in structural form.

Quaternions

Let \mathbb H be Hamilton’s quaternion algebra. Its elements are expressions

\displaystyle x=a+bi+cj+dk,

where a,b,c,d\in\mathbb R , and where the symbols i,j,k satisfy

\displaystyle i^2=j^2=k^2=-1,\quad ij=k=-ji,\quad jk=i=-kj,\quad ki=j=-ik.

Multiplication is associative but not commutative. The basic involution is quaternionic conjugation:

\displaystyle \overline{a+bi+cj+dk}=a-bi-cj-dk.

The norm is defined by \displaystyle N(x)=x\overline{x}=a^2+b^2+c^2+d^2.

Although multiplication in \mathbb H is noncommutative, the norm is multiplicative. Indeed, because \overline{xy}=\overline y\,\overline x ,

N(xy)=xy\overline{xy}=xy\,\overline y\,\overline x  =xN(y)\overline x=N(y)x\overline x=N(x)N(y).

Here N\left(y\right) is a real number and hence commutes with every quaternion. This immediately explains Euler’s four-square identity. If

\displaystyle x=a+bi+cj+dk,\quad y=e+fi+gj+hk

have integer coordinates, then xy also has integer coordinates. Therefore N\left(xy\right) is another sum of four integer squares, while multiplicativity says that

\displaystyle N(xy)=N(x)N(y).

Thus the product of two sums of four squares is again a sum of four squares. The problem is therefore reduced to showing that every prime p occurs as the norm of some quaternion with integer coordinates.
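The identity can be checked numerically. The following is a minimal Python sketch, assuming quaternions are stored as 4-tuples (a, b, c, d); the helper names qmul and norm are illustrative choices, not notation from the argument above.

```python
def qmul(x, y):
    """Hamilton product of two quaternions given as 4-tuples (a, b, c, d)."""
    a, b, c, d = x
    e, f, g, h = y
    return (
        a*e - b*f - c*g - d*h,   # real part
        a*f + b*e + c*h - d*g,   # coefficient of i
        a*g - b*h + c*e + d*f,   # coefficient of j
        a*h + b*g - c*f + d*e,   # coefficient of k
    )

def norm(x):
    """Norm N(x) = a^2 + b^2 + c^2 + d^2."""
    return sum(t * t for t in x)

x = (1, 2, 3, 4)     # N(x) = 30
y = (2, -1, 0, 5)    # N(y) = 30
xy = qmul(x, y)

# The coordinates of xy are integers, so N(xy) is a sum of four integer squares.
print(xy, norm(xy), norm(x) * norm(y))
```

Here the four coordinates of xy give an explicit representation of 900 = 30 · 30 as a sum of four squares, which is exactly what Euler's identity promises.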

Hurwitz Quaternions

The most obvious integral quaternions are the Lipschitz quaternions

\displaystyle \mathbb Z[i,j,k]=\{a+bi+cj+dk:a,b,c,d\in\mathbb Z\}.

They are closed under multiplication, and their norm is exactly the form a^2+b^2+c^2+d^2 that we want. But this ring is slightly too sparse for a Euclidean algorithm.

Geometrically, \mathbb Z[i,j,k] is the usual integer lattice \mathbb Z^4 inside \mathbb R^4 . Given a point of \mathbb R^4 , one can round each coordinate to a nearest integer. The resulting lattice point is at distance at most 1 , but equality occurs at the centres of the unit four-dimensional cubes. Thus ordinary integral quaternions come very close to being Euclidean, but fail exactly at those cube centres.

Hurwitz’s insight was to add those missing centres. Let \displaystyle \omega=\frac{1+i+j+k}{2}. The ring of Hurwitz quaternions is

\displaystyle \mathcal O=\mathbb Z+\mathbb Zi+\mathbb Zj+\mathbb Z\omega.

Equivalently, its elements are precisely the quaternions \displaystyle a+bi+cj+dk whose four coordinates are either all integers or all half-integers. Thus

\displaystyle \mathcal O=\mathbb Z^4\cup\left(\mathbb Z+\frac{1}{2}\right)^4

inside \mathbb R^4 . The second collection consists exactly of the centres of the unit cubes of \mathbb Z^4 . This one enlargement repairs the Euclidean algorithm.

The norm of every Hurwitz quaternion is an integer. This is obvious for integral coordinates. If the coordinates are half-integral, write x=\frac{A+Bi+Cj+Dk}{2}, where A,B,C,D are odd integers. Then N(x)=\frac{A^2+B^2+C^2+D^2}{4}. Each odd square is congruent to 1\pmod 4 , so the numerator is congruent to 1+1+1+1=4\equiv0\pmod 4 and is therefore divisible by 4 . Hence N\left(x\right)\in\mathbb Z .

The elements of norm 1 are the units of \mathcal O . There are twenty-four of them: the eight integral units \displaystyle \pm1,\ \pm i,\ \pm j,\ \pm k, together with the sixteen half-integral units \displaystyle \frac{\pm1\pm i\pm j\pm k}{2}. These extra units will eventually allow us to convert a half-integral norm representation into an ordinary representation by four integer squares.
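The count of twenty-four can be confirmed by brute force. The sketch below assumes a "doubled coordinates" encoding, in which the Hurwitz quaternion (A+Bi+Cj+Dk)/2 is stored as the integer tuple (A, B, C, D) with all entries of the same parity; this encoding and the name is_hurwitz are conveniences introduced here for illustration.

```python
from itertools import product

def is_hurwitz(doubled):
    """True if the doubled coordinates are all even or all odd."""
    return len({t % 2 for t in doubled}) == 1

# N(x) = (A^2 + B^2 + C^2 + D^2) / 4, so units satisfy A^2 + B^2 + C^2 + D^2 = 4.
# Each doubled coordinate of a unit therefore lies in {-2, ..., 2}.
units = [
    v for v in product(range(-2, 3), repeat=4)
    if is_hurwitz(v) and sum(t * t for t in v) == 4
]

integral = [v for v in units if v[0] % 2 == 0]   # +-1, +-i, +-j, +-k
half = [v for v in units if v[0] % 2 == 1]       # (+-1 +-i +-j +-k)/2

print(len(units), len(integral), len(half))
```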

The crucial fact about \mathcal O is the following nearest-point statement:

\displaystyle \text{For every }z\in\mathbb H,\text{ there is }q\in\mathcal O \text{ with }N(z-q)<1.

To see this, round the four coordinates of z to nearest integers. The resulting integral quaternion q satisfies N\left(z-q\right)\leq1 . Equality could occur only if every coordinate of z lies exactly halfway between two consecutive integers. But then all four coordinates of z are half-integers, so z itself belongs to \mathcal O . In that exceptional case we choose q=z , obtaining distance 0 . Thus strict inequality is always possible.
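The rounding argument translates directly into code. This is a minimal sketch using exact rational arithmetic, assuming the input is a 4-tuple of rational coordinates; the function name nearest_hurwitz is hypothetical, and ties are rounded upward, which is one of several equally valid choices.

```python
from fractions import Fraction
from math import floor

def norm(x):
    """Norm a^2 + b^2 + c^2 + d^2 of a 4-tuple."""
    return sum(t * t for t in x)

def nearest_hurwitz(z):
    """Return q in the Hurwitz order with N(z - q) < 1, following the text."""
    z = tuple(Fraction(t) for t in z)
    # Round every coordinate to a nearest integer (ties go up).
    q = tuple(Fraction(floor(t + Fraction(1, 2))) for t in z)
    diff = tuple(a - b for a, b in zip(z, q))
    if norm(diff) == 1:
        # Only possible when all four coordinates are half-integers,
        # in which case z is itself a Hurwitz quaternion.
        return z
    return q

z = (Fraction(8, 3), Fraction(-3, 2), Fraction(-11, 6), Fraction(-4, 3))
q = nearest_hurwitz(z)
print(q, norm(tuple(a - b for a, b in zip(z, q))))
```

For a cube centre such as (1/2, 3/2, -1/2, 5/2) the function returns the point itself, which is the exceptional case in the argument.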

This geometric fact gives division with remainder. Let \alpha,\beta\in\mathcal O , with \beta\neq0 . Apply the nearest-point statement to the quaternion \alpha\beta^{-1}\in\mathbb H , where \beta^{-1}=\overline\beta/N(\beta) . Choose q\in\mathcal O such that

\displaystyle N(\alpha\beta^{-1}-q)<1.

Define \displaystyle r=\alpha-q\beta. Then r\in\mathcal O , and \displaystyle r=(\alpha\beta^{-1}-q)\beta. By multiplicativity of the norm,

\displaystyle N(r)=N(\alpha\beta^{-1}-q)N(\beta)<N(\beta).

Thus

\displaystyle \alpha=q\beta+r, \quad N(r)<N(\beta).

This is a left-sided Euclidean division algorithm. The order matters: the quotient q occurs on the left of \beta . It follows that every left ideal of \mathcal O is principal. Indeed, let I be a nonzero left ideal, and choose a nonzero element \pi\in I of smallest norm. For any \alpha\in I , divide \alpha by \pi :

\displaystyle \alpha=q\pi+r, \quad N(r)<N(\pi).

Because I is a left ideal, q\pi\in I , and hence r=\alpha-q\pi\in I . By the minimality of N\left(\pi\right) , we must have r=0 . Therefore every \alpha\in I belongs to \mathcal O\pi , and so

\displaystyle I=\mathcal O\pi.

This is the precise quaternionic analogue of class number one. Since \mathcal O is noncommutative, one does not speak of the ordinary ideal class group of a quadratic ring. Instead, one says that the Hurwitz order has one-sided class number one: every left ideal is principal.

The Ideal

Let p be an odd prime. The first ingredient is the same elementary congruence argument that appeared in Lagrange’s descent proof. There exist integers r,s such that

\displaystyle r^2+s^2\equiv-1\pmod p.

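Indeed, consider the sets \{0^2,1^2,\ldots,\left(\frac{p-1}{2}\right)^2\} and \{-1-0^2,-1-1^2,\ldots,-1-\left(\frac{p-1}{2}\right)^2\} modulo p . Each has \left(p+1\right)/2 distinct elements. If the two sets were disjoint, their union would contain p+1 residues modulo p , which is impossible. So the two sets must intersect, and a common element r^2\equiv-1-s^2 gives the desired congruence.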

Choose r,s with \displaystyle 0\leq r,s\leq\frac{p-1}{2} and \displaystyle 1+r^2+s^2\equiv0\pmod p. Now define the quaternion

\displaystyle \theta=1+ri+sj.

Its norm is \displaystyle N(\theta)=1+r^2+s^2. Thus p\mid N\left(\theta\right) . Moreover, the size restriction on r and s gives N(\theta)\leq1+2\left(\frac{p-1}{2}\right)^2<p^2 , so

\displaystyle 0<N(\theta)<p^2.
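The pigeonhole argument is constructive enough to run. Below is a minimal Python sketch that searches the two sets described above for a common residue; the function name find_rs is a hypothetical helper, and the input is assumed to be an odd prime.

```python
def find_rs(p):
    """Return (r, s) with 0 <= r, s <= (p-1)/2 and 1 + r^2 + s^2 = 0 mod p.

    Assumes p is an odd prime.
    """
    half = (p - 1) // 2
    # First set: the squares r^2 mod p, remembering which r produced each one.
    squares = {(r * r) % p: r for r in range(half + 1)}
    # Second set: the residues -1 - s^2 mod p; look for a collision.
    for s in range(half + 1):
        target = (-1 - s * s) % p
        if target in squares:
            return squares[target], s
    raise ValueError("no solution found; is p an odd prime?")

for p in (3, 5, 7, 11, 13):
    r, s = find_rs(p)
    n_theta = 1 + r * r + s * s       # N(theta) for theta = 1 + r i + s j
    print(p, (r, s), n_theta, n_theta % p == 0, n_theta < p * p)
```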

This quaternion is the four-dimensional analogue of the Gaussian integer r+i . In the two-square proof one has \displaystyle N(r+i)=r^2+1\equiv0\pmod p. Here one has instead

\displaystyle N(1+ri+sj)=1+r^2+s^2\equiv0\pmod p.

The additional coordinate is exactly what removes the congruence obstruction. In the Gaussian setting, -1 must be a square modulo p , which happens only for p\equiv1\pmod 4 . In the quaternionic setting, -1 is always a sum of two squares modulo an odd prime. Now form the left ideal

\displaystyle J=\mathcal O p+\mathcal O\theta.

This is the direct analogue of the Gaussian ideal \displaystyle (p,r+i)\subset\mathbb Z[i]. The ideal J lies strictly between \mathcal O p and \mathcal O . First, J strictly contains \mathcal O p , because \theta\in J but \theta\notin\mathcal O p . Indeed, if \theta=p\gamma for some \gamma\in\mathcal O , then

\displaystyle N(\theta)=p^2N(\gamma),

which would force N\left(\theta\right)\geq p^2 , contradicting 0<N\left(\theta\right)<p^2 . Second, J is proper. Suppose instead that 1\in J . Then there are a,b\in\mathcal O such that \displaystyle 1=ap+b\theta. Modulo p\mathcal O , this becomes \displaystyle b\theta\equiv1\pmod{p\mathcal O}. Multiply on the right by \overline\theta . Since

\displaystyle \theta\overline\theta=N(\theta)\equiv0\pmod p,

we obtain

\displaystyle \overline\theta\equiv b\theta\overline\theta=bN(\theta)\equiv0\pmod{p\mathcal O}.

But \overline\theta\notin p\mathcal O , by exactly the same norm argument used for \theta . This contradiction proves that

\displaystyle \mathcal O p\subsetneq J\subsetneq\mathcal O.

This is the ideal-theoretic heart of the argument. Modulo p , the class of \theta is nonzero, but its product with \overline\theta vanishes. Thus \theta behaves like a nonzero zero divisor modulo p . In the Gaussian proof, the congruence r^2+1\equiv0\pmod p shows that r+i and r-i multiply to 0 modulo p ; here the same role is played by \theta and \overline\theta .

Because every left ideal of \mathcal O is principal, there exists a Hurwitz quaternion \pi such that \displaystyle J=\mathcal O\pi. Since p\in J , there is some \rho\in\mathcal O for which \displaystyle p=\rho\pi. Neither factor can be a unit. If \pi were a unit, then J=\mathcal O , contradicting that J is proper. If \rho were a unit, then \displaystyle \pi=\rho^{-1}p, and therefore \displaystyle J=\mathcal O\pi=\mathcal O p, contradicting that J properly contains \mathcal O p .

Taking norms gives \displaystyle p^2=N(p)=N(\rho)N(\pi). Both factors on the right are positive integers, and neither equals 1 because the elements of norm 1 are exactly the units. Since p is an ordinary prime, the only possibility is

\displaystyle N(\rho)=N(\pi)=p.

Thus p is the norm of a Hurwitz quaternion.
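Because the principal ideal property came from a Euclidean algorithm, a generator of J can actually be computed. The following is a sketch under stated assumptions, not a definitive implementation: quaternions are 4-tuples of exact rationals, the helper names (qmul, nearest_hurwitz, divide, left_ideal_generator) are hypothetical, and the example takes p = 7 with \theta = 1+3i+2j . It relies on the fact that \alpha=q\beta+r implies \mathcal O\alpha+\mathcal O\beta=\mathcal O\beta+\mathcal Or .

```python
from fractions import Fraction
from math import floor

def qmul(x, y):
    """Hamilton product of 4-tuples."""
    a, b, c, d = x
    e, f, g, h = y
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

def norm(x):
    return sum(t * t for t in x)

def nearest_hurwitz(z):
    """Hurwitz quaternion q with N(z - q) < 1 (round, or keep a cube centre)."""
    q = tuple(Fraction(floor(t + Fraction(1, 2))) for t in z)
    if norm(tuple(a - b for a, b in zip(z, q))) == 1:
        return z
    return q

def divide(alpha, beta):
    """Return (q, r) with alpha = q*beta + r and N(r) < N(beta)."""
    n = norm(beta)
    conj_beta = (beta[0], -beta[1], -beta[2], -beta[3])
    z = tuple(t / n for t in qmul(alpha, conj_beta))   # alpha * beta^{-1}
    q = nearest_hurwitz(z)
    r = tuple(a - b for a, b in zip(alpha, qmul(q, beta)))
    return q, r

def left_ideal_generator(alpha, beta):
    """Generator of the left ideal O*alpha + O*beta by repeated division."""
    alpha = tuple(Fraction(t) for t in alpha)
    beta = tuple(Fraction(t) for t in beta)
    while any(beta):                  # stop when the remainder is zero
        _, r = divide(alpha, beta)
        alpha, beta = beta, r
    return alpha

p = 7
theta = (1, 3, 2, 0)                  # N(theta) = 14 is divisible by 7
pi = left_ideal_generator((p, 0, 0, 0), theta)
print(pi, norm(pi))
```

For p = 7 the loop stops after two divisions with a generator of norm 7. Any other generator differs from it by a unit on the left, so the particular coordinates depend on the rounding convention.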

At this point, the essential factorization argument is complete. The constructed ideal has produced a generator of norm p , just as a principal Gaussian ideal produces a Gaussian integer of norm p . There remains one final issue. A Hurwitz quaternion of norm p might have half-integral coordinates, whereas the four-square theorem asks for integer coordinates.

Integer Coordinates

Let \pi\in\mathcal O satisfy \displaystyle N(\pi)=p, with p odd. If \pi already has integer coordinates, then \displaystyle \pi=a+bi+cj+dk gives immediately \displaystyle p=a^2+b^2+c^2+d^2. Suppose instead that \pi has half-integral coordinates. Then we may write

\displaystyle \pi=\frac{A+Bi+Cj+Dk}{2},

where A,B,C,D are odd integers. Choose signs \varepsilon_0,\varepsilon_1,\varepsilon_2,\varepsilon_3\in\{\pm1\} satisfying \displaystyle \varepsilon_0\equiv A,\quad \varepsilon_1\equiv-B,\quad \varepsilon_2\equiv-C,\quad \varepsilon_3\equiv-D \pmod 4. The quaternion \displaystyle u= \frac{\varepsilon_0+\varepsilon_1i+\varepsilon_2j+\varepsilon_3k}{2} is a Hurwitz unit, so N\left(u\right)=1 . Direct multiplication gives

\displaystyle \begin{aligned} 4\pi u={}&(A\varepsilon_0-B\varepsilon_1-C\varepsilon_2-D\varepsilon_3)+ (A\varepsilon_1+B\varepsilon_0+C\varepsilon_3-D\varepsilon_2)i\\ &+(A\varepsilon_2-B\varepsilon_3+C\varepsilon_0+D\varepsilon_1)j+(A\varepsilon_3+B\varepsilon_2-C\varepsilon_1+D\varepsilon_0)k. \end{aligned}

The chosen congruences imply that every coefficient on the right is divisible by 4 . For example, \displaystyle A\varepsilon_0-B\varepsilon_1-C\varepsilon_2-D\varepsilon_3 \equiv A^2+B^2+C^2+D^2 \equiv0 \pmod 4. For the coefficient of i , one gets \displaystyle A\varepsilon_1+B\varepsilon_0+C\varepsilon_3-D\varepsilon_2 \equiv -AB+AB-CD+DC \equiv0 \pmod 4, and the other two coordinates cancel similarly.

Therefore \pi u has integer coordinates: \displaystyle \pi u=a+bi+cj+dk for some a,b,c,d\in\mathbb Z . Since u is a unit,

\displaystyle p=N(\pi)=N(\pi u)=a^2+b^2+c^2+d^2.
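The unit correction is mechanical, so it is worth seeing on a concrete case. This minimal sketch assumes the half-integral quaternion is given by its odd numerators (A, B, C, D); the names sign_mod4 and integralize are hypothetical helpers. The example uses (3+i+j+k)/2 , which has norm 3 .

```python
def qmul(x, y):
    """Hamilton product of 4-tuples."""
    a, b, c, d = x
    e, f, g, h = y
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

def sign_mod4(n):
    """The sign e in {+1, -1} with e = n mod 4, for odd n."""
    return 1 if n % 4 == 1 else -1

def integralize(A, B, C, D):
    """Given odd A, B, C, D, return the integer coordinates of pi * u,
    where pi = (A + Bi + Cj + Dk)/2 and u is the unit chosen in the text."""
    eps = (sign_mod4(A), sign_mod4(-B), sign_mod4(-C), sign_mod4(-D))
    four_pi_u = qmul((A, B, C, D), eps)      # (2 pi)(2 u) = 4 pi u
    assert all(t % 4 == 0 for t in four_pi_u)
    return tuple(t // 4 for t in four_pi_u)

coords = integralize(3, 1, 1, 1)
print(coords, sum(t * t for t in coords))    # -> (0, -1, -1, -1) 3
```

The result recovers the representation 3 = 0^2+1^2+1^2+1^2 from a half-integral quaternion of norm 3.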

Thus every odd prime is a sum of four integer squares. The prime 2 is already represented by \displaystyle 2=1^2+1^2+0^2+0^2. Finally, every positive integer factors into primes, and the product of norms of integral quaternions is again the norm of an integral quaternion. Hence every positive integer is a sum of four squares.

For the two-square theorem, one works in \mathbb Z[i] , whose norm is \displaystyle N(a+bi)=a^2+b^2. For a prime p\equiv1\pmod 4 , one chooses r with \displaystyle r^2\equiv-1\pmod p, forms the ideal \displaystyle (p,r+i), uses the fact that \mathbb Z[i] is a principal ideal domain, obtains a generator \pi , and concludes that N\left(\pi\right)=p .

For the four-square theorem, one works in the Hurwitz order \mathcal O , whose norm is \displaystyle N(a+bi+cj+dk)=a^2+b^2+c^2+d^2. For every odd prime p , one chooses r,s with \displaystyle r^2+s^2\equiv-1\pmod p, forms the left ideal \displaystyle \mathcal O p+\mathcal O(1+ri+sj), uses the fact that \mathcal O is a left principal ideal ring, obtains a generator \pi , and concludes that N\left(\pi\right)=p .

So the passage from two squares to four squares is not accidental. The Gaussian integers and the Hurwitz quaternions are both normed arithmetic rings. In each case, the norm is the quadratic form one wants to study; the congruence produces a nontrivial ideal above p ; and class number one turns that ideal into a single element whose norm is p .

The difference is that the quaternionic setting has enough room for every odd prime. The equation \displaystyle r^2\equiv-1\pmod p has a solution only when p\equiv1\pmod 4 , whereas \displaystyle r^2+s^2\equiv-1\pmod p has a solution for every odd prime. The extra square is exactly what makes the four-square theorem universal.

Finally, this proof also explains the relation with Lagrange’s descent proof. In the descent proof, one explicitly reduces coordinates modulo a multiplier and finds a smaller norm representative. In the Hurwitz proof, the same minimization principle is absorbed into the Euclidean algorithm for the Hurwitz lattice. The statement that every left ideal is principal is, in this sense, a global and structural form of descent.
