AKS Primality Test

Primality is among the oldest questions in arithmetic: given a positive integer \displaystyle n>1 , is it prime, or does it factor into smaller integers? Using basic number theory, there is an immediate procedure. A composite integer has a prime divisor at most \displaystyle \sqrt n , so one may test every possible divisor up to that point. This always works, and it proves that primality is decidable. But complexity theory asks a different question. It asks not merely whether a method eventually gives the right answer, but whether the amount of work is reasonably small compared with the amount of information needed to specify the input. When the integer \displaystyle n is written in binary, that amount of information is its bit length L=\lceil\log_2 n\rceil. This is the natural measure of the size of the problem. An integer close to 2^L may be astronomically large as a number, yet it is represented by only about L binary digits. Trial division examines on the order of \sqrt n possible divisors; for an L -bit integer, this is roughly \sqrt n\asymp2^{L/2}. Thus trial division takes exponentially many steps in the input length. It is efficient when \displaystyle n itself is regarded as the scale, but it is not efficient in the sense required by computational complexity.

The AKS theorem establishes the much stronger conclusion that primality can be decided in genuinely efficient time. There is a deterministic algorithm which, given the binary expansion of any integer n>1 , always determines whether n is prime, and whose number of bit operations is bounded by a fixed polynomial in L . In the standard language of complexity theory,

\displaystyle \text{PRIMES}\in\mathbf P.

The significance of this statement is not that it gives the fastest known practical primality test. Rather, it gives an unconditional theoretical answer: primality does not require factorization, exhaustive search, random choices, or any unproved hypothesis. It can be recognized with certainty using an amount of computation that grows only polynomially with the number of digits of the input.

Polynomial Identity

The proof does not arise from a miraculous efficient way of trying divisors. Its starting point is Fermat’s little theorem, but used in a much more rigid polynomial form. A prime does not merely satisfy the numerical congruence a^p\equiv a\pmod p . It satisfies the polynomial identity

\displaystyle (X+a)^p\equiv X^p+a\pmod p.

The ordinary Fermat congruence is recovered by substituting a number for X . The polynomial identity is vastly stronger: it says that all intermediate binomial coefficients disappear modulo p . AKS begins from this observation, compresses the polynomial identity by imposing \displaystyle X^r=1 , and then proves that a composite number cannot imitate this compressed prime behavior too often. The proof has one central tension. The accepted congruences force a certain multiplicative set inside a finite field to be very large. But if the input were composite and not a prime power, the same congruences would force that set to be too small. The two estimates contradict one another.

Let \displaystyle p be prime. The binomial theorem gives

\displaystyle (X+a)^p=\sum_{j=0}^{p}\binom pj X^j a^{p-j}.

For every index j strictly between 0 and p , the coefficient \binom pj is divisible by p . Thus all middle terms vanish modulo p : (X+a)^p\equiv X^p+a^p\pmod p. Fermat’s little theorem gives \displaystyle a^p\equiv a\pmod p , so (X+a)^p\equiv X^p+a\pmod p. For a prime, this identity holds for every integer \displaystyle a . More importantly, the full identity characterizes primality. Suppose that \displaystyle n\ge2 , that \displaystyle (a,n)=1 , and that

\displaystyle (X+a)^n\equiv X^n+a\pmod n

holds as a genuine polynomial congruence, without reducing modulo any further polynomial in X . Then \displaystyle n must be prime.

Indeed, assume that \displaystyle n is composite, choose a prime divisor \displaystyle q of \displaystyle n , and write q^k\Vert n. The coefficient of \displaystyle X^q in (X+a)^n-(X^n+a) is \displaystyle \binom nq a^{n-q}. Since (a,n)=1 , the factor a^{n-q} is not divisible by \displaystyle q . Thus we only need to examine \displaystyle  \binom nq . Using \displaystyle  \binom nq=\frac nq\binom{n-1}{q-1}, we see that the first factor has q -adic valuation k-1 . The second factor is not divisible by \displaystyle q , because

\displaystyle \binom{n-1}{q-1} = \frac{(n-1)(n-2)\cdots(n-q+1)}{(q-1)!} \equiv \frac{(-1)(-2)\cdots(-(q-1))}{(q-1)!} \not\equiv0

modulo q . Therefore v_q\Big(\binom nq\Big)=k-1. The coefficient \displaystyle \binom nq a^{n-q} is not divisible by q^k , hence not divisible by n . So the full polynomial congruence fails for every composite n .

This would already give a primality test: verify \displaystyle (X+a)^n\equiv X^n+a\pmod n.

The defect is that these are degree-\displaystyle n polynomials. Handling \displaystyle n coefficients means handling exponentially many coefficients in the bit length \displaystyle L . The entire AKS method is an answer to one question: can we retain enough of this prime identity while reducing its degree from \displaystyle n to something polynomial in \displaystyle \log n ?

Choose a positive integer \displaystyle r and work in the quotient ring (\mathbf Z/n\mathbf Z)[X]/(X^r-1). Inside this quotient, one has X^r=1. Every power of X can therefore be reduced modulo r in its exponent, so every polynomial has a representative of degree less than \displaystyle r . Instead of checking the full degree-n identity, AKS checks the compressed congruence

\displaystyle (X+a)^n\equiv X^n+a\pmod{n,X^r-1}.

A prime still passes this test, because the full polynomial identity remains true after any further reduction. A composite can sometimes pass such a truncated test, however, because reducing modulo \displaystyle X^r-1 identifies many different powers of X . AKS therefore does not claim that one congruence, or even one value of a , detects primality. The theorem lies in the much subtler statement that a composite cannot pass enough carefully chosen compressed congruences unless it is a prime power.

The parameter \displaystyle r must be chosen so that the powers of n remain nontrivial modulo r . Write \text{ord}_r(n) for the least positive integer d such that n^d\equiv1\pmod r. This is defined when \displaystyle (n,r)=1 . AKS chooses the least r for which \text{ord}_r(n)>L^2. The inequality says that, modulo r , the powers 1,n,n^2,\ldots,n^{L^2} are still all distinct. This is exactly the reservoir of distinct exponents that will later force a large collection of distinct field elements.

Given n>1 , carry out the following steps.

  1. First test whether n is a nontrivial perfect power: n=b^e,\quad b>1,\qquad e>1. If it is, declare n composite.
  2. Next find the least r such that \text{ord}_r(n)>L^2.
  3. Then compute (a,n) for every integer a with 2\le a\le r . If any such greatest common divisor satisfies 1<(a,n)<n, declare n composite.
  4. If no such divisor has appeared and n\le r, declare n prime. This step is safe because, if n were composite, it would have a prime divisor at most \sqrt n<n\le r , which the preceding gcd checks would have found.
  5. Now define \ell=\left\lfloor\sqrt{\varphi(r)}L\right\rfloor, where \varphi(r) is Euler’s totient function. For every integer a with \displaystyle 1\le a\le\ell, test whether (X+a)^n\equiv X^n+a\pmod{n,X^r-1}. If any test fails, declare n composite. If every test succeeds, declare it prime.

We now explain why this last declaration is justified. The easy direction is immediate: if n is prime, it is not a nontrivial perfect power, has no nontrivial gcd with any a<n , and satisfies the full polynomial Fermat identity. Thus every prime passes every stage. The difficult direction begins by assuming that the algorithm accepts some integer \displaystyle n , and proving that this accepted integer must be prime. Suppose the algorithm reaches its final declaration. Since the perfect-power test did not reject, n is not a nontrivial prime power. Since the gcd stage did not reject, every prime divisor of \displaystyle n is larger than \displaystyle r . Indeed, if a prime \displaystyle p divided \displaystyle n and satisfied \displaystyle p\le r , then the gcd computation at \displaystyle a=p would reveal the proper divisor \displaystyle p .

Also \displaystyle n>r , because otherwise the algorithm would already have stopped at the earlier prime declaration. Let \displaystyle p be a prime divisor of \displaystyle n . We need one for which \text{ord}_r(p)>1. Such a divisor exists. If every prime divisor q of n were congruent to \displaystyle 1 modulo r , then their product with multiplicities would give n\equiv1\pmod r. But \displaystyle \text{ord}_r(n)>L^2\ge1 , so n is not congruent to \displaystyle 1 modulo r . Choose a prime divisor p for which \text{ord}_r(p)>1 . At this point the proof has selected a particular prime factor \displaystyle p of \displaystyle n . The accepted congruences will be reduced modulo p . The aim is to show that they force n to be a power of this p . The preliminary perfect-power test will then finish the proof.

Reduce the accepted congruence modulo p . Since p\mid n , for every 1\le a\le\ell we obtain (X+a)^n\equiv X^n+a\pmod{p,X^r-1}. There is also the trivial case \displaystyle a=0 , namely X^n=X^n . Thus we may work uniformly with the linear polynomials X,\ X+1,\ldots,X+\ell. In characteristic \displaystyle p , the Frobenius rule says (U+V)^p=U^p+V^p. Hence (X+a)^p=X^p+a. The less obvious statement is that \displaystyle n/p has the same effect on the tested linear polynomials (because r<p ).

Define Y=(X+a)^{n/p}-(X^{n/p}+a). Taking the p th power in characteristic p gives

\displaystyle Y^p=(X+a)^n-(X^n+a^p)=(X+a)^n-(X^n+a).

The accepted congruence says that this is zero modulo p and X^r-1 . One must now justify the inference from Y^p=0 to Y=0 . This would fail in a ring with nilpotents, so it is not a formal triviality. Here it works because p>r , and hence p\nmid r . The derivative of \displaystyle X^r-1 is rX^{r-1}, which is not the zero polynomial modulo p . Thus X^r-1 has no repeated roots over an algebraic closure of \mathbf F_p . Equivalently, the quotient ring \mathbf F_p[X]/(X^r-1) has no nonzero nilpotent elements. Therefore Y^p=0 implies Y=0 , and so

\displaystyle (X+a)^{n/p}\equiv X^{n/p}+a\pmod{p,X^r-1}.

It is useful to give this phenomenon a name. Call a positive integer m a Frobenius-like exponent for a polynomial f(X) when

\displaystyle f(X)^m\equiv f(X^m)\pmod{p,X^r-1}.

For each tested polynomial X+a , both \displaystyle p and \frac np are Frobenius-like exponents. The reason for introducing this terminology is that these exponents can be multiplied. If \displaystyle m and \displaystyle m' are Frobenius-like for \displaystyle f , then

\displaystyle f(X)^{mm'} = \Big(f(X)^m\Big)^{m'} \equiv f(X^m)^{m'} \equiv f(X^{mm'}).

In the second congruence we simply apply the Frobenius-like relation for m' after replacing X by X^m . Likewise, if m is Frobenius-like for both f and g , then it is Frobenius-like for their product:

\displaystyle (f(X)g(X))^m= f(X)^mg(X)^m \equiv f(X^m)g(X^m).

Consequently, every exponent of the form m=\left(\frac np\right)^i p^j,\quad i,j\ge0, is Frobenius-like for every polynomial of the form

\displaystyle f(X)=\prod_{a=0}^{\ell}(X+a)^{e_a},\quad e_a\ge0.

This is the hidden rigidity created by acceptance. The algorithm checks only a short list of congruences. But those congruences propagate multiplicatively into a very large family of exponent identities.

Let \displaystyle H be the subgroup of \displaystyle (\mathbf Z/r\mathbf Z)^\times generated by the residue classes of p and \frac np. Every element of \displaystyle H can be represented by one of the nonnegative products \left(\frac np\right)^i p^j, . Write t=|H|. The element n=\frac np\cdot p lies in H . Hence the order of n modulo r is at most t : t\ge\text{ord}_r(n)>L^2. This lower bound is the reason AKS selected r using a large-order condition. The group H contains more than L^2 distinct residue classes, and each of them will later produce a distinct power of a primitive r th root of unity.

Before entering a finite field, observe that \ell<r<p. Indeed, t>L^2 and t\le\varphi(r)\le r , so r>L^2 . Therefore \ell\le\sqrt r L<r. The inequality p>r came from the gcd stage. Consequently the residues 0,1,\ldots,\ell are distinct modulo \displaystyle p . This modest-looking fact is needed later, because it guarantees that the polynomials X,X+1,\ldots,X+\ell are distinct linear factors over \mathbf F_p . The quotient ring modulo X^r-1 is useful for computation, but it is not the ideal setting for the final counting argument. We want to use the elementary fact that a nonzero polynomial of degree d over a field has at most d roots. For this reason, choose an irreducible factor h(X)\mid\Phi_r(X) over \mathbf F_p , where \Phi_r(X) is the r th cyclotomic polynomial. Form the finite field F=\mathbf F_p[X]/(h(X)).

Let \zeta denote the image of X in F . Because h divides the cyclotomic polynomial, \zeta is a primitive r th root of unity: \zeta^r=1, but \zeta^d\ne1 for 1\le d<r. The degree of this field extension is [F:\mathbf F_p]=\text{ord}_r(p). To see this, put d=\deg h . The field F has p^d elements, so every nonzero element satisfies z^{p^d-1}=1 . In particular, \zeta^{p^d}=\zeta, and therefore p^d\equiv1\pmod r. This proves that \text{ord}_r(p)\mid d . Conversely, if k=\text{ord}_r(p) , then \zeta^{p^k}=\zeta, so \zeta is a root of X^{p^k}-X . Its minimal polynomial h(X) divides this polynomial, giving d\le k . Hence d=k . By the choice of p , this degree exceeds 1 . Thus \zeta\notin\mathbf F_p. In particular none of the elements \zeta,\ \zeta+1,\ldots,\zeta+\ell is zero. If \zeta+a=0 for some a\in\mathbf F_p , then \zeta=-a would belong to \mathbf F_p , contrary to the extension degree being greater than one.

Define K to be the multiplicative subgroup of F^\times generated by these elements: K=\langle\zeta,\zeta+1,\ldots,\zeta+\ell\rangle. The remaining proof gives one lower and one upper bound for |K| . Consider the family of polynomials

\displaystyle \mathcal P= \big \{ \prod_{a=0}^{\ell}(X+a)^{e_a}:e_a\ge0 \big \}.

A polynomial in \mathcal P has degree e_0+e_1+\cdots+e_\ell. Because the linear factors X,X+1,\ldots,X+\ell are distinct over \mathbf F_p , unique factorization shows that distinct exponent vectors produce distinct polynomials. We claim that distinct members f,g\in \mathcal P of degree less than t remain distinct after evaluation at \zeta : f(\zeta)\ne g(\zeta). Suppose instead that f(\zeta)=g(\zeta). Let Q(X)=f(X)-g(X). Then Q is nonzero and has degree less than t . We will force it to have at least t distinct roots.

Take any residue class u\in H . Choose an exponent m=\left(\frac np\right)^i p^j whose reduction modulo r equals u . The earlier propagation argument says that m is Frobenius-like for both f and g . Therefore f(\zeta)^m=f(\zeta^m),~ g(\zeta)^m=g(\zeta^m). Since f(\zeta)=g(\zeta) , we deduce f(\zeta^m)=g(\zeta^m). But m\equiv u\pmod r and \zeta^r=1 , so \zeta^m=\zeta^u. Thus Q(\zeta^u)=0 for every \displaystyle u\in H . The elements \zeta^u are distinct as u runs through the t residue classes in H , because \zeta has exact order r . Hence Q has at least t distinct roots, while its degree is less than t . This is impossible.

Thus every product in \mathcal P of degree at most t-1 gives a different element of K . The number of choices of nonnegative integers e_0,\ldots,e_\ell satisfying e_0+\cdots+e_\ell\le t-1 is \binom{t+\ell}{t-1}. Therefore |K|\ge\binom{t+\ell}{t-1}. This is the large side of the contradiction. Acceptance makes the exponents in H propagate a putative equality f(\zeta)=g(\zeta) to too many roots. As a result, no collision among low-degree products is possible, and K must contain many elements.

Now suppose, toward a contradiction, that n is not a power of the chosen prime p . Consider the finite set

\displaystyle \widehat{\mathcal I}= \big \{ (\frac np)^i p^j: 0\le i,j\le\lfloor\sqrt t\rfloor \big\}.

All these integers are distinct. For if \left(\frac np\right)^i p^j= (\frac np)^{i'}p^{j'}, then comparing the prime factors other than p forces i=i' , because n has a prime factor different from p . Once i=i' , one gets j=j' . There are (\lfloor\sqrt t\rfloor+1)^2>t such integers, but only t residue classes in the group H . Hence two distinct elements m_1,m_2\in\widehat{\mathcal I} have the same residue modulo r . Arrange them so that m_1>m_2. Then \zeta^{m_1}=\zeta^{m_2}. Each generator \zeta+a of K satisfies

\displaystyle (\zeta+a)^{m_1}=(\zeta+a)^{m_2}.

Indeed, both exponents are Frobenius-like for X+a , so (\zeta+a)^{m_i}=\zeta^{m_i}+a, and the right sides agree because \zeta^{m_1}=\zeta^{m_2} . Since every generator has the same m_1 th and m_2 th power, every element y\in K does as well: y^{m_1}=y^{m_2}. Thus every element of K is a root of the nonzero polynomial Y^{m_1}-Y^{m_2}. Over a field, a nonzero polynomial of degree m_1 has at most m_1 roots. Hence |K|\le m_1. The size of m_1 is controlled by the way it was chosen. Since both exponents in its definition are at most \lfloor\sqrt t\rfloor ,

\displaystyle m_1 \le (\frac np)^{\lfloor\sqrt t\rfloor} p^{\lfloor\sqrt t\rfloor} =n^{\lfloor\sqrt t\rfloor}\le n^{\sqrt t}.

Therefore, under the assumption that n is not a power of \displaystyle p , |K|\le n^{\sqrt t}. This is the small side of the contradiction. The large group H forces many distinct products in K . But the pigeonhole collision among the exponents forces every element of K to satisfy one polynomial equation of relatively low degree.

Put s=\lfloor\sqrt t L\rfloor. Because t>L^2, we have \sqrt t>L, and hence t>\sqrt t L. Therefore t\ge s+1. Also H is a subgroup of (\mathbf Z/r\mathbf Z)^\times , so t\le\varphi(r). It follows that \ell= \left\lfloor\sqrt{\varphi(r)}L\right\rfloor \ge s. The lower bound for K therefore gives

\displaystyle |K| \ge\binom{t+\ell}{t-1}=  \binom{t+\ell}{\ell+1} \ge \binom{2s+1}{s}.

For s\ge2 , \binom{2s+1}{s}>2^{s+1}. One quick way to see this is to write \binom{2s+1}{s} =\prod_{j=1}^{s}\frac{s+1+j}{j}. The first factor is larger than 3 , and each remaining factor is larger than 2 , so the whole product exceeds 2^{s+1} . Since s+1>\sqrt t L, we obtain 2^{s+1}>2^{\sqrt t L}=n^{\sqrt t}. Combining the estimates yields |K|>n^{\sqrt t}.

But the upper-bound argument, under the assumption that n is not a power of p , gave |K|\le n^{\sqrt t}. This contradiction proves that an accepted n must be a power of the selected prime: n=p^e. The first step of the algorithm had already rejected all nontrivial perfect powers. Hence e=1 , so n=p.

Thus every integer accepted by the algorithm is prime. Together with the earlier observation that every prime passes, this proves correctness: The AKS algorithm outputs PRIME if and only if n is prime.

Correctness is not enough. The theorem says that the algorithm runs in time polynomial in L=\log_2 n , so the parameters r and \ell must themselves be polynomially bounded in L .

The key fact is that a suitable r exists with r=O(L^5). Here is the elementary reason. Put B=\lfloor L^2\rfloor and choose a large polynomial bound M=\lceil4L^5\rceil. Suppose, for contradiction, that every integer q\le M coprime to n satisfies \text{ord}_q(n)\le B. Then for each such q , there is some 1\le d\le B with n^d\equiv1\pmod q. Thus every such q divides the product P=\prod_{d=1}^{B}(n^d-1). The logarithm of this product is small: \log_2 P < (1+2+\cdots+B)L< \frac{B(B+1)}2L < L^5. On the other hand, the least common multiple of all integers q\le M coprime to n is much too large to divide P . A standard binomial-coefficient estimate gives \text{lcm}(1,2,\ldots,M)\ge\frac{2^{M-1}}{M}. Removing the prime powers whose underlying primes divide n loses at most a factor M^{\omega(n)} , where \omega(n) is the number of distinct prime factors of n . Since \omega(n)\le L, the least common multiple of the integers at most M that are coprime to n is at least \frac{2^{M-1}}{M^{L+1}}. For M=4L^5 , this is larger than 2^{L^5} once L is outside a finite range of small values, and those small inputs can be checked directly. Thus it cannot divide P , whose binary logarithm is below L^5 . The contradiction proves that some r\le4L^5 has \text{ord}_r(n)>L^2. Since the algorithm chooses the least such r , it follows that r=O(L^5). Consequently,

\displaystyle \ell =\lfloor\sqrt{\varphi(r)}L\rfloor \le \sqrt r L =O(L^{7/2}).

The algorithm therefore performs only polynomially many gcd calculations and polynomial congruence tests. Each polynomial calculation is performed in the quotient modulo X^r-1 , so its representatives have degree below r . Repeated squaring computes (X+a)^n\pmod{n,X^r-1} using O(L) polynomial multiplications. Even with naive cyclic polynomial multiplication, the degree and coefficient sizes are bounded by fixed powers of L . Hence every stage, and therefore the full algorithm, has bit complexity bounded by a polynomial in L .

The AKS theorem does not say that the short congruence (X+a)^n\equiv X^n+a\pmod{n,X^r-1} by itself recognizes primes. Composites can imitate it in isolated cases. The proof works because successful tests for a whole interval of a values create a large system of compatible Frobenius-like identities. Those identities spread from the two exponents p \quad\text{and}\quad\frac np to the large semigroup \big \{ (\frac np)^i p^j:i,j\ge0 \big\}. Modulo r , this semigroup contains a large group of exponents. In a finite field containing a primitive r th root of unity, the large group forces many products of \zeta,\zeta+1,\ldots,\zeta+\ell to be distinct. Yet, unless n is a power of a single prime, two exponents must collide modulo r , forcing every one of those products to satisfy the same low-degree polynomial equation. A set cannot simultaneously be that large and that small. That contradiction is the heart of AKS. The perfect-power check removes the one surviving possibility. What remains is exactly primality: PRIMES is in \mathbf P.

Leave a comment