QR algorithm

The QR algorithm is a method for finding eigenvalues. In the real symmetric case, the problem is especially clean. We are given A=A^T\in\mathbb R^{n\times n} and we want to find numbers \lambda_1,\dots,\lambda_n and orthonormal vectors u_1,\dots,u_n such that Au_i=\lambda_i u_i,\qquad i=1,\dots,n. . Equivalently, we want an orthogonal matrix S=(u_1\ \cdots\ u_n) and a diagonal matrix \Lambda such that A=S\Lambda S^T.

Thus the eigenvalue problem is, at heart, a problem of finding the right coordinate system. The matrix A may look complicated in the standard coordinates, but if we rotate the coordinate axes so that they point in the eigenvector directions, then the same linear transformation becomes diagonal. In that special coordinate system, A no longer mixes different coordinates; it merely stretches each coordinate axis by the corresponding eigenvalue.

The QR algorithm is an iterative procedure for discovering this coordinate system. Its guiding idea is simple: repeatedly replace the current matrix by an orthogonally similar one, in such a way that the coordinate axes are gradually rotated toward the eigenvector directions. If this process succeeds, the matrices produced by the algorithm become closer and closer to diagonal matrices. The eigenvalues then appear on the diagonal.

The important point is that the iterations in the algorithm do not change the eigenvalues. They change only the orthonormal basis in which the same linear transformation is being expressed.

The algorithm

Start with A_0=A. . For k=0,1,2,\dots , perform the following two steps:

1. Compute the QR factorization

A_k=Q_kR_k,

where Q_k^TQ_k=I and R_k is upper triangular.

2. Reverse the factors and define

A_{k+1}=R_kQ_k.

That is the unshifted QR algorithm.

At first glance, the instruction “factor A_k and multiply the factors in the opposite order” looks like a trick. The first important observation is that it is not a trick. Since A_k=Q_kR_k, we have R_k=Q_k^TA_k. Therefore

A_{k+1}=R_kQ_k=Q_k^TA_kQ_k.

So each QR step is an orthogonal similarity transformation. In other words, A_{k+1} is the same linear transformation as A_k , but written in a new orthonormal coordinate system. This immediately explains why the eigenvalues are preserved.

Moreover, symmetry is preserved. IIf A_k is symmetric, then so is A_{k+1} , because A_{k+1}^T=(Q_k^TA_kQ_k)^T=Q_k^TA_k^TQ_k=Q_k^TA_kQ_k=A_{k+1}. Thus, starting from a real symmetric matrix, every iterate remains real symmetric and has exactly the same eigenvalues as the original matrix.

The hope is that, as k increases, these orthogonal changes of coordinates make the matrix less and less coupled off the diagonal. That is, we want {\text{off}}(A_k)=A_k-{\text{diag}}(A_k)\to 0. When this happens, A_k\approx {\text{diag}}(\lambda_1,\dots,\lambda_n) .

This is the basic picture: the QR algorithm rotates the coordinate system while preserving the spectrum. The off-diagonal entries measure how far the current coordinate axes are from being eigenvector directions. As those entries decay, the diagonal entries become increasingly accurate approximations to the eigenvalues.

Proof of Convergence

Define the accumulated orthogonal matrix P_k=Q_0Q_1\cdots Q_{k-1},\qquad P_0=I.

Since each QR step satisfies A_{k+1}=Q_k^TA_kQ_k, unpacking gives

A_k=P_k^TAP_k.

This identity is the conceptual center of the QR algorithm. It says that A_k is not a new linear transformation. It is the original transformation A , but written in the moving orthonormal coordinate system whose axes are the columns of P_k . Thus the QR algorithm is trying to rotate the coordinate system until the matrix stops mixing the coordinate directions. If the columns of P_k become eigenvectors of A , then

P_k^TAP_k\to \Lambda.

QR as simultaneous power iteration: The ordinary power method begins with a vector v and repeatedly forms A^kv. Suppose Au_i=\lambda_i u_i and |\lambda_1|>|\lambda_2|\geq\cdots\geq|\lambda_n|. If v=c_1u_1+c_2u_2+\cdots+c_nu_n, then A^kv=c_1\lambda_1^ku_1+c_2\lambda_2^ku_2+\cdots+c_n\lambda_n^ku_n. If c_1\neq0 , the first term eventually dominates. The vector A^kv points more and more in the direction of u_1 . The error is governed by the ratio \left|\frac{\lambda_2}{\lambda_1}\right|^k.

The QR algorithm is the same idea applied to a whole basis at once. But there is a problem: if we repeatedly apply A to many independent vectors, all of them tend to collapse toward the dominant eigenvector u_1 . QR prevents this collapse by orthogonalizing after every step. The first vector is pulled toward u_1 . The second vector is forced to remain orthogonal to the first, so it is pulled toward the strongest remaining direction u_2 . The third vector is then pulled toward u_3 , and so on. Thus QR is the power method together with a geometric exclusion principle: once a dominant direction has been captured, the later directions must search in its orthogonal complement. This explains why QR is naturally a flag algorithm. It does not merely approximate individual eigenvectors. It approximates the nested invariant subspaces

{\text{span}}(u_1)\subset {\text{span}}(u_1,u_2)\subset\cdots\subset {\text{span}}(u_1,\dots,u_n).

The convergence of the first r directions is controlled by the gap across the cut, \left|\frac{\lambda_{r+1}}{\lambda_r}\right|. This is the first serious lesson of the algorithm: QR does not find all eigendirections at the same speed. Each part of the flag converges at a rate dictated by the neighboring eigenvalue ratio.

Theorem: Assume A=A^T=S\Lambda S^T, where S is orthogonal and \Lambda={\text{diag}}(\lambda_1,\dots,\lambda_n). Assume the eigenvalue magnitudes are strictly separated:

|\lambda_1|>|\lambda_2|>\cdots>|\lambda_n|>0.

Finally, assume the generic nondegeneracy condition that S^T has an LU factorization without row exchanges. Equivalently, the relevant leading principal minors are nonzero. This condition rules out exceptional starting alignments. It is not the main idea; it is a technical condition ensuring that the standard coordinate flag is not accidentally orthogonal to the eigenvector flag.

Then the QR algorithm, with the diagonal entries of R_k chosen positive, satisfies A_k\to \Lambda up to harmless sign conventions for the eigenvectors. More quantitatively, if \rho=\max_{1\leq i<n}\left|\frac{\lambda_{i+1}}{\lambda_i}\right|<1, then there is a constant C , depending on A and the initial coordinate system, such that

||A_k-\Lambda||_F\leq C\rho^k.

The theorem says exactly what the power-method picture suggests. Powers of A separate eigendirections according to their eigenvalue magnitudes, and the QR algorithm records this separation while continually re-orthonormalizing the basis.

Proof: The proof becomes natural once one stops looking at the single QR step and instead looks at the powers A^k . The key fact is this: the accumulated matrix P_k=Q_0Q_1\cdots Q_{k-1} is precisely the orthogonal factor in the QR factorization of A^k . More precisely, there exists an upper triangular matrix T_k such that

A^k=P_kT_k.

This is the bridge between QR iteration and the power method. Let us prove it. For k=1 , the statement is just the first QR factorization, A=Q_0R_0. Now suppose that, for some k , A^k=P_kT_k, with T_k upper triangular. Since A_k=P_k^TAP_k, we may rewrite this as AP_k=P_kA_k. Therefore

A^{k+1}=AA^k=AP_kT_k=P_kA_kT_k.

But the k -th QR factorization gives A_k=Q_kR_k. Hence A^{k+1}=P_kQ_kR_kT_k=P_{k+1}T_{k+1}, where P_{k+1}=P_kQ_k and T_{k+1}=R_kT_k. Since a product of upper triangular matrices is upper triangular, T_{k+1} is upper triangular. This completes the induction.

Thus the QR algorithm is not an arbitrary iteration. It is an efficient recursive way of computing the QR factorizations of A,\ A^2,\ A^3,\dots. This observation explains why convergence should occur. Powers of A separate eigendirections according to the sizes of the eigenvalues, and the QR factorization extracts an orthonormal basis from those increasingly separated directions.

Now use the spectral decomposition A=S\Lambda S^T. Then A^k=S\Lambda^kS^T. Assume that |\lambda_1|>|\lambda_2|>\cdots>|\lambda_n|>0, and assume the generic condition that S^T has an LU factorization without row exchanges. Thus we may write S^T=LU, where L is unit lower triangular and U is upper triangular. Substituting this into the formula for A^k gives A^k=S\Lambda^kLU. Now insert I=\Lambda^{-k}\Lambda^k between L and U : A^k=S(\Lambda^kL\Lambda^{-k})\Lambda^kU.

Define L_k=\Lambda^kL\Lambda^{-k}. Then A^k=SL_k(\Lambda^kU). The factor \Lambda^kU is upper triangular. Thus, up to harmless diagonal sign conventions, the orthogonal factor in the QR factorization of A^k is the orthogonal factor of SL_k . So the whole question reduces to understanding L_k . Write L=(l_{ij}). Since L is lower triangular, its only possible nonzero entries below the diagonal occur when i>j .

For such entries, (L_k)_{ij}=l_{ij}\left(\frac{\lambda_i}{\lambda_j}\right)^k.

This formula is the heart of the proof. Because the eigenvalues have been ordered by decreasing magnitude, whenever i>j we have |\lambda_i|<|\lambda_j|. Therefore \left|\frac{\lambda_i}{\lambda_j}\right|<1. It follows that every strictly lower-triangular entry of L_k tends to zero. Since L has ones on the diagonal, the diagonal of L_k is still equal to one. Hence L_k\to I.

More quantitatively, \displaystyle |L_k-I|_F^2=\sum_{i>j}| l_{ij}|^2\left|\frac{\lambda_i}{\lambda_j}\right|^{2k}. The slowest decay comes from the largest ratio |\lambda_i/\lambda_j| with i>j . Under the ordering |\lambda_1|>|\lambda_2|>\cdots>|\lambda_n|, this worst adjacent ratio is \rho=\max_{1\leq i<n}\left|\frac{\lambda_{i+1}}{\lambda_i}\right|. Thus there is a constant C_L such that |L_k-I|_F\leq C_L\rho^k.

Now return to A^k=SL_k(\Lambda^kU). Since L_k\to I , the matrix SL_k tends to S . Taking QR factorizations is a smooth operation near a nonsingular matrix whose leading principal minors do not vanish. Therefore the orthogonal factor of SL_k tends to the orthogonal factor of S , namely S itself, up to possible sign changes of columns. Consequently, |P_k-SD_k|_F\leq C_P\rho^k, where D_k is a diagonal matrix with diagonal entries \pm1 . These signs are harmless: changing the sign of an eigenvector does not change the diagonalization.

Finally, recall that A_k=P_k^TAP_k. Since P_k approaches S up to signs, we obtain A_k\to S^TAS=\Lambda. This proves convergence.The proof also reveals the rate. The error is carried by the lower-triangular part of L_k , and each entry of that lower-triangular part is multiplied by a factor \left(\frac{\lambda_i}{\lambda_j}\right)^k,\qquad i>j. Thus the convergence is controlled by ratios of eigenvalue magnitudes. The worst ratio is \rho=\max_{1\leq i<n}\left|\frac{\lambda_{i+1}}{\lambda_i}\right|, and therefore the basic global convergence rate is O(\rho^k).

This is the essential mechanism: powers of A magnify the dominant eigendirections faster than the weaker ones; QR factorization extracts the resulting orthonormal frame; and the amount of remaining contamination decays like eigenvalue ratios.

The practical symmetric QR algorithm

The QR algorithm described above is the conceptual algorithm. The practical algorithm for a dense real symmetric matrix uses four refinements: tridiagonalization, implicit QR steps, shifts, deflation.

    Each refinement keeps the same basic principle: replace the matrix by an orthogonally similar matrix, so the eigenvalues are unchanged.

    1. Tridiagonalization

    Given a dense real symmetric matrix A , first reduce it to tridiagonal form: T_0=H^TAH, where H is orthogonal. Thus T_0 has the same eigenvalues as A . The matrix T_0 has the form

    T_0=\begin{pmatrix} a_1 & b_1 & 0 & \cdots & 0\\ b_1 & a_2 & b_2 & \ddots & \vdots\\ 0 & b_2 & a_3 & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & b_{n-1}\\ 0 & \cdots & 0 & b_{n-1} & a_n \end{pmatrix}.

    The reduction is done by Householder reflectors H_j=I-2u_ju_j^T,\qquad |u_j|_2=1. At step j , the reflector H_j is chosen so that A\mapsto H_jAH_j zeros out the entries below the first subdiagonal in column j . Because the reflector is applied on both sides, symmetry is preserved. After n-2 such steps, the matrix is tridiagonal.

    This is the decisive practical reduction. A dense symmetric matrix has O(n^2) entries, while a tridiagonal matrix has only 2n-1 relevant entries. The eigenvalue problem is unchanged, but the later QR steps become much cheaper.

    2. Shifts

    On an active tridiagonal block, choose a shift \mu_k and compute T_k-\mu_kI=Q_kR_k. Then set T_{k+1}=R_kQ_k+\mu_kI. Equivalently, T_{k+1}=Q_k^TT_kQ_k. So a shifted QR step is still an orthogonal similarity transformation. The eigenvalues are not changed. The role of the shift is to improve the convergence rate. If \mu_k is close to an eigenvalue near the bottom of the active block, then the corresponding bottom subdiagonal entry tends to shrink much faster. The shift will change the convergenve as follows.

    |b_{n-1}^{(k+1)}|\lesssim C|b_{n-1}^{(k)}|\left|\frac{\lambda_n-\mu_k}{\lambda_{n-1}-\mu_k}\right|. Thus, if \mu_k is close to \lambda_n but not close to \lambda_{n-1} , the ratio is small and the bottom subdiagonal decays rapidly.

    A simple choice is the Rayleigh shift, \mu_k=(T_k)_{nn}. This is because we are assuming for large enough k, this entry is already to close to an eigenvalue so that above argument shows the shift improve the covnergence.

    A better practical choice is the Wilkinson shift. It uses the eigenvalue of the trailing 2\times2 block \begin{pmatrix} a_{n-1} & b_{n-1}\\ b_{n-1} & a_n \end{pmatrix} that is closer to a_n .

    3. The implicit QR step

    One could form Q_k and R_k explicitly, but that would destroy the efficiency gained from tridiagonalization. Instead, the practical algorithm uses an implicit QR step. Start with the shifted tridiagonal matrix T_k-\mu_kI . Its first column has only two possibly nonzero entries: \begin{pmatrix} a_1-\mu_k\\ b_1 \end{pmatrix}.

    Choose a Givens rotation G_1 acting on coordinates 1,2 that zeros the second entry. Apply it as a similarity transformation: T_k\mapsto G_1^TT_kG_1. This creates one extra nonzero entry just outside the tridiagonal band. This extra entry is called a bulge. Next choose a Givens rotation G_2 acting on coordinates 2,3 to remove the bulge. This creates a new bulge one step lower. Continue in this way: G_1,\ G_2,\ \dots,\ G_{n-1}. The bulge is chased down the matrix until it disappears at the bottom. The final matrix is again symmetric tridiagonal. Thus the implicit step performs the same shifted QR similarity transformation, T_{k+1}=Q_k^TT_kQ_k, but without forming the full matrix Q_k . It uses only local rotations on neighboring coordinates. This is why one QR step on a tridiagonal matrix costs only O(n) operations.

    4. Deflation

    As the QR iteration proceeds, some subdiagonal entries become very small. If b_i^{(k)} is numerically negligible. We set b_i^{(k)}=0. The tridiagonal matrix then splits into two smaller independent blocks: T_k=\begin{pmatrix} T_1 & 0\\ 0 & T_2 \end{pmatrix}. The eigenvalue problem has now separated into two smaller eigenvalue problems. This is deflation.

    In summary, the practical symmetric QR algorithm proceeds as follows. Starting with a dense real symmetric matrix A , one first makes a single orthogonal reduction to tridiagonal form, T_0=H^TAH. Then, on each active tridiagonal block, one chooses a shift \mu_k and performs an implicit shifted QR step, implemented by Givens rotations and bulge chasing, so that T_{k+1}=Q_k^TT_kQ_k. Whenever a subdiagonal entry becomes negligible, it is set to zero and the matrix deflates into smaller independent blocks. The process continues until every active block is 1\times1 . The remaining diagonal entries are the computed eigenvalues. Thus tridiagonalization makes the problem sparse, implicit QR preserves that sparsity, shifts accelerate convergence, and deflation makes the problem shrink.

    Leave a comment