QR algorithm

The QR algorithm is a method for finding eigenvalues. In the real symmetric case, the problem is especially clean. We are given $A=A^T\in\mathbb R^{n\times n}$ and we want to find numbers $\lambda_1,\dots,\lambda_n$ and orthonormal vectors $u_1,\dots,u_n$ such that $Au_i=\lambda_i u_i,\qquad i=1,\dots,n.$ . Equivalently, we want an orthogonal matrix $S=(u_1\ \cdots\ u_n)$ and a diagonal matrix $\Lambda$ such that $A=S\Lambda S^T.$

Thus the eigenvalue problem is, at heart, a problem of finding the right coordinate system. The matrix $A$ may look complicated in the standard coordinates, but if we rotate the coordinate axes so that they point in the eigenvector directions, then the same linear transformation becomes diagonal. In that special coordinate system, $A$ no longer mixes different coordinates; it merely stretches each coordinate axis by the corresponding eigenvalue.

The QR algorithm is an iterative procedure for discovering this coordinate system. Its guiding idea is simple: repeatedly replace the current matrix by an orthogonally similar one, in such a way that the coordinate axes are gradually rotated toward the eigenvector directions. If this process succeeds, the matrices produced by the algorithm become closer and closer to diagonal matrices. The eigenvalues then appear on the diagonal.

The important point is that the iterations in the algorithm do not change the eigenvalues. They change only the orthonormal basis in which the same linear transformation is being expressed.

The algorithm

Start with $A_0=A.$ . For $k=0,1,2,\dots$ , perform the following two steps:

1. Compute the QR factorization

$A_k=Q_kR_k,$

where $Q_k^TQ_k=I$ and $R_k$ is upper triangular.

2. Reverse the factors and define

$A_{k+1}=R_kQ_k.$

That is the unshifted QR algorithm.

At first glance, the instruction “factor $A_k$ and multiply the factors in the opposite order” looks like a trick. The first important observation is that it is not a trick. Since $A_k=Q_kR_k,$ we have $R_k=Q_k^TA_k.$ Therefore

$A_{k+1}=R_kQ_k=Q_k^TA_kQ_k.$

So each QR step is an orthogonal similarity transformation. In other words, $A_{k+1}$ is the same linear transformation as $A_k$ , but written in a new orthonormal coordinate system. This immediately explains why the eigenvalues are preserved.

Moreover, symmetry is preserved. IIf $A_k$ is symmetric, then so is $A_{k+1}$ , because $A_{k+1}^T=(Q_k^TA_kQ_k)^T=Q_k^TA_k^TQ_k=Q_k^TA_kQ_k=A_{k+1}.$ Thus, starting from a real symmetric matrix, every iterate remains real symmetric and has exactly the same eigenvalues as the original matrix.

The hope is that, as $k$ increases, these orthogonal changes of coordinates make the matrix less and less coupled off the diagonal. That is, we want ${\text{off}}(A_k)=A_k-{\text{diag}}(A_k)\to 0.$ When this happens, $A_k\approx {\text{diag}}(\lambda_1,\dots,\lambda_n)$ .

This is the basic picture: the QR algorithm rotates the coordinate system while preserving the spectrum. The off-diagonal entries measure how far the current coordinate axes are from being eigenvector directions. As those entries decay, the diagonal entries become increasingly accurate approximations to the eigenvalues.

Proof of Convergence

Define the accumulated orthogonal matrix $P_k=Q_0Q_1\cdots Q_{k-1},\qquad P_0=I.$

Since each QR step satisfies $A_{k+1}=Q_k^TA_kQ_k,$ unpacking gives

$A_k=P_k^TAP_k.$

This identity is the conceptual center of the QR algorithm. It says that $A_k$ is not a new linear transformation. It is the original transformation $A$ , but written in the moving orthonormal coordinate system whose axes are the columns of $P_k$ . Thus the QR algorithm is trying to rotate the coordinate system until the matrix stops mixing the coordinate directions. If the columns of $P_k$ become eigenvectors of $A$ , then

$P_k^TAP_k\to \Lambda.$

QR as simultaneous power iteration: The ordinary power method begins with a vector $v$ and repeatedly forms $A^kv.$ Suppose $Au_i=\lambda_i u_i$ and $|\lambda_1|>|\lambda_2|\geq\cdots\geq|\lambda_n|.$ If $v=c_1u_1+c_2u_2+\cdots+c_nu_n,$ then $A^kv=c_1\lambda_1^ku_1+c_2\lambda_2^ku_2+\cdots+c_n\lambda_n^ku_n.$ If $c_1\neq0$ , the first term eventually dominates. The vector $A^kv$ points more and more in the direction of $u_1$ . The error is governed by the ratio $\left|\frac{\lambda_2}{\lambda_1}\right|^k.$

The QR algorithm is the same idea applied to a whole basis at once. But there is a problem: if we repeatedly apply $A$ to many independent vectors, all of them tend to collapse toward the dominant eigenvector $u_1$ . QR prevents this collapse by orthogonalizing after every step. The first vector is pulled toward $u_1$ . The second vector is forced to remain orthogonal to the first, so it is pulled toward the strongest remaining direction $u_2$ . The third vector is then pulled toward $u_3$ , and so on. Thus QR is the power method together with a geometric exclusion principle: once a dominant direction has been captured, the later directions must search in its orthogonal complement. This explains why QR is naturally a flag algorithm. It does not merely approximate individual eigenvectors. It approximates the nested invariant subspaces

${\text{span}}(u_1)\subset {\text{span}}(u_1,u_2)\subset\cdots\subset {\text{span}}(u_1,\dots,u_n).$

The convergence of the first $r$ directions is controlled by the gap across the cut, $\left|\frac{\lambda_{r+1}}{\lambda_r}\right|.$ This is the first serious lesson of the algorithm: QR does not find all eigendirections at the same speed. Each part of the flag converges at a rate dictated by the neighboring eigenvalue ratio.

Theorem: Assume $A=A^T=S\Lambda S^T,$ where $S$ is orthogonal and $\Lambda={\text{diag}}(\lambda_1,\dots,\lambda_n).$ Assume the eigenvalue magnitudes are strictly separated:

$|\lambda_1|>|\lambda_2|>\cdots>|\lambda_n|>0.$

Finally, assume the generic nondegeneracy condition that $S^T$ has an LU factorization without row exchanges. Equivalently, the relevant leading principal minors are nonzero. This condition rules out exceptional starting alignments. It is not the main idea; it is a technical condition ensuring that the standard coordinate flag is not accidentally orthogonal to the eigenvector flag.

Then the QR algorithm, with the diagonal entries of $R_k$ chosen positive, satisfies $A_k\to \Lambda$ up to harmless sign conventions for the eigenvectors. More quantitatively, if $\rho=\max_{1\leq i<n}\left|\frac{\lambda_{i+1}}{\lambda_i}\right|<1,$ then there is a constant $C$ , depending on $A$ and the initial coordinate system, such that

$||A_k-\Lambda||_F\leq C\rho^k.$

The theorem says exactly what the power-method picture suggests. Powers of $A$ separate eigendirections according to their eigenvalue magnitudes, and the QR algorithm records this separation while continually re-orthonormalizing the basis.

Proof: The proof becomes natural once one stops looking at the single QR step and instead looks at the powers $A^k$ . The key fact is this: the accumulated matrix $P_k=Q_0Q_1\cdots Q_{k-1}$ is precisely the orthogonal factor in the QR factorization of $A^k$ . More precisely, there exists an upper triangular matrix $T_k$ such that

$A^k=P_kT_k.$

This is the bridge between QR iteration and the power method. Let us prove it. For $k=1$ , the statement is just the first QR factorization, $A=Q_0R_0.$ Now suppose that, for some $k$ , $A^k=P_kT_k,$ with $T_k$ upper triangular. Since $A_k=P_k^TAP_k,$ we may rewrite this as $AP_k=P_kA_k.$ Therefore

$A^{k+1}=AA^k=AP_kT_k=P_kA_kT_k.$

But the $k$ -th QR factorization gives $A_k=Q_kR_k.$ Hence $A^{k+1}=P_kQ_kR_kT_k=P_{k+1}T_{k+1},$ where $P_{k+1}=P_kQ_k$ and $T_{k+1}=R_kT_k.$ Since a product of upper triangular matrices is upper triangular, $T_{k+1}$ is upper triangular. This completes the induction.

Thus the QR algorithm is not an arbitrary iteration. It is an efficient recursive way of computing the QR factorizations of $A,\ A^2,\ A^3,\dots.$ This observation explains why convergence should occur. Powers of $A$ separate eigendirections according to the sizes of the eigenvalues, and the QR factorization extracts an orthonormal basis from those increasingly separated directions.

Now use the spectral decomposition $A=S\Lambda S^T.$ Then $A^k=S\Lambda^kS^T.$ Assume that $|\lambda_1|>|\lambda_2|>\cdots>|\lambda_n|>0,$ and assume the generic condition that $S^T$ has an LU factorization without row exchanges. Thus we may write $S^T=LU,$ where $L$ is unit lower triangular and $U$ is upper triangular. Substituting this into the formula for $A^k$ gives $A^k=S\Lambda^kLU.$ Now insert $I=\Lambda^{-k}\Lambda^k$ between $L$ and $U$ : $A^k=S(\Lambda^kL\Lambda^{-k})\Lambda^kU.$

Define $L_k=\Lambda^kL\Lambda^{-k}.$ Then $A^k=SL_k(\Lambda^kU).$ The factor $\Lambda^kU$ is upper triangular. Thus, up to harmless diagonal sign conventions, the orthogonal factor in the QR factorization of $A^k$ is the orthogonal factor of $SL_k$ . So the whole question reduces to understanding $L_k$ . Write $L=(l_{ij}).$ Since $L$ is lower triangular, its only possible nonzero entries below the diagonal occur when $i>j$ .

For such entries, $(L_k)_{ij}=l_{ij}\left(\frac{\lambda_i}{\lambda_j}\right)^k.$

This formula is the heart of the proof. Because the eigenvalues have been ordered by decreasing magnitude, whenever $i>j$ we have $|\lambda_i|<|\lambda_j|.$ Therefore $\left|\frac{\lambda_i}{\lambda_j}\right|<1.$ It follows that every strictly lower-triangular entry of $L_k$ tends to zero. Since $L$ has ones on the diagonal, the diagonal of $L_k$ is still equal to one. Hence $L_k\to I.$

More quantitatively, $\displaystyle |L_k-I|_F^2=\sum_{i>j}| l_{ij}|^2\left|\frac{\lambda_i}{\lambda_j}\right|^{2k}.$ The slowest decay comes from the largest ratio $|\lambda_i/\lambda_j|$ with $i>j$ . Under the ordering $|\lambda_1|>|\lambda_2|>\cdots>|\lambda_n|,$ this worst adjacent ratio is $\rho=\max_{1\leq i<n}\left|\frac{\lambda_{i+1}}{\lambda_i}\right|.$ Thus there is a constant $C_L$ such that $|L_k-I|_F\leq C_L\rho^k.$

Now return to $A^k=SL_k(\Lambda^kU).$ Since $L_k\to I$ , the matrix $SL_k$ tends to $S$ . Taking QR factorizations is a smooth operation near a nonsingular matrix whose leading principal minors do not vanish. Therefore the orthogonal factor of $SL_k$ tends to the orthogonal factor of $S$ , namely $S$ itself, up to possible sign changes of columns. Consequently, $|P_k-SD_k|_F\leq C_P\rho^k,$ where $D_k$ is a diagonal matrix with diagonal entries $\pm1$ . These signs are harmless: changing the sign of an eigenvector does not change the diagonalization.

Finally, recall that $A_k=P_k^TAP_k.$ Since $P_k$ approaches $S$ up to signs, we obtain $A_k\to S^TAS=\Lambda.$ This proves convergence.The proof also reveals the rate. The error is carried by the lower-triangular part of $L_k$ , and each entry of that lower-triangular part is multiplied by a factor $\left(\frac{\lambda_i}{\lambda_j}\right)^k,\qquad i>j.$ Thus the convergence is controlled by ratios of eigenvalue magnitudes. The worst ratio is $\rho=\max_{1\leq i<n}\left|\frac{\lambda_{i+1}}{\lambda_i}\right|,$ and therefore the basic global convergence rate is $O(\rho^k).$

This is the essential mechanism: powers of $A$ magnify the dominant eigendirections faster than the weaker ones; QR factorization extracts the resulting orthonormal frame; and the amount of remaining contamination decays like eigenvalue ratios.

The practical symmetric QR algorithm

The QR algorithm described above is the conceptual algorithm. The practical algorithm for a dense real symmetric matrix uses four refinements: tridiagonalization, implicit QR steps, shifts, deflation.

Each refinement keeps the same basic principle: replace the matrix by an orthogonally similar matrix, so the eigenvalues are unchanged.

Tridiagonalization

Given a dense real symmetric matrix $A$ , first reduce it to tridiagonal form: $T_0=H^TAH,$ where $H$ is orthogonal. Thus $T_0$ has the same eigenvalues as $A$ . The matrix $T_0$ has the form

$T_0=\begin{pmatrix} a_1 & b_1 & 0 & \cdots & 0\\ b_1 & a_2 & b_2 & \ddots & \vdots\\ 0 & b_2 & a_3 & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & b_{n-1}\\ 0 & \cdots & 0 & b_{n-1} & a_n \end{pmatrix}.$

The reduction is done by Householder reflectors $H_j=I-2u_ju_j^T,\qquad |u_j|_2=1.$ At step $j$ , the reflector $H_j$ is chosen so that $A\mapsto H_jAH_j$ zeros out the entries below the first subdiagonal in column $j$ . Because the reflector is applied on both sides, symmetry is preserved. After $n-2$ such steps, the matrix is tridiagonal.

This is the decisive practical reduction. A dense symmetric matrix has $O(n^2)$ entries, while a tridiagonal matrix has only $2n-1$ relevant entries. The eigenvalue problem is unchanged, but the later QR steps become much cheaper.

2. Shifts

On an active tridiagonal block, choose a shift $\mu_k$ and compute $T_k-\mu_kI=Q_kR_k.$ Then set $T_{k+1}=R_kQ_k+\mu_kI.$ Equivalently, $T_{k+1}=Q_k^TT_kQ_k.$ So a shifted QR step is still an orthogonal similarity transformation. The eigenvalues are not changed. The role of the shift is to improve the convergence rate. If $\mu_k$ is close to an eigenvalue near the bottom of the active block, then the corresponding bottom subdiagonal entry tends to shrink much faster. The shift will change the convergenve as follows.

$|b_{n-1}^{(k+1)}|\lesssim C|b_{n-1}^{(k)}|\left|\frac{\lambda_n-\mu_k}{\lambda_{n-1}-\mu_k}\right|.$ Thus, if $\mu_k$ is close to $\lambda_n$ but not close to $\lambda_{n-1}$ , the ratio is small and the bottom subdiagonal decays rapidly.

A simple choice is the Rayleigh shift, $\mu_k=(T_k)_{nn}.$ This is because we are assuming for large enough k, this entry is already to close to an eigenvalue so that above argument shows the shift improve the covnergence.

A better practical choice is the Wilkinson shift. It uses the eigenvalue of the trailing $2\times2$ block $\begin{pmatrix} a_{n-1} & b_{n-1}\\ b_{n-1} & a_n \end{pmatrix}$ that is closer to $a_n$ .

3. The implicit QR step

One could form $Q_k$ and $R_k$ explicitly, but that would destroy the efficiency gained from tridiagonalization. Instead, the practical algorithm uses an implicit QR step. Start with the shifted tridiagonal matrix $T_k-\mu_kI$ . Its first column has only two possibly nonzero entries: $\begin{pmatrix} a_1-\mu_k\\ b_1 \end{pmatrix}.$

Choose a Givens rotation $G_1$ acting on coordinates $1,2$ that zeros the second entry. Apply it as a similarity transformation: $T_k\mapsto G_1^TT_kG_1.$ This creates one extra nonzero entry just outside the tridiagonal band. This extra entry is called a bulge. Next choose a Givens rotation $G_2$ acting on coordinates $2,3$ to remove the bulge. This creates a new bulge one step lower. Continue in this way: $G_1,\ G_2,\ \dots,\ G_{n-1}.$ The bulge is chased down the matrix until it disappears at the bottom. The final matrix is again symmetric tridiagonal. Thus the implicit step performs the same shifted QR similarity transformation, $T_{k+1}=Q_k^TT_kQ_k,$ but without forming the full matrix $Q_k$ . It uses only local rotations on neighboring coordinates. This is why one QR step on a tridiagonal matrix costs only $O(n)$ operations.

4. Deflation

As the QR iteration proceeds, some subdiagonal entries become very small. If $b_i^{(k)}$ is numerically negligible. We set $b_i^{(k)}=0.$ The tridiagonal matrix then splits into two smaller independent blocks: $T_k=\begin{pmatrix} T_1 & 0\\ 0 & T_2 \end{pmatrix}.$ The eigenvalue problem has now separated into two smaller eigenvalue problems. This is deflation.

In summary, the practical symmetric QR algorithm proceeds as follows. Starting with a dense real symmetric matrix $A$ , one first makes a single orthogonal reduction to tridiagonal form, $T_0=H^TAH.$ Then, on each active tridiagonal block, one chooses a shift $\mu_k$ and performs an implicit shifted QR step, implemented by Givens rotations and bulge chasing, so that $T_{k+1}=Q_k^TT_kQ_k.$ Whenever a subdiagonal entry becomes negligible, it is set to zero and the matrix deflates into smaller independent blocks. The process continues until every active block is $1\times1$ . The remaining diagonal entries are the computed eigenvalues. Thus tridiagonalization makes the problem sparse, implicit QR preserves that sparsity, shifts accelerate convergence, and deflation makes the problem shrink.

Share this:

Related

Leave a comment Cancel reply