
QR algorithm

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.
Algorithm to calculate eigenvalues

In numerical linear algebra, the QR algorithm or QR iteration is an eigenvalue algorithm: that is, a procedure to calculate the eigenvalues and eigenvectors of a matrix. The QR algorithm was developed in the late 1950s by John G. F. Francis and by Vera N. Kublanovskaya, working independently. The basic idea is to perform a QR decomposition, writing the matrix as a product of an orthogonal matrix and an upper triangular matrix, multiply the factors in the reverse order, and iterate.

The practical QR algorithm

Formally, let $A$ be a real matrix of which we want to compute the eigenvalues, and let $A_0 := A$. At the $k$-th step (starting with $k = 0$), we compute the QR decomposition $A_k = Q_k R_k$, where $Q_k$ is an orthogonal matrix (i.e., $Q_k^{\mathsf T} = Q_k^{-1}$) and $R_k$ is an upper triangular matrix. We then form $A_{k+1} = R_k Q_k$. Note that
$$A_{k+1} = R_k Q_k = Q_k^{-1} Q_k R_k Q_k = Q_k^{-1} A_k Q_k = Q_k^{\mathsf T} A_k Q_k,$$
so all the $A_k$ are similar and hence they have the same eigenvalues. The algorithm is numerically stable because it proceeds by orthogonal similarity transforms.
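The iteration can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a practical implementation: it assumes a real matrix whose eigenvalues have distinct absolute values (so that the plain iteration converges), uses `numpy.linalg.qr` for each factorization, and runs a fixed, arbitrarily chosen number of steps. The function name `qr_algorithm_basic` is hypothetical.

```python
import numpy as np

def qr_algorithm_basic(A, iterations=500):
    """Unshifted QR iteration: factor A_k = Q_k R_k, then form A_{k+1} = R_k Q_k.

    Teaching sketch only (hypothetical helper): no Hessenberg reduction,
    no shifts, no deflation, and a fixed step count instead of a convergence test.
    """
    Ak = np.array(A, dtype=float)
    for _ in range(iterations):
        Q, R = np.linalg.qr(Ak)  # A_k = Q_k R_k
        Ak = R @ Q               # A_{k+1} = R_k Q_k = Q_k^T A_k Q_k
    return Ak

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 1.0]])
Ak = qr_algorithm_basic(A)
print(np.diag(Ak))            # approximate eigenvalues
print(np.linalg.eigvalsh(A))  # reference values from LAPACK, in ascending order
```

For this symmetric example the iterates approach a diagonal matrix whose entries are ordered by decreasing absolute value. Without shifts, convergence is slow whenever two eigenvalues have nearly equal absolute values.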

Under certain conditions, the matrices $A_k$ converge to a triangular matrix, the Schur form of $A$. The eigenvalues of a triangular matrix are listed on the diagonal, and the eigenvalue problem is solved. In testing for convergence it is impractical to require exact zeros, but the Gershgorin circle theorem provides a bound on the error.
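One way to turn the Gershgorin bound into a stopping rule is sketched below. It assumes the symmetric case, where the iterates approach a diagonal matrix and every row-wise Gershgorin radius $r_i = \sum_{j \ne i} |a_{ij}|$ tends to zero; once all radii are below a tolerance, every eigenvalue is known to lie within that tolerance of some diagonal entry. For a nonsymmetric matrix converging to triangular form, the row sums also contain the upper-triangular entries, so this test would not apply as written. The helper names and the tolerance are illustrative assumptions.

```python
import numpy as np

def gershgorin_radii(M):
    """Row-wise Gershgorin radii r_i = sum over j != i of |m_ij|."""
    M = np.asarray(M, dtype=float)
    return np.abs(M).sum(axis=1) - np.abs(np.diag(M))

def qr_until_small_discs(A, tol=1e-10, max_iter=10_000):
    """Unshifted QR steps until every Gershgorin radius is below tol (symmetric A).

    Hypothetical helper. At that point every eigenvalue of A lies within tol of
    some diagonal entry (up to rounding in the similarity transforms).
    """
    Ak = np.array(A, dtype=float)
    for k in range(max_iter):
        if np.all(gershgorin_radii(Ak) < tol):
            return Ak, k
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    raise RuntimeError("radii did not drop below tol within max_iter steps")

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 1.0]])
Ak, steps = qr_until_small_discs(A)
radii = gershgorin_radii(Ak)
print(steps, np.diag(Ak), radii.max())
```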

Using Hessenberg form

In the above crude form the iterations are relatively expensive, since each step requires a full QR decomposition, which costs $\mathcal{O}(n^3)$ operations for a dense matrix. This can be mitigated by first bringing the matrix $A$ to upper Hessenberg form (which costs $\tfrac{10}{3}n^3 + \mathcal{O}(n^2)$ arithmetic operations using a technique based on Householder reduction), with a finite sequence of orthogonal similarity transforms, somewhat like a two-sided QR decomposition. (For QR decomposition, the Householder reflectors are multiplied only on the left, but for the Hessenberg case they are multiplied on both left and right.) Determining the QR decomposition of an upper Hessenberg matrix costs $6n^2 + \mathcal{O}(n)$ arithmetic operations. Moreover, because the Hessenberg form is already nearly upper triangular (it has at most one nonzero entry below each diagonal entry), using it as a starting point reduces the number of steps required for convergence of the QR algorithm.
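The two-sided Householder reduction can be sketched as follows. This is a plain NumPy illustration assuming a real square input; the function name `hessenberg_householder` is hypothetical, and production code would typically rely on LAPACK instead (SciPy, for instance, exposes a Hessenberg reduction as `scipy.linalg.hessenberg`). Each step applies the same reflector $P_k = I - 2vv^{\mathsf T}$ on the left and on the right, so $H = Q^{\mathsf T} A Q$ stays similar to $A$.

```python
import numpy as np

def hessenberg_householder(A):
    """Return (H, Q) with H = Q^T A Q upper Hessenberg and Q orthogonal.

    Hypothetical helper. Step k applies P_k = I - 2 v v^T (acting on indices
    k+1..n-1) on both sides, zeroing column k below the subdiagonal.
    """
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = H[k + 1:, k].copy()
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue                          # already zero below the subdiagonal
        v = x.copy()
        v[0] += np.copysign(alpha, x[0])      # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        H[k + 1:, :] -= 2.0 * np.outer(v, v @ H[k + 1:, :])  # left:  P_k H
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)  # right: H P_k
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)  # accumulate Q P_k
    return H, Q

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
H, Q = hessenberg_householder(A)
print(np.allclose(np.tril(H, -2), 0.0), np.allclose(Q.T @ A @ Q, H))

S = A + A.T                               # symmetric input
T, _ = hessenberg_householder(S)
print(np.allclose(np.triu(T, 2), 0.0))    # symmetric Hessenberg => tridiagonal
```

The last lines illustrate the symmetric case: the same routine then returns a tridiagonal matrix, up to rounding.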

If the original matrix is symmetric, then the upper Hessenberg matrix is also symmetric and thus tridiagonal, and so are all the $A_k$. In this case reaching Hessenberg form costs $\tfrac{4}{3}n^3 + \mathcal{O}(n^2)$ arithmetic operations using a technique based on Householder reduction. Determining the QR decomposition of a symmetric tridiagonal matrix costs $\mathcal{O}(n)$ operations.
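The $\mathcal{O}(n)$ cost of a QR step on a symmetric tridiagonal matrix comes from the fact that each Givens rotation touches only a constant number of entries. The sketch below, a hypothetical helper with 0-based indices, stores the matrix as its diagonal `d` and off-diagonal `e`, keeps only the two diagonals of $R$ that are needed, and relies on $RQ + \mu I$ being symmetric tridiagonal (true in exact arithmetic) so that only its diagonal and subdiagonal are computed. It omits the safeguards a library routine would need.

```python
import math
import numpy as np

def tridiagonal_qr_step(d, e, mu=0.0):
    """One explicit shifted QR step on a symmetric tridiagonal matrix (sketch).

    d: diagonal (length n), e: off-diagonal (length n-1). Returns the diagonal
    and off-diagonal of R Q + mu I, where T - mu I = Q R is built from Givens
    rotations G_i = [[c, -s], [s, c]] on rows i, i+1. Each loop iteration does
    a constant amount of work, so the whole step costs O(n).
    """
    n = len(d)
    a = [di - mu for di in d]                  # diagonal of T - mu I
    c_list, s_list = [], []
    r0, r1 = [0.0] * n, [0.0] * n              # diagonal and superdiagonal of R
    x, y = a[0], (e[0] if n > 1 else 0.0)      # current entries (i, i), (i, i+1)
    for i in range(n - 1):                     # left rotations build R
        r = math.hypot(x, e[i])
        c, s = (1.0, 0.0) if r == 0.0 else (x / r, -e[i] / r)
        c_list.append(c)
        s_list.append(s)
        r0[i] = r
        r1[i] = c * y - s * a[i + 1]
        next_e = e[i + 1] if i + 1 < n - 1 else 0.0
        x, y = s * y + c * a[i + 1], c * next_e
        # R's second superdiagonal (-s * next_e) is not needed below.
    r0[n - 1] = x
    new_d, new_e = [0.0] * n, [0.0] * (n - 1)
    c_prev = 1.0
    for i in range(n - 1):                     # right rotations: R G_0^T G_1^T ...
        new_d[i] = c_list[i] * c_prev * r0[i] - s_list[i] * r1[i] + mu
        new_e[i] = -s_list[i] * r0[i + 1]
        c_prev = c_list[i]
    new_d[n - 1] = c_prev * r0[n - 1] + mu
    return new_d, new_e

d, e, mu = [4.0, 3.0, 2.0, 1.0], [1.0, 0.5, 0.25], 0.5
new_d, new_e = tridiagonal_qr_step(d, e, mu)

# Cross-check against a dense QR step. Off-diagonal signs may differ, because
# a QR factorization is unique only up to the signs of Q's columns.
T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
Q, R = np.linalg.qr(T - mu * np.eye(4))
T1 = R @ Q + mu * np.eye(4)
print(np.allclose(new_d, np.diag(T1)),
      np.allclose(np.abs(new_e), np.abs(np.diag(T1, -1))))
```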

Iteration phase

If a Hessenberg matrix $A$ has element $a_{k,k-1} = 0$ for some $k$, i.e., if one of the elements just below the diagonal is in fact zero, then it decomposes into blocks whose eigenproblems may be solved separately; an eigenvalue is either an eigenvalue of the submatrix of the first $k-1$ rows and columns, or an eigenvalue of the submatrix of remaining rows and columns. The purpose of the QR iteration step is to shrink one of these $a_{k,k-1}$ elements so that effectively a small block along the diagonal is split off from the bulk of the matrix. In the case of a real eigenvalue that is usually the $1\times1$ block in the lower right corner (in which case element $a_{nn}$ holds that eigenvalue), whereas in the case of a pair of conjugate complex eigenvalues it is the $2\times2$ block in the lower right corner.
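Splitting at negligible subdiagonal entries can be sketched as below. The sketch assumes a common relative criterion that treats $a_{k,k-1}$ as zero when $|a_{k,k-1}| \le \varepsilon\,(|a_{k-1,k-1}| + |a_{k,k}|)$, with $\varepsilon$ the machine epsilon; both this criterion and the helper name `split_blocks` are illustrative choices rather than something prescribed above. Indices are 0-based, and each block is returned as a half-open range.

```python
import numpy as np

def split_blocks(H, eps=np.finfo(float).eps):
    """Split an upper Hessenberg matrix at negligible subdiagonal entries.

    Hypothetical helper. Returns half-open, 0-based index ranges (start, stop);
    the eigenvalues of H are (approximately) the union of the eigenvalues of
    the corresponding diagonal blocks.
    """
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    blocks, start = [], 0
    for k in range(1, n):
        if abs(H[k, k - 1]) <= eps * (abs(H[k - 1, k - 1]) + abs(H[k, k])):
            blocks.append((start, k))   # treat H[k, k-1] as zero
            start = k
    blocks.append((start, n))
    return blocks

H = np.array([[4.0, 1.0,    2.0, 0.5],
              [3.0, 2.0,    1.0, 1.0],
              [0.0, 1e-20,  5.0, 2.0],
              [0.0, 0.0,   -1.0, 3.0]])
blocks = split_blocks(H)
print(blocks)  # [(0, 2), (2, 4)]
block_eigs = np.concatenate([np.linalg.eigvals(H[i:j, i:j]) for i, j in blocks])
print(block_eigs)
```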

The rate of convergence depends on the separation between eigenvalues, so a practical algorithm will use shifts, either explicit or implicit, to increase separation and accelerate convergence. A typical symmetric QR algorithm isolates each eigenvalue (then reduces the size of the matrix) with only one or two iterations, making it efficient as well as robust.

A single iteration with explicit shift

The steps of a QR iteration with explicit shift on a real Hessenberg matrix $A$ are:

  1. Pick a shift $\mu$ and subtract it from all diagonal elements, producing the matrix $A - \mu I$. A basic strategy is to use $\mu = a_{n,n}$, but there are more refined strategies that would further accelerate convergence. The idea is that $\mu$ should be close to an eigenvalue, since making this shift will accelerate convergence to that eigenvalue.
  2. Perform a sequence of Givens rotations $G_1, G_2, \dots, G_{n-1}$ on $A - \mu I$, where $G_i$ acts on rows $i$ and $i+1$, and $G_i$ is chosen to zero out position $(i+1, i)$ of $G_{i-1} \dotsb G_1 (A - \mu I)$. This produces the upper triangular matrix $R = G_{n-1} \dotsb G_1 (A - \mu I)$. The orthogonal factor $Q$ would be $G_1^{\mathsf T} G_2^{\mathsf T} \dotsb G_{n-1}^{\mathsf T}$, but it is neither necessary nor efficient to produce that explicitly.
  3. Now multiply $R$ by the Givens matrices $G_1^{\mathsf T}$, $G_2^{\mathsf T}$, …, $G_{n-1}^{\mathsf T}$ on the right, where $G_i^{\mathsf T}$ instead acts on columns $i$ and $i+1$. This produces the matrix $RQ = R G_1^{\mathsf T} G_2^{\mathsf T} \dotsb G_{n-1}^{\mathsf T}$, which is again in Hessenberg form.
  4. Finally, undo the shift by adding $\mu$ to all diagonal entries. The result is $A' = RQ + \mu I$. Since $R = Q^{\mathsf T}(A - \mu I)$ and $Q^{\mathsf T} Q = I$, we have $A' = Q^{\mathsf T}(A - \mu I) Q + \mu I = Q^{\mathsf T} A Q$.
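Steps 1 to 4 translate fairly directly into code. The following NumPy sketch assumes a real upper Hessenberg input and, by default, the simple shift $\mu = a_{n,n}$; it records each rotation only as the triple $(i, c, s)$, updates two rows or two columns at a time, and never forms $Q$. Indices are 0-based, and the function name `shifted_qr_step` is hypothetical.

```python
import numpy as np

def shifted_qr_step(H, mu=None):
    """One explicit shifted QR step on a real upper Hessenberg matrix H (sketch).

    Returns H' = R Q + mu I, where H - mu I = Q R; Q is never formed.
    Rotations use G_i = [[c, -s], [s, c]] on (0-based) rows i and i+1.
    """
    H = np.array(H, dtype=float)
    n = H.shape[0]
    if mu is None:
        mu = H[-1, -1]                          # simple shift mu = a_{n,n}
    H[np.diag_indices(n)] -= mu                 # Step 1: H - mu I
    rotations = []
    for i in range(n - 1):                      # Step 2: build R row pair by row pair
        a, b = H[i, i], H[i + 1, i]
        r = np.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0.0 else (a / r, -b / r)  # s*a + c*b = 0
        top, bottom = H[i, i:].copy(), H[i + 1, i:].copy()
        H[i, i:] = c * top - s * bottom
        H[i + 1, i:] = s * top + c * bottom
        H[i + 1, i] = 0.0                       # zero by construction
        rotations.append((i, c, s))
    for i, c, s in rotations:                   # Step 3: R G^T ... column pair by pair
        left, right = H[: i + 2, i].copy(), H[: i + 2, i + 1].copy()
        H[: i + 2, i] = c * left - s * right
        H[: i + 2, i + 1] = s * left + c * right
    H[np.diag_indices(n)] += mu                 # Step 4: undo the shift
    return H

rng = np.random.default_rng(1)
H = np.triu(rng.standard_normal((5, 5)), -1)    # random upper Hessenberg matrix
H1 = shifted_qr_step(H)
print(np.round(H1, 4))
```

Because a QR factorization is unique only up to the signs of the columns of $Q$, a comparison with a step built from `numpy.linalg.qr` should be made on the absolute values of the entries.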

The purpose of the shift is to change which Givens rotations are chosen.

In more detail, the structure of one of these $G_i$ matrices is
$$G_i = \begin{bmatrix} I & 0 & 0 & 0 \\ 0 & c & -s & 0 \\ 0 & s & c & 0 \\ 0 & 0 & 0 & I \end{bmatrix},$$
where the $I$ in the upper left corner is an $(i-1)\times(i-1)$ identity matrix, and the two scalars $c = \cos\theta$ and $s = \sin\theta$ are determined by the rotation angle $\theta$ appropriate for zeroing out position $(i+1, i)$. It is not necessary to exhibit $\theta$; the factors $c$ and $s$ can be determined directly from the elements of the matrix that $G_i$ acts on. Nor is it necessary to produce the whole matrix; multiplication (from the left) by $G_i$ only affects rows $i$ and $i+1$, so it is easier to just update those two rows in place. Likewise, for the Step 3 multiplication by $G_i^{\mathsf T}$ from the right, it is sufficient to remember $i$, $c$, and $s$.
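Determining $c$ and $s$ directly from the two entries being combined might look like this. The sketch follows the sign convention of $G_i$ above, so that $s\,a + c\,b = 0$ for the pair $(a, b)$ being rotated; `givens` and `rotate_rows` are hypothetical helper names, and `math.hypot` is used to avoid overflow when forming $\sqrt{a^2 + b^2}$.

```python
import math

def givens(a, b):
    """Hypothetical helper: return (c, s), with c*c + s*s = 1 up to rounding,
    such that [[c, -s], [s, c]] maps the pair (a, b) to (r, 0), r = hypot(a, b)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0        # nothing to eliminate: use the identity
    return a / r, -b / r

def rotate_rows(M, i, c, s):
    """Apply G_i from the left in place; only rows i and i+1 change."""
    for j in range(len(M[i])):
        x, y = M[i][j], M[i + 1][j]
        M[i][j], M[i + 1][j] = c * x - s * y, s * x + c * y

M = [[3.0, 1.0],
     [4.0, 2.0]]
c, s = givens(M[0][0], M[1][0])   # here (c, s) = (0.6, -0.8)
rotate_rows(M, 0, c, s)
print(M)                          # M[1][0] is now (numerically) zero
```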

If using the simple $\mu = a_{n,n}$ strategy, then at the beginning of Step 2 we have a matrix
$$A - a_{n,n} I = \begin{pmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & 0 \end{pmatrix},$$
where the $\times$ denotes “could be whatever”. The first Givens rotation $G_1$ zeroes out the $(2, 1)$ position of this, producing
$$G_1 (A - a_{n,n} I) = \begin{pmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & 0 \end{pmatrix}.$$
Each new rotation zeroes out another subdiagonal element, thus increasing the number of known zeroes until we are at
$$H = G_{n-2} \dotsb G_1 (A - a_{n,n} I) = \begin{pmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & h_{n-1,n-1} & h_{n-1,n} \\ 0 & 0 & 0 & h_{n,n-1} & 0 \end{pmatrix}.$$
The final rotation $G_{n-1}$ has $(c, s)$ chosen so that $s h_{n-1,n-1} + c h_{n,n-1} = 0$. If $|h_{n-1,n-1}| \gg |h_{n,n-1}|$, as is typically the case when we approach convergence, then $c \approx 1$ and $|s| \ll 1$.
Making this rotation produces
$$R = G_{n-1} G_{n-2} \dotsb G_1 (A - a_{n,n} I) = \begin{pmatrix} \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & c\,h_{n-1,n} \\ 0 & 0 & 0 & 0 & s\,h_{n-1,n} \end{pmatrix},$$
which is our upper triangular matrix. But now we reach Step 3, and need to start rotating data between columns. The first rotation acts on columns $1$ and $2$, producing
$$R G_1^{\mathsf T} = \begin{pmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & \times & c\,h_{n-1,n} \\ 0 & 0 & 0 & 0 & s\,h_{n-1,n} \end{pmatrix}.$$
The expected pattern is that each rotation moves some nonzero value from the diagonal out to the subdiagonal, returning the matrix to Hessenberg form. This ends at
$$R G_1^{\mathsf T} \dotsb G_{n-1}^{\mathsf T} = \begin{pmatrix} \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & -s^2 h_{n-1,n} & cs\,h_{n-1,n} \end{pmatrix}.$$
Algebraically the form is unchanged, but numerically the element in position $(n, n-1)$ has gotten a lot closer to zero: there used to be a factor $s$ gap between it and the diagonal element above, but now the gap is more like a factor $s^2$, and another iteration would make it factor $s^4$; we have quadratic convergence.
Practically that means $\mathcal{O}(1)$ iterations per eigenvalue suffice for convergence, and thus overall we can complete in $\mathcal{O}(n)$ QR steps, each of which does a mere $\mathcal{O}(n^2)$ arithmetic operations (or as little as $\mathcal{O}(n)$ operations, in the case that $A$ is symmetric).
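The rapid decay of $|a_{n,n-1}|$ can be observed numerically. The sketch below assumes a random symmetric tridiagonal test matrix (a special case of Hessenberg form), the simple shift $\mu = a_{n,n}$, and a dense `numpy.linalg.qr` call in place of the Givens rotations; the seed, tolerance, and step limit are arbitrary. Once it is small, the printed sequence should shrink very quickly. For symmetric matrices this shift is known to converge even faster than quadratically in the generic case, so the observed decay may outpace the $s^2$ estimate above.

```python
import numpy as np

def shifted_step(A, mu):
    """Explicit shifted QR step using a dense QR factorization (illustration only)."""
    n = A.shape[0]
    Q, R = np.linalg.qr(A - mu * np.eye(n))
    return R @ Q + mu * np.eye(n)

rng = np.random.default_rng(42)
n = 6
d, e = rng.standard_normal(n), rng.standard_normal(n - 1)
T0 = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)   # random symmetric tridiagonal
T = T0.copy()
history = [abs(T[-1, -2])]
while history[-1] > 1e-12 * np.linalg.norm(T0) and len(history) < 50:
    T = shifted_step(T, T[-1, -1])                  # simple shift mu = a_{n,n}
    history.append(abs(T[-1, -2]))
print([f"{h:.1e}" for h in history])
```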

Visualization

Figure 1: How the output of a single iteration of the QR or LR algorithm varies alongside its input

The basic QR algorithm can be visualized in the case where A is a positive-definite symmetric matrix. In that case, A can be depicted as an ellipse in 2 dimensions or an ellipsoid in higher dimensions. The relationship between the input to the algorithm and a single iteration can then be depicted as in Figure 1. Note that the LR algorithm is depicted alongside the QR algorithm.

A single iteration causes the ellipse to tilt or "fall" towards the x-axis. If the large semi-axis of the ellipse is parallel to the x-axis, one iteration of QR does nothing. Another situation where the algorithm "does nothing" is when the large semi-axis is parallel to the y-axis instead of the x-axis. In that event, the ellipse can be thought of as balancing precariously without being able to fall in either direction. In both situations, the matrix is diagonal. A situation where an iteration of the algorithm "does nothing" is called a fixed point. The strategy employed by the algorithm is iteration towards a fixed point. Observe that one fixed point is stable while the other is unstable. If the ellipse were tilted away from the unstable fixed point by a very small amount, one iteration of QR would cause the ellipse to tilt away from the fixed point instead of towards it. Eventually, though, the algorithm would converge to a different fixed point, but it would take a long time.
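The two fixed points can be checked numerically for a $2\times2$ symmetric positive-definite matrix. The sketch assumes the unshifted iteration and uses the off-diagonal entry as a stand-in for the tilt of the ellipse (it is zero exactly when the semi-axes are parallel to the coordinate axes). The helper name `qr_step`, the size of the initial tilt, and the step counts are arbitrary illustrative choices.

```python
import numpy as np

def qr_step(A):
    """One unshifted QR step A -> R Q (hypothetical helper)."""
    Q, R = np.linalg.qr(A)
    return R @ Q

stable = np.diag([3.0, 1.0])     # large semi-axis along the x-axis
unstable = np.diag([1.0, 3.0])   # large semi-axis along the y-axis
print(np.allclose(qr_step(stable), stable), np.allclose(qr_step(unstable), unstable))

# Tilt the unstable configuration by a tiny off-diagonal entry.
A = np.array([[1.0, 1e-8], [1e-8, 3.0]])
steps_to_swap = 0
while A[0, 0] < A[1, 1] and steps_to_swap < 1000:
    A = qr_step(A)               # the tilt grows instead of shrinking
    steps_to_swap += 1
for _ in range(100):             # afterwards the iteration settles near diag(3, 1)
    A = qr_step(A)
print(steps_to_swap, np.diag(A))
```

With the initial tilt of $10^{-8}$, the off-diagonal entry grows by roughly the ratio of the diagonal entries, here about a factor of $3$, per step, so a noticeable number of steps pass before the diagonal entries swap places.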

Finding eigenvalues versus finding eigenvectors

Figure 2: How the output of a single iteration of QR or LR is affected when two eigenvalues approach each other

It's worth pointing out that finding even a single eigenvector of a symmetric matrix is not computable (in exact real arithmetic according to the definitions in computable analysis). This difficulty exists whenever the multiplicities of a matrix's eigenvalues are not knowable. On the other hand, the same problem does not exist for finding eigenvalues. The eigenvalues of a matrix are always computable.

We will now discuss how these difficulties manifest in the basic QR algorithm. This is illustrated in Figure 2. Recall that the ellipses represent positive-definite symmetric matrices. As the two eigenvalues of the input matrix approach each other, the input ellipse changes into a circle. A circle corresponds to a multiple of the identity matrix. A near-circle corresponds to a near-multiple of the identity matrix whose eigenvalues are nearly equal to the diagonal entries of the matrix. Therefore, the problem of approximately finding the eigenvalues is shown to be easy in that case. But notice what happens to the semi-axes of the ellipses. An iteration of QR (or LR) tilts the semi-axes less and less as the input ellipse gets closer to being a circle. The eigenvectors can only be known when the semi-axes are parallel to the x-axis and y-axis. The number of iterations needed to achieve near-parallelism increases without bound as the input ellipse becomes more circular.
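The growth in iteration count can be measured with a small experiment. It assumes unshifted QR on $2\times2$ symmetric matrices with eigenvalues $1+\delta$ and $1$ whose eigenvectors are tilted by 30 degrees, and it counts steps until $|a_{21}| < 10^{-10}$; the tilt, the tolerance, and the helper name are arbitrary choices for illustration.

```python
import numpy as np

def iterations_to_diagonalize(delta, tol=1e-10, max_iter=10**6):
    """Hypothetical helper: unshifted QR steps until |a_21| < tol for a 2x2
    symmetric matrix with eigenvalues 1 + delta and 1, eigenvectors tilted 30 deg."""
    theta = np.pi / 6
    V = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    A = V @ np.diag([1.0 + delta, 1.0]) @ V.T
    for k in range(max_iter):
        if abs(A[1, 0]) < tol:
            return k
        Q, R = np.linalg.qr(A)
        A = R @ Q
    return max_iter

counts = {delta: iterations_to_diagonalize(delta) for delta in [1.0, 0.1, 0.01, 0.001]}
print(counts)
```

The count grows roughly in proportion to $1/\delta$, since each step shrinks the off-diagonal entry only by about the factor $1/(1+\delta)$.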

While it may be impossible to compute the eigendecomposition of an arbitrary symmetric matrix, it is always possible to perturb the matrix by an arbitrarily small amount and compute the eigendecomposition of the resulting matrix. In the case when the matrix is depicted as a near-circle, the matrix can be replaced with one whose depiction is a perfect circle. In that case, the matrix is a multiple of the identity matrix, and its eigendecomposition is immediate. Be aware though that the resulting eigenbasis can be quite far from the original eigenbasis.

Speeding up: Shifting and deflation

The slowdown when the ellipse gets more circular has a converse: it turns out that when the ellipse gets more stretched, and less circular, the rotation of the ellipse becomes faster. Such a stretch can be induced when the matrix $M$ which the ellipse represents gets replaced with $M - \lambda I$, where $\lambda$ is approximately the smallest eigenvalue of $M$. In this case, the ratio of the two semi-axes of the ellipse approaches $\infty$. In higher dimensions, shifting like this makes the length of the smallest semi-axis of an ellipsoid small relative to the other semi-axes, which speeds up convergence to the smallest eigenvalue, but does not speed up convergence to the other eigenvalues. This becomes useless when the smallest eigenvalue is fully determined, so the matrix must then be deflated, which simply means removing its last row and column.
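Shifting and deflation combine into a short loop. The sketch below is an illustration for symmetric matrices only, with several assumptions: it uses the simple shift $\mu = a_{m,m}$ (the current last diagonal entry, which approximates whichever eigenvalue is converging in the last position, not necessarily the smallest), a dense `numpy.linalg.qr` factorization per step instead of the tridiagonal form, and an arbitrary relative tolerance. The function name is hypothetical.

```python
import numpy as np

def eigvals_shift_deflate(A, tol=1e-12, max_iter=1000):
    """Eigenvalues of a symmetric matrix via shifted QR steps plus deflation (sketch)."""
    A = np.array(A, dtype=float)
    scale = np.linalg.norm(A)
    found = []
    while A.shape[0] > 1:
        m = A.shape[0]
        for _ in range(max_iter):
            if np.linalg.norm(A[-1, :-1]) <= tol * scale:
                break                                  # last row has decoupled
            mu = A[-1, -1]                             # simple shift mu = a_{m,m}
            Q, R = np.linalg.qr(A - mu * np.eye(m))
            A = R @ Q + mu * np.eye(m)
        else:
            raise RuntimeError("QR iteration did not converge")
        found.append(A[-1, -1])                        # this eigenvalue is determined
        A = A[:-1, :-1]                                # deflate: drop last row and column
    found.append(A[0, 0])
    return np.array(found)

rng = np.random.default_rng(7)
B = rng.standard_normal((6, 6))
S = B + B.T                                            # random symmetric test matrix
print(np.sort(eigvals_shift_deflate(S)))
print(np.linalg.eigvalsh(S))
```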

The issue with the unstable fixed point also needs to be addressed. The shifting heuristic is often designed to deal with this problem as well: practical shifts are often discontinuous and randomised. Wilkinson's shift, which is well suited for symmetric matrices like the ones we're visualising, is in particular discontinuous.
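Wilkinson's shift is the eigenvalue of the trailing $2\times2$ block $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ that is closer to $c$, which can be written as $\mu = c - \operatorname{sign}(\delta)\, b^2 / (|\delta| + \sqrt{\delta^2 + b^2})$ with $\delta = (a - c)/2$; the factor $\operatorname{sign}(\delta)$ is where the discontinuity comes from. The sketch below assumes a symmetric matrix and takes $\operatorname{sign}(0) = +1$ (an arbitrary tie-break). On the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, the simple shift $\mu = a_{n,n} = 0$ leaves the matrix unchanged up to signs, while Wilkinson's shift lands on an exact eigenvalue. The helper names are hypothetical.

```python
import math
import numpy as np

def wilkinson_shift(a, b, c):
    """Eigenvalue of the symmetric block [[a, b], [b, c]] that is closer to c.

    The sign(delta) factor makes the shift discontinuous in the entries;
    sign(0) = +1 is an arbitrary tie-break chosen for this sketch.
    """
    delta = (a - c) / 2.0
    sign = 1.0 if delta >= 0.0 else -1.0
    denom = abs(delta) + math.hypot(delta, b)
    if denom == 0.0:               # block is already c * I
        return c
    return c - sign * b * b / denom

def shifted_step(A, mu):
    """Explicit shifted QR step with a dense factorization (illustration only)."""
    n = A.shape[0]
    Q, R = np.linalg.qr(A - mu * np.eye(n))
    return R @ Q + mu * np.eye(n)

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])           # eigenvalues +1 and -1
simple = shifted_step(A, A[-1, -1])  # mu = a_nn = 0: the step makes no progress
mu_w = wilkinson_shift(A[-2, -2], A[-1, -2], A[-1, -1])
wilk = shifted_step(A, mu_w)         # mu = -1, an exact eigenvalue: deflates at once
print(abs(simple[1, 0]), mu_w, abs(wilk[1, 0]))
```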

The implicit QR algorithm

In modern computational practice, the QR algorithm is performed in an implicit version which makes the use of multiple shifts easier to introduce. The matrix is first brought to upper Hessenberg form $A_0 = QAQ^{\mathsf T}$ as in the explicit version; then, at each step, the first column of $A_k$ is transformed via a small-size Householder similarity transformation to the first column of $p(A_k)$ (or $p(A_k)e_1$), where $p$, of degree $r$, is the polynomial that defines the shifting strategy (often $p(x) = (x - \lambda)(x - \bar\lambda)$, where $\lambda$ and $\bar\lambda$ are the two eigenvalues of the trailing $2\times2$ principal submatrix of $A_k$, the so-called implicit double-shift). Then successive Householder transformations of size $r+1$ are performed in order to return the working matrix $A_k$ to upper Hessenberg form. This operation is known as bulge chasing, due to the peculiar shape of the non-zero entries of the matrix along the steps of the algorithm. As in the first version, deflation is performed as soon as one of the sub-diagonal entries of $A_k$ is sufficiently small.
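For the implicit double shift, $p(x) = (x - \lambda)(x - \bar\lambda) = x^2 - s x + t$, where $s$ and $t$ are the trace and the determinant of the trailing $2\times2$ block, so $p(A_k)e_1$ can be formed in real arithmetic even when $\lambda$ is complex. Because $A_k$ is upper Hessenberg, only the first three entries of $p(A_k)e_1$ can be nonzero, and they depend on just a few entries of $A_k$. The sketch below (hypothetical function name; a real input with $n \ge 3$ is assumed) computes them directly and compares the result with the dense product.

```python
import numpy as np

def francis_first_column(H):
    """p(H) e_1 for p(x) = x^2 - s x + t, where s and t are the trace and the
    determinant of the trailing 2x2 block of the upper Hessenberg matrix H
    (so the roots of p are that block's two eigenvalues). Requires n >= 3."""
    H = np.asarray(H, dtype=float)
    s = H[-2, -2] + H[-1, -1]
    t = H[-2, -2] * H[-1, -1] - H[-2, -1] * H[-1, -2]
    v = np.zeros(H.shape[0])
    v[0] = H[0, 0] ** 2 + H[0, 1] * H[1, 0] - s * H[0, 0] + t
    v[1] = H[1, 0] * (H[0, 0] + H[1, 1] - s)
    v[2] = H[1, 0] * H[2, 1]          # all later entries are zero
    return v

rng = np.random.default_rng(3)
H = np.triu(rng.standard_normal((6, 6)), -1)      # random upper Hessenberg matrix
s = H[-2, -2] + H[-1, -1]
t = H[-2, -2] * H[-1, -1] - H[-2, -1] * H[-1, -2]
dense = (H @ H - s * H + t * np.eye(6))[:, 0]      # O(n^3) reference
print(np.allclose(francis_first_column(H), dense))
```

A Householder reflector of size $3$ (that is, $r+1$ for $r = 2$) that maps this vector to a multiple of $e_1$, applied as a similarity transformation, is what introduces the bulge that the subsequent reflectors chase down the diagonal.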

Renaming proposal

Since in the modern implicit version of the procedure no QR decompositions are explicitly performed, some authors, for instance Watkins, suggested changing its name to the Francis algorithm. Golub and Van Loan use the term Francis QR step.

Interpretation and convergence

The QR algorithm can be seen as a more sophisticated variation of the basic "power" eigenvalue algorithm. Recall that the power algorithm repeatedly multiplies $A$ times a single vector, normalizing after each iteration. The vector converges to an eigenvector of the eigenvalue of largest absolute value. Instead, the QR algorithm works with a complete basis of vectors, using QR decomposition to renormalize (and orthogonalize). For a symmetric matrix $A$, upon convergence, $AQ = Q\Lambda$, where $\Lambda$ is the diagonal matrix of eigenvalues to which $A$ converged, and where $Q$ is a composite of all the orthogonal similarity transforms required to get there. Thus the columns of $Q$ are the eigenvectors.
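Accumulating the orthogonal factors of the iteration gives the matrix $Q$ described above. The sketch assumes a symmetric input, the plain unshifted iteration, and a fixed step count chosen only for illustration; `qr_eigh_basic` is a hypothetical name. It checks $AQ = Q\Lambda$ numerically.

```python
import numpy as np

def qr_eigh_basic(A, iterations=500):
    """Unshifted QR iteration that also accumulates Q = Q_0 Q_1 ... Q_{k-1} (sketch)."""
    Ak = np.array(A, dtype=float)
    Q_total = np.eye(Ak.shape[0])
    for _ in range(iterations):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
        Q_total = Q_total @ Q    # invariant: Ak = Q_total^T A Q_total
    return np.diag(Ak), Q_total

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 1.0]])
lam, Q = qr_eigh_basic(A)
print(np.allclose(A @ Q, Q @ np.diag(lam)))   # columns of Q are eigenvectors
```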

History

The QR algorithm was preceded by the LR algorithm, which uses the LU decomposition instead of the QR decomposition. The QR algorithm is more stable, so the LR algorithm is rarely used nowadays. However, it represents an important step in the development of the QR algorithm.

The LR algorithm was developed in the early 1950s by Heinz Rutishauser, who worked at that time as a research assistant of Eduard Stiefel at ETH Zurich. Stiefel suggested that Rutishauser use the sequence of moments $y_0^{\mathsf T} A^k x_0$, $k = 0, 1, \ldots$ (where $x_0$ and $y_0$ are arbitrary vectors) to find the eigenvalues of $A$. Rutishauser took an algorithm of Alexander Aitken for this task and developed it into the quotient–difference algorithm or qd algorithm. After arranging the computation in a suitable shape, he discovered that the qd algorithm is in fact the iteration $A_k = L_k U_k$ (LU decomposition), $A_{k+1} = U_k L_k$, applied on a tridiagonal matrix, from which the LR algorithm follows.
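Rutishauser's iteration can be sketched with a hand-written Doolittle LU factorization without pivoting. NumPy has no LU routine and SciPy's `scipy.linalg.lu` uses partial pivoting, so `lu_nopivot` below is a hypothetical helper. The sketch assumes that no zero pivot occurs, which holds for the symmetric positive-definite example used here; in general the LU factorization without pivoting can break down, a limitation the QR factorization does not have.

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle factorization A = L U without pivoting (L has unit diagonal)."""
    n = A.shape[0]
    L, U = np.eye(n), np.array(A, dtype=float)
    for k in range(n - 1):
        if U[k, k] == 0.0:
            raise ZeroDivisionError("zero pivot: LU without pivoting breaks down")
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, np.triu(U)      # discard rounding residue below the diagonal

def lr_algorithm(A, iterations=200):
    """LR iteration: A_k = L_k U_k, A_{k+1} = U_k L_k (illustrative sketch)."""
    Ak = np.array(A, dtype=float)
    for _ in range(iterations):
        L, U = lu_nopivot(Ak)
        Ak = U @ L            # = L_k^{-1} A_k L_k, similar to A_k
    return Ak

T = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric positive definite; eigenvalues 3 and 3 +/- sqrt(3)
Tk = lr_algorithm(T)
print(np.diag(Tk))
```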

Other variants

One variant of the QR algorithm, the Golub–Kahan–Reinsch algorithm, starts by reducing a general matrix to a bidiagonal one. This variant of the QR algorithm for the computation of singular values was first described by Golub & Kahan (1965). The LAPACK subroutine DBDSQR implements this iterative method, with some modifications to cover the case where the singular values are very small (Demmel & Kahan 1990). Together with a first step using Householder reflections and, if appropriate, QR decomposition, this forms the DGESVD routine for the computation of the singular value decomposition. The QR algorithm can also be implemented in infinite dimensions, with corresponding convergence results.

References

  1. Francis, J. G. F. (1961). "The QR Transformation, I". The Computer Journal. 4 (3): 265–271 (received October 1959). doi:10.1093/comjnl/4.3.265.
  2. Francis, J. G. F. (1962). "The QR Transformation, II". The Computer Journal. 4 (4): 332–345. doi:10.1093/comjnl/4.4.332.
  3. Kublanovskaya, Vera N. (1963). "On some algorithms for the solution of the complete eigenvalue problem". USSR Computational Mathematics and Mathematical Physics. 1 (3): 637–657 (received February 1961). doi:10.1016/0041-5553(63)90168-X. Also published in: Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki. 1 (4): 555–570 (1961).
  4. Golub, G. H.; Van Loan, C. F. (1996). Matrix Computations (3rd ed.). Baltimore: Johns Hopkins University Press. ISBN 0-8018-5414-8.
  5. Demmel, James W. (1997). Applied Numerical Linear Algebra. SIAM.
  6. Trefethen, Lloyd N.; Bau, David (1997). Numerical Linear Algebra. SIAM.
  7. Ortega, James M.; Kaiser, Henry F. (1963). "The LL and QR methods for symmetric tridiagonal matrices". The Computer Journal. 6 (1): 99–101. doi:10.1093/comjnl/6.1.99.
  8. "linear algebra - Why is uncomputability of the spectral decomposition not a problem?". MathOverflow. Retrieved 2021-08-09.
  9. Watkins, David S. (2007). The Matrix Eigenvalue Problem: GR and Krylov Subspace Methods. Philadelphia, PA: SIAM. ISBN 978-0-89871-641-2.
  10. Parlett, Beresford N.; Gutknecht, Martin H. (2011). "From qd to LR, or, how were the qd and LR algorithms discovered?" (PDF). IMA Journal of Numerical Analysis. 31 (3): 741–754. doi:10.1093/imanum/drq003. hdl:20.500.11850/159536. ISSN 0272-4979.
  11. Bochkanov, Sergey Anatolyevich (2010-12-11). "ALGLIB User Guide - General Matrix operations - Singular value decomposition". ALGLIB Project. Accessed 2010-12-11. Archived by WebCite at https://www.webcitation.org/5utO4iSnR?url=http://www.alglib.net/matrixops/general/svd.php
  12. Deift, Percy; Li, Luenchau C.; Tomei, Carlos (1985). "Toda flows with infinitely many variables". Journal of Functional Analysis. 64 (3): 358–402. doi:10.1016/0022-1236(85)90065-5.
  13. Colbrook, Matthew J.; Hansen, Anders C. (2019). "On the infinite-dimensional QR algorithm". Numerische Mathematik. 143 (1): 17–83. arXiv:2011.08172. doi:10.1007/s00211-019-01047-5.
