APPLICATION OF EIGENVALUE SENSITIVITY AND EIGENVECTOR SENSITIVITY IN EIGENCOMPUTATIONS

By PURANDAR SARMAH

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1993

ACKNOWLEDGEMENTS

I am indebted to a number of people, directly or indirectly, for helping me to prepare my dissertation. First, I would like to thank my advisor, Dr. William Hager, for his advice and help throughout the different stages of this dissertation. I am grateful to Dr. Kermit Sigmon and Dr. James Keesling for working with me on several algorithms which I used to get some numerical results in my dissertation. I would like to thank Dr. Bruce Edwards for his invaluable advice as my supervisory committee member. Without his advice I would not have been able to push my work this far. I am grateful to Dr. Nilotpal Ghosh, who was a visiting professor in our department. His work on the eigenvalue problem had a direct influence on some of my results. Finally, I want to express my deep sense of gratitude to my wife for helping me in different ways. Without her encouragement and help during the more frustrating moments of my research, this work would not have come to this final stage.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF FIGURES
ABSTRACT

CHAPTERS

1  INTRODUCTION
2  EIGENVALUES OF UNSYMMETRIC MATRICES
   2.1  Block Diagonalization of a Matrix
   2.2  A Differential Equation Approach to Eigencomputations
   2.3  A Differential Equation Approach to Eigencomputations with Armijo's Stepsize
   2.4  Convergence of Block Diagonalization of a Matrix
   2.5  Block Schur Decomposition of a Matrix
   2.6  An Algorithm for Block Schur Decomposition of a Matrix
   2.7  Parallel Processing in Eigencomputations
3  EIGENVALUES OF SYMMETRIC MATRICES
   3.1  Diagonalization of a Symmetric Matrix using Armijo's Stepsize
   3.2  Block Diagonalization of a Symmetric Matrix using Armijo's Stepsize
4  CONCLUSION
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF FIGURES

The path of convergence of the eigenvalues of the matrix A.
The path of convergence of the eigenvalues of C.
The direction in which the elements of G(S, U) are computed, when the number of processors is greater than half of the total blocks.
The direction in which the elements of G(S, U) are computed, when the number of processors is less than half of the total blocks.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

APPLICATION OF EIGENVALUE SENSITIVITY AND EIGENVECTOR SENSITIVITY IN EIGENCOMPUTATIONS

By PURANDAR SARMAH
December 1993
Chairman: William Hager
Major Department: Mathematics

In 1987, in his paper "Bidiagonalization and Diagonalization," William W. Hager presented an algorithm to diagonalize a matrix with distinct eigenvalues. To implement the algorithm, we need a good approximation to the matrix of eigenvectors. First, we present the derivation of his algorithm, and then we show how it can be generalized to diagonalize a matrix with multiple eigenvalues. A local quadratic convergence result is established for the generalized algorithm. When a good starting guess for the eigenvectors is not known, Hager's algorithm may not converge. To restore convergence, we modified the algorithm by replacing diagonalizations by Schur decompositions. We found that if uniformly distributed numbers on a circle in the complex plane, which includes all Gerschgorin disks for the matrix, are used to approximate the eigenvalues, and if the eigenvectors are approximated by the columns of the identity matrix, then rapid convergence is obtained.
We present some node programs which, along with some modifications to the last algorithm, can be used in parallel eigencomputations. Finally, we present additional modifications of Hager's algorithm and the generalized algorithm that take advantage of matrix symmetry and reduce the given matrix to a diagonal matrix with the eigenvalues along the diagonal.

CHAPTER 1
INTRODUCTION

Let A be an n x n matrix. We want to find an iterative method to compute a diagonalization A = XΛX^{-1}, provided it exists, where the columns of X are eigenvectors of A, and Λ = diag(λ_1, λ_2, ..., λ_n) is a diagonal matrix whose diagonal elements are the eigenvalues of A. Here we will use a sensitivity result for eigenpairs to diagonalize a matrix. We assume the eigenvalues of A are all distinct.

In the derivation of the equations needed to diagonalize a matrix, we use the continuity of eigenvalues and perturbation theory based on Gerschgorin's theorem. We state the theorem below.

Theorem 1.0.1. Let A ∈ C^{n x n}, and for i = 1, ..., n let ρ_i = Σ_{j≠i} |a_ij| and D_i = {z : |z − a_ii| ≤ ρ_i}. Then every eigenvalue of A lies in the union of the disks D_i. Furthermore, if a set of k Gerschgorin disks is isolated from the other n − k disks, then their union contains precisely k eigenvalues of A.

Proof of Theorem 1.0.1. For the proof, the interested reader is referred to [Atk89, Theorem 9.1, page 588].

Given an arbitrary matrix E and a sufficiently small number ε, we will show in the next theorem that there exist continuously differentiable functions λ_j(ε) and x_j(ε) with λ_j(0) = λ_j and x_j(0) = x_j such that (A + εE)x_j(ε) = λ_j(ε)x_j(ε); that is, (λ_j(ε), x_j(ε)) is an eigenpair of A + εE.

Theorem 1.0.2. Let A be an n x n matrix whose eigenvalues are all distinct, and suppose X^{-1}AX = diag(λ_1, ..., λ_n) is a diagonalization of A. Then, given an arbitrary n x n matrix E, for a sufficiently small ε there exist continuously differentiable functions λ_j(ε) and x_j(ε) with λ_j(0) = λ_j and x_j(0) = x_j such that the following matrix equation holds:

    (A + εE)x_j(ε) = λ_j(ε)x_j(ε).    (1.1)

Proof of Theorem 1.0.2. Let X^{-1}AX = diag(λ_1, ..., λ_n) be a diagonalization of A. Let E be an n x n matrix and ε a small number. Consider the matrix D = X^{-1}(A + εE)X. Then D = diag(λ_1, ..., λ_n) + εF, where f_ij = y_i^T E x_j and y_i^T is the ith row of X^{-1}. Let B = diag(b_1, ..., b_n) with b_j = ε/k and b_i = 1 for i ≠ j, and let C = BDB^{-1}.
Then c_jj = λ_j + εf_jj, the off-diagonal entries of the jth row of C are c_jl = (ε²/k) f_jl, the entries of the jth column are c_ij = k f_ij for i ≠ j, and every other entry of C agrees with the corresponding entry of D. Here k is chosen so as to make the jth Gerschgorin disk of C disjoint from the other n − 1 disks. We may choose a value of k which is independent of ε in the following way. Choose k to have the largest value consistent with the inequalities

    k |f_ij| ≤ (1/2) |λ_i − λ_j|,  i ≠ j,  i = 1, ..., n.

We achieve the above inequalities if

    k = min_{i≠j} |λ_i − λ_j| / ( 2 max_{i≠j} |f_ij| ).

With the above value of k, for all sufficiently small ε the jth disk is isolated from the other n − 1 disks, the radius of the jth disk is of order ε², and the perturbation in the jth eigenvalue is of order ε as ε → 0.

Next we discuss the effect of perturbations in the elements of A on the eigenvectors in its diagonalization. As earlier, let X^{-1}AX = diag(λ_1, ..., λ_n) be a diagonalization of A. The columns x_1, ..., x_n of X form a complete set of eigenvectors. We denote the eigenvectors of A + εE by x_1(ε), ..., x_n(ε), and the eigenvectors of D = X^{-1}(A + εE)X by v_1(ε), v_2(ε), ..., v_n(ε), so that x_i(ε) = X v_i(ε). Since (λ_j(ε), v_j(ε)) is an eigenpair of D and λ_j(0) = λ_j, we must have v_j(0) = e_j, the jth unit vector. Let v_j(ε) be normalized so that its largest element is 1. For sufficiently small ε, we claim that the jth element of v_j(ε) is 1. If not, let v_ij(ε) = 1 for some i ≠ j. Since Dv_j(ε) = λ_j(ε)v_j(ε), equating the ith component on both sides, we get

    λ_j(ε) v_ij(ε) = λ_i v_ij(ε) + ε Σ_{l=1}^{n} f_il v_lj(ε),    (1.2)

that is,

    (λ_j(ε) − λ_i) v_ij(ε) = ε Σ_{l=1}^{n} f_il v_lj(ε).

Since |v_lj(ε)| ≤ 1, from (1.2) we get

    |λ_j(ε) − λ_i| |v_ij(ε)| ≤ ε Σ_{l=1}^{n} |f_il|.

Since λ_j(ε) → λ_j as ε → 0 and λ_i ≠ λ_j for i ≠ j, for ε small enough we have |λ_j(ε) − λ_i| ≥ (1/2)|λ_j − λ_i|, and then |v_ij(ε)| ≤ Nε, where

    N = max_{i≠j} 2 Σ_{l=1}^{n} |f_il| / |λ_j − λ_i|.

Thus the elements v_ij(ε), i ≠ j, are of order ε, contradicting v_ij(ε) = 1; so the jth element of v_j(ε) is 1. Also from (1.2), we get

    (λ_j(ε) − λ_i) v_ij(ε) = ε f_ij + ε Σ_{l≠j} f_il v_lj(ε).

The second term on the right is of order ε², so for i ≠ j we have

    v_ij(ε) = ε f_ij / (λ_j − λ_i) + O(ε²).

Hence λ_j(ε)
and x_j(ε) are differentiable functions of ε, and their first derivatives are continuous. This completes the proof of the theorem.

Now we are ready to derive the equations needed to find a diagonalization of a matrix A whose eigenvalues are all distinct. Let (λ_j, x_j) be a simple eigenvalue and a corresponding eigenvector of A. Given an n x n matrix E, and for a sufficiently small ε, let λ_j(ε) and x_j(ε) be differentiable functions of ε such that λ_j(0) = λ_j and x_j(0) = x_j. Differentiating (1.1) with respect to ε and putting ε = 0, we have

    A x_j'(0) + E x_j = λ_j'(0) x_j + λ_j x_j'(0).    (1.3)

Since the columns x_1, x_2, ..., x_n of X form a complete set of eigenvectors of A, we can write

    x_j(ε) = Σ_{i=1}^{n} b_ij(ε) x_i.    (1.4)

Since a scalar multiple of an eigenvector is again an eigenvector, we can take b_jj(ε) = 1 without loss of generality. Now differentiating (1.4) with respect to ε and setting ε = 0, we have

    x_j'(0) = Σ_{i≠j} b_ij'(0) x_i.    (1.5)

Substituting (1.5) into (1.3), we obtain

    Σ_{i≠j} b_ij'(0) λ_i x_i + E x_j = λ_j'(0) x_j + λ_j Σ_{i≠j} b_ij'(0) x_i,

that is,

    E x_j = λ_j'(0) x_j + Σ_{i≠j} b_ij'(0) (λ_j − λ_i) x_i.    (1.6)

To obtain expressions for λ_j'(0) and b_ij'(0), we use the relations

    y_i^T x_k = 1 for i = k,  y_i^T x_k = 0 for i ≠ k,    (1.7)

where y_i^T is the ith row of X^{-1}. So premultiplying (1.6) by y_j^T and y_k^T (k ≠ j) yields

    λ_j'(0) = y_j^T E x_j,    (1.8)

    y_k^T E x_j = b_kj'(0) (λ_j − λ_k),  that is,  b_kj'(0) = y_k^T E x_j / (λ_j − λ_k).

With these values of b_kj'(0), (1.5) becomes

    x_j'(0) = Σ_{i≠j} ( y_i^T E x_j / (λ_j − λ_i) ) x_i.    (1.9)

A first order Taylor expansion gives

    λ_j(ε) ≈ λ_j(0) + ε λ_j'(0),  x_j(ε) ≈ x_j(0) + ε x_j'(0).

Substituting the values of λ_j'(0) and x_j'(0) from (1.8) and (1.9) in the above approximations, we get

    λ_j(ε) ≈ λ_j(0) + ε y_j^T E x_j,
    x_j(ε) ≈ x_j(0) + ε Σ_{i≠j} ( y_i^T E x_j / (λ_j − λ_i) ) x_i.    (1.10)

To develop an algorithm we use (1.10) in the following way. Suppose XΛX^{-1} is an approximate diagonalization of B. If we identify E in (1.10) with B − XΛX^{-1}, then to first order

    λ_j(ε) ≈ λ_j(0) + ε y_j^T (B − XΛX^{-1}) x_j = λ_j(0) + ε y_j^T B x_j − ε λ_j(0),
    x_j(ε) ≈ x_j(0) + ε Σ_{i≠j} ( y_i^T (B − XΛX^{-1}) x_j / (λ_j − λ_i) ) x_i = x_j(0) + ε Σ_{i≠j} ( y_i^T B x_j / (λ_j − λ_i) ) x_i.

With ε = 1, these approximations for an eigenvalue and a corresponding eigenvector give:
    λ_j^{new} = (y_j^{old})^T B x_j^{old},  j = 1 : n,
    x_j^{new} = x_j^{old} + Σ_{i≠j} ( (y_i^{old})^T B x_j^{old} / (λ_j^{new} − λ_i^{new}) ) x_i^{old},  j = 1 : n.    (1.11)

We can use the above expressions to formulate an algorithm to compute a diagonalization of a nondefective matrix A as follows.

Algorithm 1.0.1 (Diagonalization). Given a nondefective matrix A ∈ C^{n x n}, a matrix of approximate eigenvectors X_0, and a tolerance tol greater than the unit roundoff, the following algorithm computes a diagonalization A = XΛX^{-1}:

    X = X_0
    repeat until Dif < tol
        compute Y = X^{-1}, with rows y_1^T, ..., y_n^T
        for j = 1 : n
            λ_j = y_j^T A x_j
        end
        C = [c_1, c_2, ..., c_n] is an n x n zero matrix
        for j = 1 : n
            for i = 1 : n
                if i ≠ j
                    c_j = c_j + ( y_i^T A x_j / (λ_j − λ_i) ) x_i
                end
            end
            x_j = x_j + c_j
            x_j = x_j / ||x_j||
        end
        Dif = ||C||_F, where ||·||_F is the Frobenius matrix norm
    end

Above, we use a normalization for x_j to reduce the growth of X.

Example 1.0.1. If Algorithm 1.0.1 is applied to a 5 x 5 matrix A with X_0 = I and tol = 10^{-9}, then the eigenvalues converge as shown in the following table (one row per eigenvalue, one column per iteration):

    8.0000  7.6170  7.1096  8.7039  7.9836  7.9953  7.9943  7.9943
    7.1000  5.0012  4.5657  5.0217  4.9963  4.9964  4.9964
    0.1000  0.9699  1.5900  0.9711  0.8616  0.8889  0.8892  0.8892
    7.7000  6.71    6.1843  6.0978  6.0530  6.0571  6.0571  6.0571
    1.4000  0.9700   .0941   .2429  1.9139  1.9235  1.9229  1.9229

The exponents of convergence p in ||X_{k+1} − X||_F ≈ ||X_k − X||_F^p for different values of k, where X_k is the matrix of eigenvectors at the kth iteration and X is the final matrix of eigenvectors, were computed as 3.207, 0.430, 4.123, and 1.962. Hence the method appears to converge quadratically. (N. Ghosh gave a theoretical proof of the quadratic convergence of the diagonalization method.)
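As a concrete sketch of the diagonalization iteration of Algorithm 1.0.1, the NumPy code below is my own illustration: the function name, test matrix, tolerances, and iteration cap are assumptions, not taken from the dissertation. For a nearly diagonal test matrix, X_0 = I is already a good starting guess, so the iteration converges.

```python
import numpy as np

def diagonalize(A, X0, tol=1e-9, maxit=50):
    """Sketch of Algorithm 1.0.1 (Hager-style diagonalization iteration).

    Assumes A is nondefective with distinct, well-separated eigenvalues,
    and that X0 approximates the eigenvector matrix well enough."""
    X = X0.astype(complex)
    n = A.shape[0]
    lam = np.zeros(n, dtype=complex)
    for _ in range(maxit):
        Y = np.linalg.inv(X)                      # rows y_i^T, so y_i^T x_j = delta_ij
        lam = np.array([Y[j] @ A @ X[:, j] for j in range(n)])
        C = np.zeros((n, n), dtype=complex)
        for j in range(n):
            for i in range(n):
                if i != j:                        # eigenvector correction from (1.11)
                    C[:, j] += (Y[i] @ A @ X[:, j]) / (lam[j] - lam[i]) * X[:, i]
        X = X + C
        X = X / np.linalg.norm(X, axis=0)         # normalize columns to limit growth of X
        if np.linalg.norm(C) < tol:               # Dif = ||C||_F
            break
    return lam, X

A = np.array([[4.0, 0.1, 0.05],
              [0.2, 2.0, 0.1],
              [0.05, 0.1, 1.0]])
lam, X = diagonalize(A, np.eye(3))
```

The Frobenius norm of the correction matrix C plays the role of Dif in the algorithm above, and its rapid shrinkage across sweeps reflects the quadratic convergence observed in Example 1.0.1.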
Example 1.0.2. If Algorithm 1.0.1 is applied to a 5 x 5 matrix A with X_0 = I and tol = 10^{-8}, then the approximate eigenvalues do not converge to the eigenvalues of A: over 1100 iterations the iterates wander without settling, reaching magnitudes as large as about 10^7. The eigenvalues of A are

    λ = [233.3505, 27.1460, 8.3339, 43.2339, 62.9287].

The above example shows that, in general, the identity matrix cannot be used as the starting guess for the matrix of eigenvectors, even for a nondefective matrix with distinct eigenvalues. If by another method a matrix of eigenvectors of a matrix A can be determined, then this method can be used to find the eigenvalues and corresponding eigenvectors of a slightly perturbed matrix A + εE, where E is an arbitrary matrix and ε is a small scalar. For example, first reduce the matrix A to an upper Hessenberg form H by an orthogonal matrix Q, and then apply the QR method with double implicit shift to H to obtain a quasi-upper triangular matrix. Next solve (H − μI)v = 0 to find a corresponding eigenvector v of the eigenvalue μ. Then x = Qv is the eigenvector of A corresponding to the eigenvalue μ. Now these eigenvectors can be used as approximate eigenvectors in the diagonalization method to find the eigenvalues and corresponding eigenvectors of the perturbed matrix A + εE.

For the matrix A of this example, the eigenvalues computed in this way are the values of λ listed above. To find an eigenvector x corresponding to an eigenvalue μ, let B = A − μI; then solve Bx = 0 for x.
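The eigenvector-recovery step just described can be sketched as follows. This is my own illustration (the function name and test matrix are assumptions): instead of working in Hessenberg form, it extracts a null vector of B = A − μI from a singular value decomposition, which is a simple way to solve Bx = 0 numerically.

```python
import numpy as np

def eigvec_from_eigenvalue(A, mu):
    """Recover an eigenvector for a known eigenvalue mu as the right singular
    vector of B = A - mu*I belonging to its smallest singular value, so Bx ~ 0."""
    B = A - mu * np.eye(A.shape[0])
    _, _, Vh = np.linalg.svd(B)
    return Vh[-1].conj()          # (near-)null vector of B, unit length

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
mu = np.max(np.linalg.eigvals(A).real)   # largest eigenvalue, (5 + sqrt(5))/2
x = eigvec_from_eigenvalue(A, mu)
```

Because the computed μ is only approximate in practice, B is nearly but not exactly singular; the smallest singular vector is still a good eigenvector approximation.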
Here is the matrix of eigenvectors:

    X = [ 0.4674  0.4487  0.4730  0.4986  0.3284
          0.5909  0.6005  0.2783  0.1176  0.4460
          0.3002  0.0657  0.4712  0.6361  0.5280
          0.7147  0.1294  0.3346  0.0707  0.5962
          0.2386  0.3043  0.5919  0.4198  0.5691 ]

Next consider two matrices A_1 and A_2, which we get by perturbing the elements of A. If we use Algorithm 1.0.1 on A_1 and A_2 with X_0 = X, then after 3 iterations we obtain diagonalizations Y_i^{-1} A_i Y_i = D_i, whereas if we apply the QR method with double implicit shift to S^T A_1 S and S^T A_2 S, then after 4 iterations we obtain real Schur decompositions Q^T (S^T A_i S) Q = R_i. The derivations of the results (1.8) and (1.9) can be found in many numerical linear algebra texts.

In Section 2.1, we will show how Algorithm 1.0.1 can be generalized to diagonalize a nondefective matrix with multiple eigenvalues, and we will show that the generalized algorithm converges locally quadratically. A differential equation approach to eigencomputations is introduced in Section 2.2 to create an iterative method to find the eigenvalues and corresponding eigenvectors of a nondefective matrix A. The differential equations governing the block matrices have the form

    Λ'(t) = −Λ(t) + Diag( X(t)^{-1} A X(t) )  and  X'(t) = X(t) F(X(t), Λ(t)),

where Diag(M) is the block diagonal matrix formed from the diagonal blocks of the block matrix M, F_jj(X(t), Λ(t)) is a square zero matrix, and for i ≠ j, F_ij(X(t), Λ(t)) is the solution B to the matrix equation

    B Λ_j(t) − Λ_i(t) B = Y_i(t)^T A X_j(t).
Here Y_i(t)^T denotes the ith block row of X(t)^{-1}. The Euler approximation to the differential equations can be expressed as

    Λ_{n+1} = Λ_n + Δt ( Diag(X_n^{-1} A X_n) − Λ_n ),  X_{n+1} = X_n + Δt X_n F(X_n, Λ_n).    (1.12)

In Section 2.3, we modify (1.12) by replacing the fixed stepsize Δt in each iteration by a variable stepsize. The Armijo rule from optimization theory is used to find a stepsize s for which

    ||A − X_{n+1}(s) Λ_{n+1}(s) X_{n+1}(s)^{-1}||_F < ||A − X_n Λ_n X_n^{-1}||_F

holds. Example 2.3.2 shows that, when a good approximation to the matrix of eigenvectors is not known, the algorithm in Section 2.3 may not converge. We then discuss the factors affecting the sensitivity of the Schur decomposition relative to perturbations in the coefficient matrix, which we use to compute a block Schur decomposition of a matrix. In Section 2.6, we present an algorithm to compute a block Schur decomposition of a matrix, again using a stepsize rule from optimization theory. The starting eigenvalues associated with the Schur decomposition are uniformly distributed points on a circle in the complex plane, where the circle includes all Gerschgorin disks for the matrix, and the starting guess for the unitary factor is the identity matrix. In Chapter 3, we discuss how to exploit the structure of a matrix. Since symmetric matrices are diagonalizable, we modify the diagonalization method and the block diagonalization method to exploit the matrix symmetry and reduce the given matrix to a diagonal matrix with the eigenvalues along the diagonal.

CHAPTER 2
EIGENVALUES OF UNSYMMETRIC MATRICES

2.1 Block Diagonalization of a Matrix

Let A be an n x n nondefective matrix. Our aim is to find an iterative method to compute a block diagonalization of the form A = XΛX^{-1}, where Λ = diag(Λ_1, Λ_2, ..., Λ_t) is a block diagonal matrix and X = [X_1, X_2, ..., X_t] is a compatibly partitioned block column matrix such that AX_j = X_j Λ_j. Here Λ_j = diag(λ_j, ..., λ_j) is an m_j x m_j scalar matrix. The blocks can always be arranged so that the eigenvalues of Λ_i and Λ_j for i ≠ j are distinct. We will use a sensitivity result for eigenvalues and eigenvectors to block diagonalize a matrix.
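The starting guess mentioned above for the block Schur algorithm, uniformly distributed points on a circle in the complex plane enclosing all Gerschgorin disks, is cheap to construct. The sketch below is my own illustration (function names and the sample matrix are assumptions; centering the circle at the origin is one convenient choice): it computes the row disks of Theorem 1.0.1 and places n equally spaced points on an enclosing circle.

```python
import numpy as np

def gerschgorin_disks(A):
    """Row disks of Theorem 1.0.1: center a_ii, radius sum_{j != i} |a_ij|."""
    A = np.asarray(A, dtype=complex)
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return centers, radii

def starting_points(A):
    """n uniformly spaced points on a circle about the origin that encloses
    every Gerschgorin disk of A."""
    centers, radii = gerschgorin_disks(A)
    R = np.max(np.abs(centers) + radii)       # the circle |z| = R contains all disks
    n = A.shape[0]
    return R * np.exp(2j * np.pi * np.arange(n) / n)

A = np.array([[4.0, 1.0, 0.5],
              [0.2, -2.0, 0.3],
              [0.1, 0.1, 1.0]])
centers, radii = gerschgorin_disks(A)
eigs = np.linalg.eigvals(A)
pts = starting_points(A)
```

By Theorem 1.0.1, every eigenvalue of A lies in the union of the disks, so it also lies inside the circle of radius R on which the starting points are placed.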
In the derivation of the equations needed to block diagonalize a matrix, we need the continuity of eigenvalues and the perturbation theory based on Gerschgorin's theorem. Recall that the coefficients of the characteristic polynomial of a matrix B are continuous functions of the elements of B [Wil65, page 66], and the zeros of a polynomial depend continuously on its coefficients (a proof is in [Hen74, page 281]). So the eigenvalues of B depend continuously on the elements of B. Also, Rouché's theorem states that if f and g are analytic in a neighborhood of a closed disk D = {z : |z − a| ≤ r} centered at a, f has no zeros on ∂D = {z : |z − a| = r}, and |g(z)| < |f(z)| holds for z ∈ ∂D, then f(z) and f(z) + g(z) have the same number of zeros in D. Details are in [Hen74].

Theorem 2.1.1. Let λ be an eigenvalue of A of multiplicity k, and let E be a sufficiently small perturbation. Then a disk D = {z : |z − λ| ≤ r} centered at λ, whose radius r shrinks to zero with the size of E, contains precisely k eigenvalues of A + E.

Proof of Theorem 2.1.1. The proof is given in [Ste90, page 167].

Next we will discuss the effect of perturbations in the elements of A on the eigenvalues and the eigenvectors in its block diagonalization, which is based on Gerschgorin's theorem. Let A be an n x n nondefective matrix. Suppose X^{-1}AX = Λ is a block diagonalization of A, where Λ is a block diagonal matrix. If Λ_j denotes the jth diagonal block of Λ, then we have AX_j = X_j Λ_j, where X_j is the jth block column of X. Given an arbitrary matrix E and a sufficiently small ε, we will show in the next theorem that there exist continuously differentiable functions Λ_j(ε) and X_j(ε) with Λ_j(0) = Λ_j and X_j(0) = X_j such that (A + εE)X_j(ε) = X_j(ε)Λ_j(ε).

Theorem 2.1.2. Let A be an n x n nondefective matrix. Suppose X^{-1}AX = Λ is a block diagonalization of A; if Λ_j is the jth diagonal block of Λ, then AX_j = X_j Λ_j, where X_j is the jth block column of X.
Suppose Λ_i and Λ_j have distinct eigenvalues for i ≠ j. Then, given an arbitrary n x n matrix E, and for a sufficiently small ε, there exist continuously differentiable functions Λ_j(ε) and X_j(ε) with Λ_j(0) = Λ_j and X_j(0) = X_j such that the following matrix equation holds:

    (A + εE)X_j(ε) = X_j(ε)Λ_j(ε).    (2.1)

Proof of Theorem 2.1.2. Let X^{-1}AX = diag(Λ_1, Λ_2, ..., Λ_t) be a block diagonalization of A, where Λ_j = diag(λ_j, ..., λ_j) is a scalar matrix, and let m_j = dim(Λ_j). Suppose m_1 > 1. Let E be an n x n matrix and ε a small number. Consider the matrix D = X^{-1}(A + εE)X. In block form, D = Λ + εF, where F_ij = Y_i^T E X_j and Y_i^T is the ith block row of X^{-1}. Let B = diag(B_1, B_2, ..., B_t), where B_1 = diag(ε/p, ..., ε/p), B_i = I for i = 2, ..., t, and dim(B_i) = m_i for i = 1, ..., t. Let C = BDB^{-1}. The form of this matrix is illustrated below for t = 5:

    C = [ Λ_1 + εF_11   (ε²/p)F_12    (ε²/p)F_13    (ε²/p)F_14    (ε²/p)F_15
          pF_21         Λ_2 + εF_22   εF_23         εF_24         εF_25
          pF_31         εF_32         Λ_3 + εF_33   εF_34         εF_35
          pF_41         εF_42         εF_43         Λ_4 + εF_44   εF_45
          pF_51         εF_52         εF_53         εF_54         Λ_5 + εF_55 ]

Next, let U_1^{-1} F_11 U_1 = Σ be a diagonalization of F_11, where Σ = diag(σ_1, σ_2, ..., σ_{m_1}), with some of the σ_i possibly identical. Let W = diag(U_1, B_2, ..., B_t). Then W^{-1}CW has the same block form, with the (1,1) block replaced by Λ_1 + εΣ, the remaining blocks of the first block row multiplied on the left by U_1^{-1}, and the remaining blocks of the first block column multiplied on the right by U_1. We may choose a value of p, which is independent of ε, such that the first m_1 Gerschgorin disks are disjoint from the other n − m_1 disks. So for a proper value of p, m_1 eigenvalues of A + εE lie in the union of the first m_1 disks, and these eigenvalues differ from λ_1 by terms of order ε.

Next we discuss the effect of perturbations in the elements of A on the eigenvectors in its block diagonalization. As earlier, let X^{-1}AX = diag(Λ_1, Λ_2, ..., Λ_t) be a block diagonalization of A, where Λ_j = diag(λ_j, ..., λ_j) is a scalar matrix and m_j = dim(Λ_j). Clearly the columns x_1, ..., x_n of X form a complete set of eigenvectors. We denote the eigenvectors of A + εE by x_1(ε), x_2(ε), ..., x_n(ε), and the eigenvectors of D = X^{-1}(A + εE)X by u_1(ε), u_2(ε), ..., u_n(ε), so that x_i(ε) = X u_i(ε). To illustrate, we consider a matrix D with t = n − 2 such that Λ_1 = diag(λ_1, λ_1, λ_1) and Λ_j = λ_{j+2} for j = 2, ..., t, where D = Λ + εf with f_ij = y_i^T E x_j and y_i^T the ith row of X^{-1}. First consider the simple eigenvalue λ_4.
Since (λ_4(ε), u_4(ε)) is an eigenpair of D and λ_4(0) = λ_4, we must have u_4(0) = e_4, the fourth unit vector. Let u_4(ε) be normalized so that its largest element is 1. For sufficiently small ε, we claim that the fourth element of u_4(ε) is 1. If not, let u_i4(ε) = 1 for some i ≠ 4. Since Du_4(ε) = λ_4(ε)u_4(ε), equating the ith component on both sides, we get

    λ_4(ε) u_i4(ε) = λ_i u_i4(ε) + ε Σ_{j=1}^{n} f_ij u_j4(ε),    (2.2)

that is,

    (λ_4(ε) − λ_i) u_i4(ε) = ε Σ_{j=1}^{n} f_ij u_j4(ε).

Since |u_j4(ε)| ≤ 1, from (2.2) we get

    |λ_4(ε) − λ_i| |u_i4(ε)| ≤ ε Σ_{j=1}^{n} |f_ij|.

Since λ_4(ε) → λ_4 as ε → 0 and λ_4 ≠ λ_i for i ≠ 4, for ε small enough we have |λ_4(ε) − λ_i| ≥ (1/2)|λ_4 − λ_i|, and then |u_i4(ε)| ≤ Mε, where

    M = max_{i≠4} 2 Σ_{j=1}^{n} |f_ij| / |λ_4 − λ_i|.

Thus the elements u_i4(ε), i ≠ 4, are of order ε, contradicting u_i4(ε) = 1; so the fourth element of u_4(ε) is 1. Again from (2.2), we obtain

    (λ_4(ε) − λ_i) u_i4(ε) = ε f_i4 + ε Σ_{j≠4} f_ij u_j4(ε).

The second term on the right is of order ε², so for i ≠ 4 we have

    u_i4(ε) = ε f_i4 / (λ_4 − λ_i) + O(ε²).

Now we turn to the eigenvectors corresponding to the multiple eigenvalue λ_1. Suppose u_1(ε) is a normalized eigenvector of D corresponding to the eigenvalue λ_1(ε). Using an argument similar to that for the simple eigenvalue λ_4(ε), we can show that the largest element of u_1(ε) cannot lie in positions 4, ..., n; in fact, those elements are of order ε. Therefore the normalized u_1(ε) must have its element 1 in one of the first three positions, and likewise for u_2(ε) and u_3(ε). There are three degrees of freedom in normalizing u_1(ε), two degrees of freedom in normalizing u_2(ε), and one degree of freedom in normalizing u_3(ε). Hence we take

    u_1(ε) = [1, 0, 0, u_41(ε), ..., u_n1(ε)]^T,
    u_2(ε) = [0, 1, 0, u_42(ε), ..., u_n2(ε)]^T,
    u_3(ε) = [0, 0, 1, u_43(ε), ..., u_n3(ε)]^T.

As in the case of the simple eigenvalue λ_4(ε), we can show that

    u_ij(ε) = ε f_ij / (λ_j − λ_i) + O(ε²),  i = 4, ..., n,  j = 1, 2, 3,

where λ_1 = λ_2 = λ_3. Hence Λ_j(ε) and X_j(ε) are continuously differentiable functions of ε. This completes the proof of the theorem.

Now we are ready to derive the equations needed to obtain a block diagonalization A = XΛX^{-1} of a nondefective matrix A.
Let Λ_j be the jth diagonal block of Λ, and let X_j be the corresponding block column of X. Given an n x n matrix E, and for a sufficiently small ε ∈ R, let Λ_j(ε) and X_j(ε) be differentiable functions of ε such that Λ_j(0) = Λ_j and X_j(0) = X_j. Differentiating (2.1) with respect to ε and then putting ε = 0, we obtain

    A X_j'(0) + E X_j = X_j'(0) Λ_j + X_j Λ_j'(0).    (2.3)

Since the columns of X are linearly independent, we can express X_j(ε) in the following way:

    X_j(ε) = Σ_{i=1}^{t} X_i B_ij(ε),    (2.4)

where B_ij(ε) is an m_i x m_j matrix. We normalize (2.4) by taking B_jj(ε) = I. Differentiating (2.4) with respect to ε and setting ε = 0, we have

    X_j'(0) = Σ_{i≠j} X_i B_ij'(0).    (2.5)

Substituting (2.5) into (2.3) and using AX_i = X_i Λ_i, we obtain

    Σ_{i≠j} X_i Λ_i B_ij'(0) + E X_j = Σ_{i≠j} X_i B_ij'(0) Λ_j + X_j Λ_j'(0),

that is,

    E X_j = Σ_{i≠j} X_i ( B_ij'(0) Λ_j − Λ_i B_ij'(0) ) + X_j Λ_j'(0).    (2.6)

Since

    Y_i^T X_k = I for i = k,  Y_i^T X_k = 0 for i ≠ k,    (2.7)

where Y_i^T is the ith block row of X^{-1}, premultiplying (2.6) by Y_j^T and Y_i^T (i ≠ j), respectively, yields

    Λ_j'(0) = Y_j^T E X_j,    (2.8)

    Y_i^T E X_j = B_ij'(0) Λ_j − Λ_i B_ij'(0).    (2.9)

Equation (2.9) is a linear equation in the unknown B_ij'(0). Here we mention two well known methods to find B_ij'(0). To derive these two methods, we need the following two theorems.

Theorem 2.1.3 [H, page 317]. If B ∈ C^{p x p} and C ∈ C^{r x r} are nondefective matrices with no eigenvalue in common, then the solution Z ∈ C^{r x p} to the matrix equation

    ZB − CZ = R    (2.10)

is given by Z = VWU^{-1}, where B = UΣU^{-1} and C = VΩV^{-1} are diagonalizations of B and C respectively, and W = (w_ij) is an r x p matrix with w_ij = d_ij / (σ_j − ω_i), where D = (d_ij) = V^{-1} R U.

Proof of Theorem 2.1.3. Let B = UΣU^{-1} and C = VΩV^{-1} be diagonalizations of B and C respectively. Then (2.10) becomes Z UΣU^{-1} − VΩV^{-1} Z = R. Premultiplying by V^{-1}, postmultiplying by U, and letting W = V^{-1} Z U and D = V^{-1} R U, the last equation reduces to WΣ − ΩW = D. Comparing the (i, j) entry on both sides of the preceding relation, we have

    w_ij (σ_j − ω_i) = d_ij,  i = 1, 2, ..., r,  j = 1, 2, ..., p,

so w_ij = d_ij / (σ_j − ω_i), and with this W, Z = VWU^{-1}.

Theorem 2.1.4. If A ∈ R^{p x p} and B ∈ R^{r x r}, then the solution to the matrix equation

    AZ − ZB = C    (2.11)

is given by Z = UPV^T, where A = URU^T and B = VSV^T are real Schur decompositions of A and B respectively, and P ∈ R^{p x r} is the solution to the Sylvester equation RP − PS = U^T C V.

Proof of Theorem 2.1.4. Let A = URU^T and B = VSV^T be real Schur decompositions of A and B respectively.
Then (2.11) becomes URU^T Z − Z VSV^T = C. Premultiplying by U^T and postmultiplying by V gives R U^T Z V − U^T Z V S = U^T C V. Let P = U^T Z V and D = U^T C V. Then the last equation reduces to RP − PS = D, which is a Sylvester equation. For the detailed solution of a Sylvester equation, readers are referred to [Gol90, page 387].

Next we will show how to use the above two theorems to derive two methods to find the unknown B_ij'(0) in (2.9).

Method 1 (using Theorem 2.1.3): Let P = B_ij'(0) and R = Y_i^T E X_j; then (2.9) reduces to

    P Λ_j − Λ_i P = R.    (2.12)

Solving (2.12) by Theorem 2.1.3, with B = Λ_j and C = Λ_i, gives P = VWU^{-1}.

Method 2 (using Theorem 2.1.4): Let P = B_ij'(0) and C = Y_i^T E X_j; then (2.9) becomes

    P Λ_j − Λ_i P = C.    (2.13)

Solving (2.13) for P, we obtain P = VZU^T, where Λ_j = URU^T and Λ_i = VSV^T are real Schur decompositions of Λ_j and Λ_i respectively, and Z is the solution to the Sylvester equation ZR − SZ = W with W = V^T C U.

Combining the analysis of the effect of perturbations in the elements of a nondefective matrix A on the eigenvalues and corresponding eigenvectors in its diagonalization with the results (2.5), (2.8), and (2.9), we get the following Taylor expansions of Λ_j(ε) and X_j(ε).

Theorem 2.1.5. Let A be a nondefective matrix, let Λ_j = diag(λ_j, ..., λ_j) be a scalar matrix, where λ_j is an eigenvalue of A of multiplicity dim(Λ_j), and let X_j be the matrix of linearly independent eigenvectors of A corresponding to the eigenvalue λ_j, so that AX_j = X_j Λ_j. Given an arbitrary matrix E, and for a sufficiently small ε, let Λ_j(ε) and X_j(ε) be continuously differentiable functions of ε such that (A + εE)X_j(ε) = X_j(ε)Λ_j(ε), Λ_j(0) = Λ_j, and X_j(0) = X_j. Then

    Λ_j(ε) = Λ_j + ε Y_j^T E X_j + O(ε²),
    X_j(ε) = X_j + ε Σ_{i≠j} X_i P_ij + O(ε²),

where P_ij is the solution to the matrix equation P_ij Λ_j − Λ_i P_ij = Y_i^T E X_j.

We use Theorem 2.1.5 to develop an algorithm in the following way. Suppose XΛX^{-1} is an approximate block diagonalization of B.
If we identify E with B − XΛX^{-1} and set ε = 1, the approximations of Theorem 2.1.5 lead to the following block version of Algorithm 1.0.1:

    Λ_j^{new} = (Y_j^{old})^T B X_j^{old},
    X_j^{new} = X_j^{old} + Σ_{i≠j} X_i^{old} P_ij,    (2.14)
    Y^{new} = (X^{new})^{-1},

where P_ij is the solution to P_ij Λ_j^{new} − Λ_i^{new} P_ij = (Y_i^{old})^T B X_j^{old}.

With a matrix of approximate eigenvectors X_0, we can use the above expressions to compute the block diagonalization of a nondefective matrix A in the following way.

Algorithm 2.1.1 (Block Diagonalization). Given an n x n nondefective matrix A, a matrix of approximate eigenvectors X_0, and a tolerance tol greater than the unit roundoff, the following algorithm computes a block diagonalization A = XΛX^{-1}:

    X = X_0
    repeat until Dif < tol
        compute Y = X^{-1}, with block rows Y_1^T, ..., Y_t^T
        for j = 1 : t
            Λ_j = Y_j^T A X_j
        end
        C = [C_1, ..., C_t] is an n x n zero matrix, where [C_1, ..., C_t] is a column partitioning
        for j = 1 : t
            for i = 1 : t
                if i ≠ j
                    solve P_ij Λ_j − Λ_i P_ij = Y_i^T A X_j for P_ij
                    C_j = C_j + X_i P_ij
                end
            end
            X_j = X_j + C_j
            normalize each column x of X_j by x = x / ||x||
        end
        Dif = ||C||_F
    end

Above, we normalize the columns of X_j to control the growth of X. The main problem in this method is how to choose a matrix of approximate eigenvectors for the starting guess. If by another method a matrix of eigenvectors of a matrix A can be determined, then this method can be applied with X as the starting matrix of eigenvectors to find the eigenvalues and corresponding eigenvectors of a slightly perturbed matrix A + εE, where E is an arbitrary matrix and ε is a small number. For example, first reduce the matrix A to an upper Hessenberg form H by an orthogonal matrix Q, and then apply the QR method with double implicit shift to H to obtain a real Schur decomposition S^T H S = U. To obtain an eigenvector y corresponding to the eigenvalue μ, let B = H − μI. Next solve By = 0 for y in the following way. Put y(i) = 1, and then solve [B(:, 1 : i−1), B(:, i+1 : n)] z = −B(:, i) by using the QR factorization technique for solving an overdetermined system of equations. Put y(1 : i−1) = z(1 : i−1) and y(i+1 : n) = z(i : n−1).
Then x = Qy is the eigenvector of A corresponding to the eigenvalue μ. Now these eigenvectors can be used as approximate eigenvectors in the block diagonalization method.

Example 2.1.1. Consider a 3 x 3 matrix A. We reduce A first to an upper Hessenberg form H, and then apply the QR method with double implicit shift with tol = 10^{-6}. After 8 iterations, we get a real Schur decomposition S^T A S = U. The eigenvalues of U are

    λ = [4.0394, 1.0197 + 0.4450i, 1.0197 − 0.4450i].

Next, we use the technique discussed in the last paragraph to find corresponding eigenvectors. Here, for each pair of complex conjugate eigenvalues, we solve the equation By = 0 only for one of the pair, and then take the real and the imaginary parts of y as the two vectors corresponding to the complex conjugate pair of eigenvalues. The corresponding matrix is:

    X = [ 0.5921  0.7387  0.3222
          0.2979  0.1265  0.9462
          0.7927  0.2726  0.5453 ]

Next consider a matrix A_1, which we get by perturbing the elements of A. If we use Algorithm 2.1.1 on A_1 with X_0 = X and tol = 10^{-6}, then after 2 iterations we get a block diagonalization Y^{-1} A_1 Y = D, and if we apply the QR method with double implicit shift to S^T A_1 S, then after 2 iterations we obtain a real Schur form R as well.

Example 2.1.2. Consider a 4 x 4 matrix B. If we apply the QR method with double implicit shift with tol = 10^{-6} to B, then after 1 iteration we obtain a real Schur form S^T B S = U; the eigenvalues of U are λ = [1, ...]. Next, we use the technique discussed prior to Example 2.1.1 to find corresponding eigenvectors, obtaining a matrix of eigenvectors X. Next consider the following matrix B_1, which we get by perturbing the elements of B:

    B_1 = [ 9.9931  6.0059  6.0093  6.0085
             .9947  5.0009  6.0065  6.0042
            2.9930  3.0091  2.0076  0.0026
            5.9995  3.0074  3.0033  5.0063 ]

If we apply Algorithm 2.1.1 to B_1 with X_0 = X and tol = 10^{-6},
then after a few iterations we obtain a block diagonalization Y^{-1} B_1 Y = D_1, whereas if we apply the QR method with double implicit shift to S^T B_1 S, then after 3 iterations we obtain a real Schur form Q^T (S^T B_1 S) Q = R.

Example 2.1.3. Consider a 20 x 20 matrix C with small integer entries between 0 and 10. If we apply the QR algorithm with double implicit shift with tol = 10^{-6} to C, then after 27 iterations we get a real Schur form S^T C S = U, and the eigenvalues of U are

    λ = [92.2615, 20.8808, 13.5193 ± 10.7480i, 0.2459 ± 15.2499i, 6.3218 ± 13.5313i, 9.3504 ± 7.8883i, 9.0619 ± 8.7751i, 1.5718 ± 10.4593i, 8.0296, 7.3844 ± 1.4578i, 0.6072, 3.1993, 2.2102].

Next, we use the technique discussed prior to Example 2.1.1 to find corresponding eigenvectors. Here, for each pair of complex conjugate eigenvalues, we solve the equation Bx = 0 only for one of the pair, and then take the real and the imaginary parts of x as the two vectors corresponding to the complex conjugate pair of eigenvalues. We denote the matrix of vectors by X. Next consider a matrix C_1, which we get by perturbing the elements of C. If we apply Algorithm 2.1.1 to C_1 with X_0 = X and tol = 10^{-6}, we obtain a block diagonalization, whereas if we apply the QR method with double implicit shift to S^T C_1 S, then after 27 iterations we obtain a real Schur form Q^T (S^T C_1 S) Q = R.
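Each sweep of Algorithm 2.1.1 must solve small matrix equations of the form P Λ_j − Λ_i P = Y_i^T A X_j, which are of the Sylvester type AZ − ZB = C treated in Theorem 2.1.4. For blocks of modest size these can also be solved directly by vectorization, since under column stacking vec(AZ − ZB) = (I ⊗ A − B^T ⊗ I) vec(Z). The sketch below is my own illustration of that idea (the function name and test matrices are assumptions; the Schur-based route of Theorem 2.1.4 is preferable at scale):

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve A Z - Z B = C by vectorization; requires A and B to have
    disjoint spectra. A is p x p, B is r x r, C and Z are p x r."""
    p, r = A.shape[0], B.shape[0]
    K = np.kron(np.eye(r), A) - np.kron(B.T, np.eye(p))
    z = np.linalg.solve(K, C.flatten(order="F"))   # column-stacked vec(C)
    return z.reshape((p, r), order="F")

A = np.array([[1.0, 2.0],
              [0.0, 2.0]])
B = np.array([[5.0, 1.0, 0.0],
              [0.0, 6.0, 1.0],
              [0.0, 0.0, 7.0]])
C = np.ones((2, 3))
Z = solve_sylvester_kron(A, B, C)
```

The Kronecker system has dimension pr, so this approach costs O((pr)^3) and is only sensible for the small blocks arising in the block diagonalization.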
2.2 A Differential Equation Approach to Eigencomputations

In the previous section we found that the main problem in using the block diagonalization method is how to choose a matrix of approximate eigenvectors for the starting guess. Here we will study whether the Euler method can be used to block diagonalize a nondefective matrix A with the identity matrix as the matrix of approximate eigenvectors. In numerical ordinary differential equations, one way to obtain the Euler method is to drop the 2nd order error term in the Taylor series of the given function. Dropping the 2nd order error terms in Theorem 2.1.5, we get expressions for Λ_j(ε) and X_j(ε) which can be used in Euler's method.

Now we will discuss how to implement the Euler method to find the eigenvalues of a nondefective matrix A. Euler's method for the differential equation z'(t) = f( t, z(t) ) has the form

z_{n+1} = z_n + Δt f( z_n ),

where z_n is the approximation to z( nΔt ) and Δt is the constant time step. Here z and f are vector functions of t in general. In the block diagonalization procedure, we attempt to generate a block diagonal matrix Λ and an associated block column matrix X such that AX = XΛ. If Λ_j denotes the jth diagonal block of Λ, then we have A X_j = X_j Λ_j. The differential equations governing the block matrices have the form

Λ'(t) = −Λ(t) + Diag( X(t)^{-1} A X(t) ),    X'(t) = X(t) F( X(t), Λ(t) ),

where Diag(M) is a block diagonal matrix formed from the diagonal blocks of the block matrix M. The typical starting guess is Λ(0) = Diag(A) and X(0) = I. Hence the Euler approximation to the differential equations can be expressed as

Λ_{n+1} = Λ_n + Δt ( Diag( X_n^{-1} A X_n ) − Λ_n ),    X_{n+1} = X_n + Δt X_n F( X_n, Λ_n ).    (2.15)

We use the values of Λ_{n+1} and X_{n+1} from (2.15) to create an iterative method to compute a block diagonalization of an n × n real matrix A. But first, we will discuss how to compute a block diagonalization of an upper triangular matrix. For any upper triangular matrix T, its eigenvalues are the diagonal elements.
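The Euler iteration (2.15) can be sketched in a few lines of code. The following is a minimal numpy sketch, not part of the dissertation: it assumes every diagonal block is 1 × 1 (distinct eigenvalues), so that F_ij = ( X_n^{-1} A X_n )_ij / ( λ_j − λ_i ); the function name euler_eigen_step is illustrative.

```python
import numpy as np

def euler_eigen_step(A, X, Lam, dt):
    """One Euler step of (2.15), assuming all diagonal blocks are 1 x 1,
    so Lam is a diagonal matrix of current eigenvalue estimates."""
    M = np.linalg.solve(X, A @ X)          # M = X^{-1} A X
    lam = np.diag(Lam)
    # F has zero diagonal; F_ij solves F_ij lam_j - lam_i F_ij = M_ij,
    # i.e. F_ij = M_ij / (lam_j - lam_i) for i != j
    denom = lam[None, :] - lam[:, None]
    np.fill_diagonal(denom, 1.0)           # dummy value; diagonal of F is zeroed below
    F = M / denom
    np.fill_diagonal(F, 0.0)
    Lam_new = Lam + dt * (np.diag(np.diag(M)) - Lam)
    X_new = X + dt * (X @ F)
    return X_new, Lam_new

# starting guess Lam(0) = Diag(A), X(0) = I, as in the text
A = np.array([[4.0, 1.0], [2.0, 3.0]])     # eigenvalues 5 and 2
X, Lam = np.eye(2), np.diag(np.diag(A))
for _ in range(400):
    X, Lam = euler_eigen_step(A, X, Lam, 0.05)
```

With this small, well-separated example the residual ||AX − XΛ||_F shrinks steadily and the diagonal of Λ should approach the eigenvalues 5 and 2; with a constant small Δt many iterations are needed, which is the drawback addressed in Section 2.3.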
We will show how to compute a block diagonalization Y^{-1} T Y = D of T, where D = diag( D_11, D_22, ..., D_kk ), with the property that each diagonal block is upper triangular and the eigenvalues of D_ii and D_jj for i ≠ j are distinct. Let Y = I_n, and let α_ij = | t_ii − t_jj |. Given a tolerance tol, if α_ij > tol, then define W_ij to be equal to the identity matrix except for the (i, j) element, which is z = t_ij / ( t_jj − t_ii ). Then the product W_ij^{-1} T W_ij is an upper triangular matrix whose (i, j) element is zero. Next update Y by Y_new = Y_old W_ij. We use this technique to zero t_ij whenever α_ij > tol.

An efficient way to zero the entries t_ij is to start from the bottom right corner of the matrix. Zero t_ij in decreasing order of the row index i until i = 1. If α_ij ≤ tol is encountered, then momentarily stop zeroing on the jth column and go to the ( j − 1 )th column. Continue this process all the way to the second column. Once the second column is done, then in increasing order of the column index j return to those columns where some of the t_ij were not zeroed in the first round even though the corresponding α_ij are greater than tol; this time also zero t_ij in decreasing order of the row index i until i = 1.

In fact, the above technique can be extended to a k × k quasi-upper triangular matrix T ∈ R^{n×n}. In this case W_ij is equal to the k × k block identity matrix except for the (i, j) block, which is Z, where Z is the solution to the matrix equation T_ii Z − Z T_jj = −T_ij. We summarize the block diagonalization of T in the following algorithm.

Algorithm 2.2.1 Given a k × k quasi-upper triangular matrix T ∈ R^{n×n} and a coalescing tolerance tol greater than the square root of the unit roundoff, the following algorithm computes a block diagonalization of T. S is the n × n identity matrix.
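The scalar elimination just described can be sketched as follows. This is a minimal numpy sketch, not from the dissertation: it handles only 1 × 1 blocks with distinct diagonal entries, and it sweeps columns left to right (bottom-up within each column) rather than using the coalescing order described above; the function name is illustrative.

```python
import numpy as np

def diagonalize_upper_triangular(T, tol=1e-8):
    """Return (Y, D) with Y^{-1} T Y = D diagonal, for upper triangular T
    whose diagonal entries are pairwise distinct.  Each elementary matrix
    W = I + z e_i e_j^T (z = t_ij / (t_jj - t_ii)) zeroes the (i, j) entry."""
    T = T.astype(float).copy()
    n = T.shape[0]
    Y = np.eye(n)
    for j in range(1, n):                     # columns left to right
        for i in range(j - 1, -1, -1):        # rows bottom-up within the column
            if abs(T[i, i] - T[j, j]) > tol and T[i, j] != 0.0:
                z = T[i, j] / (T[j, j] - T[i, i])
                W = np.eye(n); W[i, j] = z
                Winv = np.eye(n); Winv[i, j] = -z   # inverse of an elementary matrix
                T = Winv @ T @ W
                Y = Y @ W
    return Y, T

T0 = np.array([[1.0, 2.0, 3.0],
               [0.0, 4.0, 5.0],
               [0.0, 0.0, 6.0]])
Y, D = diagonalize_upper_triangular(T0)
```

Zeroing (i, j) only alters entries above it in column j and entries to the right of it in row i, so this sweep order never destroys an entry that has already been zeroed.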
row = [ ]; col = [ ]
for j = n : −1 : 2
    i = j − 1
    while i ≥ 1
        α_ij = min{ | λ − ω | : λ ∈ Λ( T_ii ), ω ∈ Λ( T_jj ) }
        if α_ij ≤ tol
            row = [ row, i ]; col = [ col, j ]
            { row and col are two arrays whose elements are the row index i and the column index j, respectively, of an entry T_ij of T which will not be replaced by a zero matrix. }
            break  { resume this column in the second round }
        else
            Solve T_ii Z − Z T_jj = −T_ij for Z, and then form W_ij
            T = W_ij^{-1} T W_ij; S = S W_ij
        end
        i = i − 1
    end
end
m = length( row )
for l = 1 : m
    j = col( l ); i = row( l ) − 1
    while i ≥ 1
        if α_ij > tol
            T = W_ij^{-1} T W_ij; S = S W_ij
        end
        i = i − 1
    end
end

Let k_old = k, and suppose NT is an array of k elements whose ith element is the dimension of the ith block of T along the diagonal. For i < j, define α_ij as in the above code. If α_ij ≤ tol, then merge the blocks T_ii and T_jj to form a single block using permutations of the columns and rows of T. Use those column permutations to merge the corresponding blocks of S. Update k_old and NT_old to obtain k_new and NT_new. Try the above merging technique for all possible combinations of 1 ≤ i < j ≤ k.

Now we are ready to compute a block diagonalization of an n × n nondefective matrix A using the following procedure.

Algorithm 2.2.2 (Dynamical Eigencomputations) Given a nondefective matrix A ∈ R^{n×n}, a tolerance tol greater than the unit roundoff, a coalescing tolerance tol1 greater than the square root of the unit roundoff, and a constant stepsize Δt, the following algorithm computes a block diagonalization of A. Define tol2 = 100 tol1. Next we break the sketch of the algorithm into several steps.

Step 0: Take X = I, Λ = Λ_0 = Diag(A), Y = X^{-1} = I, NB = n ( the number of blocks ), and NS = [ 1, 1, ..., 1 ], an array of n elements whose ith element is the dimension of the ith diagonal block of Λ.

Step 1: For i < j, define DIF = min{ | λ − ω | : λ ∈ Λ( Λ_i ), ω ∈ Λ( Λ_j ) }. If DIF ≤ tol1, merge the blocks Λ_i and Λ_j to form a single block using permutations of the columns and rows of Λ.
Use those column permutations on X to merge X_i and X_j, and those row permutations on Y to merge Y_i^T and Y_j^T. Update NB_old and NS_old to get NB_new and NS_new. Try the above merging technique for all possible combinations of 1 ≤ i < j ≤ NB. Using column and row permutations, arrange the blocks of Λ in decreasing order of size along the diagonal; use those column permutations to arrange the block columns of X in decreasing order of size, and those row permutations to arrange the block rows of Y in decreasing order of size.

Step 2: Construct F( X, Λ ) as follows: F_ii( X, Λ ) = 0, and for i ≠ j, F_ij( X, Λ ) = P, where P = P_ij is the solution to the matrix equation P Λ_j − Λ_i P = Y_i^T A X_j.

Step 3: Suppose μ = [ λ_{m+1}, λ_{m+2}, ..., λ_{NB} ]^T, where 0 ≤ m < NB with dim( Λ_i ) > 1 for i = 1, 2, ..., m and dim( Λ_i ) = 1 for i = m + 1, m + 2, ..., NB. Define d = diag( Y^T A X )( t : n ) − μ, where t = 1 + Σ_{i=1}^{m} NS(i), and diag(M)( t : n ) is a vector formed from the t through n elements of the vector diag(M). Our aim is to find the smallest s ∈ (0, 1) with the property μ_i + s d_i = μ_j + s d_j. To obtain this s, form the ratios r_i at which two 1 × 1 diagonal entries would coalesce; if such a coalescence occurs, merge the blocks Λ_i and Λ_j, and define s = min{ r_i, 1 }. If s < Δt, then take a time step of size s instead of the normal time step Δt; otherwise take the normal time step Δt.

Step 4: Update Λ, X, and Y as follows:

Λ_j^new = Λ_j^old + Δt ( ( Y_j^old )^T A X_j^old − Λ_j^old ),  j = 1, 2, ..., NB,
X^new = X^old ( I + Δt F( X^old, Λ^old ) ),    Y^new = ( X^new )^{-1}.

Step 5: Consider a block Λ_j with 1 ≤ j ≤ m, where m is defined as in Step 3. Let Λ_j = Q U Q^T be a Schur decomposition of Λ_j. Then U = D + N, where D = diag( U_11, U_22, ..., U_ll ), N is the strictly upper triangular part of U, and each U_ii is either a 1 × 1 or a 2 × 2 matrix. Let NU be an array whose ith element is the dimension of the block U_ii. Use Algorithm 2.2.1 with tol2 as the coalescing tolerance to reduce U to a block diagonal matrix, to update the array NU and the number of blocks l, and to get the invertible matrix S. Replace Λ_j by U, X_j by X_j Q S, Y_j^T by S^{-1} Q^T Y_j^T, NB by NB + l − 1, and the size of the block Λ_j in NS by NU. Try the above decoupling procedure for all j such that 1 ≤ j ≤ m.

Step 6: Compute f = || A − X Λ Y ||_F. Go to Step 1 until f ≤ tol.
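The merge test of Step 1 compares the spectra of two diagonal blocks. Below is a minimal numpy sketch of the quantity DIF, not from the dissertation; the helper name and the sample blocks are illustrative.

```python
import numpy as np

def min_spectral_gap(Ai, Aj):
    """DIF of Step 1: the smallest distance between an eigenvalue of
    block Ai and an eigenvalue of block Aj."""
    li = np.linalg.eigvals(np.atleast_2d(Ai))
    lj = np.linalg.eigvals(np.atleast_2d(Aj))
    return min(abs(a - b) for a in li for b in lj)

# a 2 x 2 block with eigenvalues {2, 3} and a 1 x 1 block near 3
B1 = np.array([[2.0, 1.0], [0.0, 3.0]])
B2 = np.array([[3.00004]])
gap = min_spectral_gap(B1, B2)   # well below a coalescing tolerance of 1e-4
```

A pair of blocks with gap ≤ tol1 would be merged by the permutation step; well-separated blocks are left alone.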
Example 2.2.1 If Algorithm 2.2.2 is applied to A with Λ_0 = Diag(A), tol = 0.01, tol1 = 10^-4, and Δt = 0.05, then it takes 131 iterations for the diagonal matrix Λ_0 to converge to

D = diag( [ 37.00078 58.11556 ; 27.61726 42.99974 ], 1.99896 )

( the signs of the 2 × 2 block entries are illegible in this copy ). The eigenvalues of D are

λ = [ 2.99948 + 2.22976i, 2.99948 − 2.22976i, 1.99896 ].

Example 2.2.2 If we apply Algorithm 2.2.2 to the matrix B with Λ_0 = diag( 1, ..., 3 ), tol = 0.01, tol1 = 10^-4, and Δt = 0.01, then the diagonal matrix Λ_0 converges to a block diagonal matrix D with three 2 × 2 blocks and two 1 × 1 blocks, built from the printed entries

109.0896  193.3755  62.4219  110.5633 ;  15.3313  13.3030  35.8551  28.3276 ;  1.2449  0.1837  196  10.7697 ;  6.8342 ;  4.8365.

The eigenvalues of D are

λ = [ 0.7369 + 3.0020i, 0.7369 − 3.0020i, 6.4982 + 0.6740i, 6.4982 − 0.6740i, 4.7624 + 0.2597i, 4.7624 − 0.2597i, 6.8342, 4.8365 ].

This method has the following disadvantage: with a constant time step Δt, which is usually a small number, too many iterations are required, and the cost grows with the size of the given matrix A.

2.3 A Differential Equation Approach to Eigencomputations with Armijo's Stepsize

In the differential equation approach to eigencomputations, we found that with a constant time step Δt, which is usually a small number, too many iterations are required to achieve a few digits of accuracy in the results. So here we will modify the differential equation approach to eigencomputations by varying the time step Δt in each iteration. We plan to achieve this by using Armijo's rule from optimization theory.

Let A be an n × n real nondefective matrix. We will find an iterative method to compute a block diagonalization of the form A = X Λ X^{-1}, where Λ = diag( Λ_1, Λ_2, ..., Λ_t ) is a block diagonal matrix and X = [ X_1, X_2, ..., X_t ] is a compatible block column matrix such that A X_j = X_j Λ_j.
As in Section 2.2, for the block eigenvalue problem the differential equations that we solve are

Λ'(t) = −Λ(t) + Diag( X(t)^{-1} A X(t) ),    X'(t) = X(t) F( X(t), Λ(t) ),

where Diag(M) is a block diagonal matrix formed from the diagonal blocks of a block matrix M, F_ii( X(t), Λ(t) ) is a square zero matrix, and F_ij( X(t), Λ(t) ) for i ≠ j is the solution B to the matrix equation B Λ_j(t) − Λ_i(t) B = Y_i(t)^T A X_j(t). Here Y_i(t)^T denotes the ith block row of X(t)^{-1}. Hence the Euler approximation to the differential equations can be expressed as

Λ_{n+1} = Λ_n + Δt ( Diag( X_n^{-1} A X_n ) − Λ_n ),    X_{n+1} = X_n + Δt X_n F( X_n, Λ_n ).    (2.16)

In the Euler method the stepsize Δt is normally constant. Here we will vary the stepsize Δt in each iteration. Let s be a positive parameter, and let

Ω(s) = Λ_n + s ( Diag( X_n^{-1} A X_n ) − Λ_n ),    Z(s) = X_n + s X_n F( X_n, Λ_n ),

so Ω(0) = Λ_n, Z(0) = X_n, and Ω(Δt) = Λ_{n+1}, Z(Δt) = X_{n+1}, the matrices generated by an Euler step. Define f(s) = || G(s) ||_F, where G(s) = A − Z(s) Ω(s) Z(s)^{-1}. Hence f(0) = || A − X_n Λ_n X_n^{-1} ||_F and f(Δt) = || A − X_{n+1} Λ_{n+1} X_{n+1}^{-1} ||_F. If the starting guess block diagonal matrix Λ_0 and the starting guess invertible matrix X_0 are good approximations of Λ and X in the factorization A = X Λ X^{-1}, then we must have f(Δt) < f(0).

Here our goal is to find an s, 0 < s ≤ 1, for which f(s) < f(0) holds. To this end, we use Armijo's rule from optimization theory. In Armijo's rule, we determine s in the following way. Evaluate f(s) at s = 1, 1/2, 1/4, ..., stopping when

f(s) ≤ ( 1 − s/2 ) f(0).

It turns out ( [Hag88, pages 178–80] ) that to use the above rule, we must have f'(0) = −f(0). So our next aim is to determine f'(0) when f(s) = || G(s) ||_F and G(s) = A − Z(s) Ω(s) Z(s)^{-1}. Suppressing the subscripts of Λ_n and X_n in (2.16), we set

Ω(s) = Λ + s ( Diag( X^{-1} A X ) − Λ ),    Z(s) = X + s X F( X, Λ ).    (2.17)

Before we find f'(0), we want to prove the following fundamental result.

Lemma 2.3.1 Let G(s) be an n × n complex matrix whose elements are differentiable functions of s. If we define f(s) = || G(s) ||_F, then

f(0) f'(0) = trace ( G(0)^H (d/ds) G(s)|_{s=0} ).    (2.18)
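Armijo's rule as stated — halve s until f(s) ≤ ( 1 − s/2 ) f(0) — can be sketched generically. Below is a minimal Python sketch, not from the dissertation; the toy function f is artificial, chosen only so that f(0) = 1 and f'(0) = −1 as the rule requires.

```python
def armijo_step(f, f0, max_halvings=30):
    """Evaluate f at s = 1, 1/2, 1/4, ... and return the first s with
    f(s) <= (1 - s/2) * f(0).  Valid when f'(0) = -f(0), as shown in the text."""
    s = 1.0
    for _ in range(max_halvings):
        if f(s) <= (1.0 - 0.5 * s) * f0:
            return s
        s *= 0.5
    return s

# toy residual: f(s) = |1 - s| + s^2, so f(0) = 1 and f'(0) = -1
f = lambda s: abs(1.0 - s) + s * s
s = armijo_step(f, f(0.0))
# f(1) = 1 > 0.5 is rejected; f(0.5) = 0.75 <= 0.75 is accepted, so s = 0.5
```

In the algorithm of this section, f(s) is the Frobenius residual of the trial Euler step, so each halving requires one residual evaluation.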
Proof of Lemma 2.3.1 Since

f(s)^2 = trace ( G(s)^H G(s) ),    (2.19)

differentiating (2.19) with respect to s, we have

2 f(s) f'(s) = trace ( (d/ds)( G(s)^H ) G(s) + G(s)^H (d/ds) G(s) ) = 2 trace ( G(s)^H (d/ds) G(s) ).

After simplification, setting s = 0, we get f(0) f'(0) = trace ( G(0)^H (d/ds) G(s)|_{s=0} ). □

Now with Ω(s) = Λ + s ( Diag( X^{-1} A X ) − Λ ), Z(s) = X + s X F( X, Λ ), G(s) = A − Z(s) Ω(s) Z(s)^{-1}, and f(s) = || G(s) ||_F, and the result of the above lemma, we are in a position to show that f'(0) = −f(0).

Theorem 2.3.1 Let A ∈ R^{n×n} be a nondefective matrix. Suppose A = X Λ X^{-1} is a block diagonalization of A, where Λ = diag( Λ_1, Λ_2, ..., Λ_t ) is a block diagonal matrix such that Λ_i and Λ_j for i ≠ j have distinct eigenvalues, and X = [ X_1, X_2, ..., X_t ] is a compatible block column matrix. If we define

Ω(s) = Λ + s ( Diag( X^{-1} A X ) − Λ ),    Z(s) = X + s X F( X, Λ ),    G(s) = A − Z(s) Ω(s) Z(s)^{-1},    f(s) = || G(s) ||_F,

where Diag(M) is a block diagonal matrix formed from the diagonal blocks of the block matrix M, F_ii( X, Λ ) is a square zero matrix, and F_ij( X, Λ ) for i ≠ j is the solution P to the matrix equation P Λ_j − Λ_i P = Y_i^T A X_j, where Y_i^T denotes the ith block row of X^{-1}, then f'(0) = −f(0).

Proof of Theorem 2.3.1 The result (2.18) of Lemma 2.3.1 gives

2 f(s) f'(s) = 2 trace ( G(s)^H (d/ds) G(s) ).    (2.20)

Differentiating the expression for G(s) with respect to s, we have

(d/ds) G(s) = − (d/ds)( Z(s) ) Ω(s) Z(s)^{-1} − Z(s) (d/ds)( Ω(s) ) Z(s)^{-1} + Z(s) Ω(s) Z(s)^{-1} (d/ds)( Z(s) ) Z(s)^{-1}.    (2.21)

Hence

(d/ds) G(s)|_{s=0} = − Z'(0) Ω(0) Z(0)^{-1} − Z(0) Ω'(0) Z(0)^{-1} + Z(0) Ω(0) Z(0)^{-1} Z'(0) Z(0)^{-1}.    (2.22)

Differentiating Ω(s) and Z(s) with respect to s and then evaluating the derivatives at s = 0, we get

Ω(0) = Λ,  Ω'(0) = −Λ + Diag( X^{-1} A X ),  Z(0) = X,  Z'(0) = X F( X, Λ ).

Using these values in (2.22), we have

(d/ds) G(s)|_{s=0} = − X F( X, Λ ) Λ X^{-1} − X ( Diag( X^{-1} A X ) − Λ ) X^{-1} + X Λ X^{-1} ( X F( X, Λ ) ) X^{-1}
= − X ( F( X, Λ ) Λ − Λ F( X, Λ ) ) X^{-1} − X Diag( X^{-1} A X ) X^{-1} + X Λ X^{-1}.    (2.23)

Next, let D = F( X, Λ ) Λ − Λ F( X, Λ ). Then

D_ij = F_ij Λ_j − Λ_i F_ij,    (2.24)

where F_ij = F_ij( X, Λ ). According to our assumption, for i ≠ j, F_ij is the solution P to the matrix equation P Λ_j − Λ_i P = Y_i^T A X_j.
Hence D_ij = Y_i^T A X_j for i ≠ j, and D_ii = 0, so

D = X^{-1} A X − Diag( X^{-1} A X ).    (2.25)

Using (2.25) in (2.23), we have

(d/ds) G(s)|_{s=0} = − X ( X^{-1} A X − Diag( X^{-1} A X ) ) X^{-1} − X Diag( X^{-1} A X ) X^{-1} + X Λ X^{-1} = − ( A − X Λ X^{-1} ).    (2.26)

Since G(0) = A − X Λ X^{-1}, (2.26) gives (d/ds) G(s)|_{s=0} = −G(0), and with this value (2.20) reduces to

f(0) f'(0) = trace ( G(0)^T ( −G(0) ) ) = −f(0)^2.

Hence f'(0) = −f(0), provided f(0) ≠ 0. □

The above theorem implies that Armijo's rule can be applied to Algorithm 2.2.2 of the differential equation approach to eigencomputations. A modified algorithm to find the eigenvalues and corresponding eigenvectors of a nondefective matrix A can be obtained as follows.

Algorithm 2.3.1 Given a nondefective matrix A ∈ R^{n×n}, a tolerance tol greater than the unit roundoff, a coalescing tolerance tol1 greater than the square root of the unit roundoff, an invertible matrix X_0, and a block diagonal matrix Λ_0, the following algorithm uses (2.16) to compute a block diagonalization A = X Λ X^{-1}. Define tol2 = 100 tol1. Next we break the sketch of the algorithm into several steps.

Step 0: Take X = X_0 and Λ = Λ_0. Let NB denote the number of diagonal blocks of Λ, and let NS denote an array of NB elements whose ith element is the dimension of the ith diagonal block of Λ.

Step 1: For i < j, define DIF = min{ | λ − ω | : λ ∈ Λ( Λ_i ), ω ∈ Λ( Λ_j ) }. If DIF ≤ tol1, merge the blocks Λ_i and Λ_j to form a single block using permutations of the columns and rows of Λ. Use those column permutations on X to merge X_i and X_j, and those row permutations on Y to merge Y_i^T and Y_j^T. Update NB_old and NS_old to get NB_new and NS_new. Try the above merging procedure for all possible combinations of 1 ≤ i < j ≤ NB. Using column and row permutations, arrange the blocks of Λ in decreasing order of size along the diagonal; use those column permutations to arrange the block columns of X in decreasing order of size, and those row permutations to arrange the block rows of Y in decreasing order of size.

Step 2: Construct F( X, Λ ) in the following way.
Take F_jj( X, Λ ) = 0, which is an m_j × m_j zero matrix, and for i ≠ j determine F_ij( X, Λ ) using the following loop:

for j = 1 : NB
    for i = 1 : NB
        if i ≠ j
            Solve P Λ_j − Λ_i P = Y_i^T A X_j for P
            F_ij( X, Λ ) = P
        end
    end
end

Step 3: Define f(s) = || G(s) ||_F, where G(s) = A − X(s) Λ(s) X(s)^{-1}, X(s) = X ( I + s F ), and Λ(s) = Λ + s ( Diag( X^{-1} A X ) − Λ ). Evaluate f(s) at s = 1, 1/2, 1/4, ..., stopping when

f(s) ≤ ( 1 − s/2 ) f(0).    (2.27)

Step 4: Suppose μ = [ λ_{m+1}, λ_{m+2}, ..., λ_{NB} ]^T, where 0 ≤ m < NB with dim( Λ_i ) > 1 for i = 1, ..., m and dim( Λ_i ) = 1 for i = m + 1, m + 2, ..., NB. Define d = diag( Y A X )( t : n ) − μ, where t = 1 + Σ_{i=1}^{m} NS(i), and diag(M)( t : n ) is a vector formed from the t through n elements of the vector diag(M). Our aim is to find two indices il and iu as follows. Let k be an array with the property that μ_{k(i)} ≤ μ_{k(i+1)}. Next form the ratios

r_i = ( μ_{k(i+1)} − μ_{k(i)} ) / ( d_{k(i)} − d_{k(i+1)} ),

and let r_{i0} = min{ r_i : 0 < r_i ≤ 1 }. Let il = t − 1 + min{ k(i0), k(i0 + 1) } and iu = t − 1 + max{ k(i0), k(i0 + 1) }.

Step 5: Update Λ, X, and Y as follows:

Λ_j^new = Λ_j^old + z ( ( Y_j^old )^T A X_j^old − Λ_j^old ),  j = 1, 2, ..., NB,
X^new = X^old ( I + z F( X^old, Λ^old ) ),    Y^new = ( X^new )^{-1},

where z is the stepsize determined in Steps 3 and 4.

Step 6: Consider a block Λ_j with 1 ≤ j ≤ m, where m is defined as in Step 4. Let Λ_j = Q U Q^T be a Schur decomposition of Λ_j. Then U = D + N, where D = diag( U_11, U_22, ..., U_ll ), N is the strictly upper triangular part of U, and each U_ii is either a 1 × 1 or a 2 × 2 matrix. Let NU be an array whose ith element is the dimension of the block U_ii. Use Algorithm 2.2.1 with tol2 as the coalescing tolerance to reduce U to a block diagonal matrix, to update the array NU and the number of blocks l, and to get the invertible matrix S. Replace Λ_j by U, X_j by X_j Q S, Y_j^T by S^{-1} Q^T Y_j^T, NB by NB + l − 1, and the size of the block Λ_j in NS by NU. Try the above decoupling procedure for all j with 1 ≤ j ≤ m.

Step 8: Compute f(0) = || A − X Λ Y ||_F. Go to Step 1 until f(0) ≤ tol.

Example 2.3.1 If we apply Algorithm 2.3.1 to A with X_0 = I, tol = 10^-6, and tol1 = 10^-4, then after 42 iterations the diagonal matrix Λ_0 converges to a block diagonal matrix D. The eigenvalues of D are

λ = [ 6.01310, 5.98585, 1 + 1.41421i, 4.53255, 4.01310, 1 − 1.41421i, 0 ].
Example 2.3.2 If Algorithm 2.3.1 is applied to A with X_0 = I, Λ_0 = Diag(A), and the tolerance and the coalescing tolerance as in Example 2.3.1, then after 196 iterations the diagonal matrix Λ_0 becomes

D_196 = diag( 1.04694840, 0.00224879, 0.95530039 ).

In the following 4 successive iterations D_196 changes to

D_197 = diag( 1.04676152, 0.00223123, 0.95546971 ),
D_198 = diag( 1.04657539, 0.00221380, 0.95563841 ),
D_199 = diag( 1.04639002, 0.00219651, 0.95580649 ),
D_200 = diag( 1.04620540, 0.00217935, 0.95597395 ).

Example 2.3.3 Consider the matrix A. If we apply the QR method with double implicit shift with tol = 10^-6 to A, then after a few iterations we obtain a real Schur form S^T A S = U, and the eigenvalues of U are

λ = [ 10.8425 + 7.9032i, 10.8425 − 7.9032i, 2.6168 + .7556i, 2.6168 − .7556i, 3.5414 + 1.2899i, 3.5414 − 1.2899i, 8.4658 ].

Next, we use the technique discussed prior to Example 2.1.1 to find the corresponding eigenvectors. Here, for each pair of complex conjugate eigenvalues, we solve the equation Bx = 0 only for one, and then take the real and imaginary parts of x as the two vectors corresponding to the complex conjugate pair of eigenvalues. We denote the matrix of vectors by X. Consider the following perturbed matrix A1, which we obtain by perturbing the elements of A ( entries as printed ):

4.98  5.97  3.91  1.93  7.02  3.03  3.06
0.99  7.07  2.06  1.94  4.08  1.96  3.01
2.9   7.05  1.96  4.02  7.00  1.09  1.09
1.96  3.01  0.91  6.01  4.99

If we apply Algorithm 2.3.1 to A1 with X_0 = X, Λ_0 = X^{-1} A X, tol = 10^-6, and tol1 = 10^-4, then after 3 iterations we get a block diagonalization Y^{-1} A1 Y = D1.

Example 2.3.4 Consider the matrix

A = Y diag( 1, 2, 1, 3, 2, 1, 1 ) Y^{-1},  where Y = Q diag( 1, 2, 3, 3 ) Q^T and Q is an orthogonal matrix.

If we apply the QR method with double implicit shift with tol = 10^-6 to A, then after 4 iterations we obtain a real Schur form S^T A S = U. Next, we use the technique discussed prior to Example 2.1.1 to find the corresponding eigenvectors.
We denote the matrix of eigenvectors by X. If we apply Algorithm 2.3.1 to the perturbed matrix A1 = A + P/1000 with X_0 = X, Λ_0 = X^{-1} A X, tol = 10^-6, and tol1 = 10^-4, then after a few iterations we obtain a block diagonalization, whereas if we apply the QR method with double implicit shift to S^T A1 S, then after 6 iterations we obtain a real Schur form Q^T( S^T A1 S )Q = R.

Although this method can be applied to find the eigenvalues of some matrices ( Examples 2.3.1, 2.3.3, and 2.3.4 ), in general it does not work for all matrices. For example, if a matrix has equal eigenvalues, or the size of the matrix is large, then the method does not work ( Example 2.3.2 ). In these cases, two 1 × 1 diagonal blocks approach each other, but as the iterations continue, the rate at which they approach each other gets slower. The method does, however, work on a perturbed matrix A + εE, where E is an arbitrary matrix and ε is a small scalar ( Examples 2.3.3 and 2.3.4 ).

2.4 Convergence of Block Diagonalization of a Matrix

In Section 2.1 we developed Algorithm 2.1.4 to block diagonalize a nondefective matrix. Here we will examine the speed of convergence of Algorithm 2.1.4. Let A be an n × n nondefective matrix, and let A = Y Λ Y^{-1} be a block diagonalization of A, where Λ = diag( Λ_1, Λ_2, ..., Λ_k ) is a block diagonal matrix and Y = [ Y_1, Y_2, ..., Y_k ] is an associated block column matrix such that A Y_j = Y_j Λ_j. Define X = Y ( I + E ) and X^new = X ( I + F ), where E is an n × n perturbation matrix such that Diag( X^{-1} A X ) = D = diag( D_1, D_2, ..., D_k ) is an approximate block diagonalization, D_i and D_j for i ≠ j have distinct eigenvalues, and F = ( F_ij ) is a k × k block matrix such that F_jj is a square zero matrix and F_ij for i ≠ j is the solution P to the matrix equation P D_j − D_i P = ( X^{-1} A X )_ij. Assume X is close to Y. We will show that

|| Diag( X^{-1} A X ) − Λ || = O( ||E||^2 )  and  || X^new − Y || = O( ||E||^2 ).

Clearly ||E|| = O( || X − Y || ), so || X^new − Y || = O( || X − Y ||^2 ). That means the block diagonalization method converges locally quadratically.
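The quadratic convergence claim can be checked numerically. Below is a minimal numpy sketch, not from the dissertation: A is taken diagonal so that Y = I, every block is 1 × 1, and one update X^new = X( I + F ) is applied to X = Y( I + E ) for two perturbation sizes; the error after the step should shrink roughly like ||E||².

```python
import numpy as np
rng = np.random.default_rng(0)

A = np.diag([1.0, 2.0, 4.0])      # Y = I and Lambda = A for this simple test
Y = np.eye(3)

def one_step(X):
    """One block-diagonalization update with 1 x 1 blocks:
    F_ij = (X^{-1} A X)_ij / (d_j - d_i), d = diag(X^{-1} A X)."""
    M = np.linalg.solve(X, A @ X)
    d = np.diag(M)
    denom = d[None, :] - d[:, None]
    np.fill_diagonal(denom, 1.0)
    F = M / denom
    np.fill_diagonal(F, 0.0)
    return X @ (np.eye(3) + F)

errs = []
for eps in (1e-2, 1e-3):
    E = eps * rng.standard_normal((3, 3))
    np.fill_diagonal(E, 0.0)       # the normalization Diag(Y^{-1} X - I) = 0
    Xn = one_step(Y + E)
    Xn = Xn / np.diag(Xn)[None, :]  # remove the free diagonal scaling of columns
    errs.append(np.linalg.norm(Xn - Y))
```

Shrinking ||E|| by a factor of 10 should shrink the post-step error by roughly a factor of 100, consistent with || X^new − Y || = O( ||E||² ).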
Theorem 2.4.1 Let A ∈ R^{n×n} be a nondefective matrix, and suppose A = Y Λ Y^{-1} is a block diagonalization of A, where Λ = diag( Λ_1, Λ_2, ..., Λ_k ) is a block diagonal matrix and Y = [ Y_1, Y_2, ..., Y_k ] is an associated block column matrix such that A Y_j = Y_j Λ_j, and Λ_i and Λ_j for i ≠ j have distinct eigenvalues. Suppose the perturbation matrix E ∈ R^{n×n} is sufficiently small to ensure that X is close to Y in the equation X = Y ( I + E ), and Diag( X^{-1} A X ) = D = diag( D_1, D_2, ..., D_k ) is an approximate block diagonalization. Then

|| Diag( X^{-1} A X ) − Λ || = O( ||E||^2 )  and  || X^new − Y || = O( ||E||^2 ).

Proof of Theorem 2.4.1 Assume that the norm is monotone: if | p_ij | ≤ q_ij for i, j = 1, 2, ..., n, then ||P|| ≤ ||Q||. Now

X^{-1} A X = ( I + E )^{-1} Y^{-1} A Y ( I + E ) = ( I + E )^{-1} Λ ( I + E ).

Since ( I + E )^{-1} = I − E + E^2 ( I + E )^{-1}, we get

X^{-1} A X = [ I − E + E^2 ( I + E )^{-1} ] Λ ( I + E ) = Λ + Λ E − E Λ + L,

where L = −E Λ E + E^2 ( I + E )^{-1} Λ ( I + E ). Then

||L|| ≤ || E Λ E || + || E^2 ( I + E )^{-1} Λ ( I + E ) || ≤ ||E||^2 ( ||Λ|| + || ( I + E )^{-1} || || Λ ( I + E ) || ) = O( ||E||^2 ).

Since X^{-1} A X = Λ + Λ E − E Λ + L, we have

Diag( X^{-1} A X ) = Λ + Diag( Λ E − E Λ ) + Diag( L ).    (2.28)

Next normalize Y such that the diagonal blocks of Y^{-1} X are identity matrices, that is, Diag( Y^{-1} X − I ) = 0. Then from the relation E = Y^{-1} X − I, we have E_ii = 0. Since Λ is block diagonal, ( Λ E − E Λ )_ii = Λ_i E_ii − E_ii Λ_i = 0. Thus Diag( Λ E − E Λ ) = 0, and from (2.28) we have Diag( X^{-1} A X ) − Λ = Diag( L ). Hence

|| Diag( X^{-1} A X ) − Λ || ≤ || Diag( L ) || = O( ||E||^2 ),    (2.29)

which proves the first part of Theorem 2.4.1.

Since Diag( X^{-1} A X ) = D = diag( D_1, D_2, ..., D_k ), and F = ( F_ij ) is a k × k block matrix, where F_jj = 0 and for i ≠ j, F_ij is the solution B to the matrix equation B D_j − D_i B = ( X^{-1} A X )_ij, we have

F D − D F = nondiag( X^{-1} A X ),    (2.30)

where nondiag(M) is the block matrix formed from the block matrix M by replacing the diagonal blocks by zero matrices. Since X^{-1} A X = Λ + Λ E − E Λ + L, we have nondiag( X^{-1} A X ) = nondiag( Λ E − E Λ + L ). For i ≠ j, taking the (i, j) block on both sides of (2.30), we obtain

F_ij D_j − D_i F_ij = − ( E_ij Λ_j − Λ_i E_ij ) + L_ij.    (2.31)

Let H_ij = E_ij Λ_j − Λ_i E_ij, and let D_j = U Σ U^{-1} and D_i = V Γ V^{-1} be diagonalizations of D_j and D_i respectively. Substituting these values in (2.31), and then
letting W = V^{-1} F_ij U, the equation reduces to

W Σ − Γ W = −V^{-1} H_ij U + V^{-1} L_ij U.

Now solving for W = ( w_lm ), we get w_lm = p_lm + q_lm, where

p_lm = −( V^{-1} H_ij U )_lm / ( σ_m − γ_l ),    q_lm = ( V^{-1} L_ij U )_lm / ( σ_m − γ_l ),

for l = 1, 2, ..., m_i and m = 1, 2, ..., m_j. Let P = ( p_lm ) and Q = ( q_lm ). Then from the relation W = P + Q, we obtain

F_ij = V P U^{-1} + V Q U^{-1}.    (2.32)

Next, let Λ_j = U_1 Σ_1 U_1^{-1} and Λ_i = V_1 Γ_1 V_1^{-1} be diagonalizations of Λ_j and Λ_i respectively. Using (2.29), we have || Λ_j − D_j || = O( ||E||^2 ) and || Λ_i − D_i || = O( ||E||^2 ), so the eigenvector matrices and the eigenvalues of D_j and D_i differ from those of Λ_j and Λ_i by terms of order ||E||^2. Since E_ij is the exact solution of E_ij Λ_j − Λ_i E_ij = H_ij, transforming this equation by V^{-1} ( · ) U and comparing with the formula for p_lm shows that

V P U^{-1} = −E_ij + O( ||E||^2 ).

Moreover,

|| V Q U^{-1} || ≤ ||V|| || V^{-1} || ||U|| || U^{-1} || || L_ij || / min_{l,m} | σ_m − γ_l | = O( ||E||^2 ).

Thus F_ij = −E_ij + R_ij with || R_ij || = O( ||E||^2 ); that is, F = −E + R, where R = ( R_ij ) with ||R|| = O( ||E||^2 ). Next

X^new = X ( I + F ) = Y ( I + E )( I − E + R ) = Y ( I − E^2 ) + Y ( I + E ) R = Y − Y E^2 + Y ( I + E ) R.

Hence || X^new − Y || = O( ||E||^2 ). This proves the second part of Theorem 2.4.1. □

The result of the above theorem implies that the block diagonalization of a nondefective matrix converges locally quadratically.

2.5 Block Schur Decomposition of a Matrix

The methods discussed so far require a good approximation to the matrix of eigenvectors, and in some methods a good approximation to the eigenvalues as well.
Here we will find an iterative method to compute a block Schur decomposition of a matrix A. Let A be an n × n complex matrix, and let A = S U S^H be a block Schur decomposition of A, where U is a k × k block upper triangular matrix whose (i, j) block is an m_i × m_j matrix, and S = [ S_1, S_2, ..., S_k ] is a compatible block column unitary matrix such that A S_j = Σ_{i=1}^{j} S_i U_ij. It can always be arranged so that the eigenvalues of U_ii and U_jj for i ≠ j are distinct.

Before we derive the necessary equations to find a block Schur decomposition of a matrix A, we discuss the effect of perturbations in the elements of A on the columns of the unitary matrix S and the elements of the upper triangular matrix U in its Schur decomposition S^H A S = U. That is, given an arbitrary matrix E and a sufficiently small ε, we will show in the next theorem that there exist continuously differentiable functions S(ε) and U(ε) with S(0) = S and U(0) = U such that ( A + εE ) S(ε) = S(ε) U(ε).

Theorem 2.5.1 Let A be an n × n complex matrix. Suppose S^H A S = U is a block Schur decomposition of A such that the diagonal blocks U_ii and U_jj for i ≠ j have distinct eigenvalues. Then, given an arbitrary n × n complex matrix E and a sufficiently small ε, there exist continuously differentiable functions S(ε) and U(ε) with S(0) = S and U(0) = U such that the following matrix equation holds:

( A + εE ) S(ε) = S(ε) U(ε).    (2.34)

Proof of Theorem 2.5.1 First we discuss the effect of perturbations in the elements of A on the columns of the unitary matrix S in its Schur decomposition S^H A S = U. Let D = U + ε G, where g_ij = s_i^H E s_j, and let Q(ε)^{-1} D Q(ε) = U(ε) be the triangular decomposition of D by the invertible matrix Q(ε). Then we have s_j(ε) = S q_j(ε), j = 1, ..., n. The form of the matrix D is illustrated for a 7 × 7 matrix with k = 5, such that dim( U_11 ) = dim( U_22 ) = 2 and dim( U_jj ) = 1, j = 3, 4, 5. Equating the (i, j) entries on both sides of D Q(ε) = Q(ε) U(ε), we obtain

Σ_{l=i}^{n} u_il q_lj(ε) + ε Σ_{l=1}^{n} g_il q_lj(ε) = Σ_{l=1}^{j} q_il(ε) u_lj(ε),  i = 1, ..., n.    (2.35)

Since u_j(0) = u_j, we must have q_j(0) = e_j, which is the jth unit vector.
Next we normalize q_j(ε), j = 1, ..., n, by taking the largest element to be 1. For sufficiently small ε, we claim that q_11(ε) = 1, where q_11(ε) is the 1st component of q_1(ε). If not, let q_i1(ε) = 1 for some i ≠ 1. Then from (2.35) we get u_11(ε) − u_ii = ε Σ_l g_il q_l1(ε). Now letting ε → 0, we find u_11 − u_ii = 0, a contradiction. Next we will show that | q_i1(ε) | ≤ M_i1 ε for some M_i1 > 0, i ≠ 1. Since | q_l1(ε) | ≤ 1, from (2.35) we obtain | u_11(ε) − u_ii | | q_i1(ε) | ≤ ε Σ_l | g_il |. If | u_11(ε) − u_ii | ≥ (1/2) | u_11 − u_ii |, then | q_i1(ε) | ≤ M_i1 ε, where M_i1 = 2 Σ_l | g_il | / | u_11 − u_ii |.

Using the above arguments on the components q_i1, i = n, n − 1, ..., in decreasing order of the row index, we can show that q_11(ε) = 1 and | q_i1(ε) | ≤ M_i1 ε. Again from (2.35), we have

( u_11(ε) − u_11 ) q_11(ε) = ε Σ_l g_1l q_l1(ε) + ...,    ( u_11(ε) − u_22 ) q_21(ε) = ε Σ_l g_2l q_l1(ε) + ... .

To make the above two equations consistent, we take q_21(ε) = 0. Using similar arguments, and considering the elements in decreasing order of the row index i, we can show that | q_ij(ε) | ≤ M_ij ε for some M_ij, i = j + 1, ..., n. To show | q_ij(ε) | ≤ M_ij ε for i < j, we use the result | q_lj(ε) | ≤ 1. To keep the consistency, we take q_43(ε) = 0. Thus the q_ij(ε) are of order ε for i ≠ j, except q_21(ε) and q_43(ε), which are equal to zero.

Next, from (2.35), we have

( u_jj(ε) − u_jj ) q_jj(ε) = ε Σ_l g_jl q_lj(ε) + Σ_{l > j} u_jl q_lj(ε) − Σ_{l < j} q_jl(ε) u_lj(ε).

The terms on the right are of order ε, hence u_jj(ε) − u_jj = O(ε) as ε → 0.

Next we discuss the effect of perturbations in the elements of A on the elements of the upper triangular matrix U in its Schur decomposition S^H A S = U, using the same 7 × 7 matrix D with k = 5. Here we will use the results on the perturbations of the columns of Q(ε). Putting j = 1 and j = 2 in (2.35), we obtain

u_11 q_11(ε) + ε Σ_l g_1l q_l1(ε) = u_11(ε),
u_21 q_11(ε) + ε Σ_l g_2l q_l1(ε) = u_21(ε),
u_11 q_12(ε) + u_12 + ε Σ_l g_1l q_l2(ε) = q_12(ε) u_22(ε) + u_12(ε).
After simplifying the above equations, we get

u_11(ε) = u_11 + ε g_11 + O( ε^2 ),    u_12(ε) = u_12 + ε g_12 + O( ε^2 ),

since for sufficiently small ε, | u_11(ε) − u_ll | ≥ (1/2) | u_11 − u_ll |, l = 3, 5, ..., n, so the remaining terms on the right of the above equations are of order ε^2. Thus the perturbations in u_11 and u_12 are of order ε as ε → 0. Due to Theorem 2.1.1, it is always possible to find a sufficiently small ε such that | u_jj(ε) − u_ll | ≥ (1/2) | u_jj − u_ll |. So after simplifying (2.35), we can show that for every i and j, u_ij(ε) = u_ij + O(ε). Thus the perturbation is of order ε as ε → 0. Hence U(ε) and S(ε) are differentiable functions of ε, and their first derivatives are continuous. This completes the proof of the theorem. □

Now we are ready to derive the necessary equations to find a block Schur decomposition A = S U S^H. Given an n × n complex matrix E and a sufficiently small ε, let S(ε) and U(ε) be continuously differentiable functions of ε such that S(0) = S and U(0) = U. Differentiating (2.34) with respect to ε and then putting ε = 0, we obtain

A S'(0) + E S = S'(0) U + S U'(0),  so  S^H A S'(0) + S^H E S = S^H S'(0) U + U'(0).    (2.36)

Since S^H A = U S^H, (2.36) gives

U S^H S'(0) + S^H E S = S^H S'(0) U + U'(0).    (2.37)

Taking the jth column block on both sides of (2.37), we have

U S^H S_j'(0) + S^H E S_j = S^H S'(0) U_j + U_j'(0).    (2.38)

Next consider the following expression for S_j(ε):

S_j(ε) = Σ_{i=1}^{k} S_i B_ij(ε),    (2.39)

where B_ij(ε) is an m_i × m_j matrix. We normalize (2.39) by taking B_jj(ε) = I. Differentiating (2.39) with respect to ε and then setting ε = 0 gives

S_j'(0) = Σ_{i=1}^{k} S_i B_ij'(0),  with  B_jj'(0) = 0.    (2.40)

Since S is unitary,

S_i^H S_l = δ_il I.    (2.41)

Using (2.40) and (2.41), we get S^H S'(0) = B'(0). So (2.38) reduces to U B_j'(0) + S^H E S_j = B'(0) U_j + U_j'(0), where B_j'(0) is the jth column block of B'(0).
Hence for each j, we have the following two matrix equations:

U_ij'(0) = S_i^H E S_j + Σ_{l=j+1}^{k} U_il B_lj'(0) − Σ_{l=1}^{j−1} B_il'(0) U_lj,  for i = 1, 2, ..., j,    (2.42)

B_ij'(0) U_jj − U_ii B_ij'(0) = S_i^H E S_j + Σ_{l=i+1}^{k} U_il B_lj'(0) − Σ_{l=1}^{j−1} B_il'(0) U_lj,  for i > j, j = 1, ..., k − 1.    (2.43)

The solution of (2.42) is dependent on the solution of (2.43), so we focus on how to find the solution of (2.43). From the right-hand side of (2.43), it is clear that to find B_ij'(0) we need the entries below B_ij'(0) in the jth block column, namely B_{i+1,j}'(0), ..., B_kj'(0), and the entries to the left of B_ij'(0) in the ith block row, namely B_i1'(0), ..., B_{i,j−1}'(0). That is, to determine B_ij'(0), we work up each block column from the bottom, moving through the block columns from left to right.

Here we will give a sketch of how to find B_ij'(0). Let C = S_i^H E S_j. If i < k, add to C the terms Σ_{l=i+1}^{k} U_il B_lj'(0), and if j > 1, subtract the terms Σ_{l=1}^{j−1} B_il'(0) U_lj. Next we need to solve the matrix equation B_ij'(0) U_jj − U_ii B_ij'(0) = C for B_ij'(0). To solve this equation, let P = B_ij'(0), G = U_ii, F = U_jj, and R = −C. Then it becomes G P − P F = R. Since F and G are upper triangular matrices, we can solve the matrix equation G P − P F = R for P using the following method ( details are in [Gol90, page 387] ). Let R = [ r_1, r_2, ..., r_{m_j} ] and P = [ p_1, p_2, ..., p_{m_j} ] be column partitionings; then solve

( G − f_tt I ) p_t = r_t + Σ_{m=1}^{t−1} f_mt p_m

for p_t, t = 1, ..., m_j. Once we obtain P, then B_ij'(0) = P.

2.6 An Algorithm for Block Schur Decomposition of a Matrix

In Section 2.5 we derived the necessary equations to find a block Schur decomposition of a matrix. Here we plan to develop an algorithm using Armijo's rule from optimization theory to find a block Schur decomposition of a matrix. Let A be an n × n complex matrix.
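The column recursion for the triangular Sylvester equation G P − P F = R described in Section 2.5 can be sketched as follows; this is a minimal numpy sketch, not from the dissertation, and the function name is illustrative.

```python
import numpy as np

def solve_gp_minus_pf(G, F, R):
    """Solve G P - P F = R for P, where G and F are upper triangular with
    no shared eigenvalues.  Column t gives (G - f_tt I) p_t = r_t + sum_{m<t} f_mt p_m,
    since (P F)[:, t] only involves columns p_1, ..., p_t of P."""
    mi, mj = G.shape[0], F.shape[0]
    P = np.zeros((mi, mj))
    for t in range(mj):
        rhs = R[:, t] + P[:, :t] @ F[:t, t]
        P[:, t] = np.linalg.solve(G - F[t, t] * np.eye(mi), rhs)
    return P

G = np.array([[4.0, 1.0], [0.0, 5.0]])   # eigenvalues {4, 5}
F = np.array([[1.0, 2.0], [0.0, 2.0]])   # eigenvalues {1, 2}, disjoint from G's
R = np.array([[1.0, 0.0], [3.0, 1.0]])
P = solve_gp_minus_pf(G, F, R)
```

Each column costs one triangular-plus-shift solve, which is the same column-by-column strategy used by Bartels–Stewart-type algorithms for Sylvester equations with (quasi-)triangular coefficients.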
Suppose A = SUS^H is a block Schur decomposition of A, where U is a k × k block upper triangular matrix such that U_ii and U_jj for i ≠ j have distinct eigenvalues, and S = [S_1, S_2, ..., S_k] is a compatibly partitioned unitary matrix. Given an arbitrary complex matrix E, and for a sufficiently small ε, let S(ε) and U(ε) be analytic functions of ε such that S(0) = S, U(0) = U, and

(A + εE) S(ε) = S(ε) U(ε).  (2.44)

To develop an algorithm, suppose SUS^H is an approximate block Schur decomposition of B. We identify A and E in (2.44) with SUS^H and B − SUS^H respectively. Let S'(0) = S G(S, U), where G_ij(S, U) is an m_i × m_j zero matrix for i ≤ j, and for i > j, G_ij(S, U) is the solution P to the matrix equation P U_jj − U_ii P = C, where

C = S_i^H (B − SUS^H) S_j + Σ_{l=i+1}^{k} U_il G_lj(S, U) − Σ_{l=1}^{j−1} G_il(S, U) U_lj.

From (2.37), we have U S^H S'(0) + S^H E S = S^H S'(0) U + U'(0). Next, substituting the values of E and S'(0) into the above equation and then simplifying, we obtain

U'(0) = U G(S, U) + S^H B S − U − G(S, U) U.

Now substituting the values of U'(0) and S'(0) into (2.45) and suppressing zeros in S(0) and U(0), we get

S(ε) = S (I + ε G(S, U)),
U(ε) = U + ε (U G(S, U) + S^H B S − U − G(S, U) U).  (2.46)

Taking ε = 1 in (2.46), we obtain

S^new = S^old (I + G(S^old, U^old)),
U^new = U^old G(S^old, U^old) + (S^old)^H B S^old − G(S^old, U^old) U^old.

If the starting guess unitary matrix S_0 is not a good approximation of Q, where B = QRQ^H is a Schur decomposition of B, then the above update may change too abruptly; for a steady change in the values of S^new and U^new, we need a small increment in each iteration. To achieve this, we redefine each iterate and introduce a small positive parameter. Let t be a positive parameter; define

S(t) = S^old (I + t G(S^old, U^old)),
U(t) = U^old + t (U^old G(S^old, U^old) + (S^old)^H B S^old − U^old − G(S^old, U^old) U^old).

Then S(0) = S^old, S(1) = S^new, U(0) = U^old, and U(1) = U^new. Define f(t) = ||Z(t)||_F, where Z(t) = B − S(t) U(t) S(t)^{-1}. Then f(0) = ||B − S(0) U(0) S(0)^{-1}||_F and f(1) = ||B − S(1) U(1) S(1)^{-1}||_F. When the starting guesses S_0 and U_0 are good approximations of Q and R in the factorization B = QRQ^H, then we must have f(1) < f(0).
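Both here and in Chapter 3, the step parameter is chosen by Armijo's rule with step halving: try t = 1, 1/2, 1/4, ... until f(t) ≤ (1 − t/2) f(0). A generic sketch (hedged: `armijo_step` is an illustrative helper of ours, and the residual function f is passed in as a callable standing in for f(t) = ||B − S(t)U(t)S(t)^{-1}||_F):

```python
def armijo_step(f, f0, max_halvings=30):
    """Armijo's rule with step halving: return the first t in
    1, 1/2, 1/4, ... satisfying f(t) <= (1 - t/2) * f(0).

    Valid as a descent test when f'(0) = -f(0), which is what
    Theorem 2.6.1 establishes for the block Schur iteration.
    """
    t = 1.0
    for _ in range(max_halvings):
        if f(t) <= (1.0 - 0.5 * t) * f0:
            return t
        t *= 0.5
    return t  # fall through with the smallest step tried
```

Because f'(0) = −f(0) < 0 for f(0) ≠ 0, the test is guaranteed to succeed for some sufficiently small t, so the loop terminates with a step that strictly decreases the residual.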
As earlier, here our goal is to find a t, 0 < t ≤ 1, for which f(t) < f(0). To this end, we use Armijo's rule from optimization theory. In Armijo's rule, we determine t in the following way. Evaluate f(t) at t = 1, 1/2, 1/4, ..., stopping when

f(t) ≤ (1 − t/2) f(0).

As in Section 2.3, to use the above rule, we must have f'(0) = −f(0). So our next aim is to determine f'(0), when f(t) = ||Z(t)||_F and Z(t) = B − S(t) U(t) S(t)^{-1}. Suppressing the superscripts of S^old and U^old in the definitions of S(t) and U(t), we have

S(t) = S (I + t G(S, U)),
U(t) = U + t (U G(S, U) + S^H B S − U − G(S, U) U).  (2.47)

With the above definition of f(t), we will show in the next theorem that f'(0) = −f(0).

Theorem 2.6.1 Let A ∈ C^{n×n}, and suppose A = SUS^H is a block Schur decomposition of A, where U is a k × k block upper triangular matrix such that U_ii and U_ll for i ≠ l have distinct eigenvalues, and S = [S_1, ..., S_k] is a compatibly partitioned unitary matrix. Let S(t) and U(t) be as in (2.47) with B = A, and define f(t) = ||Z(t)||_F, where Z(t) = A − S(t) U(t) S(t)^{-1}. Then f'(0) = −f(0), provided f(0) ≠ 0.

Proof of Theorem 2.6.1 The result (2.18) of Lemma 2.3.1 gives

f(0) f'(0) = trace( Z(0)^H (d/dt) Z(t)|_{t=0} ).  (2.48)

Differentiating the expression for Z(t) with respect to t, and then evaluating the derivative at t = 0, we have

(d/dt) Z(t)|_{t=0} = −(d/dt) S(t)|_{t=0} U(0) S(0)^{-1} − S(0) (d/dt) U(t)|_{t=0} S(0)^{-1} + S(0) U(0) S(0)^{-1} (d/dt) S(t)|_{t=0} S(0)^{-1}.  (2.49)

Next, differentiating U(t) and S(t) with respect to t, and then evaluating the derivatives at t = 0, we obtain

(d/dt) U(t)|_{t=0} = U G(S, U) + S^H A S − U − G(S, U) U,
(d/dt) S(t)|_{t=0} = S G(S, U).

Using these values and the values of U(0) and S(0) in (2.49), we get

(d/dt) Z(t)|_{t=0} = −S G(S, U) U S^H − S (U G(S, U) + S^H A S − U − G(S, U) U) S^H + S U S^H S G(S, U) S^H = −(A − SUS^H) = −Z(0).
Hence (d/dt) Z(t)|_{t=0} = −Z(0), and with this value (2.48) reduces to

f(0) f'(0) = trace( Z(0)^H (−Z(0)) ) = −f(0)²,

so f'(0) = −f(0), provided f(0) ≠ 0. □

Algorithm 2.6.1 (Block Schur Decomposition) Given a tolerance tol greater than the unit roundoff, a coalescing tolerance tol1 greater than the square root of the unit roundoff, an upper triangular matrix U_0, and a unitary matrix S_0, the following algorithm uses (2.47) to compute a block Schur decomposition A = SUS^H. Define tol2 = 100 tol1. We break the sketch of the algorithm into several steps.

Step 0: Take U = U_0 and S = S_0. Let NB denote the number of diagonal blocks of U, and let NS denote an array of NB elements whose ith element is the dimension of the ith diagonal block of U.

Step 1: For i ≠ j, define α_ij = min{ |σ − ω| : σ ∈ Λ(U_ii), ω ∈ Λ(U_jj) }. If α_ij < tol2, then merge the blocks U_ii and U_jj to form a single block by applying a unitary transformation to U. Postmultiply S by the same unitary transformation. Update NB^old and NS^old to get NB^new and NS^new. Try the above coupling procedure for all possible combinations of i and j. Using a unitary transformation, arrange the diagonal blocks of U in decreasing order of size. Postmultiply S by the same unitary transformation.

Step 2: Construct G(S, U) as follows. For i ≤ j, take G_ij(S, U) = 0, which is an m_i × m_j zero matrix; for i > j, G_ij(S, U) can be determined by using the following loop:

B = AS − SU
for j = 1 : NB
  for i = NB : −1 : j+1
    C = S_i^H B_j
    C = C + Σ_{k=i+1}^{NB} U_ik G_kj(S, U)
    C = C − Σ_{k=1}^{j−1} G_ik(S, U) U_kj
    Solve P U_jj − U_ii P = C for P
    G_ij(S, U) = P
  end
end

Step 3: Let f(t) = ||Z(t)||_F, where Z(t) = A − S(t) U(t) S(t)^{-1}, S(t) = S (I + t G(S, U)), and U(t) = U + t (U G(S, U) + S^H A S − U − G(S, U) U). Evaluate f(t) at t = 1, 1/2, 1/4, ..., stopping when

f(t) ≤ (1 − t/2) f(0).  (2.50)

Let μ be the first value of t for which (2.50) is true.

Step 4: Suppose ρ = [ U_{m+1,m+1}, ..., U_{NB,NB} ], where 0 ≤ m < NB, dim(U_ii) > 1 for i ≤ m, and dim(U_ii) = 1 for i > m. Define d = diag(M)(t : n), where M = U G(S, U) + S^H A S − U − G(S, U) U, diag(M)(t : n) is a vector formed from the tth through nth elements of the vector diag(M), and t = 1 + Σ_{i=1}^{m} NS(i). Our aim is to find two indices il and iu as follows.
Let k be an array with the property that |ρ_{k(i+1)}| ≥ |ρ_{k(i)}|. Next form the ratios

r_i = ( ρ_{k(i+1)} − ρ_{k(i)} ) / ( d_{k(i)} − d_{k(i+1)} ),

and let i0 = min{ i : i = 1, 2, ..., n − t, r_i > 0 and Re(r_i) < 1 }. Let il = t − 1 + min{ k(i0), k(i0 + 1) } and iu = t − 1 + max{ k(i0), k(i0 + 1) }.

Step 5: Update U and S as follows:

S^new = S^old (I + μ G(S^old, U^old)),
U^new = U^old + μ (U^old G(S^old, U^old) + (S^old)^H A S^old − U^old − G(S^old, U^old) U^old),
Z = S^new U^new (S^new)^{-1}.

Step 6: Consider the block U_jj with 1 ≤ j ≤ m, where m is defined as in Step 4. Let U_jj = QRQ^H be a Schur decomposition of U_jj. Find a unitary matrix P such that P^H R P = D + N = V, where D = diag(V_11, V_22, ..., V_kk), N is the strictly upper triangular part of V, and min{ |σ − ω| : σ ∈ Λ(V_ii), ω ∈ Λ(V_ll) } > tol2 for i ≠ l. Replace U_jj by V, and S_j by S_j Q P. Update NB^old and NS^old to get NB^new and NS^new. Try the above decoupling procedure for all j such that 1 ≤ j ≤ m.

Step 7: Create a 2 × 2 diagonal block by merging U_{il,il} and U_{iu,iu} by a unitary transformation. Postmultiply S by the same unitary transformation.

Step 8: Let S^old = QR be a QR factorization of S^old. Then take S^new = Q and U^new = Q^H Z Q, where Z is the matrix from Step 5.

Step 9: Compute f(0) = ||A − Z||_F and go to Step 1 until f(0) < tol.

Example 2.6.1 If we apply the real version of Algorithm 2.6.1 to the matrix A of an earlier example with U_0 = diag(0, 0, 0), S_0 = I, tol = 10^{-6}, and tol1 = 10^{-4}, then after 9 iterations the upper triangular matrix U_0 converges to

0.88991   −2.72597   1.63393
0.73813    1.11009   …
0          0         0

The eigenvalues of U are Λ = [ 1 + 1.41421i, 1 − 1.41421i, 0 ]. If we use the QR method with double implicit shift on A with the same tolerance, then we get a real Schur decomposition A = QRQ^T.
Next consider the following matrix A_1, which we obtain by perturbing the elements of A:

0.001  2.002  1.002  2.003  1.002  1.001  1.001  0.001  …

If we apply the real version of Algorithm 2.6.1 to A_1, where the tolerance and the coalescing tolerance are the same, then after 1 iteration we get a block Schur decomposition P^T A_1 P = Θ, whereas if we use the QR method with double implicit shift on Q^T A_1 Q, then after 5 iterations we obtain a real Schur form V^T (Q^T A_1 Q) V = Γ.

Example 2.6.2 If the real version of Algorithm 2.6.1 is applied to the matrix A of an earlier example with U_0 = diag(1, 1, 1), S_0 = I, and the tolerance and the coalescing tolerance as in Example 2.6.1, then after 10 iterations the upper triangular matrix U_0 converges to

1.00104   3.13124   1.64159
0         0         1.22
0         0         0.99896

If we use the QR method with double implicit shift on A with the same tolerance, then after 6 iterations we obtain a real Schur form:

1.04199   3.46359   0.69221
0.00051   0.95802   1.23323
0         0         0

Example 2.6.3 Consider the following matrices A and P, with entries 80, 236, 164, .... If we apply the QR method with double implicit shift to A with tol = 10^{-6}, then we get a real Schur form S^T A S = U. The eigenvalues of U are [ 2, 5, 1, 1, 1 ]. Now if the real version of Algorithm 2.6.1 is applied to the perturbed matrix A_1 = A + P/5000 with U_0 = U, S_0 = S, tol1 = 10^{-4}, and the tolerance as above, then after 3 iterations we obtain a block Schur decomposition Q^T A_1 Q = Θ, whereas if we apply the QR method with double implicit shift to S^T A_1 S, then after 10 iterations we obtain a real Schur form V^T (S^T A_1 S) V = Γ.

Although the real version of Algorithm 2.6.1 works for all matrices A with a diagonal matrix formed from the diagonal elements of A as the starting upper triangular matrix U_0, and S_0 = I, it has some disadvantages. For example, if the size of the matrix is large, then we need to take a very large coalescing tolerance compared to tol1 = 10^{-4} through several iterations at the beginning.
However, with a large coalescing tolerance, the sizes of some diagonal blocks of U become large compared to the size of A, which undermines the main goal of this algorithm, namely to keep the sizes of the diagonal blocks of U as small as possible.

In Examples 2.6.1 and 2.6.3 we noticed that if a real Schur form A = SUS^T can be computed by another method, then we can use U as the starting upper triangular matrix and S as the starting orthogonal matrix to obtain a block Schur decomposition of a perturbed matrix A + εE with very rapid convergence, where E is an arbitrary matrix and ε is a small scalar. In the following table, for different values of n, we give the number of iterations required to obtain block Schur decompositions of A + εE by the real version of Algorithm 2.6.1 with U_0 = U and S_0 = S, and real Schur forms of S^T (A + εE) S by the QR method with double implicit shift, where S^T A S = U is a real Schur form of the matrix A.

n    Algorithm 2.6.1,   QR with double shift,   QR with double shift,     ε
     A + εE             A + εE                  S^T(A + εE)S
10   2                  15                      19                        10^{-2}
20   2                  27                      28                        10^{-2}
30   2                  42                      45                        10^{-2}
40   2                  53                      56                        10^{-2}
50   2                  79                      70                        10^{-2}

Example 2.6.4 If Algorithm 2.6.1 is applied to the matrix A with U_0 = diag( 19, 9.5 + 16.45448i, −9.5 + 16.45448i, −19, −9.5 − 16.45448i, 9.5 − 16.45448i ), where the elements of U_0 are the uniformly distributed points on a circle which includes all Gerschgorin disks for A, S_0 = I, tol = 10^{-6}, and tol1 = 10^{-4}, then after 13 iterations we obtain a block Schur decomposition S^H A S = U = D + N, where

D = diag( 2.25911 − 3.43367i, 1.83571 − 3.9588i, 2.25911 + 3.43367i, 7.52532, 1.83571 + 3.9588i, 7.37213 ),

and N is the strictly upper triangular part of U. Figure 2.1 shows how the initial eigenvalues of U_0 converged to the eigenvalues of A. If we use the QR method with double implicit shift on A with the above tolerance, then after 9 iterations we obtain a real Schur form Q^T A Q = R. Next consider the following matrix A_1,
which we get by perturbing the elements of A:

4.0022  0.0068  4.9949  3.9940  0.0063  2.9956  3.0095  4.9915  1.9959  0.9942  1.0072  2.0080  …

Figure 2.1. The path of convergence of the eigenvalues of the matrix A.

If we apply Algorithm 2.6.1 to A_1 with U_0 = U, S_0 = S, and the tolerance and the coalescing tolerance as above, then after 4 iterations we get a block Schur decomposition P^H A_1 P = Θ, whereas if we apply the QR method with double implicit shift to Q^T A_1 Q with the same tolerance, then after 8 iterations we obtain a real Schur form V^T (Q^T A_1 Q) V = Γ.

Example 2.6.5 If Algorithm 2.6.1 is applied to the matrices B, C, D, and E with the diagonal matrices Λ = diag( 19, 3 + 13i, 3 − 16i, ... ), Θ = diag( 7, ... ), Ξ = diag( ... + 30i, −32, ... − 30i ), and Ω = diag( 1, 0.5 + 0.8660i, −0.5 + 0.8660i, −0.5 − 0.8660i, 0.5 − 0.8660i ) as U_0, where the elements of Λ, Θ, Ξ, and Ω are the uniformly distributed points on circles which include all Gerschgorin disks for B, C, D, and E respectively, S_0 = I, tol = 10^{-6}, and tol1 = 10^{-4}, then after 16, 112, 12, and 11 iterations we obtain block Schur decompositions of B, C, D, and E respectively. If we use the QR method with double implicit shift, then after 6, 14, and 1 iterations we obtain real Schur forms of B, C, and D respectively, whereas E diverges.
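The diagonal starting guesses in these examples place n points uniformly on a circle that encloses every Gerschgorin disk. A sketch of one way to build such a U_0 (hedged: centering the circle at the origin and this particular radius formula are our simple choices; the dissertation's circles need only contain all the disks):

```python
import numpy as np

def gerschgorin_circle_guess(A):
    """Diagonal starting guess U0 = diag(z_1, ..., z_n): n points uniformly
    distributed on a circle about the origin that encloses all Gerschgorin
    disks |z - a_ii| <= sum_{j != i} |a_ij|."""
    n = A.shape[0]
    radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    r = np.max(np.abs(np.diag(A)) + radii)   # |z| = r contains every disk
    angles = 2.0 * np.pi * np.arange(n) / n
    return np.diag(r * np.exp(1j * angles))
```

Since every eigenvalue lies in the union of the Gerschgorin disks, the initial diagonal of U_0 surrounds the whole spectrum, which is the property the examples above rely on.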
Example 2.6.6 If we apply Algorithm 2.6.1 to A = X diag( 1, 1, 1, 1, 3, 3, 0, 5, 4, 5, 4, ... ) X^{-1}, where X = P diag( 1, ..., 5, 6, 7, ... ) and P is orthogonal, with

U_0 = diag( 43.6939, 41.5811 + 13.3395i, 35.4496 + 25.3733i, 25.8996 + 34.9233i, 13.8658 + 41.0548i, 0.5263 + 43.1676i, −12.8132 + 41.0548i, −24.8470 + 34.9233i, −34.3970 + 25.3733i, −40.5285 + 13.3395i, −42.6413, −40.5285 − 13.3395i, −34.3970 − 25.3733i, −24.8470 − 34.9233i, −12.8132 − 41.0548i, 0.5263 − 43.1676i, 13.8658 − 41.0548i, 25.8996 − 34.9233i, 35.4496 − 25.3733i, 41.5811 − 13.3395i ),

where the elements of U_0 are the uniformly distributed points on a circle which includes all Gerschgorin disks for A, S_0 = I, tol = 10^{-6}, and tol1 = 10^{-4}, then after 29 iterations we obtain a block Schur decomposition S^H A S = U = D + N, where D = diag( 3, 3, 0, 4, 1, 1, 1, 1, 1, ..., 5 ) and N is the strictly upper triangular part of U. If Algorithm 2.6.1 is applied to a perturbed matrix A_1, where the tolerance and the coalescing tolerance are the same, then after 4 iterations we obtain a block Schur form P^H A_1 P = Θ, whereas if we apply the QR method with double implicit shift to Q^T A_1 Q, then after 19 iterations we obtain a real Schur form V^T (Q^T A_1 Q) V = Γ.

Example 2.6.7 If Algorithm 2.6.1 is applied to the matrix C of an earlier example with

U_0 = diag( 139.50, 132.6969 + 42.9537i, 112.9537 + 81.7022i, 82.2022 + 112.4537i, 43.4537 + 132.1969i, 0.50 + 139.00i, −42.4537 + 132.1969i, −81.2022 + 112.4537i, −111.9537 + 81.7022i, −131.6969 + 42.9537i, −138.50, −131.6969 − 42.9537i, −111.9537 − 81.7022i, −81.2022 − 112.4537i, −42.4537 − 132.1969i, 0.50 − 139.00i, 43.4537 − 132.1969i, 82.2022 − 112.4537i, 112.9537 − 81.7022i, 132.6969 − 42.9537i ),

where the elements of U_0 are the uniformly distributed points on a circle which includes all Gerschgorin disks for C, S_0 = I, tol = 10^{-6}, and tol1 = 10^{-4}, then after 26 iterations we obtain a block Schur form S^H C S = Θ. Figure 2.2 shows how the initial eigenvalues of U_0 converged to the eigenvalues of C.
If we use the QR method with double implicit shift on C, then after 27 iterations we obtain a real Schur form Q^T C Q = R. Next consider the perturbed matrix C_1 = C + P/1000, where P is the matrix from Example 2.1.3. If we apply Algorithm 2.6.1 to C_1 with U_0 = U and S_0 = S, where the tolerance and the coalescing tolerance are the same, then after 2 iterations we obtain a block Schur form P^H C_1 P = Θ, whereas if we apply the QR method with double implicit shift to Q^T C_1 Q, then after 27 iterations we obtain a real Schur form V^T (Q^T C_1 Q) V = Γ.

Figure 2.2. The path of convergence of the eigenvalues of C.

In the following table, for different values of n, we give the number of iterations required to obtain block Schur decompositions of A by Algorithm 2.6.1 with diagonal starting matrices U_0, where the entries a_ij of A are real numbers with |a_ij| < 1, the tolerance tol = 10^{-6}, and the coalescing tolerance tol1 = 10^{-4}:

n     iterations for our algorithm     iterations for the QR algorithm with double shift
20    25                               27
40    44                               58
60    64                               79
80    75                               110
100   80                               134
120   93                               178
140   103                              193

Algorithm 2.6.1 can be applied to all matrices A with S_0 = I and diagonal matrices as U_0, where the elements of U_0 are uniformly distributed points on circles which include all Gerschgorin disks, with tol = 10^{-6}; however, a full stepsize in the first iterations can destroy convergence. To prevent it, we take one fourth of the stepsize from iteration 1 to iteration 2 or 4. If the size of the matrix is very large, then we may have to take one fourth of the stepsize from iteration 1 to iteration 6 or 10. The reason for doing this is the following. If all diagonal entries of the upper triangular matrix at the beginning become real, then the approximate eigenvalues are clustered on a segment of the real line.
When two elements are close to each other but not within the coalescing tolerance, then in the updating process we are dividing by a small number (their difference). If this happens in almost all iterations at the beginning, which is the case when the size of a matrix is greater than 10, then it brings instability into the updating process. In fact, for that reason the real version of Algorithm 2.6.1 does not work very well when a diagonal matrix formed from the diagonal elements of the given matrix is used as the starting upper triangular matrix.

2.7 Parallel Processing in Eigencomputations

We have already discussed the derivation of five algorithms in the previous sections to find all eigenvalues and corresponding eigenvectors of a matrix. To implement each algorithm, we need either a matrix of approximate eigenvectors, or all approximate eigenvalues, or both. But except for the last one, we have not been able to come up with ways to choose either a matrix of approximate eigenvectors or all approximate eigenvalues for the starting guess. So in this section we will only discuss the implementation of the last algorithm on a parallel computer.

Most of the algorithms which are available to find all eigenvalues of an unsymmetric matrix are serial algorithms. For example, consider the famous QR algorithm, which is used to find all eigenvalues of an n × n unsymmetric matrix. To make it efficient, the matrix is first reduced to Hessenberg form. The algorithm which reduces an n × n matrix A to a Hessenberg form H, and the algorithm which applies double implicit shifts to H, are both serial algorithms.

The motivation behind the derivation of the various algorithms in this chapter is to find all eigenvalues of an unsymmetric matrix by making use of a sensitivity result for eigenvalues and eigenvectors, and then to determine whether an algorithm can be implemented on a parallel computer. We have already mentioned that except for the last algorithm, all other algorithms described in this chapter have some shortcomings.
So we will give a sketch of how and where to modify the last algorithm so that it can be implemented on a parallel computer. For clarity, we will assume that n = rp, where p is the number of processors. We will also use the notation Proc(i) to denote the ith processor.

In Step 1, with p processors, we can accelerate the merging of diagonal blocks in the following way. For a fixed i ≤ NB with NB − i ≤ p, we can find α_j = min{ |σ − ω| : σ ∈ Λ(U_ii), ω ∈ Λ(U_jj) }, j = i+1, ..., NB, in separate processors. If α_j < tol1, then we can merge U_jj with U_ii in increasing order of j. If NB − i > p, then we have to use some processors more than once: Proc(τ) will compute α_l, where l = τ + i : p : NB. Parallel processors can also be used to arrange the diagonal blocks of U in decreasing order of size. Let ⌈x⌉ denote the smallest integer such that x ≤ ⌈x⌉. If U has k blocks along the diagonal, then no more than k(k−1)/2 swaps among the diagonal blocks are necessary to arrange them in decreasing order of size. In a single processor we would have to execute the swaps one after another, but on a parallel computer we can use m processors, where m = p if p ≤ ⌈(k−1)/2⌉ and m = ⌈(k−1)/2⌉ otherwise, to finish the swaps. For example, if k = 5 and there are two processors, then we can swap the elements of NS in the following way:

a1 a2 a3 a4 a5
a2 a1 a4 a3 a5
a2 a4 a1 a5 a3
a4 a2 a5 a1 a3
a4 a5 a2 a3 a1
a5 a4 a3 a2 a1

Here, in each phase, we use one processor to swap the first exchanged pair and the other processor to swap the second exchanged pair. Each processor performs a sequence of comparisons and swaps.

In Step 2, to compute the strictly lower triangular matrix G(S, U), we can use parallel processors. Suppose A = [A_1, ..., A_p], S = [S_1, ..., S_p], and U = [U_1, ..., U_p] are column partitionings of A, S, and U respectively, where each block column has width r. To compute B = AS − SU, first we find D = SU, and then B = AS − D. Since U_k = [U_{1k}^T, ..., U_{pk}^T]^T with U_jk ∈ C^{r×r}, we have D_k = S U_k = Σ_j S_j U_jk. Thus each D_k can be computed in a separate processor.
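The block-column computation of B = AS − SU described above can be sketched as follows (illustrative Python/NumPy; a thread pool stands in for the separate processors, and the function name is ours):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def residual_block_columns(A, S, U, r, workers=4):
    """B = A S - S U computed one width-r block column at a time;
    each column D_k = S @ U[:, k*r:(k+1)*r] is an independent task."""
    n = A.shape[0]
    p = n // r
    B = np.empty_like(A)

    def one_column(k):
        cols = slice(k * r, (k + 1) * r)
        D_k = S @ U[:, cols]             # D_k = sum_j S_j U_jk
        B[:, cols] = A @ S[:, cols] - D_k

    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(one_column, range(p)))
    return B
```

The tasks write to disjoint column slices of B, so they can run concurrently without synchronization, which mirrors the one-block-column-per-processor assignment in the text.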
Since U is upper triangular, D_p involves much more work than D_1. To load balance, we assign Proc(τ) the computation of D(:, τ : p : n) = S U(:, τ : p : n), as suggested in [Gol90, page 291]. Next, with S_k = [S_{1k}^T, ..., S_{pk}^T]^T, S_jk ∈ C^{r×r}, we have B_k = A S_k − D_k, k = 1, ..., p. So p processors can be used to compute B, where D can be overwritten by B. As we mentioned in Section 2.6, to compute G_ij, we need the entries below G_ij on the jth block column, namely G_{i+1,j}, ..., G_{NB,j}, and the entries on the left of G_ij on the ith block row.

Figure 2.3. The direction in which the elements of G(S, U) are computed, when the number of processors is greater than half of the total number of blocks.

Suppose p ≥ ⌈NB/2⌉. Then we use Proc(1) to compute G_{NB,j}, j = 1, ..., NB − 1, in increasing order of the column index j, and the other p − 1 processors to compute the remaining elements of G(S, U) in decreasing order of the row index i, as we explain below. We use Proc(2) to compute G_{i1}, i = NB − 1, ..., 2, Proc(3) to compute G_{i2}, i = NB − 1, ..., 3, and, continuing the assignment in this way, Proc(p) to compute G_{i,p−1}, i = NB − 1, ..., p. We then continue this ascending order of assignments to those processors except Proc(1); that means Proc(2) computes G_{ip}, i = NB − 1, ..., p + 1, Proc(3) computes G_{i,p+1}, i = NB − 1, ..., p + 2, and, in this order, if necessary, Proc(p) computes G_{i,2p−2}, i = NB − 1, ..., 2p − 1. Pictorially this can be presented by arrowheads, where the elements are computed towards the arrowheads, as in Figure 2.3.

Figure 2.4. The direction in which the elements of G(S, U) are computed, when the number of processors is less than half of the total number of blocks.

Next, suppose p < ⌈NB/2⌉. This time we will assign each processor to compute the elements in decreasing order of the row index i: Proc(1) computes G_{i1}, i = NB, ..., 2, Proc(2) computes G_{i2}, i = NB, ..., 3, and, assigning the columns in this order, Proc(p) computes G_{ip}, i = NB, ..., p + 1. Next we repeat the order of assignment as in the first strategy: Proc(1) computes G_{i,p+1}, i = NB, ..., p + 2, Proc(2) computes G_{i,p+2}, i = NB, ..., p + 3, and in this order Proc(p) computes G_{i,2p}, i = NB, ..., 2p + 1.
If necessary, we will repeat it. Pictorially this can be shown by arrowheads, where the elements are computed towards the arrowheads, as in Figure 2.4. In both strategies we cannot start all processors simultaneously: before a processor computes an element G_ij, we need to make sure that the entries G_ij depends on have already been computed. The flop count of this algorithm to compute G(S, U) is a complicated function of the block sizes of U; if all blocks are 1 × 1 matrices, then the algorithm requires about n³/3 flops.

In Step 3, to evaluate f(t) for different values of t, we can use parallel processors. To compute the lower triangular matrix L, where L_ij = tG_ij, we can use one of the strategies discussed earlier for computing G(S, U), depending on the number of processors p, assigning each Proc(τ) its share of the blocks. We can use the ideas discussed in the analysis of Step 2 to find the matrix products. Next we give a sketch of how to find the inverse of a matrix on a distributed-memory multiprocessor, where the processors are set up in a ring. An efficient way to find the inverse of a matrix A is first to find the LU factorization of A with partial pivoting. Let piv(1 : n−1) = 0 be a zero vector. Consider the following node program, which is due to G. H. Golub and C. F. Van Loan, where τ is the identity of the τth processor:

B = A(1:n, τ:p:n); pivot = piv(τ:p:n−1); col = τ:p:n; L = length(col)
j = 1; q = 1
while j ≤ n−1
  if j = col(q)
    { Find the permutation index piv(j) and the Gauss vector A(j+1:n, j). }
    Determine k with j ≤ k ≤ n so that |B(k, q)| = max |B(j:n, q)|
    pivot(q) = k; B(j, 1:L) ↔ B(k, 1:L)
    B(j+1:n, q) = B(j+1:n, q)/B(j, q)
    Send pivot(q) and B(j+1:n, q) to the processors on the right.
    { Update local columns. }
    B(j+1:n, q+1:L) = B(j+1:n, q+1:L) − B(j+1:n, q) B(j, q+1:L)
    j = j+1; q = q+1
  else
    Receive piv(j) and A(j+1:n, j) from the left, and if the processor on the right is not the processor which computed A(j+1:n, j), and j is less than the last column index of the processor on the right, then send piv(j) and A(j+1:n, j) to the right.
    { Update local columns. }
    B(j, q:L) ↔ B(piv(j), q:L)
    B(j+1:n, q:L) = B(j+1:n, q:L) − A(j+1:n, j) B(j, q:L)
    j = j+1
  end
end

After each processor in a p-processor ring finishes the execution of the above code, we will get the LU factorization of A: Proc(τ) will house A(1:n, τ:p:n) in a local array B, and piv(τ:p:n−1) in a local vector pivot. The above LU factorization of A is of the form PA = LR. After simplifying, we get A^{-1} = R^{-1} L^{-1} P, and therefore to obtain A^{-1} we need to determine L^{-1} and R^{-1}.

First we will discuss how to find the inverse B of a lower triangular matrix L, where L is overwritten by B. Consider the following node program:

D = [L(k:n, k)], k = τ:p:n; col = τ:p:n; N = length(col)
j = 1; q = 1
while j ≤ n
  if j = col(q)
    Send D(j:n, q) to the processors on the left.
    { Update local columns. }
    if q > 1
      D(j, 1:q−1) = −D(j, 1:q−1)/D(j, q)
      D(j+1:n, 1:q−1) = D(j+1:n, 1:q−1) + D(j+1:n, q) D(j, 1:q−1)
    end
    D(j, q) = 1/D(j, q)
    D(j+1:n, q) = D(j+1:n, q) D(j, q)
    j = j+1; q = q+1
  else
    Receive L(j:n, j) from the right, and if the processor on the left is not the processor which houses L(j:n, j), and j is greater than the first column index of the processor on the left, then send L(j:n, j) to the left.
    { Update local columns. }
    D(j, 1:q−1) = −D(j, 1:q−1)/L(j, j)
    D(j+1:n, 1:q−1) = D(j+1:n, 1:q−1) + L(j+1:n, j) D(j, 1:q−1)
    j = j+1
  end
end

If each node in a p-processor ring executes the above code, then we will get the inverse of L, and upon completion Proc(τ) will house L^{-1}(k:n, k) for k = τ:p:n in a local array D.

Our next move is to find the inverse B of an upper triangular matrix R, where R is overwritten with B. A node program can be structured as follows:

D = [R(1:k, k)], k = τ:p:n; col = τ:p:n; N = length(col)
j = n; q = N
while j ≥ 1
  if j = col(q)
    Send D(1:j, q) to the processors on the right.
    { Update local columns. }
    if q < N
      D(j, q+1:N) = −D(j, q+1:N)/D(j, q)
      D(1:j−1, q+1:N) = D(1:j−1, q+1:N) + D(1:j−1, q) D(j, q+1:N)
    end
    D(j, q) = 1/D(j, q)
    D(1:j−1, q) = D(1:j−1, q) D(j, q)
    j = j−1; q = q−1
  else
    Receive R(1:j, j) from the left, and if the processor on the right is not the processor which houses R(1:j, j), and j is less than the last column index of the processor on the right, then send R(1:j, j) to the right.
    { Update local columns. }
    D(j, q:N) = −D(j, q:N)/R(j, j)
    D(1:j−1, q:N) = D(1:j−1, q:N) + R(1:j−1, j) D(j, q:N)
    j = j−1
  end
end

If each processor in a p-processor ring executes the above node program, then upon completion Proc(τ) will house R^{-1}(1:k, k) for k = τ:p:n in a local array D.

In the LU factorization of A, R can be stored in the upper triangular part of A, and the unit lower triangular matrix L can be stored in the strictly lower triangular part of A. Then we can use the above code to find the inverse of the upper triangular part of A, and, after a slight modification of the node program which finds the inverse of a lower triangular matrix, we can use it to find the inverse of the strictly lower triangular part of A. Since A^{-1} = R^{-1} L^{-1} P, our next goal is to create a node program to find the product R^{-1} L^{-1}. Consider the following node program:

B = A(1:n, τ:p:n); col = τ:p:n; N = length(col)
j = 1; q = 1
while j ≤ n
  if j = col(q)
    Send B(1:j, q) to the processors on the left.
    { Update local columns. }
    w = B(j, 1:q−1); B(j, 1:q−1) = 0, a (q−1)-dimensional row zero vector
    B(1:j, 1:q−1) = B(1:j, 1:q−1) + B(1:j, q) w
    j = j+1; q = q+1
  else
    Receive A(1:j, j) from the right, and if the processor on the left is not the processor which houses A(1:j, j), and j is greater than the first column index of the processor on the left, then send A(1:j, j) to the left.
    { Update local columns. }
    w = B(j, 1:q−1); B(j, 1:q−1) = 0
    B(1:j, 1:q−1) = B(1:j, 1:q−1) + A(1:j, j) w
    j = j+1
  end
end

After each processor in a p-processor ring executes the above code, we will obtain the product R^{-1} L^{-1}, which overwrites A, and upon completion Proc(τ) will house A(1:n, τ:p:n) in a local array B. Since A is overwritten by R^{-1} L^{-1}, and the permutation matrix P is encoded in the (n−1)-dimensional vector piv, (R^{-1} L^{-1}) P can be computed in the following way:

for k = n−1 : −1 : 1
  if k ≠ piv(k)
    A(1:n, k) ↔ A(1:n, piv(k))
  end
end

Next, if we store the upper triangular matrix U in the upper triangular part of a local array, we can form the product G(S, U) U in a more efficient way. After computing U G(S, U), S^H A S, and G(S, U) U, we can take advantage of parallel processors to compute U(t) = U + t (U G(S, U) + S^H A S − U − G(S, U) U). Next, after Z(t) = A − S(t) U(t) S(t)^{-1} is calculated, we can find trace(Z(t)^H Z(t)) in the following way. Let x(1:p) = 0 be a zero vector. To compute x(τ), we use the following procedure:

B = Z(t)(1:n, τ:p:n); y = 0
for i = 1 : r
  y = y + [B(1:n, i)]^H B(1:n, i)
end

Here each processor can execute the above code simultaneously, and upon completion Proc(τ) houses x(τ) in a local variable y. Then f(t) = sqrt( x_1 + x_2 + ... + x_p ).

In Step 4, if NB − m < p, then we cannot use all processors to find il and iu. Suppose NB − m ≥ p. Using a similar technique, as described in the analysis of Step 1, we can get an array k such that |ρ_{k(i)}| ≤ |ρ_{k(i+1)}| holds. To determine i0 = min{ i : i = 1, ..., NB − m − 1, r_i > 0 and Re(r_i) < 1 }, we can use p processors as follows. Define s = ⌈(NB − m)/p⌉ and partition the vectors ρ and d as

ρ_{k(1:s+1)}, ρ_{k(s+1:2s+1)}, ..., ρ_{k((p−2)s+1:(p−1)s+1)}, ρ_{k((p−1)s+1:NB−m)},
d_{k(1:s+1)}, d_{k(s+1:2s+1)}, ..., d_{k((p−2)s+1:(p−1)s+1)}, d_{k((p−1)s+1:NB−m)}.

Assign the τth segments to Proc(τ). On its segment, each processor forms the ratios r_i = (ρ_{k(i+1)} − ρ_{k(i)})/(d_{k(i)} − d_{k(i+1)}), scans them in increasing order of i for the first index with r_i > 0 and Re(r_i) < 1, and records δ_τ = 0 if no such index exists and otherwise the global position v(δ_τ) of the first such index. If each processor executes the above code, then upon completion Proc(τ) houses δ_τ and v(δ_τ). Next, the desired minimum is δ_0 = min{ δ_τ : τ = 1, ..., p, δ_τ > 0 }, and with the corresponding v(δ_0) we define il = t − 1 + min{ k(v(δ_0)), k(v(δ_0) + 1) } and iu = t − 1 + max{ k(v(δ_0)), k(v(δ_0) + 1) }.

In Step 5 we can use parallel processors, as described in the analysis of Steps 2 and 3, to obtain U^new, S^new, and Z = S^new U^new (S^new)^{-1}.

In Step 6, if m ≤ p, then we can assign a separate processor to implement the code for each diagonal block U_jj, 1 ≤ j ≤ m. If m > p, then assign Proc(τ) to implement

for q = 1 : length(col)
  a = col(q); implement Step 6 on U_aa
end

where col = τ : p : m.

In Step 8 we can use parallel processors to find a QR factorization of S. Here we will discuss how to find a QR factorization of a matrix A on a distributed-memory multiprocessor where the processors are interconnected in a ring. Let b(1:n−1) = 0 be a zero vector. Consider the following procedure, which is due to G. H. Golub and C. F. Van Loan:

B = A(1:n, τ:p:n); c = b(τ:p:n−1); col = τ:p:n; L = length(col)
j = 1; q = 1
while j ≤ n−1
  if j = col(q)
    { Find the Householder vector for A(j:n, j). }
    α = ||B(j:n, q)||_2
    if α = 0
      c(q) = 0
    else
      if B(j, q) ≠ 0
        β = α B(j, q)/|B(j, q)|
      else
        β = α
      end
      μ = B(j, q) + β
      c(q) = −μ/β
      B(j+1:n, q) = B(j+1:n, q)/μ
      B(j, q) = −β
    end
    Send c(q) and B(j+1:n, q) to the processors on the right.
    { Update local columns. }
    v = [1; B(j+1:n, q)]
    B(j:n, q+1:L) = B(j:n, q+1:L) + c(q) v v^H B(j:n, q+1:L)
    j = j+1; q = q+1
  else
    Receive b(j) and A(j+1:n, j) from the left, and if the processor on the right is not the processor which formed A(j+1:n, j), and j is less than the last column index of the processor on the right, then send b(j) and A(j+1:n, j) to the right.
    { Update local columns. }
    v = [1; A(j+1:n, j)]
    B(j:n, q:L) = B(j:n, q:L) + b(j) v v^H B(j:n, q:L)
    j = j+1
  end
end

After each processor in a p-processor ring executes the above code, we will get a QR factorization of A, and Proc(τ) will house A(1:n, τ:p:n) in a local array B and b(τ:p:n−1) in a local vector c. The upper triangular part of A is overwritten by the upper triangular part of R, and components j+1 : n of the jth Householder vector are stored in A(j+1:n, j), j ≤ n−1. Here we need an explicit form of the unitary matrix Q.
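Serially, the Householder QR used here can be sketched as follows (an illustrative Python/NumPy version that also accumulates Q explicitly, which is what the next procedure does across the ring; the sign choice avoids cancellation):

```python
import numpy as np

def householder_qr(A):
    """QR factorization by Householder reflections, accumulating Q
    explicitly. Returns Q (unitary) and R (upper triangular), A = Q R."""
    A = A.astype(complex)
    n = A.shape[0]
    Q = np.eye(n, dtype=complex)
    R = A.copy()
    for j in range(n - 1):
        x = R[j:, j].copy()
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue
        # phase keeps v[0] = x[0] + phase*alpha away from cancellation
        phase = x[0] / abs(x[0]) if x[0] != 0 else 1.0
        v = x.copy()
        v[0] += phase * alpha
        beta = 2.0 / np.vdot(v, v).real
        # apply H = I - beta v v^H on the left of R, on the right of Q
        R[j:, :] -= beta * np.outer(v, np.conj(v) @ R[j:, :])
        Q[:, j:] -= beta * np.outer(Q[:, j:] @ v, np.conj(v))
    return Q, R
```

Each reflection zeros one subdiagonal column of R while the same reflection is folded into Q from the right, so A = QR is preserved at every step.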
We find Q using the following procedure, which uses the Householder vectors A(j+1:n, j), j ≤ n−1, and the vector b to form Q explicitly, and zeros the subdiagonal elements of A. Let Q = I.

B = A(1:n, τ:p:n); I = Q(1:n, τ:p:n); c = b(τ:p:n−1); col = τ:p:n; L = length(col)
j = n−1; let q index the last local column with col(q) ≤ n−1
while j ≥ 1
  if j = col(q)
    Send c(q) and B(j+1:n, q) to the processors on the left.
    { Update local columns. }
    v = [1; B(j+1:n, q)]
    B(j+1:n, q) = 0
    I(j:n, q:L) = I(j:n, q:L) + c(q) v v^H I(j:n, q:L)
    j = j−1; q = q−1
  else
    Receive b(j) and A(j+1:n, j) from the right, and if the processor on the left is not the processor which houses A(j+1:n, j), and j is greater than the first column index of the processor on the left, then send b(j) and A(j+1:n, j) to the left.
    { Update local columns. }
    v = [1; A(j+1:n, j)]
    I(j:n, q:L) = I(j:n, q:L) + b(j) v v^H I(j:n, q:L)
    j = j−1
  end
end

If each processor in a p-processor ring executes the above code, then upon completion Proc(τ) will house A(1:n, τ:p:n) in a local array B, and Q(1:n, τ:p:n) in a local array I. Next we can use parallel processors, as described in the analysis of Step 2, to find the product U^new = Q^H Z Q. Finally, in Step 9, we can use multiprocessors, as described in the analysis of Step 3, to evaluate f(0) = ||A − Z||_F.

CHAPTER 3
EIGENVALUES OF SYMMETRIC MATRICES

3.1 Diagonalization of a Symmetric Matrix using Armijo's Stepsize

Let A be an n × n symmetric matrix. Suppose the eigenvalues of A are all distinct. In this section we will find an iterative method to compute a diagonalization A = QΛQ^T, where Λ = diag(λ_1, λ_2, ..., λ_n) and Q is an orthogonal matrix, such that the diagonal elements of Λ are the eigenvalues of A and the columns of Q are eigenvectors of A.
Consider the following algorithm to compute the eigenvalues and corresponding eigenvectors of a matrix A (detail is given in [Hag, pages 304–314], or in Chapter 1):

λ_i^new = (y_i^old)^T A x_i^old, i = 1 to n; x_i^new, i = 1 to n; Y^new = ((X^new)^{-1})^T.  (3.1)

The algorithm in (3.1) can be rewritten in the following way:

Λ^new = Diag( (X^old)^{-1} A X^old ),
X^new = X^old (I + F(X^old, Λ^new)),  (3.2)

where Diag(M) = diag(m_11, m_22, ..., m_nn), and

F_ij(X^old, Λ^new) = ((X^old)^{-1} A X^old)_ij / ( λ_j^new − ((X^old)^{-1} A X^old)_ii )  for i ≠ j,  F_ii = 0.

In (3.2), if we put Y^new = I + F(X^old, Λ^new) and A^old = (X^old)^{-1} A X^old, then it reduces to the form Λ^new = Diag( (Y^old)^{-1} A^old Y^old ). Hence in each iteration we can update the eigenvalues and the corresponding eigenvectors in the following way:

A^new = (Y^old)^{-1} A^old Y^old,
Λ^new = Diag(A^new),
Y^new = I + F(A^new),
X^new = X^old Y^new,  (3.3)

where f_ij(A^new) = a_ij^new/(a_jj^new − a_ii^new) for i ≠ j and f_ii(A^new) = 0. For a symmetric matrix A, the updating process in (3.3) does not restore the symmetry of the given matrix: in the update A^new = (Y^old)^{-1} A^old Y^old, the matrix A^new need not be symmetric, and X^new need not be orthogonal. In each iteration, to restore the symmetry of A and the orthogonality of X, we do the following. Let Y^new = Q^new R^new be a QR factorization of Y^new. Now if we take A^new = (Q^old)^T A^old Q^old and X^new = X^old Q^new, then A^new will be symmetric and X^new will be orthogonal. With these modifications, (3.3) becomes:

A^new = (Q^old)^T A^old Q^old,
Λ^new = Diag(A^new),
Y^new = I + F(A^new),
Y^new = Q^new R^new  (QR factorization),
X^new = X^old Q^new,  (3.4)

where f_ij(A^new) = a_ij^new/(a_jj^new − a_ii^new) for i ≠ j and f_ii(A^new) = 0. When some eigenvalues of A are close to each other, the increment F(A^new) may be too large. With a large increment, we may have instability in the updating process, and the algorithm may diverge. To restore the convergence of the algorithm, that is, to have a steady change in the value of Y^new, we need a small increment in each iteration. To achieve this, we redefine each iterate and introduce a small positive parameter. Let s be a positive parameter, and define Z(s)^new = I + sF(A^new). Then Z(0)^new = I, and Z(1)^new = Y^new, the matrix generated by (3.4). Define f(s) = ||G(s)||_F, where G(s) = lotr( (Z(s)^old)^{-1} A^old Z(s)^old ), and lotr(B) is a strictly lower triangular matrix formed from the strictly lower triangular part of B.
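The objective just defined is easy to state concretely (a small Python/NumPy illustration of lotr, F, and f(s); the function names are ours):

```python
import numpy as np

def lotr(B):
    """Strictly lower triangular part of B."""
    return np.tril(B, k=-1)

def build_F(A):
    """F_ij = a_ij/(a_jj - a_ii) for i != j, F_ii = 0 (distinct diagonal)."""
    d = np.diag(A)
    denom = d[None, :] - d[:, None]      # a_jj - a_ii
    np.fill_diagonal(denom, 1.0)         # avoid division by zero on diagonal
    F = A / denom
    np.fill_diagonal(F, 0.0)
    return F

def f_of_s(A, s):
    """f(s) = || lotr( Z(s)^{-1} A Z(s) ) ||_F with Z(s) = I + s F."""
    Z = np.eye(A.shape[0]) + s * build_F(A)
    return np.linalg.norm(lotr(np.linalg.solve(Z, A @ Z)), 'fro')
```

At s = 0 this reduces to f(0) = ||lotr(A)||_F, and for a symmetric A the matrix F produced here is skew symmetric, which is exactly the hypothesis of Lemma 3.1.1 below.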
Suppressing the superscripts in Z(s)^new and G(s)^new, we have

    Z(s) = I + s F(A),    G(s) = lotr(Z(s)^{-1} A Z(s)).

Hence f(0) = ||lotr(A)||_F, and f(1) = ||lotr(Y^{-1} A Y)||_F. With a starting guess X_0, if the algorithm in (3.4) converges, then we must have f(1) < f(0). As in Chapter 2, here our goal is to find an s for which f(s) <= (1 - s/2) f(0) holds. To this end, we use Armijo's rule from optimization theory. In Armijo's rule, we determine s in the following way. Evaluate f(s) at s = 1, 1/2, 1/4, ..., stopping when

    f(s) <= (1 - s/2) f(0).

To use the above inequality, f(s) must satisfy f'(0) = -f(0). Hence our next goal is to evaluate f'(0), when f(s) = ||G(s)||_F with G(s) = lotr(Z(s)^{-1} A Z(s)). Before we evaluate f'(0), we will prove the following basic result.

Lemma 3.1.1 If A in R^{n x n} is a symmetric matrix, and B in R^{n x n} is a skew-symmetric matrix, then AB - BA is a symmetric matrix.

Proof of Lemma 3.1.1 Since A^T = A and B^T = -B, we have

    (AB - BA)^T = B^T A^T - A^T B^T = (-B)A - A(-B) = AB - BA.

Hence AB - BA is a symmetric matrix.

With the above definition of f(s), now we are ready to find f'(0), and we will show that f'(0) = -f(0).

Theorem 3.1.1 Let A in R^{n x n} be a symmetric matrix. Suppose its eigenvalues are all distinct. Define

    f(s) = ||G(s)||_F,    Z(s) = I + s F(A),    G(s) = lotr(Z(s)^{-1} A Z(s)),

where f_ij(A) = a_ij / (a_jj - a_ii) for i != j, f_ii(A) = 0, and lotr(B) is a strictly lower triangular matrix formed from the strictly lower triangular part of B. Then f'(0) = -f(0).

Proof of Theorem 3.1.1 Using the result (2.18) of Lemma 2.3.1, we have

    f(0) f'(0) = trace( G(0)^T (d/ds) G(s)|_{s=0} ).    (3.5)

Differentiating G(s) with respect to s, and then evaluating the derivative at s = 0, we obtain

    (d/ds) G(s)|_{s=0} = lotr( (d/ds) Z(s)^{-1}|_{s=0} A Z(0) + Z(0)^{-1} A (d/ds) Z(s)|_{s=0} ).    (3.6)

Differentiating Z(s) with respect to s, we have (d/ds) Z(s) = F, and hence

    (d/ds) Z(s)^{-1}|_{s=0} = -Z(0)^{-1} F Z(0)^{-1} = -F,    (3.7)

since Z(0) = I. Substituting (3.7) into (3.6) gives

    (d/ds) G(s)|_{s=0} = lotr(AF - FA).

Since G(0) = lotr(Z(0)^{-1} A Z(0)) = lotr(IAI) = lotr(A), (3.5) becomes

    f(0) f'(0) = trace( (lotr(A))^T lotr(AF - FA) ).    (3.8)

Because A is symmetric, (lotr(A))^T = uptr(A), where uptr(A) is a strictly upper triangular matrix formed from the strictly upper triangular part of A. Hence (3.8) reduces to

    f(0) f'(0) = trace( uptr(A) lotr(AF - FA) ).    (3.9)

Next, for i != j,

    f_ji = a_ji / (a_ii - a_jj) = -a_ij / (a_jj - a_ii) = -f_ij,

since A is symmetric. That is, F is a skew-symmetric matrix.
Hence by Lemma 3.1.1, AF - FA is a symmetric matrix. Now for j != i, the (j, i) entry of AF - FA is

    (AF - FA)_ji = sum_{k != i} a_jk a_ki / (a_ii - a_kk) - sum_{k != j} a_jk a_ki / (a_kk - a_jj).

The term k = j in the first sum and the term k = i in the second sum combine to give

    a_ji (a_jj - a_ii) / (a_ii - a_jj) = -a_ji,

so that

    (AF - FA)_ji = -a_ji + sum_{k != i, j} a_jk a_ki [ 1/(a_ii - a_kk) + 1/(a_jj - a_kk) ].

Therefore (3.9) gives

    f(0) f'(0) = sum_{i < j} a_ij (AF - FA)_ji
               = -sum_{i < j} a_ij^2 + sum_{i < j} sum_{k != i, j} a_ij a_jk a_ki [ 1/(a_ii - a_kk) + 1/(a_jj - a_kk) ].

Simplifying the right hand side of the above expression: for each set of three distinct indices {p, q, r}, the product a_pq a_qr a_rp appears three times in the double sum, and its coefficients cancel in pairs, since

    [1/(a_pp - a_rr) + 1/(a_rr - a_pp)] + [1/(a_qq - a_rr) + 1/(a_rr - a_qq)] + [1/(a_pp - a_qq) + 1/(a_qq - a_pp)] = 0.

Hence the double sum vanishes, and

    f(0) f'(0) = -sum_{i < j} a_ij^2 = -f(0)^2.

Therefore f'(0) = -f(0), provided f(0) != 0.

Hence Theorem 3.1.1 implies that Armijo's rule can be applied to the algorithm (3.4).

Algorithm 3.1.1 (Symmetric Diagonalization) Given a symmetric matrix A in R^{n x n} whose eigenvalues are all distinct, an orthogonal matrix X_0, and a tolerance tol greater than the unit roundoff, the following algorithm computes a diagonalization X^T A X = Lambda; A is overwritten with the diagonal matrix Lambda.

Step 0: Take X = X_0. Compute A^new = X^T A^old X, and f(0) = ||lotr(A)||_F.

Step 1: Construct F = (f_ij), where f_ij = a_ij / (a_jj - a_ii) for i != j, and f_ii = 0.

Step 2: Let f(s) = ||G(s)||_F, where G(s) = lotr((I + sF)^{-1} A (I + sF)). Evaluate f(s) at s = 1, 1/2, 1/4, ..., stopping when

    f(s) <= (1 - s/2) f(0).    (3.10)

Let t be the first value of s for which (3.10) holds, and let Y = I + tF. Let Y = QR be the QR factorization of Y.

Step 3: Update A and X as follows: A^new = Q^T A^old Q, and X^new = X^old Q.

Step 4: Evaluate f(0) = ||lotr(A)||_F. Go to Step 1 until f(0) < tol, where tol is the tolerance for the desired accuracy of the eigenvalues.
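Algorithm 3.1.1 can be sketched compactly in numpy. This is a minimal illustration, not the author's implementation: the test matrix, the iteration cap, and the stepsize floor inside the Armijo loop are assumptions added here.

```python
import numpy as np

def lotr(B):
    """Strictly lower triangular part of B."""
    return np.tril(B, -1)

def symmetric_diagonalization(A, X0=None, tol=1e-10, max_iter=200):
    """Sketch of Algorithm 3.1.1: diagonalize a symmetric matrix with
    distinct eigenvalues, choosing the stepsize s by Armijo's rule (3.10).
    max_iter and the stepsize floor 1e-12 are safeguards, not in the text."""
    n = A.shape[0]
    X = np.eye(n) if X0 is None else X0.copy()
    A = X.T @ A @ X                                  # Step 0
    f0 = np.linalg.norm(lotr(A))
    for _ in range(max_iter):
        if f0 < tol:
            break
        d = np.diag(A).copy()
        F = A / (d[np.newaxis, :] - d[:, np.newaxis] + np.eye(n))  # Step 1
        np.fill_diagonal(F, 0.0)
        s = 1.0                                      # Step 2: Armijo's rule
        while True:
            Z = np.eye(n) + s * F
            fs = np.linalg.norm(lotr(np.linalg.solve(Z, A @ Z)))
            if fs <= (1.0 - s / 2.0) * f0 or s < 1e-12:
                break
            s /= 2.0
        Q, _ = np.linalg.qr(np.eye(n) + s * F)       # Y = I + tF = QR
        A = Q.T @ A @ Q                              # Step 3
        X = X @ Q
        f0 = np.linalg.norm(lotr(A))                 # Step 4
    return np.diag(A).copy(), X

# Illustrative symmetric test matrix with well-separated eigenvalues.
A = np.array([[2.0, 1.0, 0.5],
              [1.0, 4.0, 0.3],
              [0.5, 0.3, 7.0]])
lam, X = symmetric_diagonalization(A)
assert np.allclose(X.T @ X, np.eye(3))                    # X is orthogonal
assert np.allclose(np.sort(lam), np.linalg.eigvalsh(A))   # eigenvalues of A
```

Note that, as in the text, the Armijo test in Step 2 uses the similarity (I + sF)^{-1} A (I + sF), while the actual update in Step 3 uses the orthogonal factor Q of I + tF; since F is skew-symmetric, these agree to second order in s.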
Example 3.1.1 If Algorithm 3.1.1 is applied to the matrix A with X_0 = I and tol = 10^-6, then after 6 iterations we obtain a diagonalization X^T A X = Lambda. Next consider the following matrix A_1, which we obtain by perturbing the elements of A:

    A_1 =  2.002  1.001  3.003
           1.001  1.001  2.002
           3.003  2.002  1.004

If we use Algorithm 3.1.1 to diagonalize A_1 with X_0 = X and tol = 10^-6, then after 2 iterations we obtain a diagonalization Y^T A_1 Y = Lambda_1, whereas if we apply the QR method for a symmetric matrix with single implicit shift to X^T A_1 X, then after 3 iterations we get a diagonalization S^T (X^T A_1 X) S = D.

Example 3.1.2 If Algorithm 3.1.1 is applied to the matrix B = Q diag(1, 2, 3, 4, 5, 6) Q^T, where Q is an orthogonal matrix, with X_0 = I and tol = 10^-6, then after 7 iterations we obtain a diagonalization X^T B X = diag(5, 6, 2, 1, 4, 3), whereas if we apply the QR method for a symmetric matrix with single implicit shift to B, then we get a diagonalization P^T B P = diag(6, 5, 1, 4, 2, 3).

Example 3.1.3 If Algorithm 3.1.1 is applied to the matrix C with X_0 = I and tol = 10^-6, then after 7 iterations we obtain a diagonalization X^T C X = Lambda. If we apply the QR method for a symmetric matrix with single implicit shift to C, then we get a diagonalization Q^T C Q = D. Next consider the following matrix C_1, which we obtain by perturbing the elements of C, with entries

    6.004  4.006  4.006  4.007  2.007  1.004  2.006  2.009  1.002  2.003  3.007  3.004