DIFFUSION TENSOR FIELD RESTORATION AND SEGMENTATION

By

ZHIZHOU WANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2004

Copyright 2004 by Zhizhou Wang

I dedicate this work to my family.

ACKNOWLEDGMENTS

I would like to first thank my advisor, Dr. Baba C. Vemuri, for everything he has done during my doctoral study. This dissertation would not have been possible without him. Dr. Vemuri introduced me to the field of medical imaging and has always helped and encouraged me. His insight and experience have guided me through my research, and he contributed numerous suggestions to this dissertation. I have been very lucky to work with him. I would also like to thank my excellent committee, Dr. Yunmei Chen, Dr. Anand Rangarajan, Dr. Murali Rao, Dr. Sartaj K. Sahni and Dr. Baba C. Vemuri, for providing valuable advice. In addition, special thanks go to Dr. Arunava Banerjee for attending my defense. My doctoral research has been a happy cooperation with many people. Dr. Vemuri has been involved with the whole process; Dr. Chen has contributed a great deal to the DTI restoration part; Dr. Thomas H. Mareci and Evren Ozarslan have been my resource on DTMRI physics and have kindly provided the data for DTI segmentation; Dr. Rachid Deriche has provided excellent comments on my work on DTI segmentation; and Tim McGraw has provided beautiful fiber visualizations to make my work more appealing. I would like to thank the editors in the Editorial Office for their careful and thorough corrections. I would also like to thank the staff and the faculty members who have been very patient and helpful during my study here. In particular, Dr. David Gu and Dr. Arunava Banerjee have attended several of my seminars and showed me quite a few interesting points. I have also benefited a lot from the courses taught by Dr. David Metzler and Dr. Anand Rangarajan.
Friends in my lab, my department and the University of Florida are much appreciated for sharing my joys and sadnesses. Special thanks go to Xiaobin Wu, Fangwen Chen, Fuwang Chen and Daying Zhang, who have helped me to pull through the busiest time in my life. I am also fortunate to enjoy the friendship of Jundong Liu, Ju Wang, Haibin Lu, Fei Wang, Jie Zhou, Tim McGraw, Eric Spellman, Bing Jian, Neeti Vohra, Andy Shiue LeJeng, Lingyun Gu, Hongyu Guo and their families. Though it is a distant memory, I have always cherished the mentorship of some of my advisors from years past: Anan Sha, Yining Hu, Shengtao Qin, Yuan Zhu and Yuqing Gu. They guided me through different stages of study and life; I would not have been able to achieve what I have today without them. Last, but not least, I would like to thank my parents, my sister and my wife Jing for their understanding and love during the past few years. Their support and encouragement are my source of strength. I am also thankful for my son, who has inspired me a lot with his little discoveries. This research was supported in part by NIH grant R01NS42075. I also received travel grants from the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2003 and from the Department of Computer and Information Science and Engineering, College of Engineering, University of Florida. Chapter 2 is copyrighted by IEEE and was published in Proceedings of the 2004 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, coauthored with Baba C. Vemuri, Evren Ozarslan, Yunmei Chen and Thomas H. Mareci. Chapter 3 is copyrighted by IEEE and will appear in IEEE Transactions on Medical Imaging, coauthored with Baba C. Vemuri, Yunmei Chen and Thomas H. Mareci. A preliminary version of Chapter 4 is copyrighted by IEEE and will appear in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, coauthored with Baba C. Vemuri.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
  1.1 ABC of Magnetic Resonance Imaging
    1.1.1 Nature of a Proton Spin with a Magnetic Field
    1.1.2 Net Magnetization and Its Detection
    1.1.3 Magnetic Resonance Imaging
  1.2 Background of DTMRI
    1.2.1 Diffusion Process
    1.2.2 Diffusion Weighted Images
    1.2.3 Application of DTMRI
  1.3 Literature Review
    1.3.1 Optimal b Values
    1.3.2 DTI Restoration
    1.3.3 DTI Segmentation
  1.4 Contributions
2 STATISTICAL ANALYSIS OF A NONLINEAR ESTIMATOR FOR ADC AND ITS APPLICATION TO OPTIMIZING b VALUES
  2.1 Introduction
  2.2 Theory
    2.2.1 A Nonlinear Estimator for ADC
    2.2.2 Variance of the Nonlinear Estimate for ADC
    2.2.3 Weighted COV of the Nonlinear Estimator for ADC
    2.2.4 Optimal Diffusion Weighting Factors
  2.3 Results
    2.3.1 Simulation Results of Different Estimators
    2.3.2 Variance of Different Estimates
    2.3.3 Weighted COV and Optimal Diffusion Weighting Factors
3 A CONSTRAINED VARIATIONAL PRINCIPLE FOR DIRECT ESTIMATION AND SMOOTHING OF THE DTI FROM COMPLEX DWIS
  3.1 Introduction
  3.2 Constrained Variational Principle Formulation
    3.2.1 The Complex Nonlinear Data Constraint
    3.2.2 LP Smoothness Norm
    3.2.3 The Positive Definite Constraint
    3.2.4 Existence of a Solution
  3.3 Numerical Methods
    3.3.1 Discretized Constrained Variational Principle
    3.3.2 Augmented Lagrangian Method
    3.3.3 Limited Memory Quasi-Newton Method
    3.3.4 Line Search
    3.3.5 Implementation Issues
  3.4 Experimental Results
    3.4.1 Complex Valued Synthetic Data
    3.4.2 Complex DWI of a Normal Rat Brain
4 DTI SEGMENTATION USING AN INFORMATION THEORETIC TENSOR DISSIMILARITY MEASURE
  4.1 Introduction
  4.2 A New Tensor Distance and Its Properties
    4.2.1 Definitions
    4.2.2 Affine Invariant Property
    4.2.3 Tensor Field Mean Value
  4.3 Active Contour Models for Image Segmentation
    4.3.1 Snake Energy Functional
    4.3.2 Curve Evolution
    4.3.3 Level Set Formulation
  4.4 Piecewise Constant DTI Segmentation Model
    4.4.1 Curve Evolution Equation
    4.4.2 Level Set Formulation
    4.4.3 Implementation and Numerical Methods
  4.5 Piecewise Smooth DTI Segmentation Model
    4.5.1 Discontinuity Preserving Smoothing
    4.5.2 Curve Evolution Equation and Level Set Formulation
    4.5.3 Implementation and Numerical Methods
  4.6 Experimental Results
    4.6.1 Synthetic Tensor Field Segmentation
    4.6.2 Single Slice DTI Segmentation
    4.6.3 DTI Segmentation in 3D
5 CONCLUSIONS
  5.1 Optimal b Values
  5.2 DTI Restoration
  5.3 DTI Segmentation
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1.1 MRI properties of different tissues.
2.1 Optimum b-value for the typical ADC range of human brains with multiple measurements.
2.2 Optimum diffusion weighting factors with multiple measurements for normal rat brains.
3.1 Comparison of the accuracy of the estimated DEVs using different methods for different noise levels.
LIST OF FIGURES

1.1 Proton spin and its interaction.
1.2 Equilibrium magnetization of proton spins.
1.3 Detection of net magnetization.
1.4 Ellipsoid visualization of diffusion tensors.
1.5 A slice of DWI taken from a normal rat brain.
1.6 A slice of DWI taken from a normal rat brain with various diffusion weighting factors.
1.7 Fiber tract maps computed from diffusion tensor imaging.
1.8 A general framework of DTMRI analysis.
2.1 Distribution of ADC values in the white matter and the gray matter of a normal rat brain.
2.2 Simulation results of different ADC estimates.
2.3 Variance of different ADC estimates when σ = 0.05.
2.4 Weighted COV of the nonlinear ADC estimate versus the diffusion weighting factors.
3.1 A slice of several volumetric complex DWIs.
3.2 A slice of the original (ground truth) and the estimated diffusion tensor fields.
3.3 A slice of the computed DEV field for the noisy synthetic data.
3.4 A slice of the normal rat brain DTI estimated using multivariate linear regression without smoothing, viewed channel by channel.
3.5 A slice of the normal rat brain DTI estimated using multivariate nonlinear regression without smoothing, viewed channel by channel.
3.6 A slice of the normal rat brain DTI estimated using our proposed method, viewed channel by channel.
3.7 A slice of the computed DEV field from a normal rat brain.
3.8 A slice of the normal rat brain DTIs in the region of the cerebellum viewed channel by channel.
4.1 Transformation of tensor fields.
4.2 Evolution of a curve and its corresponding higher dimension function.
4.3 Segmentation of a synthetic tensor field where two regions differ only in orientation.
4.4 Segmentation of a synthetic tensor field where two regions differ only in scale.
4.5 Segmentation of a synthetic tensor field with two roughly homogeneous regions that only differ in orientation.
4.6 Segmentation of a synthetic tensor field with two roughly homogeneous regions that only differ in scale.
4.7 A slice of the DTI of a normal rat spinal cord.
4.8 Segmentation of the slice of DTI shown in Fig. 4.7.
4.9 A slice of the DTI of a normal rat brain.
4.10 Segmentation of the corpus callosum from a slice of DTI shown in Fig. 4.9.
4.11 Segmentation of the corpus callosum from a slice of the DTI in Fig. 4.9 using the piecewise smooth model.
4.12 3D segmentation of the DTI of a normal rat spinal cord.
4.13 3D segmentation of the corpus callosum from the DTI of a normal rat brain.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DIFFUSION TENSOR FIELD RESTORATION AND SEGMENTATION

By Zhizhou Wang

August 2004

Chair: Baba C. Vemuri
Major Department: Computer and Information Science and Engineering

Diffusion Tensor Magnetic Resonance Imaging (DTMRI) is a relatively new MRI modality that can be used to study tissue microstructure, e.g., white matter connectivity in the brain in vivo. It has attracted vast attention during the past few years. In this dissertation, novel solutions to two important problems in DTMRI analysis, diffusion tensor field (also known as diffusion tensor image, DTI) restoration and segmentation, are presented. For DTI restoration, we first develop a model for estimating the accuracy of a nonlinear estimator used in estimating the apparent diffusivity coefficient (ADC).
We also use this model to design optimal diffusion weighting factors by accounting for the fact that the ground truth of the ADC is a distribution instead of a single value. The proposed method may be extended to study the statistical properties of DTI estimators and to design corresponding optimal acquisition parameters. We then present a novel constrained variational principle for simultaneous smoothing and estimation of the DTI from complex valued diffusion weighted images (DWIs). The constrained variational principle involves the minimization of a regularization term of LP smoothness norms, subject to a nonlinear inequality constraint on the data. The data term we employ is the original Stejskal-Tanner equation instead of the linearized version usually employed in the literature. The complex valued nonlinear form leads to a more accurate (when compared to the linearized version) estimate of the DTI. The inequality constraint requires that the nonlinear least squares data term be bounded from above by a known tolerance factor. Finally, in order to accommodate the positive definite constraint on the diffusion tensor, it is expressed in terms of Cholesky factors and estimated. The constrained variational principle is solved using the augmented Lagrangian technique in conjunction with the limited memory quasi-Newton method. For DTI segmentation, we present a novel definition of tensor dissimilarity grounded in concepts from information theory and incorporate it in the segmentation of DTI. The diffusion tensor is a symmetric positive definite (SPD) tensor at each point of a DTI and can be interpreted as the covariance matrix of a local Gaussian distribution. Thus, a natural measure of dissimilarity between diffusion tensors would be the Kullback-Leibler (KL) divergence or one of its relatives. In particular, we define a tensor dissimilarity as the square root of the J-divergence (symmetrized KL) between two Gaussian distributions corresponding to the tensors being compared.
Our definition leads to novel closed form expressions for the dissimilarity itself and for the mean value of a DTI. We then incorporate this new tensor dissimilarity in a region based active contour model for DTI segmentation, and the closed form expressions we derived greatly help the computation.

CHAPTER 1
INTRODUCTION

In their seminal work [1], Basser et al. introduced diffusion tensor magnetic resonance imaging (DTMRI) as a new MRI modality from which anisotropic water diffusion can be inferred quantitatively. Since then, DTMRI has become a powerful method to study tissue microstructure, e.g., white matter connectivity in the brain in vivo. DTMRI analysis consists of a variety of interesting problems: diffusion weighted image (DWI) acquisition parameter optimization, diffusion tensor field (also known as diffusion tensor image, DTI) restoration, DTI segmentation, DTI registration, DTI atlas construction, fiber tracking and visualization. Since this research area is still very young, most questions raised in the above problems have not been solved yet. In this chapter, the basics of MRI and DTMRI are reviewed, followed by a literature review of the state-of-the-art research in DTMRI analysis.

1.1 ABC of Magnetic Resonance Imaging

Magnetic resonance imaging (MRI) is a powerful noninvasive imaging modality used in the medical sciences mainly to "see" the soft tissues inside the human body. It has its beginnings in the early 1970s and is based on the physics of nuclear magnetic resonance, which was thoroughly studied in the first half of the 20th century. Here only the most fundamental concepts and ideas in MRI are briefly presented; readers are referred to Haacke et al. [2] for an excellent and complete coverage of this topic.

1.1.1 Nature of a Proton Spin with a Magnetic Field

1H or proton MR is the most common technique for imaging biological systems, due to the high concentration of protons in the human body and their high sensitivity. A classical view of a proton is a tiny spinning positive charge.
Figure 1.1: Proton spin and its interaction. (a) Proton spin. (b) Interaction of a proton spin with an external magnetic field.

The high angular rotation of a proton causes a circular current loop I, which in turn brings a magnetic moment μ (see Fig. 1.1(a)). As shown in Fig. 1.1(b), a spinning proton immersed in an external magnetic field B0 will have a precessional motion around the field direction if μ is not aligned with B0. The frequency of this precession is called the Larmor frequency and is given by

    ω0 = γ B0    (1.1)

where the constant γ is called the gyromagnetic ratio.

1.1.2 Net Magnetization and Its Detection

Consider an object that contains numerous protons and place it in a large static magnetic field. Eventually the precessing protons will be aligned with the external magnetic field; however, the alignment of any given proton might be parallel or antiparallel due to thermal agitation at the absolute temperature T. Still, the excess of parallel alignments will bring a net equilibrium magnetization M0 that is proportional to the spin density ρ0 everywhere (Fig. 1.2).

Figure 1.2: Equilibrium magnetization of proton spins. The black colored ones are parallel while the gray colored ones are antiparallel to the external magnetic field.

Even in a large body, the nonvanishing net magnetization is not significant enough to be detected when it is aligned with the applied magnetic field. However, as shown in Fig. 1.3, the magnetization vector can be tipped away by a nearby radio frequency (rf) magnetic pulse; it then precesses around the static magnetic field at the frequency given by equation (1.1). It is now possible to detect this precessing magnetization using a closely placed coil in which an electronic signal is induced and measured. There are three tissue properties that have an important impact on the strength of the signal.
One of them is the spin density; the other two are the longitudinal relaxation time T1 and the transversal relaxation time T2. Both T1 and T2 are tissue-specific time constants for protons and measure the rate of change in net magnetization. While T1 determines the time taken for spinning protons to realign with the external magnetic field, T2 measures the decay of magnetization perpendicular to the main magnetic field.

Figure 1.3: Detection of net magnetization. (a) The net magnetization is tipped away by an rf pulse. (b) The precessing net magnetization is detected by a nearby coil.

1.1.3 Magnetic Resonance Imaging

Imaging aims to relate the signal to the spatial localization of various factors. This is achieved by using a spatially variant magnetic field, which leads to a spatially changing distribution of resonance frequencies ω(x) = γ(B0 + G · x), where G is the gradient of the magnetic field. Now the measured signal contains a spectrum of received signals:

    S(t) = ∫ M(x) e^{-i ω(x) t} dx    (1.2)

where M(x) is the magnetization, which is a combination of the spin density, longitudinal relaxation, transversal relaxation and other factors like diffusion. M(x) can be computed by inverting the received signals using the Fourier transformation:

    M(x) = ∫ S(k) e^{i k · x} dk    (1.3)

where k = γGt is called the reciprocal space vector, introduced to make the Fourier transformation obvious. In practice, a particular slice in a 3D volume is selected using a magnetic field with changing magnitude along a certain direction. Different kinds of contrast images can be derived from the effective spin density. Table 1.1 shows the imaging properties of various normal and abnormal tissues in the brain [3], with the abbreviation interm. referring to an intermediate brightness value between dark and bright.

Table 1.1: MRI properties of different tissues.

         white matter   grey matter   water    bone   air    tumor    infarct
    T1   bright         grey          dark     dark   dark   dark     dark
    T2   bright         interm.       bright   dark   dark   bright   bright

Figure 1.4: Ellipsoid visualization of diffusion tensors. (a) Anisotropic diffusion tensor. (b) Isotropic diffusion tensor.
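The Fourier relationship between the acquired signal and the magnetization in equations (1.2) and (1.3) can be illustrated with a small numerical sketch: the image is recovered from simulated k-space data by an inverse transform. This is a 1D toy example with discrete FFTs standing in for the continuous integrals; the Gaussian "magnetization" profile is an arbitrary illustrative choice.

```python
import numpy as np

# Equations (1.2)-(1.3): the acquired signal is (up to constants) the Fourier
# transform of the magnetization M(x), so M(x) is recovered by an inverse
# Fourier transform of the k-space samples S(k).

x = np.linspace(-1.0, 1.0, 256)
m_true = np.exp(-x**2 / 0.05)            # a smooth "magnetization" profile M(x)

s_k = np.fft.fft(m_true)                 # simulated k-space signal S(k)
m_rec = np.fft.ifft(s_k).real            # reconstruction per equation (1.3)

print(np.max(np.abs(m_rec - m_true)))    # tiny: recovery up to floating point
```

The round trip is exact up to floating-point error; real MRI reconstruction differs mainly in being 2D/3D and in how k-space is sampled.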
1.2 Background of DTMRI

1.2.1 Diffusion Process

Diffusion is a process of intermingling molecules as a result of random thermal agitation and, in our context, refers specifically to the random translational motion of water molecules in the part of the anatomy being imaged with MR. In three dimensional space, diffusivity can be described by a 3 x 3 matrix D, called the diffusion tensor, that is intimately related to the geometry and organization of the microscopic environment.

The diffusion tensor D can be easily visualized as ellipsoids, as shown in Fig. 1.4, where the axes of each ellipsoid correspond to the eigenvector directions of the diffusion tensor and are scaled by the eigenvalues. The color of the ellipsoid, ranging from blue to red, shows the lowest to highest degree of anisotropy. For example, diffusion tensors in a free water region are shown as large round blue ellipsoids, which are just blue spheres.

1.2.2 Diffusion Weighted Images

The diffusion weighted image S_l and the diffusion tensor D are related through the Stejskal-Tanner equation as given by [1]

    S_l = S_0 e^{-b_l : D} = S_0 e^{-Σ_{i=1}^{3} Σ_{j=1}^{3} b_{l,ij} D_{ij}}    (1.4)

where b_l is the diffusion weighting matrix of the l-th magnetic gradient and ":" denotes the generalized inner product for matrices. Equation (1.4) is a compact form suitable for mathematical analysis; some authors prefer to use Le Bihan's b-factor and the diffusion sensitizing gradient direction g = [g_x, g_y, g_z]^T to reflect the imaging physics, as follows:

    S_l = S_0 e^{-b g^T D g}

It is not hard to verify that

    b_{l,11} = b g_x^2,    b_{l,12} = b g_x g_y,    b_{l,13} = b g_x g_z,
    b_{l,21} = b g_y g_x,  b_{l,22} = b g_y^2,      b_{l,23} = b g_y g_z,
    b_{l,31} = b g_z g_x,  b_{l,32} = b g_z g_y,    b_{l,33} = b g_z^2.

The popular phase-encoding method for acquiring DTMRI images yields complex measurements; thus, S_l and S_0 will be complex variables, and equation (1.4) still holds in such cases. In the following text, we assume S_l = R_l + i I_l and S_0 = R_0 + i I_0 are complex variables, where R_l = real(S_l), I_l = imaginary(S_l), R_0 = real(S_0) and I_0 = imaginary(S_0).
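The b-matrix identities above amount to b_l = b g g^T, the outer product of the unit gradient direction with itself, which makes equation (1.4) easy to transcribe directly. The sketch below does exactly that; the tensor values are illustrative numbers, not measured data.

```python
import numpy as np

# Equation (1.4) with b_l = b * g g^T for a unit gradient direction g, and
# S_l = S_0 * exp(-b_l : D), where ":" is the generalized (elementwise sum of
# products) inner product for matrices.

def b_matrix(b_value, g):
    g = np.asarray(g, dtype=float)
    g = g / np.linalg.norm(g)              # gradient direction as a unit vector
    return b_value * np.outer(g, g)        # b_{l,ij} = b * g_i * g_j

def stejskal_tanner(s0, b_value, g, D):
    return s0 * np.exp(-np.sum(b_matrix(b_value, g) * D))   # b_l : D

D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])      # illustrative anisotropic tensor, mm^2/s
s = stejskal_tanner(1.0, 1000.0, [1.0, 0.0, 0.0], D)
print(s)   # exp(-1000 * 1.7e-3) ~= 0.183
```

With the gradient along x, only D_xx attenuates the signal, which is why measurements along several noncollinear directions are needed to recover the full tensor.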
Taking the magnitude on both sides of equation (1.4) yields

    |S_l| = |S_0| e^{-b_l : D} = |S_0| e^{-Σ_{i=1}^{3} Σ_{j=1}^{3} b_{l,ij} D_{ij}}    (1.5)

Taking the logarithm on both sides of equation (1.5) leads to the following linearized Stejskal-Tanner equation:

    log(|S_l|) = log(|S_0|) - b_l : D = log(|S_0|) - Σ_{i=1}^{3} Σ_{j=1}^{3} b_{l,ij} D_{ij}    (1.6)

Figure 1.5 shows a slice of the magnitude, real part and imaginary part of a DWI. Note that in all of the published literature to date, what has been considered is only the magnitude of the complex measurements, and in particular the linearized equation (1.6).

Figure 1.5: A slice of DWI taken from a normal rat brain. (a) Magnitude. (b) Real part. (c) Imaginary part. Negative values in the real part and the imaginary part are shown in blue.

1.2.3 Application of DTMRI

Given several noncollinear diffusion weighted intensity measurements, D can be estimated via multivariate regression models from equation (1.4) or its variants, such as the linearized version in equation (1.6). To improve the accuracy, different b values are also employed to acquire the raw DWIs. Figure 1.6 shows the complex valued DWIs in different directions and with different b values; the changes in the DWIs with different b values are clearly evident. Though the difference in the DWIs is not as obvious with changing directions, it is still visible inside the corpus callosum, where the diffusion tensor is anisotropic, since the value of a DWI at a pixel changes with direction only when the diffusion tensor at that location is anisotropic. A general principle is that water diffuses preferentially along ordered tissues, e.g., the brain white matter; thus, diffusion anisotropy can be computed to show microstructural and physiological features of tissues [4]. Especially in highly organized nerve tissue, like white matter, the diffusion tensor provides a complete characterization of the restricted motion of water through the tissue, which can be used to infer fiber tracts. Another possible application of the diffusion tensor field is segmentation of anisotropic regions.
Figure 1.6: A slice of DWI taken from a normal rat brain with various diffusion weighting factors. Left to right: fixed direction, varying b values. Top to bottom: fixed b value, changing directions.

DTMRI has recently become a powerful method to study tissue microstructure, e.g., white matter connectivity in the brain in vivo. To this end, Fig. 1.7 is an example depicting a fiber tract map which is traced using the diffusion tensor image inside the corpus callosum of a human brain and is visualized as colored streamtubes [5].

Figure 1.7: Fiber tract maps computed from diffusion tensor imaging visualized as colored streamtubes.

A general framework of DTMRI analysis is shown in Fig. 1.8. The first step of DTMRI analysis is the optimal acquisition of the diffusion weighted images and is of particular interest to imaging physicists as well as analysts. The second step is the restoration of the DTI and may involve both estimation and smoothing. Given the restored DTI, a large amount of published work in the literature has focused on fiber tracking and visualization. However, with the rich information contained in the DTI, it is also beneficial to segment structures which might not be easy to obtain from standard MRI images. This segmentation can also assist in focusing on a smaller region of interest (ROI) for fiber tracking, a useful task in many clinical applications. Finally, it is crucial to validate the tracked fiber pathways with higher resolution images like fluoro images of histology [6].

1.3 Literature Review

1.3.1 Optimal b Values

In DTMRI, the most important control parameters in data acquisition are the diffusion weighting factors (aka b-values), as shown in equation (1.4). By choosing these b-values carefully, one can obtain high quality diffusion weighted images without additional cost. Da Xing et al.
[7] proposed a scheme to find optimal b-values for measurement of the apparent diffusion coefficient (ADC) based on the use of two diffusion weighted images acquired with b1 and b2. Armitage and Bastin [8] extended Xing's result to DTMRI. However, in [7] and [8], the reported methods use the linear equation (1.6). Though not aiming at optimizing b values, Pierpaoli and Basser [9], Basser and Pajevic [10], Anderson [11] and others have studied the effects of noise on DTI estimation.

Figure 1.8: A general framework of DTMRI analysis: from raw data to fiber tract map computation, visualization and segmentation.

Earlier works [9, 10] are based on numerical simulations, while the latest work [11] aims to provide a theoretical analysis. Anderson [11] uses a power series expansion of equation (1.5) to first order to get a noise model for the linearized equation and then evaluates the perturbation of the estimated eigenvalues and eigenvectors. Just recently, Brihuega-Moreno et al. [12] aimed to optimize diffusion measurements by minimizing the Cramér-Rao lower bound (CRLB), which is a function of the diffusion weighting factors. Their approach is independent of the estimation method; however, this poses a practical difficulty in choosing an estimator which can achieve the Cramér-Rao lower bound.

1.3.2 DTI Restoration
The raw data smoothing or denoising can be formulated using variational principles which in turn require solution to PDEs or at times directly using PDEs which are not necessarily arrived at from variational principles [15, 16, 17, 18, 19, 20]. Another approach to restore the DTI is to smooth the dominant eigenvector (also refereed as principal diffusion direction) after the diffusion tensor has been estimated from the raw noisy measurements SS. In Poupon et al. [21], an energy function based on a Markovian model was used to regularize the noisy dominant eigenvector field computed directly from the noisy estimates of D obtained from the measurements IS using the linearized St. iL.ilTanner equation (1.6). Coulon et al. [22] proposed an iterative restoration scheme for principal diffusion direction based on direction map restoration work reported in Tang et al. [23]. Other sophisticated vector field restoration methods [24, 25, 26] can potentially be applied to the problem of restoring the dominant eigenvector fields computed from the noisy estimates of D. Weickert [27] used a nonlinear diffusion technique to restore a given tensor field. Their scheme was a generalization of the nonlinear anisotropic regularization of vector valued image to the matrixvalued image case. The regularization term there involved the trace of a penalized measure applied to the sum of the componentwise structure tensors of the given matrix. Recently, Chefd'Hotel et al. [28, 29] presented an elegant geometric solution to the problem of smoothing a noisy D that was computed from IS using the loglinearized model (1.6) described above. They assume that the given (computed) tensor field D from IS is positive definite and develop a clever approach based on differential geometry of manifolds to achieve constrained smoothing where the smoothed tensor field is constrained to be positive semidefinite. Interesting results of mapped fibers are shown for human brain MRI. 
The idea of simultaneous estimation and smoothing of the diffusion tensors from DWIs was pioneered by our earlier work [30] and improved upon in [31]. This improvement involved methods to overcome the problem of the manual choice of regularization control parameters. In both these works [30, 31], the estimated smooth tensor field was guaranteed to be positive semidefinite. Moreover, these works were the first report of the use of the nonlinear Stejskal-Tanner equation in the estimation of the diffusion tensors. Recently, in Tschumperlé and Deriche [32], a robust version of the linearized Stejskal-Tanner equation is used as the data term, along with a robust regularization term, in a unified variational principle to estimate a smooth D from the noisy signal measurements. Note that the data term uses a linearized version of the Stejskal-Tanner equation, as in earlier works [22, 28, 21, 14].

1.3.3 DTI Segmentation

One key factor in tensor field analysis is a proper choice of tensor distance that measures the similarity or dissimilarity between tensors; this choice is particularly important in DTI segmentation and registration. In general, any kind of matrix norm can be used to induce a tensor distance. One such example is the tensor Euclidean distance obtained by using the Frobenius norm. Due to its simplicity, the tensor Euclidean distance has been used extensively in tensor field restoration [28, 27]. Alexander et al. [33] considered a number of tensor similarity measures for matching diffusion tensor images and concluded empirically that the Euclidean difference measure yields the best results. Though not many sophisticated tensor distances have been proposed in tensor field analysis, there are quite a few in the machine learning literature. Miller and Chefd'hotel [34] proposed an interesting measure on transformation groups to design an invariant kernel for nonparametric density estimation. What is most closely related to our present work was proposed by Tsuda et al. [35].
They introduced information geometry in the space of positive definite matrices to derive a Kullback-Leibler divergence for two matrices and then used it in an "em" algorithm (not the well known EM, expectation maximization) to approximate an incomplete kernel. In the context of DTMRI segmentation, Zhukov et al. [36] recently proposed a level set segmentation method which is in fact a segmentation of a scalar anisotropic measure of the diffusion tensor. The fact that a scalar field computed from the diffusion tensor field is used implies that they have ignored the direction information contained in the tensor field. Thus, this method will fail if two homogeneous regions of a tensor field have the same anisotropy property but are oriented in totally different directions! To the best of our knowledge, before the acceptance of our work in [37], the only published work which aims to segment matrix-valued images was reported in Feddern et al. [38]. In their work, Feddern et al. extended the concept of a structure tensor to a tensor field and then used it to extend geodesic active contours to tensor fields. The stopping criterion in this case is chosen as a decreasing function of the trace of the sum of the structure tensors formed from the individual elements of the given tensor field. This amounts to taking the Frobenius norm of the tensor field formed by the gradient magnitude of the individual channels of the given tensor field. This scheme is a gradient-based active contour (snake) whose performance is lacking in the absence of significant gradients in the measurements. Moreover, the norm used there is not invariant to affine transformations of the input coordinates on which the original tensor field is defined. Affine invariance is a very desirable property for a segmentation scheme when dealing with data sets obtained from different hardware or from subjects exhibiting anatomical changes due to the development of pathology, etc., in medical imaging applications.
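The affine invariance contrast can be checked numerically. Treating SPD tensors T1, T2 as covariances of zero-mean Gaussians, the J-divergence (symmetrized KL) has a closed form in the traces of T1^{-1}T2 and T2^{-1}T1; the particular scaling d = 0.5 * sqrt(tr(T1^{-1}T2 + T2^{-1}T1) - 2n) used below is an assumed convention for the square-root-of-J-divergence dissimilarity described in the abstract. Unlike the Frobenius distance, it is unchanged under a coordinate transform T -> A T A^T:

```python
import numpy as np

def frobenius_dist(T1, T2):
    return np.linalg.norm(T1 - T2, 'fro')

def kl_tensor_dist(T1, T2):
    # Square root of the symmetrized KL divergence between zero-mean Gaussians
    # with covariances T1 and T2 (scaling convention assumed, see lead-in).
    n = T1.shape[0]
    t = np.trace(np.linalg.inv(T1) @ T2) + np.trace(np.linalg.inv(T2) @ T1)
    return 0.5 * np.sqrt(t - 2 * n)

T1 = np.diag([2.0, 1.0, 1.0])
T2 = np.diag([1.0, 2.0, 1.0])
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.2, 0.3],
              [0.0, 0.0, 0.9]])            # an arbitrary invertible transform
T1a, T2a = A @ T1 @ A.T, A @ T2 @ A.T

print(kl_tensor_dist(T1, T2), kl_tensor_dist(T1a, T2a))   # equal
print(frobenius_dist(T1, T2), frobenius_dist(T1a, T2a))   # not equal
```

The invariance is immediate from tr((A T1 A^T)^{-1} A T2 A^T) = tr(T1^{-1} T2), which is the property exploited in Chapter 4.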
In [37], we applied a region-based model for tensor field segmentation by incorporating a tensor distance based on the matrix Frobenius norm. Simultaneously, Rousson et al. [39] extended the classical surface evolution segmentation model by incorporating region statistics defined on the tensor fields for DTMRI segmentation. In both works [37, 39], the diffusion tensor is treated as a simple matrix and every component is treated equally. However, the diffusion tensor is actually the covariance matrix of a local diffusion process, and different components carry different importance/weight. In [40], we were the first to use this fact in tensor field segmentation. In particular, we proposed a novel tensor "distance" based on information theory and incorporated it in region-based models for tensor field segmentation. Very recently, the concepts presented therein have been extended by Lenglet et al. [41] to the case of regions with piecewise constant non-unit variances. They also use the Fisher information metric on the manifold of local Gaussians to achieve segmentation of the diffusion tensor field. 1.4 Contributions * Statistical Analysis of a Nonlinear Estimator for ADC and Its Application to Optimizing b Values. We develop a model for estimating the accuracy of a nonlinear estimator used in computing the apparent diffusion coefficient (ADC), which provides useful information about the structure of the tissue being imaged with diffusion weighted MR. Further, we study the statistical properties of the nonlinear estimator and use them to design optimal diffusion weighting factors. Specifically, we show that a weighted linear estimator can approximate the nonlinear estimator well and thus can be used to analyze the statistical properties of the nonlinear estimator.
Furthermore, to account for the fact that the ground truth of the ADC is a distribution instead of a single value, a weighted coefficient of variation (COV) is proposed as the criterion to be minimized for the determination of the optimal diffusion weighting factors. Our model has the potential to be extended to analyze the statistical properties of a nonlinear estimator for the diffusion tensor and thus obtain the optimal b values for DTI acquisition. * A Constrained Variational Principle for DTI Restoration. We present a novel constrained variational principle for simultaneous smoothing and estimation of the diffusion tensor field from complex valued diffusion weighted images (DWI). The constrained variational principle involves the minimization of a regularization term using an LP smoothness norm, subject to a nonlinear inequality constraint on the data. The data term we employ is the original Stejskal-Tanner equation instead of the linearized version usually employed in the literature. We empirically show that the complex valued nonlinear form leads to a more accurate (when compared to the linearized version) estimate of the DTI. The inequality constraint requires that the nonlinear least squares data term be bounded from above by a known tolerance factor which can be computed from the data. Finally, in order to accommodate the positive definite constraint on the diffusion tensor, the tensor is expressed in terms of its Cholesky factors, which are then estimated. The constrained variational principle is solved efficiently by using the augmented Lagrangian technique in conjunction with the limited memory quasi-Newton method. * DTI Segmentation Using an Information Theoretic Tensor Dissimilarity Measure. We aim to segment the DTI using all the information contained in the tensors defining the field: not only the scalar anisotropic properties, but also the orientation information contained therein.
The key step is a novel definition of a matrix distance which can be used to quantitatively measure the heterogeneity of the diffusion tensor field. We present a novel definition of a tensor "distance" grounded in concepts from information theory and incorporate it in the segmentation of tensor-valued images. In DTI, the symmetric positive definite (SPD) diffusion tensor at each point can be interpreted as the covariance matrix of a local Gaussian distribution. Thus, a natural measure of dissimilarity between diffusion tensors is the Kullback-Leibler (KL) divergence or one of its relatives. In particular, we define a tensor "distance" as the square root of the J-divergence (symmetrized KL) between the two Gaussian distributions corresponding to the tensors being compared. Our definition leads to novel closed form expressions for the "distance" itself and for the mean value of a DTI. Unlike the traditional Frobenius norm based tensor distance, our "distance" is affine invariant, a desirable property in many applications. We then incorporate this new tensor "distance" in a region-based active contour model for DTI segmentation, and the closed form expressions we derived greatly simplify the computation. To our knowledge, this is the first use of a region-based active contour model for diffusion tensor field segmentation. CHAPTER 2 STATISTICAL ANALYSIS OF A NONLINEAR ESTIMATOR FOR ADC AND ITS APPLICATION TO OPTIMIZING b VALUES 2.1 Introduction Prior to the invention of DTMRI, the apparent diffusion coefficient (ADC) was used intensively as a contrast mechanism in clinical MRI. The principle underlying both ADC and DTMRI is the sensitivity of the magnetic resonance (MR) signal to the random motion of water molecules. In the case of ADC, the diffusion weighted image (DWI) is also given by the Stejskal-Tanner equation:

S = S_0 e^{-bD} (2.1)

where D is the ADC and S_0 is the non-DW image.
DWIs are often corrupted by noise; the noise model for complex valued data is [42]

S = S_0 e^{-bD} + n_r + i n_i (2.2)

where n_r and n_i are uncorrelated Gaussian noise with zero mean and standard deviation σ. Given a set of DWIs acquired with different b-values, the ADC can be estimated via linear or nonlinear regression. Linear estimators of the ADC and the diffusion tensor have been well studied and also used in designing optimal b values. However, analysis of the nonlinear estimation has not been considered before. In this chapter, a statistical analysis of a nonlinear estimator for the ADC and its application to optimizing b values are carried out. The method used here also sheds some light on how to analyze the nonlinear estimator for the diffusion tensor and design a corresponding optimal acquisition scheme. 2.2 Theory 2.2.1 A Nonlinear Estimator for ADC Let S_0 = R_0 + i I_0 be the echo intensity without magnetic gradient, D be the ADC, and S_k = R_k + i I_k be the measurement for the kth magnetic gradient, k = 1, ..., K. Since S_k = S_0 e^{-b_k D} + n_r + i n_i as in equation (2.2), the sum of squared errors (SSE) that measures the goodness of the estimation is

E(S_0, D) = Σ_k [(R_k − R_0 e^{-b_k D})² + (I_k − I_0 e^{-b_k D})²] (2.3)

Then the nonlinear estimate (S_0^*, D^*) is the solution of the nonlinear optimization problem

(S_0^*, D^*) = argmin_{S_0, D} E(S_0, D) (2.4)

Though there is no analytical form for the nonlinear estimate (S_0^*, D^*), it can be computed efficiently by numerical methods. The Levenberg-Marquardt (LM) method [43] is ideal for nonlinear least squares optimization problems like equation (2.4).
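As an illustration, the nonlinear estimate (2.4) can be computed with an off-the-shelf Levenberg-Marquardt solver; the following is a minimal sketch using SciPy's least_squares (function and variable names are ours, not the dissertation's implementation):

```python
import numpy as np
from scipy.optimize import least_squares

def fit_adc_nonlinear(b, S, x0):
    """Nonlinear estimate (S0*, D*) of (2.4): minimize the SSE (2.3)
    over (R0, I0, D) with a Levenberg-Marquardt solver.

    b  : K diffusion weighting factors (s/mm^2)
    S  : K complex measurements S_k = R_k + i I_k
    x0 : initial guess (R0, I0, D)
    """
    def residuals(x):
        R0, I0, D = x
        model = (R0 + 1j * I0) * np.exp(-b * D)
        # stack real and imaginary residuals, i.e. the vector F(S0, D)
        return np.concatenate([S.real - model.real, S.imag - model.imag])

    sol = least_squares(residuals, x0, method='lm')  # 'lm' = Levenberg-Marquardt
    R0, I0, D = sol.x
    return R0 + 1j * I0, D

# sanity check on noise-free synthetic data: the truth is recovered
b = np.array([100.0, 500.0, 1000.0])
S = (10 + 10j) * np.exp(-b * 1e-3)
S0_est, D_est = fit_adc_nonlinear(b, S, x0=(9.0, 9.0, 5e-4))
```

With noisy data the same call returns the minimizer of (2.3); the initial guess may be taken from a log-linear fit in practice.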
Let

F(S_0, D) = [R_1 − R_0 e^{−b_1 D}, ..., R_K − R_0 e^{−b_K D}, I_1 − I_0 e^{−b_1 D}, ..., I_K − I_0 e^{−b_K D}]^T

Then E(S_0, D) = F(S_0, D)^T F(S_0, D). The Jacobian matrix J(S_0, D) of F(S_0, D) with respect to (R_0, I_0, D) has rows

[−e^{−b_k D}, 0, b_k R_0 e^{−b_k D}], k = 1, ..., K, for the real components, and
[0, −e^{−b_k D}, b_k I_0 e^{−b_k D}], k = 1, ..., K, for the imaginary components.

Then the gradient G(S_0, D) of E(S_0, D) is given by G(S_0, D) = 2 J(S_0, D)^T F(S_0, D). If we denote the Hessian matrix of the kth component of F(S_0, D) by H_k(S_0, D), then the Hessian matrix H(S_0, D) of E(S_0, D) is

H(S_0, D) = 2 J(S_0, D)^T J(S_0, D) + 2 Σ_k F_k(S_0, D) H_k(S_0, D)

where the sum runs over the components of F. The essence of the LM method is to compute a search direction d_n at iteration n according to

(J_n^T J_n + λ_n I) d_n = −J_n^T F_n (2.5)

where λ_n ≥ 0 is a control parameter. When λ_n is zero, the search direction is that of the Gauss-Newton method. As λ_n goes to infinity, the search direction approaches the steepest descent direction. With proper choices of λ_n, the LM method can behave like the gradient descent method when the current solution is far from the final solution, and it can approximate the Gauss-Newton method when the current solution is close to the final solution. 2.2.2 Variance of the Nonlinear Estimate for ADC The nonlinear estimate given in (2.4) can only be computed numerically; thus it is impossible to compute its statistical properties analytically. We propose to use a weighted linear estimator to approximate the nonlinear estimator. Since the weighted linear estimator has a closed form solution, we can analytically approximate the statistical behavior of the nonlinear estimator. The sum of weighted squared errors of the linearized form of the model in (2.1) is

E_w(|S_0|, D) = Σ_k w_k² (log(|S_k|) − log(|S_0|) + b_k D)² (2.6)

where w_k = |S_0| e^{−b_k D_g}, k = 1, ..., K, are the weights and D_g is the ground truth ADC.
The corresponding weighted linear estimate is given by

(|S_0|^*, D^*) = argmin_{|S_0|, D} E_w(|S_0|, D) (2.7)

The solution (|S_0|^*, D^*) can be computed easily as

[ln(|S_0|^*), −D^*]^T = A^{−1} [Σ_k w_k² ln(|S_k|), Σ_k w_k² b_k ln(|S_k|)]^T (2.8)

where

A = [ Σ_k w_k²      Σ_k w_k² b_k  ]
    [ Σ_k w_k² b_k  Σ_k w_k² b_k² ]

It is pointed out in [42] that |S_k| ≈ |S_0| e^{−b_k D} + n when SNR > 3; then, by using a Taylor expansion on ln(|S_k|), we have

ln(|S_k|) ≈ ln(|S_0| e^{−b_k D} + n) ≈ ln(|S_0| e^{−b_k D}) + n / (|S_0| e^{−b_k D}) (2.9)

thus Var(ln(|S_k|)) ≈ σ²/w_k², and we have

Var(D^*) = σ² Σ_k w_k² / (Σ_k w_k² · Σ_k w_k² b_k² − (Σ_k w_k² b_k)²) (2.10)

It is easy to prove that equation (2.10) is the same as the CRLB given in [12] for two b-values, and we found empirically that this holds even for more than two b-values. However, whether equation (2.10) and the CRLB given in [12] are actually the same in general remains to be explored. Note that this weighted linear estimator is NOT a realistic estimator, since the weights are related to the ground truth; however, it serves well as a tool to analyze the nonlinear estimator, as validated by the simulation experiments later. 2.2.3 Weighted COV of the Nonlinear Estimator for ADC Equation (2.10) gives the variance of the weighted linear estimator when the ground truth is a single ADC value. However, the ground truth D_g is usually a distribution instead of a single value, as shown in Fig. 2.1. Thus we propose to use the following weighted COV of D^*:

COV_w(D^*) = ∫ COV(D^*|D_g) p(D_g) dD_g (2.11)

where COV(D^*|D_g) = √Var(D^*|D_g) / D_g and p(D_g) is the probability density function of the ground truth.

Figure 2.1: Distribution of ADC values in the white matter and the gray matter of a normal rat brain.

2.2.4 Optimal Diffusion Weighting Factors The weighted COV of D^* in equation (2.11) depends on the ground truth distribution, the total number of measurements, and the choice of diffusion weighting factors. The ground truth distribution cannot be controlled.
Though increasing the total number of measurements will decrease the weighted COV, it will also increase the acquisition time, which is not desirable. When the total number of measurements is fixed, what one can do is to choose the diffusion weighting factors carefully to obtain a minimum weighted COV. 2.3 Results 2.3.1 Simulation Results of Different Estimators The simulation procedure that we developed consists of the following steps:
1. Choose a ground truth D_g and S_0.
2. Choose a set of diffusion weighting factors b_1, ..., b_N.
3. Generate N complex measurements using (2.2) with the selected set of b-values.
4. Estimate D^* using the nonlinear estimator in (2.4), the weighted linear estimator in (2.7) and the linear estimator in Xing et al. [7], respectively.
The last two steps are executed for several trials. The following settings were used in our simulation: D_g = 0.001 mm²/s, S_0 = 10 + 10i, b_1 = 100 s/mm², b_2 = 500 s/mm², b_3 = 1000 s/mm², and σ = 0.5 or σ = 1.0. The results of the simulation are shown in Fig. 2.2. It is evident that the nonlinear estimator yields the best results; the weighted linear least squares estimator has outcomes similar to the nonlinear estimator, while the linear least squares estimator does not perform as well as the other two. This indicates that the nonlinear estimator should be used for better accuracy. 2.3.2 Variance of Different Estimates The variance of the nonlinear estimate can only be estimated by simulation. This simulation is similar to the one described earlier; however, the total number of trials is set to 2000 for a stable and accurate estimation of the variance. The variance of the linear (LSQR) estimate is given in Bito et al. [44], and the variance of the weighted linear estimate (weighted LSQR) can be computed from equation (2.10). The settings used here are: D_g = 0.001 mm²/s, S_0 = 10 + 10i, b_1 = 0, b_3 = 2000 s/mm², b_2 varying from 0 to 3000 s/mm², and σ = 0.05. The results of the computation are shown in Fig. 2.3.
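The simulation steps above can be sketched as a Monte Carlo loop (our own illustration with a reduced trial count; function and variable names are not from the dissertation):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

def simulate_var_nonlinear(b, S0, D, sigma, n_trials=300):
    """Monte Carlo variance of the nonlinear ADC estimate: repeat steps
    3-4 of the simulation procedure with the noise model (2.2)."""
    b = np.asarray(b, float)
    estimates = np.empty(n_trials)
    for t in range(n_trials):
        noise = sigma * (rng.standard_normal(b.size)
                         + 1j * rng.standard_normal(b.size))
        S = S0 * np.exp(-b * D) + noise          # noisy complex DWIs, eq. (2.2)
        def resid(x):
            model = (x[0] + 1j * x[1]) * np.exp(-b * x[2])
            return np.concatenate([(S - model).real, (S - model).imag])
        # nonlinear estimate (2.4) via Levenberg-Marquardt
        estimates[t] = least_squares(resid, (S0.real, S0.imag, D),
                                     method='lm').x[2]
    return estimates.var()

var_hat = simulate_var_nonlinear([0.0, 1000.0, 2000.0], 10 + 10j, 1e-3, 0.05)
```

The returned sample variance of D^* can then be compared against the analytic approximation (2.10), as done for Fig. 2.3.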
It is clear that the variance of the weighted linear estimate is a good approximation of that of the nonlinear estimate. Also, we see that in most cases the nonlinear estimate yields a smaller variance than the linear estimate; however, they happen to have the same variance when b_2 = b_1 or b_2 = b_3.

Figure 2.2: Simulation results of different ADC estimates. Top: simulation results when σ = 0.5. Bottom: simulation results when σ = 1.0.

Figure 2.3: Variance of different ADC estimates when σ = 0.05.

2.3.3 Weighted COV and Optimal Diffusion Weighting Factors It is easy to compute the weighted COV in equation (2.11) by numerical integration, given the ground truth ADC distribution and the diffusion weighting factors, and it is also possible to find the optimal diffusion weighting factors numerically. Currently we use the medium-scale sequential quadratic programming (SQP) algorithm in Matlab as the numerical optimization procedure to achieve this. For our first experiment, we use the typical ADC range found in the human brain as given in Brihuega-Moreno et al. [12] and assume the distribution to be uniform. In this case, we can use the dimensionless β value, defined as β = min(D_g) · b. The optimal β-values for up to five measurements are summarized in Table 2.1. Notice that our results are quite different from those of [12]; these differences are caused by the different precision of the numerical integration techniques used in the step that computes the weighted COV. Our integration has higher accuracy than the one Table 2.1: Optimum β-value for the typical ADC range of human brains with multiple measurements.
m is the total number of measurements; the number of measurements at a particular β-value is shown in parentheses when it is more than one; the unit of the D values is mm²/s.

D range [0.3 x 10^-3, 3 x 10^-3]:
  m = 2:  β1 = 0, β2 = 0.210
  m = 3:  β1 = 0, β2 = 0.223 (2)
  m = 4:  β1 = 0, β2 = 0.190 (2), β3 = 0.380
  m = 5:  β1 = 0, β2 = 0.201 (3), β3 = 0.519

D range [0.1 x 10^-3, 3 x 10^-3]:
  m = 2:  β1 = 0, β2 = 0.080
  m = 3:  β1 = 0, β2 = 0.055, β3 = 0.170
  m = 4:  β1 = 0, β2 = 0.064 (2), β3 = 0.276
  m = 5:  β1 = 0, β2 = 0.070 (3), β3 = 0.328

Table 2.2: Optimum diffusion weighting factors with multiple measurements for normal rat brains. m is the total number of measurements; the number of measurements at a particular b-value is shown in parentheses when it is more than one; the unit of the b-values is s/mm².

Grey matter:
  m = 2:  b1 = 0, b2 = 2370
  m = 3:  b1 = 0, b2 = 2530 (2)
  m = 4:  b1 = 0, b2 = 2650 (3)
  m = 5:  b1 = 0 (2), b2 = 2460 (4)

White matter:
  m = 2:  b1 = 0, b2 = 2830
  m = 3:  b1 = 0, b2 = 3010 (2)
  m = 4:  b1 = 0, b2 = 3150 (3)
  m = 5:  b1 = 0, b2 = 3250 (4)

used in [12] and hence yields higher accuracy results. For our second experiment, we use real distributions of the ADC values estimated from a normal rat brain. Histograms of the ADC values from different parts of a normal rat brain are used as these distributions; they are not uniform, as evident from Fig. 2.1. Results of the weighted COV versus b_2 and b_3, using (0, b_2, b_3) as diffusion weighting factors, are shown in Fig. 2.4. Furthermore, the optimal b-values for up to five measurements are summarized in Table 2.2. Notice that we get two different b values as the optimal diffusion weighting factors for these real distributions.

Figure 2.4: Weighted COV of the nonlinear ADC estimate versus the diffusion weighting factors. COV_w(D^*) of the gray matter (first row) and the white matter (second row) versus b_2 and b_3, using (0, b_2, b_3) as diffusion weighting factors.
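The numerical integration of (2.11) and the search for optimal b-values can be sketched as follows (our own illustration; we use SciPy's SLSQP in place of Matlab's SQP, and the toy settings are not the dissertation's):

```python
import numpy as np
from scipy.optimize import minimize

def var_weighted(b, S0_mag, Dg, sigma):
    """Variance (2.10) of the weighted linear estimate, used as an
    analytic surrogate for the nonlinear estimator's variance."""
    b = np.asarray(b, float)
    w2 = (S0_mag * np.exp(-b * Dg)) ** 2
    det = w2.sum() * (w2 * b**2).sum() - (w2 * b).sum() ** 2
    return sigma**2 * w2.sum() / det

def weighted_cov(b, S0_mag, sigma, Dg_grid, pdf):
    """COV_w(D*) of (2.11), approximated by numerical integration on a
    grid of ground-truth ADC values with density pdf."""
    cov = np.array([np.sqrt(var_weighted(b, S0_mag, Dg, sigma)) / Dg
                    for Dg in Dg_grid])
    return (cov * pdf).sum() * (Dg_grid[1] - Dg_grid[0])

# toy example: uniform ADC distribution, optimize (b2, b3) with b1 = 0
Dg_grid = np.linspace(0.3e-3, 3e-3, 40)
pdf = np.full_like(Dg_grid, 1.0 / (Dg_grid[-1] - Dg_grid[0]))
obj = lambda x: weighted_cov([0.0, x[0], x[1]], 14.1, 0.5, Dg_grid, pdf)
res = minimize(obj, x0=[500.0, 1500.0], method='SLSQP',
               bounds=[(1.0, 4000.0), (1.0, 4000.0)])
```

A non-uniform histogram (as in Fig. 2.1) can be substituted for pdf without changing the routine.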
CHAPTER 3 A CONSTRAINED VARIATIONAL PRINCIPLE FOR DIRECT ESTIMATION AND SMOOTHING OF THE DTI FROM COMPLEX DWIS 3.1 Introduction We were the first to propose the idea of simultaneous estimation and smoothing of the diffusion tensors from DWI [30], with a significant improvement reported later in [31]. We further extend our model [31] to the restoration of DTI from complex valued diffusion weighted images. Specifically, we propose a novel formulation of DTI estimation and smoothing as a constrained optimization problem. The specific approach we use is the augmented Lagrangian technique, which allows one to deal with inequality constraints. The novelty of our formulation lies in the ability to directly, in a single step process, estimate a smooth D from the noisy complex measurements S_l while preserving its positive definiteness. The formulation does not require any ad hoc methods of setting tuning parameters to achieve the solution. These are the key features distinguishing our solution method from the methods reported in the literature to date. In contrast to our solution (to be described subsequently in detail), most of the earlier approaches used a two step method involving (i) computation of D from S_l using a linear least squares approach and then (ii) computing a smoothed D, via either smoothing of the eigenvalues and eigenvectors of D or the matrix flows approach in Chefd'hotel et al. [28]. The problem with the two step approach to computing D is that the D estimated in the first step using the log-linearized model need not be positive definite or even semidefinite. Moreover, it is hard to trust the fidelity of the eigenvalues and eigenvectors computed from such matrices, even if they are smoothed subsequently prior to mapping out the nerve fiber tracts.
Briefly, our model seeks to minimize a cost function involving the sum of an LP norm based gradient of the Cholesky factor L, which ensures the positive definiteness of D through the Cholesky factorization D = LL^T, and an LP norm based gradient of S_0, subject to a nonlinear data constraint based on the complex (not linearized) Stejskal-Tanner equation (1.4). The model is posed as a constrained variational principle, which can be minimized by discretizing either the variational principle itself or the associated Euler-Lagrange equation. We choose the former and use the augmented Lagrangian method together with the limited memory quasi-Newton method to achieve the solution. This chapter is organized as follows: in section 3.2, the detailed variational formulation is described along with the nonlinear data constraint, the positive definite constraint and the augmented Lagrangian solution. Section 3.3 contains the detailed description of the discretization as well as the algorithmic description of the augmented Lagrangian framework. In section 3.4, we present experiments on the application of our model to synthetic as well as real data. Synthetic data experiments are conducted to compare tensor field restoration results with the recently presented work of Coulon et al. [22]. Moreover, results comparing the use of the linearized Stejskal-Tanner model and the nonlinear form of the same are presented as well. 3.2 Constrained Variational Principle Formulation Our solution to the recovery of a piecewise smooth diffusion tensor field from the complex measurements S_l is posed as a constrained variational principle. We seek to minimize a measure of lack of smoothness in S_0 and in the diffusion tensor D being estimated, using an LP norm of the gradient of S_0 and an LP norm of the gradient of the Cholesky factor L. This measure is then constrained by a nonlinear data fidelity term related to the complex Stejskal-Tanner equation (1.4).
The nonlinear data term is constrained by an inequality which requires that it be bounded from above by a possibly known tolerance factor. The positive definiteness constraint on the diffusion tensor being estimated is achieved via the Cholesky factorization theorem from computational linear algebra. The constrained variational principle is discretized and posed using the augmented Lagrangian technique [43]. Each subproblem in the augmented Lagrangian framework is then solved using the limited memory quasi-Newton scheme [43]. The novelty of our formulation lies in the unified framework for recovering and smoothing the diffusion tensor field from the raw data S_l. In addition, to our knowledge, this is the first formulation which allows for simultaneous estimation and smoothing of D, as well as one in which the regularization parameter is not set in an ad hoc way. Let S_0(x) = R_0(x) + i I_0(x) be the complex DWI when no diffusion-encoding magnetic gradient is present, D(x) be the unknown symmetric positive definite diffusion tensor, LL^T(x) be the Cholesky factorization of the diffusion tensor with L a lower triangular matrix, and S_l(x) = R_l(x) + i I_l(x), l = 1, ..., N, be the complex DWIs measured after application of a diffusion-encoding magnetic gradient of known strength and direction, where N is the total number of measured DWIs. The constrained variational principle is

min_{S_0, L} S(S_0, L) = ∫_Ω [|∇R_0|^{p_1} + |∇I_0|^{p_1} + |∇L|^{p_2}] dx

s.t. C(S_0, L) = ασ² − (1/N) Σ_{l=1}^N ∫_Ω (F_{Rl}² + F_{Il}²) dx ≥ 0 (3.1)

where Ω is the image domain; the first and second terms in the variational principle are LP smoothness norms on the real and imaginary parts of S_0, and the third term is an LP smoothness norm on L, with p_1 > 6/5 for R_0 and I_0 and p_2 ≥ 1 for L. Here |∇L|^{p_2} = Σ_{d∈D} |∇L_d|^{p_2}, where D = {xx, yy, zz, xy, yz, xz} indexes the six nonzero components of L.
The lower bounds on the values of p_1 and p_2 are chosen so as to make the proof of existence of a solution for this minimization (see section 3.2.4) mathematically tractable. α is a constant scale factor and σ is the noise standard deviation in the measurements S_l. F_{Rl} and F_{Il} are defined as

F_{Rl} = R_l − R_0 e^{−b_l : LL^T},  F_{Il} = I_l − I_0 e^{−b_l : LL^T}

3.2.1 The Complex Nonlinear Data Constraint The Stejskal-Tanner equation (1.4) gives the relation between the complex diffusion weighted image S_l and the diffusion tensor D. However, multivariate linear regression based on (1.6) has been used to estimate the diffusion tensor D [1]. It was pointed out in [1] that those results agree with nonlinear regression based on the magnitude Stejskal-Tanner equation (1.5). However, if the signal to noise ratio (SNR) is low and the number of measurements S_l is not very large (unlike in [1], where N = 315 or N = 294), the result of multivariate linear regression will differ significantly from that of the nonlinear regression. A robust estimator belonging to the M-estimator family was used by Poupon et al. [21]; however, its performance is not discussed in detail. In Westin et al. [45], an analytical solution is derived from (1.6) by using a dual tensor basis; however, it should be noted that this can only be used for computing the diffusion tensor D when there is no noise in the measurements S_l or the SNR is extremely high. Our aim is to provide an accurate estimation of the diffusion tensor D for practical use, where the SNR may not be high and the total number of DWIs N is restricted to a moderate number. The nonlinear data fidelity term based on the complex Stejskal-Tanner equation (1.4) is fully justified for use in such situations. This nonlinear data term is part of an inequality constraint that imposes an upper bound on the closeness of the measurements S_l to the mathematical model S_0 e^{−b_l : LL^T}. The bound ασ² may be estimated automatically [46, 47].
3.2.2 LP Smoothness Norm There are numerous image smoothing techniques; however, not many of them are suitable for preserving sharp discontinuities. Partial differential equation (PDE) based methods are among the few that can capture edges; in particular, the total variation (TV) norm based restoration [18] introduced by Rudin et al. is an excellent PDE-based edge preserving denoising method. TV restoration aims to minimize the total variation functional

TV(u) = ∫_Ω |∇u| dx (3.2)

subject to the noise constraint

‖Au − z‖²_{L²} ≤ σ² (3.3)

where z is a noisy scalar image defined on the domain Ω, u is the smoothed image, A is a bounded linear operator (the identity for image denoising), and σ² quantifies the amount of Gaussian noise added to the observed image. TV based restoration can be extended to an LP smoothness norm based method [48] by replacing the total variation functional with

L^p(u) = ∫_Ω |∇u|^p dx (3.4)

where p ≥ 1. When p = 1, equation (3.4) becomes the total variation norm; when p = 2, it becomes the H¹ norm. In Blomgren et al. [48], it was shown that LP smoothness norm based restoration does not admit discontinuous solutions, as the TV norm does, when p > 1. However, when p is chosen close to 1, its behavior is close to that of the TV norm for restoring edges. We adopt the LP smoothness norm in our constrained model. In particular, we need p_1 > 6/5 for regularizing S_0 and p_2 ≥ 1 for L to ensure the existence of the solution described in section 3.2.4. Note that what is of importance here is the estimation of the diffusion tensor D; therefore, the edge-preserving property in the estimation process is more relevant for D than for S_0. In our experiments, we choose p_1 = 1.205 for S_0 and p_2 = 1.00 for L. 3.2.3 The Positive Definite Constraint In general, a matrix A ∈ R^{n×n} is said to be positive definite if x^T A x > 0 for all x ≠ 0 in R^n.
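This definition can be checked numerically; in particular, any lower triangular L with positive diagonal yields a positive definite LL^T, which is how the constraint is handled below. A minimal sketch (the component ordering and names are ours):

```python
import numpy as np

def tensor_from_cholesky(l):
    """Build D = L L^T from six free parameters forming a lower
    triangular L with positive diagonal; the resulting D is symmetric
    positive definite by construction, so an optimizer can vary the
    parameters freely (component ordering here is illustrative)."""
    lxx, lyx, lyy, lzx, lzy, lzz = l
    L = np.array([[lxx, 0.0, 0.0],
                  [lyx, lyy, 0.0],
                  [lzx, lzy, lzz]])
    return L @ L.T

D = tensor_from_cholesky([1.0, 0.2, 0.8, -0.1, 0.3, 0.5])
eigvals = np.linalg.eigvalsh(D)   # all strictly positive
```

This is the mechanism by which the positivity of the diffusion tensor is preserved during estimation.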
The diffusion tensor D happens to be a symmetric positive definite matrix, but due to the noise in the data S_l it is hard to recover a D that retains this property unless one includes it explicitly as a constraint. One way to impose this constraint is via the Cholesky factorization theorem, which states: If A is a symmetric positive definite matrix, then there exists a unique factorization A = LL^T, where L is a lower triangular matrix with positive diagonal elements. After the Cholesky factorization, we have transferred the positive definiteness constraint on the matrix D to an inequality constraint on the diagonal elements of L. This is still hard to satisfy theoretically, because the set on which the minimization takes place is an open set. In practice, however, with finite precision arithmetic, testing for positive definiteness is equivalent to testing for positive semidefiniteness. This is because any symmetric positive definite matrix D has a machine representation D̃ = D + E with ‖E‖ ≤ ε‖D‖, where ε is a small multiple of the machine precision; when D is a small symmetric positive definite matrix, D̃ can become a semidefinite matrix. It follows that in finite precision arithmetic, testing for definiteness is equivalent to testing for semidefiniteness. Thus, we repose the positive definiteness constraint on the diffusion tensor matrix as x^T D x ≥ 0, which is satisfied when D = LL^T. 3.2.4 Existence of a Solution Justification for using the augmented Lagrangian method for constrained problems is given in Nocedal and Wright [43]; thus we only need to prove that there is a solution for the following subproblem:

min_{(S_0, L) ∈ A} L(S_0, L; λ, μ) =
  S(S_0, L) − λ C(S_0, L) + C²(S_0, L)/(2μ)   if C(S_0, L) ≤ μλ
  S(S_0, L) − μλ²/2                           otherwise     (3.5)

where λ ≥ 0 is an estimate of the Lagrange multiplier, μ > 0 is a penalty parameter, and A = {(S_0, L) : L_d ∈ L^{p_2}(Ω) with p_2 ≥ 1, and R_0, I_0 ∈ W^{1,p_1}(Ω) with p_1 > 6/5}.
Here Ω ⊂ R³, L^p(Ω) denotes the space of functions with bounded L^p norm, L²(Ω) is the space of square integrable functions on Ω, and W^{1,p}(Ω) denotes the Sobolev space of order p on Ω [49]. Consider the augmented Lagrangian formulation (3.5), which serves as a subproblem of (3.1); the existence theorem will be stated and proved after the following two lemmas. Unless pointed out otherwise, the definitions and theorems employed in the proofs can be found in Evans [49].

Lemma 1 Let A_1 = {(S_0, L) : (S_0, L) ∈ A and C(S_0, L) ≤ μλ}, and suppose R_l, I_l ∈ L²(Ω); then the following minimization problem (3.6) has a solution (S_0^*, L^*) ∈ A_1, if A_1 ≠ ∅:

min_{(S_0, L) ∈ A_1} L(S_0, L; λ, μ) = S(S_0, L) − λ C(S_0, L) + (1/(2μ)) C²(S_0, L) (3.6)

Proof 1 We will verify the following three statements one by one and then prove the lemma: the first term S(S_0, L) in (3.6) is lower semicontinuous with respect to L in L^{p_2}(Ω), p_2 ≥ 1, and weakly lower semicontinuous with respect to S_0 in W^{1,p_1}(Ω), p_1 > 6/5; the second term C(S_0, L) in (3.6) is continuous with respect to S_0 in W^{1,p_1}(Ω) when p_1 > 6/5, and is continuous with respect to L in L^{p_2}(Ω) when p_2 ≥ 1; the third term C²(S_0, L) in (3.6) has the same continuity properties as the second term. Since the objective in (3.6) is bounded below, there exists a minimizing sequence (S_0^n, L^n), where L_d^n ∈ L^{p_2}(Ω), d ∈ D, R_0^n, I_0^n ∈ W^{1,p_1}(Ω), and C(S_0^n, L^n) ≤ μλ. Then {L_d^n}, d ∈ D, is bounded in L^{p_2}(Ω), and {R_0^n} and {I_0^n} are bounded in W^{1,p_1}(Ω). Therefore there is a subsequence {(S_0^{n_k}, L^{n_k})} and limits L_d^* ∈ L^{p_2}(Ω), d ∈ D, and R_0^*, I_0^* ∈ W^{1,p_1}(Ω) such that, as n_k → ∞: L_d^{n_k} → L_d^*, d ∈ D, in L¹(Ω) and a.e. on Ω (from the compactness property of L^{p_2}(Ω), p_2 ≥ 1); R_0^{n_k} ⇀ R_0^* and I_0^{n_k} ⇀ I_0^* weakly in W^{1,p_1}(Ω) (from the weak compactness of W^{1,p_1}); and R_0^{n_k} → R_0^*, I_0^{n_k} → I_0^* in L²(Ω) and a.e. on Ω (from the Rellich-Kondrachov compactness theorem when p_1 > 6/5 and n = 3; this is why we need p_1 > 6/5). 1) We are now ready to demonstrate the lower semicontinuity of the first term S(S_0, L) in (3.6).
By the lower semicontinuity of the L^p norm in L^{p_2}(Ω), p_2 ≥ 1, we have

∫_Ω |∇L_d^*|^{p_2} dx ≤ liminf_{n_k→∞} ∫_Ω |∇L_d^{n_k}|^{p_2} dx,  d ∈ D (3.7)

By the weak lower semicontinuity of the L^p norm in W^{1,p_1}(Ω), we have

∫_Ω |∇R_0^*|^{p_1} dx ≤ liminf_{n_k→∞} ∫_Ω |∇R_0^{n_k}|^{p_1} dx,  ∫_Ω |∇I_0^*|^{p_1} dx ≤ liminf_{n_k→∞} ∫_Ω |∇I_0^{n_k}|^{p_1} dx (3.8)

Hence S(S_0^*, L^*) = ∫_Ω [|∇R_0^*|^{p_1} + |∇I_0^*|^{p_1} + |∇L^*|^{p_2}] dx ≤ liminf_{n_k→∞} S(S_0^{n_k}, L^{n_k}). 2) Next we claim

C(S_0^*, L^*) = lim_{n_k→∞} C(S_0^{n_k}, L^{n_k}) (3.9)

Since C(S_0, L) = ασ² − (1/N) Σ_{l=1}^N ∫_Ω |S_l − S_0 e^{−b_l : LL^T}|² dx, we only need to show

lim_{n_k→∞} ∫_Ω |S_l − S_0^{n_k} e^{−b_l : L^{n_k}(L^{n_k})^T}|² dx = ∫_Ω |S_l − S_0^* e^{−b_l : L^* L^{*T}}|² dx

This will be proved in several stages. (a) Let F_{Rl}^* = R_l − R_0^* e^{−b_l : L^* L^{*T}} and F_{Rl}^{n_k} = R_l − R_0^{n_k} e^{−b_l : L^{n_k}(L^{n_k})^T}. Then

∫_Ω [F_{Rl}^{n_k} − F_{Rl}^*]² dx = ∫_Ω [R_0^{n_k} e^{−b_l : L^{n_k}(L^{n_k})^T} − R_0^* e^{−b_l : L^* L^{*T}}]² dx ≤ 2A + 2B

where A = ∫_Ω (R_0^{n_k})² [e^{−b_l : L^{n_k}(L^{n_k})^T} − e^{−b_l : L^* L^{*T}}]² dx and B = ∫_Ω (R_0^{n_k} − R_0^*)² e^{−2 b_l : L^* L^{*T}} dx. Since ‖R_0^{n_k}‖_{W^{1,p_1}(Ω)} is uniformly bounded in n_k, by the Sobolev embedding theorem in R³ we have ‖R_0^{n_k}‖_{L²(Ω)} ≤ c ‖R_0^{n_k}‖_{W^{1,p_1}(Ω)} ≤ C. Noting that L^{n_k} converges strongly to L^* in L¹(Ω) and a.e. on Ω, and that the exponential factors are bounded by 1, the dominated convergence theorem [49] gives A → 0. From the strong convergence of R_0^{n_k} to R_0^* in L²(Ω) and e^{−b_l : L^* L^{*T}} ≤ 1, we have B ≤ ∫_Ω (R_0^{n_k} − R_0^*)² dx → 0. Now, as A → 0 and B → 0, we have

lim_{n_k→∞} ‖F_{Rl}^{n_k} − F_{Rl}^*‖_{L²(Ω)} = 0 (3.10)

(b) We now prove

∫_Ω F_{Rl}^{*2} dx = lim_{n_k→∞} ∫_Ω (F_{Rl}^{n_k})² dx (3.11)

First of all, if f ∈ L²(Ω) and g ∈ L²(Ω), then f − g ∈ L²(Ω) and f + g ∈ L²(Ω), and

|∫_Ω f² dx − ∫_Ω g² dx| = |∫_Ω (f − g)(f + g) dx| ≤ ‖f − g‖_{L²(Ω)} ‖f + g‖_{L²(Ω)} (3.12)

It is easy to verify that F_{Rl}^*, F_{Rl}^{n_k} ∈ L²(Ω) and are uniformly bounded in L²(Ω). Thus, applying the above inequality together with (3.10), we obtain (3.11). (c) Similarly to the previous step, we can prove

∫_Ω F_{Il}^{*2} dx = lim_{n_k→∞} ∫_Ω (F_{Il}^{n_k})² dx (3.13)

Combining (b) and (c), it is easy to verify equation (3.9).
3) We now show that

C²(S_0^*, L^*) = lim_{n_k→∞} C²(S_0^{n_k}, L^{n_k}) (3.14)

This is easily verified, since C(S_0^*, L^*) and C(S_0^{n_k}, L^{n_k}) are bounded and

C(S_0^*, L^*) = lim_{n_k→∞} C(S_0^{n_k}, L^{n_k}) (3.15)

Finally, from 1), 2) and 3) we have

L(S_0^*, L^*; λ, μ) ≤ liminf_{n_k→∞} L(S_0^{n_k}, L^{n_k}; λ, μ) (3.16)

Therefore, (S_0^*, L^*) is a minimizer of L(S_0, L; λ, μ) as defined in (3.6).

Lemma 2 Let A_2 = {(S_0, L) : (S_0, L) ∈ A and C(S_0, L) > μλ}, and suppose R_l, I_l ∈ L²(Ω); then the following minimization problem (3.17) has a solution (S_0^*, L^*) ∈ A_2 if A_2 ≠ ∅:

min_{(S_0, L) ∈ A_2} L(S_0, L; λ, μ) = S(S_0, L) − μλ²/2 (3.17)

The proof is similar to that of Lemma 1.

Theorem 1 Suppose R_l, I_l ∈ L²(Ω); then the augmented Lagrangian formulation (3.5) has a solution (S_0^*, L^*) ∈ A.

Proof 2 It is easy to see that A ≠ ∅; as a matter of fact, constant functions are members of A. There are three cases: 1. A_1 ≠ ∅ and A_2 = ∅; 2. A_2 ≠ ∅ and A_1 = ∅; 3. A_1 ≠ ∅ and A_2 ≠ ∅. Here we provide a proof for case 3; cases 1 and 2 are trivial to prove. Let (S_0^1, L^1) be the solution for (3.6) and (S_0^2, L^2) be the solution for (3.17); it is not hard to see that the solution of (3.5) is (S_0^1, L^1) if L(S_0^1, L^1; λ, μ) ≤ L(S_0^2, L^2; λ, μ), and (S_0^2, L^2) otherwise.

Finding a solution of the constrained variational principle (3.1) involves solving a sequence of problems (3.5) with fixed λ and μ at each stage. This is much more difficult than dealing with the problems of recovering and smoothing separately. However, there are benefits to posing the problem in this constrained unified framework: namely, one does not accumulate the errors of a two stage process. Moreover, this framework incorporates the nonlinear data term, which is more appropriate for the low SNR values prevalent when the magnitude of the diffusion-encoding magnetic gradient is high. Also, the noise model is correct for the nonlinear complex data model, unlike in the log-linearized case.
Lastly, in the constrained formulation it is now possible to pose mathematical questions of existence and uniqueness of the solution, which was not possible in earlier formulations reported in the literature.

3.3 Numerical Methods

The minimization problem given by (3.1) can only be solved numerically. Here, we discretize the constrained variational principle (3.1), transform it into a sequence of unconstrained problems by using the augmented Lagrangian method, and then apply the limited-memory quasi-Newton technique [43] to solve them. Note that this framework allows us to solve the minimization without resorting to ad hoc methods of choosing the "tuning" parameters. The limited-memory quasi-Newton scheme is the method of choice here due to the memory/storage savings it affords. A proper line search technique is employed once the search direction has been found by the limited-memory quasi-Newton method.

3.3.1 Discretized Constrained Variational Principle

We use the standard finite difference method to discretize the problem. Let

F_{R_l,ijk} = R_{l,ijk} - R_{0,ijk} e^{-b_l : L_{ijk} L_{ijk}^T},
F_{I_l,ijk} = I_{l,ijk} - I_{0,ijk} e^{-b_l : L_{ijk} L_{ijk}^T},
F_{l,ijk} = F_{R_l,ijk} + i F_{I_l,ijk},
|\nabla R_0|_{ijk} = [ (\Delta_x^+ R_0)^2 + (\Delta_y^+ R_0)^2 + (\Delta_z^+ R_0)^2 + \epsilon ]^{1/2}_{ijk},
|\nabla I_0|_{ijk} = [ (\Delta_x^+ I_0)^2 + (\Delta_y^+ I_0)^2 + (\Delta_z^+ I_0)^2 + \epsilon ]^{1/2}_{ijk},
|\nabla L_d|_{ijk} = [ (\Delta_x^+ L_d)^2 + (\Delta_y^+ L_d)^2 + (\Delta_z^+ L_d)^2 + \epsilon ]^{1/2}_{ijk},
|\nabla L|_{ijk} = \sum_{d \in D} |\nabla L_d|_{ijk},   D = { xx, yy, zz, xy, yz, xz },

where \Delta_x^+, \Delta_y^+ and \Delta_z^+ are forward difference operators and \epsilon is a small positive number used to avoid singularities of the L^p norm when p < 2. Now the discretized constrained variational principle can be written as

\min_{S_0, L} S(S_0, L) = \sum_{i,j,k} ( |\nabla R_0|^{p_1}_{ijk} + |\nabla I_0|^{p_1}_{ijk} + |\nabla L|^{p_2}_{ijk} )
s.t.  C(S_0, L) = (2N - 8) \sigma^2 |\Omega| - \sum_{i,j,k} \sum_{l=1}^N |F_{l,ijk}|^2 \ge 0.   (3.18)

3.3.2 Augmented Lagrangian Method

The above problem is now posed using the augmented Lagrangian method, in which a sequence of related unconstrained subproblems is solved; the limit of their solutions is the solution to (3.18).
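Before turning to the subproblems, the epsilon-regularized forward-difference gradient magnitude used in (3.18) can be sketched numerically. The following is a minimal illustration (not the dissertation's actual code), assuming a scalar 3-D field stored as a NumPy array and a zero difference on the last slice in each direction, one common discrete boundary choice:

```python
import numpy as np

eps = 1e-6  # small positive number regularizing the L^p norm for p < 2
R0 = np.random.default_rng(3).normal(size=(8, 8, 8))

def grad_mag(u, eps):
    # Forward differences along x, y, z; the final slice in each axis is
    # assigned a zero difference rather than wrapping around.
    dx = np.zeros_like(u); dx[:-1] = u[1:] - u[:-1]
    dy = np.zeros_like(u); dy[:, :-1] = u[:, 1:] - u[:, :-1]
    dz = np.zeros_like(u); dz[:, :, :-1] = u[:, :, 1:] - u[:, :, :-1]
    # Discrete |grad u|_{ijk} = sqrt((D+x u)^2 + (D+y u)^2 + (D+z u)^2 + eps)
    return np.sqrt(dx**2 + dy**2 + dz**2 + eps)

G = grad_mag(R0, eps)
```

For a constant field the regularized magnitude reduces to sqrt(eps) everywhere, which is why eps keeps the p < 2 norm differentiable at zero gradient.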
Following the description in Nocedal and Wright [43], the kth subproblem of (3.18) is given by

\min L(S_0, L, s; \lambda_k, \mu_k) = S(S_0, L) - \lambda_k [ C(S_0, L) - s ] + \frac{1}{2 \mu_k} [ C(S_0, L) - s ]^2,   (3.19)

where s \ge 0 is a slack variable, and \mu_k, \lambda_k are the penalty parameter and the Lagrange multiplier estimate for the kth subproblem, respectively. One can explicitly compute the slack variable s at the minimum as s = \max( C(S_0, L) - \mu_k \lambda_k, 0 ) and substitute it in (3.19) to get an equivalent subproblem in (S_0, L) given by

\min L(S_0, L; \lambda_k, \mu_k) =
  S(S_0, L) - \lambda_k C(S_0, L) + C(S_0, L)^2 / (2 \mu_k)   if C(S_0, L) < \mu_k \lambda_k,
  S(S_0, L) - \mu_k \lambda_k^2 / 2                            otherwise.   (3.20)

The following algorithm summarizes the procedure to find the solution for (3.18):

Algorithm 1 Augmented Lagrangian Algorithm
1: Initialize S_0(0), L(0) using the nonlinear regression; choose an initial \mu_0 and \lambda_0.
2: for k = 1, 2, ...
3:   Find an approximate minimizer S_0(k), L(k) of L(., .; \lambda_k, \mu_k) as in (3.20), starting with S_0(k-1), L(k-1);
4:   if the final convergence test is satisfied
5:     STOP with an approximate solution S_0(k), L(k);
6:   Update the Lagrange multiplier using \lambda_{k+1} = \max( \lambda_k - C(S_0, L)/\mu_k, 0 );
7:   Choose a new penalty parameter \mu_{k+1} = \mu_k / 2;
8:   Set the new starting point for the next iteration to S_0(k), L(k);
9: end (for)

3.3.3 Limited-Memory Quasi-Newton Method

Due to the large number of unknown variables in the minimization, we solve each subproblem using a limited-memory quasi-Newton technique, which approximates the Hessian matrix at each iteration of the optimization by using only first-derivative information. In the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, the search direction is computed without storing the approximated Hessian matrix, which in general can be a very large matrix (O(N^6) entries for O(N^3) unknowns). Let x = (S_0, L) be the vector of variables, and let f(x) = L(S_0, L; \lambda, \mu) denote the augmented Lagrangian function (3.20) to be minimized. For simplicity, we write f(x) = L(S_0, L), omitting the fixed parameters \lambda and \mu in the following description.
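The structure of Algorithm 1 can be illustrated on a toy problem. The sketch below (a hypothetical 1-D instance, not the dissertation's implementation) minimizes f(x) = (x - 2)^2 subject to c(x) = 1 - x >= 0, whose solution is x* = 1 with multiplier lambda* = 2; it uses the subproblem form (3.20), the multiplier update of step 6, and the penalty update of step 7, with plain gradient descent standing in for the quasi-Newton inner solver:

```python
# Toy augmented-Lagrangian loop: min (x-2)^2  s.t.  c(x) = 1 - x >= 0.
f_grad = lambda x: 2.0 * (x - 2.0)
c = lambda x: 1.0 - x
c_grad = lambda x: -1.0

def subproblem_grad(x, lam, mu):
    # Gradient of (3.20): f - lam*c + c^2/(2 mu) when c < mu*lam,
    # and plain f otherwise (the constraint term is then constant).
    if c(x) < mu * lam:
        return f_grad(x) - (lam - c(x) / mu) * c_grad(x)
    return f_grad(x)

x, lam, mu = 0.0, 1.0, 5.0
for k in range(15):                      # outer augmented-Lagrangian loop
    for _ in range(4000):                # inner solve by gradient descent
        x -= 0.005 * subproblem_grad(x, lam, mu)
    lam = max(lam - c(x) / mu, 0.0)      # multiplier update (step 6)
    mu = max(mu / 2.0, 0.01)             # penalty update (step 7), floored
# x -> ~1.0, lam -> ~2.0
```

The floor on mu mirrors the stopping value 0.01 used in section 3.3.5; without it the penalty term would eventually make the fixed-step inner solver unstable.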
At the kth iteration, let s_k = x_{k+1} - x_k be the update of the variable vector x, y_k = \nabla f_{k+1} - \nabla f_k the update of the gradient, and H_k^{-1} the approximation of the inverse of the Hessian. The inverse of the approximate Hessian can be updated using the BFGS formula:

H_{k+1}^{-1} = V_k^T H_k^{-1} V_k + \frac{s_k s_k^T}{y_k^T s_k},   where V_k = I - \frac{y_k s_k^T}{y_k^T s_k}.   (3.21)

Then we can use the following L-BFGS two-loop recursion, which computes the search direction H_k^{-1} \nabla f_k efficiently using only the last m pairs (s_i, y_i) [43].

Algorithm 2 Search Direction Update Algorithm
1: q <- \nabla f_k;
2: for i = k-1, k-2, ..., k-m
3:   \alpha_i <- \rho_i s_i^T q;
4:   q <- q - \alpha_i y_i;
5: end (for)
6: r <- (H_k^0)^{-1} q;
7: for i = k-m, k-m+1, ..., k-1
8:   \beta <- \rho_i y_i^T r;
9:   r <- r + s_i (\alpha_i - \beta);
10: end (for)
11: stop with result H_k^{-1} \nabla f_k = r,

where \rho_i = 1/(y_i^T s_i) and (H_k^0)^{-1} is the initial approximation of the inverse of the Hessian; we set (H_k^0)^{-1} = \gamma_k I, where \gamma_k = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}.

The gradient of our energy function is

\nabla L(S_0, L) = ( \partial L/\partial R_0, \partial L/\partial I_0, \partial L/\partial L_{xx}, \partial L/\partial L_{yy}, \partial L/\partial L_{zz}, \partial L/\partial L_{xy}, \partial L/\partial L_{yz}, \partial L/\partial L_{xz} ),

where, writing w_k = 2 \lambda_k ( 1 - C(S_0, L)/(\lambda_k \mu_k) ) when C(S_0, L) < \mu_k \lambda_k and w_k = 0 otherwise,

\partial L / \partial R_{0,ijk} = \sum_{i'j'k'} \partial |\nabla R_0|^{p_1}_{i'j'k'} / \partial R_{0,ijk} + w_k \sum_{l=1}^N F_{R_l,ijk} \, \partial F_{R_l,ijk} / \partial R_{0,ijk},
\partial L / \partial I_{0,ijk} = \sum_{i'j'k'} \partial |\nabla I_0|^{p_1}_{i'j'k'} / \partial I_{0,ijk} + w_k \sum_{l=1}^N F_{I_l,ijk} \, \partial F_{I_l,ijk} / \partial I_{0,ijk},
\partial L / \partial L_{d,ijk} = \sum_{i'j'k'} \partial |\nabla L|^{p_2}_{i'j'k'} / \partial L_{d,ijk} + w_k \sum_{l=1}^N ( F_{R_l,ijk} \, \partial F_{R_l,ijk} / \partial L_{d,ijk} + F_{I_l,ijk} \, \partial F_{I_l,ijk} / \partial L_{d,ijk} ),   d = xx, yy, zz, xy, yz, xz.   (3.23)

Here the sum over i'j'k' runs over the neighborhood of the voxel (i, j, k) in which the forward differences involve the variables R_{0,ijk}, I_{0,ijk} or L_{d,ijk}.
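The two-loop recursion of Algorithm 2 is compact enough to sketch in full. The following is an illustrative re-implementation (not the dissertation's code) applied to a small convex quadratic, with a simple Armijo backtracking line search standing in for the cubic-interpolation search described later:

```python
import numpy as np

def two_loop(g, S, Y):
    """Approximate H_k^{-1} g from the last m update pairs (s_i, y_i)."""
    q = g.copy()
    rho = [1.0 / (y @ s) for s, y in zip(S, Y)]
    alpha = []
    for s, y, r in reversed(list(zip(S, Y, rho))):      # newest to oldest
        a = r * (s @ q)
        alpha.append(a)
        q = q - a * y
    # Initial scaling (H_k^0)^{-1} = gamma_k I, gamma_k = s'y / y'y.
    gamma = (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1]) if S else 1.0
    r_vec = gamma * q
    for (s, y, r), a in zip(zip(S, Y, rho), reversed(alpha)):  # oldest first
        b = r * (y @ r_vec)
        r_vec = r_vec + s * (a - b)
    return r_vec

# Demo objective: f(x) = 0.5 x'Ax - b'x, minimized at x* = A^{-1} b.
A = np.diag([1.0, 5.0, 25.0])
b = np.array([1.0, 1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x, S, Y, m = np.zeros(3), [], [], 5
for _ in range(100):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    d = -two_loop(g, S, Y)               # search direction from Algorithm 2
    t = 1.0
    while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):  # Armijo backtracking
        t *= 0.5
    s_k = t * d
    x = x + s_k
    y_k = grad(x) - g
    if y_k @ s_k > 1e-12:                # keep pairs with positive curvature
        S.append(s_k); Y.append(y_k)
        S, Y = S[-m:], Y[-m:]
```

Only the m most recent pairs are stored, so the memory cost is O(m n) rather than the O(n^2) of storing the inverse Hessian, which is the point of formula (3.21).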
Each term in equation (3.23) can be computed analytically. For example,

\partial F_{R_l,ijk} / \partial L_{xx,ijk} = R_{0,ijk} e^{-b_l : L_{ijk} L_{ijk}^T} \, \partial ( b_l : L_{ijk} L_{ijk}^T ) / \partial L_{xx,ijk},
\partial F_{I_l,ijk} / \partial L_{xx,ijk} = I_{0,ijk} e^{-b_l : L_{ijk} L_{ijk}^T} \, \partial ( b_l : L_{ijk} L_{ijk}^T ) / \partial L_{xx,ijk},   (3.22)

with

\partial ( b_l : L_{ijk} L_{ijk}^T ) / \partial L_{xx,ijk} = 2 b_{l,xx} L_{xx,ijk} + 2 b_{l,xy} L_{xy,ijk} + 2 b_{l,xz} L_{xz,ijk}.

3.3.4 Line Search

The limited-memory quasi-Newton method computes a search direction in which the minimum is estimated to lie, yielding the one-dimensional optimization problem

\min_\alpha f(x + \alpha d),   (3.24)

where d is computed by Algorithm 2 described in the previous subsection. Equation (3.24) is solved by a search procedure in which the solution is found by the update iteration x_{k+1} = x_k + \alpha_k d. Line search techniques can be roughly categorized as methods without derivatives, such as the Fibonacci method, and methods with derivatives, such as polynomial interpolation. As we have an analytical form of the derivative, we use cubic interpolation [43], which is generally the most efficient for continuous functions like our energy function.

3.3.5 Implementation Issues

The constraint in (3.18) is directly related to the standard deviation \sigma of the noise, which can be computed as in Seber [47]. Since there are N complex measurements and 8 unknown parameters at each voxel (note: S_0 is complex-valued, so it is treated as 2 unknowns), we have

\sigma^2 = \frac{1}{(2N - 8) |\Omega|} \sum_x \sum_{l=1}^N | S_l(x) - S_0(x) e^{-b_l : L(x) L(x)^T} |^2.

Initialization is very crucial for nonlinear optimization. In our case, we use the following nonlinear regression with positive definiteness constraint as the initial guess:

\min ( S_0(x), L(x) ) \; \sum_{l=1}^N | S_l(x) - S_0(x) e^{-b_l : L(x) L(x)^T} |^2.   (3.25)

The above minimization is a simple nonlinear least-squares problem and can be solved efficiently by the Levenberg-Marquardt method [43], using the result of the corresponding linearized least-squares problem as the initial guess.
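One step of the cubic-interpolation line search can be written out explicitly. The sketch below uses the standard interpolation formula from Nocedal and Wright [43] (an illustration, not the dissertation's code); since the test function phi is itself a cubic, a single interpolation step recovers its local minimizer exactly:

```python
import math

def cubic_step(a, fa, ga, b, fb, gb):
    # Minimizer of the cubic interpolating (a, fa, ga) and (b, fb, gb),
    # where fa, fb are function values and ga, gb are derivatives.
    d1 = ga + gb - 3.0 * (fa - fb) / (a - b)
    d2 = math.copysign(math.sqrt(d1 * d1 - ga * gb), b - a)
    return b - (b - a) * (gb + d2 - d1) / (gb - ga + 2.0 * d2)

# phi(t) = t^3 - 3t has a local minimum at t = 1.
phi = lambda t: t ** 3 - 3.0 * t
dphi = lambda t: 3.0 * t ** 2 - 3.0
t = cubic_step(0.0, phi(0.0), dphi(0.0), 2.0, phi(2.0), dphi(2.0))
# t is exactly 1.0 here because phi is a cubic polynomial
```

In a full line search this step would be iterated inside a bracketing loop until the Wolfe conditions mentioned in section 3.3.5 are satisfied.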
There are a few practical issues in implementing the augmented Lagrangian method and the quasi-Newton method; these are settled by using the suggestions in Nocedal and Wright [43] or empirically. For example, in the augmented Lagrangian method (see Algorithm 1), we start with a penalty control parameter \mu_0 = 5.0 and decrease it by a factor of 2 in each step until it is less than 0.01. We also choose \lambda_0 = 1.0. Note that the augmented Lagrangian method is quite robust with respect to the choice of \mu_0 and \lambda_0, since \mu_k will eventually decrease to 0 and \lambda_k approaches the Lagrange multiplier. The final convergence test has two criteria: the subproblem converges and \mu < 0.01. As the subproblem is just a standard unconstrained minimization problem, checking whether it has converged is achieved using any of the standard criteria in iterative optimization schemes [43]. For the line search, we employ cubic interpolation and the Wolfe convergence criterion; see Nocedal and Wright [43] for more details. For the limited-memory quasi-Newton method, we use the last 5 update pairs to update the search direction.

3.4 Experimental Results

In this section, we present three sets of experiments on the application of our direct tensor smoothing and estimation model. One is on a complex-valued synthetic data set and the other two are on complex-valued DWI data acquired from a normal rat brain. For the synthetic data example, we compare the results obtained from our estimation procedure with competing methods published in the literature.

3.4.1 Complex-Valued Synthetic Data

We synthesized an anisotropic tensor field on a 3D lattice of size 32 x 32 x 8. The volume consists of two homogeneous regions with the following values for S_0 and D:

Region 1: S_0 = 10.0 e^{i\theta},  D = 0.001 x [0.970, 1.751, 0.842, 0.0, 0.0, 0.0],
Region 2: S_0 = 8.0 e^{i\theta},   D = 0.001 x [1.556, 1.165, 0.842, 0.338, 0.0, 0.0],

where the tensor D is listed in the order (d_xx, d_yy, d_zz, d_xy, d_yz, d_xz). The dominant eigenvector (DEV) of the first region is along the y axis, while that of the second region is in the xy plane, inclined at 60 degrees to the y axis. \theta is chosen to be 45 degrees to yield an even distribution of the real and the imaginary parts. The complex diffusion weighted images S_l are generated using the Stejskal-Tanner equation at each voxel x:

S_l(x) = S_0(x) e^{-b_l : D(x)} + n(x),   (3.26)

where n(x) ~ N(0, \sigma_N) + i N(0, \sigma_N) and N(0, \sigma_N) denotes zero-mean Gaussian noise with standard deviation \sigma_N. As the signal measured before the Fourier transform in MRI is complex, it is reasonable to assume the noise is additive complex Gaussian noise; the noise in the DWIs remains complex-valued after the Fourier transform. Thus our simulated data reflects the physics of MR imaging. Note that the noise in the magnitude of the complex DWIs has a Rician distribution, which approximates a Gaussian distribution when the SNR is high [42]. We choose the 7 commonly used configurations x, y, z, xy, xz, yz, xyz for the directions of the diffusion-encoded magnetic gradient, as in Basser et al. [1], and use 3 different field strengths in each direction (100, 500 and 1000 s/mm^2). Thus we have a total of 21 different b-matrices. A slice of the generated data set is shown in Fig. 3.1; note that the DWIs differ when either the directions or the magnitudes of the diffusion-encoded magnetic gradient differ. To better illustrate the performance of our model, we compare it with the following methods in our experiments: (i) Linear: linear regression on (1.6) as used in [1]; (ii) Nonlinear: nonlinear regression applied to (1.4); (iii) Linear + EVS (eigenvector smoothing): linear regression followed by the DEV smoothing method described in Coulon et al. [22]; (iv) Nonlinear + EVS: nonlinear regression plus the smoothing as in (iii); and (v) Ours: our method.
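The synthetic-data generation described above can be sketched as follows. This is an illustrative reconstruction (not the dissertation's code) for a single Region-1 voxel: it builds the 21 b-matrices, applies the Stejskal-Tanner equation (3.26) with additive complex Gaussian noise, and also fits the log-linearized model (the "Linear" baseline) to the noiseless signal, which recovers the tensor exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Region-1 values from the text: diagonal D (units mm^2/s) and complex S0.
D = 1e-3 * np.diag([0.970, 1.751, 0.842])
S0 = 10.0 * np.exp(1j * np.pi / 4)          # theta = 45 degrees

# 7 gradient directions x 3 b-values = 21 b-matrices b = bval * g g'.
dirs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                 [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]], float)
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
bvals = [100.0, 500.0, 1000.0]
bmats = [bv * np.outer(g, g) for bv in bvals for g in dirs]

sigma_N = 0.5                                # noise level of Fig. 3.1
S = np.array([S0 * np.exp(-np.sum(B * D))    # b : D = sum_ij B_ij D_ij
              + rng.normal(0, sigma_N) + 1j * rng.normal(0, sigma_N)
              for B in bmats])

# Log-linearized fit on the noiseless signal: ln|S0| - ln|S_l| = b_l : D
# is linear in (Dxx, Dyy, Dzz, Dxy, Dxz, Dyz).
Amat = np.array([[B[0, 0], B[1, 1], B[2, 2],
                  2 * B[0, 1], 2 * B[0, 2], 2 * B[1, 2]] for B in bmats])
S_clean = np.array([S0 * np.exp(-np.sum(B * D)) for B in bmats])
y = -np.log(np.abs(S_clean) / abs(S0))
d_est, *_ = np.linalg.lstsq(Amat, y, rcond=None)
```

With noise added, the same linear fit becomes biased at low SNR, which is exactly the motivation for the nonlinear and unified formulations compared in Table 3.1.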
Note that the EVS method [22] is a direction field restoration scheme that preserves discontinuities and is based on Chan and Shen's work [50]. Figure 3.2 shows an ellipsoid visualization of the restored diffusion tensor field for the synthetic data set with \sigma_N = 0.5. It is evident that our method restored the noisy tensor field quite well, in comparison to the nonlinear regression method, which did not perform well, and the linear regression technique, which performed worst. For further comparison, Fig. 3.3 shows the DEV computed from the original and the restored diffusion tensor fields using all five methods mentioned before. This figure clearly shows that our model yielded the best estimate of the original DEV field. The method in Coulon et al. [22], however, did not work well at voxels where the estimated DEVs are almost orthogonal to those in their neighborhoods. In such cases, Coulon et al.'s method treats them as discontinuities and does not smooth them. Though it is possible to treat these locations as outliers, it is difficult to set a reasonable criterion to achieve this. It is interesting to notice that the Nonlinear+EVS method, which serves as an improvement of the existing Linear+EVS method, can diminish this problem. Additional quantitative measures, described below, confirm the visual comparison results. To quantitatively assess the proposed model, we compare the accuracy of the DEV computed using the previously mentioned methods. Let \theta be the angle (in

Table 3.1: Comparison of the accuracy of the estimated DEVs using different methods for different noise levels.
degrees) between the estimated DEV and the original DEV. Table 3.1 shows the mean \mu_\theta and standard deviation \sigma_\theta of \theta using the different methods, for the synthetic data with different levels of additive Gaussian noise:

\sigma_N = 0.5      Linear   Nonlinear   Linear+EVS   Nonlinear+EVS   Ours
  \mu_\theta          9.57      7.44        1.63          1.19         0.80
  \sigma_\theta       6.93      5.08        1.57          0.84         0.96

\sigma_N = 1.0      Linear   Nonlinear   Linear+EVS   Nonlinear+EVS   Ours
  \mu_\theta         22.28     16.94        6.67          3.78         1.99
  \sigma_\theta      17.46     13.86       13.06          8.58         2.74

\sigma_N = 1.5      Linear   Nonlinear   Linear+EVS   Nonlinear+EVS   Ours
  \mu_\theta         33.14     26.40       14.55          9.09         4.08
  \sigma_\theta      22.60     20.19       22.39         17.46         4.70

A better method is one that yields smaller values. From this table, we can see that our model yields lower error values than all the other methods under the various noise levels. It is also clear from this table that the methods using the original nonlinear complex Stejskal-Tanner equation (1.4) are more accurate than those using the linearized one (1.6). The advantages of our method and of the nonlinear approaches are more apparent when the noise level is higher, which supports our discussion in section 3.2.1.

3.4.2 Complex DWI of a Normal Rat Brain

The normal rat brain data is imaged using a 17.6T (750 MHz) Bruker Avance imaging spectrometer system with the following settings: TR = 3058 ms, TE = 28.8 ms, \Delta = 17.8 ms, \delta = 2.4 ms, diffusion time = 17.0 ms, bandwidth = 40 kHz. The field of view is 15 x 15 x 21 mm^3, with a resolution of 117 x 117 x 270 um^3. The same set of 7 diffusion-encoded magnetic gradient directions as for the synthetic data is used, with two different magnitudes (100, 500 s/mm^2). With the number of averages equal to 8 for each signal measurement S_l, the raw data is a set of 14 complex DWI volumes, each of size 128 x 128 x 78. We extract a 128 x 128 x 10 volume in the region of the corpus callosum for our first experiment. Figure 3.6 depicts the restored images of the six independent components of the estimated diffusion tensor. As a comparison, Figs.
3.4 and 3.5 show the same images computed from the raw data using linear regression applied to (1.6) and nonlinear regression applied to (1.4), respectively. For display purposes, we use the same brightness and contrast enhancement for the corresponding images in all three figures. We also present the computed DEV of the estimated diffusion tensor in Fig. 3.7. We did not compare with the EVS methods because the eigenvector sorting problem is very severe in free water and other isotropic regions; those regions would have to be excluded to make effective use of the EVS methods, and this involves a nontrivial segmentation issue. We then extracted a 10 x 127 x 78 volume in the region of the cerebellum; the sagittal view of the results is shown in Fig. 3.8. The brightness and contrast enhancement of the figures are the same as in the previous experiment. In Figs. 3.6 and 3.8, the edge-preserving smoothing is evident, especially in the off-diagonal terms of the diffusion tensor D, which are essential in evaluating the structural anisotropy. We also notice that there are some differences in the region of free water between Figs. 3.4 and 3.5, visible in the off-diagonal terms, while there is no visible difference inside the corpus callosum between these two figures. However, Fig. 3.7 gives more insight into this via the depiction of the computed DEVs. Note that the smoothing effect on the DEV is evident in the shadowed box in Fig. 3.7. Our model for estimating a smooth diffusion tensor field from noisy DWIs may be used to achieve better accuracy in fiber tractography. Our future efforts will focus on applying the model described in this chapter to achieve better fiber tract maps.

Figure 3.1: A slice of several volumetric complex DWIs generated with \sigma_N = 0.5. Left to right: real and imaginary pairs of complex DWIs with varying magnitude of the diffusion-encoded magnetic gradient.
Top to bottom: complex DWIs for varying directions of the diffusion-encoded magnetic gradient.

(a) Original  (b) Linear  (c) Nonlinear  (d) Our Model
Figure 3.2: A slice of the original (ground truth) and the estimated diffusion tensor fields for the noisy synthetic data with \sigma_N = 0.5.

Figure 3.3: A slice of the computed DEV field for the noisy synthetic data with \sigma_N = 1.5. The top left image is the DEV field computed from the original tensor field; the other images, arranged left to right and top to bottom, are the DEV fields computed from the tensor fields estimated using the following methods: linear, nonlinear, linear + EVS, nonlinear + EVS and our model.

Figure 3.4: A slice of the normal rat brain DTI estimated using multivariate linear regression without smoothing, viewed channel by channel. First row, left to right: D_xx, D_xy and D_xz. Second row, left to right: S_0, D_yy and D_yz. Last row, left to right: FA, <D> and D_zz.

Figure 3.5: A slice of the normal rat brain DTI estimated using multivariate nonlinear regression without smoothing, viewed channel by channel. The arrangement of the panels is the same as in Fig. 3.4.

Figure 3.6: A slice of the normal rat brain DTI estimated using our proposed method, viewed channel by channel. The arrangement of the panels is the same as in Fig. 3.4.

Figure 3.7: A slice of the computed DEV field from a normal rat brain. Top to bottom: linear regression, nonlinear regression and our model.
Left column: the region of interest for depicting the DEV, indicated by the white box superposed on a component image of D. Right column: the computed DEV field inside the white rectangle on the left; the smoothing effect of our model is clearly visible in the shaded box.

Figure 3.8: A slice of the normal rat brain DTIs in the region of the cerebellum, viewed channel by channel. The DTIs are estimated using (a) multivariate nonlinear regression without smoothing and (b) our proposed method. Both (a) and (b) are sagittal views and are arranged as in Fig. 3.4.

CHAPTER 4
DTI SEGMENTATION USING AN INFORMATION THEORETIC TENSOR DISSIMILARITY MEASURE

4.1 Introduction

We tackle the DTI segmentation problem using a region-based active contour model incorporating an information theoretic tensor dissimilarity measure. Geometric active contour models have long been used in scalar and vector image segmentation [51, 52, 53]. Our work is based on the active contour models derived from the Mumford-Shah functional [54], and can be viewed as a significant extension to diffusion tensor images of the work on active contours without edges by Chan and Vese [55] and the work on the curve evolution implementation of the Mumford-Shah functional by Tsai et al. [53]. Our key contributions are: (i) the definition of a new discriminant of tensors based on information theory; (ii) a theorem, with proof, that allows the mean value of an SPD tensor field required in the active contour model to be computed in closed form, facilitating efficient segmentation of the DTI; (iii) the extension of the popular region-based active contour model to handle matrix-valued images, e.g., DTI. The rest of this chapter is organized as follows: in section 4.2, the new definition of tensor distance is introduced and its properties are discussed in detail. In section 4.3, active contour models for image segmentation are reviewed.
Then in section 4.4, the piecewise constant region-based active contour model for DTI segmentation is discussed. We describe the associated Euler-Lagrange equation, the curve evolution equation, the level set formulation and the implementation details. Similarly, in section 4.5, we discuss the piecewise smooth region-based active contour model for DTI segmentation. Finally, in section 4.6, we present experiments on the application of our model to synthetic tensor fields as well as real DTIs.

4.2 A New Tensor Distance and Its Properties

4.2.1 Definitions

In the context of DT-MRI, the diffusion of water molecules may be characterized by a second-order tensor D which is symmetric positive definite. This D is related to the displacement r of the water molecules at each lattice point in the volumetric data at time t via

p(r | t, D) = \frac{1}{\sqrt{(4 \pi t)^n |D|}} \exp\left( - \frac{r^T D^{-1} r}{4t} \right).   (4.1)

Note that the mean of r is E(r) = 0 and the covariance matrix of r is Cov(r) = 2tD [47]. Thus, it is natural to use a distance measure between Gaussian distributions to induce a distance between these tensors. The most frequently used information theoretic dissimilarity measure is the Kullback-Leibler divergence, defined as

KL(p, q) = \int p(x) \log \frac{p(x)}{q(x)} dx   (4.2)

for two given densities p(x) and q(x). The KL divergence is not symmetric, and a popular way to symmetrize it is

J(p, q) = \frac{1}{2} \left( KL(p, q) + KL(q, p) \right),   (4.3)

which is called the J-divergence. We propose a novel definition of tensor distance as the square root of the J-divergence, i.e.,

d(T_1, T_2) = \sqrt{ J( p(r | t, T_1), p(r | t, T_2) ) }.   (4.4)

It is known that twice the KL divergence, and thus twice the J-divergence, is the squared distance between two infinitesimally nearby points on a Riemannian manifold of parameterized distributions [56]. Thus taking the square root in equation (4.4) is justified.
Furthermore, equation (4.4) turns out to have a very simple form, given by

d(T_1, T_2) = \frac{1}{2} \sqrt{ tr( T_1^{-1} T_2 + T_2^{-1} T_1 ) - 2n },   (4.5)

where tr(.) is the matrix trace operator and n is the size of the square matrices T_1 and T_2.

Derivation 1: For Gaussian distributions p(r | t, T_1) and p(r | t, T_2) given as in equation (4.1), we have

KL( p(r | t, T_1) || p(r | t, T_2) ) = \int p(r | t, T_1) \log \frac{p(r | t, T_1)}{p(r | t, T_2)} dr
= \frac{1}{2} \log \frac{|T_2|}{|T_1|} - \int p(r | t, T_1) \frac{r^T T_1^{-1} r}{4t} dr + \int p(r | t, T_1) \frac{r^T T_2^{-1} r}{4t} dr
= \frac{1}{2} \log \frac{|T_2|}{|T_1|} - \frac{ tr[ T_1^{-1} E(r r^T) ] }{4t} + \frac{ tr[ T_2^{-1} E(r r^T) ] }{4t}
= \frac{1}{2} \log \frac{|T_2|}{|T_1|} - \frac{ tr[ T_1^{-1} (2 t T_1) ] }{4t} + \frac{ tr[ T_2^{-1} (2 t T_1) ] }{4t}
= \frac{1}{2} \left[ \log \frac{|T_2|}{|T_1|} - n + tr( T_2^{-1} T_1 ) \right].   (4.6)

Switching the roles of p(r | t, T_1) and p(r | t, T_2) in the above steps leads to

KL( p(r | t, T_2) || p(r | t, T_1) ) = \frac{1}{2} \left[ \log \frac{|T_1|}{|T_2|} - n + tr( T_1^{-1} T_2 ) \right].   (4.7)

Combined with the definition of the J-divergence, we have

J( p(r | t, T_1), p(r | t, T_2) ) = \frac{1}{2} \left[ KL( p(r | t, T_1) || p(r | t, T_2) ) + KL( p(r | t, T_2) || p(r | t, T_1) ) \right]
= \frac{1}{4} \left[ tr( T_1^{-1} T_2 + T_2^{-1} T_1 ) - 2n \right].   (4.8)

Thus we have equation (4.5).

Now we give an example of the above distance. Suppose T_1 and T_2 can both be diagonalized by the same orthogonal matrix O, i.e., T_1 = O D O^T and T_2 = O E O^T, where D = diag{d_1, ..., d_n} and E = diag{e_1, ..., e_n}. Then we have

tr( T_1^{-1} T_2 ) = \sum_{i=1}^n \frac{e_i}{d_i},   tr( T_2^{-1} T_1 ) = \sum_{i=1}^n \frac{d_i}{e_i}.

Thus

d(T_1, T_2) = \frac{1}{2} \sqrt{ \sum_{i=1}^n \left( \frac{e_i}{d_i} + \frac{d_i}{e_i} - 2 \right) } = \frac{1}{2} \sqrt{ \sum_{i=1}^n \frac{(d_i - e_i)^2}{d_i e_i} }.   (4.9)

4.2.2 Affine Invariance Property

When the coordinate system undergoes an affine transformation, the DTI is also transformed. If the coordinate system undergoes the affine transform y = Ax + b, then the displacement of the water molecules is transformed as Ar. Since r has a Gaussian distribution with covariance matrix 2tT, the transformed displacement has covariance matrix 2 t A T A^T. Thus, the transformed DTI will be

\tilde{T}(y) = A T(x) A^T,   y = Ax + b.   (4.10)

Figure 4.1: Transformation of tensor fields. Left: original tensor field. Right: transformed tensor field.
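The closed form (4.5), its diagonal special case (4.9), and the affine invariance (4.11) discussed below can all be checked numerically. The following is an illustrative sketch (not the dissertation's code); the random SPD construction and the matrix A are arbitrary test inputs:

```python
import numpy as np

def tensor_dist(T1, T2):
    # Eq. (4.5): d = (1/2) sqrt( tr(T1^{-1}T2 + T2^{-1}T1) - 2n ); max()
    # guards against a tiny negative value from round-off when T1 ~ T2.
    n = T1.shape[0]
    t = np.trace(np.linalg.solve(T1, T2) + np.linalg.solve(T2, T1))
    return 0.5 * np.sqrt(max(t - 2 * n, 0.0))

rng = np.random.default_rng(1)

# Diagonal case: compare against the closed form (4.9).
d, e = np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.0, 1.0])
closed_form = 0.5 * np.sqrt(np.sum((d - e) ** 2 / (d * e)))
dist_diag = tensor_dist(np.diag(d), np.diag(e))

# Affine invariance check: d(T1,T2) = d(A T1 A', A T2 A'), while the
# Frobenius norm of the difference is not invariant.
def rand_spd(n):
    G = rng.normal(size=(n, n))
    return G @ G.T + n * np.eye(n)

T1, T2 = rand_spd(3), rand_spd(3)
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)   # generic invertible matrix
d_orig = tensor_dist(T1, T2)
d_tran = tensor_dist(A @ T1 @ A.T, A @ T2 @ A.T)
f_orig = np.linalg.norm(T1 - T2)
f_tran = np.linalg.norm(A @ (T1 - T2) @ A.T)
```

Using np.linalg.solve rather than an explicit inverse keeps the trace computation better conditioned for nearly singular tensors.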
Figure 4.1 shows an example of a 32 x 32 2D diffusion tensor field, where the affine transformation is

A = [ 0.8  0.0 ; 0.0  0.8 ],   b = [ 2.0 ; 2.0 ].

Our definition of tensor distance is invariant to such transformations, i.e.,

d(T_1, T_2) = d( A T_1 A^T, A T_2 A^T ).   (4.11)

Though the transformation of the diffusion tensor is actually a congruent transformation, we still refer to the above invariance as affine invariance because, in the context of segmentation, the invariance property has been linked with the transformation of the coordinate system.

Derivation 2: First of all, we have

tr[ (A T_1 A^T)^{-1} (A T_2 A^T) ] = tr( A^{-T} T_1^{-1} A^{-1} A T_2 A^T ) = tr( A^{-T} T_1^{-1} T_2 A^T ) = tr( T_1^{-1} T_2 A^T A^{-T} ) = tr( T_1^{-1} T_2 ).   (4.12)

Similarly we get tr[ (A T_2 A^T)^{-1} (A T_1 A^T) ] = tr( T_2^{-1} T_1 ), thus we have

d( A T_1 A^T, A T_2 A^T ) = \frac{1}{2} \sqrt{ tr[ (A T_1 A^T)^{-1} (A T_2 A^T) + (A T_2 A^T)^{-1} (A T_1 A^T) ] - 2n } = \frac{1}{2} \sqrt{ tr( T_1^{-1} T_2 + T_2^{-1} T_1 ) - 2n } = d(T_1, T_2).   (4.13)

Thus our distance is invariant under affine coordinate transformations. It is easy to show that the Frobenius norm of the tensor difference, commonly used in published work [28, 27, 33], does not have this property.

4.2.3 Tensor Field Mean Value

We now develop a theorem that allows us to compute the mean value of a tensor field. This is essential for the region-based active contour model used in the segmentation algorithm.

Theorem 2 The mean value of a tensor field T over a region R, defined as

M(T, R) = \arg\min_{M \in SPD(n)} \int_R d^2[ M, T(x) ] dx,   (4.14)

is given by

M = B^{-1/2} ( B^{1/2} A B^{1/2} )^{1/2} B^{-1/2},   (4.15)

where A = \int_R T(x) dx, B = \int_R T^{-1}(x) dx, and SPD(n) denotes the set of symmetric positive definite matrices of size n.

Proof 3 Let E(M) = \int_R d^2[ M, T(x) ] dx. We have

E(M) = \frac{1}{4} \int_R tr( M^{-1} T(x) + T^{-1}(x) M ) dx - \frac{n}{2} |R| = \frac{1}{4} tr( M^{-1} A + B M ) - \frac{n}{2} |R|.

For a small perturbation in the neighborhood N(M) \subset SPD(n) represented by M + tV, where t is a small enough positive number and V is a symmetric matrix, we have

(M + tV)^{-1} = M^{-1} ( I - t V M^{-1} + o(t) ).

Thus

E(M + tV) = \frac{1}{4} tr[ (M + tV)^{-1} A + B (M + tV) ] - \frac{n}{2} |R|
= \frac{1}{4} tr[ M^{-1} ( I - t V M^{-1} + o(t) ) A + B M + t B V ] - \frac{n}{2} |R|
= E(M) + \frac{t}{4} tr( B V - M^{-1} A M^{-1} V ) + o(t).

Thus, at a critical point we need

tr( B V - M^{-1} A M^{-1} V ) = 0   for all symmetric V.   (4.16)

This is actually equivalent to M B M = A, which can be solved in closed form [57], yielding the desired result in equation (4.15). Since A and B are both SPD matrices, M is also an SPD matrix; thus it is indeed a solution to the minimization problem (4.14).

In some special cases, the tensor field mean value can be computed easily. If the tensor field is constant, T(x) = T_c, then we have A = T_c |R| and B = T_c^{-1} |R|. Substituting into equation (4.15), we get M = T_c; the mean value of a constant tensor field is the constant itself, which makes perfect sense. Now suppose all the tensors in the field are diagonal, T(x) = D(x) = diag{ d_1(x), ..., d_n(x) }. Then

A = \int_R D(x) dx = diag{ \int_R d_1(x) dx, ..., \int_R d_n(x) dx },
B = \int_R D^{-1}(x) dx = diag{ \int_R \frac{1}{d_1(x)} dx, ..., \int_R \frac{1}{d_n(x)} dx }.

As both A and B are diagonal matrices, all polynomial expressions in them are diagonal and their products commute; then

M = diag\left{ \sqrt{ \frac{ \int_R d_1(x) dx }{ \int_R (1/d_1(x)) dx } }, ..., \sqrt{ \frac{ \int_R d_n(x) dx }{ \int_R (1/d_n(x)) dx } } \right}.

In general, A and B do not commute, so equation (4.15) cannot be simplified further and one must resort to matrix diagonalization.

4.3 Active Contour Models for Image Segmentation

Active contour models (aka snakes) have been successfully used in numerous applications including computer vision, medical imaging and computer graphics. Generally speaking, active contour models deal with hypersurfaces in R^n; they are often referred to as curve evolution models in 2D and surface propagation models in 3D.
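The closed-form mean (4.15) of Theorem 2 is straightforward to compute for a discrete set of tensors; the common scale factor |R| cancels, so sums can be used in place of integrals. The following is an illustrative sketch (not the dissertation's code), with arbitrary random SPD test tensors, verifying that the resulting M indeed solves M B M = A:

```python
import numpy as np

def sqrtm_spd(S):
    # Matrix square root of an SPD matrix via its eigendecomposition.
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def tensor_mean(tensors):
    # A = sum T_i, B = sum T_i^{-1}; the mean solves M B M = A, i.e.
    # M = B^{-1/2} (B^{1/2} A B^{1/2})^{1/2} B^{-1/2}   (eq. 4.15).
    A = np.sum(tensors, axis=0)
    B = np.sum([np.linalg.inv(T) for T in tensors], axis=0)
    Bh = sqrtm_spd(B)
    Bhi = np.linalg.inv(Bh)
    return Bhi @ sqrtm_spd(Bh @ A @ Bh) @ Bhi

rng = np.random.default_rng(2)
Ts = []
for _ in range(5):
    G = rng.normal(size=(3, 3))
    Ts.append(G @ G.T + np.eye(3))    # random SPD "tensor field" samples
M = tensor_mean(Ts)
```

For a constant field the function returns the constant tensor itself, matching the special case discussed in the text.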
The basic idea of active contour models is simple: a hypersurface in R^n (representing the boundary of regions) moves in the direction of its normal with an application-driven velocity. Level set methods serve as a powerful mathematical tool to analyze and implement the hypersurface evolution equations in active contour models. They express the geometric hypersurface evolution in an Eulerian form, allowing the use of efficient and robust numerical methods; in addition, topology changes of the hypersurface are handled transparently. The time-dependent level set formulation was proposed independently by Dervieux and Thomasset [58] to study multifluid incompressible flows, and by Osher and Sethian [59] to study front propagation. In the context of computer vision, the history of active contour models began with the pioneering work by Kass, Witkin and Terzopoulos [60]. A few years later, Malladi et al. [61] and Caselles et al. [62] independently proposed the seminal idea of modeling image segmentation with geometric snakes in a level set formulation. There are also many other significant contributions to image segmentation using active contour models and level set methods; a brief review can be found in Vemuri et al. [63]. For a complete discussion of general curve evolution and level set methods, the reader is referred to two outstanding books, one by Sethian [64] and the other by Osher and Paragios [65]. In this section, only the basic concepts and methods of 2D active contour models are presented for the sake of self-containedness.

4.3.1 Snake Energy Functional

Most active contour models aim to minimize a snake energy functional. The minimization of a functional is also called a variational principle and offers several advantages over other methods in image segmentation. The most significant one is that the incorporation of prior shape information into this framework is straightforward.
The vast majority of active contour models can be categorized as one of two types: edge-based snakes and region-based snakes. The classical edge-based snake was first proposed by Kass et al. [60] and involves minimizing the following energy functional:

E(C) = \int_0^1 [ E_{int}( C(p) ) + E_{image}( C(p) ) + E_{con}( C(p) ) ] dp,   (4.17)

where C is a parameterized curve that separates the different regions, E_{int} = \alpha |C_p|^2 + \beta |C_{pp}|^2 is the internal energy that measures the length and the stiffness of the curve, E_{image} represents the image force that attracts the curve to locations with large image gradients (such as edges), and E_{con} is given by external constraint forces that usually refine the curve C in the segmentation and is not used in noninteractive segmentation tasks; \alpha and \beta in the internal energy are control parameters that balance the geometric behavior of the snake.

Though the classical snake solves the problem of linking edges to form a segmentation, it has several disadvantages. The most severe one is that it cannot converge when far away from the true solution. The balloon model introduced by Cohen [66] incorporates an additional constant inflating or shrinking external force, as well as stable numerical techniques, to achieve a larger range of convergence. This constant force was shown to be part of an area energy minimization term by Siddiqi et al. [67]. The resulting energy functional is

E(C) = \int_0^1 [ E_{int}( C(p) ) + E_{image}( C(p) ) ] dp + \nu \int_{R_i} dx,   (4.18)

where R_i is the region inside the boundary C and \nu weights the area term. The geometric active contour models were introduced by Malladi et al. [61, 51] and Caselles et al. [62] in the curve evolution context by adding an image-dependent stopping term, and can also be derived as part of a weighted length energy functional [52]:

E(C) = \int_0^1 g( |\nabla I| ) \, |C'(p)| dp,   (4.19)

where g(.) is a decreasing function of the image gradient magnitude |\nabla I|. The above three types of active contour models are all edge based, since the stopping criterion in all of them is a function of the image gradient.
A more robust type of active contour model is the region-based model, based on the Mumford-Shah functional [54]:

E(f, C) = \gamma \int_\Omega (f - g)^2 dx + \alpha \int_{\Omega \setminus C} |\nabla f|^2 dx + \beta |C|,   (4.20)

where \Omega is the image domain, g is the original noisy image data, f is a piecewise smooth approximation of g with discontinuities only along C, |C| is the arc length of the curve C, and \alpha, \beta and \gamma are control parameters. The first term models the data fidelity, and \gamma is inversely proportional to the variance of the noise process; the second term measures the smoothness of the approximation f and can be viewed as a prior model of f given the discontinuity at C; the third term aims to remove tiny disconnected regions and keep a smooth object boundary. With these three terms, the Mumford-Shah functional provides an elegant framework for simultaneous image segmentation and smoothing. However, solving the original Mumford-Shah functional is a difficult problem, and numerous methods have been proposed to tackle its various approximations. For example, Chan and Vese [55] and Tsai et al. [53] presented two-stage implementations of (4.20), where the first stage involves, for a fixed C, constructing a constant or smooth approximation to the image function inside and outside of C, and the second stage evolves C for a fixed f.

4.3.2 Curve Evolution

The exact form of curve evolution is

\frac{\partial C(x, t)}{\partial t} = \alpha(x) T(x) + \beta(x) N(x),   (4.21)

where \alpha and \beta are the speeds of the curve along the tangential and normal directions respectively, T is the tangent and N is the unit outer normal to the curve. However, without loss of generality, equation (4.21) can be rewritten as a deformation along the normal only, in the form of the following PDE [68]:

\frac{\partial C(x, t)}{\partial t} = F(x) N(x),   (4.22)

where F(x) = \beta(x) is the speed of the curve at location x along the normal; it may depend on various factors such as the local curvature and global properties of the curve. In the following text, we will not explicitly specify the domain of a function when it is clear from the context.
For example, we will use F instead of F(x), N instead of N(x), etc. The particular form of F is the focus of much current research and is usually derived from application-related variational principles. With proper design of the speed function, curve evolution can be used in the context of image segmentation, where F is related to the image data and a curve smoothing term. Though curve evolution can be used directly for various applications, a more natural and elegant way is to derive it from first principles as the gradient flow that minimizes a snake energy functional. For example, the gradient flow that minimizes (4.19) yields [52]

F = gκ + ∇g · N

For computing the curve evolution equation from region-based active contour energy functionals, we refer the reader to Aubert et al. [69] for the shape gradient method.

4.3.3 Level Set Formulation

Though traditional techniques for solving (4.22), including the marker/string methods and the volume-of-fluid techniques [64], are useful under certain circumstances, they have quite a few drawbacks. For example, the marker/string methods provide numerical schemes based on parameterized curves and suffer from instability and topological limitations. The volume-of-fluid techniques are not accurate. In contrast, level set methods [59] offer several advantages at the same time:

1. Topological changes as the evolving curves merge and split are handled transparently in the implicit formulation.
2. Efficient and accurate numerical methods can yield stable solutions when shocks form.
3. Extension from 2D curve evolution to 3D surface propagation is easy.
4. Fast solution techniques such as narrow banding and fast marching can be used for speedup.

The key idea of the level set methods is very simple. Given a higher dimensional function φ whose zero level set is the curve C (see Fig.
4.2), it is not hard to derive the corresponding update equation for φ:

∂φ/∂t = F|∇φ|    (4.23)

The higher dimensional function φ is usually chosen to be the signed distance function of the curve C. In such a case, |∇φ| = 1, which helps ensure numerical stability.

4.4 Piecewise Constant DTI Segmentation Model

Our model for DTI segmentation in 2D is posed as the minimization of the following variational principle based on the Mumford-Shah functional [54]:

E(T, C) = ∫_Ω dist²(T(x), T₀(x)) dx + α ∫_{Ω\C} |∇_d T(x)|² dx + β|C|    (4.24)

where the curve C is the boundary of the desired unknown segmentation, Ω ⊂ R² is the image domain, T₀ is the original noisy tensor field, T is a piecewise smooth approximation of T₀ with discontinuities only along C, |C| is the arclength of the curve C, α and β are control parameters, dist(·, ·) is a measure of the distance between two tensors, and ∇_d T(x) ∈ R² is defined by

∇_d T(x) · v = lim_{dt→0} dist(T(x), T(x + dt v)) / dt

where v ∈ R² is a unit vector. ∇_d T can be called the gradient of T under the tensor distance dist(·, ·), and its magnitude is a measure of the smoothness of the tensor field T using dist(·, ·) as a tensor distance.

Figure 4.2: Evolution of a curve and its corresponding higher dimensional function. (a) An evolving curve with speed FN. (b) Higher dimensional function whose zero level set is the evolving curve in (a).

The extension of the Mumford-Shah functional to 3D is straightforward: the curve C is replaced with a surface S, and the implementation in 3D is similar to that in 2D. The above variational principle (4.24) will capture piecewise smooth regions while maintaining a smooth boundary; the balance between the smoothness of the DTI in each region and the boundaries is controlled by α and β.
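A discrete analogue of |∇_d T| in (4.24) can be sketched by replacing the directional derivatives with one-sided tensor distances to grid neighbors. The Frobenius norm below is only a stand-in for the chapter's tensor distance dist(·, ·), and the function names are assumptions.

```python
import numpy as np

def frob_dist(a, b):
    """Placeholder tensor distance; the chapter's own tensor distance
    would be substituted here."""
    return np.linalg.norm(a - b)

def grad_magnitude(T, dist=frob_dist):
    """Discrete analogue of |grad_d T| in (4.24): at each interior pixel,
    combine one-sided tensor distances to the grid neighbours (unit
    spacing).  T is an (H, W, 2, 2) field of SPD matrices."""
    H, W = T.shape[:2]
    g = np.zeros((H, W))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            dx = dist(T[i + 1, j], T[i, j])
            dy = dist(T[i, j + 1], T[i, j])
            g[i, j] = np.hypot(dx, dy)
    return g
```

On a constant field this magnitude vanishes, and it is large exactly across tensor discontinuities, which is what the smoothness term of (4.24) penalizes away from C.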
When α is extremely large, equation (4.24) reduces to a simplified form which aims to capture piecewise constant regions of two types:

E(C, T₁, T₂) = ∫_R dist²(T(x), T₁) dx + ∫_{R^c} dist²(T(x), T₂) dx + β|C|    (4.25)

where R is the region enclosed by C and R^c is the region outside C, and T₁ and T₂ are the mean values of the diffusion tensor field in regions R and R^c respectively. The above model can be viewed as a modification of the active contour model without edges for scalar valued images by Chan and Vese [55]. It can segment tensor fields with two constant regions (each region type however can have disconnected parts) in a very efficient way. We incorporate our new tensor distance in this active contour model to achieve DTI segmentation.

4.4.1 Curve Evolution Equation

The Euler-Lagrange equation for the variational principle (4.25) is

[βκ − dist²(T, T₁) + dist²(T, T₂)] N = 0,    T₁ = M(T, R),    T₂ = M(T, R^c)

where κ is the curvature of the curve C and N is the outward normal to the curve. When T₁ and T₂ are fixed, we have the following curve evolution form for the above equation:

∂C/∂t = [βκ − dist²(T, T₁(t)) + dist²(T, T₂(t))] N,    where T₁ = M(T, R), T₂ = M(T, R^c)    (4.26)

4.4.2 Level Set Formulation

The curve evolution equation (4.26) can be easily implemented in a level set framework. As shown in section 4.3, we have

∂φ/∂t = [β ∇·(∇φ/|∇φ|) − dist²(T, T₁) + dist²(T, T₂)] |∇φ|    (4.27)

where φ is the signed distance function of C.

4.4.3 Implementation and Numerical Methods

We follow the two-stage implementation described in Chan and Vese [55]; our modified procedure is as follows:

Algorithm 3 Two Stage Piecewise Constant Segmentation of DTIs
1: Set initial curve C₀ and compute its signed distance function φ₀.
2: Compute T₁ and T₂ according to equation (4.15).
3: Update the signed distance function φ according to equation (4.27).
4: Reinitialize φ using the updated zero level set.
5: Stop if the solution is achieved, else go to step 2.

The key to the computation of T₁ and T₂ is the computation of the square root of an SPD matrix.
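That square root can be sketched with an eigendecomposition, anticipating the diagonalization-based procedure of Algorithm 4; `spd_power` and `sqrt_sandwich` are hypothetical helper names.

```python
import numpy as np

def spd_power(a, c):
    """A^c for a symmetric positive definite matrix via diagonalization
    A = O D O^T: A^c = O D^c O^T."""
    w, o = np.linalg.eigh(a)
    return (o * w ** c) @ o.T   # scales the columns of O by d_i^c

def sqrt_sandwich(a, b):
    """sqrt( sqrt(B) A sqrt(B) ), following the steps of Algorithm 4."""
    p = spd_power(b, 0.5)        # P = sqrt(B)
    q = p @ a @ p                # Q = P A P
    return spd_power(q, 0.5)     # sqrt(Q)
```

Because `eigh` returns orthonormal eigenvectors of a symmetric matrix, both power computations stay symmetric and positive definite, which the tensor mean computation requires.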
We use matrix diagonalization to achieve this: a real symmetric matrix A can be diagonalized as A = O D Oᵀ, where O is an orthogonal matrix and D = diag{d₁, d₂, ..., dₙ} is a diagonal matrix. A power of A is then A^c = O D^c Oᵀ, where D^c = diag{d₁^c, d₂^c, ..., dₙ^c}. Note that the term √(√B A √B) in equation (4.15) has to be computed as follows:

Algorithm 4 Computing √(√B A √B)
1: Diagonalize B as O_B D_B O_Bᵀ.
2: Compute P = √B as O_B √D_B O_Bᵀ.
3: Compute Q as P A P.
4: Diagonalize Q as O_Q D_Q O_Qᵀ.
5: Compute √Q as O_Q √D_Q O_Qᵀ.

Equation (4.27) can be easily discretized using an explicit Euler scheme. Assuming the spatial grid size to be 1, the finite differences of the partial derivatives are

Δ₀ˣφ_ij = (φ_{i+1,j} − φ_{i−1,j})/2,    Δ₀ʸφ_ij = (φ_{i,j+1} − φ_{i,j−1})/2
Δˣˣφ_ij = φ_{i+1,j} − 2φ_ij + φ_{i−1,j},    Δʸʸφ_ij = φ_{i,j+1} − 2φ_ij + φ_{i,j−1}
Δˣʸφ_ij = (φ_{i+1,j+1} − φ_{i+1,j−1} − φ_{i−1,j+1} + φ_{i−1,j−1})/4

In this case, we have the following update equation:

φ_ij^{n+1} = φ_ij^n + Δt [βκ_ij − dist²(T_ij, T₁ⁿ) + dist²(T_ij, T₂ⁿ)] √((Δ₀ˣφ_ij)² + (Δ₀ʸφ_ij)²)    (4.28)

where the curvature κ_ij of φ_ij^n can be computed as

κ_ij = [Δˣˣφ_ij (Δ₀ʸφ_ij)² − 2 Δ₀ˣφ_ij Δ₀ʸφ_ij Δˣʸφ_ij + Δʸʸφ_ij (Δ₀ˣφ_ij)²] / [(Δ₀ˣφ_ij)² + (Δ₀ʸφ_ij)²]^{3/2}

Updating according to equation (4.28) on the whole domain Ω has a complexity of O(n²) for an n × n grid and is going to be slow when the domain is huge. As we are most interested in the evolving zero level set, updating only a narrow band around the zero level set is sufficient, and this can be achieved using the narrow band method described in [70, 71]. In order to maintain φ as a signed distance function of C, it is necessary to reinitialize φ, and this too can be done only within a narrow band. There are also several other efficient numerical schemes that one may employ, for example the multigrid scheme as was done in Tsai et al. [53]. At this time, our explicit Euler scheme with the narrow band method yielded reasonably fast solutions (35 secs. for the 2D synthetic data examples and several minutes for the 3D real DTI examples on a 1GHz Pentium 3 CPU).
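One explicit Euler step of the update (4.28) can be sketched as follows, using the central differences above for |∇φ| and the curvature. The Euclidean mean and Frobenius distance stand in for the tensor mean M and dist of the text, and φ > 0 is taken as the inside of C (a convention choice); all function names are assumptions.

```python
import numpy as np

def curvature(phi):
    """Central-difference curvature kappa = div(grad phi / |grad phi|)."""
    px = (np.roll(phi, -1, 0) - np.roll(phi, 1, 0)) / 2
    py = (np.roll(phi, -1, 1) - np.roll(phi, 1, 1)) / 2
    pxx = np.roll(phi, -1, 0) - 2 * phi + np.roll(phi, 1, 0)
    pyy = np.roll(phi, -1, 1) - 2 * phi + np.roll(phi, 1, 1)
    pxy = (np.roll(np.roll(phi, -1, 0), -1, 1) - np.roll(np.roll(phi, -1, 0), 1, 1)
           - np.roll(np.roll(phi, 1, 0), -1, 1) + np.roll(np.roll(phi, 1, 0), 1, 1)) / 4
    denom = (px ** 2 + py ** 2) ** 1.5 + 1e-8
    return (pxx * py ** 2 - 2 * px * py * pxy + pyy * px ** 2) / denom

def segment_step(phi, T, beta=0.1, dt=0.1):
    """One explicit update of (4.28).  phi is the level set function
    (phi > 0 inside C); T is an (H, W, 2, 2) tensor field.  The Euclidean
    mean and squared Frobenius distance stand in for M and dist^2."""
    inside = phi > 0
    T1 = T[inside].mean(axis=0)               # mean tensor inside C
    T2 = T[~inside].mean(axis=0)              # mean tensor outside C
    d1 = ((T - T1) ** 2).sum(axis=(-2, -1))   # squared distances to T1
    d2 = ((T - T2) ** 2).sum(axis=(-2, -1))   # squared distances to T2
    px = (np.roll(phi, -1, 0) - np.roll(phi, 1, 0)) / 2
    py = (np.roll(phi, -1, 1) - np.roll(phi, 1, 1)) / 2
    grad = np.sqrt(px ** 2 + py ** 2)
    return phi + dt * (beta * curvature(phi) - d1 + d2) * grad
```

Iterating this step moves each pixel toward the region whose mean tensor it resembles, while the curvature term keeps the zero level set smooth.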
4.5 Piecewise Smooth DTI Segmentation Model

In certain cases, the piecewise constant assumption will be violated, and the piecewise smooth model (4.24) has to be employed. Following the curve evolution implementation of the Mumford-Shah functional by Tsai et al. [53], we have the following two-stage scheme. In the smoothing stage, the curve is fixed and smoothing is performed inside and outside the curve while preserving the discontinuity across the curve. In the curve evolution stage, the smoothed tensor field inside and outside the curve is fixed while the curve is allowed to move.

4.5.1 Discontinuity Preserving Smoothing

If the curve is fixed, we have

E_C(T) = ∫_Ω dist²(T(x), T₀(x)) dx + α ∫_{Ω\C} |∇_d T(x)|² dx    (4.29)

For implementation, the above variational principle is discretized as follows:

E_C(T) = Σ_x dist²(T(x), T₀(x)) + α Σ_{(x,y)∈N_C} dist²(T(x), T(y))    (4.30)

where N_C defines a collection of neighboring pixel pairs. If a pair (x, y) cuts across the boundary, it is excluded from N_C. As we have an energy functional of a tensor field on a discrete grid, we can compute its gradient with respect to this discrete tensor field. A straightforward way to do this is to treat all the independent components of the tensors as the components of a vector and compute the gradient of this energy function with respect to this vector. However, the form of the gradient will not be compact. Instead, we use the derivative of a matrix function f(A) with respect to its matrix variable, defined as

∂f(A)/∂A = [∂f(A)/∂A(i, j)]    (4.31)

It is not hard to see that

∂f(A)/∂A(i, j) = lim_{dt→0} [f(A + dt E_ij) − f(A)] / dt    (4.32)

where E_ij is a matrix with 1 at location (i, j) and 0 elsewhere.
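The componentwise definition (4.32) can be checked numerically: the sketch below approximates the matrix derivative of any scalar-valued matrix function by finite differences (the helper name is an assumption).

```python
import numpy as np

def matrix_grad_fd(f, a, eps=1e-6):
    """Numerical matrix derivative of a scalar function f of a matrix:
    entry (i, j) approximates (f(A + dt E_ij) - f(A)) / dt, the limit in
    equation (4.32), with dt = eps."""
    g = np.zeros_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            e = np.zeros_like(a)
            e[i, j] = eps          # eps * E_ij
            g[i, j] = (f(a + e) - f(a)) / eps
    return g
```

For instance, f(A) = tr(CA) has matrix derivative Cᵀ under this definition, which such a finite-difference check recovers; the same check is useful for validating closed-form gradients like (4.34).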
The directional derivative with respect to a perturbation V on T(x) is given by

lim_{dt→0} (1/dt) [E_C(T(x) + dtV) − E_C(T(x))]
= lim_{dt→0} (1/dt) [dist²(T(x) + dtV, T₀(x)) + α Σ_{y∈N_C(x)} dist²(T(x) + dtV, T(y)) − dist²(T(x), T₀(x)) − α Σ_{y∈N_C(x)} dist²(T(x), T(y))]
= (1/4) tr[(T₀⁻¹(x) − T⁻¹(x)T₀(x)T⁻¹(x)) V] + (α/4) Σ_{y∈N_C(x)} tr[(T⁻¹(y) − T⁻¹(x)T(y)T⁻¹(x)) V]
= (1/4) tr[(B − T⁻¹(x)AT⁻¹(x)) V]    (4.33)

where

A = T₀(x) + α Σ_{y∈N_C(x)} T(y),    B = T₀⁻¹(x) + α Σ_{y∈N_C(x)} T⁻¹(y)

We then have the gradient from equations (4.32) and (4.33):

∂E_C/∂T(x) = (1/4) [B − T⁻¹(x)AT⁻¹(x)]    (4.34)

So the minimizer of the variational principle (4.30) satisfies

B = T⁻¹(x)AT⁻¹(x)    (4.35)

4.5.2 Curve Evolution Equation and Level Set Formulation

Once the boundary discontinuity preserving smoothed tensor field is fixed, we have

E_T(C) = ∫_R dist²(T_R(x), T₀(x)) dx + ∫_{R^c} dist²(T_{R^c}(x), T₀(x)) dx + α ∫_R |∇_d T_R(x)|² dx + α ∫_{R^c} |∇_d T_{R^c}(x)|² dx + β|C|    (4.36)

Thus the curve evolution equation will be

∂C/∂t = { βκ − dist²(T_R, T₀) + dist²(T_{R^c}, T₀) + α (|∇_d T_{R^c}|² − |∇_d T_R|²) } N    (4.37)

Again for implementation, we have

∂C/∂t = [ −dist²(T_R, T₀) + dist²(T_{R^c}, T₀) ] N + α [ Σ_{y∈N_{R^c}(x)} dist²(T_{R^c}, T_{R^c}(y)) − Σ_{y∈N_R(x)} dist²(T_R, T_R(y)) ] N + βκN    (4.38)

The level set formulation for (4.38) is then given by

∂φ/∂t = [ β ∇·(∇φ/|∇φ|) + F ] |∇φ|    (4.39)

where φ is the signed distance function of C and the data-dependent speed F is given by

F = −dist²(T_R, T₀) + dist²(T_{R^c}, T₀) + α [ Σ_{y∈N_{R^c}(x)} dist²(T_{R^c}, T_{R^c}(y)) − Σ_{y∈N_R(x)} dist²(T_R, T_R(y)) ]

4.5.3 Implementation and Numerical Methods

As in Algorithm 3, and following Tsai et al. [53], we have the following two-stage implementation. The major difference here lies in the computation of T_R and T_{R^c}.

Algorithm 5 Two Stage Piecewise Smooth Segmentation for DTI
1: Set initial curve C₀ and compute its signed distance function φ₀.
2: Compute T_R and T_{R^c}.
3: Update the level set function φ according to equation (4.39).
4: Reinitialize φ using the updated zero level set.
5: Stop if the solution is achieved. Otherwise repeat from step 2.

As the gradient can be computed, it is easy to design efficient numerical algorithms to achieve the discontinuity preserving smoothing. Currently, we use gradient descent with adaptive step size due to its simplicity.
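A minimal sketch of that per-pixel gradient descent, assuming A and B have been precomputed as in (4.33) and using a crude adaptive step (halve the step whenever positive definiteness would be lost); the function name is an assumption.

```python
import numpy as np

def smooth_tensor(t, a, b, step=0.5, iters=200):
    """Gradient descent on the per-pixel smoothing energy, whose gradient
    is (1/4)(B - T^{-1} A T^{-1}) per equation (4.34).  A and B collect
    the data and neighbour terms as in (4.33).  The step is halved when
    the candidate iterate loses positive definiteness."""
    for _ in range(iters):
        ti = np.linalg.inv(t)
        grad = 0.25 * (b - ti @ a @ ti)
        cand = t - step * grad
        cand = (cand + cand.T) / 2          # keep the iterate symmetric
        if np.all(np.linalg.eigvalsh(cand) > 0):
            t = cand                         # accept the SPD iterate
        else:
            step *= 0.5                      # shrink the step and retry
    return t
```

At convergence the iterate satisfies the optimality condition (4.35), B = T⁻¹AT⁻¹, which can be verified directly on simple cases.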
4.6 Experimental Results

In this section, we present several sets of experiments on the application of our DTI segmentation model. The first is on 2D synthetic data sets, the second on single slices of real DTIs estimated from DWIs, and the last on 3D real DTIs. In the experiments below, unless explicitly stated otherwise, the segmentation model used is the piecewise constant model.

4.6.1 Synthetic Tensor Field Segmentation

The purpose of these experiments is to demonstrate the need to use the full information contained in the tensors for segmentation purposes. To this end, we synthesize two tensor fields, both 2 × 2 symmetric positive definite matrix-valued images on a 128 × 128 lattice with two homogeneous regions. The two regions in the first tensor field differ only in orientation, while the two regions in the second tensor field differ only in scale. These two tensor fields are visualized as ellipses as shown in Fig. 4.3(a) and Fig. 4.4(a) respectively. Each ellipse's axes correspond to the eigenvector directions of the diffusion tensor and are scaled by the eigenvalues. With an arbitrary initialization of the geometric active contour, our proposed model can yield high quality segmentation results as shown in Figs. 4.3 and 4.4. The evolving boundaries of the segmentation are shown as curves in red. Note that the first tensor field cannot be segmented using only the scalar anisotropic properties of the tensors as in [36], and the second tensor field cannot be segmented using only the dominant eigenvectors of the tensors. These two examples show that one must use the full information contained in the tensors to achieve quality segmentation. As our model is a region-based segmentation method, it is more resistant to noise than the gradient-based snakes. Figures 4.5 and 4.6 depict the segmentation process for synthetic noisy tensor fields, and the results are very satisfactory.
4.6.2 Single Slice DTI Segmentation

Figure 4.7 shows a slice of the DTI estimated from the DWIs of a normal rat spinal cord. Each of the six independent components of the individual symmetric positive definite diffusion tensors in the DTI is shown as a scalar image in the top row. The arrangement of the components from left to right is Dxx, Dyy, Dzz, Dxy, Dyz and Dxz. The off-diagonal terms are greatly enhanced by brightness and contrast factors for better visualization. An ellipsoid visualization of the same slice as in the top row is shown in Fig. 4.7(g). Each ellipsoid's axes correspond to the eigenvector directions of the 3 × 3 diffusion tensor and are scaled by the eigenvalues, and its color from blue to red indicates the lowest to highest degree of anisotropy. For example, diffusion tensors in the free water region are shown as large round blue ellipsoids. Figure 4.8 demonstrates the separation of the gray matter and white matter inside the normal rat spinal cord, with the evolving curve in red superimposed on the ellipsoid visualization of the DTI. Similarly, Fig. 4.9 shows a slice of the DTI of a normal rat brain in the region of the corpus callosum, and Fig. 4.10 depicts the segmentation procedure for the corpus callosum. In the final step, the essential part of the corpus callosum is captured by the proposed piecewise constant segmentation model. To further capture the bending horns of the corpus callosum, we use the segmentation result of the piecewise constant model as initialization and apply the piecewise smooth model. The result is shown in Fig. 4.11 and is a much better refinement. In all the above experiments, we exclude the free water regions, which are not of interest in the biological context.

4.6.3 DTI Segmentation in 3D

Now we demonstrate some segmentation results for 3D DTI. Figure 4.12 depicts the segmentation procedure for a normal rat spinal cord DTI of size 108 × 108 × 10. Figure 4.12 (a)-(d) clearly depict the surface evolution in 3D and Fig.
4.12 (e)-(h) depict the intersection of the propagating surface in (a)-(d) with a slice of the Dxx component of the DTI. The separation of the gray matter and white matter inside the normal rat spinal cord is achieved with ease. Similarly, Fig. 4.13 (a)-(h) show the segmentation procedure for a normal rat brain DTI of size 114 × 108 × 12. In addition, intersections of the final 3D segmentation with different slices of the Dxx component of the DTI are shown in Fig. 4.13 (i)-(l). It is evident that a majority of the corpus callosum inside this volume is captured. Again, we exclude the free water regions, which are not of interest in the biological context.

Figure 4.3: Segmentation of a synthetic tensor field where the two regions differ only in orientation. (b), (c) and (d) are the initial, intermediate and final steps of the curve evolution process in the segmentation.

Figure 4.4: Segmentation of a synthetic tensor field where the two regions differ only in scale. (b), (c) and (d) are the initial, intermediate and final steps of the curve evolution process in the segmentation.
Figure 4.5: Segmentation of a synthetic tensor field with two roughly homogeneous regions that differ only in orientation. (b)-(d) are the initial, intermediate and final steps of the curve evolution process for segmentation of (a).

Figure 4.6: Segmentation of a synthetic tensor field with two roughly homogeneous regions that differ only in scale. (b)-(d) are the initial, intermediate and final steps of the curve evolution process for segmentation of (a).

Figure 4.7: A slice of the DTI of a normal rat spinal cord. (a)-(f): viewed channel by channel as gray scale images. (g): viewed using ellipsoids.

Figure 4.8: Segmentation of the slice of DTI shown in Fig. 4.7. (a)-(d): initial, intermediate and final steps in separating the gray and white matter inside the rat spinal cord.