
Citation 
 Permanent Link:
 http://ufdc.ufl.edu/AA00039579/00001
Material Information
 Title:
 Image reconstruction algorithms for achieving high-resolution positron emission tomography images
 Creator:
 Chang, Ji-Ho
 Publication Date:
 2004
 Language:
 English
 Physical Description:
 viii, 160 leaves : ill. ; 29 cm.
Subjects
 Subjects / Keywords:
 Algorithms ( jstor )
Image contrast ( jstor )
Image reconstruction ( jstor )
Matrices ( jstor )
Objective functions ( jstor )
Penalty function ( jstor )
Pets ( jstor )
Photons ( jstor )
Standard deviation ( jstor )
Tumors ( jstor )
Dissertations, Academic -- Electrical and Computer Engineering -- UF
Electrical and Computer Engineering thesis, Ph. D
Image processing -- Digital techniques ( lcsh )
Imaging systems in medicine -- Mathematical models ( lcsh )
Tomography, Emission ( lcsh )
 Genre:
 bibliography ( marcgt )
theses ( marcgt )
nonfiction ( marcgt )
Notes
 Thesis:
 Thesis (Ph. D.)--University of Florida, 2004.
 Bibliography:
 Includes bibliographical references.
 General Note:
 Printout.
 General Note:
 Vita.
 Statement of Responsibility:
 by Ji-Ho Chang.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. §107) for non-profit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
 Resource Identifier:
 022820658 ( ALEPH )
880637461 ( OCLC )


Full Text 
IMAGE RECONSTRUCTION ALGORITHMS
FOR ACHIEVING HIGH-RESOLUTION
POSITRON EMISSION TOMOGRAPHY IMAGES
By
JI-HO CHANG
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2004
Copyright 2004
by
Ji-Ho Chang
I dedicate this work to my family.
ACKNOWLEDGMENTS
First of all, I would like to thank my academic advisor, Dr. John M. M. Anderson, for his great guidance of my research. I also would like to thank my committee members, Dr. Jian Li, Dr. Fred J. Taylor, and Dr. Bernard A. Mair, for their valuable comments and corrections on my dissertation. In particular, I would like to thank Dr. Bernard A. Mair for his comments and corrections on the convergence proofs of the proposed algorithms. I thank Yoonchul Kim and Kerkil Choi, who have shared valuable discussions related to my research with me. Finally, I am grateful to my family, who have always prayed for my health.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS ............................. iv

ABSTRACT ................................... vii

CHAPTER

1 BACKGROUND ............................... 1

   1.1 Overview of Positron Emission Tomography (PET) ......... 2
   1.2 PET Scanner ............................. 3
   1.3 Sources of Error ........................... 6
   1.4 System Model for Emission Data .................. 10

2 LITERATURE REVIEW .......................... 16

   2.1 Penalized Maximum Likelihood Algorithms ............. 17
   2.2 Scatter Correction Methods ..................... 19
   2.3 Summary of the Proposed Algorithms ................ 20

3 PENALIZED MAXIMUM LIKELIHOOD ALGORITHM ......... 23

   3.1 Penalized Maximum Likelihood (PML) Algorithm ......... 24
   3.2 Convergence Proof .......................... 33
   3.3 Properties of the PML Algorithm .................. 37

4 ACCELERATED PENALIZED MAXIMUM LIKELIHOOD ALGORITHM 39

   4.1 Convergence Proof .......................... 40
   4.2 Accelerated Penalized Maximum Likelihood (APML) Algorithm . 44
   4.3 Direction Vectors ........................... 52
   4.4 Properties of the APML Algorithm ................. 55

5 QUADRATIC EDGE PRESERVING ALGORITHM ............ 57

6 JOINT EMISSION MEAN AND PROBABILITY MATRIX ESTIMATION 63

   6.1 Scatter Matrix Model ........................ 65
   6.2 Joint Minimum Kullback-Leibler Distance Method ......... 68
   6.3 Probability Correction in Projection Space (PCiPS) Algorithm .. 69

7 SIMULATIONS AND EXPERIMENTAL STUDY ............. 79

   7.1 Regularized Image Reconstruction Algorithms ........... 80
       7.1.1 Synthetic Data ........................ 82
       7.1.2 Real Data ........................... 87
   7.2 APML Algorithm ........................... 112
   7.3 PCiPS Algorithm ........................... 116

8 CONCLUSIONS AND FUTURE WORK .................. 138

   8.1 Conclusions .............................. 138
   8.2 Future Work ............................. 142

APPENDIX

A HUBER'S SURROGATE FUNCTIONS .................. 144

B SURROGATE FUNCTIONS FOR PENALTY FUNCTION ........ 145

C STRICT CONVEXITY OF PML OBJECTIVE FUNCTION ........ 147

D SOLUTION TO UNCONSTRAINED OPTIMIZATION PROBLEM
  IN MODIFIED APML LINE SEARCH .................. 149

E CONVEXITY OF SURROGATE FUNCTIONS
  FOR OBJECTIVE FUNCTIONS IN APML LINE SEARCH ....... 151

REFERENCES .................................. 154

BIOGRAPHICAL SKETCH ........................... 160
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
IMAGE RECONSTRUCTION ALGORITHMS FOR ACHIEVING
HIGH-RESOLUTION POSITRON EMISSION TOMOGRAPHY IMAGES

By
Ji-Ho Chang
August 2004
Chair: John M. M. Anderson
Major Department: Electrical and Computer Engineering
In this dissertation, we present four algorithms for reconstructing high-resolution images in PET. The first algorithm, referred to as the penalized maximum likelihood (PML) algorithm, iteratively minimizes a PML objective function. At each iteration, the PML algorithm generates a function, called a surrogate function, that satisfies certain conditions. The next iterate is defined to be the nonnegative minimizer of the surrogate function. The PML algorithm utilizes standard decoupled surrogate functions for the maximum likelihood objective function of the data and decoupled surrogate functions for a certain class of penalty functions. As desired, the PML algorithm guarantees nonnegative estimates and monotonically decreases the PML objective function with increasing iterations. For the case where the PML objective function is strictly convex, which is true for the class of penalty functions under consideration, the PML algorithm has been shown to converge to the minimizer of the PML objective function.
The drawback of the PML algorithm is that it converges slowly. Thus, a "fast" version of the PML algorithm, referred to as the accelerated PML (APML) algorithm,
was developed, in which an additional search step, called a pattern search step, is performed after each standard PML iteration. In the pattern search step, an accelerated iterate, which has lower cost than the standard PML iterate, is found by solving a certain constrained optimization problem that arises at each pattern search step. The APML algorithm retains the nice properties of the PML algorithm.
The third algorithm, referred to as the quadratic edge preserving (QEP) algorithm, aims to preserve edges in the reconstructed images so that fine details, such as small tumors, are more resolvable. The QEP algorithm is based on an iteration-dependent, decoupled penalty function that introduces smoothing while preserving edges. The penalty function was developed by modifying the surrogate functions of the penalty function for the PML method.
In PET, there are several errors that have the net effect of introducing blur into the reconstructed images. We propose a method that aims to reduce blur in PET images. The method is based on the assumption that the "true" probability matrix for the observed emission data is a product of an unknown nonnegative matrix, called a scatter matrix, and a "conventional" probability matrix. Under the suggested framework, the problem is to jointly estimate the scatter matrix and emission means. We propose an alternating minimization algorithm to estimate them by minimizing a certain distance.
The algorithms are qualitatively and quantitatively assessed using synthetic phantom data and real phantom data.
CHAPTER 1
BACKGROUND
Medical imaging modalities, such as X-ray computed tomography and magnetic resonance imaging, are used to obtain images of anatomical structures within the human body. However, in certain medical applications, it is also important to obtain physiological information. The reason is that physiological changes can indicate disease states earlier than anatomical changes [1]. Positron emission tomography (PET) and single-photon emission computed tomography (SPECT) are widely used medical imaging modalities that acquire physiological information on both human and animal subjects.
In SPECT, physiological information is obtained by imaging the distribution of gamma-ray or X-ray emitting radioisotopes within the human body [1]. After the radioisotopes are introduced into the human body, they decay and emit gamma-ray or X-ray single photons. A SPECT scanner detects these photons with the help of collimators that are made of lead. Since a large percentage of the photons are absorbed by the lead collimators, the sensitivity and accuracy of SPECT are limited.
In PET, there is no need for lead collimators because collimation is performed by electronic circuits that are connected to the detectors. Consequently, PET possesses relatively high sensitivity and accuracy when compared to SPECT. Although cost is the major limitation of PET, recent research advances, such as less expensive materials for detectors and scanner configurations that need a smaller number of detectors than conventional scanners, have helped decrease its cost [1]. The main advantage of SPECT over PET is its substantially lower cost.
We present a brief overview of PET, PET scanners, and certain sources of error in PET in Sections 1.1, 1.2, and 1.3, respectively. Then, we provide some background on the image reconstruction problem in PET.
1.1 Overview of Positron Emission Tomography (PET)
Most clinical applications of PET are in oncological cases involving the diagnosis and staging of cancer, treatment planning and monitoring of tumors, detection of recurrent tumors, and localization of biopsy sites in cases when there are tumors in the head or neck [2,3]. PET is also used in the diagnosis of coronary artery disease and other heart diseases [2,3].
In PET, physiological information is acquired by imaging the distribution of positron-emitting isotopes, such as ¹³N, ¹⁵O, ¹⁸F, and ¹¹C, within the human body [3]. The positron-emitting isotopes are bound to compounds with known biochemical properties. Compounds that are labeled with positron-emitting isotopes are called radiopharmaceuticals. The choice of the radiopharmaceutical depends on its application. For example, 2-[¹⁸F]fluoro-2-deoxy-D-glucose ([¹⁸F]FDG) is used for imaging brain tumors while [¹³N]ammonia is used for the detection of coronary artery disease [3]. After the radiopharmaceutical is introduced into the subject (by injection or inhalation), positrons are emitted as the positron-emitting isotopes decay. An emitted positron annihilates with a nearby electron within the body, causing the generation of two high-energy (511-keV) photons. The two photons, which can penetrate the subject, travel in almost opposite directions. Ideally, photon pairs that are generated by a positron annihilating with an electron will be detected by a pair of detectors.
If two electronically connected detectors detect a pair of photons within a short time interval (e.g., < 10 ns), then the detection is recorded along the line connecting the two detectors, which is called a line-of-response. In the absence of errors, the detection indicates that there was an annihilation somewhere along the line-of-response. The detection of a photon pair is referred to as a coincidence. In addition,
the timing interval of a scanner that is used to define a coincidence is called the scanner's coincidence timing window. Since there are many, many detector pairs in a PET scanner, sufficient information is available for reconstructing a map of the concentration of the radiopharmaceutical. For each detector pair, the number of coincidences that occur during the scan is summed. The emission data is the set of coincidence sums for all of the possible detector pairs.
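As a toy illustration of how the emission data vector is formed (the detector-pair indices and event list below are made up for illustration, not taken from the dissertation), the per-pair coincidence sums can be sketched as:

```python
import numpy as np

# Hypothetical stream of detected coincidences, each tagged with the
# index of the detector pair (line-of-response) that recorded it.
coincidence_events = [3, 0, 3, 1, 3, 2, 0, 3]  # made-up event list

num_detector_pairs = 4  # I, the number of detector pairs

# The emission data d is the vector of coincidence sums: d[i] counts
# how many coincidences detector pair i recorded during the scan.
d = np.bincount(coincidence_events, minlength=num_detector_pairs)
print(d)  # -> [2 1 1 4]
```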
The key idea in PET is that emission data depends on the distribution of the radiopharmaceutical within the subject being scanned, which in turn depends on the metabolism of the subject. Consequently, numerous researchers have developed algorithms that reconstruct PET images whose pixel values represent the distribution of the radiopharmaceutical, and, ultimately, the metabolism of the subject.
1.2 PET Scanner
Typical PET scanners have a diameter of 80-100 cm and an axial extent of 10-20 cm [4]. Figure 1-1(a) shows a simplified PET scanner, and Figures 1-1(b) and 1-1(c) show two-dimensional views of the scanner. Usually, a PET scanner consists of hundreds of rectangular bundles of crystals that are formed into 20-30 rings of detectors, where each detector ring contains 300-600 detectors. Each bundle of crystals is connected to a few (e.g., 2-8) photomultiplier tubes (PMTs). Figure 1-1(d) shows such a block of crystals coupled to four PMTs. When a photon interacts with a crystal, light photons are emitted and the PMTs collect them. From the collected light, a PET scanner determines the crystal within which the scintillation occurred.
State-of-the-art PET scanners generally provide two scan modes: slice-collimated mode and fully three-dimensional mode [4]. In slice-collimated mode, or septa-extended mode, thin tungsten rings, called septa, are placed between the detector rings. Figure 1-2(a) illustrates slice-collimated mode, in which coincidences are collected within a detector ring. In fully three-dimensional mode, the septa are removed from the
Figure 1-1: Simplified PET scanner: (a) a simplified full-ring PET scanner with 8 detector rings, (b) two-dimensional view of the scanner at x = 0, (c) two-dimensional view of the scanner at y = 0, and (d) a block detector consisting of an array of 8 x 8 crystals coupled to four PMTs, where the origin of the rectangular coordinate system is at the center of the scanner.
Figure 1-2: Scan modes: (a) septa-extended mode and (b) fully three-dimensional mode. The dotted lines represent lines-of-response.
Figure 1-3: Projection definition using a zigzag scan: (a) a set of detector pairs that define a projection and (b) another projection. A dashed line represents a detector pair.
Figure 1-4: Illustration of positron range and a coincidence. Two concentric circles denote a detector ring.
scanner. Figure 1-2(b) illustrates fully three-dimensional mode, in which allowable lines-of-response are not restricted to occur within a detector ring.
Within a detector ring, detector pairs are grouped into projections, where a projection is a set of detector pairs that are defined by a particular "zigzag scan". Figure 1-3(a) shows a projection and the defining zigzag scan. Figure 1-3(b) shows another projection. A sinogram is defined to be an S x T matrix, where each column of the matrix contains a projection, S is the number of detector pairs within a projection, and T is the number of projections in the emission data.
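As a sketch of the sinogram arrangement (the dimensions are made up, and it is assumed here that each column of the S x T array holds one projection):

```python
import numpy as np

S, T = 4, 3              # made-up: S detector pairs per projection, T projections
d = np.arange(S * T)     # placeholder coincidence sums, ordered projection by projection

# Stack the T projections so that each column of the S x T sinogram
# contains one projection of S detector-pair sums.
sinogram = d.reshape(T, S).T

print(sinogram.shape)    # -> (4, 3)
```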
1.3 Sources of Error
After a positron is emitted, it travels a short distance before it annihilates with a nearby electron within the subject. The distance between the locations at which the emission and the annihilation take place is called the positron range. The positron range is proportional to the reciprocal of the density of the material through which an emitted positron travels [5]. Figure 1-4 is an illustration of positron range for a simplified scanner geometry. The positron range depends on the energy an emitted positron deposits (and,
Figure 1-5: Illustration of noncollinearity of the line-of-response. The dashed line indicates the path of the photon pair if they had departed in exactly opposite directions. The arrows show the actual photon paths.
consequently, the chosen positron-emitting isotope). For typical positron-emitting isotopes, the full width at half maximum (FWHM) of the distribution of the positron range is a few millimeters. For example, the FWHMs of the distributions of the positron range for ¹⁸F and ¹⁵O are about 2 mm and 8 mm, respectively [5, p. 331].
Two photons generated by an annihilation usually do not propagate in exactly opposite directions. This phenomenon is called noncollinearity of the line-of-response. Figure 1-5 is an illustration of noncollinearity of the line-of-response for a simplified scanner geometry. Given the direction in which one of the two photons propagates, we refer to the angle between the reverse of that direction and the actual direction in which the other photon travels as the "angle of noncollinearity". The distribution of the angle of noncollinearity can be approximated by a Gaussian distribution with a FWHM of 0.5 degrees [5].
The angle between the path that a photon propagates and the face of a detector when the photon hits the detector is referred to as the incident angle. The way that a photon interacts with a detector depends on the incident angle. If the path that a
Figure 1-6: Single and scatter: (a) illustration of attenuation and a single, and (b) illustration of a scattered coincidence. The arrow penetrating the detector ring denotes that the photon is scattered through an oblique angle such that it does not hit a detector. The dotted line denotes the incorrectly positioned line-of-response.

photon travels is not perpendicular to the face of a detector when the photon hits the detector, then it is possible that the photon does not interact with the detector that it strikes. The photon may instead interact with another detector near the one it hits originally. This phenomenon is termed detector penetration. Methods have been proposed to account for detector penetration [6, 7].
When a 511-keV photon propagating within a subject interacts with an electron, the photon may undergo a phenomenon known as Compton scattering: if the photon has sufficient energy, it gives part of its energy to the electron and is deflected from its original path. Compton scattering can lead to three kinds of error: attenuation, scatter, and accidental coincidence.
Most scattered photons are scattered out of the scanner's field-of-view, so many of them are not detected. This phenomenon is called attenuation. Figure 1-6(a) illustrates attenuation, where one of the two photons of an annihilation does not hit a detector due to Compton scattering. Figure 1-6(a) also illustrates an event called a single, where only one of the emitted photons of an annihilation is
Figure 1-7: Accidental coincidence: (a) illustration of an accidental coincidence due to two annihilations occurring at almost the same time and (b) illustration of an accidental coincidence due to Compton scattering.

detected. Note that attenuation leads to an incorrect decrease in emission counts. In order to address attenuation, numerous correction methods have been proposed (see, e.g., [8-10]).
Consider an annihilation event where Compton scattering occurs. It is still possible that a detector pair detects both photons even though one or both of the photons may have undergone Compton scattering. Such an event is called a scattered coincidence, or scatter, and is illustrated in Figure 1-6(b). Note that a scattered coincidence leads to an incorrect increase in emission counts. Since a scattered photon loses part of its energy, the energy of detected photons may be used to discriminate between unscattered photons and scattered photons [11, pp. 65-69].
Two photons arising from different annihilations can be recorded by a detector pair. This event is called an accidental coincidence. Figure 1-7(a) illustrates an accidental coincidence due to two annihilations occurring at almost the same time. Sometimes, an accidental coincidence may be due to Compton scattering, as illustrated in Figure 1-7(b). Like scatter, an accidental coincidence leads to an incorrect increase
in emission counts. For many PET scanners, the mean accidental coincidence rate is estimated using a "delayed" timing window technique [12].
The efficiency of a detector pair is defined to be the probability that a coincidence is recorded when a photon pair hits the detectors. Ideally, this probability should be one. However, the efficiencies of detector pairs are nonuniform because of their geometric differences and the nonuniform physical characteristics of the detectors. This nonuniformity of detector pairs is referred to as detector inefficiency. To address detector inefficiency, correction methods such as [13-15] have been proposed.
1.4 System Model for Emission Data

In 1982, Shepp and Vardi proposed a Poisson model for PET emission data and an algorithm, known as the maximum likelihood expectation maximization (MLEM) algorithm, for reconstructing maximum likelihood (ML) emission images [16]. In the Poisson model, the region-of-interest is divided into J equal-sized volume elements, called voxels. Ultimately, the goal is to estimate the mean number of positrons emitted from each voxel. Let the ith component of a vector d, d_i, represent the observed number of photon pairs recorded by the ith detector pair, and let the jth component of a vector x, x_j, represent the unknown mean number of emissions from the jth voxel. Further, let I and J denote the number of detector pairs and the number of voxels, respectively. The I x 1 vector d and the J x 1 vector x are the emission data and the unknown emission mean vector, respectively. Let P_{ij} denote the probability that an annihilation in the jth voxel leads to a photon pair being recorded by the ith detector pair. The I x J probability matrix P has P_{ij} as its (i, j)th element. In the Poisson model, Shepp and Vardi assumed that the emission data d is an observation of a random vector D with elements {D_i}_{i=1}^{I} that are Poisson distributed and independent. For all i, the mean of the random variable D_i is
E\{D_i\} = \sum_{j=1}^{J} P_{ij} x_j .   (1.1)
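The Poisson model is easy to simulate; a minimal sketch with a made-up probability matrix and made-up emission means (these values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

I, J = 6, 4  # hypothetical numbers of detector pairs and voxels

# Hypothetical probability matrix P: P[i, j] is the probability that an
# annihilation in voxel j is recorded by detector pair i.
P = rng.uniform(0.0, 0.1, size=(I, J))

# Unknown mean emissions per voxel (chosen arbitrarily here).
x = np.array([50.0, 80.0, 20.0, 60.0])

# Under the model, the data are independent Poisson counts whose
# mean vector is Px, i.e., E{D_i} = sum_j P_ij x_j.
mean = P @ x
d = rng.poisson(mean)

print(mean.shape, d.shape)  # each has I entries
```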
In practice, the probability matrix P is unknown and must be estimated somehow. The simplest way to estimate P is the angle-of-view (AOV) method [16]. In the AOV method, a detector ring and a detector are modelled by a circle and an arc on the circle, respectively. Moreover, within a voxel, all emitted positrons are assumed to be emitted from the center of the voxel. Figure 1-8 illustrates a detector ring, a detector pair, and the tube (i.e., spatial extent) that is defined by the detector pair. In the figure, the AOV from the point g_j to the detector pair (y, z) is also shown. Specifically, this AOV is defined to be
AOV from g_j to (y, z) =
    \min\{\angle a_y g_j b_y, \angle a_z g_j b_z, \pi - \angle a_y g_j b_z, \pi - \angle b_y g_j a_z\},  if g_j \in tube(y, z)
    0,  otherwise.   (1.2)
Said another way, the AOV in (1.2) is the maximum angle over which a line that goes through the point g_j will simultaneously intersect both detectors of the detector pair (y, z). In the AOV method, the probability P_{ij} is defined to be

P_{ij} = \frac{1}{\pi} \left( AOV from g_j to (y, z) \right) ,   (1.3)

where the detector pair (y, z) is the ith detector pair and the point g_j is the center point of the jth voxel. In the AOV method, it is assumed that a photon is detected by a detector whenever the photon hits the arc corresponding to the detector. Clearly, the AOV method does not account for the detector penetration discussed in Section 1.3. Some methods have been developed to address errors due to detector penetration [6, 7].
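Under the geometric reading above (the AOV taken as the minimum of four angles at g_j, normalized by π), the AOV probability can be sketched for a made-up detector-pair geometry; the coordinates below are illustrative only, not a real scanner layout:

```python
import numpy as np

def angle(vertex, p1, p2):
    """Angle at `vertex` subtended by points p1 and p2 (radians)."""
    v1, v2 = p1 - vertex, p2 - vertex
    c = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

# Made-up geometry: endpoints a_y, b_y and a_z, b_z of two opposing
# detector arcs, and a point g at the center of their tube.
a_y, b_y = np.array([-0.2, 1.0]), np.array([0.2, 1.0])
a_z, b_z = np.array([0.2, -1.0]), np.array([-0.2, -1.0])
g = np.array([0.0, 0.0])

# AOV from g to (y, z) for g inside the tube, per the definition above.
aov = min(angle(g, a_y, b_y),
          angle(g, a_z, b_z),
          np.pi - angle(g, a_y, b_z),
          np.pi - angle(g, b_y, a_z))

# Probability that an annihilation at g is recorded by this pair.
p_ij = aov / np.pi
print(round(p_ij, 3))  # -> 0.126
```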
In this dissertation, we make the following mild assumptions on the probability matrix P and the emission data d:
" (AS1) P has no row vector of zeros
" (AS2) dPj 0 for all j, where d' is the transpose of d and Pj is the jth
column of P.
Figure 1-8: The angle-of-view from the point g_j to the detector pair (y, z). The circle denotes a detector ring. The arcs (a_y, b_y) and (a_z, b_z) represent the detectors y and z, respectively. The area between the lines (a_y, b_z) and (b_y, a_z) represents the tube defined by the detector pair (y, z).
To see the implication of the second assumption, consider all the detector pairs for which the probability of recording a photon pair generated by an annihilation in the jth voxel is nonzero. Among this set of detector pairs, the second assumption implies that there exists at least one detector pair with nonzero emission counts. Assumption (AS2) is expected to hold whenever the duration of the emission scan is a reasonable length of time. In addition to (AS1) and (AS2), we assume that the probability matrix P accounts for errors due to attenuation, detector inefficiency, detector penetration, positron range, noncollinearity of the line-of-response, and scatter. In practice, there are correction methods for attenuation [9], detector inefficiency [15], and detector penetration [7]. However, other correction methods for detector penetration, attenuation, and detector inefficiency could be used in conjunction. Note that in Chapter 6, we do not assume that the probability matrix accounts for errors due to scatter and noncollinearity of the line-of-response. Instead, we present a method that estimates the probability matrix in such a way that errors due to scatter and noncollinearity of the line-of-response are addressed.
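For a given P and d, the two assumptions can be verified directly; a small sketch with made-up arrays:

```python
import numpy as np

# Made-up probability matrix and emission data for illustration.
P = np.array([[0.2, 0.0, 0.1],
              [0.0, 0.3, 0.0],
              [0.1, 0.1, 0.2]])
d = np.array([5, 0, 3])

# (AS1): P has no row vector of zeros.
as1 = np.all(P.sum(axis=1) > 0)

# (AS2): d'P_j != 0 for every column j, i.e., every voxel j is "seen"
# by at least one detector pair with nonzero counts.
as2 = np.all(d @ P > 0)

print(bool(as1), bool(as2))  # -> True True
```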
When accidental coincidences (i.e., randoms) are considered, the Poisson model must be modified so that the emission data d is an observation of a random vector D that is Poisson distributed with mean (Px + p), where the ith component of p, p_i, is the mean number of accidental coincidences recorded by the ith detector pair, i = 1, 2, ..., I. Usually, it is assumed that the mean accidental coincidence rate p is known. In practice, the mean accidental coincidence rate is estimated using a "delayed" timing window technique [12].
Given the emission data d, mean accidental coincidence rate p, and probability matrix P, the problem of interest is to estimate the mean number of positrons emitted from each voxel. Since it is assumed that the data are independent, it follows that
the likelihood function for the emission data d is given by

Pr\{D = d \mid x\} = \prod_{i=1}^{I} \frac{[Px + p]_i^{d_i} e^{-[Px + p]_i}}{d_i!} .   (1.4)

The ML estimate of x is defined to be the maximizer of the likelihood function over the feasible set. Equivalently, the ML estimate of the emission mean vector is given by

\hat{x}_{ML} = \arg\max_{x \ge 0} \Psi(x) ,   (1.5)

where \Psi(x) \triangleq \log Pr\{D = d \mid x\} is the log likelihood function:

\Psi(x) = \sum_{i=1}^{I} d_i \log([Px + p]_i) - \sum_{i=1}^{I} [Px + p]_i - \sum_{i=1}^{I} \log(d_i!)   (1.6)

(note: maximizing the likelihood function and maximizing the log likelihood function are equivalent operations).
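The log likelihood can be evaluated directly, using log(d_i!) = lgamma(d_i + 1) for the constant term; the system matrix, randoms means, data, and candidate image below are made up for illustration:

```python
import math
import numpy as np

# Made-up system matrix, randoms means, data, and candidate image.
P = np.array([[0.3, 0.1],
              [0.1, 0.4],
              [0.2, 0.2]])
p = np.array([0.5, 0.5, 0.5])   # mean accidental coincidences
d = np.array([4, 7, 5])
x = np.array([10.0, 12.0])

def log_likelihood(x, P, p, d):
    """Poisson log likelihood: sum_i d_i log([Px+p]_i)
    - sum_i [Px+p]_i - sum_i log(d_i!)."""
    mean = P @ x + p
    log_fact = sum(math.lgamma(di + 1) for di in d)
    return float(d @ np.log(mean) - mean.sum() - log_fact)

print(log_likelihood(x, P, p, d))
```

A candidate image whose predicted means are close to the data scores higher than one whose means are far from it, which is what the ML estimator exploits.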
Although the ML estimator has several nice theoretical properties [17, ch. 7], images produced by the ML method (i.e., the MLEM algorithm) have the drawback that they are extremely noisy. This is because the PET image reconstruction problem is ill-posed, owing to the facts that (1) scan times for data acquisition are short, (2) emission data contain errors due to attenuation, scatter, and accidental coincidences, and (3) the data obey Poisson statistics. Currently, the most popular way to address the ill-posed nature of the image reconstruction problem is through the use of penalty functions. Numerous penalized maximum likelihood (PML) methods (also known as Bayesian and maximum a posteriori (MAP) methods) have been proposed [18-35].
In the MAP method, x is assumed to be an observation of a random variable X with known distribution and the a posteriori distribution (i.e., conditional probability density function of X given D = d) is maximized. After some manipulations [17, p. 351], the MAP estimate is found to be the maximizer of the log likelihood function
plus the log of the probability density function of X:
\hat{x}_{MAP} = \arg\max_{x \ge 0} \{\Psi(x) + \log Q(x)\} ,   (1.7)
where the function Q is the probability density function of X (i.e., prior distribution). It is through the prior distribution Q that MAP methods have the ability to regularize the image reconstruction problem.
The form of the prior distribution commonly used in PET is Q(x) = C e^{-\beta \Lambda(x)}, where \Lambda is a scalar-valued function and C and \beta are constants. The constant C > 0 is chosen such that the area under the distribution Q equals one. The constant \beta > 0, known as the penalty parameter, controls the penalty function's degree of influence. Since it is known that PET images should be highly correlated, the penalty function \Lambda is designed in such a way that it forces the estimates of neighboring voxels to be similar in value. Given the definition of Q, the PML or MAP estimate is the nonnegative minimizer of the PML objective function
\Phi(x) = -\Psi(x) + \beta \Lambda(x) .   (1.8)
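As an illustration of this kind of objective, the sketch below evaluates a PML objective with a simple quadratic neighbor-difference penalty standing in for \Lambda; the penalty choice, system matrix, and data are made up for illustration (the dissertation's actual penalty class is developed in Chapter 3):

```python
import numpy as np

# Made-up system matrix, randoms means, data, and penalty parameter.
P = np.array([[0.3, 0.1, 0.0],
              [0.1, 0.4, 0.1],
              [0.0, 0.2, 0.3]])
p = np.array([0.5, 0.5, 0.5])
d = np.array([4, 7, 5])
beta = 0.1

def neg_log_likelihood(x):
    mean = P @ x + p
    # The constant sum_i log(d_i!) is dropped; it does not affect the minimizer.
    return float(mean.sum() - d @ np.log(mean))

def penalty(x):
    # Illustrative quadratic smoothness penalty on a 1-D image:
    # sum of squared differences between neighboring voxels.
    return float(np.sum(np.diff(x) ** 2))

def pml_objective(x):
    # -Psi(x) + beta * Lambda(x), up to an additive constant.
    return neg_log_likelihood(x) + beta * penalty(x)

x = np.array([10.0, 11.0, 9.0])
print(pml_objective(x))
```

Rougher images pay a larger penalty, so minimizing this objective trades data fit against smoothness, which is how the penalty regularizes the reconstruction.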
CHAPTER 2
LITERATURE REVIEW
Although the MLEM algorithm by Shepp and Vardi [16] produces ML estimates of the emission means, the images produced by the ML method are extremely noisy due to the fact that the image reconstruction problem in PET is ill-posed. As discussed in Section 1.4, the reasons are that (1) scan times for data acquisition are short, (2) emission data contain errors due to attenuation, scatter, and accidental coincidences, and (3) the data obey Poisson statistics. One way to obtain PET images with sufficient smoothness is to terminate the MLEM algorithm before the log likelihood function is maximized. Of course, the resulting images are not ML images. Another modification of the MLEM algorithm is to first obtain an ML image and then filter it with a low-pass filter. The drawback of this post-filtering is that it is not clear how the filter should be chosen. A variation of the filtering approach just described is to filter every MLEM iterate, as suggested by Silverman [36]. Silverman did not provide an answer as to how the filter should be chosen. Denoising the emission data (i.e., the observed data) is another way to regularize the PET image reconstruction problem [37, 38].
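The MLEM iteration under discussion has the well-known multiplicative form x <- x * P'(d / (Px)) / P'1 (this standard update is not restated in the text above; randoms are ignored here, and the system and data below are made up). Terminating the loop early is exactly the crude regularization just described:

```python
import numpy as np

def mlem(P, d, num_iters, x0=None):
    """Classical ML-EM update of Shepp and Vardi (randoms ignored):
    x <- x * P^T (d / (P x)) / P^T 1.
    Stopping after a few iterations acts as crude regularization."""
    _, J = P.shape
    x = np.ones(J) if x0 is None else x0.copy()
    sens = P.sum(axis=0)                    # P^T 1, the sensitivity image
    for _ in range(num_iters):
        ratio = d / np.maximum(P @ x, 1e-12)
        x = x * (P.T @ ratio) / np.maximum(sens, 1e-12)
    return x

# Made-up system and noiseless, consistent data for illustration.
P = np.array([[0.5, 0.2],
              [0.2, 0.5],
              [0.3, 0.3]])
x_true = np.array([40.0, 10.0])
d = P @ x_true
x_hat = mlem(P, d, num_iters=200)
print(x_hat)
```

With noiseless consistent data the iterates approach the exact solution; with real Poisson data, running the loop to convergence instead produces the noisy ML images that motivate the penalized methods reviewed in this chapter.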
Currently, the most popular way to introduce regularization is through the use of penalty functions that force the estimates of neighboring voxels to be similar in value. The basis for such penalty functions is that PET images should be highly correlated. The first part of this chapter is devoted to so-called penalized maximum likelihood (PML) algorithms. PML algorithms minimize PML objective functions, which are sums of the negative log likelihood function and a penalty function. In PET, PML and maximum a posteriori (MAP) are terms that are used for methods that minimize PML objective functions. In Section 2.1, we briefly review existing PML algorithms.
Reconstructed PML images are blurred by errors due to detector penetration, positron range, noncollinearity, and scatter. Errors due to scatter are difficult to correct because scatter depends on the activity and attenuation within the subject and on the scanner design. In Section 2.2, some scatter correction methods are reviewed. The regularized image reconstruction algorithms and the scatter correction algorithm we propose are compared with the existing algorithms in Section 2.3.
2.1 Penalized Maximum Likelihood Algorithms
In 1990, Green [23] proposed a PML algorithm, known as the one-step-late (OSL) algorithm. The algorithm can be viewed as a fixed point iteration that is derived from the Kuhn-Tucker equations [39, pp. 36-49] for the PML optimization problem. Incidentally, Shepp and Vardi showed that the MLEM algorithm could be derived from the Kuhn-Tucker conditions in a similar way. The OSL algorithm is straightforward to implement, but nonnegative estimates cannot be guaranteed and, like many existing algorithms, convergence is an open issue. Lange's goal [24] was to modify the OSL algorithm in such a way that the modified algorithm converges to the PML estimate. It should be pointed out that Lange's algorithm requires line searches, which can be computationally expensive.
Alenius et al. [32] suggested a Gaussian "type" prior that depends on the median of voxels within local neighborhoods, and introduced an algorithm called the median-root-prior (MRP) algorithm. The MRP algorithm is based on an iteration dependent objective function. Consequently, it really cannot be considered a PML algorithm. Nevertheless, the MRP algorithm generates "good" images in the sense that the noise level of the reconstructed images is suppressed. It should be mentioned that a PML algorithm was derived by Hsiao et al. [40] that resembles the MRP algorithm and performs similarly to the MRP algorithm. The PML algorithm by Hsiao et al. was derived using a prior that is based on a certain auxiliary vector.
Levitan and Herman [21] proposed a PML algorithm based on an assumption that the prior distribution of the true emission means was a multivariate Gaussian distribution. The assumption led to a penalty function that was in the form of a weighted least-squares distance between x and a reference image. However, they did not indicate how the reference image should be chosen.
An algorithm was proposed by Wu [27] using a wavelet decomposition formulation. Specifically, the author assumed that a vector consisting of the wavelet coefficients of the true emission means is a zeromean Gaussian random vector with a known covariance matrix. From this assumption, a prior distribution for the emission means was derived. The prior distribution is a zeromean Gaussian random vector with a covariance matrix that depends on the choice for the wavelet transform and the assumed distribution for the vector of wavelet coefficients. It should be pointed out that the assumption was not clearly justified in the paper.
Researchers have used an optimization algorithm, called the iterative coordinate descent (ICD) algorithm [41, pp. 283-287], to obtain estimates for various penalty functions [31], [42]. Convergence results are given for the penalized weighted least-squares method [42], and both algorithms (i.e., [31], [42]) enforce the nonnegativity constraint. Algorithms based on the ICD algorithm update each voxel in a serial manner, so parallel implementations of them may not be possible.
De Pierro [25, 30] derives PML algorithms that minimize certain surrogate functions that he constructs by exploiting the fact that the log likelihood function is concave and penalty functions, such as the quadratic penalty function, are convex. Except for the quadratic penalty function, closed form expressions for the minimizers of the surrogate functions do not exist. Consequently, an optimization method, such as Newton's method [43, pp. 201-202], is needed to minimize the surrogate functions. De Pierro presents some convergence results; however, the utility of his methods is unclear because no experimental results were provided. It should be noted that, in
the transmission tomography paper by Erdogan and Fessler [8], a quadratic surrogate function was used for a certain class of penalty functions. The quadratic surrogate function was developed by Huber [44, pp. 184-186].
A fast PML method, based on the ordered-subsets EM algorithm [45], was proposed by De Pierro and Yamagishi [33]. The authors show that if the sequence generated by the algorithm converges, then it must converge to the true PML solution. Recently, Ahn and Fessler proposed algorithms [35] that are based on the ordered-subsets EM algorithm [45], an algorithm by De Pierro and Yamagishi [33], and an algorithm by Erdogan and Fessler [10]. Like other algorithms based on the ordered-subsets EM algorithm, there is some uncertainty as to how the subsets are to be chosen. In the paper by Ahn and Fessler, the algorithms are said to converge to the nonnegative minimizer of the PML objective function for certain penalty functions and their accompanying parameters by using a relaxation parameter that diminishes with iterations. Open issues are how the relaxation parameters should be chosen in practice and how they affect the performance of the algorithms. The convergence rate also varies with the relaxation parameter.
2.2 Scatter Correction Methods
Many methods have been proposed to correct scattered coincidences [11, ch. 3]. They can be classified into a few categories: (1) energy window methods, (2) convolution/deconvolution methods, and (3) methods that calculate the scatter distribution.
One of the scatter correction methods is based on the use of multiple (two or more) energy windows [46, 47]. Recall that photons lose their energy when they have undergone Compton scattering. The principle of the method utilizing energy windows is to discard the detection of a photon whenever the energy of the photon is less than 511 keV. Since detectors have finite energy resolution, energy-window-based methods are inherently limited. Consequently, it is preferable to use another method jointly with the energy window techniques.
Another correction method for scatter is the convolution/deconvolution method [48-51]. The methods in [48, 49] assume that scattered coincidences (i.e., scatter) can be approximated by a convolution of unscattered coincidences and a certain scatter function. Under this assumption, the mean scatter in the observed coincidences is estimated, and the estimate can be subtracted from the emission data or incorporated into the system model for the emission data. The method by McKee et al. assumes that the distribution of scattered annihilations can be approximated by the convolution of the distribution of unscattered annihilations and some scatter response function [50]. The difficulty with the convolution/deconvolution method is that the distribution of unscattered coincidences (or annihilations) and the scatter response function (or scatter function) are not known.
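The convolution model just described can be sketched in a few lines. The Gaussian kernel shape, the scatter fraction, and all numbers below are illustrative assumptions, not values taken from the methods in [48-51]; the observed profile is also used as a crude first-pass estimate of the unscattered profile.

```python
import numpy as np

# Hypothetical 1-D sketch of convolution-based scatter estimation: the
# scatter profile is modeled as the unscattered profile convolved with a
# broad, low-amplitude kernel, and the estimate is subtracted from the data.

def gaussian_kernel(width, sigma):
    t = np.arange(width) - width // 2
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()           # normalized so the kernel preserves counts

def scatter_correct(data, unscattered_est, scatter_fraction=0.2, sigma=8.0):
    """Estimate mean scatter as scatter_fraction * (unscattered * kernel)
    and subtract it from the observed data (clipping at zero)."""
    kernel = gaussian_kernel(4 * int(sigma) + 1, sigma)
    scatter_est = scatter_fraction * np.convolve(unscattered_est, kernel,
                                                 mode="same")
    return np.clip(data - scatter_est, 0.0, None), scatter_est

# Example: a narrow "true" emission profile; the data themselves serve as
# the first-pass unscattered estimate.
true_profile = np.zeros(64)
true_profile[28:36] = 100.0
corrected, scatter = scatter_correct(true_profile, true_profile)
```

Because the kernel is normalized, the estimated scatter carries exactly the assumed fraction of the counts, which is the bookkeeping the subtraction step relies on.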
Ollinger introduced a scatter correction method that calculates the scatter distribution using an analytical equation, transmission images, emission images, and the scanner geometry [52]. The computational cost of the method is very high, so it may be difficult to adopt in clinical use at the time of this writing.
2.3 Summary of the Proposed Algorithms
* In Chapter 3, we present an algorithm that obtains PML estimates for a certain class of edge-preserving penalty functions. The PML algorithm is derived by combining the convexity idea by De Pierro [25, 30] and Huber's surrogate functions [44]. Combining the two existing theories in such a way that the PML algorithm is convergent is new to the PET community. In theory, the algorithm guarantees nonnegative iterates, monotonically decreases the PML objective function with increasing iterations, and converges to the solution of the PML problem. In practice, it is straightforward to implement (i.e., no additional hyperparameters and no need for line searches) and it can incorporate many edge-preserving penalty functions.
* In Chapter 4, we develop an accelerated version of the PML algorithm by using the pattern search idea of Hooke and Jeeves [41, pp. 287-291]. Using this approach, we solve a constrained problem at each pattern search step that leads to improved convergence rates. A modification of Hooke and Jeeves' direction vector is also introduced that improves performance. It should be mentioned that Hooke and Jeeves' method has not been used in PET image reconstruction. The proposed algorithm inherits the nice properties of the PML algorithm and converges to the minimizer of the PML objective function. In experiments, the accelerated algorithm needed less than about one third of the CPU time that was necessary for the PML algorithm to converge.
" In Chapter 5, we propose a regularized image reconstruction algorithm, referred
to as the quadratic edge preserving (QEP) algorithm, that aims to preserve edges through the use of certain newly developed decoupled penalty functions that depend on the current estimate. The QEP algorithm was motivated by the analysis of the PML algorithm. The algorithm by Alenius et al. [32] also uses an iteration dependent objective function. However, it should be mentioned that the algorithm uses the OSL algorithm to generate the next iterate. The drawback of Alenius' approach is that the OSL algorithm does not guarantee
convergence.
" In Chapter 6, we propose a model for emission data where an unknown matrix,
called a scatter matrix, is introduced. The model aims to account for errors due to scatter and noncollinearity. Based on the model, a certain minimization problem is constructed that allows for the scatter matrix and emission mean vector to be jointly estimated. Since the minimization problem is impossible to solve, we propose an algorithm that greatly reduces the number of unknowns in the scatter matrix and alternately estimates the scatter matrix and emission mean vector. It should be mentioned that Mumcuojlu et al. [53] used the same
22
model. However, they assumed that the scatter matrix is known and accounts for errors due to detector penetration as well. Their scatter matrix was obtained through MonteCarlo simulations.
CHAPTER 3
PENALIZED MAXIMUM LIKELIHOOD ALGORITHM
Although the ML estimates of the emission means are available by using the MLEM algorithm [16], as discussed in Section 1.4, the resulting PET images are extremely noisy due to the fact that the PET image reconstruction problem is ill-posed. This is because of short scan times, errors in the emission data, and the fact that the data obey Poisson statistics. The most popular way to address the ill-posed nature of the PET image reconstruction problem is through the use of penalty functions. Penalty functions used in PET are designed in such a way that estimates for the emission means of neighboring voxels are forced to be similar in value, unless there is an "edge" within the neighborhood. By an edge, we mean that there is a group of connected voxels that have significantly greater activity than the other voxels in the neighborhood. For example, suppose there is only one voxel with significantly greater activity than the other voxels in its neighborhood. Then, we would say that there is no edge within the neighborhood. Simply stated, penalty functions provide a means for reconstructing PET images that have considerably less noise than MLEM images, yet retain edges (e.g., tumors) which may convey important information.
In Section 3.1, we first derive an algorithm, called the penalized maximum likelihood (PML) algorithm, that incorporates a wide class of edgepreserving penalty functions. Then, we prove that the PML algorithm converges in Section 3.2. Finally, we summarize the properties of the PML algorithm in Section 3.3. It should be mentioned that we presented the PML algorithm in [18] without a proof of convergence. Our proof of convergence can now be found in a recent manuscript [19].
Figure 3-1: One-dimensional illustration of the optimization transfer method. At each iteration, a surrogate function is obtained and a minimizer of the surrogate function is defined as the next iterate. Ideally, it is "easy" to get the minimizer of the surrogate function.
3.1 Penalized Maximum Likelihood (PML) Algorithm
The problem of interest is to determine the nonnegative minimizer of the PML objective function Φ(x) = Ψ(x) + βΛ(x) (Ψ is defined in (1.6)), where Λ is a penalty function that forces emission mean estimates of neighboring voxels to be similar in value. In other words, we want to solve the following optimization problem:

(P)  x_PML = \arg\min_{x \ge 0} \Phi(x).

The penalty functions we consider are of the form

\Lambda(x) = \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\, g(x_j, x_k),  (3.1)

where N_j is a set of voxels in a neighborhood of the jth voxel, the constants {w_jk} are positive weights for which w_jk = w_kj for all j and k, and g(s, t) = λ(s − t), whereby the function λ satisfies the following assumptions:

* (AS3) λ(t) is symmetric
* (AS4) λ(t) is everywhere differentiable
* (AS5) λ̇(t) ≜ dλ(t)/dt is increasing for all t (this assumption implies that λ is strictly convex)
* (AS6) γ(t) ≜ λ̇(t)/t is nonincreasing for t > 0
* (AS7) γ(0) ≜ lim_{t→0} γ(t) is finite and nonzero
* (AS8) λ(t) is bounded below (this assumption implies that Λ(x) is bounded below).

Examples of functions that satisfy (AS3)-(AS8) are the quadratic function λ(t) = t² and Green's log-cosh function λ(t) = log(cosh(t)) [23]. Regarding the neighborhood N_j, the jth voxel is excluded from the set N_j and, if the kth voxel is in N_j, then the jth voxel is in N_k. A common choice for N_j is the eight nearest neighbors of the jth voxel.
Since it is not possible to get a closed-form solution to the minimization problem (P), iterative optimization methods are necessary. The PML algorithm we propose is based on the optimization transfer method [10, 25, 30, 34, 54] where, at each iteration, a function that satisfies certain conditions is obtained and the next iterate is defined to be a minimizer of the function. The function found at each iteration is referred to as a surrogate function for the function to be minimized. This idea is illustrated with the one-dimensional example in Figure 3-1. In the figure, the problem is to find the minimizer of the function f, which is t*. It is assumed that a closed-form solution is not available to the minimization problem. Given an initial guess t^{(0)}, a surrogate function f^{(0)} that depends on t^{(0)} is determined. Then, the next iterate t^{(1)} is generated by finding the minimizer of f^{(0)}. To get the following iterate t^{(2)}, a surrogate function f^{(1)} that depends on t^{(1)} is obtained and then minimized. These steps are repeated until some convergence criterion is met.
For a vector argument t, a surrogate function f^{(n)} satisfies the following conditions:

* (C1) f^{(n)}(t) ≥ f(t) for all t ∈ {domain of f}
* (C2) f^{(n)}(t^{(n)}) = f(t^{(n)})
* (C3) ∇f^{(n)}(t^{(n)}) = ∇f(t^{(n)}),

where t^{(n)} is the nth iterate, ∇ denotes the gradient of a function, and the superscript (n) indicates that the functions {f^{(n)}} and the iterates {t^{(n)}} depend on the iteration number. The next iterate t^{(n+1)} is defined to be a minimizer of f^{(n)}:

t^{(n+1)} = \arg\min_t f^{(n)}(t) \quad subject to \quad t ∈ {domain of f}.  (3.2)

Defining the iterates in this way ensures that the objective function f decreases monotonically with increasing iterations. To prove this fact, we first note that f(t^{(n+1)}) ≤ f^{(n)}(t^{(n+1)}) by (C1). Since f^{(n)}(t^{(n+1)}) ≤ f^{(n)}(t^{(n)}) by (3.2), it follows by (C2) that, for all n,

f(t^{(n+1)}) ≤ f^{(n)}(t^{(n+1)}) ≤ f^{(n)}(t^{(n)}) = f(t^{(n)}).  (3.3)

It should be mentioned that (C3) is not necessary for the monotonicity in (3.3). However, (C3) is often needed in order to prove that an algorithm that utilizes the optimization transfer method converges (see [18, 25, 30]). Although the optimization transfer method is straightforward in principle, the difficulty in practice is that it may be hard to find surrogate functions that satisfy the conditions (C1), (C2), and (C3).
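To make the transfer idea concrete, here is a toy one-dimensional sketch (not from the dissertation): f is a sum of log-cosh terms with made-up anchor points, each iteration builds a quadratic majorizer of Huber's type at the current point, and the majorizer's closed-form minimizer becomes the next iterate. The monotone decrease observed is exactly the chain of inequalities in (3.3).

```python
import math

# Toy optimization transfer: minimize f(t) = sum_k log(cosh(t - a_k)) by
# repeatedly minimizing a quadratic surrogate built at the current iterate.

A = [1.0, -2.0]                       # assumed anchor points (illustrative)
f = lambda t: sum(math.log(math.cosh(t - a)) for a in A)
lam_dot = lambda u: math.tanh(u)      # derivative of log(cosh)
gamma = lambda u: 1.0 if u == 0 else math.tanh(u) / u   # surrogate curvature

t = 5.0                               # initial guess
values = [f(t)]
for _ in range(50):
    num = sum(lam_dot(t - a) for a in A)   # gradient of f at t
    den = sum(gamma(t - a) for a in A)     # curvature of the majorizer
    t = t - num / den                      # exact minimizer of the surrogate
    values.append(f(t))

# (C1) and (C2) guarantee the monotone decrease in (3.3):
assert all(v1 >= v2 - 1e-12 for v1, v2 in zip(values, values[1:]))
```

Because tanh(u)/u dominates the second derivative of log(cosh) everywhere, the quadratic built from it lies above f, which is what makes each step safe.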
De Pierro developed surrogate functions for the negative log likelihood function Ψ by using the convexity of the negative log function [25, 30]. His idea is based on the following property of convex functions [55, pp. 860-861]: for a convex function f,

f\left(\sum_j \alpha_j t_j\right) \le \sum_j \alpha_j f(t_j),  (3.4)

where \sum_j \alpha_j t_j ∈ {domain of f}, t_j ∈ {domain of f} for all j, α_j ≥ 0 for all j, \sum_j \alpha_j = 1, and {domain of f} is a convex set. Specifically, De Pierro utilized the following inequality in ML estimation, where f(t) = −log(t):

f([Px]_i) = f\left(\sum_j \frac{P_{ij} x_j^{(n)}}{[Px^{(n)}]_i}\, \frac{[Px^{(n)}]_i}{x_j^{(n)}}\, x_j\right)  (3.5)

\le \sum_j \frac{P_{ij} x_j^{(n)}}{[Px^{(n)}]_i}\, f\left(\frac{[Px^{(n)}]_i}{x_j^{(n)}}\, x_j\right),  (3.6)

where x^{(n)} is the nth iterate of x, P_{ij} ≥ 0, x_j^{(n)} > 0, and [Px^{(n)}]_i ≠ 0 for all i, j, and n. Let

f_i(x) \triangleq f([Px]_i)  (3.7)

f_i^{(n)}(x) \triangleq \sum_j \frac{P_{ij} x_j^{(n)}}{[Px^{(n)}]_i}\, f\left(\frac{[Px^{(n)}]_i}{x_j^{(n)}}\, x_j\right).  (3.8)

Then, it is straightforward to see that, for all i, (1) f_i^{(n)}(x) ≥ f_i(x) for all x ≥ 0, (2) f_i^{(n)}(x^{(n)}) = f_i(x^{(n)}), and (3) ∇f_i^{(n)}(x^{(n)}) = ∇f_i(x^{(n)}). Thus, for all i, f_i^{(n)} is a surrogate function for f_i.

Although De Pierro developed the surrogate functions for the log likelihood function under the assumption of no accidental coincidences (i.e., ρ = 0), the surrogate functions can be easily modified to account for accidental coincidences. Observe that [Px + ρ]_i can be written as a convex combination

[Px + \rho]_i = \sum_j \frac{P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i}\, \frac{[Px^{(n)} + \rho]_i}{x_j^{(n)}}\, x_j + \frac{\rho_i}{[Px^{(n)} + \rho]_i}\, [Px^{(n)} + \rho]_i.  (3.9)

Using the convexity of the negative log function and the fact that

\sum_j \frac{P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i} + \frac{\rho_i}{[Px^{(n)} + \rho]_i} = 1,  (3.10)

we have the following inequality for f(t) = −log(t):

f([Px + \rho]_i) \le \sum_{j=1}^{J} \frac{P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i}\, f\left(\frac{[Px^{(n)} + \rho]_i}{x_j^{(n)}}\, x_j\right) + \frac{\rho_i}{[Px^{(n)} + \rho]_i}\, f([Px^{(n)} + \rho]_i).  (3.11)

Given the inequality in (3.11), the surrogate function at the nth iteration for the negative log likelihood function Ψ can be expressed as

\Psi^{(n)}(x) = \sum_{i=1}^{I} \sum_{j=1}^{J} \left[ P_{ij} x_j - d_i\, \frac{P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i}\, \log(x_j) \right] + C_1^{(n)},  (3.12)

where

C_1^{(n)} = \sum_{i=1}^{I} \left[ \rho_i - d_i \log([Px^{(n)} + \rho]_i) + d_i \sum_{j=1}^{J} \frac{P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i}\, \log(x_j^{(n)}) \right]  (3.13)

is a constant that does not depend on x. It is straightforward to show that the surrogate function Ψ^{(n)} satisfies (C1), (C2), and (C3): (1) Ψ^{(n)}(x) ≥ Ψ(x) for all x ≥ 0, (2) Ψ^{(n)}(x^{(n)}) = Ψ(x^{(n)}), and (3) ∇Ψ^{(n)}(x^{(n)}) = ∇Ψ(x^{(n)}).
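The majorization in (3.11) is easy to check numerically. In the sketch below, the system matrix P, the accidental-coincidence means ρ, and the problem sizes are arbitrary illustrative choices; the check confirms conditions (C1) and (C2) for the convex-combination surrogate of the negative log term.

```python
import numpy as np

# Numerical check that the right hand side of (3.11) dominates
# -log([Px + rho]_i) and matches it at x = x_n.

rng = np.random.default_rng(0)
P = rng.uniform(0.1, 1.0, size=(5, 4))   # assumed system matrix
rho = rng.uniform(0.1, 0.5, size=5)      # assumed accidental-coincidence means
x_n = rng.uniform(0.5, 2.0, size=4)      # current (strictly positive) iterate
y_n = P @ x_n + rho

def neglog_term(x):
    return -np.log(P @ x + rho)          # left hand side of (3.11), per i

def surrogate(x):
    # right hand side of (3.11), per i
    w = P * x_n / y_n[:, None]                   # convex weights
    terms = -w * np.log(y_n[:, None] * x / x_n)  # -log of scaled coordinates
    return terms.sum(axis=1) - (rho / y_n) * np.log(y_n)

x = rng.uniform(0.5, 2.0, size=4)
assert np.all(surrogate(x) >= neglog_term(x) - 1e-12)   # (C1)
assert np.allclose(surrogate(x_n), neglog_term(x_n))    # (C2)
```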
Since surrogate functions for the negative log likelihood function Ψ are available, we only need to find surrogate functions for the penalty function Λ(x) = Σ_j Σ_{k∈N_j} w_jk λ(x_j − x_k). Under assumptions (AS3)-(AS8), Huber developed a surrogate function for λ in [44] (see also [8]). Given an arbitrary point t^{(n)}, Huber's surrogate function for λ, which is defined by

\lambda^{(n)}(t) \triangleq \lambda(t^{(n)}) + \dot\lambda(t^{(n)})(t - t^{(n)}) + \frac{1}{2}\gamma(t^{(n)})(t - t^{(n)})^2,  (3.14)

has the property that λ^{(n)}(t) ≥ λ(t) for all t (see Appendix A), λ^{(n)}(t^{(n)}) = λ(t^{(n)}), and λ̇^{(n)}(t^{(n)}) = λ̇(t^{(n)}), where the dot over a function represents its first derivative. For t^{(n)} = x_j^{(n)} − x_k^{(n)}, it follows that a surrogate function for Λ is

\tilde\Lambda^{(n)}(x) = \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\, \tilde g^{(n)}(x_j, x_k),  (3.15)

where \tilde g^{(n)}(s, t) \triangleq \lambda^{(n)}(s - t).

Using \tilde\Lambda^{(n)} as a starting point, we will now construct a surrogate function for Λ that has a more convenient form. By the convexity of the square function, we have the following inequality:

[x_j - x_k - (x_j^{(n)} - x_k^{(n)})]^2 = \left[\frac{1}{2}(2x_j - 2x_j^{(n)}) + \frac{1}{2}(2x_k^{(n)} - 2x_k)\right]^2 \le 2(x_j - x_j^{(n)})^2 + 2(x_k - x_k^{(n)})^2.  (3.16)

It should be mentioned that De Pierro [30] and Hsiao et al. [34] utilized this property of the quadratic function in PET, and Erdogan and Fessler [10] applied it to a nonquadratic convex function for transmission tomography. Motivated by (3.16), we define

g^{(n)}(x_j, x_k) \triangleq \lambda(x_j^{(n)} - x_k^{(n)}) + \dot\lambda(x_j^{(n)} - x_k^{(n)})[(x_j - x_j^{(n)}) - (x_k - x_k^{(n)})] + \gamma(x_j^{(n)} - x_k^{(n)})[(x_j - x_j^{(n)})^2 + (x_k - x_k^{(n)})^2].  (3.17)

By construction, the following statements are clear: (1) g^{(n)}(x_j, x_k) ≥ g(x_j, x_k) for all x_j and x_k from (3.16), (2) g^{(n)}(x_j^{(n)}, x_k^{(n)}) = g(x_j^{(n)}, x_k^{(n)}), (3) ∂g^{(n)}/∂x_j (x_j^{(n)}, x_k^{(n)}) = ∂g/∂x_j (x_j^{(n)}, x_k^{(n)}), and (4) ∂g^{(n)}/∂x_k (x_j^{(n)}, x_k^{(n)}) = ∂g/∂x_k (x_j^{(n)}, x_k^{(n)}). The difference between \tilde g^{(n)} and g^{(n)} is that g^{(n)} is decoupled in the sense that it does not have terms of the form x_j x_k. This difference is important because, as we show later, using g^{(n)} enables us to construct surrogate functions for Φ that have closed form expressions for their minimizers. Using g^{(n)}, an alternative surrogate function for Λ is

\Lambda^{(n)}(x) = \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\, g^{(n)}(x_j, x_k).  (3.18)

It is clear that, by construction, Λ^{(n)} satisfies the following properties: (1) Λ^{(n)}(x) ≥ Λ(x) for all x, (2) Λ^{(n)}(x^{(n)}) = Λ(x^{(n)}), and (3) ∇Λ^{(n)}(x^{(n)}) = ∇Λ(x^{(n)}).

Now, using Ψ^{(n)} and Λ^{(n)}, the desired surrogate function at the nth iteration for Φ is

\Phi^{(n)}(x) = \Psi^{(n)}(x) + \beta\Lambda^{(n)}(x).  (3.19)

From the properties of Ψ^{(n)} and Λ^{(n)}, it follows that the surrogate function Φ^{(n)} possesses the requisite properties:

* (P1) Φ^{(n)}(x) ≥ Φ(x) for all x ≥ 0
* (P2) Φ^{(n)}(x^{(n)}) = Φ(x^{(n)})
* (P3) ∇Φ^{(n)}(x^{(n)}) = ∇Φ(x^{(n)}).

Given x^{(n)}, the next iterate x^{(n+1)} is found by minimizing Φ^{(n)}:

x^{(n+1)} = \arg\min_{x \ge 0} \Phi^{(n)}(x).  (3.20)
Defining the iterates in this way ensures that the objective function Φ decreases with increasing iterations, as shown in (3.3):

\Phi(x^{(n)}) \ge \Phi(x^{(n+1)}) \quad for all n.  (3.21)

All that remains now is to solve the optimization problem in (3.20). To this end, we write Λ^{(n)}(x) as

\Lambda^{(n)}(x) = 2\sum_{j=1}^{J}\sum_{k \in N_j} h_{jk}^{(n)}(x_j) + C_2^{(n)},  (3.22)

where

h_{jk}^{(n)}(t) \triangleq w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,(t - m_{jk}^{(n)})^2  (3.23)

m_{jk}^{(n)} \triangleq \frac{x_j^{(n)} + x_k^{(n)}}{2}  (3.24)

C_2^{(n)} \triangleq \sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\left[\lambda(x_j^{(n)} - x_k^{(n)}) - \frac{1}{2}\dot\lambda(x_j^{(n)} - x_k^{(n)})(x_j^{(n)} - x_k^{(n)})\right]  (3.25)

(see proof in Appendix B). Since Ψ^{(n)} and Λ^{(n)} are decoupled, Φ^{(n)} can be written as

\Phi^{(n)}(x) = \Psi^{(n)}(x) + \beta\Lambda^{(n)}(x)  (3.26)

= \sum_{i=1}^{I}\sum_{j=1}^{J}\left[P_{ij}x_j - d_i\,\frac{P_{ij}x_j^{(n)}}{[Px^{(n)} + \rho]_i}\log(x_j)\right] + 2\beta\sum_{j=1}^{J}\sum_{k \in N_j} h_{jk}^{(n)}(x_j) + C_1^{(n)} + \beta C_2^{(n)}  (3.27)

= \sum_{j=1}^{J}\left[-\left(\sum_{i=1}^{I} d_i\,\frac{P_{ij}x_j^{(n)}}{[Px^{(n)} + \rho]_i}\right)\log(x_j) + \left(\sum_{i=1}^{I}P_{ij}\right)x_j + 2\beta\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})(x_j - m_{jk}^{(n)})^2\right] + C_1^{(n)} + \beta C_2^{(n)}  (3.28)

= \sum_{j=1}^{J}\left[E_j^{(n)}\log(x_j) + F_j^{(n)}x_j^2 + G_j^{(n)}x_j\right] + C_3^{(n)},  (3.29)

where

E_j^{(n)} \triangleq -\sum_{i=1}^{I} d_i\,\frac{P_{ij}x_j^{(n)}}{[Px^{(n)} + \rho]_i}  (3.30)

F_j^{(n)} \triangleq 2\beta\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})  (3.31)

G_j^{(n)} \triangleq \sum_{i=1}^{I}P_{ij} - 4\beta\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,m_{jk}^{(n)}  (3.32)

C_3^{(n)} \triangleq C_1^{(n)} + \beta C_2^{(n)} + 2\beta\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,(m_{jk}^{(n)})^2.  (3.33)

Since Φ^{(n)} is decoupled, as seen from (3.29), it follows that the solution to (3.20) is given by

x_j^{(n+1)} = \arg\min_{x_j \ge 0}\,\phi_j^{(n)}(x_j), \quad j = 1, 2, \ldots, J,  (3.34)

where

\phi_j^{(n)}(t) \triangleq E_j^{(n)}\log(t) + F_j^{(n)}t^2 + G_j^{(n)}t.  (3.35)

Fortunately, the function φ_j^{(n)} is strictly convex for all j and n under the assumption that x_j^{(n)} > 0 for all j and n. We will prove this statement by showing that the second derivative of φ_j^{(n)} is positive when x_j^{(n)} > 0 for all j and n. First, note that E_j^{(n)} is negative and F_j^{(n)} is positive for all j and n. The fact that E_j^{(n)} < 0 is due to the fact that [dᵀP]_j ≠ 0 (see (AS2) in Section 1.4) and the assumption that x_j^{(n)} > 0. The fact that F_j^{(n)} > 0 follows from the positivity of the function γ, the weights {w_jk}, and the penalty parameter β. To see why γ(t) > 0 for −∞ < t < ∞, recall that λ(t) is a symmetric, strictly convex function by (AS3) and (AS5). It follows that λ̇(t) > 0 over (0, ∞) and λ̇(t) < 0 over (−∞, 0). Using the fact that γ(0) is finite and nonzero (see (AS7)), we have that γ(t) > 0 for −∞ < t < ∞. Now, consider the second derivative of φ_j^{(n)}. Easy calculations show that φ̈_j^{(n)}(t) = −E_j^{(n)}/t² + 2F_j^{(n)}, where the double dot over a function represents the second derivative of a function. Since F_j^{(n)} > 0 and E_j^{(n)} < 0, it follows that the second derivative of φ_j^{(n)} is positive for all j and n. Thus, φ_j^{(n)}(t) is strictly convex for all j, n, and t > 0, and, from (3.29), Φ^{(n)} is strictly convex over the set {x : x > 0}.

Since Φ^{(n)} is decoupled, φ_j^{(n)} is strictly convex, and φ_j^{(n)}(t) → ∞ as t → 0⁺, it is true that

x_j^{(n+1)} > 0 \quad and \quad \dot\phi_j^{(n)}(x_j^{(n+1)}) = 0.  (3.36)

Note that (3.36) satisfies our assumption that x_j^{(n)} > 0 for all j and n. To solve (3.34), we compute the first derivative of φ_j^{(n)} and set it to zero. Since E_j^{(n)} < 0 and F_j^{(n)} > 0, the root of the resulting quadratic equation that preserves the nonnegativity constraint is

x_j^{(n+1)} = \frac{-G_j^{(n)} + \sqrt{(G_j^{(n)})^2 - 8F_j^{(n)}E_j^{(n)}}}{4F_j^{(n)}}, \quad j = 1, 2, \ldots, J.  (3.37)

Observe that, as β → 0, (3.37) approaches −E_j^{(n)}/Σ_i P_ij by L'Hospital's rule. Thus, the iteration in (3.37) is equivalent to the MLEM algorithm when β = 0.
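The β → 0 limit of (3.37) can also be checked directly by rationalizing the root (a short derivation equivalent to the L'Hospital argument):

```latex
x_j^{(n+1)}
  = \frac{-G_j^{(n)} + \sqrt{(G_j^{(n)})^2 - 8F_j^{(n)}E_j^{(n)}}}{4F_j^{(n)}}
  = \frac{-2E_j^{(n)}}{G_j^{(n)} + \sqrt{(G_j^{(n)})^2 - 8F_j^{(n)}E_j^{(n)}}}
  \;\longrightarrow\;
  \frac{-E_j^{(n)}}{G_j^{(n)}}
  = \frac{x_j^{(n)}}{\sum_{i=1}^{I} P_{ij}}
    \sum_{i=1}^{I} \frac{d_i\,P_{ij}}{[Px^{(n)} + \rho]_i}
  \qquad (\beta \to 0),
```

since F_j^{(n)} → 0 and G_j^{(n)} → Σ_i P_ij > 0 as β → 0; the rightmost expression is precisely the MLEM update with mean accidental coincidences ρ.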
In summary, given a strictly positive initial estimate x^{(0)} > 0, the steps of the PML algorithm are as follows: for n = 0, 1, 2, ...

* Step 1 Let x^{(0)} > 0 be the initial estimate
* Step 2 Construct the surrogate function Φ^{(n)} from the current iterate x^{(n)} using (3.29), (3.30), (3.31), and (3.32)
* Step 3 Get x^{(n+1)} using (3.37)
* Step 4 Iterate between Steps 2 and 3 until some chosen stopping criterion is met.
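The steps above can be condensed into a small numerical sketch. Everything here is an illustrative assumption: a random system matrix P, Poisson data, a 1-D chain of voxels with adjacent neighbors, and the quadratic penalty λ(t) = t², for which γ ≡ 2 and m_jk^(n) = (x_j^(n) + x_k^(n))/2. The sketch checks the two guarantees claimed above, nonnegative iterates and monotone decrease of the PML objective.

```python
import numpy as np

# Toy PML iteration implementing (3.30)-(3.32) and the update (3.37)
# for a 1-D problem with the quadratic penalty (gamma = 2, w_jk = w).

rng = np.random.default_rng(1)
J, I = 8, 20
P = rng.uniform(0.05, 1.0, size=(I, J))
x_true = rng.uniform(0.5, 3.0, size=J)
rho = np.full(I, 0.1)
d = rng.poisson(P @ x_true + rho).astype(float)
beta, w = 0.5, 1.0                   # penalty parameter and neighbor weight

def objective(x):
    y = P @ x + rho
    # Lambda sums over ordered pairs, so each adjacent pair counts twice
    penalty = 2.0 * np.sum((x[1:] - x[:-1]) ** 2)
    return np.sum(y - d * np.log(y)) + beta * penalty

def pml_update(x):
    y = P @ x + rho
    E = -x * (P * (d / y)[:, None]).sum(axis=0)          # (3.30), E_j < 0
    nbr_cnt = np.ones(J); nbr_cnt[1:-1] = 2.0            # one or two neighbors
    nbr_sum = np.zeros(J)
    nbr_sum[:-1] += x[1:]; nbr_sum[1:] += x[:-1]
    F = 2.0 * beta * w * 2.0 * nbr_cnt                   # (3.31) with gamma = 2
    G = P.sum(axis=0) - 4.0 * beta * w * (nbr_cnt * x + nbr_sum)  # (3.32)
    return (-G + np.sqrt(G ** 2 - 8.0 * F * E)) / (4.0 * F)       # (3.37)

x = np.ones(J)
vals = [objective(x)]
for _ in range(100):
    x = pml_update(x)
    vals.append(objective(x))

assert (x > 0).all()                                        # nonnegative iterates
assert all(a >= b - 1e-9 for a, b in zip(vals, vals[1:]))   # monotone decrease
```

The closed-form root in the last line is what makes Step 3 cheap: no line search and no inner iteration is needed, in contrast to the methods discussed in Section 2.1.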
3.2 Convergence Proof
Using (P1)-(P3) and (AS1)-(AS8), we now prove that the PML algorithm converges. The following convergence proof is based on the convergence proof by Lange and Carson [56] (see also [30]) of the MLEM algorithm by Shepp and Vardi [16].
By (3.21), the PML algorithm has the property that it decreases the objective function Φ with increasing iterations:

* (P4) Φ(x^{(n+1)}) ≤ Φ(x^{(n)}) for all n ≥ 0.

Another property of the algorithm is that

* (P5) the sequence {Φ(x^{(n)})} is convergent.

This property follows from (P4) and the fact that Φ is bounded below by (AS8) (see [57, Theorem 1.4, p. 6]).

Proposition 1 The sequence {x^{(n)}} is bounded.

Proof: From (P4), it follows that Φ(x^{(n)}) ≤ Φ(x^{(0)}) for all n ≥ 0. Consider the set B = {x ≥ 0 : Φ(x) ≤ Φ(x^{(0)})}. Then, clearly {x^{(n)}} ⊂ B. So, to prove that {x^{(n)}} is bounded, we will prove that the set B is bounded. It is straightforward to see that B is bounded below by 0. Now, suppose that B is not bounded above. Then, there exists a point z ∈ B such that ‖z‖ → ∞. This means that, for some j, z_j → ∞. Since (AS2) implies that P has no column of zeros, it follows that [Pz]_i → ∞ for some i and Φ(z) → ∞. This implies that z is not an element of the set B, because all the elements of B have an objective value that is less than or equal to Φ(x^{(0)}) < ∞, which is a contradiction. Therefore, the set B is bounded above. □

Proposition 2 There exists some constant c₁ > 0 such that Φ(x^{(n)}) − Φ(x^{(n+1)}) ≥ c₁‖x^{(n)} − x^{(n+1)}‖² for all n ≥ 0.
Proof: By (P1), (P2), and (3.29), we have the following inequality:

\Phi(x^{(n)}) - \Phi(x^{(n+1)}) \ge \Phi^{(n)}(x^{(n)}) - \Phi^{(n)}(x^{(n+1)}) = \sum_{j=1}^{J}\left[\phi_j^{(n)}(x_j^{(n)}) - \phi_j^{(n)}(x_j^{(n+1)})\right].  (3.38)

Suppose, for each n, the function φ_j^{(n)}(t) is expanded into a second-order Taylor series [55, pp. 868-869] about the point x_j^{(n+1)} and evaluated at t = x_j^{(n)}. Then, the right hand side of (3.38) can be written as

\sum_{j=1}^{J}\left[\phi_j^{(n)}(x_j^{(n)}) - \phi_j^{(n)}(x_j^{(n+1)})\right] = \sum_{j=1}^{J}\left[\dot\phi_j^{(n)}(x_j^{(n+1)})(x_j^{(n)} - x_j^{(n+1)}) + \frac{1}{2}(x_j^{(n)} - x_j^{(n+1)})^2\,\ddot\phi_j^{(n)}(\tilde x_j^{(n)})\right],  (3.39)

where the double dot over a function represents its second derivative and \tilde x_j^{(n)} is a point between x_j^{(n)} and x_j^{(n+1)}. Since \dot\phi_j^{(n)}(x_j^{(n+1)}) = 0 by (3.36), it follows that

\Phi^{(n)}(x^{(n)}) - \Phi^{(n)}(x^{(n+1)}) = \frac{1}{2}\sum_{j=1}^{J}(x_j^{(n)} - x_j^{(n+1)})^2\,\ddot\phi_j^{(n)}(\tilde x_j^{(n)}).  (3.40)

Now, recall that \ddot\phi_j^{(n)}(t) = -E_j^{(n)}/t^2 + 2F_j^{(n)} with F_j^{(n)} = 2β Σ_{k∈N_j} w_jk γ(x_j^{(n)} − x_k^{(n)}) and E_j^{(n)} < 0. Since {x^{(n)}} is bounded and γ(t) > 0 is a continuous function for −∞ < t < ∞, there exists a number γ₀ > 0 such that γ(x_j^{(n)} − x_k^{(n)}) ≥ γ₀ for all j, k, and n. Hence, F_j^{(n)} ≥ c₁ for all j and n, where c₁ = 2βγ₀ min_j min_{k∈N_j} w_jk > 0. Therefore, \ddot\phi_j^{(n)}(\tilde x_j^{(n)}) ≥ 2c₁ for all j and n, and we obtain the desired result:

\Phi(x^{(n)}) - \Phi(x^{(n+1)}) \ge c_1\,\|x^{(n)} - x^{(n+1)}\|^2.  (3.41)  □

From (P5) and Proposition 2, it follows that

* (P6) the sequence {x^{(n)} − x^{(n+1)}} converges to 0.
The following proposition will be used later to prove not only that a limit point of the sequence {x^{(n)}} satisfies one of the Kuhn-Tucker conditions [55, p. 777] but also that the sequence {x^{(n)}} has a finite number of limit points.

Proposition 3 Let x* be a limit point of the sequence {x^{(n)}}. Then, for all j such that x_j* ≠ 0,

\frac{\partial\Phi(x)}{\partial x_j}\bigg|_{x = x^*} = 0.  (3.42)

Proof: By Proposition 1, there is a subsequence {x^{(n_l)}} that converges to x* (see the Bolzano-Weierstrass theorem in [58, p. 108]). By (P6), the subsequence {x^{(n_l + 1)}} also converges to x*. Recall from (3.29) that \dot\phi_j^{(n)}(t) = E_j^{(n)}/t + 2F_j^{(n)}t + G_j^{(n)}. If x_j* ≠ 0, it follows that E_j^{(n_l)}/x_j^{(n_l)} and E_j^{(n_l)}/x_j^{(n_l + 1)} converge to the same limit, and hence

\lim_{l\to\infty}\dot\phi_j^{(n_l)}(x_j^{(n_l)}) = \lim_{l\to\infty}\dot\phi_j^{(n_l)}(x_j^{(n_l + 1)}).  (3.43)

Since \dot\phi_j^{(n_l)}(x_j^{(n_l + 1)}) = 0 by (3.36), it follows from (P3) that

\frac{\partial\Phi(x)}{\partial x_j}\bigg|_{x = x^*} = 0 \quad for all j such that x_j^* \ne 0.  (3.44)  □

Using Proposition 3, we can prove the following proposition, which will be used to prove that the sequence {x^{(n)}} converges.

Proposition 4 The sequence {x^{(n)}} has a finite number of limit points.

Proof: Consider the following sets:

\mathcal{J} \triangleq \{1, 2, \ldots, J\}  (3.45)

Z^* \triangleq \{j \in \mathcal{J} : x_j^* = 0\}  (3.46)

Z^{**} \triangleq \{j \in \mathcal{J} : x_j^{**} = 0\},  (3.47)

where x* and x** are limit points of {x^{(n)}}. Now, let the function Φ*(x) be the restriction of Φ(x) to the reduced parameter set W* ≜ {x ≥ 0 : x_j = 0 for j ∈ Z*}. Since Φ*(x) is strictly convex over a convex set, the unique minimizer of Φ*(x) is its only stationary point [59, Proposition 2.1.2, p. 87]. Thus, by Proposition 3, there is only one limit point of {x^{(n)}} in the set W*. In other words, if Z* = Z**, then x* = x**. Therefore, the number of limit points is bounded above by the number of subsets of \mathcal{J}, which is clearly finite. □

Theorem 1 The sequence {x^{(n)}} converges to the unique minimizer of Φ.
Proof: Let x* be a limit point of the sequence {x^{(n)}}. Using the theorem in [60, p. 173], which says that the set of limit points of a bounded sequence {x^{(n)}} is connected if {x^{(n)} − x^{(n+1)}} → 0, we obtain the fact that the set of limit points of {x^{(n)}} is connected by Proposition 1 and (P6). Since the number of limit points of {x^{(n)}} is finite by Proposition 4, {x^{(n)}} has only one limit point. Thus, {x^{(n)}} → x*. Now, note that the PML objective function Φ is strictly convex on the set {x : x ≥ 0} (see Appendix C), so there is only one minimizer. To prove that the sequence {x^{(n)}} converges to the unique minimizer, we need to show that x* satisfies the Kuhn-Tucker conditions [55, p. 777]: for all j,

x_j^* \ge 0  (3.48)

x_j^*\,\frac{\partial\Phi(x)}{\partial x_j}\bigg|_{x = x^*} = 0  (3.49)

\frac{\partial\Phi(x)}{\partial x_j}\bigg|_{x = x^*} \ge 0.  (3.50)

Since x^{(n)} > 0 for all n, it must be true that the limit point x* is nonnegative (i.e., (3.48) is satisfied). By Proposition 3,

\frac{\partial\Phi(x)}{\partial x_j}\bigg|_{x = x^*} = 0 \quad for all j such that x_j^* \ne 0.  (3.51)

So, (3.49) is satisfied. Now, we consider the case x_j* = 0. For j such that x_j* = 0, suppose

\frac{\partial\Phi(x)}{\partial x_j}\bigg|_{x = x^*} < 0.  (3.52)

Then, it follows that \lim_{n\to\infty}\dot\phi_j^{(n)}(x_j^{(n)}) < 0 by (P3), and \dot\phi_j^{(n)}(x_j^{(n)}) < 0 for sufficiently large n. Consider \dot\phi_j^{(n)}(x_j^{(n+1)}) = E_j^{(n)}/x_j^{(n+1)} + 2F_j^{(n)}x_j^{(n+1)} + G_j^{(n)}. If x_j^{(n+1)} ≤ x_j^{(n)}, then

E_j^{(n)}/x_j^{(n)} + 2F_j^{(n)}x_j^{(n+1)} + G_j^{(n)} \ge 0  (3.53)

because \dot\phi_j^{(n)}(x_j^{(n+1)}) = 0 by (3.36) and E_j^{(n)}/x_j^{(n+1)} ≤ E_j^{(n)}/x_j^{(n)} < 0. Moreover, the fact that F_j^{(n)} > 0 implies that

E_j^{(n)}/x_j^{(n)} + 2F_j^{(n)}x_j^{(n)} + G_j^{(n)} \ge 0,  (3.54)

which is a contradiction because the left hand side of (3.54) is \dot\phi_j^{(n)}(x_j^{(n)}) < 0. Thus, x_j^{(n+1)} > x_j^{(n)} for all sufficiently large n. However, this contradicts the fact that x_j^{(n)} → x_j* = 0. So, it is true that

\frac{\partial\Phi(x)}{\partial x_j}\bigg|_{x = x^*} \ge 0 \quad for all j such that x_j^* = 0.  (3.55)

This satisfies (3.50). ■
3.3 Properties of the PML Algorithm
We now provide a summary of the desirable properties of the PML algorithm:
" The PML algorithm is straightforward to implement because there are no hyperparameters required for the algorithm itself and it has closedform expressions for the iterates. Some algorithms require hyperparameters, such as relaxation parameters, in addition to the penalty parameter [33,35], while others [24,30]
do not have closedform expressions for the updates.
" The PML algorithm theoretically guarantees nonnegative iterates, whereas some
algorithms [33, 35] set any negative element of the iterates to a small positive
number.
38
* The PML algorithm monotonically decreases the PAIL objective function unlike
the algorithms in [23, 35].
e The PML algorithm can incorporate a large class of edgepreserving penalty
functions unlike the algorithm by De Pierro [30].
* The PML algorithm converges to the minimizer of the PML objective function.
Convergence proofs for the algorithms in [23, 33] are not available.
CHAPTER 4
ACCELERATED PENALIZED MAXIMUM LIKELIHOOD ALGORITHM
Although the PML algorithm presented in Chapter 3 converges to the nonnegative minimizer of the PML objective function, it has the drawback that it converges slowly. In PET, a popular way to accelerate iterative image reconstruction algorithms is through the use of so-called ordered subsets [45]. In ordered-subsets based reconstruction algorithms, the observed data, d, is divided into a predefined number of subsets via some chosen rule. Then, the iterative reconstruction algorithm to be accelerated is applied sequentially to each data subset. In [45], Hudson and Larkin developed the first PET image reconstruction algorithm that used the ordered-subsets idea. Since the MLEM algorithm was applied to each data subset, they called their algorithm the ordered-subsets expectation maximization (OSEM) algorithm. In [61], Browne and De Pierro showed that the OSEM algorithm did not converge and introduced another ordered-subsets based image reconstruction algorithm that employed a relaxation parameter. It should be pointed out that some convergence results are available for ordered-subsets based algorithms that use relaxation parameters [33, 35, 61].
With ordered-subsets based algorithms, there is uncertainty as to how many subsets should be used and how the data should be divided. Moreover, it is not clear how relaxation parameters should be chosen in practice because, generally, they depend on the data.
In this chapter, we introduce an accelerated version of the PML algorithm, referred to as the accelerated PML (APML) algorithm, that uses a pattern search suggested by Hooke and Jeeves [41, pp. 287-291]. A pattern search has also been exploited to accelerate an algorithm in transmission tomography [62]. In Section
Figure 4-1: Two-dimensional illustration of the sequence {x^(n)}. The single circles and double circles denote the accelerated iterates {x̃^(n)} and standard iterates {x^(n)}, respectively. Each ellipse represents a set of points that have the same cost. The mark × denotes the minimizer of the function subject to the constraints x_1 ≥ 0 and x_2 ≥ 0.

4.1, using the mathematical ideas in the convergence proof of the PML algorithm, we show that a sequence that satisfies certain conditions converges to the minimizer of the PML objective function. Then, we use this result to prove that the APML algorithm, which is developed in Section 4.2, converges to the nonnegative minimizer of the PML objective function. In Section 4.3, we introduce the direction vector to be used in the pattern search. Finally, we summarize the properties of the APML algorithm in Section 4.4. It should be mentioned that we introduced the APML algorithm in [19].
4.1 Convergence Proof
In this section, we prove that the sequence {x^(n)} ≜ {x̃^(0), x^(1), x̃^(1), x^(2), x̃^(2), ...} converges to the minimizer of the PML objective function Φ, where x̃^(0) > 0 is an initial guess, x^(n+1) = argmin φ̃^(n)(x) subject to x ≥ 0, φ̃^(n) is of the same form as the surrogate function for the PML objective function Φ at the iterate x^(n), φ^(n) in (3.29), except that φ̃^(n) is defined at the point x̃^(n) instead of x^(n), and the point x̃^(n) > 0 satisfies the following conditions for all n:

* (C4) Φ(x̃^(n)) ≤ Φ(x^(n))
* (C5) there exists some constant c_2 > 0 such that

Φ(x^(n)) − Φ(x̃^(n)) ≥ c_2 ‖x^(n) − x̃^(n)‖².

This convergence result will form the basis for the convergence proof of the APML algorithm. Note that the strict positivity of x̃^(n) for all n is necessary because the surrogate function for the PML objective function is undefined for vectors with zero or negative elements. An example of such a sequence {x^(n)} is illustrated in Figure 4-1. In the figure, the single circles and double circles represent the accelerated iterates {x̃^(n)} and standard iterates {x^(n)}, respectively.
First, note that the following convergence proof is mainly based on the convergence proof of the PML algorithm in Chapter 3. Since Φ is bounded below and the sequence {x^(n)} monotonically decreases the PML objective function Φ by (P4) and (C4), it follows that (see [57, Theorem 1.4, p. 6])

* (P7) the sequence {Φ(x^(n))} converges.

By (P4) and (C4), it is straightforward to see that {x^(n)} ⊂ B ≜ {x ≥ 0 : Φ(x) ≤ Φ(x̃^(0))}. Since the set B is bounded (see Proposition 1), it follows that

* (P8) the sequence {x^(n)} is bounded.

Now, note that x^(n+1) is the minimizer of the surrogate function φ̃^(n), which satisfies (1) φ̃^(n)(x) ≥ Φ(x) for all x ≥ 0, (2) φ̃^(n)(x̃^(n)) = Φ(x̃^(n)), and (3) ∇φ̃^(n)(x̃^(n)) = ∇Φ(x̃^(n)). Thus, by (P8) (see Proposition 2), we have the following property:

* (P9) there exists some constant c_3 > 0 such that Φ(x̃^(n)) − Φ(x^(n+1)) ≥ c_3 ‖x̃^(n) − x^(n+1)‖².

By (P7) and (P9), we obtain the property

* (P10) the sequence {x̃^(n) − x^(n+1)} converges to 0.

Also, by (P7) and (C5), it must be true that the sequence {x^(n) − x̃^(n)} converges to 0. Thus, from the fact that ‖x̃^(n) − x̃^(n+1)‖ ≤ ‖x̃^(n) − x^(n+1)‖ + ‖x^(n+1) − x̃^(n+1)‖, the following property holds:

* (P11) the sequence {x̃^(n) − x̃^(n+1)} converges to 0.
For the discussion to follow, consider the surrogate function for the PML objective function Φ at the iterate x̃^(n), φ̃^(n) (see (3.29)):

φ̃^(n)(x) = Σ_{j=1}^J φ̃_j^(n)(x_j) + C̃_3^(n),   (4.1)

where φ̃_j^(n)(t) ≜ −Ẽ_j^(n) log(t) + F̃_j^(n) t² + G̃_j^(n) t for t > 0, C̃_3^(n) is a constant independent of x, and

Ẽ_j^(n) ≜ x̃_j^(n) Σ_{i=1}^I p_ij d_i / ([P x̃^(n)]_i + ρ_i),   (4.2)

F̃_j^(n) ≜ 2β Σ_{k∈N_j} w_jk γ(x̃_j^(n) − x̃_k^(n)),   (4.3)

G̃_j^(n) ≜ Σ_{i=1}^I p_ij − 2β Σ_{k∈N_j} w_jk γ(x̃_j^(n) − x̃_k^(n)) (x̃_j^(n) + x̃_k^(n))   (4.4)

(note: φ̃_j^(n), Ẽ_j^(n), F̃_j^(n), G̃_j^(n), and C̃_3^(n) result by substituting x̃^(n) for x^(n) in (3.29), (3.30), (3.31), (3.32), and (3.33), respectively). To prove that the whole sequence {x^(n)} converges, as done in the convergence proof of the PML algorithm, we first present the following proposition:
Proposition 5 Let x* be a limit point of the sequence {x̃^(n)}. Then, for all j such that x_j* ≠ 0,

∂Φ(x)/∂x_j |_{x=x*} = 0.   (4.5)

Proof: By (P8), the sequence {x̃^(n)} is bounded and there is a subsequence {x̃^(n_l)} that converges to x*. By (P10), the subsequence {x^(n_l+1)} also converges to x*. Recall that φ̃_j^(n)′(t) = −Ẽ_j^(n)/t + 2F̃_j^(n) t + G̃_j^(n). Now, if x_j* ≠ 0, it follows that Ẽ_j^(n_l)/x̃_j^(n_l) and Ẽ_j^(n_l)/x_j^(n_l+1) converge to the same limit, which in turn implies that

lim_{l→∞} φ̃_j^(n_l)′(x̃_j^(n_l)) = lim_{l→∞} φ̃_j^(n_l)′(x_j^(n_l+1)).   (4.6)

Since φ̃_j^(n_l)′(x_j^(n_l+1)) = 0 by (3.36), and ∇φ̃^(n_l)(x̃^(n_l)) = ∇Φ(x̃^(n_l)), it can be said that

∂Φ(x)/∂x_j |_{x=x*} = 0 for all j such that x_j* ≠ 0.   (4.7)  ∎
Theorem 2 The sequence {X(") converges to the unique minimizer of 4).
Proof: Let x* be a limit point of the sequence {(") }. Since the set of limit points of the sequence {5:('} is connected by (P8) and (P11) (see [60, p. 173] and Theorem 1 in Chapter 3) and there are a finite number of limit points of {i(n)} by Proposition 4 and Proposition 5, it follows that there is only one limit point in the set of limit points. Thus, {5:(i)}  x*. Since lim{I 5(1)  x*112} = 0, by (P10) we have lim{IJx(+')  x*112} = 0 (note: lJx(n+1)  x*112 < Ix(n+t)  ;(n)I112 + 1.(n) x*112). Hence, {X(n+l)} + x* (i.e., {X(n)} , x*). Therefore we can deduce that the whole sequence {X(n)I + x*. To prove the sequence {X(n)} converges to the unique minimizer of the PML objective function, we must show that x* satisfies the KuhnTucker conditions (i.e., (3.48), (3.49), and (3.50)). Since all the points in the sequence {X()} are positive, it must be true that the limit point x* is nonnegative (i.e., (3.48) is satisfied). By Proposition 5,
a D(x) = 0 for all j such that x* :0. (4.8)
Thus, (3.49) is satisfied. For j such that x* = 0, suppose a D(x) < 0. (4.9) . (n) (n)(n E(n)i(),
Then, it follows that limOj (Jrj < 0 by the property ) Vi( (t )),
(n) (n)
an j n) Oj) (n+l), 7() nl
and Xj (J) < 0 for sufficiently large n. Consider j (Xj )  /j +
S(n). If (n ( then
(n+l) ~ (n)
) + 2F n)xj ( j"n) > 0 (4.10) xn xn
(n) ( n
because Oj (x ' 1)) = 0 by (3.36) and Ej"/xj < 0. Moreover. the fact that F!j is positive implies that
,,) +2 12F y ( + &0 .0") > 0 (4.11)
which is a contradiction. Thus, x > k') for all sufficiently large n. However, this contradicts the fact "(n)  0. Therefore.
9 4 (x) x > 0 for all j such that x= 0. (4.12)
This satisfies (3.50). U
4.2 Accelerated Penalized Maximum Likelihood (APML) Algorithm
In Section 4.1, we showed that the sequence {x^(n)} = {x̃^(0), x^(1), x̃^(1), x^(2), x̃^(2), ...} converges to the nonnegative minimizer of the PML objective function Φ if: (1) x̃^(0) > 0, (2) x^(n+1) is the nonnegative minimizer of the surrogate function φ̃^(n) for the PML objective function Φ at the iterate x̃^(n), and (3) x̃^(n) > 0 satisfies (C4) and (C5). In this section, we present an algorithm that produces such a sequence {x^(n)}.
First, consider the following steps: given an initial guess x̃^(0) > 0, for n = 0, 1, 2, ...

* Step 1 Get the standard PML iterate x^(n+1) ≜ argmin φ̃^(n)(x) subject to x ≥ 0.
* Step 2 Get the accelerated PML iterate x̃^(n+1) ≜ x^(n+1) + α^(n+1) v^(n+1), where v^(n+1) ≠ 0 is the chosen search direction (i.e., direction vector) and

α^(n+1) = argmin Φ(x^(n+1) + α v^(n+1)).   (4.13)

* Step 3 Repeat the steps above until some chosen stopping criterion is met.

With v^(n+1) = x^(n+1) − x̃^(n), Step 2 is the pattern search step put forth by Hooke and Jeeves [41, pp. 287-291].
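The outer loop of Steps 1-3 can be sketched as follows. This is a minimal illustration under our own naming: `pml_update`, `objective`, and `line_search` are placeholders for the surrogate minimization, Φ, and the step-size rule, which the chapter develops in detail.

```python
import numpy as np

def apml_outer_loop(x0, pml_update, objective, line_search, max_iter=10):
    """Sketch of the accelerated iteration: a standard PML update (Step 1)
    followed by a pattern-search step (Step 2) along v = x_new - x_acc."""
    x_acc = x0.copy()                    # accelerated iterate (x tilde)
    history = [objective(x_acc)]
    for _ in range(max_iter):
        x_new = pml_update(x_acc)        # Step 1: standard PML iterate
        v = x_new - x_acc                # Hooke-and-Jeeves style direction
        alpha = line_search(x_new, v)    # Step 2: step size (alpha = 0 feasible)
        x_acc = x_new + alpha * v        # accelerated iterate
        history.append(objective(x_acc))
    return x_acc, history
```

With an exact line search, the objective value is nonincreasing along the combined sequence of standard and accelerated iterates, which is the monotonicity property the chapter relies on.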
We now modify Step 2 so that the sequence produced by Steps 1, 2, and 3 converges to the nonnegative minimizer of the PML objective function. Step 2 does not guarantee that the accelerated iterate is positive. Consequently, we modify the optimal step size as follows:

α^(n+1) = argmin Φ(x^(n+1) + α v^(n+1)) subject to α ∈ A^(n+1),   (4.14)

where

A^(n+1) ≜ {α : x_j^(n+1) + α v_j^(n+1) ≥ ε^(n+1) for all j},   (4.15)

ε^(n+1) ≜ (min_j x_j^(n+1)) / (n + 1).   (4.16)

Observe that, for simplicity, we use the same notation for the optimal step sizes defined by (4.13) and (4.14). For α ∈ A^(n+1), it is straightforward to see that x^(n+1) + α v^(n+1) > 0 because x_j^(n+1) > 0 for all j and n. Thus, by adding the constraint α ∈ A^(n+1) to the problem in (4.13), it follows that x̃^(n) > 0 for all n. Because x_j^(n+1) ≥ ε^(n+1) for all j and n, it is evident that ε^(n+1) has been chosen in such a way that 0 ∈ A^(n+1) for all n. The fact that 0 ∈ A^(n+1) will be used later to prove that the proposed algorithm monotonically decreases the PML objective function Φ with increasing iterations.
Remark: If we allow some elements of x̃^(n) to be zero, then the feasible region of the function Φ(x^(n+1) + α v^(n+1)) is {α : x_j^(n+1) + α v_j^(n+1) ≥ 0 for all j}. However, for the sequence {x^(n)} to converge, we must constrain all elements of x̃^(n) to be positive because the surrogate function for the PML objective function Φ, φ̃^(n), is not defined at x = x̃^(n) when x̃_j^(n) = 0 for some j. A feasible region of the function Φ(x^(n+1) + α v^(n+1)) that appears natural is

Å^(n+1) ≜ {α : x_j^(n+1) + α v_j^(n+1) > 0 for all j}.   (4.17)
Figure 4-2: Illustration of the pattern search step: (a) two-dimensional illustration of Φ, (b) one-dimensional slice of the function along the chosen direction vector. The single circle and double circle denote an accelerated iterate x̃^(n) and a PML iterate x^(n+1), respectively. The mark × denotes the minimizer of the function subject to the constraints x_1 ≥ 0 and x_2 ≥ 0.
However, the set Å^(n+1) is an open set. Consequently, the optimization problem

argmin Φ(x^(n+1) + α v^(n+1)) subject to α ∈ Å^(n+1)   (4.18)

may not have a solution. To see this fact, suppose that the function Φ(x^(n+1) + α v^(n+1)) is strictly convex and Å^(n+1) = {α : L^(n+1) < α < U^(n+1)} for some L^(n+1) and U^(n+1). If the function Φ(x^(n+1) + α v^(n+1)) has as its minimizer α = L^(n+1) over the set {α : L^(n+1) ≤ α ≤ U^(n+1)}, then the optimization problem in (4.18) does not have a solution.

One more point to mention about the set A^(n+1) is that, strictly speaking, ε^(n+1) can be any positive number as long as it is chosen in such a way that 0 ∈ A^(n+1) for all n. However, in addition to the condition 0 ∈ A^(n+1), it is reasonable to keep the set A^(n+1) as close as possible to Å^(n+1) in (4.17). In this sense, the chosen ε^(n+1) in (4.16) is optimal when n is sufficiently large. ∎
Another issue with Step 2 is that it is hard to solve the optimization problem in (4.13). A suboptimal solution can be found by determining a surrogate function for Φ^(n+1)(α) ≜ Φ(x^(n+1) + α v^(n+1)), which we denote by F^(n+1)(α), that satisfies the following conditions:

* (C6) F^(n+1)(α) ≥ Φ^(n+1)(α) for α ∈ A^(n+1)
* (C7) F^(n+1)(α) = Φ^(n+1)(α) for α = 0.

By incorporating the surrogate function F^(n+1) with the constraint α ∈ A^(n+1), an alternative to Step 2 is:

* Step 2a Get the accelerated PML iterate x̃^(n+1) ≜ x^(n+1) + α^(n+1) v^(n+1), where

α^(n+1) ≜ argmin F^(n+1)(α) subject to α ∈ A^(n+1).   (4.19)

It is important to point out that, for convenience, α^(n+1) has been redefined in Step 2a. This new definition will be used throughout the rest of the dissertation. In Figure 4-2, the alternative pattern search step with the surrogate function F^(n+1) is illustrated. In Figure 4-2 (a), a two-dimensional example of Φ is shown with a direction vector v^(n+1). In addition, the one-dimensional slice of Φ along the direction vector v^(n+1), which we denote Φ^(n+1), and a surrogate function F^(n+1) that satisfies (C6) and (C7) are shown in Figure 4-2 (b).

By design, the sequence {x^(n)} produced by Steps 1, 2a, and 3 satisfies the monotonicity condition (C4). To see this fact, note that Φ(x̃^(n+1)) = Φ^(n+1)(α^(n+1)) by the definition of x̃^(n+1). By (C6), it follows that Φ^(n+1)(α^(n+1)) ≤ F^(n+1)(α^(n+1)). Also, from the definition of α^(n+1) in (4.19) and the fact that 0 ∈ A^(n+1), we obtain the result F^(n+1)(α^(n+1)) ≤ F^(n+1)(0). Finally, by (C7), it can be concluded that Φ(x̃^(n+1)) ≤ F^(n+1)(0) = Φ^(n+1)(0) = Φ(x^(n+1)) for all n.
We now present our choice for the surrogate function F^(n+1) that satisfies (C6) and (C7). First, note that the negative log-likelihood function Ψ can be expressed as

Ψ(x) = Σ_{i=1}^I {[Px]_i − d_i log([Px]_i + ρ_i) + ρ_i + log(d_i!)}   (4.20)
     = Σ_{i=1}^I ψ_i([Px]_i) + C_5,   (4.21)

where ψ_i(t) ≜ t − d_i log(t + ρ_i) and C_5 ≜ Σ_{i=1}^I {ρ_i + log(d_i!)}. Suppose a function φ_i^(n+1) can be found such that

* (C8) φ_i^(n+1)(α) ≥ ψ_i^(n+1)(α) for α ∈ A^(n+1)
* (C9) φ_i^(n+1)(α) = ψ_i^(n+1)(α) for α = 0,

where ψ_i^(n+1)(α) ≜ ψ_i([Px^(n+1)]_i + α [Pv^(n+1)]_i). Then, the function

Θ^(n+1)(α) = Σ_{i=1}^I φ_i^(n+1)(α) + C_5   (4.22)

will satisfy the conditions

* (C10) Θ^(n+1)(α) ≥ Ψ(x^(n+1) + α v^(n+1)) for α ∈ A^(n+1)
* (C11) Θ^(n+1)(α) = Ψ(x^(n+1) + α v^(n+1)) for α = 0.

A function that satisfies (C8) and (C9) is

φ_i^(n+1)(α) = (μ_i^(n+1)/2) α² + ψ_i^(n+1)′(0) α + ψ_i^(n+1)(0),   (4.23)

where μ_i^(n+1) ≜ max{ψ_i^(n+1)″(α) subject to α ∈ A^(n+1)}. From the definition of φ_i^(n+1), it is obvious that φ_i^(n+1)(0) = ψ_i^(n+1)(0). Thus, (C9) is satisfied. To see that φ_i^(n+1) satisfies (C8), consider the function z_i^(n+1)(α) ≜ φ_i^(n+1)(α) − ψ_i^(n+1)(α). From the definition of μ_i^(n+1), it is clear that z_i^(n+1)″(α) ≥ 0. Thus, it follows that z_i^(n+1) is a convex function. Moreover, α = 0 is a minimizer of z_i^(n+1) because z_i^(n+1)′(0) = 0 by the definition of φ_i^(n+1). Since z_i^(n+1)(0) = 0 by (C9), it is straightforward to see that z_i^(n+1)(α) ≥ 0. This result implies that φ_i^(n+1)(α) ≥ ψ_i^(n+1)(α) for all α ∈ A^(n+1). So, (C8) is satisfied. To calculate μ_i^(n+1), it is worthwhile to note that the set A^(n+1) can be written as A^(n+1) = {α : L^(n+1) ≤ α ≤ U^(n+1)}, where

L^(n+1) ≜ max_j {(ε^(n+1) − x_j^(n+1)) / v_j^(n+1) : v_j^(n+1) > 0},   (4.24)

U^(n+1) ≜ min_j {(ε^(n+1) − x_j^(n+1)) / v_j^(n+1) : v_j^(n+1) < 0}.   (4.25)

Observe that L^(n+1) ≤ 0 and U^(n+1) ≥ 0. Since the second derivative of ψ_i^(n+1) is

ψ_i^(n+1)″(α) = d_i ([Pv^(n+1)]_i)² / ([Pv^(n+1)]_i α + [Px^(n+1)]_i + ρ_i)²,   (4.26)

the maximum second derivative of ψ_i^(n+1) for α ∈ A^(n+1) is

μ_i^(n+1) = { d_i ([Pv^(n+1)]_i)² / ([Pv^(n+1)]_i L^(n+1) + [Px^(n+1)]_i + ρ_i)², if [Pv^(n+1)]_i > 0
            { d_i ([Pv^(n+1)]_i)² / ([Pv^(n+1)]_i U^(n+1) + [Px^(n+1)]_i + ρ_i)², if [Pv^(n+1)]_i < 0
            { 0, if [Pv^(n+1)]_i = 0.   (4.27)

It should be noted that [Pv^(n+1)]_i α + [Px^(n+1)]_i + ρ_i > 0 for α ∈ A^(n+1) (see (AS1)).
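The interval (4.24)-(4.25) and the curvature bound (4.27) translate directly into code. The sketch below is our own illustration (function and variable names are ours); it assumes x > 0 elementwise and ε ≤ min_j x_j, so that the interval always contains 0.

```python
import numpy as np

def step_interval(x, v, eps):
    """Feasible step-size interval [L, U] for x + alpha*v >= eps elementwise,
    following (4.15), (4.24), and (4.25)."""
    bounds = (eps - x) / np.where(v == 0, np.inf, v)
    L = bounds[v > 0].max() if np.any(v > 0) else -np.inf
    U = bounds[v < 0].min() if np.any(v < 0) else np.inf
    return L, U

def max_curvature(Pv, Px, d, rho, L, U):
    """Largest second derivative mu_i of psi_i over [L, U], per (4.26)-(4.27):
    the denominator is smallest at alpha = L when [Pv]_i > 0 and at alpha = U
    when [Pv]_i < 0."""
    worst = np.where(Pv > 0, Pv * L, Pv * U)   # alpha minimizing the denominator
    mu = d * Pv**2 / (worst + Px + rho)**2
    return np.where(Pv == 0, 0.0, mu)
```

Rays with [Pv]_i = 0 contribute no curvature, matching the third case of (4.27).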
At this point, we need a surrogate function for the penalty function Λ once again. As mentioned in Chapter 3, under assumptions (AS3)-(AS8), Huber developed a surrogate function for λ, denoted by λ^(n) (see (3.14)). By the properties of the surrogate function λ^(n), it is clear that the surrogate function Λ^(n), which is defined by

Λ^(n)(x) = Σ_{j=1}^J Σ_{k∈N_j} w_jk λ^(n)(x_j − x_k),   (4.28)

satisfies Λ^(n)(x) ≥ Λ(x) for all x and Λ^(n)(x^(n)) = Λ(x^(n)). Thus, Λ^(n+1)(x^(n+1) + α v^(n+1)) will satisfy

* (C12) Λ^(n+1)(x^(n+1) + α v^(n+1)) ≥ Λ(x^(n+1) + α v^(n+1)) for α ∈ A^(n+1)
* (C13) Λ^(n+1)(x^(n+1) + α v^(n+1)) = Λ(x^(n+1) + α v^(n+1)) for α = 0.

Finally, by (C10)-(C13), it is clear that the function

F^(n+1)(α) = Θ^(n+1)(α) + β Λ^(n+1)(x^(n+1) + α v^(n+1))   (4.29)

satisfies (C6) and (C7).
To solve (4.19), we first determine the unconstrained minimizer of F^(n+1):

ᾱ^(n+1) ≜ argmin F^(n+1)(α).   (4.30)

Since F^(n+1) is strictly convex by γ(t) > 0 for −∞ < t < ∞, v^(n+1) ≠ 0, (AS1), and (AS2) (see Appendix D), the expression for ᾱ^(n+1) can be found by simply computing the first derivative of F^(n+1) and setting it to zero (see Appendix D):

ᾱ^(n+1) = − [Σ_{i=1}^I ψ_i^(n+1)′(0) + 2β Σ_{j=1}^J Σ_{k∈N_j} w_jk γ(x_j^(n+1) − x_k^(n+1)) (x_j^(n+1) − x_k^(n+1)) (v_j^(n+1) − v_k^(n+1))]
          / [Σ_{i=1}^I μ_i^(n+1) + 2β Σ_{j=1}^J Σ_{k∈N_j} w_jk γ(x_j^(n+1) − x_k^(n+1)) (v_j^(n+1) − v_k^(n+1))²].   (4.31)

Given (4.24), (4.25), and (4.31), the solution to the constrained optimization problem in (4.19) is

α^(n+1) = { U^(n+1), if ᾱ^(n+1) > U^(n+1)
          { L^(n+1), if ᾱ^(n+1) < L^(n+1)
          { ᾱ^(n+1), otherwise.   (4.32)
All that remains now is to show that the sequence {x^(n)} produced by Step 1, Step 2a with F^(n+1) in (4.29), and Step 3 satisfies (C5). To see this, note that by (C6), (C7), and the definition of α^(n+1), we have the following inequality:

Φ(x^(n+1)) − Φ(x̃^(n+1)) ≥ F^(n+1)(0) − F^(n+1)(α^(n+1)).   (4.33)

Suppose, for each n, the function F^(n+1) is expanded into a second-order Taylor series about the point α^(n+1) and evaluated at α = 0. Then, the right-hand side of (4.33) can be written as

F^(n+1)(0) − F^(n+1)(α^(n+1)) = F^(n+1)′(α^(n+1)) (−α^(n+1)) + (1/2) F^(n+1)″(α̂^(n+1)) (−α^(n+1))²,   (4.34)

where α̂^(n+1) is a point between 0 and α^(n+1). By the strict convexity of F^(n+1), L^(n+1) ≤ 0, and U^(n+1) ≥ 0, it follows that F^(n+1)′(α^(n+1)) (−α^(n+1)) ≥ 0 for all n. Thus,

F^(n+1)(0) − F^(n+1)(α^(n+1)) ≥ (1/2) F^(n+1)″(α̂^(n+1)) (α^(n+1))².   (4.35)

Since there exists a symmetric positive definite matrix M, which is independent of n, such that F^(n+1)″(α̂^(n+1)) ≥ 2 (v^(n+1))′ M (v^(n+1)) (see Appendix E), it follows that

F^(n+1)(0) − F^(n+1)(α^(n+1)) ≥ (v^(n+1))′ M (v^(n+1)) (α^(n+1))².   (4.36)

Hence, to prove (C5), we show that there exists some constant c_2 > 0 such that

(v^(n+1))′ M (v^(n+1)) ≥ c_2 ‖v^(n+1)‖²,   (4.37)

where we used the fact that ‖x^(n+1) − x̃^(n+1)‖² = (α^(n+1))² ‖v^(n+1)‖² by the definition of x̃^(n+1). Since M is a symmetric matrix, it can be factored as M = T D T′ by the spectral theorem [63, p. 309], where the columns of the matrix T contain orthonormal eigenvectors of M and the diagonal matrix D contains the corresponding eigenvalues along its diagonal. Using the fact that T′T produces the J × J identity matrix and Rayleigh's quotient with v^(n+1) = T z^(n+1) (i.e., z^(n+1) = T′ v^(n+1)) [63, p. 348], it follows that

(v^(n+1))′ M (v^(n+1)) / ‖v^(n+1)‖² = (T z^(n+1))′ M (T z^(n+1)) / ((T z^(n+1))′ (T z^(n+1)))   (4.38)
 = (z^(n+1))′ D (z^(n+1)) / ((z^(n+1))′ (z^(n+1)))   (4.39)
 = Σ_j e_j (z_j^(n+1))² / Σ_j (z_j^(n+1))²   (4.40)
 ≥ e_min,   (4.41)

where {e_j}_{j=1}^J are the eigenvalues of M and e_min is the smallest eigenvalue. Since M is positive definite, e_min is positive. Therefore, with c_2 = e_min > 0, we obtain (v^(n+1))′ M (v^(n+1)) ≥ c_2 ‖v^(n+1)‖² and

Φ(x^(n+1)) − Φ(x̃^(n+1)) ≥ c_2 ‖x^(n+1) − x̃^(n+1)‖².   (4.42)
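The Rayleigh-quotient bound (4.37)-(4.41) is easy to check numerically. The snippet below is a small illustration on a random symmetric positive definite matrix (our own construction, not a matrix from the reconstruction problem).

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive definite M (A A' + I) and a random direction v.
A = rng.standard_normal((5, 5))
M = A @ A.T + np.eye(5)
v = rng.standard_normal(5)

e_min = np.linalg.eigvalsh(M).min()   # smallest eigenvalue of M
quad = v @ M @ v                      # v' M v

# Rayleigh-quotient lower bound used to establish (C5): v' M v >= e_min ||v||^2.
lower = e_min * (v @ v)
```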
4.3 Direction Vectors
In some algorithms, the gradient of a function is used as a direction vector, as in the method of steepest descent [55, ch. 14]. Suppose the direction vector v^(n+1) were chosen to be the gradient of the PML objective function Φ evaluated at the PML iterate x^(n+1) (i.e., v^(n+1) = ∇Φ(x^(n+1))). Then, the gradient must be calculated at each step. However, the computational cost of the gradient of Φ is on the same order as the computational cost of a single PML iteration. Moreover, in experiments, the APML algorithm with v^(n+1) = ∇Φ(x^(n+1)) decreased the PML objective function more slowly than the PML algorithm.
The direction vector in the pattern search step by Hooke and Jeeves is the difference of the two most recent iterates. Specifically, the direction vector is defined by v̂^(n+1) ≜ x^(n+1) − x̃^(n). This choice can be justified in a reasonable manner. To see why, assume that a closed-form expression is available for the minimizer of Φ(x^(n+1) + α v^(n+1)). Then, the "best" direction vector is (x^(n+1) − x*) because the line Φ(x^(n+1) + α v^(n+1)) "contains" the point x* (note: x* would result with α = −1), where x* is the minimizer of the PML objective function. However, x* is not known at the nth iteration. The "best" available estimate of x* at the nth iteration is x̃^(n), namely the accelerated iterate from the previous iteration. In this section, we introduce a direction vector that works better than Hooke and Jeeves's direction vector in terms of convergence rate.
The direction vector we choose is a simple variation of v̂^(n+1) that is not computationally expensive (note: J subtractions are required for v̂^(n+1)). Since there are no positrons emitted outside the subject being scanned, the PML estimate will contain many values near zero. This claim is supported by Figure 4-3. In the figure,
Figure 4-3: PML iterates are shown: (a) the 13th PML iterate, (b) the 14th PML iterate, (c) the 15th PML iterate, and (d) the 1000th PML iterate. The images were generated using real thorax phantom data. The plane considered contains activity due to the heart, lungs, spine, and background. For display purposes, all the images were adjusted so that they have the same dynamic range.
PML images corresponding to different iteration numbers are shown. The images were generated by applying the PML algorithm to real thorax phantom data (scan duration was 14 minutes) with a uniform initial estimate. The penalty parameter was β = 0.02 and λ(t) = log(cosh(t/δ)) with δ = 50 was the penalty function. As can be seen in Figure 4-3 (a), (b), and (c), the early iterates contain values near zero outside of the body. Consider Figure 4-3 (d), which is the 1000th PML iterate (practically speaking, this iterate is the minimizer of Φ because the objective did not decrease after the 791st iteration up to the 5000th iteration). The 1000th iterate also contains many values near zero outside the body. Inside the body, on the other hand, the 1000th iterate is very different from the early iterates. From the above observations, it can be said that the convergence rate varies significantly between voxels inside the body and voxels outside the body.
From the above observations, namely that the voxels outside the body converge faster than the voxels inside the body and that the voxels outside the body converge to values near zero, which lie on the boundary of the set {x : x ≥ 0}, we claim that it is better to search for an accelerated iterate along the boundary whenever the current iterate is "near" the boundary. By "boundary", we mean the set {x ≥ 0 : x_j = 0 for some j}. This can be explained by the example shown in Figure 4-4. In Figure 4-4 (a), the second PML iterate is heading toward the x_2-axis. If we perform the acceleration step with the direction vector v̂^(n+1), then the accelerated iterate will lie on the x_2-axis as shown in Figure 4-4 (b). A better direction to search is one that is parallel to the x_2-axis, as shown in Figure 4-4 (c). The principle of the proposed direction vector is to exclude the coordinates corresponding to the voxel values that are "near" the boundary. An easy way to incorporate this idea is to determine the voxels whose values are less than a small positive value, ε. Specifically, the proposed direction vector,
Figure 4-4: Direction vectors: (a) standard PML iterates are shown, (b) an accelerated iterate (the single solid circle) obtained using the direction vector v̂^(n+1), and (c) an accelerated iterate (the single dotted circle) obtained using the proposed direction vector v^(n+1). The double circles denote PML iterates. The mark × denotes the minimizer of the function subject to the constraints x_1 ≥ 0 and x_2 ≥ 0.

v^(n+1), is

v_j^(n+1) ≜ { 0, if x_j^(n+1) < ε
            { x_j^(n+1) − x̃_j^(n), otherwise,

where ε > 0 is a user-defined parameter (note: v^(n+1) = v̂^(n+1) if ε = 0) and j = 1, 2, ..., J.
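The proposed rule is a one-line modification of the Hooke-and-Jeeves difference. The sketch below is our own illustration (function names are ours): coordinates already "near" the nonnegativity boundary are zeroed out so the search moves parallel to that boundary.

```python
import numpy as np

def proposed_direction(x_new, x_acc_prev, eps_boundary):
    """Direction vector of the proposed rule: the Hooke-and-Jeeves
    difference x_new - x_acc_prev, with coordinates whose current value
    is below eps_boundary set to zero."""
    v = x_new - x_acc_prev
    return np.where(x_new < eps_boundary, 0.0, v)
```

With `eps_boundary = 0` the rule reduces to the original difference direction v̂^(n+1), matching the note above.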
4.4 Properties of the APML Algorithm
We now provide a summary of the desirable properties of the APML algorithm:

* The APML algorithm needs only one additional parameter (i.e., ε), whereas ordered-subsets based algorithms [33, 35] need at least three extra parameters (i.e., a relaxation parameter, the number of subsets, and a small positive number that replaces any nonpositive elements of an iterate).
* In experiments, the proposed direction vector performed better than the direction vector put forth by Hooke and Jeeves [41, pp. 287-291].
* The APML algorithm monotonically decreases the PML objective function, unlike the algorithms in [23, 35].
" The APML algorithm theoretically guarantees nonnegative iterates, whereas
some algorithms [33, 35] set any negative element of the iterates to a small
positive number.
" The APML algorithm can incorporate a large class of edgepreserving penalty
functions unlike the algorithm by De Pierro [30].
" The APML algorithm converges to the minimizer of the PML objective function.
Convergence proofs for the algorithms in [23,33] are not available.
CHAPTER 5
QUADRATIC EDGE PRESERVING ALGORITHM
In this chapter, we present a regularized image reconstruction algorithm that aims to preserve edges in the reconstructed images so that fine details are more resolvable. We refer to the proposed algorithm as the quadratic edge preserving (QEP) algorithm. The QEP algorithm results via a certain modification of the surrogate function φ^(n) for the PML objective function. It should be mentioned that the QEP algorithm was first introduced in [18].
Recall that the nth surrogate function for the PML objective function Φ, φ^(n), can be expressed as φ^(n)(x) = Σ_{j=1}^J φ_j^(n)(x_j) + C_3^(n) (see (3.29) and (3.35)). For the discussion to follow, it will be convenient to express φ_j^(n) in (3.35) in a different manner:

φ_j^(n)(t) = −E_j^(n) log(t) + F_j^(n) t² + G_j^(n) t   (5.1)
 = −E_j^(n) log(t) + 2β (Σ_{k∈N_j} w_jk γ(x_j^(n) − x_k^(n))) t² + (Σ_{i=1}^I p_ij − 4β Σ_{k∈N_j} w_jk γ(x_j^(n) − x_k^(n)) m_jk^(n)) t   (5.2)
 = −E_j^(n) log(t) + t Σ_{i=1}^I p_ij + 2β Σ_{k∈N_j} w_jk γ(x_j^(n) − x_k^(n)) (t² − 2 m_jk^(n) t)   (5.3)
 = l_j^(n)(t) + 2β Σ_{k∈N_j} w_jk γ(x_j^(n) − x_k^(n)) (t − m_jk^(n))² − 2β Σ_{k∈N_j} w_jk γ(x_j^(n) − x_k^(n)) (m_jk^(n))²   (5.4)
 = l_j^(n)(t) + 2β Σ_{k∈N_j} h_jk^(n)(t) − 2β Σ_{k∈N_j} w_jk γ(x_j^(n) − x_k^(n)) (m_jk^(n))²,   (5.5)

where

l_j^(n)(t) ≜ −E_j^(n) log(t) + t Σ_{i=1}^I p_ij,   (5.6)

h_jk^(n)(t) ≜ w_jk γ(x_j^(n) − x_k^(n)) (t − m_jk^(n))²,   (5.7)

and m_jk^(n), E_j^(n), F_j^(n), and G_j^(n) are defined in (3.24), (3.30), (3.31), and (3.32), respectively. Note that the jth element of the next PML iterate x^(n+1), x_j^(n+1), is defined to be the nonnegative minimizer of the function φ_j^(n). The function h_jk^(n) is quadratic, so its aperture, w_jk γ(x_j^(n) − x_k^(n)), and minimizer, m_jk^(n), are expected to play key roles in the regularization process (for a quadratic function f(t) = at² + bt + c, the constant a is called the aperture of the function). To see the role of the functions {h_jk^(n)} in determining x_j^(n+1), suppose β = 0 in (5.5). Then, φ_j^(n)(t) = l_j^(n)(t) and the minimizer of the function φ_j^(n) is the "pure" log-likelihood iterate (i.e., the minimizer of l_j^(n)). For β ≠ 0, intuitively speaking, it is evident that the functions {h_jk^(n)}_{k∈N_j} act to bias x_j^(n+1) from the pure log-likelihood iterate towards the minimizers of {h_jk^(n)}_{k∈N_j} (observe: the last term in (5.5) is independent of x). The degree of influence that the functions {h_jk^(n)}_{k∈N_j} possess is controlled by their apertures and the penalty parameter β.

To highlight the role of the function γ, let w_jk = 1 for all j and k. In this case, the aperture of h_jk^(n) equals γ(x_j^(n) − x_k^(n)). For the quadratic penalty function (i.e., λ(t) = t²), the aperture of h_jk^(n) is a constant for all j, k, and n. Consequently, for all k ∈ N_j, the function h_jk^(n) has the same degree of influence regardless of the absolute difference between x_j^(n) and x_k^(n). Practically speaking, this means that the quadratic penalty will overly smooth edges. To preserve edges, it would be helpful to lessen the degree of influence of the functions {h_jk^(n)}_{k∈N_j} whenever the absolute difference between x_j^(n) and x_k^(n) is sufficiently large. Figure 5-1 is a plot of γ when λ(t) = log(cosh(t/δ)) (recall: γ(t) = λ̇(t)/t in (AS6)) is used as the penalty function with δ = 10, 50, 100, 500. From the figure, it can be seen that γ becomes very small compared with γ(0) for |t| ≫ δ. This means that, when |x_j^(n) − x_k^(n)| ≫ δ, h_jk^(n) will have a "very small" aperture, and consequently the function h_jk^(n) will not have much influence on x_j^(n+1). Said another way, the log-cosh penalty function helps preserve edges whose "heights" are on the order of δ.
Figure 5-1: The function γ is shown when λ(t) = log(cosh(t/δ)) is used as the penalty function with: (a) δ = 10, (b) δ = 50, (c) δ = 100, and (d) δ = 500.
We now move the discussion from the aperture of the function h_jk^(n) to the minimizer of h_jk^(n). As stated previously, the minimizers of the functions {h_jk^(n)}_{k∈N_j} play an important role in the regularization procedure. More specifically, when the aperture of h_jk^(n) is sufficiently large, the (n+1)st iterate is biased towards the minimizer of h_jk^(n), which is m_jk^(n) = (x_j^(n) + x_k^(n))/2. As a result, there is inherent averaging that takes place with the PML algorithm.

Consider a penalty function that, for certain functions {σ_jk^(n)}, is of the form

Λ_e^(n)(x) = 2 Σ_{j=1}^J Σ_{k∈N_j} σ_jk^(n)(x_j).   (5.8)

In order to better preserve edges, we believe an improvement would be to construct σ_jk^(n) so that it has the same aperture as h_jk^(n) but a different minimizer. The minimizer of σ_jk^(n), which would depend on whether an edge is present, is chosen to be

u_jk^(n) ≜ x_j^(n) − η(x_j^(n) − x_k^(n)),   (5.9)

where η is a function such that

η(t) ≈ ξ sgn(t) for |t| > ξ, and η(t) ≈ t otherwise,   (5.10)

and ξ is a parameter that represents the "heights" of the edges that are to be preserved. An example of a function that can be used is η(t) = ξ tanh(t/ξ). In order for u_jk^(n) to be the minimizer of σ_jk^(n), we define σ_jk^(n) to be

σ_jk^(n)(t) ≜ w_jk γ(x_j^(n) − x_k^(n)) (t − u_jk^(n))².   (5.11)

Using σ_jk^(n), when the absolute difference between x_j^(n) and x_k^(n) is less than ξ (i.e., no edge present), smoothing takes place because u_jk^(n) ≈ x_k^(n). On the other hand, when the absolute difference between x_j^(n) and x_k^(n) is greater than ξ (i.e., edge present), less smoothing takes place because there are upper and lower bounds for u_jk^(n). To help make our idea clearer, consider Figure 5-2. In the figure, m_jk^(n) and u_jk^(n) are plotted as a function of x_k^(n), where x_j^(n) = 800 (value chosen arbitrarily) and η(t) = 250 tanh(t/250). It can be observed that u_jk^(n) is approximately x_j^(n) + 250 when x_k^(n) is larger than x_j^(n) + 250, and approximately x_j^(n) − 250 when x_k^(n) is less than x_j^(n) − 250. In other words, the function η prevents x_j^(n+1) from being overly biased towards x_k^(n) when the absolute difference between x_j^(n) and x_k^(n) is greater than 250.
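The rule (5.9) with the smooth choice η(t) = ξ tanh(t/ξ) is easy to state in code. The sketch below is our own illustration (function names are ours), reproducing the behavior described for Figure 5-2.

```python
import numpy as np

def eta(t, xi):
    """Saturating function eta(t) = xi * tanh(t/xi): approximately t for
    |t| << xi and approximately +/- xi for |t| >> xi."""
    return xi * np.tanh(t / xi)

def edge_minimizer(xj, xk, xi):
    """Minimizer u_jk = x_j - eta(x_j - x_k) of the modified quadratic,
    cf. (5.9): close to the neighbor x_k across small differences
    (smoothing), but bounded within [x_j - xi, x_j + xi] across edges."""
    return xj - eta(xj - xk, xi)
```

For x_j = 800 and ξ = 250, a neighbor at 810 pulls the minimizer essentially to 810, while a neighbor at 2000 only pulls it to about 1050 = x_j + ξ, exactly the bounded-influence behavior intended for edges.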
Using the edge-preserving penalty function Λ_e^(n), an algorithm for obtaining regularized estimates of the emission means follows:

* Step 1 Let x^(0) > 0 be the initial estimate.
* Step 2 Construct Λ_e^(n)(x) from the current estimate x^(n) using (5.8), (5.9), (5.10), and (5.11).
* Step 3 Compute x^(n+1) = argmin Ξ^(n)(x) subject to x ≥ 0, where Ξ^(n)(x) ≜ Ψ(x) + β Λ_e^(n)(x).
* Step 4 Iterate between Steps 2 and 3.
Figure 5-2: Plots of m_jk^(n) and u_jk^(n) versus x_k^(n) are shown, where x_j^(n) = 800 is fixed and η(t) = 250 tanh(t/250).
It turns out that a closed-form expression for the solution of the problem in Step 3 is not possible. An alternative to Step 3 is:

* Step 3a Find x^(n+1) such that Ξ^(n)(x^(n+1)) ≤ Ξ^(n)(x^(n)).

The problem in Step 3a can be solved by defining x^(n+1) as

x^(n+1) = argmin Φ_e^(n)(x) subject to x ≥ 0,   (5.12)

where

Φ_e^(n)(x) ≜ Ψ^(n)(x) + β Λ_e^(n)(x)   (5.13)

and Ψ^(n) is defined in (3.12). Defining the iterates according to (5.12) and (5.13) ensures that Ξ^(n)(x^(n+1)) ≤ Ξ^(n)(x^(n)). To see this fact, note that Ξ^(n)(x^(n+1)) ≤ Φ_e^(n)(x^(n+1)) because Ψ^(n)(x) ≥ Ψ(x) for all x ≥ 0. Since Φ_e^(n)(x^(n+1)) ≤ Φ_e^(n)(x^(n)) by (5.12), it follows that

Ξ^(n)(x^(n+1)) ≤ Φ_e^(n)(x^(n+1)) ≤ Φ_e^(n)(x^(n)) = Ξ^(n)(x^(n)).   (5.14)
All that remains now is to solve the optimization problem in (5.12). Observe that Φ_e^(n) can be written as

Φ_e^(n)(x) = Ψ^(n)(x) + β Λ_e^(n)(x)   (5.15)
 = Σ_{j=1}^J {(Σ_{i=1}^I p_ij) x_j − E_j^(n) log(x_j)} + C_4^(n) + 2β Σ_{j=1}^J Σ_{k∈N_j} σ_jk^(n)(x_j)   (5.16)
 = Σ_{j=1}^J {(Σ_{i=1}^I p_ij) x_j − E_j^(n) log(x_j)} + C_4^(n) + 2β Σ_{j=1}^J Σ_{k∈N_j} w_jk γ(x_j^(n) − x_k^(n)) (x_j² − 2 u_jk^(n) x_j + (u_jk^(n))²)   (5.17)
 = Σ_{j=1}^J {−E_j^(n) log(x_j) + F_j^(n) x_j² + H_j^(n) x_j} + C_6^(n),   (5.18)

where

H_j^(n) ≜ Σ_{i=1}^I p_ij − 4β Σ_{k∈N_j} w_jk γ(x_j^(n) − x_k^(n)) u_jk^(n),   (5.19)

C_6^(n) ≜ C_4^(n) + 2β Σ_{j=1}^J Σ_{k∈N_j} w_jk γ(x_j^(n) − x_k^(n)) (u_jk^(n))²,   (5.20)

and C_4^(n), E_j^(n), and F_j^(n) are defined in (3.13), (3.30), and (3.31), respectively. Since Φ_e^(n) is decoupled, it follows that the solution to (5.12) is given by

x_j^(n+1) = argmin_{x_j ≥ 0} {−E_j^(n) log(x_j) + F_j^(n) x_j² + H_j^(n) x_j}, j = 1, 2, ..., J.   (5.21)

Repeating the steps used to derive (3.37), it is straightforward to see that the solution to the optimization problem in (5.21) is

x_j^(n+1) = (−H_j^(n) + sqrt((H_j^(n))² + 8 E_j^(n) F_j^(n))) / (4 F_j^(n)), j = 1, 2, ..., J.   (5.22)
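The decoupled update (5.22) can be checked numerically. The sketch below is our own (the function name is ours): it evaluates the closed-form positive root elementwise and can be verified against the stationarity condition −E/x + 2Fx + H = 0, which is the derivative of the bracketed function in (5.21).

```python
import numpy as np

def qep_update(E, F, H):
    """Closed-form minimizer of -E*log(x) + F*x**2 + H*x over x > 0,
    as in (5.22). Assumes E, F > 0 elementwise, so the positive root
    of 2*F*x**2 + H*x - E = 0 always exists."""
    return (-H + np.sqrt(H**2 + 8.0 * E * F)) / (4.0 * F)
```

Because E_j^(n) > 0 whenever the data are nonzero, the update automatically produces strictly positive voxel values, so the nonnegativity constraint in (5.21) never binds.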
CHAPTER 6
JOINT EMISSION MEAN AND PROBABILITY MATRIX ESTIMATION
In Chapters 3, 4, and 5, we assumed that the probability matrix P accounts for errors due to attenuation, detector inefficiency, detector penetration, noncollinearity, and scatter. However, we now assume that an I × J corrected matrix P^c is available that accounts for errors due to attenuation, detector inefficiency, and detector penetration, and we develop a method for correcting errors caused by scatter and noncollinearity. Before presenting the proposed method, we briefly review standard approaches for obtaining P^c in practice.
The $(i, j)$th element of $P^c$, $p^c_{ij}$, denotes the probability that an annihilation in the $j$th voxel leads to a photon pair being recorded by the $i$th detector pair when there are no errors due to scatter and noncollinearity. In practice, one can reliably estimate the probability matrix $P^c$ by using the detector penetration correction method in [7] together with an attenuation correction method [9] and the detector inefficiency correction method in [15]. However, other correction methods for detector penetration, attenuation, and detector inefficiency could be used in conjunction.
To address attenuation errors, two scans, known as a transmission scan and a blank scan, are taken. During a transmission scan, one to three rotating rods filled with positron-emitting isotopes rotate outside the subject, and the transmission data are the coincidence sums. A blank scan is taken in the same way as the transmission scan except that there is no subject inside the scanner. The coincidence sums during the blank scan form the blank scan data. Given the transmission and blank scan data, an estimate of the attenuation map can be generated. Using the estimated attenuation map, attenuation correction factors are computed and used for attenuation correction. Numerous attenuation estimation methods have been proposed [8-10].
Figure 6-1: A simple example to illustrate the geometry of PET image reconstruction with three voxels ($v_1$, $v_2$, and $v_3$) and three detector pairs (($a_1$,$b_1$), ($a_2$,$b_2$), and ($a_3$,$b_3$)). The dashed lines define the tubes of the detector pairs.
To address errors due to detector inefficiency, a blank scan of extremely long duration is performed. From the blank scan, the relative efficiency of the detector pairs can be estimated. Recall that the efficiency of a detector pair is defined to be the probability that a coincidence is recorded when a photon pair is incident on the detectors. Consider a set of detector pairs where each detector pair has the same spatial extent. One simple way to estimate the efficiency of a detector pair in the set is to define its efficiency to be the ratio of the number of photon pairs recorded by that detector pair to the mean number of photon pairs recorded by a detector pair in the set. Once the estimates of the efficiencies for all detector pairs are available, they can be incorporated into the probability matrix. More sophisticated correction methods for detector inefficiency, such as [13-15], have also been proposed.
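The simple ratio estimate of relative detector-pair efficiency described above can be sketched as follows (a toy illustration; the function name and the counts are hypothetical):

```python
import numpy as np

def relative_efficiencies(blank_counts):
    """Estimate relative detector-pair efficiencies from a long blank scan.

    Each efficiency is the ratio of the counts recorded by a detector pair
    to the mean counts over all detector pairs in the set.
    """
    blank_counts = np.asarray(blank_counts, dtype=float)
    return blank_counts / blank_counts.mean()
```

By construction the estimates average to one, so they act as multiplicative corrections on the rows of the probability matrix.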
As discussed in Section 1.3, two photons generated by an annihilation usually do not propagate in exactly opposite directions, which is called noncollinearity of the line-of-response [5]. As illustrated in Figure 1-5, it is possible that an annihilation is recorded by detector pair $i_1$ given that the annihilation would have been recorded by detector pair $i_2$ if both photons had propagated in exactly opposite directions, where $i_1 \ne i_2$.
Since scatter depends on the activity and attenuation within the subject and on the scanner design, it is not straightforward to correct for errors due to scatter. Photons lose energy when they undergo Compton scattering. Thus, the energy of detected photons may be used to discriminate between unscattered photons and scattered photons [11, pp. 65-69]. Scatter correction methods that are based on multiple (two or more) energy windows have a limitation because detectors have finite energy resolution [46, 47]. As discussed in Chapter 2, state-of-the-art scatter correction methods, such as the method in [52] where the scatter distribution is calculated using an analytical equation, are not practical because of their extremely high computational cost. Consequently, a simple scatter correction method would be beneficial to the PET community. In this chapter, we propose a method that estimates the probability matrix in such a way that errors due to scatter and noncollinearity are addressed.
In Section 6.1, we first propose a model for emission data that accounts for errors due to scatter and noncollinearity. Then, in Section 6.2, we present a method where the unknown emission mean vector and an unknown matrix in the proposed model are jointly estimated. Finally, we propose an algorithm, referred to as the probability correction in projection space (PCiPS) algorithm, that estimates the unknown emission mean vector and the unknown matrix in the proposed model.
6.1 Scatter Matrix Model
Consider Figure 6-1, in which a simplified PET scanner consisting of three detector pairs is depicted. For this discussion, we will focus on the voxels $v_1$, $v_2$, and $v_3$ shown in Figure 6-1. Let the detector pairs ($a_1$,$b_1$), ($a_2$,$b_2$), and ($a_3$,$b_3$) be the first, second, and third detector pairs, respectively. For the scanner in Figure 6-1, we assume that

$$P^c = \begin{bmatrix} 0.04 & 0 & 0 \\ 0 & 0.04 & 0 \\ 0.03 & 0.03 & 0.02 \end{bmatrix} . \qquad (6.1)$$
Suppose during a scan there are no errors due to scatter and noncollinearity. Then, the mean number of photon pairs recorded by the detector pair ($a_1$,$b_1$) is $p^c_{11} x_1$, where $x_1$ is the mean number of positrons emitted from voxel $v_1$. The mean numbers of photon pairs recorded by the detector pairs ($a_2$,$b_2$) and ($a_3$,$b_3$) are $p^c_{22} x_2$ and $(p^c_{31} x_1 + p^c_{32} x_2 + p^c_{33} x_3)$, respectively, where $x_2$ and $x_3$ are the mean numbers of positrons emitted from voxels $v_2$ and $v_3$, respectively. As discussed in Section 1.3, a photon's original flight path is altered when it undergoes Compton scattering. Consequently, an annihilation in $v_1$ that would have been recorded by detector pair ($a_1$,$b_1$) if both photons had not undergone Compton scattering may instead be recorded by detector pair ($a_2$,$b_2$) when scatter occurs. When we consider noncollinearity, it is also possible that an annihilation in $v_1$ is recorded by detector pair ($a_2$,$b_2$) given that the annihilation would have been recorded by detector pair ($a_1$,$b_1$) if both photons had propagated in exactly opposite directions.
Let $K_{i_1 i_2}$ denote the conditional probability that an annihilation is recorded by detector pair $i_1$ given that the annihilation would have been recorded by detector pair $i_2$ if both photons produced by the annihilation had not undergone Compton scattering and had propagated in exactly opposite directions, where $i_1 = 1, 2, \ldots, I$ and $i_2 = 1, 2, \ldots, I$. It is through these unknown probabilities $\{K_{i_1 i_2}\}$ that we model scatter and noncollinearity. Now, accounting for scatter and noncollinearity, the mean number of photon pairs recorded by detector pair ($a_1$,$b_1$) is $\{K_{11} p^c_{11} x_1 + K_{12} p^c_{22} x_2 + K_{13} (p^c_{31} x_1 + p^c_{32} x_2 + p^c_{33} x_3)\}$. Moreover, the mean numbers of photon pairs recorded by detector pairs ($a_2$,$b_2$) and ($a_3$,$b_3$) are $\{K_{21} p^c_{11} x_1 + K_{22} p^c_{22} x_2 + K_{23} (p^c_{31} x_1 + p^c_{32} x_2 + p^c_{33} x_3)\}$ and $\{K_{31} p^c_{11} x_1 + K_{32} p^c_{22} x_2 + K_{33} (p^c_{31} x_1 + p^c_{32} x_2 + p^c_{33} x_3)\}$, respectively. In matrix notation, the mean of the data $E[D]$ can be expressed as

$$E[D] = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{21} & K_{22} & K_{23} \\ K_{31} & K_{32} & K_{33} \end{bmatrix} \begin{bmatrix} 0.04 & 0 & 0 \\ 0 & 0.04 & 0 \\ 0.03 & 0.03 & 0.02 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \qquad (6.2)$$

$$= K P^c x , \qquad (6.3)$$

where $x = [x_1\ x_2\ x_3]'$ and the $I \times I$ matrix $K$ denotes the first matrix on the right-hand side of (6.2). We will refer to $K$ as the scatter matrix. Since the emission means can be expressed as $K P^c x$ in this model, we define the "true" probability matrix as

$$P^{true} \triangleq K P^c , \qquad (6.4)$$

where the $(i, j)$th element of $P^{true}$, $p^{true}_{ij}$, denotes the probability that an annihilation in the $j$th voxel leads to a photon pair being recorded by the $i$th detector pair. To our knowledge, the scatter matrix model we propose is not available in the literature. However, similar factorizations of $P^{true}$ in (6.4) can be found in [28, 53, 64, 65].
With the definition of $P^{true}$ in (6.4), the mean number of photon pairs recorded by the $i$th detector pair can be expressed as

$$\sum_{j=1}^{J} p^{true}_{ij} x_j = \sum_{j=1}^{J} \sum_{i_2=1}^{I} K_{i i_2}\, p^c_{i_2 j}\, x_j \qquad (6.5)$$

$$= \sum_{i_2=1}^{I} K_{i i_2} \sum_{j=1}^{J} p^c_{i_2 j}\, x_j \qquad (6.6)$$

$$= \sum_{i_2=1}^{I} K_{i i_2} [P^c x]_{i_2} . \qquad (6.7)$$

In other words, the mean number of photon pairs recorded by the $i$th detector pair is a weighted sum of the mean numbers of photon pairs recorded by all detector pairs when there are no errors due to scatter and noncollinearity. Regarding the matrix $K$, two constraints are necessary. Since $K$ is a probability matrix, it must be true that $0 \le K_{i_1 i_2} \le 1$ for all $i_1$ and $i_2$. Recall that $K_{i_1 i_2}$ is the conditional probability that an annihilation is recorded by detector pair $i_1$ given that the annihilation would have been recorded by detector pair $i_2$ if both photons produced by the annihilation had not undergone Compton scattering and had propagated in exactly opposite directions. Since a photon pair that is recorded by detector pair $i_1$ cannot also be recorded by the other detector pairs, we require that

$$\sum_{i_1=1}^{I} K_{i_1 i_2} \le 1 \quad \text{for all } i_2 . \qquad (6.8)$$
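For the three-detector-pair example, the model (6.2)-(6.3) and the constraints on $K$ can be checked numerically. A sketch, where the numerical values in $K$ are invented purely for illustration:

```python
import numpy as np

# Corrected probability matrix from (6.1).
Pc = np.array([[0.04, 0.00, 0.00],
               [0.00, 0.04, 0.00],
               [0.03, 0.03, 0.02]])

# A hypothetical scatter matrix: entries in [0, 1], column sums at most 1,
# per the constraints 0 <= K_{i1 i2} <= 1 and (6.8).
K = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.03, 0.03, 0.88]])

x = np.array([100.0, 200.0, 50.0])   # emission means for v1, v2, v3

# Check the two constraints on K.
assert np.all((K >= 0) & (K <= 1))
assert np.all(K.sum(axis=0) <= 1)

# Mean data with scatter/noncollinearity, E[D] = K @ Pc @ x, as in (6.3).
mean_data = K @ (Pc @ x)
```

Each entry of `mean_data` is the weighted sum (6.7) of the scatter-free detector-pair means `Pc @ x`.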
6.2 Joint Minimum Kullback-Leibler Distance Method
We now consider the Poisson model by Shepp and Vardi [16]. In their model, the emission data $d$ is an observation of a random vector $D$ that is Poisson distributed with mean $(P^{true} x + \rho)$, where $x$ is the unknown emission mean vector and $\rho$ is the known mean accidental coincidence rate.

Since $d$ is the ML estimate of the unknown mean $(P^{true} x + \rho)$,[1] it can be said that $d \approx (P^{true} x + \rho)$. Using the definition in (6.4), we can state that

$$d \approx K P^c x + \rho . \qquad (6.9)$$

It is important to point out that there are two unknowns in (6.9): the scatter matrix $K$ and the emission mean vector $x$. Thus, given the emission data $d$, the mean accidental coincidence rate $\rho$, and the probability matrix $P^c$, the problem of interest is to estimate the emission mean $x$ and the scatter matrix $K$. We define the estimate $(\hat{x}, \hat{K})$ to be a

[1] If $d$ is an observation of a random vector $D$ that is Poisson distributed with unknown mean $\lambda$, then the likelihood function for $d$ is $\Pr\{D = d \mid \lambda\} = \prod_i (\lambda_i^{d_i} / d_i!)\, e^{-\lambda_i}$. Since $d$ maximizes the log-likelihood function $\log \Pr\{D = d \mid \lambda\}$, $d$ is the ML estimate of $\lambda$.
minimizer of the Kullback-Leibler (KL) distance [66] between $d$ and $(K P^c x + \rho)$:

$$(\hat{x}, \hat{K}) \triangleq \arg\min_{x, K} \ KL(d,\ K P^c x + \rho) \quad \text{subject to} \quad x \ge 0 , \ K \ge 0 , \ \text{and} \ \sum_{i_1=1}^{I} K_{i_1 i_2} \le 1 \ \text{ for all } i_2 , \qquad (6.10)$$

where the KL distance between $a$ and $b$ is defined by

$$KL(a, b) \triangleq \sum_i \Big[ a_i \log \frac{a_i}{b_i} - a_i + b_i \Big] \qquad (6.11)$$

and $K \ge 0$ means that $K_{i_1 i_2} \ge 0$ for all $i_1$ and $i_2$. The definition of the estimate of $(x, K)$ is motivated in part by the fact that, in [26], Byrne derived Shepp and Vardi's MLEM algorithm [16] by minimizing the KL distance between the emission data $d$ and $P^{true} x$ using the alternating projection algorithm by Csiszár and Tusnády [67], where $P^{true}$ is assumed to be known. It should be mentioned that Byrne derived the MLEM algorithm for the case $\rho = 0$.
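The KL distance in (6.11) is straightforward to implement; a sketch in Python using the usual convention $0 \log 0 = 0$ (the function name is ours):

```python
import numpy as np

def kl_distance(a, b):
    """Kullback-Leibler distance KL(a, b) = sum_i [a_i*log(a_i/b_i) - a_i + b_i].

    Terms with a_i == 0 contribute only b_i, since 0*log(0) is taken to be 0.
    Requires b_i > 0 wherever a_i > 0.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pos = a > 0
    out = float(np.sum(b - a))
    out += float(np.sum(a[pos] * np.log(a[pos] / b[pos])))
    return out
```

The distance is nonnegative and equals zero exactly when `a == b`, which is what makes it usable as a data-fit criterion in (6.10).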
6.3 Probability Correction in Projection Space (PCiPS) Algorithm
Since it is difficult to solve the problem in (6.10) directly, we propose an alternating minimization algorithm where, in a repetitive fashion, one of the two unknowns (i.e., $K$ and $x$) is estimated while the other is fixed. Suppose an initial estimate for the scatter matrix $K$ (e.g., an $I \times I$ identity matrix), denoted by $K^{(0)}$, and an initial estimate $x^{(0)} > 0$ are available. Then, for $n = 0, 1, 2, \ldots$, the steps of the proposed algorithm are

* Step 1 Get the current estimate of the emission mean vector $x^{(n+1)}$ using $K^{(n)}$, the current estimate of the scatter matrix $K$:

$$x^{(n+1)} \triangleq \arg\min_{x \ge 0} \ KL(d,\ K^{(n)} P^c x + \rho) . \qquad (6.12)$$

* Step 2 Get the current estimate of the scatter matrix $K^{(n+1)}$ using $x^{(n+1)}$, the current estimate of the emission mean vector $x$:

$$K^{(n+1)} \triangleq \arg\min_{K \ge 0} \ KL(d,\ K P^c x^{(n+1)} + \rho) \quad \text{subject to} \quad \sum_{i_1=1}^{I} K_{i_1 i_2} \le 1 \ \text{ for all } i_2 . \qquad (6.13)$$

* Step 3 Repeat the steps above until some chosen stopping criterion is met.
Note that the problem in Step 1 can be solved using the MLEM algorithm [16, 26]. The reason is that Byrne showed that Shepp and Vardi's MLEM algorithm [16] minimizes the KL distance between the emission data $d$ and $P^{true} x$, where $P^{true}$ is known. Specifically, given an initial guess $x^{(n,0)} > 0$, the iteration for obtaining $x^{(n+1)}$ is

$$x_j^{(n,m+1)} = \frac{x_j^{(n,m)}}{\sum_{i=1}^{I} [K^{(n)} P^c]_{ij}} \sum_{i=1}^{I} [K^{(n)} P^c]_{ij} \frac{d_i}{[K^{(n)} P^c x^{(n,m)}]_i + \rho_i} , \quad j = 1, 2, \ldots, J , \qquad (6.14)$$

for $m = 0, 1, 2, \ldots, M_1$. The current estimate of $x$ is defined to be $x^{(n+1)} \triangleq x^{(n,M_1+1)}$, the $(M_1+1)$th iterate of (6.14).
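One pass of the update (6.14), with the matrix $A = K^{(n)} P^c$ held fixed, can be sketched in NumPy as follows (toy sizes; the names are ours and this is not the author's implementation):

```python
import numpy as np

def mlem_update(x, A, d, rho):
    """One MLEM iteration for min_x KL(d, A @ x + rho), as in (6.14).

    x   : current nonnegative image estimate        (J,)
    A   : fixed system matrix K^(n) @ Pc            (I, J)
    d   : emission data                             (I,)
    rho : known mean accidental coincidence rates   (I,)
    """
    sens = A.sum(axis=0)                  # sensitivities, sum_i [A]_ij
    ratio = d / (A @ x + rho)             # d_i / ([A x]_i + rho_i)
    return x / sens * (A.T @ ratio)       # multiplicative EM update
```

Each iteration preserves nonnegativity and, being an EM step for the Poisson model with a known background, does not increase the KL distance.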
One of the issues surrounding the constrained minimization problem in (6.13) is that it is impossible to estimate the scatter matrix $K$ when all of the elements in the matrix are assumed to be unknown. The reason is that there would be too many unknowns, which would result in the problem being underdetermined (note: the dimension of $K$ is $I \times I$, while there are only $I$ data points). To reduce the number of unknowns in $K$, we assume that $K_{i_1 i_2} = 0$ if the detector pairs $i_1$ and $i_2$ are not in the same projection (see Figure 1-3 for the definition of a projection). This assumption means that an annihilation that would have been recorded by a detector pair within a certain projection if both photons had not undergone Compton scattering cannot be recorded by a detector pair within some other projection. Under the stated assumption, the number of unknown parameters in the matrix $K$ is dramatically reduced.
Moreover, $K$ is a block diagonal matrix with $T \times T$ submatrices along its diagonal:

$$K = \begin{bmatrix} K_1 & 0 & \cdots & 0 \\ 0 & K_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & K_S \end{bmatrix} , \qquad (6.15)$$

where $T$ is the number of detector pairs within each projection (e.g., 160), $S$ is the number of projection angles (e.g., 192), and it is assumed that the emission counts of the detector pairs within a projection are placed as a "chunk" in the emission data $d$ (i.e., $d = [(\text{1st projection})'\ (\text{2nd projection})'\ \cdots\ (S\text{th projection})']'$).
Since $K$ is a block diagonal matrix, the minimization problem in (6.13) can be broken into $S$ subproblems. Consider the sinogram of $d$, denoted by $Y$, which is a $T \times S$ matrix. The $s$th column of the sinogram $Y$ is the projection that corresponds to a certain projection angle. Figure 6-2 (a) shows an example of a sinogram: the sinogram of the 14 minute emission data for plane 21. The $I \times 1$ vector $P^c x^{(n+1)}$ is known in the PET community as the forward projection of $x^{(n+1)}$. We define a matrix $W^{(n+1)}$ whose first column is the first $T$ elements of $P^c x^{(n+1)}$, whose second column contains the $(T+1)$th to $(2T)$th elements of $P^c x^{(n+1)}$, and so on. In other words, the ordering of the detector pairs associated with $W^{(n+1)}$ and $Y$ match. Figure 6-2 (b) shows an example of $W^{(n+1)}$, where $x^{(n+1)}$ was generated by running the MLEM algorithm for 1000 iterations on the emission data in Figure 6-2 (a). Under the assumption that $K_{i_1 i_2} = 0$ if the detector pairs $i_1$ and $i_2$ are not in the same projection, the minimization problem in (6.13) can be expressed as: for $s = 1, 2, \ldots, S$,
$$K_s^{(n+1)} \triangleq \arg\min_{K_s \ge 0} \ KL(y_s,\ K_s w_s^{(n+1)} + z_s) \quad \text{subject to} \quad \sum_{i_1=1}^{T} [K_s]_{i_1 i_2} \le 1 \ \text{ for } i_2 = 1, 2, \ldots, T , \qquad (6.16)$$
Figure 6-2: Sinograms: (a) 14 minute emission data for plane 21 (i.e., $Y$) and (b) forward projection of $x^{(n+1)}$ (i.e., $W^{(n+1)}$, the reshaped $P^c x^{(n+1)}$), where $x^{(n+1)}$ is the 1000th MLEM iterate using the emission data in (a). Note that the images are displayed with their own dynamic ranges.
where $z_s = [\rho_{(s-1)T+1}\ \rho_{(s-1)T+2}\ \cdots\ \rho_{sT}]'$, $K_s$ is the $s$th submatrix of $K$, and $y_s$ and $w_s^{(n+1)}$ are the $s$th columns of $Y$ and $W^{(n+1)}$, respectively.
Since there are $T \times T$ unknowns in each minimization problem in (6.16), there would still be too many unknowns. To further reduce the number of unknowns, we assume that, for all $s$, $K_s$ is a Toeplitz matrix. An example of a Toeplitz matrix is

$$\begin{bmatrix} a_3 & a_2 & a_1 & 0 & 0 & 0 \\ a_4 & a_3 & a_2 & a_1 & 0 & 0 \\ a_5 & a_4 & a_3 & a_2 & a_1 & 0 \\ 0 & a_5 & a_4 & a_3 & a_2 & a_1 \\ 0 & 0 & a_5 & a_4 & a_3 & a_2 \\ 0 & 0 & 0 & a_5 & a_4 & a_3 \end{bmatrix} .$$

The assumption that $K_s$ is a Toeplitz matrix implies that there are at most $T$ unknowns in each submatrix $K_s$. Moreover, the assumption means that a single kernel can account for scatter within each projection.
The assumption that $K_s$ is a Toeplitz matrix can be justified for regions with approximately uniform attenuation, such as the brain. Consider Figure 6-3, in which a simplified PET scanner consisting of four detector pairs is depicted. In the figure, the dotted circle defines a region with uniform attenuation. We refer to the detector pairs ($a_1$,$b_1$), ($a_2$,$b_2$), ($a_3$,$b_3$), and ($a_4$,$b_4$) as the first, second, third, and fourth detector pairs, respectively. Note that the geometry of the first and second detector pairs and the geometry of the second and third detector pairs are approximately the same. Because of the approximately uniform attenuation of the subject and the geometric similarity of the detector pairs, it can be said that the conditional probabilities $K_{21}$ and $K_{32}$ are approximately the same. Using this rationale, for a projection of a PET scanner, it can be assumed that $K_{i_2 i_1} \approx K_{i_3 i_2}$ when $(i_2 - i_1) = (i_3 - i_2)$. Thus, we can construct $K_s$ so that it can be approximated as a Toeplitz matrix.
Figure 6-3: Geometry of a simplified PET image reconstruction problem: three voxels ($v_1$, $v_2$, and $v_3$) and four detector pairs (($a_1$,$b_1$), ($a_2$,$b_2$), ($a_3$,$b_3$), and ($a_4$,$b_4$)). The dashed lines define the tubes of the detector pairs. The dotted circle defines a region with uniform attenuation.
Now, consider the following functions: for an integer $t$,

$$y_s(t) \triangleq \begin{cases} [y_s]_t , & t = 1, 2, \ldots, T \\ 0 , & \text{otherwise} \end{cases} \qquad (6.17)$$

$$w_s^{(n+1)}(t) \triangleq \begin{cases} [w_s^{(n+1)}]_t , & t = 1, 2, \ldots, T \\ 0 , & \text{otherwise} \end{cases} \qquad (6.18)$$

$$z_s(t) \triangleq \begin{cases} [z_s]_t , & t = 1, 2, \ldots, T \\ 0 , & \text{otherwise} . \end{cases} \qquad (6.19)$$

Under the assumption that $K_s$ is a Toeplitz matrix, $y_s(t)$ is approximately equal to the convolution of $w_s^{(n+1)}(t)$ with an unknown nonnegative function, denoted by $k_{(s,\tau)}(t)$, that depends on the $s$th projection angle, plus $z_s(t)$. Thus, for $t = 1, 2, \ldots, T$,

$$y_s(t) \approx (k_{(s,\tau)} * w_s^{(n+1)})(t) + z_s(t) = \sum_{u=-\tau+1}^{\tau-1} k_{(s,\tau)}(u)\, w_s^{(n+1)}(t - u) + z_s(t) , \qquad (6.20)$$

where $k_{(s,\tau)}(t) \in [0, 1]$ for an integer $t$ and $k_{(s,\tau)}(t) = 0$ for $|t| \ge \tau$. The parameter $\tau$ is defined by the user, but must satisfy the constraint $\tau \le \lfloor (T+1)/2 \rfloor$. Observing (6.20), the product $k_{(s,\tau)}(u)\, w_s^{(n+1)}(t - u)$ equals the proportion of photon pairs that would have been recorded by detector pair $(t - u)$ if both photons produced by the annihilation had not undergone Compton scattering and had propagated in exactly opposite directions, but instead are recorded by detector pair $t$. The parameter $\tau$ restricts the number of detector pairs to be convolved. In principle, the parameter $\tau$ is to be chosen in such a way that $\tau$ is proportional to the mean number of scatter events. Since the mean number of scatter events is proportional to the attenuation of the material, the attenuation map could be used to determine the parameter $\tau$.
The convolution in (6.20) can be expressed in matrix notation: for $t = 1, 2, \ldots, T$,

$$(k_{(s,\tau)} * w_s^{(n+1)})(t) = [B_s^{(n+1)} k_{(s,\tau)}]_t , \qquad (6.21)$$

where $k_{(s,\tau)} = [k_{(s,\tau)}(-\tau+1)\ k_{(s,\tau)}(-\tau+2)\ \cdots\ k_{(s,\tau)}(\tau-1)]'$ and $B_s^{(n+1)}$ is a known $T \times (2\tau - 1)$ matrix whose entries are given by $[B_s^{(n+1)}]_{t u} = w_s^{(n+1)}(t - u + \tau)$ for $t = 1, 2, \ldots, T$ and $u = 1, 2, \ldots, 2\tau - 1$ (recall that $w_s^{(n+1)}(t) = 0$ for $t$ outside $\{1, 2, \ldots, T\}$, so $B_s^{(n+1)}$ is banded). Thus, $y_s$ can be approximated as

$$y_s \approx B_s^{(n+1)} k_{(s,\tau)} + z_s . \qquad (6.22)$$
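The banded matrix $B_s^{(n+1)}$ is simply a zero-padded convolution operator; a sketch under the indexing used in (6.21) (the function name is ours):

```python
import numpy as np

def conv_matrix(w, tau):
    """Build the T x (2*tau - 1) matrix B with [B]_{t,u} = w(t - u + tau),
    where w(t) = 0 outside 1..T, so that B @ k equals the convolution k * w
    restricted to the detector-pair indices t = 1..T."""
    T = len(w)
    B = np.zeros((T, 2 * tau - 1))
    for t in range(1, T + 1):              # detector-pair index (1-based)
        for u in range(1, 2 * tau):        # kernel-vector index (1-based)
            idx = t - u + tau              # index into w
            if 1 <= idx <= T:
                B[t - 1, u - 1] = w[idx - 1]
    return B
```

As a sanity check, `B @ k` reproduces the central `T` samples of `np.convolve(k, w, mode='full')`, which is exactly the restriction in (6.20).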
With (6.22), the minimization problem in (6.16) is now

$$k_{(s,\tau)}^{(n+1)} \triangleq \arg\min_{k_{(s,\tau)} \ge 0} \ KL(y_s,\ B_s^{(n+1)} k_{(s,\tau)} + z_s) \quad \text{subject to} \quad \sum_{t=1}^{2\tau-1} [k_{(s,\tau)}]_t \le 1 . \qquad (6.23)$$
The constrained optimization problem in (6.23) is difficult to solve because of the constraint $\sum_{t=1}^{2\tau-1} [k_{(s,\tau)}]_t \le 1$. Consequently, we first solve the following optimization problem:

$$\tilde{k}_{(s,\tau)}^{(n+1)} = \arg\min_{k_{(s,\tau)} \ge 0} \ KL(y_s,\ B_s^{(n+1)} k_{(s,\tau)} + z_s) . \qquad (6.24)$$

Then, to get $k_{(s,\tau)}^{(n+1)}$, we normalize $\tilde{k}_{(s,\tau)}^{(n+1)}$ so that the sum of its elements equals one:

$$[k_{(s,\tau)}^{(n+1)}]_t \triangleq \frac{[\tilde{k}_{(s,\tau)}^{(n+1)}]_t}{\sum_{t'=1}^{2\tau-1} [\tilde{k}_{(s,\tau)}^{(n+1)}]_{t'}} , \quad t = 1, 2, \ldots, (2\tau - 1) . \qquad (6.25)$$
The minimization problem in (6.24) can be solved by the MLEM algorithm [16, 26] because it has the same form as the optimization problem in Step 1. Specifically, given an initial estimate $k_{(s,\tau)}^{(n,0)} > 0$, the iteration is as follows: for $m = 0, 1, \ldots, M_2$,

$$[k_{(s,\tau)}^{(n,m+1)}]_j = \frac{[k_{(s,\tau)}^{(n,m)}]_j}{\sum_{i=1}^{T} [B_s^{(n+1)}]_{ij}} \sum_{i=1}^{T} [B_s^{(n+1)}]_{ij} \frac{[y_s]_i}{[B_s^{(n+1)} k_{(s,\tau)}^{(n,m)}]_i + [z_s]_i} , \quad j = 1, 2, \ldots, (2\tau - 1) . \qquad (6.26)$$

We define $\tilde{k}_{(s,\tau)}^{(n+1)} \triangleq k_{(s,\tau)}^{(n,M_2+1)}$ and normalize $\tilde{k}_{(s,\tau)}^{(n+1)}$ using (6.25) to get $k_{(s,\tau)}^{(n+1)}$.
Given $k_{(s,\tau)}^{(n+1)}$, we determine $K_s^{(n+1)}$ by using the following equation:

$$[K_s^{(n+1)}]_{i_1 i_2} = k_{(s,\tau)}^{(n+1)}(i_1 - i_2) , \quad i_1, i_2 = 1, 2, \ldots, T , \qquad (6.27)$$

where $k_{(s,\tau)}^{(n+1)}(t) \triangleq [k_{(s,\tau)}^{(n+1)}]_{t+\tau}$ for $|t| < \tau$ and $k_{(s,\tau)}^{(n+1)}(t) \triangleq 0$ otherwise. In other words, $K_s^{(n+1)}$ is the $T \times T$ banded Toeplitz matrix generated by the kernel $k_{(s,\tau)}^{(n+1)}$.
Finally, $K^{(n+1)}$ is defined as a block diagonal matrix with $\{K_s^{(n+1)}\}$ along its diagonal:

$$K^{(n+1)} = \begin{bmatrix} K_1^{(n+1)} & 0 & \cdots & 0 \\ 0 & K_2^{(n+1)} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & K_S^{(n+1)} \end{bmatrix} . \qquad (6.28)$$
Repeating Steps 1 and 2 generates the estimates of $K$ and $x$. Once the estimate of $K$ is available, denoted by $\hat{K}$, a regularized image reconstruction algorithm, such as the PML, APML, or QEP algorithm, can be used to estimate the emission mean vector $x$. More specifically, we first let our estimate for $P^{true}$ be $\hat{P}^{true} = \hat{K} P^c$. Then, the probability matrix $\hat{P}^{true}$ is used in the image reconstruction algorithm of choice.
In summary, the steps of the PCiPS algorithm are: for $n = 0, 1, 2, \ldots$

* Step 1 Get an initial estimate for the emission mean vector $x^{(0)} > 0$ and an initial estimate for the scatter matrix $K^{(0)}$.
* Step 2 Get the current estimate of the emission mean vector $x^{(n+1)}$ using $K^{(n)}$, which is the current estimate of the scatter matrix $K$:

$$x^{(n+1)} \triangleq \arg\min_{x \ge 0} \ KL(d,\ K^{(n)} P^c x + \rho) . \qquad (6.29)$$

* Step 3 For $s = 1, 2, \ldots, S$, get $\tilde{k}_{(s,\tau)}^{(n+1)}$ using (6.26).
* Step 4 For $s = 1, 2, \ldots, S$, normalize $\tilde{k}_{(s,\tau)}^{(n+1)}$ using (6.25) to get $k_{(s,\tau)}^{(n+1)}$.
* Step 5 For $s = 1, 2, \ldots, S$, get $K_s^{(n+1)}$ using (6.27).
* Step 6 Get $K^{(n+1)}$ using (6.28).
* Step 7 Repeat Steps 2 through 6 for a chosen number, $M$, of iterations.
* Step 8 Define $\hat{P}^{true} \triangleq K^{(M+1)} P^c$ and use $\hat{P}^{true}$ in the APML algorithm or some other algorithm of choice.
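Steps 5 and 6 amount to stamping each normalized kernel into a banded Toeplitz block, per (6.27), and placing the blocks along the diagonal, per (6.28). A sketch under the stated assumptions (function names are ours):

```python
import numpy as np

def toeplitz_from_kernel(k, T, tau):
    """(6.27): the T x T banded Toeplitz block with [Ks]_{i1,i2} = k(i1 - i2),
    where k is the length-(2*tau - 1) kernel vector for lags -tau+1 .. tau-1."""
    Ks = np.zeros((T, T))
    for i1 in range(T):
        for i2 in range(T):
            lag = i1 - i2
            if -tau < lag < tau:
                Ks[i1, i2] = k[lag + tau - 1]   # lag 0 maps to the center entry
    return Ks

def block_diag(blocks):
    """(6.28): block-diagonal scatter matrix from the per-projection blocks."""
    n = sum(b.shape[0] for b in blocks)
    K = np.zeros((n, n))
    r = 0
    for b in blocks:
        K[r:r + b.shape[0], r:r + b.shape[1]] = b
        r += b.shape[0]
    return K
```

With per-projection kernels in hand, `block_diag([K1, ..., KS])` assembles the full $K^{(n+1)}$, and $\hat{K} P^c$ then gives the corrected probability matrix of Step 8.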
CHAPTER 7
SIMULATIONS AND EXPERIMENTAL STUDY
To evaluate the algorithms in Chapters 3, 4, and 5, and the method in Chapter 6, we applied them to real thorax phantom data and compared them quantitatively and qualitatively to certain existing algorithms. Also, in Section 7.1, simulation studies with computer-generated synthetic data are presented for the PML and QEP algorithms in Chapters 3 and 5, respectively. It should be noted that simulation results with the synthetic data have limitations because the data are generated under the assumption that the system model for emission data in Section 1.4 is exactly correct. However, the system model is not perfect due to the errors discussed in Section 1.3.
Thorax phantom data were obtained from the PET laboratory at the Emory University School of Medicine. The phantom was filled with 2-[18F]fluoro-2-deoxy-D-glucose ([18F]FDG) and scanned using a Siemens-CTI ECAT EXACT [model 921] scanner in slice-collimated mode (i.e., septa-extended mode). Thirty independent data sets were generated from multiple scans of duration 7 minutes. Fifteen realizations of 14 min data were generated by adding nonoverlapping pairs of 7 min data sets. The [18F]FDG concentrations for the heart wall, heart cavity, liver, three tumors, and thorax cavity of the thorax phantom by Data Spectrum Inc. were 0.72 μCi/ml, 0.23 μCi/ml, 0.72 μCi/ml, 2.01 μCi/ml, and 0.24 μCi/ml, respectively. The lungs, which contained styrofoam beads, were filled with a 0.25 μCi/ml solution of [18F]FDG. The concentrations were chosen to mimic those observed in whole-body scans. The tumors were of size 1 cm, 1.5 cm, and 2 cm. The sinogram consists of 160 radial bins and 192 angles. The physical dimension of the image space is 43.9 x 43.9 cm2 and the reconstructed images contain 128 x 128 voxels (voxel size is 3.43 x 3.43 mm2). Two planes (10 and 21) were considered in the experiments. Plane 10 contains activity due to the heart, lungs, spine, and background, while plane 21 contains activity due to the heart, two tumors (1.5 cm and 2.0 cm), and background. The total number of prompts for planes 10 and 21 was 397,000 and 340,000, respectively, for 14 minute data. The randoms make up about 10% and 12% of the data for planes 10 and 21, respectively.
The probability matrix $P$ was computed using the angle-of-view method [16] with corrections for errors due to attenuation and detector inefficiency. To get the attenuation correction factors, postinjection transmission scan data were collected for three minutes and the attenuation correction method by Anderson et al. [9] was employed. A normalization file was used to correct for detector inefficiency. Finally, the randoms were used as noise-free estimates of the mean numbers of accidental coincidences.
For all of the experiments and simulations, we used a uniform initial estimate (all voxels equal $\sum_i d_i / J$), the eight nearest neighbors of the $j$th voxel were used for $N_j$, and the weights $\{w_{jk}\}$ are one for horizontal and vertical nearest neighbors and $1/\sqrt{2}$ for diagonal nearest neighbors.
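This setup can be sketched as follows (a toy illustration; the function names are ours):

```python
import numpy as np

def initial_estimate(d, J):
    """Uniform initial image: every voxel equals sum_i d_i / J."""
    return np.full(J, np.sum(d) / J)

def neighbor_weight(dr, dc):
    """Weight w_jk for a neighbor at row/column offset (dr, dc) in the
    8-neighborhood: 1 for horizontal/vertical neighbors, 1/sqrt(2) for
    diagonal neighbors."""
    return 1.0 if dr == 0 or dc == 0 else 1.0 / np.sqrt(2.0)
```

The diagonal weight $1/\sqrt{2}$ is the reciprocal of the center-to-center distance between diagonal voxel neighbors, which is the usual convention for 8-neighborhood penalties.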
In Section 7.1, we apply the PML and QEP algorithms to real thorax phantom data and computer-generated synthetic data, and compare them quantitatively and qualitatively. Then, the performance of the APML algorithm is evaluated in Section 7.2. Finally, experimental results with the probability estimation method in Chapter 6 are presented in Section 7.3.
7.1 Regularized Image Reconstruction Algorithms
In this section, we compare the PML and QEP algorithms, quantitatively and qualitatively, to the MLEM algorithm and a penalized weighted least-squares (PWLS) algorithm [42]. Two ad hoc forms of regularization are the post-filtering of MLEM estimates and the early termination of the MLEM algorithm (usually quite far from the MLEM estimate). Given their simplicity, we also compared the post-filtering (MLEM-F) and early-stopping (MLEM-S) strategies to the proposed algorithms.
Quantitative comparisons were made using contrast as a figure-of-merit:

$$\text{Contrast} \triangleq \frac{M_{ROI} - M_B}{M_B} , \qquad (7.1)$$

where $M_{ROI}$ and $M_B$ denote the mean of a chosen region-of-interest (ROI) and of the background, respectively. We define another figure-of-merit that quantifies the distinguishability of the two tumors:

$$\text{Distinguishability} \triangleq \frac{M_T - M_I}{M_T - M_B} , \qquad (7.2)$$

where $M_T$ and $M_I$ denote the mean activity of the two tumor regions and of the intermediate region between the two tumors, respectively. If the two tumors overlap each other, then $M_I = M_T$ and the distinguishability will be zero. On the other hand, if the intermediate region between the two tumors has the same mean as the background, then the distinguishability will be one.
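The two figures of merit in (7.1) and (7.2) can be computed directly from region means; a sketch (function names are ours, and (7.2) is implemented as reconstructed from the boundary conditions stated above):

```python
import numpy as np

def contrast(roi, background):
    """(7.1): Contrast = (M_ROI - M_B) / M_B."""
    m_roi, m_b = np.mean(roi), np.mean(background)
    return (m_roi - m_b) / m_b

def distinguishability(tumors, intermediate, background):
    """(7.2): Distinguishability = (M_T - M_I) / (M_T - M_B); 0 when the
    intermediate region matches the tumors, 1 when it matches the background."""
    m_t, m_i, m_b = np.mean(tumors), np.mean(intermediate), np.mean(background)
    return (m_t - m_i) / (m_t - m_b)
```

For example, tumor voxels at 6 on a background of 2 give a contrast of 2, and the distinguishability moves from 0 to 1 as the valley between the tumors deepens toward the background level.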
For the image comparisons, converged images were used for the MLEM, PML, and PWLS algorithms. The QEP images result from running the QEP algorithm for the same number of iterations as the PML algorithm. This was necessary because the QEP algorithm does not have a single objective function; recall that the QEP algorithm defines a new objective function to be minimized at each iteration. For the PML, QEP, and PWLS images, the penalty parameter $\beta$ was chosen in such a way that the standard deviations of their soft-tissue (i.e., background) regions were equal. In this sense, the algorithms are "balanced" with respect to $\beta$. For the MLEM-S and MLEM-F images, early MLEM iterates and filtered converged MLEM images were obtained that matched the standard deviation of the soft-tissue regions in the PML, QEP, and PWLS images. To filter the MLEM image, we used 5 x 5 Gaussian filters with different standard deviation values.
7.1.1 Synthetic Data
In this subsection, we present simulation results for software phantoms. Fifty realizations were used in the simulation study. To generate emission data, a software phantom was forward projected (i.e., $Px$) using the $P$ matrix, where it was assumed that there are no errors except accidental coincidences. Then, for each bin, the prompts and randoms were generated using pseudorandom Poisson variates with means $[Px]_i + \rho_i$ and $\rho_i$, respectively. The constant $\rho$ was chosen such that the mean accidental coincidence rate was approximately 10%. The mean numbers of prompts and randoms are about 550,000 and 50,000, respectively. For the simulation studies, the dimensions of the image space follow the corresponding ones of the real phantom image space. The total number of intensities within a software phantom was about 500,000.
We first consider a tumor phantom. Figure 7-1 shows a software phantom that consists of two tumors (1.7 cm and 2.4 cm in diameter) in a uniform circular background with a diameter of 30.5 cm, where the two tumor intensities and the background intensity are 7 x 74 and 74, respectively (tumor contrast is 6). The image also depicts the regions used for the contrast and distinguishability calculations. The intermediate region between the two tumors consists of 14 voxels, and the large and small tumors consist of 45 and 21 voxels, respectively.
For the tumor phantom in Figure 7-1, we used the log-cosh penalty function $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 20$ in the PML and QEP algorithms, while the quadratic penalty function was used in the PWLS algorithm. For the QEP algorithm, $\gamma(t) = \tanh(t/\zeta)$ was used with $\zeta = 150$ unless noted otherwise. The parameters $\delta$ and $\zeta$ were chosen experimentally.
Figure 7-2 shows the images obtained by the MLEM, MLEM-S, MLEM-F, PML, QEP, and PWLS algorithms when the tumor phantom in Figure 7-1 was used. The MLEM image is the 500th MLEM iterate, and 200 iterations were used to reconstruct the PML, QEP, and PWLS images. For the PML, QEP, and PWLS images, $\beta$ was chosen such that the standard deviation of the background was approximately 12. The MLEM-S image is the 20th MLEM iterate. To get the MLEM-F image, the MLEM image in Figure 7-2 (a) was filtered once using a 5 x 5 Gaussian filter with a standard deviation of 1.27 voxels. As stated, all of the images in Figure 7-2 have the same background standard deviation except for the MLEM image. The MLEM image in Figure 7-2 (a) is considerably noisy (background standard deviation is about 78) compared to the other images. Figures 7-2 (d) and (e) illustrate that the PML image and the QEP image are smooth and, at the same time, the tumors in the images are resolvable and differ greatly from the background. On the other hand, Figures 7-2 (c) and (f) demonstrate that the images generated by the MLEM-F and PWLS algorithms are too smooth, especially near the boundaries of the tumors. Figure 7-2 (b) shows that the edges of the tumors in the MLEM-S image are not as clear as the ones in the PML and QEP images. In Figure 7-3, the images in Figure 7-2 are plotted with their own dynamic ranges.

For the images in Figure 7-2, the contrast of the QEP image was 1%, 12%, 21%, 4%, and 22% higher than that of the MLEM, MLEM-S, MLEM-F, PML, and PWLS images, respectively, for the large tumor. The increased contrast of the QEP image for the small tumor with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was 4%, 16%, 31%, 5%, and 34%, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was 5%, 16%, 34%, 5%, and 40%, respectively. The MLEM image outperformed the QEP image in the contrast and distinguishability comparisons. However, as can be seen in Figure 7-2 (a), the MLEM image is extremely noisy.
Figures 7-4 (a) and (b) are line plots (the row is shown in Figure 7-1) of the images in Figure 7-2. For the row under consideration, it can be seen from the line plots that the PML and QEP images have a higher degree of contrast than the other images, except for the MLEM image. Also, the edges in the QEP image are sharper than those in the PML image. As expected, the line plots show that the MLEM image is excessively noisy.
Figures 7-5 (a) and (b) are plots of the average contrast of the large tumor and the small tumor, respectively, versus the average background standard deviation over fifty realizations, when the tumor phantom in Figure 7-1 was used. Further, a plot of the average standard deviation of the large tumor versus the average background standard deviation for the fifty realizations is shown in Figure 7-5 (c). Finally, in Figure 7-5 (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation for the fifty realizations is shown. As can be seen from Figures 7-5 (a), (b), and (d), the contrast curves and the distinguishability curve of the QEP algorithm lie above the curves of the other algorithms for comparable "background noise." Thus, for a fixed level of comparable background noise, the QEP images, on average, have the greatest contrast and distinguishability. The average standard deviation curves of the PML and QEP algorithms in Figure 7-5 (c) generally lie below the corresponding curves of the other algorithms for reasonably small background noise.
To see where the PML and QEP algorithms break down in terms of contrast, we performed simulations using the synthetic tumor phantom in Figure 7-1 with four different tumor contrast values (tumor contrast equals 3, 1.5, 0.75, and 0.5). For the PML and QEP algorithms, β = 1/16 and 1/32 were used, respectively, and 200 iterations were used. For the QEP algorithm, the edge-preserving parameter was set to 150 for a tumor contrast of 3, whereas 80 was used for the other tumor contrast values (i.e., tumor contrast equals 1.5, 0.75, and 0.5). The parameters were chosen experimentally. Figures 7-6 (a) and (b) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 3. As can be seen in the figures, when the tumor contrast was 3, the tumors in the PML and QEP images are clearly resolvable. Figures 7-6 (c) and (d) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 1.5. From the figures, the tumors in the PML and QEP images are still sufficiently clear when the tumor contrast was 1.5. It should be mentioned that the images in Figure 7-6 have the same background standard deviation, which is approximately 10. Figures 7-7 (a) and (b) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 0.75. From the figures, the tumors in the PML and QEP images are still resolvable when the tumor contrast was 0.75. Consider the PML and QEP images in Figures 7-7 (c) and (d), respectively, which were obtained by using the phantom in Figure 7-1 with a tumor contrast of 0.5. The tumors in the PML and QEP images are hardly resolvable, which implies that the PML and QEP algorithms break down when the tumor contrast is 0.5. The images in Figure 7-7 have the same background standard deviation, which is approximately 9.
Figures 7-8 (a), (b), (c), and (d) are plots of the average contrast of the small tumor versus the average background standard deviation using fifty realizations, when the tumor contrast of the phantom in Figure 7-1 was 3, 1.5, 0.75, and 0.5, respectively. As can be seen from Figures 7-8 (a) and (b), the contrast curves of the QEP algorithm lie above the curves of the PML algorithm for a fixed background noise when the tumor contrast equals 3 and 1.5. For tumor contrasts of 0.75 and 0.5, the curves of the PML and QEP algorithms coincide, as can be seen in Figures 7-8 (c) and (d), which implies that the PML and QEP algorithms perform similarly when the tumor contrast is small enough.
To see where the PML and QEP algorithms break down in terms of the spacing between two tumors, we performed simulations using four different synthetic tumor phantoms, where each phantom has a different tumor spacing. We compared the PML and QEP images both visually and in terms of distinguishability. Figures 7-9 (a), (b), and (c) show software phantoms that consist of two tumors (each tumor equals 3 x 3 voxels). For the phantoms in Figures 7-9 (a), (b), and (c), the tumor contrast is 3 and the spacing between the two tumors is 2, 3, and 4 voxels, respectively. Figure 7-9 (d) shows a software phantom that consists of two tumors (each tumor equals 3 x 3 voxels) with a spacing of 2 voxels, where the tumor contrast is 6. The images in Figure 7-9 also depict the regions used for the distinguishability calculation. The intermediate regions between the two tumors consist of 6, 8, 10, and 6 voxels for the phantoms in Figures 7-9 (a), (b), (c), and (d), respectively.
Figure 7-10 shows the PML and QEP images obtained by using the phantoms in Figures 7-9 (a) and (b). Figure 7-11 shows the PML and QEP images obtained by using the phantoms in Figures 7-9 (c) and (d). The PML and QEP images in Figures 7-10 and 7-11 are from 200 iterations. The PML and QEP images in Figures 7-10 and 7-11 have the same background standard deviation, which is approximately 10 and 9, respectively. For the PML and QEP images in Figures 7-10 and 7-11, β = 1/16 and 1/32 were used, respectively. Figure 7-10 indicates that the PML and QEP algorithms generate images in which the tumors are clearly separated when the spacing between the tumors is 3 and 4 voxels. As can be seen in Figures 7-11 (a) and (b), the PML and QEP algorithms were not able to resolve the two tumors when the spacing between the tumors is 2 voxels. However, when the tumor contrast was increased, the PML and QEP algorithms worked well even when the spacing between the tumors is 2 voxels, as shown in Figures 7-11 (c) and (d).
Figure 7-12 is a plot of the average distinguishability of the two tumors versus the average background standard deviation using fifty realizations, when the tumor phantoms in Figure 7-9 were used. As can be seen from Figures 7-12 (a), (b), (c), and (d), the distinguishability curves of the QEP algorithm lie above the curves of the PML algorithm for a fixed background noise. Thus, for a fixed level of background noise, the QEP images, on average, have greater distinguishability.

Figure 7-1: A software tumor phantom is shown. Contrast of the tumors in the phantom is 6. The regions surrounded by the dotted and dashed lines define the tumor, intermediate region (i.e., IM), and background region, respectively.
7.1.2 Real Data
In this subsection, we present experimental results using real phantom data for plane 21. Unless noted otherwise, the data are from 14 minute scans. The image in Figure 7-13, which was produced by averaging fifteen converged MLEM images, depicts the regions that were used in the contrast and distinguishability calculations. The large tumor ROI, small tumor ROI, and intermediate region contain 24, 12, and 16 voxels, respectively.
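The distinguishability calculation uses the two tumor ROIs and the intermediate region between them. The exact distinguishability formula is defined in an earlier chapter of the dissertation; the sketch below assumes, purely for illustration, a normalized dip of the intermediate region below the tumor means:

```python
import numpy as np

def distinguishability(image, roi1, roi2, intermediate, background):
    # Assumed illustrative definition: how far the region between the two
    # tumors dips below their mean intensity, normalized by the background
    # mean.  The dissertation's exact formula may differ.
    tumor_mean = 0.5 * (image[roi1].mean() + image[roi2].mean())
    return (tumor_mean - image[intermediate].mean()) / image[background].mean()

# Toy image: two 3-voxel tumors in one row separated by a 2-voxel gap.
img = np.full((10, 10), 100.0)
img[2, 2:5] = 400.0
img[2, 7:10] = 400.0
roi1 = np.zeros_like(img, dtype=bool); roi1[2, 2:5] = True
roi2 = np.zeros_like(img, dtype=bool); roi2[2, 7:10] = True
mid = np.zeros_like(img, dtype=bool); mid[2, 5:7] = True
d = distinguishability(img, roi1, roi2, mid, ~(roi1 | roi2 | mid))   # d = 3.0
```

A value near zero would mean the two tumors blur into one blob, which is the failure mode examined in the spacing study above.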
In the PML and QEP algorithms, we used the log-cosh penalty function λ(t) = log(cosh(t/δ)) with δ = 50. Since the noise in real data is stronger than in the synthetic data due to errors (e.g., scatter), we increased the value of the parameter δ to reduce background noise (observe that we used δ = 20 for the synthetic phantom in Figure 7-1). For the QEP algorithm, a weighting function of the form r(t) = tanh(·) was used, with its scale parameter set to 500 and 1000 for the 7 minute and 14 minute real data, respectively. We varied this parameter because the data sets lead to reconstructed images that have different edge "heights". As in the simulations
Figure 7-2: Comparison of emission images when the synthetic phantom in Figure 7-1 was used: (a) MLEM image, (b) MLEMS image, (c) MLEMF image, (d) PML image, (e) QEP image, and (f) PWLS image. The images in (a) and (b) are from 500 and 20 iterations, respectively, while the images in (d), (e), and (f) are from 200 iterations. The image in (c) was obtained by filtering the MLEM image once with a 5 x 5 Gaussian filter with a standard deviation of 1.27 voxels. For the PML, QEP, and PWLS images, β was chosen in such a way that the standard deviation of the background is approximately 12. Specifically, β = 0.0415, 0.021, and 0.006 for the PML, QEP, and PWLS images, respectively. The standard deviation of the background of the images in (b) and (c) is also approximately 12. For display purposes, all the images were adjusted so that they have the same dynamic range.
Figure 7-3: The images in Figure 7-2 are shown with their own dynamic ranges.
Figure 7-4: A line plot comparison of the reconstructed images in Figures 7-2 (a), (b), and (c) is shown in (a). A line plot comparison of the reconstructed images in Figures 7-2 (d), (e), and (f) is shown in (b).
Figure 7-5: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively, when the synthetic phantom in Figure 7-1 was used. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifty synthetic data realizations were used in the study. For the MLEMS curves, the images from iterations 5-160 were used. For the MLEMF curves, the MLEM images were filtered once by 5 x 5 Gaussian filters with a standard deviation range of 0.44-3.0 voxels (each voxel is 3.43 x 3.43 mm2). For the PML, QEP, and PWLS algorithms, the images were reconstructed using two hundred iterations over a range of β values for each algorithm.
Figure 7-6: Comparison of emission images: (a) PML image and (b) QEP image when the tumor contrast of the phantom in Figure 7-1 was 3, and (c) PML image and (d) QEP image when the tumor contrast of the phantom in Figure 7-1 was 1.5. All images are from 200 iterations, and they have the same background standard deviation, which is approximately 10. For the PML and QEP images, β = 1/16 and 1/32 were used, respectively. For the QEP images in (b) and (d), the edge-preserving parameter was set to 150 and 80, respectively. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that they have the same dynamic range.
IMAGE RECONSTRUCTION ALGORITHMS FOR ACHIEVING HIGH-RESOLUTION POSITRON EMISSION TOMOGRAPHY IMAGES By JIHO CHANG A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2004
Copyright 2004 by JiHo Chang
I dedicate this work to my family.
ACKNOWLEDGMENTS First of all, I would like to thank my academic advisor, Dr. John M. M. Anderson, for his great guidance of my research. I also would like to thank my committee members, Dr. Jian Li, Dr. Fred J. Taylor, and Dr. Bernard A. Mair, for their precious comments on and corrections to my dissertation. In particular, I would like to thank Dr. Bernard A. Mair for his comments and corrections on the convergence proofs of the proposed algorithms. I thank Yoonchul Kim and Kerkil Choi, who have shared valuable discussions related to my research with me. Finally, I am grateful to my family, who have always prayed for my health.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTER

1 BACKGROUND
1.1 Overview of Positron Emission Tomography (PET)
1.2 PET Scanner
1.3 Sources of Error
1.4 System Model for Emission Data

2 LITERATURE REVIEW
2.1 Penalized Maximum Likelihood Algorithms
2.2 Scatter Correction Methods
2.3 Summary of the Proposed Algorithms

3 PENALIZED MAXIMUM LIKELIHOOD ALGORITHM
3.1 Penalized Maximum Likelihood (PML) Algorithm
3.2 Convergence Proof
3.3 Properties of the PML Algorithm

4 ACCELERATED PENALIZED MAXIMUM LIKELIHOOD ALGORITHM
4.1 Convergence Proof
4.2 Accelerated Penalized Maximum Likelihood (APML) Algorithm
4.3 Direction Vectors
4.4 Properties of the APML Algorithm

5 QUADRATIC EDGE PRESERVING ALGORITHM

6 JOINT EMISSION MEAN AND PROBABILITY MATRIX ESTIMATION
6.1 Scatter Matrix Model
6.2 Joint Minimum Kullback-Leibler Distance Method
6.3 Probability Correction in Projection Space (PCiPS) Algorithm
7 SIMULATIONS AND EXPERIMENTAL STUDY
7.1 Regularized Image Reconstruction Algorithms
7.1.1 Synthetic Data
7.1.2 Real Data
7.2 APML Algorithm
7.3 PCiPS Algorithm

8 CONCLUSIONS AND FUTURE WORK
8.1 Conclusions
8.2 Future Work

APPENDIX

A HUBER'S SURROGATE FUNCTIONS
B SURROGATE FUNCTIONS FOR PENALTY FUNCTION
C STRICT CONVEXITY OF PML OBJECTIVE FUNCTION
D SOLUTION TO UNCONSTRAINED OPTIMIZATION PROBLEM IN MODIFIED APML LINE SEARCH
E CONVEXITY OF SURROGATE FUNCTIONS FOR OBJECTIVE FUNCTIONS IN APML LINE SEARCH

REFERENCES
BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IMAGE RECONSTRUCTION ALGORITHMS FOR ACHIEVING HIGH-RESOLUTION POSITRON EMISSION TOMOGRAPHY IMAGES

By JiHo Chang
August 2004
Chair: John M. M. Anderson
Major Department: Electrical and Computer Engineering

In this dissertation, we present four algorithms for reconstructing high-resolution images in PET. The first algorithm, referred to as the penalized maximum likelihood (PML) algorithm, iteratively minimizes a PML objective function. At each iteration, the PML algorithm generates a function, called a surrogate function, that satisfies certain conditions. The next iterate is defined to be the nonnegative minimizer of the surrogate function. The PML algorithm utilizes standard decoupled surrogate functions for the maximum likelihood objective function of the data and decoupled surrogate functions for a certain class of penalty functions. As desired, the PML algorithm guarantees nonnegative estimates and monotonically decreases the PML objective function with increasing iterations. For the case where the PML objective function is strictly convex, which is true for the class of penalty functions under consideration, the PML algorithm has been shown to converge to the minimizer of the PML objective function.

The drawback of the PML algorithm is that it converges slowly. Thus, a "fast" version of the PML algorithm, referred to as the accelerated PML (APML) algorithm,
was developed, in which an additional search step, called a pattern search step, is performed after each standard PML iteration. In the pattern search step, an accelerated iterate, which has lower cost than the standard PML iterate, is found by solving a certain constrained optimization problem that arises at each pattern search step. The APML algorithm retains the nice properties of the PML algorithm.

The third algorithm, referred to as the quadratic edge preserving (QEP) algorithm, aims to preserve edges in the reconstructed images so that fine details, such as small tumors, are more resolvable. The QEP algorithm is based on an iteration-dependent, decoupled penalty function that introduces smoothing while preserving edges. The penalty function was developed by modifying the surrogate functions of the penalty function for the PML method.

In PET, there are several errors that have the net effect of introducing blur into the reconstructed images. We propose a method that aims to reduce blur in PET images. The method is based on the assumption that the "true" probability matrix for the observed emission data is the product of an unknown nonnegative matrix, called a scatter matrix, and a "conventional" probability matrix. Under the suggested framework, the problem is to jointly estimate the scatter matrix and the emission means. We propose an alternating minimization algorithm that estimates them by minimizing a certain distance. The algorithms are qualitatively and quantitatively assessed using synthetic phantom data and real phantom data.
CHAPTER 1
BACKGROUND

Medical imaging modalities, such as X-ray computed tomography and magnetic resonance imaging, are used to obtain images of anatomical structures within the human body. However, in certain medical applications, it is also important to obtain physiological information, because physiological changes can indicate disease states earlier than anatomical changes [1]. Positron emission tomography (PET) and single-photon emission computed tomography (SPECT) are widely used medical imaging modalities that acquire physiological information on both human and animal subjects.

In SPECT, physiological information is obtained by imaging the distribution of gamma-ray or X-ray emitting radioisotopes within the human body [1]. After the radioisotopes are introduced into the human body, they decay and emit gamma-ray or X-ray single photons. A SPECT scanner detects these photons with the help of collimators that are made of lead. Since a large percentage of photons are absorbed by the lead collimators, the sensitivity and accuracy of SPECT are limited. In PET, there is no need for lead collimators because collimation is performed by electronic circuits that are connected to the detectors. Consequently, PET possesses relatively high sensitivity and accuracy when compared to SPECT. Although cost is the major limitation of PET, recent research advances, such as less expensive materials for detectors and scanner configurations that need a smaller number of detectors than conventional scanners, have helped decrease its cost [1]. The main advantage of SPECT over PET is its substantially lower cost.
We present a brief overview of PET, the PET scanner, and certain errors in PET in Sections 1.1, 1.2, and 1.3, respectively. Then, we provide some background on the image reconstruction problem in PET.

1.1 Overview of Positron Emission Tomography (PET)

Most clinical applications of PET are in oncological cases involving the diagnosis and staging of cancer, treatment planning and monitoring of tumors, detection of recurrent tumors, and localization of biopsy sites in cases when there are tumors in the head or neck [2,3]. PET is also used in the diagnosis of coronary artery disease and other heart diseases [2,3].

In PET, physiological information is acquired by imaging the distribution of positron-emitting isotopes, such as 13N, 15O, 18F, and 11C, within the human body [3]. The positron-emitting isotopes are bound to compounds with known biochemical properties. Compounds that are labeled with positron-emitting isotopes are called radiopharmaceuticals. The choice of the radiopharmaceutical depends on its application. For example, 2-[18F]fluoro-2-deoxy-D-glucose ([18F]FDG) is used for imaging brain tumors, while [13N]ammonia is used for the detection of coronary artery disease [3]. After the radiopharmaceutical is introduced into the subject (by injection or inhalation), positrons are emitted as the positron-emitting isotopes decay. An emitted positron annihilates with a nearby electron within the body, causing the generation of two high-energy (511 keV) photons. The two photons, which can penetrate the subject, travel in almost opposite directions. Ideally, photon pairs that are generated by a positron annihilating with an electron will be detected by a pair of detectors. If two electronically connected detectors detect a pair of photons within a short time interval (e.g., < 10 ns), then the detection is recorded along the line connecting the two detectors, which is called a line-of-response. In the absence of error, the detection indicates that there is an annihilation somewhere along the line-of-response. The detection of a photon pair is referred to as a coincidence. In addition,
the timing interval of a scanner that is used to define a coincidence is called the scanner's coincidence timing window. Since there are many, many detector pairs in a PET scanner, sufficient information is available for reconstructing a map of the concentration of the radiopharmaceutical. For each detector pair, the number of coincidences that occur during the scan is summed. The emission data are the coincidence sums for all of the possible detector pairs. The key idea in PET is that the emission data depend on the distribution of the radiopharmaceutical within the subject being scanned, which in turn depends on the metabolism of the subject. Consequently, numerous researchers have developed algorithms that reconstruct PET images whose pixel values represent the distribution of the radiopharmaceutical and, ultimately, the metabolism of the subject.

1.2 PET Scanner

Typical PET scanners have a diameter of 80-100 cm and an axial extent of 10-20 cm [4]. Figure 1-1 (a) shows a simplified PET scanner, and Figures 1-1 (b) and (c) show two-dimensional views of the scanner. Usually, a PET scanner consists of hundreds of rectangular bundles of crystals that are formed to make 20-30 rings of detectors, where each detector ring contains 300-600 detectors. Each bundle of crystals is connected to a few (e.g., 2-8) photomultiplier tubes (PMTs). Figure 1-1 (d) shows such a block of crystals coupled to four PMTs. When a photon interacts with a crystal, light photons are emitted and the PMTs collect them. From the collected light, a PET scanner determines the crystal within which the scintillation occurred.

State-of-the-art PET scanners generally provide two scan modes: slice-collimated mode and fully three-dimensional mode [4]. In slice-collimated mode, or septa-extended mode, thin tungsten rings, called septa, are placed between the detector rings. Figure 1-2 (a) illustrates slice-collimated mode, in which coincidences are collected within a detector ring. In fully three-dimensional mode, the septa are removed from the
Figure 1-1: Simplified PET scanner: (a) a simplified full-ring PET scanner with 8 detector rings, (b) two-dimensional view of the scanner at x = 0, (c) two-dimensional view of the scanner at y = 0, and (d) a block detector consisting of an array of 8 x 8 crystals coupled to four PMTs, where the origin of the rectangular coordinate system is at the center of the scanner.
Figure 1-2: Scan modes: (a) septa-extended mode and (b) fully three-dimensional mode. The dotted lines represent line-of-responses.

Figure 1-3: Projection definition using a zigzag scan: (a) a set of detector pairs that define a projection and (b) another projection. A dashed line represents a detector pair.
scanner. Figure 1-2 (b) illustrates fully three-dimensional mode, in which allowable line-of-responses are not restricted to occur within a detector ring.

Figure 1-4: Illustration of positron range and a coincidence. The two concentric circles denote a detector ring.

Within a detector ring, detector pairs are grouped into projections, where a projection is a set of detector pairs that are defined by a particular "zigzag scan". Figure 1-3 (a) shows a projection and the defining zigzag scan. Figure 1-3 (b) shows another projection. A sinogram is defined to be an S x T matrix, where each row of the matrix contains a projection, S is the number of detector pairs within the projection, and T is the number of projections in the emission data.

1.3 Sources of Error

After a positron is emitted, it travels a short distance before it annihilates with a nearby electron within the subject. The distance between the locations at which the emission and the annihilation take place is called the positron range. The positron range is proportional to the reciprocal of the density of the material through which an emitted positron travels [5]. Figure 1-4 is an illustration of positron range for a simplified scanner geometry. The positron range depends on the energy an emitted positron deposits (and,
Figure 1-5: Illustration of the noncollinearity of a line-of-response. The dashed line indicates the path of the photon pair if they had departed in exactly opposite directions. The arrows show the actual photon paths.

consequently, the chosen positron-emitting isotope). For typical positron-emitting isotopes, the full width at half maximum (FWHM) of the distribution of the positron range is a few millimeters. For example, the FWHMs of the distribution of the positron range for 18F and 15O are about 2 mm and 8 mm, respectively [5, p. 331].

Two photons generated by an annihilation usually do not propagate in exactly opposite directions. This phenomenon is called noncollinearity of the line-of-response. Figure 1-5 is an illustration of noncollinearity of the line-of-response for a simplified scanner geometry. For a fixed direction in which one of the two photons propagates, we refer to the angle between the opposite of the fixed direction and the actual direction in which the other photon travels as the "angle of noncollinearity". The distribution of the angle of noncollinearity can be approximated by a Gaussian distribution with a FWHM of 0.5 degrees [5].

The angle between the path along which a photon propagates and the face of a detector when the photon hits the detector is referred to as the incident angle. The way that a photon interacts with a detector depends on the incident angle. If the path that a
photon travels along is not perpendicular to the face of the detector when the photon hits it, then it is possible that the photon does not interact with the detector that it strikes. The photon may instead interact with another detector that is near the detector the photon hit originally. This phenomenon is termed detector penetration. Some methods have been proposed to account for detector penetration [6,7].

Figure 1-6: Single and scatter: (a) illustration of attenuation and a single, and (b) illustration of a scattered coincidence. The arrow penetrating the detector ring denotes that the photon is scattered through an oblique angle such that it does not hit a detector. The dotted line denotes the incorrectly positioned line-of-response.

When a 511 keV photon propagates within a subject and hits an electron, the photon may undergo a phenomenon known as Compton scattering: if the photon has sufficient energy, it gives part of its energy to the electron and deflects from its original path. Compton scattering can lead to three kinds of error: attenuation, scatter, and accidental coincidences.

Most scattered photons are scattered out of the scanner's field-of-view, so many of them are not detected. This phenomenon is called attenuation. Figure 1-6 (a) illustrates attenuation, where one of the two photons of an annihilation does not hit a detector due to Compton scattering. Figure 1-6 (a) also illustrates an event called a single, where only one of the emitted photons of an annihilation is
detected. Note that attenuation leads to an incorrect decrease in emission counts. In order to address attenuation, numerous correction methods have been proposed (see, e.g., [8-10]).

Figure 1-7: Accidental coincidence: (a) illustration of an accidental coincidence due to two annihilations occurring at almost the same time and (b) illustration of an accidental coincidence due to Compton scattering.

Consider an annihilation event where Compton scattering occurs. It is still possible that a detector pair detects both photons even though one or both of the photons may have undergone Compton scattering. Such an event is called a scattered coincidence, or scatter, which is illustrated in Figure 1-6 (b). Note that a scattered coincidence leads to an incorrect increase in emission counts. Since a scattered photon loses part of its energy, the energy of detected photons may be used to discriminate between unscattered and scattered photons [11, pp. 65-69].

Two photons arising from different annihilations can also be recorded by a detector pair. This event is called an accidental coincidence. Figure 1-7 (a) illustrates an accidental coincidence due to two annihilations occurring at almost the same time. Sometimes, an accidental coincidence may be due to Compton scattering, as illustrated in Figure 1-7 (b). Like scatter, an accidental coincidence leads to an incorrect increase
in emission counts. For many PET scanners, the mean accidental coincidence rate is estimated using a "delayed" timing window technique [12].

The efficiency of a detector pair is defined to be the probability that a coincidence is recorded when a photon pair hits the detectors. Ideally, this probability should be one. However, the efficiencies of detector pairs are nonuniform because of their geometric differences and the nonuniform physical characteristics of the detectors. The nonuniformity of detector pairs is referred to as detector inefficiency. To address detector inefficiency, correction methods such as [13-15] have been proposed.

1.4 System Model for Emission Data

In 1982, Shepp and Vardi proposed a Poisson model for PET emission data and an algorithm, known as the maximum likelihood expectation maximization (MLEM) algorithm, for reconstructing maximum likelihood (ML) emission images [16]. In the Poisson model, the region-of-interest is divided into J equal-sized volume elements, called voxels. Ultimately, the goal is to estimate the mean number of positrons emitted from each voxel. Let the i-th component of a vector d, d_i, represent the observed number of photon pairs recorded by the i-th detector pair, and let the j-th component of a vector x, x_j, represent the unknown mean number of emissions from the j-th voxel. Further, let I and J denote the number of detector pairs and the number of voxels, respectively. The I x 1 vector d and the J x 1 vector x are the emission data and the unknown emission mean vector, respectively. Let V_ij denote the probability that an annihilation in the j-th voxel leads to a photon pair being recorded by the i-th detector pair. The I x J probability matrix V has V_ij as its (i, j)-th element. In the Poisson model, Shepp and Vardi assumed that the emission data d is an observation of a random vector D with elements {D_i}, i = 1, 2, ..., I, that are Poisson distributed and independent. For all i, the mean of the random variable D_i is

    E\{D_i\} = \sum_{j=1}^{J} V_{ij} x_j .    (1.1)
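The MLEM algorithm maximizes the likelihood of this Poisson model with a simple multiplicative update. A minimal sketch of one common form of the iteration follows (the randoms term p of the modified model discussed later in this section is included, and the tiny system matrix is illustrative):

```python
import numpy as np

def mlem_step(x, V, d, p):
    # One MLEM iteration for the model d ~ Poisson(Vx + p):
    #   x_new[j] = x[j] / (sum_i V[i, j]) * sum_i V[i, j] * d[i] / [Vx + p]_i
    ratio = d / (V @ x + p)          # I-vector of data-to-model ratios
    return x * (V.T @ ratio) / V.sum(axis=0)

# Tiny 2-detector-pair, 2-voxel check: with noise-free data and p = 0,
# the true emission means are the unique ML solution, which the
# iteration approaches from a positive starting image.
V = np.array([[0.6, 0.2],
              [0.3, 0.7]])
x_true = np.array([50.0, 80.0])
d = V @ x_true
x = np.ones(2)
for _ in range(500):
    x = mlem_step(x, V, d, np.zeros(2))
```

The update preserves nonnegativity automatically, which is one reason multiplicative surrogate-based algorithms such as PML inherit the same structure.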
In practice, the probability matrix is unknown and must be estimated somehow. The simplest way to estimate the probability matrix V is the angle-of-view (AOV) method [16]. In the AOV method, a detector ring and a detector are modeled by a circle and an arc on the circle, respectively. Moreover, within a voxel, all emitted positrons are assumed to be emitted from the center of the voxel. Figure 1-8 illustrates a detector ring, a detector pair, and the tube (i.e., spatial extent) that is defined by the detector pair. In the figure, the AOV from the point g_j to the detector pair (y, z) is also shown. Specifically, this AOV is defined to be

    \text{AOV from } g_j \text{ to } (y, z) =
    \begin{cases}
      \min\{\angle a_y g_j b_y,\ \angle a_z g_j b_z,\ (\pi - \angle a_y g_j b_z),\ (\pi - \angle b_y g_j a_z)\}, & g_j \in \text{tube}(y, z) \\
      0, & \text{otherwise.}
    \end{cases}
    \qquad (1.2)

Said another way, the AOV in (1.2) is the maximum angle over which a line that goes through the point g_j will simultaneously intersect both detectors of the detector pair (y, z). In the AOV method, the probability V_ij is defined to be

    V_{ij} = \frac{\text{AOV from } g_j \text{ to } (y, z)}{\pi} ,    (1.3)

where the detector pair (y, z) is the i-th detector pair and the point g_j is the center point of the j-th voxel. In the AOV method, it is assumed that a photon is detected by a detector whenever the photon hits the arc corresponding to the detector. Clearly, the AOV method does not account for the detector penetration discussed in Section 1.3. Some methods have been developed to address errors due to detector penetration [6,7].

In this dissertation, we make the following mild assumptions on the probability matrix V and the emission data d:

(AS1) V has no row vector of zeros.
(AS2) d'V_j ≠ 0 for all j, where d' is the transpose of d and V_j is the j-th column of V.
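The geometry of (1.2) and (1.3) is straightforward to evaluate numerically. The sketch below computes the min-of-angles expression for a point assumed to lie inside the tube; the unit ring and the detector-arc endpoints are illustrative values:

```python
import numpy as np

def angle_at(g, p, q):
    # Angle p-g-q subtended at the point g, in radians.
    u, v = p - g, q - g
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def aov_probability(g, ay, by, az, bz):
    # Eq. (1.2) min-of-angles expression, divided by pi per Eq. (1.3).
    # The caller is assumed to have checked that g lies inside tube(y, z);
    # otherwise the probability is defined to be zero.
    aov = min(angle_at(g, ay, by),
              angle_at(g, az, bz),
              np.pi - angle_at(g, ay, bz),
              np.pi - angle_at(g, by, az))
    return aov / np.pi

def on_ring(deg):
    t = np.deg2rad(deg)
    return np.array([np.cos(t), np.sin(t)])

# Point at the scanner center with two diametrically opposed detectors,
# each spanning 10 degrees of a unit detector ring: the AOV is 10 degrees,
# so V_ij = 10/180 = 1/18.
p = aov_probability(np.zeros(2), on_ring(-5), on_ring(5),
                    on_ring(175), on_ring(185))
```

For an off-center point, the two complementary-angle terms in the min prevent the AOV from exceeding the angular window of the tube, which is the role they play in (1.2).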
Figure 1-8: The angle-of-view from the point g_j to the detector pair (y, z) is shown. The circle denotes a detector ring. The arcs (a_y, b_y) and (a_z, b_z) represent the detectors y and z, respectively. The area between the vertical lines (a_y, b_z) and (b_y, a_z) represents the tube defined by the detector pair (y, z).
To see the implication of the second assumption, consider all the detector pairs for which the probability of recording a photon pair generated by an annihilation in the j-th voxel is nonzero. Among this set of detector pairs, the second assumption implies that there exists at least one detector pair with nonzero emission counts. Assumption (AS2) is expected to hold whenever the duration of the emission scan is a reasonable length of time. In addition to (AS1) and (AS2), we assume that the probability matrix V accounts for errors due to attenuation, detector inefficiency, detector penetration, positron range, noncollinearity of the line-of-response, and scatter. In practice, there are correction methods for attenuation [9], detector inefficiency [15], and detector penetration [7]; other correction methods for detector penetration, attenuation, and detector inefficiency could also be used in conjunction. Note that in Chapter 6, we do not assume that the probability matrix accounts for errors due to scatter and noncollinearity of the line-of-response. Instead, we present a method that estimates the probability matrix in such a way that errors due to scatter and noncollinearity of the line-of-response are addressed.

When accidental coincidences (i.e., randoms) are considered, the Poisson model must be modified so that the emission data d is an observation of a random vector D that is Poisson distributed with mean (Vx + p), where the i-th component of p, p_i, is the mean number of accidental coincidences recorded by the i-th detector pair, i = 1, 2, ..., I. Usually, it is assumed that the mean accidental coincidence rate p is known. In practice, the mean accidental coincidence rate is estimated using a "delayed" timing window technique [12]. Given the emission data d, the mean accidental coincidence rate p, and the probability matrix V, the problem of interest is to estimate the mean number of positrons emitted from each voxel. Since it is assumed that the data are independent, it follows that
the likelihood function for the emission data d is given by

Pr{D = d | x} = \prod_{i=1}^{I} ( [Px + p]_i^{d_i} / d_i! ) e^{-[Px + p]_i} .   (1.4)

The ML estimate of x is defined to be the maximizer of the likelihood function over the feasible set. Equivalently, the ML estimate of the emission mean vector is given by

x_ML = arg max_{x >= 0} Psi(x) ,   (1.5)

where Psi(x) = log Pr{D = d | x} is the log likelihood function:

Psi(x) = \sum_{i=1}^{I} d_i log([Px + p]_i) - \sum_{i=1}^{I} [Px + p]_i - \sum_{i=1}^{I} log(d_i!)   (1.6)

(note: maximizing the likelihood function and maximizing the log likelihood function are equivalent operations). Although the ML estimator has several nice theoretical properties [17, ch. 7], images produced by the ML method (i.e., the MLEM algorithm) have the drawback that they are extremely noisy. This is because the PET image reconstruction problem is ill-posed, owing to the facts that (1) scan times for data acquisition are short, (2) emission data contain errors due to attenuation, scatter, and accidental coincidences, and (3) the data obey Poisson statistics. Currently, the most popular way to address the ill-posed nature of the image reconstruction problem is through the use of penalty functions. Numerous penalized maximum likelihood (PML) methods (also known as Bayesian or maximum a posteriori (MAP) methods) have been proposed [18-35]. In the MAP method, x is assumed to be an observation of a random variable X with known distribution, and the a posteriori distribution (i.e., the conditional probability density function of X given D = d) is maximized. After some manipulations [17, p. 351], the MAP estimate is found to be the maximizer of the log likelihood function
plus the log of the probability density function of X:

x_MAP = arg max_{x >= 0} Psi(x) + log Omega(x) ,   (1.7)

where the function Omega is the probability density function of X (i.e., the prior distribution). It is through the prior distribution Omega that MAP methods have the ability to regularize the image reconstruction problem. The form of the prior distribution commonly used in PET is

Omega(x) = C e^{-beta Lambda(x)} ,

where the function Lambda is a scalar-valued function and C and beta are constants. The constant C > 0 is chosen such that the area under the distribution Omega equals one. The constant beta > 0, known as the penalty parameter, controls the penalty function's degree of influence. Since it is known that PET images should be highly correlated, the penalty function Lambda is designed in such a way that it forces the estimates of neighboring voxels to be similar in value. Given the definition of Omega, the PML or MAP estimate is the nonnegative minimizer of the PML objective function

Phi(x) = -Psi(x) + beta Lambda(x) .   (1.8)
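To make the quantities in (1.4)-(1.8) concrete, the following is a minimal numerical sketch of the log likelihood Psi and the PML objective Phi. The system matrix `P`, randoms `p`, counts `d`, the two-voxel neighborhood, and the quadratic penalty below are illustrative test values, not values from this work:

```python
import math
import numpy as np

# Hypothetical system: I = 3 detector pairs, J = 2 voxels.
P = np.array([[0.6, 0.1],
              [0.3, 0.5],
              [0.1, 0.4]])      # probability matrix; no column is all zeros
p = np.array([0.2, 0.1, 0.3])  # mean accidental coincidences (randoms)
d = np.array([5.0, 3.0, 2.0])  # observed coincidence counts

def log_likelihood(x):
    """Psi(x) of (1.6): sum_i d_i log([Px+p]_i) - [Px+p]_i - log(d_i!)."""
    m = P @ x + p
    return float(np.sum(d * np.log(m) - m) - sum(math.lgamma(v + 1.0) for v in d))

def penalty(x):
    """Toy Lambda(x): quadratic neighbor penalty for a two-voxel 'image'."""
    return float((x[0] - x[1]) ** 2)

def pml_objective(x, beta):
    """Phi(x) = -Psi(x) + beta * Lambda(x), as in (1.8)."""
    return -log_likelihood(x) + beta * penalty(x)

# A flat image pays no penalty, so regularization favors it over a rough
# image: the penalty term adds nothing for equal neighboring voxels.
flat, rough = np.array([2.0, 2.0]), np.array([3.0, 1.0])
print(pml_objective(flat, 0.5), pml_objective(rough, 0.5))
```

With beta = 0 the objective reduces to the pure negative log likelihood, illustrating how the penalty parameter interpolates between ML estimation and heavier smoothing.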
CHAPTER 2
LITERATURE REVIEW

Although the MLEM algorithm by Shepp and Vardi [16] produces ML estimates of the emission means, the images produced by the ML method are extremely noisy due to the fact that the image reconstruction problem in PET is ill-posed. As discussed in Section 1.4, the reasons are that (1) scan times for data acquisition are short, (2) emission data contain errors due to attenuation, scatter, and accidental coincidences, and (3) the data obey Poisson statistics. One way to obtain PET images with sufficient smoothness is to terminate the MLEM algorithm before the log likelihood function is maximized. Of course, the resulting images are not ML images. Another modification of the MLEM algorithm is to first obtain an ML image and then filter it with a low-pass filter. The drawback of this post-filtering is that it is not clear how the filter should be chosen. A variation of the filtering approach just described, suggested by Silverman [36], is to filter every MLEM iterate; however, Silverman did not provide an answer as to how the filter should be chosen. Denoising the emission data (i.e., the observed data) is another way to regularize the PET image reconstruction problem [37, 38]. Currently, the most popular way to introduce regularization is through the use of penalty functions that force the estimates of neighboring voxels to be similar in value. The basis for such penalty functions is that PET images should be highly correlated. The first part of the chapter is devoted to so-called penalized maximum likelihood (PML) algorithms, which are algorithms that minimize PML objective functions: sums of the negative log likelihood function and a penalty function. In PET, PML and maximum a posteriori (MAP) are terms used for methods that minimize PML objective functions. In Section 2.1, we briefly review existing PML algorithms.
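The two ad hoc smoothing strategies just described, early termination of MLEM and post-filtering, can be sketched as follows. The system matrix and counts are illustrative, and the three-point moving average stands in for a generic low-pass filter:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])   # illustrative probability matrix
d = np.array([10.0, 6.0, 8.0])   # illustrative emission counts

def mlem_step(x):
    """One MLEM iteration: x_j <- x_j / (sum_i P_ij) * sum_i d_i P_ij / [Px]_i."""
    return x / P.sum(axis=0) * (P.T @ (d / (P @ x)))

x = np.ones(3)
for _ in range(5):               # early termination: stop well before convergence
    x = mlem_step(x)

kernel = np.array([0.25, 0.5, 0.25])
post_filtered = np.convolve(x, kernel, mode="same")  # ad hoc low-pass post-filter
print(x, post_filtered)
```

Each MLEM step increases the likelihood, so stopping early trades likelihood for smoothness; the post-filter smooths after the fact, with no principled recipe for choosing the kernel, which is precisely the drawback noted above.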
Reconstructed PML images are blurred by errors due to detector penetration, positron range, non-collinearity, and scatter. Errors due to scatter are difficult to correct because scatter depends on the activity and attenuation within the subject and on the scanner design. In Section 2.2, some scatter correction methods are reviewed. The regularized image reconstruction algorithms and the scatter correction algorithm we propose are compared with the existing algorithms in Section 2.3.

2.1 Penalized Maximum Likelihood Algorithms

In 1990, Green [23] proposed a PML algorithm known as the one-step-late (OSL) algorithm. The algorithm can be viewed as a fixed point iteration derived from the Kuhn-Tucker equations [39, pp. 36-49] for the PML optimization problem. Incidentally, Shepp and Vardi showed that the MLEM algorithm could be derived from the Kuhn-Tucker conditions in a similar way. The OSL algorithm is straightforward to implement, but nonnegative estimates cannot be guaranteed and, as for many existing algorithms, convergence is an open issue. Lange's goal [24] was to modify the OSL algorithm in such a way that the modified algorithm converges to the PML estimate. It should be pointed out that Lange's algorithm requires line searches, which can be computationally expensive. Alenius et al. [32] suggested a Gaussian "type" prior that depends on the median of voxels within local neighborhoods, and introduced an algorithm called the median-root-prior (MRP) algorithm. The MRP algorithm is based on an iteration-dependent objective function; consequently, it cannot really be considered a PML algorithm. Nevertheless, the MRP algorithm generates "good" images in the sense that the noise level of the reconstructed images is suppressed. It should be mentioned that a PML algorithm that resembles the MRP algorithm and performs similarly was derived by Hsiao et al. [40]. The PML algorithm by Hsiao et al.
was derived using a prior that is based on a certain auxiliary vector.
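Green's OSL update has a simple closed form: the MLEM denominator is augmented with the penalty gradient evaluated at the current ("one step late") estimate. A sketch, with an illustrative two-voxel system, a quadratic neighbor penalty, and accidental coincidences omitted for brevity:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])      # illustrative probability matrix
d = np.array([6.0, 5.0])       # illustrative emission counts
beta = 0.02                    # illustrative penalty parameter

def penalty_grad(x):
    """Gradient of Lambda(x) = (x_0 - x_1)^2 for a two-voxel 'image'."""
    g = 2.0 * (x[0] - x[1])
    return np.array([g, -g])

def osl_step(x):
    """One-step-late update: the penalty gradient enters the denominator,
    evaluated at the current iterate. Nonnegativity is NOT guaranteed in
    general, since the denominator can turn negative for large beta."""
    denom = P.sum(axis=0) + beta * penalty_grad(x)
    return x / denom * (P.T @ (d / (P @ x)))

x = np.ones(2)
for _ in range(50):
    x = osl_step(x)
print(x)
```

With beta = 0 the update collapses to MLEM; the example uses a small beta precisely because, as noted above, the OSL denominator offers no safeguard against negative or divergent iterates.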
Levitan and Herman [21] proposed a PML algorithm based on the assumption that the prior distribution of the true emission means is a multivariate Gaussian distribution. The assumption leads to a penalty function in the form of a weighted least-squares distance between x and a reference image. However, they did not indicate how the reference image is to be chosen. An algorithm was proposed by Wu [27] using a wavelet decomposition formulation. Specifically, the author assumed that the vector consisting of the wavelet coefficients of the true emission means is a zero-mean Gaussian random vector with a known covariance matrix. From this assumption, a prior distribution for the emission means was derived: a zero-mean Gaussian distribution with a covariance matrix that depends on the choice of wavelet transform and the assumed distribution for the vector of wavelet coefficients. It should be pointed out that the assumption was not clearly justified in the paper. Researchers have used an optimization algorithm called the iterative coordinate descent (ICD) algorithm [41, pp. 283-287] to obtain estimates for various penalty functions [31], [42]. Convergence results are given for the penalized weighted least-squares method [42], and both algorithms (i.e., [31], [42]) enforce the nonnegativity constraint. Algorithms based on the ICD algorithm update each voxel in a serial manner, so parallel implementations may not be possible. De Pierro [25, 30] derives PML algorithms that minimize certain surrogate functions, which he constructs by exploiting the fact that the log likelihood function is concave and penalty functions, such as the quadratic penalty function, are convex. Except for the quadratic penalty function, closed-form expressions for the minimizers of the surrogate functions do not exist. Consequently, an optimization method, such as Newton's method [43, pp. 201-202], is needed to minimize the surrogate functions.
De Pierro presents some convergence results; however, the utility of his methods is unclear because no experimental results were provided. It should be noted that, in
the transmission tomography paper by Erdoğan and Fessler [8], a quadratic surrogate function was used for a certain class of penalty functions. The quadratic surrogate function was developed by Huber [44, pp. 184-186]. A fast PML method, based on the ordered-subset EM algorithm [45], was proposed by De Pierro and Yamagishi [33]. The authors show that if the sequence generated by the algorithm converges, then it must converge to the true PML solution. Recently, Ahn and Fessler proposed algorithms [35] that are based on the ordered-subset EM algorithm [45], an algorithm by De Pierro and Yamagishi [33], and an algorithm by Erdoğan and Fessler [10]. Like other algorithms based on the ordered-subset EM algorithm, there is some uncertainty as to how the subsets are to be chosen. In the paper by Ahn and Fessler, the algorithms are said to converge to the nonnegative minimizer of the PML objective function for certain penalty functions and their accompanying parameters by using a relaxation parameter that diminishes with iterations. Open issues are how the relaxation parameters should be chosen in practice and how they affect the performance of the algorithms, since the convergence rate varies with the relaxation parameter.

2.2 Scatter Correction Methods

Many methods have been proposed to correct for scattered coincidences [11, ch. 3]. They can be classified into a few categories: (1) energy window methods, (2) convolution/deconvolution methods, and (3) methods that calculate the scatter distribution. One class of scatter correction methods is based on the use of multiple (two or more) energy windows [46, 47]. Recall that photons lose energy when they undergo Compton scattering. The principle of the energy window methods is to discard the detection of a photon whenever the energy of the photon is less than 511 keV. Since detectors have finite energy resolution, energy-window-based methods are limited.
Consequently, it is preferable to use another method jointly with the energy window techniques.
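The energy-window principle amounts to thresholding: a coincidence is kept only if both detected photon energies fall in a window around 511 keV, since Compton-scattered photons tend to fall below the lower bound. A minimal sketch, with illustrative window bounds:

```python
# Window bounds are illustrative; in practice they depend on the detector's
# energy resolution, which is why scattered photons cannot be fully rejected.
LOWER_KEV, UPPER_KEV = 425.0, 650.0

def accept_coincidence(e1_kev, e2_kev):
    """Return True if both photons of a coincidence pass the energy window."""
    return all(LOWER_KEV <= e <= UPPER_KEV for e in (e1_kev, e2_kev))

# Hypothetical photon-pair energies (keV); the second and fourth pairs each
# contain a photon that lost too much energy to Compton scattering.
events = [(511.0, 505.0), (511.0, 340.0), (610.0, 480.0), (300.0, 511.0)]
kept = [ev for ev in events if accept_coincidence(*ev)]
print(kept)
```

Because the window must be wide enough to accommodate finite energy resolution, some scattered photons inevitably pass the test, which is the limitation noted above.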
Another correction method for scatter is the convolution/deconvolution method [48-51]. The methods in [48, 49] assume that scattered coincidences (i.e., scatter) can be approximated by a convolution of the unscattered coincidences with a certain scatter function. Under this assumption, an estimate of the mean scatter in the observed coincidences is obtained, which can be subtracted from the emission data or incorporated into the system model for the emission data. The method by McKee et al. assumes that the distribution of scattered annihilations can be approximated by the convolution of the distribution of unscattered annihilations with some scatter response function [50]. The issue with the convolution/deconvolution method is that the distribution of unscattered coincidences (or annihilations) and the scatter response function (or scatter function) are not known. Ollinger introduced a scatter correction method that calculates the scatter distribution using an analytical equation, transmission images, emission images, and the scanner geometry [52]. The computational cost of the method is very high, so it may be difficult to adopt in clinical use at the time of writing.

2.3 Summary of the Proposed Algorithms

• In Chapter 3, we present an algorithm that obtains PML estimates for a certain class of edge-preserving penalty functions. The PML algorithm is derived by combining the convexity idea of De Pierro [25, 30] and Huber's surrogate functions [44]. Combining the two existing theories in such a way that the PML algorithm is convergent is new to the PET community. In theory, the algorithm guarantees nonnegative iterates, monotonically decreases the PML objective function with increasing iterations, and converges to the solution of the PML problem. In practice, it is straightforward to implement (i.e., there are no additional hyperparameters and no need for line searches) and it can incorporate many edge-preserving penalty functions.
• In Chapter 4, we develop an accelerated version of the PML algorithm by using the pattern search idea of Hooke and Jeeves [41, pp. 287-291]. Using this approach, we solve a constrained problem at each pattern search step, which leads to improved convergence rates. A modification of Hooke and Jeeves's direction vector is also introduced that improves performance. It should be mentioned that Hooke and Jeeves's method has not previously been used in PET image reconstruction. The proposed algorithm inherits the nice properties of the PML algorithm and converges to the minimizer of the PML objective function. In experiments, the accelerated algorithm needed less than about one third of the CPU time that was necessary for the PML algorithm to converge.
• In Chapter 5, we propose a regularized image reconstruction algorithm, referred to as the quadratic edge-preserving (QEP) algorithm, that aims to preserve edges through the use of certain newly developed decoupled penalty functions that depend on the current estimate. The QEP algorithm was motivated by the analysis of the PML algorithm. The algorithm by Alenius et al. [32] also uses an iteration-dependent objective function; however, it uses the OSL algorithm to generate the next iterate. The drawback of the Alenius approach is that the OSL algorithm does not guarantee convergence.
• In Chapter 6, we propose a model for emission data in which an unknown matrix, called a scatter matrix, is introduced. The model aims to account for errors due to scatter and non-collinearity. Based on the model, a certain minimization problem is constructed that allows the scatter matrix and the emission mean vector to be estimated jointly. Since the minimization problem is impractical to solve directly, we propose an algorithm that greatly reduces the number of unknowns in the scatter matrix and alternately estimates the scatter matrix and the emission mean vector. It should be mentioned that Mumcuoglu et al. [53] used the same
model. However, they assumed that the scatter matrix is known and that it accounts for errors due to detector penetration as well. Their scatter matrix was obtained through Monte Carlo simulations.
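The pattern search idea mentioned in the Chapter 4 summary can be illustrated on a generic smooth function. The following is the textbook Hooke and Jeeves scheme (exploratory coordinate moves followed by a pattern move), not the accelerated PML algorithm itself; the test function and parameters are arbitrary:

```python
import numpy as np

def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-6, max_iter=1000):
    """Basic Hooke-Jeeves pattern search (derivative-free minimization)."""
    def explore(base, s):
        # Try +/- s along each coordinate, keeping any strict improvement.
        x = base.copy()
        for j in range(len(x)):
            for cand in (x[j] + s, x[j] - s):
                trial = x.copy()
                trial[j] = cand
                if f(trial) < f(x):
                    x = trial
                    break
        return x

    base = np.asarray(x0, dtype=float)
    while step > tol and max_iter > 0:
        max_iter -= 1
        x = explore(base, step)
        if f(x) < f(base):
            # Pattern move: jump along the successful direction, then re-explore.
            pattern = explore(x + (x - base), step)
            base = pattern if f(pattern) < f(x) else x
        else:
            step *= shrink        # no improvement: refine the mesh
    return base

quad = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
xmin = hooke_jeeves(quad, [0.0, 0.0])
print(xmin)
```

The pattern move is what distinguishes the method from plain coordinate search: after a successful exploration, it extrapolates along the direction of progress, which is the acceleration mechanism exploited in Chapter 4.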
CHAPTER 3
PENALIZED MAXIMUM LIKELIHOOD ALGORITHM

Although the ML estimates of the emission means are available by using the MLEM algorithm [16], as discussed in Section 1.4, the resulting PET images are extremely noisy due to the fact that the PET image reconstruction problem is ill-posed. This is because of short scan times, errors in the emission data, and the fact that the data obey Poisson statistics. The most popular way to address the ill-posed nature of the PET image reconstruction problem is through the use of penalty functions. Penalty functions used in PET are designed in such a way that estimates for the emission means of neighboring voxels are forced to be similar in value, unless there is an "edge" within the neighborhood. By an edge, we mean that there is a group of connected voxels that have significantly greater activity than the other voxels in the neighborhood. For example, suppose there is only one voxel with significantly greater activity than the other voxels in its neighborhood. Then, we would say that there is no edge within the neighborhood. Simply stated, penalty functions provide a means for reconstructing PET images that have considerably less noise than MLEM images, yet retain edges (e.g., tumors) that may convey important information. In Section 3.1, we first derive an algorithm, called the penalized maximum likelihood (PML) algorithm, that incorporates a wide class of edge-preserving penalty functions. Then, we prove that the PML algorithm converges in Section 3.2. Finally, we summarize the properties of the PML algorithm in Section 3.3. It should be mentioned that we presented the PML algorithm in [18] without a proof of convergence. Our proof of convergence can now be found in a recent manuscript [19].
Figure 3-1: One-dimensional illustration of the optimization transfer method. At each iteration, a surrogate function is obtained and a minimizer of the surrogate function is defined as the next iterate. Ideally, it is "easy" to get the minimizer of the surrogate function.

3.1 Penalized Maximum Likelihood (PML) Algorithm

The problem of interest is to determine the nonnegative minimizer of the PML objective function Phi(x) = -Psi(x) + beta Lambda(x) (Psi is defined in (1.6)), where Lambda is a penalty function that forces emission mean estimates of neighboring voxels to be similar in value. In other words, we want to solve the following optimization problem:

(P)   x_PML = arg min_{x >= 0} Phi(x) .

The penalty functions we consider are of the form

Lambda(x) = \sum_{j=1}^{J} \sum_{k in N_j} omega_jk g(x_j, x_k) ,   (3.1)

where N_j is a set of voxels in a neighborhood of the j-th voxel, the constants {omega_jk} are positive weights for which omega_jk = omega_kj for all j and k, and g(s, t) = lambda(s - t), whereby the function lambda satisfies the following assumptions:

• (AS3) lambda(t) is symmetric
• (AS4) lambda(t) is everywhere differentiable
• (AS5) the derivative lambda_dot(t) is increasing for all t (this assumption implies that lambda is strictly convex)
• (AS6) gamma(t) = lambda_dot(t)/t is nonincreasing for t > 0
• (AS7) gamma(0) = lim_{t -> 0} gamma(t) is finite and nonzero
• (AS8) lambda(t) is bounded below (this assumption implies that Lambda(x) is bounded below).

Examples of functions that satisfy (AS3)-(AS8) are the quadratic function lambda(t) = t^2 and Green's log-cosh function lambda(t) = log(cosh(t)) [23]. Regarding the neighborhood N_j, the j-th voxel is excluded from the set N_j and, if the k-th voxel is in N_j, then the j-th voxel is in N_k. A common choice for N_j is the eight nearest neighbors of the j-th voxel. Since it is not possible to get a closed-form solution to the minimization problem (P), iterative optimization methods are necessary. The PML algorithm we propose is based on the optimization transfer method [10, 25, 30, 34, 54] in which, at each iteration, a function that satisfies certain conditions is obtained and the next iterate is defined to be a minimizer of that function. The function found at each iteration is referred to as a surrogate function for the function to be minimized. The idea is illustrated with the one-dimensional example in Figure 3-1. In the figure, the problem is to find the minimizer of the function f, which is t*. It is assumed that a closed-form solution to the minimization problem is not available. Given an initial guess t^(0), a surrogate function f^(0) that depends on t^(0) is determined. Then, the next iterate t^(1) is generated by finding the minimizer of f^(0). To get the following iterate t^(2), a surrogate function f^(1) that depends on t^(1) is obtained and then minimized. These steps are repeated until some convergence criterion is met. For a vector argument t, a surrogate function f^(n) satisfies the following conditions:

• (C1) f^(n)(t) >= f(t) for all t in {domain of f}
• (C2) f^(n)(t^(n)) = f(t^(n))
• (C3) grad f^(n)(t^(n)) = grad f(t^(n)),

where t^(n) is the n-th iterate, grad denotes the gradient of a function, and the superscript (n) indicates that the functions {f^(n)} and the iterates {t^(n)} depend on the iteration number. The next iterate t^(n+1) is defined to be a minimizer of f^(n):

t^(n+1) = arg min_t f^(n)(t) subject to t in {domain of f} .   (3.2)

Defining the iterates in this way ensures that the objective function f decreases monotonically with increasing iterations. To prove this fact, we first note that f(t^(n+1)) <= f^(n)(t^(n+1)) by (C1). Since f^(n)(t^(n+1)) <= f^(n)(t^(n)) by (3.2), it follows by (C2) that, for all n,

f(t^(n+1)) <= f^(n)(t^(n+1)) <= f^(n)(t^(n)) = f(t^(n)) .   (3.3)

It should be mentioned that (C3) is not necessary for the monotonicity in (3.3). However, (C3) is often needed in order to prove that an algorithm that utilizes the optimization transfer method converges (see [18, 25, 30]). Although the optimization transfer method is straightforward in principle, the difficulty in practice is that it may be difficult to find surrogate functions that satisfy the conditions (C1), (C2), and (C3). De Pierro developed surrogate functions for the negative log likelihood function by using the convexity of the negative log function [25, 30]. His idea is based on the following property of convex functions [55, pp. 860-861]: for a convex function f,

f( \sum_j a_j t_j ) <= \sum_j a_j f(t_j) ,   (3.4)

where \sum_j a_j t_j in {domain of f}, t_j in {domain of f} for all j, a_j >= 0 for all j, \sum_j a_j = 1, and {domain of f} is a convex set. Specifically, De Pierro utilized the
following inequality in ML estimation, where f(t) = -log(t):

f([Px]_i) = f( \sum_j (P_ij x_j^(n) / [Px^(n)]_i) ([Px^(n)]_i / x_j^(n)) x_j )   (3.5)
          <= \sum_j (P_ij x_j^(n) / [Px^(n)]_i) f( ([Px^(n)]_i / x_j^(n)) x_j ) ,   (3.6)

where x^(n) is the n-th iterate of x, P_ij >= 0, x_j^(n) > 0, and [Px^(n)]_i != 0 for all i, j, and n. Let

f_i(x) = f([Px]_i)   (3.7)
f_i^(n)(x) = \sum_j (P_ij x_j^(n) / [Px^(n)]_i) f( ([Px^(n)]_i / x_j^(n)) x_j ) .   (3.8)

Then, it is straightforward to see that, for all i, (1) f_i^(n)(x) >= f_i(x) for all x >= 0, (2) f_i^(n)(x^(n)) = f_i(x^(n)), and (3) grad f_i^(n)(x^(n)) = grad f_i(x^(n)). Thus, for all i, f_i^(n) is a surrogate function for f_i. Although De Pierro developed the surrogate functions for the log likelihood function under the assumption of no accidental coincidences (i.e., p = 0), the surrogate functions can easily be modified to account for accidental coincidences. Observe that [Px + p]_i can be written as the convex combination

[Px + p]_i = \sum_j (P_ij x_j^(n) / [Px^(n) + p]_i) ([Px^(n) + p]_i / x_j^(n)) x_j + (p_i / [Px^(n) + p]_i) [Px^(n) + p]_i .   (3.9)

Using the convexity of the negative log function and the fact that

\sum_j (P_ij x_j^(n) / [Px^(n) + p]_i) + p_i / [Px^(n) + p]_i = 1 ,   (3.10)

we have the following inequality for f(t) = -log(t):

f([Px + p]_i) <= \sum_j (P_ij x_j^(n) / [Px^(n) + p]_i) f( ([Px^(n) + p]_i / x_j^(n)) x_j ) + (p_i / [Px^(n) + p]_i) f([Px^(n) + p]_i) .   (3.11)
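De Pierro's bound (3.11) is a direct application of Jensen's inequality and can be checked numerically for f(t) = -log(t). The values below are arbitrary positive test data for a single detector pair i and J = 4 voxels:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda t: -np.log(t)

# Arbitrary positive test problem: one detector pair i, J = 4 voxels.
P_i = rng.uniform(0.1, 1.0, size=4)    # row i of the probability matrix
p_i = 0.3                              # mean accidentals for detector pair i
x_n = rng.uniform(0.5, 2.0, size=4)    # current iterate x^(n)
x   = rng.uniform(0.5, 2.0, size=4)    # arbitrary evaluation point

s_n = P_i @ x_n + p_i                  # [Px^(n) + p]_i
lhs = f(P_i @ x + p_i)                 # f([Px + p]_i)
rhs = np.sum(P_i * x_n / s_n * f(s_n * x / x_n)) + (p_i / s_n) * f(s_n)
print(lhs, rhs)   # convexity of -log guarantees lhs <= rhs
```

Evaluating both sides at x = x^(n) shows the bound is tight there, which is exactly the surrogate condition (C2) that makes the inequality useful for optimization transfer.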
Given the inequality in (3.11), the surrogate function at the n-th iteration for the negative log likelihood function -Psi can be expressed as

Psi^(n)(x) = \sum_{i=1}^{I} { [Px]_i - \sum_j d_i (P_ij x_j^(n) / [Px^(n) + p]_i) log(x_j) } + C^(n) ,   (3.12)

where

C^(n) = \sum_{i=1}^{I} { p_i + log(d_i!) - d_i log([Px^(n) + p]_i) + \sum_j d_i (P_ij x_j^(n) / [Px^(n) + p]_i) log(x_j^(n)) } .   (3.13)

It is straightforward to show that the surrogate function Psi^(n) satisfies (C1), (C2), and (C3) with respect to -Psi: (1) Psi^(n)(x) >= -Psi(x) for all x >= 0, (2) Psi^(n)(x^(n)) = -Psi(x^(n)), and (3) grad Psi^(n)(x^(n)) = -grad Psi(x^(n)). Since surrogate functions for the negative log likelihood function -Psi are available, we only need to find surrogate functions for the penalty function Lambda(x) = \sum_j \sum_{k in N_j} omega_jk lambda(x_j - x_k). Under assumptions (AS3)-(AS8), Huber developed a surrogate function for lambda in [44] (see also [8]). Given an arbitrary point t^(n), Huber's surrogate function for lambda, which is defined by

lambda^(n)(t) = lambda(t^(n)) + lambda_dot(t^(n))(t - t^(n)) + (gamma(t^(n))/2)(t - t^(n))^2 ,   (3.14)

has the properties that lambda^(n)(t) >= lambda(t) for all t (see Appendix A), lambda^(n)(t^(n)) = lambda(t^(n)), and lambda_dot^(n)(t^(n)) = lambda_dot(t^(n)), where the dot over a function represents its first derivative. For t^(n) = x_j^(n) - x_k^(n), it follows that a surrogate function for Lambda is

Lambda_bar^(n)(x) = \sum_{j=1}^{J} \sum_{k in N_j} omega_jk g_bar^(n)(x_j, x_k) ,   (3.15)

where g_bar^(n)(s, t) = lambda^(n)(s - t). Using Lambda_bar^(n) as a starting point, we will now construct a surrogate function for Lambda that has a more convenient form. By the convexity of the square function, we have
the following inequality:

( (s + t)/2 )^2 <= s^2/2 + t^2/2 .   (3.16)

It should be mentioned that De Pierro [30] and Hsiao et al. [34] utilized this property of the quadratic function in PET, and Erdoğan and Fessler [10] applied it to a nonquadratic convex function for transmission tomography. Motivated by (3.16), we define

g^(n)(x_j, x_k) = lambda(x_j^(n) - x_k^(n)) + lambda_dot(x_j^(n) - x_k^(n)) [ (x_j - x_j^(n)) - (x_k - x_k^(n)) ]
    + (gamma(x_j^(n) - x_k^(n))/4) [ (2x_j - 2x_j^(n))^2 + (2x_k - 2x_k^(n))^2 ] .   (3.17)

By construction, the following statements are clear: (1) g^(n)(x_j, x_k) >= g(x_j, x_k) for all x_j and x_k, from (3.16); (2) g^(n)(x_j^(n), x_k^(n)) = g(x_j^(n), x_k^(n)); (3) (d/dx_j) g^(n)(x_j^(n), x_k^(n)) = (d/dx_j) g(x_j^(n), x_k^(n)); and (4) (d/dx_k) g^(n)(x_j^(n), x_k^(n)) = (d/dx_k) g(x_j^(n), x_k^(n)). The difference between g_bar^(n) and g^(n) is that g^(n) is decoupled in the sense that it does not have terms of the form x_j x_k. This difference is important because, as we show later, using g^(n) enables us to construct surrogate functions for Phi that have closed-form expressions for their minimizers. Using g^(n), an alternative surrogate function for Lambda is

Lambda^(n)(x) = \sum_{j=1}^{J} \sum_{k in N_j} omega_jk g^(n)(x_j, x_k) .   (3.18)

It is clear that, by construction, Lambda^(n) satisfies the following properties: (1) Lambda^(n)(x) >= Lambda(x) for all x; (2) Lambda^(n)(x^(n)) = Lambda(x^(n)); and (3) grad Lambda^(n)(x^(n)) = grad Lambda(x^(n)). Now, using Psi^(n) and Lambda^(n), the desired surrogate function at the n-th iteration for Phi is

Phi^(n)(x) = Psi^(n)(x) + beta Lambda^(n)(x) .   (3.19)
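Huber's parabola (3.14) can be checked numerically for Green's log-cosh function, for which lambda_dot(t) = tanh(t) and gamma(t) = tanh(t)/t. The expansion point below is arbitrary (and taken nonzero so gamma can be evaluated directly):

```python
import numpy as np

lam  = lambda t: np.log(np.cosh(t))   # Green's log-cosh penalty lambda(t)
dlam = np.tanh                        # its derivative lambda_dot(t)

t_n = 0.7                             # arbitrary nonzero expansion point
gamma_tn = dlam(t_n) / t_n            # curvature gamma(t^(n)) = lambda_dot(t)/t

def surrogate(t):
    """Huber's parabola (3.14) expanded about t_n."""
    return lam(t_n) + dlam(t_n) * (t - t_n) + 0.5 * gamma_tn * (t - t_n) ** 2

ts = np.linspace(-5.0, 5.0, 2001)
gap = surrogate(ts) - lam(ts)
print(float(gap.min()))  # nonnegative: the parabola lies above lambda everywhere
```

The gap vanishes at t = t_n and the slopes match there, which are precisely the tangency properties listed after (3.14).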
From the properties of Psi^(n) and Lambda^(n), it follows that the surrogate function Phi^(n) possesses the requisite properties:

• (P1) Phi^(n)(x) >= Phi(x) for all x >= 0
• (P2) Phi^(n)(x^(n)) = Phi(x^(n))
• (P3) grad Phi^(n)(x^(n)) = grad Phi(x^(n)).

Given x^(n), the next iterate x^(n+1) is found by minimizing Phi^(n) subject to the nonnegativity constraint:

x^(n+1) = arg min_{x >= 0} Phi^(n)(x) .   (3.20)

By (P1) and (P2), this definition guarantees that the PML objective function decreases monotonically:

Phi(x^(n)) >= Phi(x^(n+1)) for all n .   (3.21)

Because Lambda^(n) does not contain terms of the form x_j x_k, the surrogate function Phi^(n) separates into a sum of one-dimensional functions of the individual voxel values, defined in (3.29)-(3.32) below, plus a constant that does not depend on x
(see proof in Appendix B). Since Psi^(n) and Lambda^(n) are decoupled,

Phi^(n)(x) = Psi^(n)(x) + beta Lambda^(n)(x)
           = \sum_{j=1}^{J} { \sum_{i=1}^{I} P_ij x_j - ( x_j^(n) \sum_{i=1}^{I} d_i P_ij / [Px^(n) + p]_i ) log(x_j) }
             + 2 beta \sum_{j=1}^{J} \sum_{k in N_j} omega_jk { gamma(x_j^(n) - x_k^(n)) (x_j - x_j^(n))^2 + lambda_dot(x_j^(n) - x_k^(n)) x_j } + c^(n)   (3.26)
           = \sum_{j=1}^{J} phi_j^(n)(x_j) + c^(n) ,   (3.27)

where c^(n) is a constant that does not depend on x,

phi_j^(n)(t) = E_j^(n) log(t) + F_j^(n) t^2 + G_j^(n) t ,   (3.29)

and

E_j^(n) = - x_j^(n) \sum_{i=1}^{I} d_i P_ij / [Px^(n) + p]_i ,   (3.30)
F_j^(n) = 2 beta \sum_{k in N_j} omega_jk gamma(x_j^(n) - x_k^(n)) ,   (3.31)
G_j^(n) = \sum_{i=1}^{I} P_ij + 2 beta \sum_{k in N_j} omega_jk { lambda_dot(x_j^(n) - x_k^(n)) - 2 gamma(x_j^(n) - x_k^(n)) x_j^(n) } .   (3.32)

Consequently, the constrained minimization problem (3.20) reduces to J one-dimensional problems:

x_j^(n+1) = arg min_{t >= 0} phi_j^(n)(t) , j = 1, 2, ..., J .   (3.34)
Fortunately, the function phi_j^(n) is strictly convex for all j and n under the assumption that x_j^(n) > 0 for all j and n. We will prove this statement by showing that the second derivative of phi_j^(n) is positive when x_j^(n) > 0 for all j and n. First, note that E_j^(n) is negative and F_j^(n) is positive for all j and n. The fact that E_j^(n) < 0 is due to the fact that \sum_i d_i P_ij != 0 (see (AS2) in Section 1.4) and the assumption that x_j^(n) > 0. The fact that F_j^(n) > 0 follows from the positivity of the function gamma, the weights {omega_jk}, and the penalty parameter beta. To see why gamma(t) > 0 for -inf < t < inf, recall that lambda(t) is a symmetric, strictly convex function by (AS3) and (AS5). It follows that lambda_dot(t) > 0 over (0, inf) and lambda_dot(t) < 0 over (-inf, 0). Using the fact that gamma(0) is finite and nonzero (see (AS7)), we have that gamma(t) > 0 for -inf < t < inf. Now, consider the second derivative of phi_j^(n). Easy calculations show that

phi_ddot_j^(n)(t) = -E_j^(n)/t^2 + 2 F_j^(n) .

Since F_j^(n) > 0 and E_j^(n) < 0, it follows that the second derivative of phi_j^(n) is positive for t > 0 and, from (3.29), Phi^(n) is strictly convex over the set {x : x > 0}. Since Phi^(n) is decoupled, phi_j^(n) is strictly convex, and phi_j^(n)(t) -> inf as t -> 0+, it is true that x^(n+1) > 0 and

phi_dot_j^(n)(x_j^(n+1)) = 0 .   (3.36)

Note that (3.36) is consistent with our assumption that x_j^(n) > 0 for all j and n. To solve (3.34), we compute the first derivative of phi_j^(n) and set it to zero; this is equivalent to the quadratic equation 2 F_j^(n) t^2 + G_j^(n) t + E_j^(n) = 0. Since E_j^(n) < 0 and F_j^(n) > 0, the root of the quadratic equation that preserves the nonnegativity constraint is

x_j^(n+1) = ( -G_j^(n) + sqrt( (G_j^(n))^2 - 8 E_j^(n) F_j^(n) ) ) / ( 4 F_j^(n) ) , j = 1, 2, ..., J .   (3.37)

Observe that, as beta -> 0, (3.37) approaches -E_j^(n)/G_j^(n) by L'Hopital's rule. Thus, the iteration in (3.37) is equivalent to the MLEM algorithm when beta = 0.
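Under the quadratic penalty lambda(t) = t^2 (which satisfies (AS3)-(AS8), with lambda_dot(t) = 2t and gamma(t) = 2), the closed-form update (3.37) can be exercised end to end; the tiny system, weights, and counts below are illustrative test values, not from the experiments in this work:

```python
import numpy as np

# Illustrative system: I = 3 detector pairs, J = 2 voxels that are mutual
# neighbors; quadratic penalty lambda(t) = t^2, so dlam(t) = 2t, gamma(t) = 2.
P = np.array([[0.6, 0.1],
              [0.3, 0.5],
              [0.1, 0.4]])
p = np.array([0.2, 0.1, 0.3])
d = np.array([5.0, 3.0, 2.0])
beta, omega = 0.2, 1.0
GAM = 2.0                                   # gamma(t) = 2 for the quadratic

def pml_step(x):
    """One PML iteration via the closed-form root (3.37)."""
    s = P @ x + p                           # [Px^(n) + p]_i
    E = -x * (P.T @ (d / s))                # E_j^(n); always negative here
    xk = x[::-1]                            # neighbor values x_k^(n)
    F = 2.0 * beta * omega * GAM            # F_j^(n); always positive
    G = P.sum(axis=0) + 2.0 * beta * omega * (2.0 * (x - xk) - 2.0 * GAM * x)
    return (-G + np.sqrt(G * G - 8.0 * E * F)) / (4.0 * F)

def objective(x):
    """Phi(x), up to the constant log(d_i!) terms."""
    m = P @ x + p
    return float(np.sum(m - d * np.log(m)) + beta * omega * 2.0 * (x[0] - x[1]) ** 2)

x = np.ones(2)
for _ in range(50):
    x = pml_step(x)
print(x, objective(x))
```

Because E_j^(n) < 0 and F_j^(n) > 0, the square root in (3.37) exceeds |G_j^(n)|, so every iterate is strictly positive without any projection step, and the surrogate construction forces the objective down at every iteration.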
In summary, given a strictly positive initial estimate x^(0) > 0, the steps of the PML algorithm are: for n = 0, 1, 2, ...

• Step 1 Let x^(0) > 0 be the initial estimate
• Step 2 Construct the surrogate function Phi^(n) from the current iterate x^(n) using (3.29), (3.30), (3.31), and (3.32)
• Step 3 Get x^(n+1) using (3.37)
• Step 4 Iterate between Steps 2 and 3 until some chosen stopping criterion is met.

3.2 Convergence Proof

Using (P1)-(P3) and (AS1)-(AS8), we now prove that the PML algorithm converges. The following convergence proof is based on the convergence proof by Lange and Carson [56] (see also [30]) of the MLEM algorithm by Shepp and Vardi [16]. By (3.21), the PML algorithm has the property that it decreases the objective function Phi with increasing iterations:

• (P4) Phi(x^(n+1)) <= Phi(x^(n)) for all n >= 0.

Another property of the algorithm is that

• (P5) the sequence {Phi(x^(n))} is convergent.

This property follows from (P4) and the fact that Phi is bounded below by (AS8) (see [57, Theorem 1.4, p. 6]).

Proposition 1 The sequence {x^(n)} is bounded.

Proof: From (P4), it follows that Phi(x^(n)) <= Phi(x^(0)) for all n >= 0. Consider the set B = {x >= 0 : Phi(x) <= Phi(x^(0))}. Then, clearly {x^(n)} is contained in B. So, to prove that {x^(n)} is bounded, we will prove that the set B is bounded. It is straightforward to see that B is bounded below by 0. Now, suppose that B is not bounded above. Then, there exists a point z in B such that ||z|| -> inf. This means that for some j there exists z_j such that z_j -> inf. Since (AS2) implies that P has no column of zeros, it follows that [Pz]_i -> inf for some i and Phi(z) -> inf. This implies that z is not
an element of the set B because all of the elements of B have an objective value that is less than or equal to Phi(x^(0)) < inf, which is a contradiction. Therefore, the set B is bounded above.

Proposition 2 There exists some constant c_1 > 0 such that Phi(x^(n)) - Phi(x^(n+1)) >= c_1 ||x^(n) - x^(n+1)||^2 for all n >= 0.

Proof: By (P1), (P2), and (3.27), we have the following inequality:

Phi(x^(n)) - Phi(x^(n+1)) >= Phi^(n)(x^(n)) - Phi^(n)(x^(n+1)) = \sum_{j=1}^{J} { phi_j^(n)(x_j^(n)) - phi_j^(n)(x_j^(n+1)) } .   (3.38)

Suppose, for each n, the function phi_j^(n)(t) is expanded into a second-order Taylor series [55, pp. 868-869] about the point x_j^(n+1) and evaluated at t = x_j^(n). Then, the right hand side of (3.38) can be written as

Phi^(n)(x^(n)) - Phi^(n)(x^(n+1)) = \sum_{j=1}^{J} { phi_dot_j^(n)(x_j^(n+1))(x_j^(n) - x_j^(n+1)) + (1/2)(x_j^(n) - x_j^(n+1))^2 phi_ddot_j^(n)(x_tilde_j^(n+1)) } ,   (3.39)

where the double dot over a function represents its second derivative and x_tilde_j^(n+1) is a point between x_j^(n) and x_j^(n+1). Since phi_dot_j^(n)(x_j^(n+1)) = 0 by (3.36), it follows that

Phi^(n)(x^(n)) - Phi^(n)(x^(n+1)) = \sum_{j=1}^{J} (1/2)(x_j^(n) - x_j^(n+1))^2 phi_ddot_j^(n)(x_tilde_j^(n+1)) .   (3.40)

Now, recall that phi_ddot_j^(n)(t) = -E_j^(n)/t^2 + 2 F_j^(n), with F_j^(n) = 2 beta \sum_{k in N_j} omega_jk gamma(x_j^(n) - x_k^(n)) and E_j^(n) < 0. Since {x^(n)} is bounded and gamma(t) > 0 is a continuous function for -inf < t < inf, there exists a number gamma_0 > 0 such that gamma(x_j^(n) - x_k^(n)) >= gamma_0 for all j, k, and n. Hence, F_j^(n) >= c_1 for all j and n, where c_1 = 2 beta gamma_0 min_j \sum_{k in N_j} omega_jk > 0. Therefore, phi_ddot_j^(n)(x_tilde_j^(n+1)) >= 2 c_1 for all j and n, and we obtain the desired result

Phi(x^(n)) - Phi(x^(n+1)) >= c_1 ||x^(n) - x^(n+1)||^2 .   (3.41)
From (P5) and Proposition 2, it follows that

• (P6) the sequence {x^(n) - x^(n+1)} converges to 0.

The following proposition will be used later to prove not only that a limit point of the sequence {x^(n)} satisfies one of the Kuhn-Tucker conditions [55, p. 777] but also that the sequence {x^(n)} has a finite number of limit points.

Proposition 3 Let x* be a limit point of the sequence {x^(n)}. Then, for all j such that x*_j != 0,

(d/dx_j) Phi(x) |_{x=x*} = 0 .   (3.42)

Proof: By Proposition 1, there is a subsequence {x^(n_l)} that converges to x* (see the Bolzano-Weierstrass theorem in [58, p. 108]). By (P6), the subsequence {x^(n_l + 1)} also converges to x*. Recall from (3.29) that phi_dot_j^(n)(t) = E_j^(n)/t + 2 F_j^(n) t + G_j^(n). If x*_j != 0, it follows that E_j^(n_l)/x_j^(n_l) and E_j^(n_l)/x_j^(n_l + 1) converge to the same limit, and hence

lim_{l -> inf} phi_dot_j^(n_l)(x_j^(n_l)) = lim_{l -> inf} phi_dot_j^(n_l)(x_j^(n_l + 1)) .   (3.43)

Since phi_dot_j^(n_l)(x_j^(n_l + 1)) = 0 by (3.36), it follows from (P3) that

(d/dx_j) Phi(x) |_{x=x*} = 0 for all j such that x*_j != 0 .   (3.44)

Using Proposition 3, we can prove the following proposition, which will be used to prove that the sequence {x^(n)} converges.

Proposition 4 The sequence {x^(n)} has a finite number of limit points.

Proof: Consider the following sets:

Y = {1, 2, ..., J}   (3.45)
Z* = {j in Y : x*_j = 0}   (3.46)
Z** = {j in Y : x**_j = 0} ,   (3.47)
where x* and x** are limit points of {x^(n)}. Now, let the function Φ*(x) be the restriction of Φ(x) to the reduced parameter set W* = {x ≥ 0 : x_j = 0 for j ∈ Z*}. Since Φ*(x) is strictly convex over a convex set, the unique minimizer of Φ*(x) is its only stationary point [59, Proposition 2.1.2, p. 87]. Thus, by Proposition 3, there is only one limit point of {x^(n)} in the set W*. In other words, if Z* = Z**, then x* = x**. Therefore, the number of limit points is bounded above by the number of subsets of Y, which is clearly finite.

Theorem 1: The sequence {x^(n)} converges to the unique minimizer of Φ.

Proof: Let x* be a limit point of the sequence {x^(n)}. Using the theorem in [60, p. 173], which says that the set of limit points of a bounded sequence {x^(n)} is connected if {x^(n) − x^(n+1)} → 0, we obtain the fact that the set of limit points of {x^(n)} is connected by Proposition 1 and (P6). Since the number of limit points of {x^(n)} is finite by Proposition 4, {x^(n)} has only one limit point. Thus, {x^(n)} → x*. Now, note that the PML objective function Φ is strictly convex on the set {x : x ≥ 0} (see Appendix C), so there is only one minimizer. To prove the sequence {x^(n)} converges to the unique minimizer, we need to show that x* satisfies the Kuhn-Tucker conditions [55, p. 777]: for all j,

x_j* ≥ 0  (3.48)
x_j* ∂Φ(x)/∂x_j |_{x=x*} = 0  (3.49)
∂Φ(x)/∂x_j |_{x=x*} ≥ 0 .  (3.50)

Since x^(n) > 0 for all n, it must be true that the limit point x* is nonnegative (i.e., (3.48) is satisfied). By Proposition 3,

∂Φ(x)/∂x_j |_{x=x*} = 0 for all j such that x_j* ≠ 0 .  (3.51)
So, (3.49) is satisfied. Now, we consider the case x_j* = 0. For j such that x_j* = 0, suppose

∂Φ(x)/∂x_j |_{x=x*} < 0 .  (3.52)

Then, it follows that lim_n φ̇_j^(n)(x_j^(n)) < 0 by (P3), and φ̇_j^(n)(x_j^(n)) < 0 for sufficiently large n. Consider φ̇_j^(n)(x_j^(n+1)) = −E_j^(n)/x_j^(n+1) + 2F_j^(n) x_j^(n+1) + G_j^(n). If x_j^(n+1) ≤ x_j^(n), then

−E_j^(n)/x_j^(n) + 2F_j^(n) x_j^(n+1) + G_j^(n) ≥ 0  (3.53)

because φ̇_j^(n)(x_j^(n+1)) = 0 by (3.36) and −E_j^(n)/x_j^(n+1) ≤ 0. Moreover, the fact F_j^(n) > 0 implies that

φ̇_j^(n)(x_j^(n)) = −E_j^(n)/x_j^(n) + 2F_j^(n) x_j^(n) + G_j^(n) ≥ 0 ,  (3.54)

which is a contradiction. Thus, x_j^(n+1) > x_j^(n) for all sufficiently large n. However, this contradicts the fact x_j^(n) → 0. So, it is true that

∂Φ(x)/∂x_j |_{x=x*} ≥ 0 for all j such that x_j* = 0 .  (3.55)

This satisfies (3.50).

3.3 Properties of the PML Algorithm

We now provide a summary of the desirable properties of the PML algorithm:

• The PML algorithm is straightforward to implement because there are no hyperparameters required for the algorithm itself and it has closed-form expressions for the iterates. Some algorithms require hyperparameters, such as relaxation parameters, in addition to the penalty parameter [33, 35], while others [24, 30] do not have closed-form expressions for the updates.

• The PML algorithm theoretically guarantees nonnegative iterates, whereas some algorithms [33, 35] set any negative element of the iterates to a small positive number.
• The PML algorithm monotonically decreases the PML objective function, unlike the algorithms in [23, 35].

• The PML algorithm can incorporate a large class of edge-preserving penalty functions, unlike the algorithm by De Pierro [30].

• The PML algorithm converges to the minimizer of the PML objective function. Convergence proofs for the algorithms in [23, 33] are not available.
CHAPTER 4
ACCELERATED PENALIZED MAXIMUM LIKELIHOOD ALGORITHM

Although the PML algorithm presented in Chapter 3 converges to the nonnegative minimizer of the PML objective function, it has the drawback that it converges slowly. In PET, a popular way to accelerate iterative image reconstruction algorithms is through the use of so-called ordered subsets [45]. In ordered-subsets based reconstruction algorithms, the observed data, d, is divided into a predefined number of subsets via some chosen rule. Then, the iterative reconstruction algorithm to be accelerated is applied sequentially to each data subset. In [45], Hudson and Larkin developed the first PET image reconstruction algorithm that used the ordered-subsets idea. Since the ML-EM algorithm was applied to each data subset, they called their algorithm the ordered-subsets expectation-maximization (OSEM) algorithm. In [61], Browne and De Pierro showed that the OSEM algorithm did not converge and introduced another ordered-subsets based image reconstruction algorithm that employed a relaxation parameter. It should be pointed out that some convergence results are available for ordered-subsets based algorithms that use relaxation parameters [33, 35, 61]. With ordered-subsets based algorithms, there is uncertainty as to how many subsets should be used and how the data should be divided. Moreover, it is not clear how relaxation parameters should be chosen in practice because, generally, they depend on the data. In this chapter, we introduce an accelerated version of the PML algorithm, referred to as the accelerated PML (APML) algorithm, that uses a pattern search suggested by Hooke and Jeeves [41, pp. 287-291]. A pattern search has also been exploited to accelerate an algorithm in transmission tomography [62]. In Section
Figure 4-1: Two-dimensional illustration of the sequence {x^(n)}. The single circles and double circles denote the accelerated iterates {x̄^(n)} and standard iterates {x^(n)}, respectively. Each ellipse represents a set of points that have the same cost. The mark × denotes the minimizer of the function Φ subject to the constraints x_1 ≥ 0 and x_2 ≥ 0.

4.1, using the mathematical ideas in the convergence proof of the PML algorithm, we show that a sequence that satisfies certain conditions converges to the minimizer of the PML objective function. Then, we use this result to prove that the APML algorithm, which is developed in Section 4.2, converges to the nonnegative minimizer of the PML objective function. In Section 4.3, we introduce the direction vector to be used in the pattern search. Finally, we summarize the properties of the APML algorithm in Section 4.4. It should be mentioned that we introduced the APML algorithm in [19].

4.1 Convergence Proof

In this section, we prove that the sequence {x̄^(0), x^(1), x̄^(1), x^(2), x̄^(2), ...} converges to the minimizer of the PML objective function Φ, where x̄^(0) > 0 is an initial guess, x^(n+1) = argmin Φ̄^(n)(x) subject to x ≥ 0, Φ̄^(n) is of the same form as the surrogate function for the PML objective function Φ at the iterate x^(n), Φ^(n) in (3.29), except that Φ̄^(n) is defined at the point x̄^(n) instead of x^(n), and the point x̄^(n) > 0 satisfies the following conditions for all n:

• (C4) Φ(x̄^(n)) ≤ Φ(x^(n))
• (C5) there exists some constant c_2 > 0 such that Φ(x^(n)) − Φ(x̄^(n)) ≥ c_2 ‖x^(n) − x̄^(n)‖^2.

This convergence result will form the basis for the convergence proof of the APML algorithm. Note, the strict positivity of x̄^(n) for all n is necessary because the surrogate function for the PML objective function, Φ̄^(n), is undefined for vectors with zero or negative elements. An example of such a sequence is illustrated in Figure 4-1. In the figure, the single circles and double circles represent the accelerated iterates {x̄^(n)} and standard iterates {x^(n)}, respectively.

First, note that the following convergence proof is mainly based on the convergence proof of the PML algorithm in Chapter 3. Since Φ is bounded below and the sequence {x̄^(0), x^(1), x̄^(1), x^(2), x̄^(2), ...} monotonically decreases the PML objective function Φ, it follows that

• (P7) the sequence {Φ(x^(n))} converges.

Moreover, all of the iterates lie in the set B̄ = {x ≥ 0 : Φ(x) ≤ Φ(x̄^(0))}. Since the set B̄ is bounded (see Proposition 1), it follows that

• (P8) the sequences {x^(n)} and {x̄^(n)} are bounded.

Now, note that x^(n+1) is the minimizer of the surrogate function Φ̄^(n), which satisfies (1) Φ̄^(n)(x) ≥ Φ(x) for all x ≥ 0, (2) Φ̄^(n)(x̄^(n)) = Φ(x̄^(n)), and (3) ∇Φ̄^(n)(x̄^(n)) = ∇Φ(x̄^(n)). Thus, by (P8) (see Proposition 2), we have the following property:

• (P9) there exists some constant c_3 > 0 such that Φ(x̄^(n)) − Φ(x^(n+1)) ≥ c_3 ‖x̄^(n) − x^(n+1)‖^2.

By (P7) and (P9), we obtain the property

• (P10) the sequence {x̄^(n) − x^(n+1)} converges to 0.

Also, by (P7) and (C5), it must be true that the sequence {x^(n) − x̄^(n)} converges to 0. Thus, from the fact that ‖x̄^(n) − x̄^(n+1)‖_2 ≤ ‖x̄^(n) − x^(n+1)‖_2 + ‖x^(n+1) − x̄^(n+1)‖_2, the following property follows:

• (P11) the sequence {x̄^(n) − x̄^(n+1)} converges to 0.
For the discussion to follow, consider the surrogate function for the PML objective function at the iterate x̄^(n),

Φ̄^(n)(x) = Σ_{j=1}^{J} φ̄_j^(n)(x_j) + C̄^(n) ,  (4.1)

where φ̄_j^(n)(t) = −Ē_j^(n) log(t) + F̄_j^(n) t^2 + Ḡ_j^(n) t for t > 0, C̄^(n) is independent of x, and

Ē_j^(n) = Σ_{i=1}^{I} d_i P_ij x̄_j^(n) / ([P x̄^(n)]_i + ρ_i)  (4.2)
F̄_j^(n) = 2β Σ_{k∈N_j} ω_jk γ(x̄_j^(n) − x̄_k^(n))  (4.3)
Ḡ_j^(n) = Σ_{i=1}^{I} P_ij − 2β Σ_{k∈N_j} ω_jk γ(x̄_j^(n) − x̄_k^(n)) (x̄_j^(n) + x̄_k^(n))  (4.4)

(note: Φ̄^(n), Ē_j^(n), F̄_j^(n), Ḡ_j^(n), and C̄^(n) result by substituting x̄^(n) for x^(n) in (3.29), (3.30), (3.31), (3.32), and (3.33), respectively). To prove that the whole sequence {x̄^(n)} converges, as done in the convergence proof of the PML algorithm, we first present the following proposition:

Proposition 5: Let x* be a limit point of the sequence {x̄^(n)}. Then, for all j such that x_j* ≠ 0,

∂Φ(x)/∂x_j |_{x=x*} = 0 .  (4.5)

Proof: By (P8), the sequence {x̄^(n)} is bounded and there is a subsequence {x̄^(n_l)} that converges to x*. By (P10), the subsequence {x^(n_l+1)} also converges to x*. Recall that the first derivative of φ̄_j^(n) is φ̄̇_j^(n)(t) = −Ē_j^(n)/t + 2F̄_j^(n) t + Ḡ_j^(n). If x_j* ≠ 0, it follows that

lim_{l→∞} φ̄̇_j^(n_l)(x̄_j^(n_l)) = lim_{l→∞} φ̄̇_j^(n_l)(x_j^(n_l+1)) .  (4.6)

Since φ̄̇_j^(n_l)(x_j^(n_l+1)) = 0 by (3.36) and ∇Φ̄^(n_l)(x̄^(n_l)) = ∇Φ(x̄^(n_l)), it can be said that

∂Φ(x)/∂x_j |_{x=x*} = 0 for all j such that x_j* ≠ 0 .  (4.7)
Theorem 2: The sequence {x̄^(n)} converges to the unique minimizer of Φ.

Proof: Let x* be a limit point of the sequence {x̄^(n)}. Since the set of limit points of {x̄^(n)} is connected by (P8) and (P11) (see [60, p. 173] and Theorem 1 in Chapter 3) and there are a finite number of limit points of {x̄^(n)} by Proposition 4 and Proposition 5, it follows that there is only one limit point in the set of limit points. Thus, {x̄^(n)} → x*. Since lim ‖x̄^(n) − x*‖_2 = 0, by (P10) we have lim ‖x^(n+1) − x*‖_2 = 0 (note: ‖x^(n+1) − x*‖_2 ≤ ‖x^(n+1) − x̄^(n)‖_2 + ‖x̄^(n) − x*‖_2). Hence, {x^(n+1)} → x* (i.e., {x^(n)} → x*). Therefore, we can deduce that the whole sequence {x̄^(0), x^(1), x̄^(1), x^(2), x̄^(2), ...} → x*.

To prove the sequence {x̄^(n)} converges to the unique minimizer of the PML objective function, we must show that x* satisfies the Kuhn-Tucker conditions (i.e., (3.48), (3.49), and (3.50)). Since all the points in the sequence are positive, it must be true that the limit point x* is nonnegative (i.e., (3.48) is satisfied). By Proposition 5,

∂Φ(x)/∂x_j |_{x=x*} = 0 for all j such that x_j* ≠ 0 .  (4.8)

Thus, (3.49) is satisfied. For j such that x_j* = 0, suppose

∂Φ(x)/∂x_j |_{x=x*} < 0 .  (4.9)

Then, it follows that lim_n φ̄̇_j^(n)(x̄_j^(n)) < 0 by the property ∇Φ̄^(n)(x̄^(n)) = ∇Φ(x̄^(n)), and φ̄̇_j^(n)(x̄_j^(n)) < 0 for sufficiently large n. Consider φ̄̇_j^(n)(x_j^(n+1)) = −Ē_j^(n)/x_j^(n+1) + 2F̄_j^(n) x_j^(n+1) + Ḡ_j^(n). If x_j^(n+1) ≤ x̄_j^(n), then

−Ē_j^(n)/x̄_j^(n) + 2F̄_j^(n) x_j^(n+1) + Ḡ_j^(n) ≥ 0  (4.10)
because φ̄̇_j^(n)(x_j^(n+1)) = 0 by (3.36) and −Ē_j^(n)/x_j^(n+1) ≤ 0. Moreover, the fact that F̄_j^(n) is positive implies that

φ̄̇_j^(n)(x̄_j^(n)) = −Ē_j^(n)/x̄_j^(n) + 2F̄_j^(n) x̄_j^(n) + Ḡ_j^(n) ≥ 0 ,  (4.11)

which is a contradiction. Thus, x_j^(n+1) > x̄_j^(n) for all sufficiently large n. However, this contradicts the fact x̄_j^(n) → 0. Therefore,

∂Φ(x)/∂x_j |_{x=x*} ≥ 0 for all j such that x_j* = 0 .  (4.12)

This satisfies (3.50).

4.2 Accelerated Penalized Maximum Likelihood (APML) Algorithm

In Section 4.1, we showed that the sequence {x̄^(0), x^(1), x̄^(1), x^(2), x̄^(2), ...} converges to the nonnegative minimizer of the PML objective function Φ if: (1) x̄^(0) > 0, (2) x^(n+1) is the nonnegative minimizer of the surrogate function for the PML objective function at the iterate x̄^(n), Φ̄^(n), and (3) x̄^(n) > 0 satisfies (C4) and (C5). In this section, we present an algorithm that produces such a sequence. First, consider the following steps: given an initial guess x̄^(0) > 0, for n = 0, 1, 2, ...

• Step 1 Get the standard PML iterate x^(n+1) = argmin Φ̄^(n)(x) subject to x ≥ 0.

• Step 2 Get the accelerated PML iterate x̄^(n+1) = x^(n+1) + α^(n+1) v^(n+1), where v^(n+1) ≠ 0 is the chosen search direction (i.e., direction vector) and

α^(n+1) = argmin_α Φ(x^(n+1) + α v^(n+1)) .  (4.13)

• Step 3 Repeat the steps above until some chosen stopping criterion is met.

With v^(n+1) = x^(n+1) − x̄^(n), Step 2 is the pattern search step put forth by Hooke and Jeeves [41, pp. 287-291].
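The three steps above can be sketched as a short loop. The helpers `pml_update`, `step_size`, and `direction` below are hypothetical stand-ins for Step 1's surrogate minimization, the line search of (4.13), and the chosen direction vector, since their exact forms are developed later in the chapter; this is a structural sketch, not the author's implementation.

```python
import numpy as np

def apml(x_bar0, pml_update, step_size, direction, n_iters=50):
    """Structural sketch of the APML loop (Steps 1-3). The three callables
    are placeholders: pml_update performs the Step 1 surrogate
    minimization, direction builds the search vector (Hooke-and-Jeeves
    uses x_next - x_bar), and step_size solves the 1-D problem (4.13)."""
    x_bar = np.asarray(x_bar0, dtype=float)   # initial guess, strictly positive
    for n in range(n_iters):
        x_next = pml_update(x_bar)            # Step 1: standard PML iterate
        v = direction(x_next, x_bar)          # search direction v^(n+1)
        alpha = step_size(x_next, v)          # Step 2: pattern-search step size
        # accelerated iterate; in the full algorithm, positivity is
        # preserved by the constrained step size introduced below
        x_bar = x_next + alpha * v
    return x_bar
```

With an exact 1-D minimizer for a quadratic objective, the pattern search can land on the minimizer in a single outer iteration, which is the intuition behind the acceleration.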
45 We now modify Step 2 so that the sequence produced by Steps 1. 2. and 3 converges to the nonnegative minimizer of the PML objective function. Step 2 does not guarantee that the accelerated iterate is positive. Consequently, we modify the optimal step size as follows a ! n+1) = argmin (cc (n+1) + ar (n+1) ) subject to a G A (ra+1) , (4.14) a where A (n+1) = {a : zj n+1) + av\ n+1) > C ( Â” +1) for all j} (4.15) a r (" +1 ) C (n+1) = min Â— Â— Â— (4.16) j n + 2 v ' Observe, lor simplicity, we use the same notation for the optimal step sizes defined by (4.13) and (4.14). For a G A (,l+1 ^. it is straightforward to see that a ;( n + 1 )+at;( n + 1 ) > o because > 0 for all j and n. Thus, by adding the constraint a Â€ A^" +1 ^ to the problem in (4.13), it follows that x (ri) > 0 for all n. Because xj n+1) > Â£ (n +i) for all j and n, it is evident that (4 n+1 ) has been chosen in such a way that 0 G A (n+1) for all n. The fact that 0 G A^" +1 ^ will be used later to prove that the proposed algorithm monotonically decreases the PML objective function 4> with increasing iterations. Remark : If we allow some elements of x ^ to be zero, then the feasible region of the function $(x (n+1) + cru (n+1) ) is {a : a^" +1) + > 0 for all j}. However, for the sequence {x^} to converge, we must constrain all elements of x ^ to be positive because the surrogate function for the PML objective function 4>, $ , is not defined at x = x ^ where x ^ = 0 for some j. A feasible region of the function $(a;( n+ b + a"ih n+1 )) that appears natural is A^n+p a . x ( rt+1 ) _p cnv^ +V> > 0 for all j} . (4.17)
Figure 4-2: Illustration of the pattern search step: (a) two-dimensional illustration of Φ, (b) one-dimensional slice of the function Φ along the chosen direction vector. The single circle and double circle denote an accelerated iterate x̄^(n+1) and a PML iterate x^(n+1), respectively. The mark × denotes the minimizer of the function Φ subject to the constraints x_1 ≥ 0 and x_2 ≥ 0.

However, the set Ã^(n+1) is an open set. Consequently, the optimization problem argmin_{α∈Ã^(n+1)} Φ(x^(n+1) + α v^(n+1)) may not have a solution. We therefore retain the closed feasible set A^(n+1) and, at each iteration, construct a surrogate function
for Φ_v^(n+1)(α) = Φ(x^(n+1) + α v^(n+1)), which we denote by Γ^(n+1)(α), that satisfies the following conditions:

• (C6) Γ^(n+1)(α) ≥ Φ_v^(n+1)(α) for α ∈ A^(n+1)
• (C7) Γ^(n+1)(α) = Φ_v^(n+1)(α) for α = 0.

By incorporating the surrogate function Γ^(n+1) with the constraint α ∈ A^(n+1), an alternative to Step 2 is:

• Step 2a Get the accelerated PML iterate x̄^(n+1) = x^(n+1) + α^(n+1) v^(n+1), where

α^(n+1) = argmin_α Γ^(n+1)(α) subject to α ∈ A^(n+1) .  (4.19)

It is important to point out that, for convenience, x̄^(n+1) has been redefined in Step 2a. This new definition will be used throughout the rest of the dissertation. In Figure 4-2, the alternative pattern search step with the surrogate function Γ^(n+1) is illustrated. In Figure 4-2 (a), a two-dimensional example of Φ is shown with a direction vector v^(n+1). In addition, the one-dimensional slice of Φ along the direction vector, which we denote Φ_v^(n+1), and a surrogate function Γ^(n+1) that satisfies (C6) and (C7) are shown in Figure 4-2 (b).

By design, the sequence {x̄^(n)} produced by Steps 1, 2a, and 3 satisfies the monotonicity condition (C4). To see this fact, note that Φ(x̄^(n+1)) = Φ_v^(n+1)(α^(n+1)) by the definition of x̄^(n+1). By (C6), it follows that Φ_v^(n+1)(α^(n+1)) ≤ Γ^(n+1)(α^(n+1)). Also, from the definition of α^(n+1) in (4.19) and the fact that 0 ∈ A^(n+1), we obtain the result Γ^(n+1)(α^(n+1)) ≤ Γ^(n+1)(0). Finally, by (C7), it can be concluded that Φ(x̄^(n+1)) ≤ Γ^(n+1)(0) = Φ_v^(n+1)(0) = Φ(x^(n+1)) for all n.

We now present our choice for the surrogate function Γ^(n+1) that satisfies (C6) and (C7). First, note that the negative log-likelihood function −Ψ can be expressed
as

−Ψ(x) = Σ_{i=1}^{I} {[P x]_i − d_i log([P x]_i + ρ_i)} + Σ_{i=1}^{I} {ρ_i + log(d_i!)}  (4.20)
 = Σ_{i=1}^{I} ψ_i([P x]_i) + C_5 ,  (4.21)

where ψ_i(t) = t − d_i log(t + ρ_i) and C_5 = Σ_{i=1}^{I} {ρ_i + log(d_i!)}. Suppose a function θ_i^(n+1) can be found such that

• (C8) θ_i^(n+1)(α) ≥ ψ_i^(n+1)(α) for α ∈ A^(n+1)
• (C9) θ_i^(n+1)(α) = ψ_i^(n+1)(α) for α = 0,

where ψ_i^(n+1)(α) = ψ_i([P x^(n+1)]_i + α [P v^(n+1)]_i). Then, the function

Θ^(n+1)(α) = Σ_{i=1}^{I} θ_i^(n+1)(α) + C_5  (4.22)

will satisfy the conditions

• (C10) Θ^(n+1)(α) ≥ −Ψ(x^(n+1) + α v^(n+1)) for α ∈ A^(n+1)
• (C11) Θ^(n+1)(α) = −Ψ(x^(n+1) + α v^(n+1)) for α = 0.

A function that satisfies (C8) and (C9) is

θ_i^(n+1)(α) = (1/2) μ_i^(n+1) α^2 + ψ̇_i^(n+1)(0) α + ψ_i^(n+1)(0) ,  (4.23)

where μ_i^(n+1) = max{ψ̈_i^(n+1)(α) subject to α ∈ A^(n+1)}. From the definition of θ_i^(n+1), it is obvious that θ_i^(n+1)(0) = ψ_i^(n+1)(0). Thus, (C9) is satisfied. To see that θ_i^(n+1) satisfies (C8), consider the function z_i^(n+1)(α) = θ_i^(n+1)(α) − ψ_i^(n+1)(α). From the definition of μ_i^(n+1), it is clear that z̈_i^(n+1)(α) ≥ 0 for α ∈ A^(n+1). Thus, it follows that z_i^(n+1) is a convex function. Moreover, α = 0 is a minimizer of z_i^(n+1) because ż_i^(n+1)(0) = 0 by the definition of θ_i^(n+1). Since z_i^(n+1)(0) = 0 by (C9), it is straightforward to see that z_i^(n+1)(α) ≥ 0 for α ∈ A^(n+1). This result implies that θ_i^(n+1)(α) ≥ ψ_i^(n+1)(α) for all α ∈ A^(n+1). So, (C8) is satisfied. To calculate μ_i^(n+1), it is worthwhile to note that the set A^(n+1) can
be written as A^(n+1) = {α : L^(n+1) ≤ α ≤ U^(n+1)}, where

L^(n+1) = max_{j : v_j^(n+1) > 0} (ζ^(n+1) − x_j^(n+1)) / v_j^(n+1)  (4.24)
U^(n+1) = min_{j : v_j^(n+1) < 0} (ζ^(n+1) − x_j^(n+1)) / v_j^(n+1) .  (4.25)

Observe that L^(n+1) < 0 and U^(n+1) > 0. Since the second derivative of ψ_i^(n+1) is

ψ̈_i^(n+1)(α) = d_i ([P v^(n+1)]_i)^2 / ([P v^(n+1)]_i α + [P x^(n+1)]_i + ρ_i)^2 ,  (4.26)

the maximum second derivative of ψ_i^(n+1) for α ∈ A^(n+1) is

μ_i^(n+1) = d_i ([P v^(n+1)]_i)^2 / ([P v^(n+1)]_i L^(n+1) + [P x^(n+1)]_i + ρ_i)^2 , if [P v^(n+1)]_i > 0
 d_i ([P v^(n+1)]_i)^2 / ([P v^(n+1)]_i U^(n+1) + [P x^(n+1)]_i + ρ_i)^2 , if [P v^(n+1)]_i < 0
 0 , if [P v^(n+1)]_i = 0 .  (4.27)

It should be noted that [P v^(n+1)]_i α + [P x^(n+1)]_i + ρ_i > 0 for α ∈ A^(n+1) (see (AS1)). At this point, we need a surrogate function for the penalty function Λ once again. As mentioned in Chapter 3, under assumptions (AS3)-(AS8), Huber developed a surrogate function for λ, denoted by λ^(n) (see (3.14)). By the properties of the surrogate function λ^(n), it is clear that the surrogate function Λ^(n), which is defined by

Λ^(n)(x) = Σ_{j=1}^{J} Σ_{k∈N_j} ω_jk λ^(n)(x_j − x_k) ,  (4.28)

satisfies Λ^(n)(x) ≥ Λ(x) for all x and Λ^(n)(x^(n)) = Λ(x^(n)). Thus, Λ^(n+1)(x^(n+1) + α v^(n+1)) will satisfy

• (C12) Λ^(n+1)(x^(n+1) + α v^(n+1)) ≥ Λ(x^(n+1) + α v^(n+1)) for α ∈ A^(n+1)
• (C13) Λ^(n+1)(x^(n+1) + α v^(n+1)) = Λ(x^(n+1) + α v^(n+1)) for α = 0.

Finally, by (C10)-(C13), it is clear that the function

Γ^(n+1)(α) = Θ^(n+1)(α) + β Λ^(n+1)(x^(n+1) + α v^(n+1))  (4.29)
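As a concrete check of (4.23)-(4.27) for a single term ψ_i, the sketch below uses made-up scalar values for d_i, ρ_i, [P x^(n+1)]_i, [P v^(n+1)]_i, and the interval [L, U]; it evaluates the curvature bound μ_i from (4.27) and verifies (C8) and (C9) numerically on a grid.

```python
import numpy as np

# Hypothetical values for one sinogram bin i: count d, randoms/scatter
# mean rho, b = [P x]_i, a = [P v]_i, and a feasible interval [L, U]
# with L < 0 < U. These numbers are for illustration only.
d, rho = 7.0, 0.3
b, a = 4.0, 1.5
L, U = -1.0, 2.0

psi  = lambda al: (b + a * al) - d * np.log(b + a * al + rho)
dpsi = lambda al: a * (1.0 - d / (b + a * al + rho))
# (4.26)-(4.27): curvature d*a^2/(b + a*al + rho)^2 peaks where the
# denominator is smallest, i.e., at al = L when a > 0 and al = U when a < 0
mu = d * a**2 / (b + a * (L if a > 0 else U) + rho)**2 if a != 0 else 0.0
# (4.23): parabola surrogate matching psi's value and slope at al = 0
theta = lambda al: 0.5 * mu * al**2 + dpsi(0.0) * al + psi(0.0)

grid = np.linspace(L, U, 201)
assert np.all(theta(grid) >= psi(grid) - 1e-12)   # (C8): theta majorizes psi
assert abs(theta(0.0) - psi(0.0)) < 1e-12         # (C9): they touch at 0
```

The assertions mirror the argument in the text: z = θ − ψ has nonnegative curvature on [L, U], zero slope and zero value at α = 0, hence z ≥ 0 on the interval.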
satisfies (C6) and (C7). To solve (4.19), we first determine the unconstrained minimizer of Γ^(n+1):

α̂^(n+1) = argmin_α Γ^(n+1)(α) .  (4.30)

Since Γ^(n+1) is strictly convex by γ(t) > 0 for −∞ < t < ∞, v^(n+1) ≠ 0, (AS1), and (AS2) (see Appendix D), the expression for α̂^(n+1) can be found by simply computing the first derivative of Γ^(n+1) and setting it to zero (see Appendix D):

α̂^(n+1) = − [ Σ_{i=1}^{I} ψ̇_i^(n+1)(0) + β Σ_{j=1}^{J} Σ_{k∈N_j} ω_jk γ(x_j^(n+1) − x_k^(n+1)) (x_j^(n+1) − x_k^(n+1)) (v_j^(n+1) − v_k^(n+1)) ] / [ Σ_{i=1}^{I} μ_i^(n+1) + β Σ_{j=1}^{J} Σ_{k∈N_j} ω_jk γ(x_j^(n+1) − x_k^(n+1)) (v_j^(n+1) − v_k^(n+1))^2 ] .  (4.31)

Given (4.24), (4.25), and (4.31), the solution to the constrained optimization problem in (4.19) is

α^(n+1) = U^(n+1), if α̂^(n+1) > U^(n+1); L^(n+1), if α̂^(n+1) < L^(n+1); α̂^(n+1), otherwise.  (4.32)

All that remains now is to show that the sequence {x̄^(n)} produced by Step 1, Step 2a with Γ^(n+1) in (4.29), and Step 3 satisfies (C5). To see this, note that by (C6), (C7), and the definition of α^(n+1), we have the following inequality

Φ(x^(n+1)) − Φ(x̄^(n+1)) ≥ Γ^(n+1)(0) − Γ^(n+1)(α^(n+1)) .  (4.33)

Suppose, for each n, the function Γ^(n+1) is expanded into a second-order Taylor series about the point α^(n+1) and evaluated at α = 0. Then, the right-hand side of (4.33) can be written as

Γ^(n+1)(0) − Γ^(n+1)(α^(n+1)) = Γ̇^(n+1)(α^(n+1))(−α^(n+1)) + (1/2) Γ̈^(n+1)(ᾱ^(n+1))(α^(n+1))^2 ,  (4.34)
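The clamping rule (4.32), together with the interval endpoints (4.24)-(4.25) and the shrinking floor (4.16), can be sketched as follows; the computation of the unconstrained minimizer α̂ itself (4.31) is omitted and passed in as an argument, and `x` is assumed to hold the strictly positive PML iterate x^(n+1).

```python
import numpy as np

def clamped_step(x, v, alpha_hat, n):
    """Sketch of the constrained step size (4.32): clamp the unconstrained
    minimizer alpha_hat of Gamma to the interval [L, U] implied by the
    feasible set (4.15) with floor zeta from (4.16)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    zeta = x.min() / (n + 2)                                     # (4.16)
    pos, neg = v > 0, v < 0
    L = ((zeta - x[pos]) / v[pos]).max() if pos.any() else -np.inf  # (4.24)
    U = ((zeta - x[neg]) / v[neg]).min() if neg.any() else  np.inf  # (4.25)
    return min(max(alpha_hat, L), U)                             # (4.32)
```

Because ζ < min_j x_j, the interval always satisfies L < 0 < U, so the zero step is feasible and the accelerated iterate stays componentwise at or above ζ, hence strictly positive.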
where ᾱ^(n+1) is a point between 0 and α^(n+1). By the strict convexity of Γ^(n+1), L^(n+1) < 0, and U^(n+1) > 0, it follows that Γ̇^(n+1)(α^(n+1))(−α^(n+1)) ≥ 0 for all n. Thus,

Γ^(n+1)(0) − Γ^(n+1)(α^(n+1)) ≥ (1/2) Γ̈^(n+1)(ᾱ^(n+1))(α^(n+1))^2 .  (4.35)

Since there exists a symmetric positive definite matrix M, which is independent of n, such that Γ̈^(n+1)(ᾱ^(n+1)) ≥ 2 (v^(n+1))′ M (v^(n+1)) (see Appendix E), it follows that

Γ^(n+1)(0) − Γ^(n+1)(α^(n+1)) ≥ (v^(n+1))′ M (v^(n+1)) (α^(n+1))^2 .  (4.36)

Hence, to prove (C5), we show that there exists some constant c_2 > 0 such that

(v^(n+1))′ M (v^(n+1)) ≥ c_2 ‖v^(n+1)‖^2 ,  (4.37)

where we used the fact that ‖x^(n+1) − x̄^(n+1)‖^2 = (α^(n+1))^2 ‖v^(n+1)‖^2 by the definition of x̄^(n+1). Since M is a symmetric matrix, it can be factored as M = T A T′ by the spectral theorem [63, p. 309], where the columns of the matrix T contain orthonormal eigenvectors of M and the diagonal matrix A contains the corresponding eigenvalues along its diagonal. Using the fact that T′T produces the J × J identity matrix and Rayleigh's quotient with v^(n+1) = T z^(n+1) (i.e., z^(n+1) = T′ v^(n+1)) [63, p. 348], it follows that

(v^(n+1))′ M (v^(n+1)) / ‖v^(n+1)‖^2 = (T z^(n+1))′ M (T z^(n+1)) / ((T z^(n+1))′ (T z^(n+1)))  (4.38)
 = (z^(n+1))′ A (z^(n+1)) / ((z^(n+1))′ (z^(n+1)))  (4.39)
 = Σ_j e_j (z_j^(n+1))^2 / Σ_j (z_j^(n+1))^2  (4.40)
 ≥ e_m ,  (4.41)

where {e_j}_{j=1}^{J} are the eigenvalues of M and e_m is the smallest eigenvalue. Since M is positive definite, e_m is positive. Therefore, with c_2 = e_m > 0, we obtain
(v^(n+1))′ M (v^(n+1)) ≥ c_2 ‖v^(n+1)‖^2 and

Φ(x^(n+1)) − Φ(x̄^(n+1)) ≥ c_2 ‖x^(n+1) − x̄^(n+1)‖^2 .  (4.42)

4.3 Direction Vectors

In some algorithms, the gradient of a function is used as a direction vector, as in the method of steepest descent [55, ch. 14]. Suppose the direction vector v^(n+1) were chosen to be the gradient of the PML objective function evaluated at the PML iterate (i.e., v^(n+1) = ∇Φ(x^(n+1))). Then, the gradient would have to be calculated at each step. However, the computational cost of the gradient of Φ is on the same order as the computational cost of a single PML iteration. Moreover, in experiments, the APML algorithm with v^(n+1) = ∇Φ(x^(n+1)) decreased the PML objective function more slowly than the PML algorithm.

The direction vector in the pattern search step by Hooke and Jeeves is the difference of the two most recent iterates. Specifically, the direction vector is defined by ṽ^(n+1) = x^(n+1) − x̄^(n). This choice can be justified in a reasonable manner. To see why, assume that a closed-form expression is available for the minimizer of Φ(x^(n+1) + α v^(n+1)). Then, the "best" direction vector is (x^(n+1) − x*) because {x^(n+1) + α v^(n+1)} "contains" the point x* (note: x* would result with α = −1), where x* is the minimizer of the PML objective function. However, x* is not known at the nth iteration. The "best" estimate of x* available at the nth iteration is x̄^(n), namely the accelerated iterate from the previous iteration.

In this section, we introduce a direction vector that works better than Hooke and Jeeves's direction vector in terms of convergence rate. The direction vector we choose is a simple variation of ṽ^(n+1) that is not computationally expensive (note: J subtractions are required for ṽ^(n+1)). Since there are no positrons emitted outside the subject being scanned, the PML estimate will contain many values near zero. This claim is supported by Figure 4-3. In the figure,
Figure 4-3: PML iterates are shown: (a) the 13th PML iterate, (b) the 14th PML iterate, (c) the 15th PML iterate, and (d) the 1000th PML iterate. The images were generated using real thorax phantom data. The plane considered contains activity due to the heart, lungs, spine, and background. For display purposes, all the images were adjusted so that they have the same dynamic range.
PML images corresponding to different iteration numbers are shown. The images were generated by applying the PML algorithm to real thorax phantom data (scan duration was 14 minutes) with a uniform initial estimate. The penalty parameter was β = 0.02, and λ(t) = log(cosh(t/δ)) with δ = 50 was the penalty function. As can be seen in Figure 4-3 (a), (b), and (c), the early iterates contain values near zero outside of the body. Consider the image in Figure 4-3 (d), which is the 1000th PML iterate (practically speaking, this iterate is the minimizer of Φ because the objective did not decrease from the 791st iteration up to the 5000th iteration). The 1000th iterate also contains many values near zero outside the body. Inside the body, on the other hand, the 1000th iterate is very different from the early iterates. From these observations, it can be said that the convergence rate varies significantly between voxels inside the body and voxels outside the body.

Because the voxels outside the body converge faster than the voxels inside the body, and because they converge to values near zero, which lie on the boundary of the set {x : x ≥ 0}, we claim that it is better to search for an accelerated iterate along the boundary whenever the current iterate is "near" the boundary. By "boundary", we mean the set {x ≥ 0 : x_j = 0 for some j}. This can be explained by the example shown in Figure 4-4. In Figure 4-4 (a), the second PML iterate is heading toward the x_2-axis. If we perform the acceleration step with the direction vector ṽ^(n+1), then the accelerated iterate will lie on the x_2-axis as shown in Figure 4-4 (b). A better direction to search is the one that is parallel to the x_2-axis, as shown in Figure 4-4 (c). The principle of the proposed direction vector is to exclude the coordinates corresponding to the voxel values that are "near" the boundary. An easy way to incorporate this idea is to determine the voxels whose values are less than a small positive value, ε. Specifically, the proposed direction,
Figure 4-4: Direction vectors: (a) standard PML iterates are shown, (b) an accelerated iterate (the single solid circle) obtained using the direction vector ṽ^(n+1), and (c) an accelerated iterate (the single dotted circle) obtained using the proposed direction vector v^(n+1). The double circles denote PML iterates. The mark × denotes the minimizer of the function Φ subject to the constraints x_1 ≥ 0 and x_2 ≥ 0.

v^(n+1), is defined by

v_j^(n+1) = 0, if x_j^(n+1) < ε; ṽ_j^(n+1), otherwise; j = 1, 2, ..., J ,  (4.43)

where ε ≥ 0 is a user-defined parameter (note: v^(n+1) = ṽ^(n+1) if ε = 0).

4.4 Properties of the APML Algorithm

We now provide a summary of the desirable properties of the APML algorithm:

• The APML algorithm needs only one additional parameter (i.e., ε), whereas ordered-subsets based algorithms [33, 35] need at least three extra parameters (i.e., a relaxation parameter, the number of subsets, and a small positive number that is assigned to any nonpositive elements of an iterate).

• In experiments, the proposed direction vector performed better than the direction vector put forth by Hooke and Jeeves [41, pp. 287-291].

• The APML algorithm monotonically decreases the PML objective function, unlike the algorithms in [23, 35].
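The rule (4.43) amounts to a masked copy of the Hooke-and-Jeeves difference, which can be sketched in a few lines; the function name and argument names are illustrative, not from the text.

```python
import numpy as np

def proposed_direction(x_next, x_bar_prev, eps):
    """Sketch of the proposed direction vector (4.43): start from the
    Hooke-and-Jeeves difference v~ = x^(n+1) - x_bar^(n), then zero the
    coordinates whose PML iterate values are within eps of the
    nonnegativity boundary, so the search runs parallel to it."""
    v = np.asarray(x_next, dtype=float) - np.asarray(x_bar_prev, dtype=float)
    v[np.asarray(x_next) < eps] = 0.0   # freeze near-boundary voxels
    return v
```

Setting eps to 0 recovers the plain Hooke-and-Jeeves direction, matching the note after (4.43).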
• The APML algorithm theoretically guarantees nonnegative iterates, whereas some algorithms [33, 35] set any negative element of the iterates to a small positive number.

• The APML algorithm can incorporate a large class of edge-preserving penalty functions, unlike the algorithm by De Pierro [30].

• The APML algorithm converges to the minimizer of the PML objective function. Convergence proofs for the algorithms in [23, 33] are not available.
CHAPTER 5
QUADRATIC EDGE PRESERVING ALGORITHM

In this chapter, we present a regularized image reconstruction algorithm that aims to preserve edges in the reconstructed images so that fine details are more resolvable. We refer to the proposed algorithm as the quadratic edge preserving (QEP) algorithm. The QEP algorithm results via a certain modification of the surrogate function Φ^(n) for the PML objective function. It should be mentioned that the QEP algorithm was first introduced in [18].

Recall that the nth surrogate function for the PML objective function Φ can be expressed as Φ^(n)(x) = Σ_{j=1}^{J} φ_j^(n)(x_j) + C_4^(n) (see (3.29) and (3.35)). For the discussion to follow, it will be convenient to express φ_j^(n) in (3.35) in a different manner:

φ_j^(n)(t) = −E_j^(n) log(t) + F_j^(n) t^2 + G_j^(n) t  (5.1)
 = −E_j^(n) log(t) + t Σ_{i=1}^{I} P_ij + 2β Σ_{k∈N_j} ω_jk γ(x_j^(n) − x_k^(n)) (t − m_jk^(n))^2 + c_j^(n) ,  (5.2)

where

l_j^(n)(t) = −E_j^(n) log(t) + t Σ_{i=1}^{I} P_ij  (5.3)
h_jk^(n)(t) = ω_jk γ(x_j^(n) − x_k^(n)) (t − m_jk^(n))^2  (5.4)

and c_j^(n) is a constant that does not depend on t, so that

φ_j^(n)(t) = l_j^(n)(t) + 2β Σ_{k∈N_j} h_jk^(n)(t) + c_j^(n) ,  (5.5)
and m_jk^(n), E_j^(n), F_j^(n), and G_j^(n) are defined in (3.24), (3.30), (3.31), and (3.32), respectively. Note that the jth element of the next PML iterate x^(n+1), x_j^(n+1), is defined to be the nonnegative minimizer of the function φ_j^(n). The function h_jk^(n) is quadratic, so its aperture, ω_jk γ(x_j^(n) − x_k^(n)), and its minimizer, m_jk^(n), are expected to play key roles in the regularization process (for a quadratic function f(t) = a t^2 + b t + c, the constant a is called the aperture of the function). To see the role of the function h_jk^(n) in determining x_j^(n+1), suppose β = 0 in (5.5). Then, φ_j^(n)(t) = l_j^(n)(t) and the minimizer of the function φ_j^(n)(t) is the "pure" log-likelihood iterate (i.e., the minimizer of l_j^(n)). For β ≠ 0, intuitively speaking, it is evident that the functions {h_jk^(n)}_{k∈N_j} act to bias x_j^(n+1) away from the pure log-likelihood iterate towards the minimizers of {h_jk^(n)}_{k∈N_j} (observe: the last term in (5.5) is independent of x). The degree of influence that the functions {h_jk^(n)}_{k∈N_j} possess is controlled by their apertures and the penalty parameter β.

To highlight the role of the function γ, we let β = 1/2 and ω_jk = 1 for all j and k. In this case, the aperture of h_jk^(n) equals γ(x_j^(n) − x_k^(n)). For the quadratic penalty function (i.e., λ(t) = t^2), the aperture of h_jk^(n) is a constant for all j, k, and n. Consequently, for all k ∈ N_j, the function h_jk^(n) has the same degree of influence regardless of the absolute difference between x_j^(n) and x_k^(n). Practically speaking, this means that the quadratic penalty will overly smooth edges. To preserve edges, it would be helpful to lessen the degree of influence of the functions {h_jk^(n)}_{k∈N_j} whenever the absolute difference between x_j^(n) and x_k^(n) is sufficiently large. Figure 5-1 is a plot of γ when λ(t) = log(cosh(t/δ)) is used as the penalty function (recall: γ(t) = λ̇(t)/t in (AS6)) with δ = 10, 50, 100, 500. From the figure, it can be seen that γ(t) becomes very small compared with γ(0) for |t| >> δ.
This means that, when |x_j^(n) − x_k^(n)| >> δ, h_jk^(n) will have a "very small" aperture, and consequently the function h_jk^(n) will not have much influence on x_j^(n+1). Said another way, the log-cosh penalty function helps preserve edges whose "heights" are on the order of δ.
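The shrinking-aperture behavior can be checked numerically. The sketch below assumes the log-cosh form λ(t) = log(cosh(t/δ)) discussed above, for which λ̇(t) = tanh(t/δ)/δ and hence γ(t) = tanh(t/δ)/(δt); the particular evaluation points are arbitrary.

```python
import numpy as np

# gamma(t) = lambda_dot(t)/t for lambda(t) = log(cosh(t/delta)):
# gamma(t) = tanh(t/delta) / (delta * t). Near t = 0 this approaches
# 1/delta^2; once |t| >> delta it decays roughly like 1/(delta*|t|).
delta = 50.0
gamma = lambda t: np.tanh(t / delta) / (delta * t)

near_zero = gamma(1e-9)        # ~ 1/delta^2: full aperture inside smooth regions
far = gamma(10.0 * delta)      # aperture across an edge of height 10*delta
```

So a neighbor pair separated by an edge much taller than δ contributes a quadratic term with a much smaller aperture, which is exactly the edge-preserving mechanism described in the text.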
Figure 5-1: The function γ is shown when λ(t) = log(cosh(t/δ)) is used as the penalty function with: (a) δ = 10, (b) δ = 50, (c) δ = 100, and (d) δ = 500.

We will now move the discussion from the aperture of the function h_jk^(n) to its minimizer. As stated previously, the minimizers of the functions {h_jk^(n)}_{k∈N_j} play an important role in the regularization procedure. More specifically, when the aperture of h_jk^(n) is sufficiently large, the (n+1)st iterate x_j^(n+1) is biased towards the minimizer of h_jk^(n), which is m_jk^(n) = (x_j^(n) + x_k^(n))/2. As a result, there is inherent averaging that takes place with the PML algorithm. Consider a penalty surrogate of the form

Λ_ep^(n)(x) = Σ_{j=1}^{J} Σ_{k∈N_j} σ_jk^(n)(x_j) ,  (5.8)

where the functions {σ_jk^(n)} are to be constructed. In order to better preserve edges, we believe an improvement would be to construct σ_jk^(n) so that it has the same aperture as h_jk^(n), but a different minimizer. The minimizer of σ_jk^(n), which would depend on whether an edge is present, is chosen to be

u_jk^(n) = x_j^(n) + η(x_k^(n) − x_j^(n)) ,  (5.9)
where η is a bounded function; in our experiments, η(t) = 250 tanh(t/250), where the value 250 is chosen arbitrarily. It can be observed that u_jk^(n) is approximately x_k^(n) + 250 when x_j^(n) is larger than x_k^(n) + 250, and is approximately x_k^(n) − 250 when x_j^(n) is less than x_k^(n) − 250. In other words, the function η prevents u_jk^(n) from being overly biased towards x_j^(n) when the absolute difference between x_j^(n) and x_k^(n) is greater than 250. Using the edge-preserving penalty function Λ^(n), an algorithm for obtaining regularized estimates of the emission means follows:

• Step 1 Let x^(0) > 0 be the initial estimate.
• Step 2 Construct Λ^(n)(x) from the current estimate x^(n) using (5.8), (5.9), (5.10), and (5.11).
• Step 3 Compute x^(n+1) = argmin E^(n)(x) subject to x ≥ 0, where E^(n)(x) = Ψ(x) + β Λ^(n)(x).
• Step 4 Iterate between Steps 2 and 3.
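The saturating behavior of η, and the resulting surrogate minimizer u_jk of (5.9), can be sketched as follows (η(t) = 250 tanh(t/250) is the choice quoted in the text; the sample values of x_j and x_k are made up):

```python
import math

def eta(t, zeta=250.0):
    # Bounded bias function eta(t) = zeta * tanh(t / zeta); |eta(t)| < zeta.
    return zeta * math.tanh(t / zeta)

def u_jk(x_j, x_k, zeta=250.0):
    # Minimizer of the edge-preserving surrogate, as in (5.9):
    # u_jk = x_k + eta(x_j - x_k).
    return x_k + eta(x_j - x_k, zeta)

# When x_j exceeds x_k by much more than zeta, u_jk saturates near x_k + zeta,
# so the penalty no longer drags x_j all the way toward the average
# (x_j + x_k)/2 that the PML surrogate h_jk would use.
print(u_jk(2000.0, 800.0))   # close to 800 + 250 = 1050
print(u_jk(-400.0, 800.0))   # close to 800 - 250 = 550
print(u_jk(810.0, 800.0))    # small difference: u_jk is close to x_j itself
```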
Figure 5-2: Plots of m_jk^(n) and u_jk^(n) versus x_j^(n) are shown, where x_k^(n) = 800 is fixed and η(t) = 250 tanh(t/250).

It turns out that a closed-form expression for the problem in Step 3 is not possible. An alternative to Step 3 is:

• Step 3a Find x^(n+1) such that E^(n)(x^(n+1)) ≤ E^(n)(x^(n)).

The problem in Step 3a can be solved by defining x^(n+1) as

x^(n+1) = argmin_{x≥0} Φ_ep^(n)(x) ,   (5.12)

where

Φ_ep^(n)(x) = Φ^(n)(x) + β Λ^(n)(x)   (5.13)

and Φ^(n) is defined in (3.12). Defining the iterates according to (5.12) and (5.13) insures that E^(n)(x^(n+1)) ≤ E^(n)(x^(n)). To see this fact, note that E^(n)(x^(n+1)) ≤ Φ_ep^(n)(x^(n+1)) because Ψ(x) ≤ Φ^(n)(x) for all x ≥ 0. Since Φ_ep^(n)(x^(n+1)) ≤ Φ_ep^(n)(x^(n)) by (5.12), and Φ_ep^(n)(x^(n)) = E^(n)(x^(n)), it follows that

E^(n)(x^(n+1)) ≤ Φ_ep^(n)(x^(n+1)) ≤ Φ_ep^(n)(x^(n)) = E^(n)(x^(n)) .   (5.14)
All that remains now is to solve the optimization problem in (5.12). Observe that Φ_ep^(n) can be written as

Φ_ep^(n)(x) = Φ^(n)(x) + β Σ_{j=1}^{J} Σ_{k∈N_j} σ_jk^(n)(x_j)   (5.15)
            = Σ_{j=1}^{J} { E_j^(n) log(x_j) + F_j^(n) x_j² + H_j^(n) x_j } + c^(n) ,   (5.16)

where c^(n) is a constant that does not depend on x,

σ_jk^(n)(x_j) = ω_jk γ(x_j^(n) − x_k^(n)) ( x_j² − 2 u_jk^(n) x_j + (u_jk^(n))² ) ,   (5.17)

F_j^(n) = Q_j^(n) + β Σ_{k∈N_j} ω_jk γ(x_j^(n) − x_k^(n)) ,   (5.18)

H_j^(n) = G_j^(n) − 2 β Σ_{k∈N_j} ω_jk γ(x_j^(n) − x_k^(n)) u_jk^(n) ,   (5.19)

and E_j^(n), Q_j^(n), and G_j^(n) are defined in (3.13), (3.30), and (3.31), respectively. Since Φ_ep^(n) is decoupled, it follows that the solution to (5.12) is given by

x_j^(n+1) = argmin_{x_j ≥ 0} { E_j^(n) log(x_j) + F_j^(n) x_j² + H_j^(n) x_j } , j = 1, 2, ..., J .   (5.21)

Repeating the steps used to derive (3.37), it is straightforward to see that the solution to the optimization problem in (5.21) is

x_j^(n+1) = ( −H_j^(n) + sqrt( (H_j^(n))² − 8 E_j^(n) F_j^(n) ) ) / ( 4 F_j^(n) ) , j = 1, 2, ..., J .   (5.22)
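The per-voxel subproblem in (5.21) can be checked numerically. A minimal sketch under the sign convention that the log coefficient is negative and the quadratic coefficient positive (the actual coefficients come from (3.13) and (3.30)-(3.31); the sample values below are made up):

```python
import math

def minimize_1d(E, F, H):
    """Nonnegative minimizer of phi(x) = E*log(x) + F*x**2 + H*x, with E < 0
    and F > 0. Setting phi'(x) = E/x + 2*F*x + H = 0 gives the quadratic
    2*F*x**2 + H*x + E = 0, whose unique positive root is returned."""
    return (-H + math.sqrt(H * H - 8.0 * F * E)) / (4.0 * F)

# Sanity check of the closed form against a brute-force grid search.
E, F, H = -3.0, 2.0, -1.0
phi = lambda x: E * math.log(x) + F * x * x + H * x
x_star = minimize_1d(E, F, H)
x_grid = min((i * 1e-4 for i in range(1, 100000)), key=phi)
print(x_star, x_grid)
```

For these coefficients the quadratic is 4x² − x − 3 = 0, so the closed form returns exactly 1.0, which the grid search confirms.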
CHAPTER 6
JOINT EMISSION MEAN AND PROBABILITY MATRIX ESTIMATION

In Chapters 3, 4, and 5, we assumed that the probability matrix P accounts for errors due to attenuation, detector inefficiency, detector penetration, noncollinearity, and scatter. However, we now assume that an I × J corrected matrix P^c is available that accounts for errors due to attenuation, detector inefficiency, and detector penetration, and we develop a method for correcting errors caused by scatter and noncollinearity. Before presenting the proposed method, we briefly review standard approaches for obtaining P^c in practice. The (i,j)-th element of P^c, P^c_ij, denotes the probability that an annihilation in the j-th voxel leads to a photon pair being recorded by the i-th detector pair when there are no errors due to scatter and noncollinearity. In practice, one can reliably estimate the probability matrix P^c by using the detector penetration correction method in [7] with an attenuation correction method [9] and the detector inefficiency correction method in [15]. However, other correction methods for detector penetration, attenuation, and detector inefficiency could be used in conjunction.

To address attenuation errors, two scans, known as a transmission scan and a blank scan, are taken. During a transmission scan, 1-3 rotating rods filled with positron-emitting isotopes rotate outside the subject, and the transmission data are the coincidence sums. A blank scan is taken in the same way as the transmission scan except that there is no subject inside the scanner. The coincidence sums during the blank scan form the blank scan data. Given the transmission and blank scan data, an estimate of the attenuation map can be generated. Using the estimated attenuation map, attenuation correction factors are computed and used for attenuation correction. Numerous attenuation estimation methods have been proposed [8-10].
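As an illustrative aside, the simplest use of the two scans is the classical ratio method: along each line of response, the blank-to-transmission count ratio estimates the attenuation correction factor. The dissertation employs the model-based method of [9] instead, so the sketch below (with made-up values of the attenuation coefficient mu and path length L) is only a baseline:

```python
import math

def attenuation_correction_factors(blank, transmission):
    """Classical LOR-by-LOR estimate: ACF_i = blank_i / transmission_i.
    Assumes noise-free counts; real data would require smoothing or a
    model-based method such as [9]."""
    return [b / t for b, t in zip(blank, transmission)]

# A water-like object with attenuation coefficient mu (1/cm) over a path of
# length L (cm) attenuates coincidences by exp(-mu*L), so the ratio recovers
# the correction factor exp(+mu*L).
mu, L = 0.096, 20.0
blank = [1.0e6]
transmission = [1.0e6 * math.exp(-mu * L)]
print(attenuation_correction_factors(blank, transmission)[0])
```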
Figure 6-1: A simple example to illustrate the geometry of PET image reconstruction with three voxels (v1, v2, and v3) and three detector pairs ((a1,b1), (a2,b2), and (a3,b3)). The dashed lines define the tubes of the detector pairs.

To address errors due to detector inefficiency, a blank scan of extremely long duration is performed. From the blank scan, the relative efficiency of the detector pairs can be estimated. Recall that the efficiency of a detector pair is defined to be the probability that a coincidence is recorded when a photon pair is incident on the detectors. Consider a set of detector pairs where each detector pair has the same spatial extent. One simple way to estimate the efficiency of a detector pair in the set is to define its efficiency to be the ratio of the number of photon pairs recorded by that detector pair to the mean number of photon pairs recorded by a detector pair in the set. Once the estimates of the efficiencies for all detector pairs are available, they can be incorporated into the probability matrix. To address detector inefficiency, more sophisticated correction methods such as [13-15] have been proposed.

As discussed in Section 1.3, two photons generated by an annihilation usually do not propagate in exactly opposite directions, which is called noncollinearity of the line-of-response [5]. As illustrated in Figure 1-5, it is possible that an annihilation
is recorded by detector pair i1, given that the annihilation would have been recorded by detector pair i2 if both photons had propagated in exactly opposite directions, where i1 ≠ i2.

Since scatter depends on the activity and attenuation within the subject and on the scanner design, it is not straightforward to correct for errors due to scatter. Photons lose energy when they undergo Compton scattering. Thus, the energy of detected photons may be used to discriminate between unscattered and scattered photons [11, pp. 65-69]. Scatter correction methods that are based on multiple (two or more) energy windows have a limitation because detectors have finite energy resolution [46,47]. As discussed in Chapter 2, state-of-the-art scatter correction methods, such as the method in [52] where the scatter distribution is calculated using an analytical equation, are not practical because of their extremely high computational cost. Consequently, a simple scatter correction method would be beneficial to the PET community.

In this chapter, we propose a method that estimates the probability matrix in such a way that errors due to scatter and noncollinearity are addressed. In Section 6.1, we first propose a model for emission data that accounts for errors due to scatter and noncollinearity. Then, in Section 6.2, we present a method where the unknown emission mean vector and an unknown matrix in the proposed model are jointly estimated. Finally, we propose an algorithm, referred to as the probability correction in projection space (PCiPS) algorithm, that estimates the unknown emission mean vector and the unknown matrix in the proposed model.

6.1 Scatter Matrix Model

Consider Figure 6-1, in which a simplified PET scanner consisting of three detector pairs is depicted. For this discussion, we will focus on the voxels v1, v2, and v3 shown in Figure 6-1. Let the detector pairs (a1,b1), (a2,b2), and (a3,b3) be the first, second, and third detector pairs, respectively.
For the scanner in Figure 6-1, we
assume that

P^c = [ 0.04  0     0
        0     0.04  0
        0.03  0.03  0.02 ] .   (6.1)

Suppose during a scan there are no errors due to scatter and noncollinearity. Then, the mean number of photon pairs recorded by the detector pair (a1,b1) is P^c_11 x_1, where x_1 is the mean number of positrons emitted from voxel v1. The mean numbers of photon pairs recorded by the detector pairs (a2,b2) and (a3,b3) are P^c_22 x_2 and (P^c_31 x_1 + P^c_32 x_2 + P^c_33 x_3), respectively, where x_2 and x_3 are the mean numbers of positrons emitted from voxels v2 and v3, respectively.

As discussed in Section 1.3, a photon's original flight path is altered when it undergoes Compton scattering. Consequently, an annihilation in v1 that would have been recorded by detector pair (a1,b1) if both photons had not undergone Compton scattering may instead be recorded by detector pair (a2,b2) when scatter occurs. When we consider noncollinearity, it is also possible that an annihilation in v1 is recorded by detector pair (a2,b2) given that the annihilation would have been recorded by detector pair (a1,b1) if both photons had propagated in exactly opposite directions. Let K_{i1,i2} denote the conditional probability that an annihilation is recorded by detector pair i1 given that the annihilation would have been recorded by detector pair i2 if both photons produced by the annihilation had not undergone Compton scattering and had propagated in exactly opposite directions, where i1 = 1, 2, ..., I and i2 = 1, 2, ..., I. It is through these unknown probabilities {K_{i1,i2}} that we model scatter and noncollinearity. Now, accounting for scatter and noncollinearity, the mean number of photon pairs recorded by detector pair (a1,b1) is {K_11 P^c_11 x_1 + K_12 P^c_22 x_2 + K_13 (P^c_31 x_1 + P^c_32 x_2 + P^c_33 x_3)}. Moreover, the mean numbers of photon pairs recorded by detector pairs (a2,b2) and (a3,b3) are {K_21 P^c_11 x_1 + K_22 P^c_22 x_2 + K_23 (P^c_31 x_1 + P^c_32 x_2 + P^c_33 x_3)} and {K_31 P^c_11 x_1 + K_32 P^c_22 x_2 + K_33 (P^c_31 x_1 + P^c_32 x_2 + P^c_33 x_3)},
respectively. In matrix notation, the mean of the data, E[D], can be expressed as

E[D] = [ K_11 K_12 K_13 ] [ 0.04  0     0    ] [ x_1 ]
       [ K_21 K_22 K_23 ] [ 0     0.04  0    ] [ x_2 ]   (6.2)
       [ K_31 K_32 K_33 ] [ 0.03  0.03  0.02 ] [ x_3 ]
     = K P^c x ,   (6.3)

where x = [x_1 x_2 x_3]' and the I × I matrix K denotes the first matrix on the right-hand side of (6.2). We will refer to K as the scatter matrix. Since the emission means can be expressed as K P^c x in this model, we define the "true" probability matrix as

P^true ≜ K P^c ,   (6.4)

where the (i,j)-th element of P^true, P^true_ij, denotes the probability that an annihilation in the j-th voxel leads to a photon pair being recorded by the i-th detector pair. To our knowledge, the scatter matrix model we propose is not available in the literature. However, similar factorizations of P^true in (6.4) can be found in [28,53,64,65]. With the definition of P^true in (6.4), the mean number of photon pairs recorded by the i-th detector pair can be expressed as

[P^true x]_i = Σ_{j=1}^{J} P^true_ij x_j   (6.5)
             = Σ_{i2=1}^{I} Σ_{j=1}^{J} K_{i,i2} P^c_{i2,j} x_j   (6.6)
             = Σ_{i2=1}^{I} K_{i,i2} [P^c x]_{i2} .   (6.7)

In other words, the mean number of photon pairs recorded by the i-th detector pair is a weighted sum of the mean numbers of photon pairs recorded by all detector pairs when there are no errors due to scatter and noncollinearity. Regarding the matrix K, two constraints are necessary. Since K is a probability matrix, it must be true that 0 ≤ K_{i1,i2} ≤ 1 for all i1 and i2. Recall that K_{i1,i2} is the conditional probability that
an annihilation is recorded by detector pair i1 given that the annihilation would have been recorded by detector pair i2 if both photons produced by the annihilation had not undergone Compton scattering and had propagated in exactly opposite directions. Since a photon pair that is recorded by detector pair i1 cannot be recorded by the other detector pairs, we require that

Σ_{i1=1}^{I} K_{i1,i2} ≤ 1 for all i2 .   (6.8)

6.2 Joint Minimum Kullback-Leibler Distance Method

We now consider the Poisson model by Shepp and Vardi [16]. In their model, the emission data d is an observation of a random vector D that is Poisson distributed with mean (P^true x + ρ), where x is the unknown emission mean vector and ρ is the known mean accidental coincidence rate. Since d is the ML estimate of the unknown mean (P^true x + ρ)¹, it can be said that d ≈ (P^true x + ρ). Using the definition in (6.4), we can state that

d ≈ K P^c x + ρ .   (6.9)

It is important to point out that there are two unknowns in (6.9): the scatter matrix K and the emission mean vector x. Thus, given the emission data d, mean accidental coincidence rate ρ, and probability matrix P^c, the problem of interest is to estimate the emission mean x and scatter matrix K. We define the estimate (x̂, K̂) to be a

¹ If d is an observation of a random variable D that is Poisson distributed with unknown mean λ, then the likelihood function for d is Pr{D = d|λ} = (λ^d / d!) e^{−λ}. Since d maximizes the log-likelihood function log Pr{D = d|λ}, d is the ML estimate of λ.
minimizer of the Kullback-Leibler (KL) distance [66] between d and (K P^c x + ρ):

(x̂, K̂) = argmin_{(x,K)} KL(d, K P^c x + ρ)
          subject to x ≥ 0, K ≥ 0, and Σ_{i1=1}^{I} K_{i1,i2} ≤ 1 for all i2 ,   (6.10)

where the KL distance between a and b is defined by

KL(a, b) = Σ_i ( a_i log(a_i / b_i) − a_i + b_i )   (6.11)

and K ≥ 0 means that K_{i1,i2} ≥ 0 for all i1 and i2. The definition of the estimate (x̂, K̂) is motivated in part by the fact that, in [26], Byrne derived Shepp and Vardi's MLEM algorithm [16] by minimizing the KL distance between the emission data d and P^true x using the alternating projection algorithm by Csiszar and Tusnady [67], where P^true is assumed to be known. It should be mentioned that Byrne derived the MLEM algorithm for the case ρ = 0.

6.3 Probability Correction in Projection Space (PCiPS) Algorithm

Since it is difficult to solve the problem in (6.10), we propose an alternating minimization algorithm where, in a repetitive fashion, one of the two unknowns (i.e., K and x) is estimated while the other is fixed. Suppose an initial estimate for the scatter matrix K (e.g., an I × I identity matrix), denoted by K^(0), and an initial estimate x^(0) > 0 are available. Then, for n = 0, 1, 2, ..., the steps of the proposed algorithm are

• Step 1 Get the next estimate of the emission mean vector, x^(n+1), using K^(n), the current estimate of the scatter matrix K:

x^(n+1) = argmin_{x≥0} KL(d, K^(n) P^c x + ρ) .   (6.12)
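The KL distance of (6.11) can be written directly; a minimal sketch (with the usual convention that a term with a_i = 0 contributes only b_i):

```python
import math

def kl_distance(a, b):
    """Kullback-Leibler distance of (6.11):
    KL(a, b) = sum_i a_i*log(a_i/b_i) - a_i + b_i, with 0*log(0) taken as 0."""
    total = 0.0
    for ai, bi in zip(a, b):
        if ai > 0.0:
            total += ai * math.log(ai / bi)
        total += bi - ai
    return total

# KL(a, b) >= 0, with equality exactly when a == b.
print(kl_distance([3.0, 1.0], [3.0, 1.0]))   # 0.0
print(kl_distance([3.0, 1.0], [2.0, 2.0]))   # positive
```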
• Step 2 Get the next estimate of the scatter matrix, K^(n+1), using x^(n+1), the current estimate of the emission mean vector x:

K^(n+1) = argmin_{K≥0} KL(d, K P^c x^(n+1) + ρ)
          subject to Σ_{i1=1}^{I} K_{i1,i2} ≤ 1 for all i2 .   (6.13)

• Step 3 Repeat the steps above until some chosen stopping criterion is met.

Note that the problem in Step 1 can be solved using the MLEM algorithm [16,26], because Byrne showed that Shepp and Vardi's MLEM algorithm [16] minimizes the KL distance between the emission data d and P^true x, where P^true is known. Specifically, given an initial guess x^(n,0) > 0, the iteration for obtaining x^(n+1) is

x_j^(n,m+1) = ( x_j^(n,m) / Σ_{i=1}^{I} [K^(n) P^c]_{ij} ) Σ_{i=1}^{I} [K^(n) P^c]_{ij} d_i / ( [K^(n) P^c x^(n,m)]_i + ρ_i ) , j = 1, 2, ..., J ,   (6.14)

for m = 0, 1, 2, ..., M1. The next estimate of x is defined to be x^(n+1) = x^(n,M1+1), the (M1+1)-st iterate of (6.14).

One of the issues surrounding the constrained minimization problem in (6.13) is that it is impossible to estimate the scatter matrix K when all of the elements of the matrix are assumed to be unknown. The reason is that there would be too many unknowns, which would result in the problem being underdetermined (note: the dimension of K is I × I, while there are only I data points). To reduce the number of unknowns in K, we assume that K_{i1,i2} = 0 if the detector pairs i1 and i2 are not in the same projection (see Figure 1-3 for the definition of a projection). This assumption means that an annihilation that would have been recorded by a detector pair within a certain projection if both photons had not undergone Compton scattering cannot be recorded by a detector pair within some other projection. Under the stated assumption, the number of unknown parameters in the matrix K is dramatically reduced.
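The Step 1 update (6.14) can be sketched on a tiny synthetic problem (the system matrix, emission means, and randoms rate below are made up for illustration; with noise-free, consistent data the iterates drive the fitted means toward d):

```python
import numpy as np

def mlem_step(x, P, d, rho):
    """One MLEM update in the form of (6.14); during Step 1 the matrix
    P = K^(n) @ P_c is held fixed. All inputs are positive."""
    sens = P.sum(axis=0)              # Sum_i P_ij (sensitivity of voxel j)
    ratio = d / (P @ x + rho)         # d_i / ([P x]_i + rho_i)
    return (x / sens) * (P.T @ ratio)

# Tiny made-up system: 6 detector pairs, 3 voxels, consistent data.
P = np.array([[0.05, 0.01, 0.00],
              [0.01, 0.05, 0.01],
              [0.00, 0.01, 0.05],
              [0.02, 0.02, 0.02],
              [0.04, 0.00, 0.01],
              [0.00, 0.03, 0.02]])
x_true = np.array([100.0, 200.0, 50.0])
rho = np.full(6, 0.5)
d = P @ x_true + rho                  # noise-free "data" for the check

x = np.ones(3)
for _ in range(2000):
    x = mlem_step(x, P, d, rho)
# With consistent data, the fitted means approach d (KL distance -> 0).
print(np.abs(P @ x + rho - d).max())
```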
Moreover, K is a block diagonal matrix with T × T submatrices along its diagonal:

K = [ K_1  0    ...  0
      0    K_2  ...  0
      ...
      0    0    ...  K_S ] ,   (6.15)

where T is the number of detector pairs within each projection (e.g., 160), S is the number of projection angles (e.g., 192), and it is assumed that the emission counts of the detector pairs within a projection are placed as a "chunk" in the emission data d (i.e., d = [(1st projection)', (2nd projection)', ..., (S-th projection)']'). Since K is a block diagonal matrix, the minimization problem in (6.13) can be broken into S subproblems. Consider the sinogram of d, denoted by y, which is a T × S matrix. The s-th column of the sinogram y is the projection that corresponds to the s-th projection angle. Figure 6-2 (a) shows an example of a sinogram; in the figure, the sinogram of the 14 minute emission data for plane 21 is shown. The I × 1 vector P^c x^(n+1) is known in the PET community as the forward projection of x^(n+1). We define a T × S matrix W^(n+1), where the first column is the first T elements of P^c x^(n+1), the second column contains the (T+1)-st to (2T)-th elements of P^c x^(n+1), and so on. In other words, the ordering of the detector pairs associated with W^(n+1) and y match. Figure 6-2 (b) shows an example of W^(n+1), where x^(n+1) was generated by running the MLEM algorithm for 1000 iterations on the emission data in Figure 6-2 (a). Under the assumption that K_{i1,i2} = 0 if the detector pairs i1 and i2 are not in the same projection, the minimization problem in (6.13) can be expressed as: for s = 1, 2, ..., S,

K̂_s^(n+1) = argmin_{K_s ≥ 0} KL( y_s , K_s w_s^(n+1) + z_s )
             subject to Σ_{i1=1}^{T} [K_s]_{i1,i2} ≤ 1 for i2 = 1, 2, ..., T ,   (6.16)
Figure 6-2: Sinograms: (a) Emission Data: 14 minute emission data for plane 21 (i.e., y), and (b) Forward Projection: forward projection of x^(n+1) (i.e., W^(n+1) formed from P^c x^(n+1)), where x^(n+1) is the 1000-th MLEM iterate using the emission data in (a). Note that the images were adjusted with their own dynamic range.
where z_s = [ρ_{(s−1)T+1} ρ_{(s−1)T+2} ... ρ_{sT}]', K_s is the s-th submatrix of K, and y_s and w_s^(n+1) are the s-th columns of y and W^(n+1), respectively. Since there are T × T unknowns in each minimization problem in (6.16), there would still be too many unknowns. To reduce the number of unknowns, we assume that, for all s, K_s is a Toeplitz matrix. An example of a Toeplitz matrix is

[ a3  a2  a1  0   0   0
  a4  a3  a2  a1  0   0
  a5  a4  a3  a2  a1  0
  0   a5  a4  a3  a2  a1
  0   0   a5  a4  a3  a2
  0   0   0   a5  a4  a3 ] .

The assumption that K_s is a Toeplitz matrix implies that there are at most T unknowns in each submatrix K_s. Moreover, the assumption means that a single kernel can account for scatter within each projection. The assumption that K_s is a Toeplitz matrix can be justified for regions with approximately uniform attenuation, such as the brain. Consider Figure 6-3, in which a simplified PET scanner consisting of four detector pairs is depicted. In the figure, the dotted circle defines a region with uniform attenuation. We refer to the detector pairs (a1,b1), (a2,b2), (a3,b3), and (a4,b4) as the first, second, third, and fourth detector pairs, respectively. Note that the geometry of the first and second detector pairs and the geometry of the second and third detector pairs are approximately the same. Because of the approximately uniform attenuation of the subject and the geometric similarity of the detector pairs, it can be said that the conditional probabilities K_21 and K_32 are approximately the same. Using this rationale, for a projection of a PET scanner, it can be assumed that K_{i2,i1} ≈ K_{i3,i2} when (i2 − i1) = (i3 − i2). Thus, we can construct K_s such that it can be approximated as a Toeplitz matrix.
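The banded Toeplitz structure assumed for K_s can be sketched as follows (the kernel values are made up; the indexing convention, with the kernel centered at lag zero, is an assumption of this sketch):

```python
import numpy as np

def toeplitz_from_kernel(k, T, tau):
    """Build a T x T banded Toeplitz matrix whose (t1, t2) entry is
    k[t1 - t2 + tau - 1] when |t1 - t2| < tau and 0 otherwise; the kernel
    k has length 2*tau - 1 (lags -tau+1, ..., tau-1)."""
    K = np.zeros((T, T))
    for t1 in range(T):
        for t2 in range(T):
            lag = t1 - t2
            if abs(lag) < tau:
                K[t1, t2] = k[lag + tau - 1]
    return K

k = np.array([0.05, 0.1, 0.7, 0.1, 0.05])   # illustrative kernel, sums to 1
K_s = toeplitz_from_kernel(k, T=6, tau=3)
print(K_s)
# Away from the projection boundary, each column sum equals the kernel sum
# (here 1); near the boundary some kernel mass falls outside, so column
# sums are <= 1, consistent with the constraint (6.8).
```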
Figure 6-3: Geometry of a simplified PET image reconstruction problem: three voxels (v1, v2, and v3) and four detector pairs ((a1,b1), (a2,b2), (a3,b3), and (a4,b4)). The dashed lines define the tubes of the detector pairs. The dotted circle defines a region with uniform attenuation.

Now, consider the following functions: for an integer t,

y_s(t) ≜ [y_s]_t for t = 1, 2, ..., T, and 0 otherwise;   (6.17)
w_s^(n+1)(t) ≜ [w_s^(n+1)]_t for t = 1, 2, ..., T, and 0 otherwise;   (6.18)
z_s(t) ≜ [z_s]_t for t = 1, 2, ..., T, and 0 otherwise.   (6.19)

Under the assumption that K_s is a Toeplitz matrix, y_s(t) is approximately equal to the convolution of w_s^(n+1)(t) and an unknown nonnegative function, denoted by k_(s,τ)(t), that depends on the s-th projection angle. Thus, for t = 1, 2, ..., T,

y_s(t) ≈ (k_(s,τ) * w_s^(n+1))(t) + z_s(t) = Σ_{u=−τ+1}^{τ−1} k_(s,τ)(u) w_s^(n+1)(t − u) + z_s(t) ,   (6.20)
where k_(s,τ)(t) ∈ [0, 1] for every integer t and k_(s,τ)(t) = 0 for |t| ≥ τ. The parameter τ is defined by the user, but must satisfy the constraint τ ≤ ⌊(T + 1)/2⌋. Observe that, in (6.20), the product k_(s,τ)(u) w_s^(n+1)(t − u) equals the proportion of photon pairs that would have been recorded by detector pair (t − u) if both photons produced by the annihilation had not undergone Compton scattering and had propagated in exactly opposite directions, but instead are recorded by detector pair t. The parameter τ restricts the number of detector pairs to be convolved. In principle, the parameter τ is to be chosen in such a way that τ is proportional to the mean number of scatter events. Since the mean number of scatter events is proportional to the attenuation of the material, it is possible that the attenuation map could be used to determine the parameter τ. The convolution in (6.20) can be expressed in matrix notation:

(k_(s,τ) * w_s^(n+1))(t) = [B_s^(n+1) k_(s,τ)]_t , t = 1, 2, ..., T ,   (6.21)

where k_(s,τ) = [k_(s,τ)(−τ+1) k_(s,τ)(−τ+2) ... k_(s,τ)(τ−1)]' and B_s^(n+1) is a known T × (2τ−1) matrix whose columns contain shifted copies of w_s^(n+1) (its (t,u)-th element is w_s^(n+1)(t − u + τ)). Thus, y_s can be approximated as

y_s ≈ B_s^(n+1) k_(s,τ) + z_s .   (6.22)
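The construction of B_s^(n+1) in (6.21) can be sketched and checked against a direct convolution (the projection values w and kernel k are made up; zero-padding at the projection boundary is assumed, matching the definitions (6.17)-(6.19)):

```python
import numpy as np

def build_B(w, tau):
    """T x (2*tau - 1) matrix of (6.21): column j (0-based) corresponds to
    lag u = j - tau + 1, and row t holds w(t - u), with indices outside
    0..T-1 contributing zero, so that B @ k equals the truncated
    convolution in (6.20)."""
    T = len(w)
    B = np.zeros((T, 2 * tau - 1))
    for j in range(2 * tau - 1):
        u = j - (tau - 1)                 # lag u in [-tau+1, tau-1]
        for t in range(T):
            if 0 <= t - u < T:
                B[t, j] = w[t - u]
    return B

w = np.array([1.0, 4.0, 2.0, 0.0, 3.0])   # made-up projection, T = 5
k = np.array([0.1, 0.8, 0.1])             # made-up kernel, tau = 2
B = build_B(w, tau=2)
# numpy's full convolution, trimmed to the T central samples, must match.
assert np.allclose(B @ k, np.convolve(k, w)[1:6])
print(B @ k)
```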
With (6.22), the minimization problem in (6.16) is now

k̂_(s,τ)^(n+1) = argmin_{k_(s,τ) ≥ 0} KL( y_s , B_s^(n+1) k_(s,τ) + z_s )
                subject to Σ_{t=1}^{2τ−1} [k_(s,τ)]_t ≤ 1 .   (6.23)

The constrained optimization problem in (6.23) is difficult to solve because of the constraint Σ_t [k_(s,τ)]_t ≤ 1. Consequently, we first solve the following optimization problem:

k̃_(s,τ)^(n+1) = argmin_{k_(s,τ) ≥ 0} KL( y_s , B_s^(n+1) k_(s,τ) + z_s ) .   (6.24)

Then, to get k̂_(s,τ)^(n+1), we normalize k̃_(s,τ)^(n+1) so that the sum of its elements equals one:

[k̂_(s,τ)^(n+1)]_t = [k̃_(s,τ)^(n+1)]_t / Σ_{t'=1}^{2τ−1} [k̃_(s,τ)^(n+1)]_{t'} , t = 1, 2, ..., (2τ−1) .   (6.25)

The minimization problem in (6.24) can be solved by the MLEM algorithm [16,26] because it has the same form as the optimization problem in Step 1. Specifically, given an initial estimate k_(s,τ)^(n,0) > 0, the iteration is as follows: for m = 0, 1, ..., M2,

[k_(s,τ)^(n,m+1)]_j = ( [k_(s,τ)^(n,m)]_j / Σ_{t=1}^{T} [B_s^(n+1)]_{tj} ) Σ_{t=1}^{T} [B_s^(n+1)]_{tj} [y_s]_t / [B_s^(n+1) k_(s,τ)^(n,m) + z_s]_t , j = 1, 2, ..., (2τ−1) .   (6.26)

We define k̃_(s,τ)^(n+1) = k_(s,τ)^(n,M2+1) and normalize k̃_(s,τ)^(n+1) using (6.25) to get k̂_(s,τ)^(n+1).
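The pair (6.24)-(6.26), followed by the normalization (6.25), can be sketched end-to-end on a toy problem (B, the true kernel, and z are made up; with consistent data the MLEM iterates approach the true kernel):

```python
import numpy as np

def kernel_mlem(y, B, z, n_iter=2000):
    """MLEM iteration in the form of (6.26) for the nonnegative kernel
    minimizing KL(y, B @ k + z); same form as the Step 1 update."""
    k = np.ones(B.shape[1])
    col = B.sum(axis=0)               # Sum_t [B]_tj
    for _ in range(n_iter):
        k = (k / col) * (B.T @ (y / (B @ k + z)))
    return k

def normalize(k):
    # (6.25): rescale the kernel so its elements sum to one.
    return k / k.sum()

# Made-up, well-conditioned B (8 detector pairs, 2*tau - 1 = 3 unknowns).
B = np.array([[2.0, 0.5, 0.5],
              [0.5, 2.0, 0.5],
              [0.5, 0.5, 2.0],
              [1.0, 0.5, 1.5],
              [1.5, 1.0, 0.5],
              [0.5, 1.5, 1.0],
              [1.0, 1.0, 1.0],
              [2.0, 1.0, 0.5]])
k_true = np.array([0.2, 0.6, 0.2])    # already sums to one
z = np.full(8, 0.1)
y = B @ k_true + z                    # consistent synthetic projection data
k_hat = normalize(kernel_mlem(y, B, z))
print(k_hat)
```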
Given k̂_(s,τ)^(n+1), we determine K_s^(n+1) as the T × T Toeplitz matrix built from the kernel:

[K_s^(n+1)]_{t1,t2} = [k̂_(s,τ)^(n+1)]_{t1−t2+τ} if |t1 − t2| < τ, and 0 otherwise .   (6.27)

Finally, K^(n+1) is defined as a block diagonal matrix with {K_s^(n+1)} along its diagonal:

K^(n+1) = [ K_1^(n+1)  0          ...  0
            0          K_2^(n+1)  ...  0
            ...
            0          0          ...  K_S^(n+1) ] .   (6.28)

Repeating Steps 1 and 2 generates the estimates of K and x. Once the estimate of K is available, denoted by K̂, a regularized image reconstruction algorithm, such as the PML, APML, or QEP algorithm, can be used to estimate the emission mean vector x. More specifically, we first let our estimate for P^true be P̂ = K̂ P^c. Then, the probability matrix P̂ is used in the image reconstruction algorithm of choice. In summary, the steps of the PCiPS algorithm are: for n = 0, 1, 2, ...

• Step 1 Get an initial estimate for the emission mean vector, x^(0) > 0, and an initial estimate for the scatter matrix, K^(0).
• Step 2 Get the next estimate of the emission mean vector, x^(n+1), using K^(n), the current estimate of the scatter matrix K:

x^(n+1) = argmin_{x≥0} KL(d, K^(n) P^c x + ρ) .   (6.29)

• Step 3 For s = 1, 2, ..., S, get k̃_(s,τ)^(n+1) using (6.26).
• Step 4 For s = 1, 2, ..., S, normalize k̃_(s,τ)^(n+1) using (6.25) to get k̂_(s,τ)^(n+1).
• Step 5 For s = 1, 2, ..., S, get K_s^(n+1) using (6.27).
• Step 6 Get K^(n+1) using (6.28).
• Step 7 Repeat Steps 2 through 6 for a chosen number, M, of iterations.
• Step 8 Define P̂ = K^(M+1) P^c and use P̂ in the APML algorithm or some other algorithm of choice.
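The assembly in Steps 4 and 6, i.e. the normalization (6.25) and the block-diagonal construction (6.28), can be sketched as follows (the two 3 × 3 projection blocks are made-up placeholders; in practice each block would come from (6.27)):

```python
import numpy as np

def normalize_kernel(k_tilde):
    # Step 4, eq. (6.25): scale the nonnegative kernel to sum to one.
    return k_tilde / k_tilde.sum()

def block_diagonal(blocks):
    """Step 6, eq. (6.28): assemble the scatter matrix from the
    per-projection T x T submatrices {K_s}."""
    T = blocks[0].shape[0]
    K = np.zeros((len(blocks) * T, len(blocks) * T))
    for s, Ks in enumerate(blocks):
        K[s * T:(s + 1) * T, s * T:(s + 1) * T] = Ks
    return K

# Two made-up projection blocks (S = 2, T = 3); off-diagonal blocks are
# zero, reflecting the assumption that scatter stays within a projection.
K1 = 0.9 * np.eye(3)
K2 = 0.8 * np.eye(3)
K = block_diagonal([K1, K2])
print(K.shape)
```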
CHAPTER 7
SIMULATIONS AND EXPERIMENTAL STUDY

To evaluate the algorithms in Chapters 3, 4, and 5, and the method in Chapter 6, we applied them to real thorax phantom data and compared them quantitatively and qualitatively to certain existing algorithms. Also, in Section 7.1, simulation studies with computer-generated synthetic data are presented for the PML and QEP algorithms in Chapters 3 and 5, respectively. It should be noted that simulation results with synthetic data have limitations because the data are generated under the assumption that the system model for emission data in Section 1.4 is exactly correct. However, the system model is not perfect due to the errors discussed in Section 1.3.

Thorax phantom data was obtained from the PET laboratory at the Emory University School of Medicine. The phantom was filled with 2-[18F]fluoro-2-deoxy-D-glucose ([18F]FDG) and scanned using a Siemens-CTI ECAT EXACT [model 921] scanner in slice-collimated mode (i.e., septa-extended mode). Thirty independent data sets were generated from multiple scans of duration 7 minutes. Fifteen realizations of 14 min data were generated by adding non-overlapping pairs of 7 min data sets. The [18F]FDG concentrations for the heart wall, heart cavity, liver, three tumors, and thorax cavity of the thorax phantom by Data Spectrum Inc. were 0.72 μCi/ml, 0.23 μCi/ml, 0.72 μCi/ml, 2.01 μCi/ml, and 0.24 μCi/ml, respectively. The lungs, which contained styrofoam beads, were filled with a 0.25 μCi/ml solution of [18F]FDG. The concentrations were chosen to mimic those observed in whole-body scans. The tumors were of size 1 cm, 1.5 cm, and 2 cm. The sinogram consists of 160 radial bins and 192 angles. The physical dimensions of the image space are 43.9 × 43.9 cm², and the reconstructed images contain 128 × 128 voxels (voxel size is 3.43 × 3.43
mm²). Two planes (10 and 21) were considered in the experiments. Plane 10 contains activity due to the heart, lungs, spine, and background, while plane 21 contains activity due to the heart, two tumors (1.5 cm and 2.0 cm), and background. The total numbers of prompts for planes 10 and 21 were 397,000 and 340,000, respectively, for the 14 minute data. The randoms make up about 10% and 12% of the data for planes 10 and 21, respectively.

The probability matrix P was computed using the angle-of-view method [16] with corrections for errors due to attenuation and detector inefficiency. To get the attenuation correction factors, post-injected transmission scan data was collected for three minutes and the attenuation correction method by Anderson et al. [9] was employed. A normalization file was used to correct for detector inefficiency. Finally, the randoms were used as noise-free estimates of the mean numbers of accidental coincidences. For all of the experiments and simulations, we used a uniform initial estimate (all voxels set to the same value), the eight nearest neighbors of the j-th voxel were used for N_j, and the weights {ω_jk} are one for horizontal and vertical nearest neighbors and 1/√2 for diagonal nearest neighbors.

In Section 7.1, we apply the PML and QEP algorithms to real thorax phantom data and computer-generated synthetic data, and compare them quantitatively and qualitatively. Then, the performance of the APML algorithm is evaluated in Section 7.2. Finally, experimental results with the probability estimation method in Chapter 6 are presented in Section 7.3.

7.1 Regularized Image Reconstruction Algorithms

In this section, we compare the PML and QEP algorithms, quantitatively and qualitatively, to the MLEM algorithm and a penalized weighted least-squares (PWLS) algorithm [42]. Two ad-hoc forms of regularization include the post-filtering of MLEM estimates and early termination of the MLEM algorithm (usually quite far from the
MLEM estimate). Given their simplicity, we also compared the post-filtering (MLEM-F) and early-stopping (MLEM-S) strategies to the proposed algorithms.

Quantitative comparisons were made using contrast as a figure-of-merit:

Contrast ≜ (M_ROI − M_b) / M_b ,   (7.1)

where M_ROI and M_b denote the means of a chosen region-of-interest (ROI) and of the background, respectively. We define another figure-of-merit that quantifies the distinguishability of the two tumors:

Distinguishability ≜ |M_T − M_I| / |M_T − M_b| ,   (7.2)

where M_T and M_I denote the mean activity of the two tumor regions and of the intermediate region between the two tumors, respectively. If the two tumors overlap each other, then M_I = M_T and the distinguishability will be zero. On the other hand, if the intermediate region between the two tumors has the same mean as the background, then the distinguishability will be one.

For image comparisons, converged images were used for the MLEM, PML, and PWLS algorithms. The QEP images result from running the QEP algorithm for the same number of iterations as the PML algorithm. This was necessary because the QEP algorithm does not have a single objective function; recall that the QEP algorithm defines a new objective function to be minimized at each iteration. For the PML, QEP, and PWLS images, the penalty parameter β was chosen in such a way that the standard deviations of their soft-tissue (i.e., background) regions were equal. In this sense, the algorithms are "balanced" with respect to β. For the MLEM-S and MLEM-F images, early MLEM iterates and filtered converged MLEM images were obtained that matched the standard deviation of the soft-tissue regions in the PML, QEP, and PWLS images. To filter the MLEM image, we used 5 × 5 Gaussian filters with different standard deviation values.
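The two figures-of-merit can be written directly. A minimal sketch, using for illustration the software-phantom intensities quoted in Section 7.1.1 (tumors 7 × 74, background 74):

```python
def contrast(roi_mean, bg_mean):
    # (7.1): Contrast = (M_ROI - M_b) / M_b.
    return (roi_mean - bg_mean) / bg_mean

def distinguishability(tumor_mean, inter_mean, bg_mean):
    """(7.2): |M_T - M_I| / |M_T - M_b|. Zero when the tumors merge
    (M_I = M_T); one when the gap between them matches the background
    (M_I = M_b)."""
    return abs(tumor_mean - inter_mean) / abs(tumor_mean - bg_mean)

print(contrast(7 * 74.0, 74.0))               # tumor contrast of 6
print(distinguishability(518.0, 74.0, 74.0))  # fully separated tumors: 1
print(distinguishability(518.0, 518.0, 74.0)) # merged tumors: 0
```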
7.1.1 Synthetic Data

In this subsection, we present simulation results for software phantoms. Fifty realizations were used in the simulation study. To generate emission data, a software phantom was forward projected (i.e., P x) using the P matrix, where it was assumed that there are no errors except accidental coincidences. Then, for each bin, the prompts and randoms were generated using pseudo-random Poisson variates with means [P x]_i + ρ and ρ, respectively. The constant ρ was chosen such that the mean accidental coincidence rate was approximately 10%. The mean numbers of prompts and randoms are about 550,000 and 50,000, respectively. For the simulation studies, the dimensions of the image space follow the corresponding ones of the real phantom image space. The total number of intensities within a software phantom was about 500,000.

We first consider a tumor phantom. Figure 7-1 shows a software phantom that consists of two tumors (1.7 cm and 2.4 cm in diameter) in a uniform circular background with a diameter of 30.5 cm, where the tumor and background intensities are 7 × 74 and 74, respectively (tumor contrast is 6). The image also depicts the regions used for the contrast and distinguishability calculations. The intermediate region between the two tumors consists of 14 voxels, and the large and small tumors consist of 45 and 21 voxels, respectively.

For the tumor phantom in Figure 7-1, we used the log-cosh penalty function λ(t) = log(cosh(t/δ)) with δ = 20 in the PML and QEP algorithms, while the quadratic penalty function was used in the PWLS algorithm. For the QEP algorithm, η(t) = ζ tanh(t/ζ) was used with ζ = 150 unless noted otherwise. The parameters δ and ζ were chosen experimentally.

Figure 7-2 shows the images obtained by the MLEM, MLEM-S, MLEM-F, PML, QEP, and PWLS algorithms when the tumor phantom in Figure 7-1 was used.
The MLEM image is the 500th MLEM iterate, and 200 iterations were used to reconstruct the PML, QEP, and PWLS images. For the PML, QEP, and PWLS images, β was chosen such that the standard deviation of the background was approximately 12. The MLEM-S image is the 20th MLEM iterate. To get the MLEM-F image, the MLEM image in Figure 7-2(a) was filtered once using a 5 × 5 Gaussian filter with a standard deviation of 1.27 voxels. As stated, all of the images in Figure 7-2 have the same background standard deviation except for the MLEM image. The MLEM image in Figure 7-2(a) is considerably noisy (background standard deviation is about 78) compared to the other images. Figures 7-2(d) and (e) illustrate that the PML image and the QEP image are smooth and, at the same time, the tumors in the images are resolvable and differ greatly from the background. On the other hand, Figures 7-2(c) and (f) demonstrate that the images generated by the MLEM-F and PWLS algorithms are too smooth, especially near the boundaries of the tumors. Figure 7-2(b) shows that the edges of the tumors in the MLEM-S image are not as clear as the ones in the PML and QEP images. In Figure 7-3, the images in Figure 7-2 are plotted with their own dynamic range. For the images in Figure 7-2, the contrast of the QEP image was 1%, 12%, 21%, 4%, and 22% higher than that of the MLEM, MLEM-S, MLEM-F, PML, and PWLS images, respectively, for the large tumor. The increased contrast of the QEP image for the small tumor with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was −4%, 16%, 31%, 5%, and 34%, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was −5%, 16%, 34%, 5%, and 40%, respectively. Thus, the MLEM image outperformed the QEP image in the small-tumor contrast and distinguishability comparisons. However, as can be seen in Figure 7-2(a), the MLEM image is extremely noisy.
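The MLEM-F post-filtering step can be sketched with an explicit 5 × 5 Gaussian kernel; a minimal sketch in which the zero padding at the borders is an implementation choice, and sigma = 1.27 voxels matches the value quoted above:

```python
import numpy as np

def gaussian_kernel_5x5(sigma):
    """Explicit 5 x 5 Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(-2, 3)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def filter_image(image, sigma=1.27):
    """Convolve a 2-D image with the 5 x 5 kernel (zero padding)."""
    k = gaussian_kernel_5x5(sigma)
    padded = np.pad(image, 2)
    out = np.zeros_like(image, dtype=float)
    rows, cols = image.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(padded[i:i+5, j:j+5] * k)
    return out
```

Applying the filter twice, as done for the real-data experiments later in the section, is simply `filter_image(filter_image(img, s), s)`.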
Figures 7-4(a) and (b) are line plots (the row is shown in Figure 7-1) of the images in Figure 7-2. For the row under consideration, it can be seen from the line plots that the PML and QEP images have a higher degree of contrast than the other images, except for the MLEM image. Moreover, the edges in the QEP image are sharper than those in the PML image. As expected, the line plots show that the MLEM image is excessively noisy. Figures 7-5(a) and (b) are plots of the average contrast of the large tumor and small tumor, respectively, versus the average background standard deviation using fifty realizations, when the tumor phantom in Figure 7-1 was used. Further, a plot of the average standard deviation of the large tumor versus the average background standard deviation for the fifty realizations is shown in Figure 7-5(c). Finally, in Figure 7-5(d), a plot of the average distinguishability of the two tumors versus the average background standard deviation for the fifty realizations is shown. As can be seen from Figures 7-5(a), (b), and (d), the contrast curves and the distinguishability curve of the QEP algorithm lie above the curves of the other algorithms for comparable "background noise". Thus, for a fixed level of comparable background noise, the QEP images, on average, have the greatest contrast and distinguishability. The average standard deviation curves of the PML and QEP algorithms in Figure 7-5(c) generally lie below the corresponding curves of the other algorithms for reasonably small background noise. To see where the PML and QEP algorithms break down in terms of contrast, we performed simulations using the synthetic tumor phantom in Figure 7-1 with four different tumor contrast values (tumor contrast equals 3, 1.5, 0.75, and 0.5). For the PML and QEP algorithms, β = 1/16 and 1/32 were used, respectively, and 200 iterations were used. For the QEP algorithm, ξ = 150 was used for a tumor contrast of 3, whereas ξ = 80 was used for the other tumor contrast values (i.e., tumor contrast equals 1.5, 0.75, and 0.5). The parameters β and ξ were chosen
experimentally. Figures 7-6(a) and (b) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 3. As can be seen in the figures, when the tumor contrast was 3, the tumors in the PML and QEP images are clearly resolvable. Figures 7-6(c) and (d) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 1.5. From the figures, the tumors in the PML and QEP images are still sufficiently clear when the tumor contrast was 1.5. It should be mentioned that the images in Figure 7-6 have the same background standard deviation, which is approximately 10. Figures 7-7(a) and (b) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 0.75. From the figures, the tumors in the PML and QEP images are still resolvable when the tumor contrast was 0.75. Consider the PML and QEP images in Figures 7-7(c) and (d), respectively, that were obtained by using the phantom in Figure 7-1 with a tumor contrast of 0.5. The tumors in the PML and QEP images are hardly resolvable, which implies that the PML and QEP algorithms break down when the tumor contrast is 0.5. The images in Figure 7-7 have the same background standard deviation, which is approximately 9. Figures 7-8(a), (b), (c), and (d) are plots of the average contrast of the small tumor versus the average background standard deviation using fifty realizations, when the tumor contrast of the phantom in Figure 7-1 was 3, 1.5, 0.75, and 0.5, respectively. As can be seen from Figures 7-8(a) and (b), the contrast curves of the QEP algorithm lie above the curves of the PML algorithm for a fixed background noise when the tumor contrast equals 3 and 1.5. For tumor contrasts of 0.75 and 0.5, the curves of the PML and QEP algorithms coincide, as can be seen in Figures 7-8(c) and (d), which implies that the PML and QEP algorithms perform similarly when the tumor contrast is small enough.
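The two scalar functions that control edge behavior in these experiments can be written out directly; a minimal sketch, assuming the argument scalings λ(t) = log(cosh(t/δ)) and η(t) = ξ tanh(t/ξ), so that η(t) ≈ t for small |t| and saturates at ±ξ:

```python
import numpy as np

def logcosh_penalty(t, delta):
    """log-cosh penalty lambda(t) = log(cosh(t / delta)): quadratic
    near zero, asymptotically linear for |t| >> delta (edge-preserving)."""
    return np.log(np.cosh(t / delta))

def qep_eta(t, xi):
    """QEP edge function eta(t) = xi * tanh(t / xi): approximately t
    for small |t|, saturating at +/- xi for large edge heights."""
    return xi * np.tanh(t / xi)
```

This saturation is one way to read the choice ξ = 150 versus ξ = 80 above: a smaller ξ caps the influence of edges at a lower height, which suits the lower-contrast phantoms.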
To see where the PML and QEP algorithms break down in terms of the spacing between the two tumors, we performed simulations using four different synthetic tumor
phantoms, where each phantom has a different tumor spacing. We compared the PML and QEP images both visually and in terms of distinguishability. Figures 7-9(a), (b), and (c) show software phantoms that consist of two tumors (each tumor is 3 × 3 voxels). For the phantoms in Figures 7-9(a), (b), and (c), the tumor contrast is 3 and the spacing between the two tumors is 4, 3, and 2 voxels, respectively. Figure 7-9(d) shows a software phantom that consists of two tumors (each tumor is 3 × 3 voxels) with a spacing of 2 voxels, where the tumor contrast is 6. The images in Figure 7-9 also depict the regions used for the distinguishability calculation. The intermediate regions between the two tumors consist of 10, 8, 6, and 6 voxels for the phantoms in Figures 7-9(a), (b), (c), and (d), respectively. Figure 7-10 is a plot of the PML and QEP images obtained by using the phantoms in Figures 7-9(a) and (b). Figure 7-11 is a plot of the PML and QEP images obtained by using the phantoms in Figures 7-9(c) and (d). The PML and QEP images in Figures 7-10 and 7-11 are from 200 iterations. The PML and QEP images in Figures 7-10 and 7-11 have the same background standard deviation, which is approximately 10 and 9, respectively. For the PML and QEP images in Figures 7-10 and 7-11, β = 1/16 and 1/32 were used, respectively. Figure 7-10 indicates that the PML and QEP algorithms generate images where the tumors are clearly separated when the spacing between the tumors is 3 and 4 voxels. As can be seen in Figures 7-11(a) and (b), the PML and QEP algorithms were not able to resolve the two tumors when the spacing between the tumors is 2 voxels. However, when the tumor contrast was increased, the PML and QEP algorithms worked well for a tumor spacing of 2 voxels, as shown in Figures 7-11(c) and (d). Figure 7-12 is a plot of the average distinguishability of the two tumors versus the average background standard deviation using fifty realizations, when the tumor phantoms in Figure 7-9 were used. As can be seen from Figures 7-12(a), (b), (c), and (d), the distinguishability curves of the QEP algorithm lie above the curves of the
Figure 7-1: A software tumor phantom is shown. The contrast of the tumors in the phantom is 6. The regions surrounded by the dotted and dashed lines define the tumor intermediate region (i.e., M_I) and the background region, respectively.

PML algorithm for a fixed background noise. Thus, for a fixed level of background noise, the QEP images, on average, have greater distinguishability.

7.1.2 Real Data

In this subsection, we present experimental results using real phantom data for plane 21. Unless noted, the data are from 14 minute scans. The image in Figure 7-13, which was produced by averaging fifteen converged MLEM images, depicts the regions that were used in the contrast and distinguishability calculations. In the large and small tumor ROIs and the intermediate region, the numbers of voxels were 24, 12, and 16, respectively. In the PML and QEP algorithms, we used the log-cosh penalty function λ(t) = log(cosh(t/δ)) with δ = 50. Since the noise in real data is stronger than in the synthetic data due to errors (e.g., scatter), we increased the value of the parameter δ to reduce background noise (observe that we used δ = 20 for the synthetic phantom in Figure 7-1). For the QEP algorithm, η(t) = ξ tanh(t/ξ) was used with ξ = 500 and 1000 for the 7 minute and 14 minute real data, respectively. We varied ξ because the data sets lead to reconstructed images that have different edge "heights". As in the simulations
Figure 7-2: Comparison of emission images when the synthetic phantom in Figure 7-1 was used: (a) MLEM image, (b) MLEM-S image, (c) MLEM-F image, (d) PML image, (e) QEP image, and (f) PWLS image. The images in (a) and (b) are from 500 and 20 iterations, respectively, while the images in (d), (e), and (f) are from 200 iterations. The image in (c) was obtained by filtering the MLEM image once with a 5 × 5 Gaussian filter with a standard deviation of 1.27 voxels. For the PML, QEP, and PWLS images, β was chosen in such a way that the standard deviation of the background is approximately 12. Specifically, β = 0.0415, 0.021, and 0.006 for the PML, QEP, and PWLS images, respectively. The standard deviation of the background of the images in (b) and (c) is also approximately 12. For display purposes, all the images were adjusted so that they have the same dynamic range.
Figure 7-3: The images in Figure 7-2 are shown with their own dynamic range.
Figure 7-4: A line plot comparison of the reconstructed images in Figures 7-2(a), (b), and (c) is shown in (a). A line plot comparison of the reconstructed images in Figures 7-2(d), (e), and (f) is shown in (b).
Figure 7-5: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively, when the synthetic phantom in Figure 7-1 was used. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifty synthetic data realizations were used in the study. For the MLEM-S curves, the images from iterations 5–160 were used. For the MLEM-F curves, the MLEM images were filtered once by 5 × 5 Gaussian filters with a standard deviation range of 0.44–3.0 voxels (each voxel is 3.43 × 3.43 mm²). For the PML, QEP, and PWLS algorithms, the images were reconstructed using two hundred iterations for β = 2^-1 – 2^-9, 2^-4 – 2^-9, and 2^-4 – 2^-12, respectively.
Figure 7-6: Comparison of emission images: (a) PML image and (b) QEP image when the tumor contrast of the phantom in Figure 7-1 was 3, and (c) PML image and (d) QEP image when the tumor contrast of the phantom in Figure 7-1 was 1.5. All images are from 200 iterations and they have the same background standard deviation, which is approximately 10. For the PML and QEP images, β = 1/16 and 1/32 were used, respectively. For the QEP images in (b) and (d), ξ = 150 and ξ = 80 were used, respectively. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that they have the same dynamic range.
Figure 7-7: Comparison of emission images: (a) PML image and (b) QEP image when the tumor contrast of the phantom in Figure 7-1 was 0.75, and (c) PML image and (d) QEP image when the tumor contrast of the phantom in Figure 7-1 was 0.5. All images are from 200 iterations and they have the same background standard deviation, which is approximately 10. For the PML and QEP images, β = 1/16 and 1/32 were used, respectively. For the QEP images in (b) and (d), ξ = 80 was used. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that they have the same dynamic range.
Figure 7-8: Plots of the average contrast of the small tumor versus the average background standard deviation are shown in (a), (b), (c), and (d) when the synthetic phantom in Figure 7-1 was used with a tumor contrast of 3, 1.5, 0.75, and 0.5, respectively. Fifty synthetic data realizations were used in the study. For the PML and QEP algorithms, the images were reconstructed using two hundred iterations for β = 2^-1 – 2^-9. For the QEP curves in (b), (c), and (d), ξ = 80 was used, whereas ξ = 150 was used for the QEP curves in (a).
Figure 7-9: Software tumor phantoms are shown for different spacings between the two tumors: (a) Distance=4, Contrast=3; (b) Distance=3, Contrast=3; (c) Distance=2, Contrast=3; (d) Distance=2, Contrast=6. That is, for the phantoms in (a), (b), (c), and (d), the spacing between the tumors is 4, 3, 2, and 2 voxels, respectively, and the tumor contrast is 3, 3, 3, and 6, respectively. The regions surrounded by the dotted and dashed lines define the tumor intermediate region (i.e., M_I) and the background region, respectively. The images are shown with their own dynamic range.
Figure 7-10: Comparison of emission images: (a) PML image and (b) QEP image when the phantom in Figure 7-9(a) was used, and (c) PML image and (d) QEP image when the phantom in Figure 7-9(b) was used. All images are from 200 iterations and they have the same background standard deviation, which is approximately 10. For the PML and QEP images, β = 1/16 and 1/32 were used, respectively. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that they have the same dynamic range.
Figure 7-11: Comparison of emission images: (a) PML image and (b) QEP image when the phantom in Figure 7-9(c) was used, and (c) PML image and (d) QEP image when the phantom in Figure 7-9(d) was used. All images are from 200 iterations and they have the same background standard deviation, which is approximately 9. For the PML and QEP images, β = 1/16 and 1/32 were used, respectively. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that they have the same dynamic range.
Figure 7-12: Plots of the average distinguishability of the two tumors versus the average background standard deviation are shown in (a), (b), (c), and (d) when the synthetic phantoms in Figures 7-9(a), (b), (c), and (d) were used, respectively. Fifty synthetic data realizations were used in the study. For the PML and QEP algorithms, the images were reconstructed using two hundred iterations for β = 2^-1 – 2^-9.
with the synthetic phantom, the quadratic penalty function was used in the PWLS algorithm. Note, the parameters δ and ξ were chosen experimentally.

Figure 7-14 is a plot of the images for plane 21 that were obtained by the MLEM, MLEM-S, MLEM-F, PML, QEP, and PWLS algorithms. The number of iterations used to reconstruct the MLEM, PML, and PWLS images was 500, 200, and 200, respectively. Figure 7-15 shows the PML and QEP images for different iteration numbers when a 14 minute data set for plane 21 was used. As can be seen in Figure 7-15, early iterates of the PML and QEP algorithms can be used because the 100th iterates are not much different from the 500th iterates. As in the simulated data, the number of iterations for the QEP algorithm was set to the number of iterations used for the PML algorithm. For the PML, QEP, and PWLS images in Figure 7-14, β was chosen so that the standard deviations of their backgrounds are approximately the same (the background standard deviation is approximately 68). Using 9 iterations and applying a 5 × 5 Gaussian filter two times yielded the MLEM-S and MLEM-F images, respectively, with a background standard deviation of 68. Figures 7-14(d) and (e) illustrate that the tumors in the PML image and the QEP image are resolvable and differ significantly from the background. On the other hand, the tumors are not as distinct in the images produced by the MLEM-S, MLEM-F, and PWLS algorithms (see Figures 7-14(b), (c), and (f)). As expected, the MLEM image in Figure 7-14(a) is considerably noisier and grainier than the other images. In Figure 7-16, the images in Figure 7-14 are plotted with their own dynamic range. The true contrast of the small and large tumors was 7.38. Due to finite resolution effects, all of the algorithms underestimate the contrast [68]. Consequently, the algorithm that produces the greatest contrast is to be preferred. For the images in Figure 7-14, the large tumor contrast of the QEP image was −14%, 37%, 32%, 3%, and 34% higher than that of the MLEM, MLEM-S, MLEM-F, PML, and PWLS images,
respectively. The increased contrast of the QEP image for the small tumor with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was −28%, 46%, 46%, 9%, and 43%, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was −8%, 20%, 49%, 5%, and 35%, respectively. Although the contrast for "true" MLEM images (i.e., images obtained when the MLEM algorithm is not terminated early) can be quite large, the noise level of these images limits their practical use. Figure 7-17 is a line plot (the row is shown in Figure 7-13) of the images in Figures 7-14(b), (d), (e), and (f) (the image in Figure 7-14(a) is too noisy and the image in Figure 7-14(c) is too smooth). For the row under consideration, it can be seen from the line plots that the PML and QEP images have a higher degree of contrast than the other images. Moreover, the edges in the QEP image are sharper than those in the PML image. Figures 7-18(a) and (b) are plots of the average contrast of the large tumor and small tumor, respectively, versus the average background standard deviation using fifteen realizations for plane 21. Further, plots of the average standard deviation of the large tumor and the average distinguishability of the two tumors versus the average background standard deviation for the fifteen realizations are shown in Figures 7-18(c) and (d), respectively. Just like the contrast curves in the synthetic data simulations, the contrast curves of the QEP algorithm lie above the corresponding curves of the other algorithms. Thus, for a fixed level of background noise, the QEP images, on average, have the greatest contrast. Not too surprisingly, the large tumor average standard deviation curves of the QEP algorithm in Figure 7-18(c) generally lie above the corresponding curves of the PML algorithm. The PWLS algorithm produced large tumor standard deviation curves that lie below the corresponding curves generated by the other algorithms. However, the degree of smoothing produced by the PWLS algorithm was too great. This claim is supported qualitatively by Figure
7-14(f) and quantitatively by the contrast plots in Figures 7-18(a) and (b). The distinguishability curves of the PML and QEP algorithms lie above the curves of the other algorithms when the background standard deviation is less than 300. Above that point, the distinguishability curves of the PML and QEP algorithms lie below the curves of the other algorithms, except for the PWLS curve. However, images with a background standard deviation greater than 300 are too noisy for practical use. Thus, for practical noise levels, the PML and QEP images, on average, have the greatest contrast and distinguishability.

To see how the algorithms would perform in shorter duration protocols, we now consider fifteen realizations of 7 minute real phantom data for plane 21. Figures 7-19(a) and (b) are plots of the average contrast of the large tumor and small tumor, respectively, versus the average background standard deviation using the fifteen realizations. A plot of the average standard deviation of the large tumor versus the average background standard deviation for the fifteen realizations is shown in Figure 7-19(c). Also, in Figure 7-19(d), we provide a plot of the average distinguishability of the two tumors versus the average background standard deviation for the fifteen realizations. As in the experiments with the 14 minute real phantom data, it is evident that the PML and QEP algorithms outperform the MLEM-S, MLEM-F, and PWLS algorithms in terms of contrast recovery and tumor distinguishability. Figure 7-20 is a plot of the images obtained by the MLEM, MLEM-S, MLEM-F, PML, QEP, and PWLS algorithms for a 7 minute data set for plane 21. The number of iterations used to reconstruct the MLEM, PML, QEP, and PWLS images was 500, 200, 200, and 200, respectively. For the PML, QEP, and PWLS images in Figure 7-20, β was chosen so that the standard deviation of their backgrounds is approximately 38. Using 8 iterations and applying a 5 × 5 Gaussian filter two times yielded the MLEM-S and MLEM-F images, respectively, with a background standard deviation of 38. As in the experiments with the 14 minute data, the tumors in the PML image
Figure 7-13: The mean of 15 MLEM emission images reconstructed from 14 minute data for plane 21. The boxes indicate the regions used to compute the contrast and distinguishability of the tumors. The solid and dashed lines define the tumor and background regions, respectively. The region surrounded by the dotted lines defines the tumor intermediate region (i.e., M_I). The dotted line indicates the row chosen for the line plots.

and the QEP image are resolvable and differ significantly from the background. In Figure 7-21, the images in Figure 7-20 are plotted with their own dynamic range. For the images in Figure 7-20, the large tumor contrast of the QEP image was 20%, 37%, 25%, 5%, and 25% higher than that of the MLEM, MLEM-S, MLEM-F, PML, and PWLS images, respectively. The increased contrast of the QEP image for the small tumor with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was −58%, 34%, 27%, 5%, and 23%, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was −30%, 12%, 42%, 6%, and 25%, respectively. Figure 7-22 is a line plot (the row is shown in Figure 7-13) of the images in Figures 7-20(b), (d), (e), and (f). As in the experiments with the 14 minute data, it can be seen from the line plots that the PML and QEP images have a higher degree of contrast than the other images.
Figure 7-14: Comparison of emission images when 14 minute real phantom data for plane 21 were used: (a) MLEM image, (b) MLEM-S image, (c) MLEM-F image, (d) PML image, (e) QEP image, and (f) PWLS image. The images in (a) and (b) are from 500 and 9 iterations, respectively, while the images in (d), (e), and (f) are from 200 iterations. The image in (c) was obtained by filtering the MLEM image two times with a 5 × 5 Gaussian filter with a standard deviation of 1.95 voxels. For the PML, QEP, and PWLS images, β was chosen in such a way that the standard deviation of the background is approximately 68. Specifically, β = 2^-6, 2^-7, and 0.00017 for the PML, QEP, and PWLS images, respectively. Note, the standard deviation of the background of the images in (b) and (c) is also approximately 68. For display purposes, all the images were adjusted so that they have the same dynamic range, except the MLEM image because it has a very wide dynamic range.
Figure 7-15: Iteration comparison of emission images reconstructed from 14 minute real phantom data for plane 21: (a) PML image using 100 iterations, (b) QEP image using 100 iterations, (c) PML image using 200 iterations, (d) QEP image using 200 iterations, (e) PML image using 500 iterations, and (f) QEP image using 500 iterations. For the PML and QEP images, β = 2^-6 and 2^-7, respectively. For display purposes, all the images were adjusted so that they have the same dynamic range.
Figure 7-16: The emission images in Figure 7-14 are shown with their own dynamic range.
Figure 7-17: A line plot comparison of the reconstructed images in Figures 7-14(a), (b), and (c) is shown in (a). A line plot comparison of the reconstructed images in Figures 7-14(d), (e), and (f) is shown in (b).
Figure 7-18: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifteen realizations of 14 minute real phantom data for plane 21 were used in the study. For the MLEM-S curves, the images from iterations 5–160 were used. For the MLEM-F curves, the MLEM images were filtered once by 5 × 5 Gaussian filters with a standard deviation range of 0.44–3.0 voxels (each voxel is 3.43 × 3.43 mm²). For the PML, QEP, and PWLS algorithms, the images were reconstructed using two hundred iterations for β = 2^-4 – 2^-12, 2^-4 – 2^-12, and 2^-12 – 2^-20, respectively.
Figure 7-19: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifteen realizations of 7 minute real phantom data for plane 21 were used in the study. For the MLEM-S curves, the images from iterations 5–160 were used. For the MLEM-F curves, the MLEM images were filtered once by 5 × 5 Gaussian filters with a standard deviation range of 0.44–3.0 voxels (each voxel is 3.43 × 3.43 mm²). For the PML, QEP, and PWLS algorithms, the images were reconstructed using two hundred iterations for β = 2^-4 – 2^-12, 2^-4 – 2^-12, and 2^-12 – 2^-20, respectively.
Figure 7-20: Comparison of emission images when 7 minute real phantom data for plane 21 were used: (a) MLEM image, (b) MLEM-S image, (c) MLEM-F image, (d) PML image, (e) QEP image, and (f) PWLS image. The images in (a) and (b) are from 500 and 8 iterations, respectively, while the images in (d), (e), and (f) are from 200 iterations. The image in (c) was obtained by filtering the MLEM image two times with a 5 × 5 Gaussian filter with a standard deviation of 1.95 voxels. For the PML, QEP, and PWLS images, β was chosen in such a way that the standard deviation of the background is approximately 38. Specifically, β = 0.029, 0.0145, and 0.000155 for the PML, QEP, and PWLS images, respectively. Note, the standard deviation of the background of the images in (b) and (c) is also approximately 38. For display purposes, all the images were adjusted so that they have the same dynamic range, except the MLEM image because it has a very wide dynamic range.
Figure 7-21: The emission images in Figure 7-20 are shown with their own dynamic range.
Figure 7-22: A line plot comparison of the reconstructed images in Figures 7-20(a), (b), and (c) is shown in (a). A line plot comparison of the reconstructed images in Figures 7-20(d), (e), and (f) is shown in (b).
7.2 APML Algorithm

To evaluate the APML algorithm in Chapter 4, we applied it to plane 21 from the real thorax phantom data (unless noted, the data are from 14 minute scans) and compared it to the PML algorithm and an algorithm by Ahn and Fessler [35] called the block sequential regularized expectation-maximization II (BSREM-II) algorithm. The BSREM-II algorithm is a straightforward modification of Ahn and Fessler's BSREM-I algorithm [35]: it results by setting any negative element of a BSREM-I iterate to a small positive number. The BSREM-I algorithm is based on the BSREM algorithm by De Pierro and Yamagishi [33] and the ordered-subsets idea originally put forth by Hudson and Larkin [45]. We used the penalty parameter β = 0.02 and 0.04 for the 14 minute data and 7 minute data, respectively, and the log-cosh function λ(t) = log(cosh(t/δ)) with δ = 50 as the penalty function. The values of β and δ were chosen experimentally such that the reconstructed images were visually "good". For the BSREM-II algorithm, 8 and 24 subsets were used, along with the ordering rule suggested by Ahn and Fessler [35] (i.e., make the projections in two successive subsets as perpendicular to each other as possible); this ordering rule was originally introduced by Herman and Meyer [69]. Additionally, the relaxation parameter rule specified by Ahn and Fessler [35] was used in the implementation.

In Figure 7-23, plots of the cost versus CPU time are shown for the APML algorithm for different values of ε when the 14 minute data were used. As can be seen in the figure, the convergence rate of the APML algorithm depends on ε. Moreover, ε = 0 generated the slowest convergence rate for the considered planes. These claims are supported by Figure 7-24, in which plots of the number of iterations versus ε are shown for the APML algorithm to decrease the PML objective function to Φ(x*), where x* is the 5000th iterate of the PML algorithm. Practically speaking, for the planes considered, x* is the minimizer of the PML objective function because the PML
algorithm did not decrease the PML objective function between about 1000 and 5000 iterations. For 0 < ε < 0.01, the number of iterations required for the APML algorithm to decrease the PML objective function to Φ(x*) varied from 83 to 125 and from 113 to 185 for planes 10 and 21, respectively. For ε = 0, the numbers of iterations required were 359 and 652 for planes 10 and 21, respectively. For 0 < ε < 0.01, the decrease in convergence time (in CPU seconds) of the APML algorithm with respect to the PML algorithm was 71%–81% and 75%–85% for planes 10 and 21, respectively. For ε = 0, the decrease in convergence time of the APML algorithm with respect to the PML algorithm was 17% and 12% for planes 10 and 21, respectively. In Figure 7-25, plots of the cost versus CPU time are shown for the PML, APML, and BSREM-II algorithms. As can be seen in the figure, the APML algorithm with ε = 0.01 decreased the PML objective function more than about three times faster than the PML algorithm. The APML algorithm with ε = 0.01 needed only about 27% and 18% of the CPU time that was necessary for the PML algorithm to decrease the PML objective function to Φ(x*) for planes 10 and 21, respectively. At the early iterations (CPU time less than about 10 seconds), the BSREM-II algorithm with 8 subsets produced the greatest decrease. However, after about 10 seconds of CPU time, the APML algorithm with ε = 0.01 decreased the PML objective function faster than the other algorithms. It should be pointed out that the BSREM-II algorithm did not decrease the PML objective function to Φ(x*) within 10,000 iterations, whereas Φ(x*) was reached by the APML algorithm with ε = 0.01 in 116 and 136 iterations for planes 10 and 21, respectively. Moreover, the convergence rate of the BSREM-II algorithm depends significantly on the number of subsets, as shown in Figure 7-25.
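For reference, the BSREM-II safeguard described at the start of this section, replacing any negative element of a BSREM-I iterate with a small positive number, is a one-line clamp; a sketch, where the threshold 1e-8 is an illustrative choice rather than the value used in [35]:

```python
import numpy as np

def bsrem2_step(x_bsrem1, eps=1e-8):
    """Map a BSREM-I iterate to a BSREM-II iterate by replacing its
    negative elements with the small positive number eps."""
    return np.where(x_bsrem1 < 0, eps, x_bsrem1)
```

The clamp keeps the iterate strictly inside the region where the Poisson log-likelihood terms remain well defined.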
To decrease the PML objective function faster at the early iterations, an alternative would be to first use the ordered subsets algorithm for a few iterations and then switch to the APML algorithm. To do this, we divide the emission data into a few subsets (e.g., 8 or 24) according to the ordering rule by Herman and Meyer [69].
For the rth subset, we define a sub-objective function as in [35]:

Φ_r(x) = Σ_{i∈M_r} { [Px]_i − d_i log([Px + ρ]_i) + ρ_i + log(d_i!) } + (β/R) Λ(x) ,   (7.3)

where R is the number of subsets, M_r is the set of indices corresponding to the emission data within the rth subset, and r = 1, 2, ..., R. Note that Φ(x) = Σ_{r=1}^R Φ_r(x). For each subset and the corresponding sub-objective function, a sub-iteration is performed sequentially. At the (n, r)th sub-iteration, a surrogate function for the rth sub-objective function is constructed using (3.29), (3.30), (3.31), and (3.32) at the (n, r)th sub-iterate x^(n,r). The next sub-iterate x^(n,r+1) is defined to be the nonnegative minimizer of the surrogate function. After one pass through all of the sub-iterations, we define x^(n+1) = x^(n,R+1) and x^(n+1,1) = x^(n+1) for the next pass of sub-iterations. We refer to the above iteration as the ordered subset PML (OS-PML) iteration.

Figure 7-26 shows plots of the cost versus CPU time for the BSREM-II algorithm with 8 subsets and the APML algorithm with a few OS-PML iterations (2, 3, and 4) for 8 and 24 subsets. As can be seen in the figure, the APML algorithm with a few OS-PML iterations decreases the PML objective function faster than the BSREM-II algorithm at the early iterations. Specifically, the APML algorithm with the ordered-subsets idea needed less than about 8 seconds of CPU time to decrease the PML objective function to the cost that the BSREM-II algorithm reached in 10 seconds. The convergence rate depends on the number of OS-PML iterations and the number of subsets, but the degree of dependency is not great, as shown in the figure. In Figure 7-27, iterates for plane 10 produced by the BSREM-II algorithm with 8 subsets and the APML algorithm with 2 OS-PML iterations for 24 subsets are shown for different CPU times. As shown in the figure, the iterates of the APML and BSREM-II algorithms look very similar to each other even though their associated costs are different.
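The pass structure of the OS-PML iteration described above can be sketched generically. The per-subset surrogate minimization is abstracted as a callable placeholder (the actual update built from (3.29)-(3.32) is not reproduced here), so this is a structural sketch only:

```python
import numpy as np

def ospml_pass(x, subsets, subiteration_step):
    """One pass of ordered-subsets sub-iterations.

    subsets           : list of index arrays M_1, ..., M_R into the emission data
    subiteration_step : callable (x, M_r) -> next sub-iterate; a hypothetical
                        stand-in for minimizing the surrogate of the r-th
                        sub-objective function Phi_r.
    """
    for M_r in subsets:                # r = 1, 2, ..., R, visited in order
        x = subiteration_step(x, M_r)  # x^(n, r+1) from x^(n, r)
    return x                           # x^(n+1) = x^(n, R+1)

# toy illustration: each "sub-iteration" pulls x halfway toward the subset mean of d
d = np.array([4.0, 8.0, 2.0, 6.0])
subsets = [np.array([0, 2]), np.array([1, 3])]     # R = 2
step = lambda x, M: 0.5 * (x + d[M].mean())
x_next = ospml_pass(1.0, subsets, step)            # one full pass from x = 1.0
```

The ordering of `subsets` is where the Herman-Meyer rule would enter: it only changes the sequence in which the index sets are visited.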
Figure 7-28 shows iterates for plane 21 produced by the BSREM-II algorithm with 8 subsets and the APML algorithm with 2 OS-PML iterations for 24 subsets for different CPU times.

Now, we consider the 7 minute real phantom data. In Figure 7-29, plots of the cost versus CPU time are shown for the PML, APML, and BSREM-II algorithms when 7 minute data were used. As can be seen in the figure, the convergence rates of the algorithms are similar to the ones for 14 minute data, except for the BSREM-II algorithm with 24 subsets, which converged much more slowly than the other algorithms. Again, this indicates that the convergence rate of the BSREM-II algorithm significantly depends on the number of subsets. Figure 7-30 shows plots of the cost versus CPU time for the BSREM-II algorithm with 8 subsets and the APML algorithm with a few OS-PML iterations (2, 3, and 4) for 8 and 24 subsets when 7 minute data were used. As in the experiments with the 14 minute data, the APML algorithm with a few OS-PML iterations decreases the PML objective function faster than the BSREM-II algorithm with 8 subsets at the early iterations. For the 7 minute data case, we do not include the corresponding images because the visual comparisons were similar to those observed in the images for the 14 minute data.

To see whether the convergence rates in Figures 7-25, 7-26, 7-29, and 7-30 are "typical", we averaged the convergence rate over fifteen realizations. Since the cost varies for each realization, we used the normalized-cost-difference, which is defined as

( Φ(x) − Φ(x^(100)) ) / ( Φ(x^(0)) − Φ(x^(100)) ) ,   (7.4)

where x^(0) and x^(100) are the uniform initial estimate and the 100th APML iterate, respectively. The reason why we used x^(100) is that the APML algorithm "almost" converges within 100 iterations, which means that the PML objective function does not decrease appreciably after 100 APML iterations. In Figures 7-31 and 7-32, plots of the average normalized-cost-difference (the normalized-cost-difference averaged over fifteen realizations) versus CPU time are shown for the PML, APML, and BSREM-II
algorithms when fifteen 7 and 14 minute data sets were used, respectively. Figures 7-33 and 7-34 show plots of the average normalized-cost-difference versus CPU time for the BSREM-II algorithm with 8 subsets and the APML algorithm with a few OS-PML iterations (2, 3, and 4) for 8 and 24 subsets when fifteen 7 and 14 minute data sets were used, respectively. The figures confirm that the observed convergence rates in Figures 7-25, 7-26, 7-29, and 7-30 are "typical".

Table 7-1 contains the CPU times, memory accesses, floating point operations, and the numbers of function calls (√t and λ(t)) per iteration for the algorithms used in the experiments. From the table, it is seen that the APML algorithm greatly reduced the number of iterations for convergence compared with the PML algorithm. Convergence in the table means that an algorithm decreases the PML objective function until it equals the cost of the 5000th iterate of the PML algorithm, denoted by x*. Although the BSREM-II algorithm did not converge within 10,000 iterations, the algorithm "almost" converged at that point because Φ(x^(10000)) − Φ(x*) ≈ 1, where x^(10000) is the 10,000th BSREM-II iterate.
(a) Convergence Rate (plane 10) (b) Convergence Rate (plane 21)
Figure 7-23: Convergence rate comparison of the APML algorithm for different values of ε when 14 minute data were used.
(a) Number of Iterations to Converge (plane 10) (b) Number of Iterations to Converge (plane 21)
Figure 7-24: Number of iterations for the APML algorithm to decrease the PML objective function to Φ(x*), shown for different values of ε when 14 minute data were used, where x* is the 5000th iterate of the PML algorithm.
(a) Convergence Rate (plane 10) (b) Convergence Rate (plane 21)
Figure 7-25: Convergence rate comparison of the PML, APML, and BSREM-II algorithms when 14 minute data were used.
(a) Convergence Rate (plane 10) (b) Convergence Rate (plane 21)
Figure 7-26: Convergence rate comparison of the APML algorithm with different numbers of OS-PML iterations for 8 and 24 subsets, and the BSREM-II algorithm with 8 subsets, when 14 minute data were used. For the APML algorithm, ε = 0.01 was used.
(a) APML, 5 sec (b) BSREM-II, 5 sec (c) APML, 15 sec (d) BSREM-II, 15 sec (e) APML, 25 sec (f) BSREM-II, 25 sec (g) APML, 35 sec (h) BSREM-II, 35 sec
Figure 7-27: APML and BSREM-II iterates for plane 10 are shown for different CPU times when 14 minute data were used. The APML iterates were generated by the APML algorithm with ε = 0.01 and 2 OS-PML iterations for 24 subsets. The BSREM-II iterates were generated by the BSREM-II algorithm with 8 subsets. For display purposes, all the images were adjusted so that they have the same dynamic range.
(a) APML, 5 sec (b) BSREM-II, 5 sec (c) APML, 15 sec (d) BSREM-II, 15 sec (e) APML, 25 sec (f) BSREM-II, 25 sec (g) APML, 35 sec (h) BSREM-II, 35 sec
Figure 7-28: APML and BSREM-II iterates for plane 21 are shown for different CPU times when 14 minute data were used. The APML iterates were generated by the APML algorithm with ε = 0.01 and 2 OS-PML iterations for 24 subsets. The BSREM-II iterates were generated by the BSREM-II algorithm with 8 subsets. For display purposes, all the images were adjusted so that they have the same dynamic range.
(a) Convergence Rate (plane 10) (b) Convergence Rate (plane 21)
Figure 7-29: Convergence rate comparison of the PML, APML, and BSREM-II algorithms when 7 minute data were used.
(a) Convergence Rate (plane 10) (b) Convergence Rate (plane 21)
Figure 7-30: Convergence rate comparison of the APML algorithm with different numbers of OS-PML iterations for 8 and 24 subsets, and the BSREM-II algorithm with 8 subsets, when 7 minute data were used. For the APML algorithm, ε = 0.01 was used.
(a) Convergence Rate (plane 10) (b) Convergence Rate (plane 21)
Figure 7-31: Average convergence rate comparison of the PML, APML, and BSREM-II algorithms when 7 minute data were used.
(a) Convergence Rate (plane 10) (b) Convergence Rate (plane 21)
Figure 7-32: Average convergence rate comparison of the PML, APML, and BSREM-II algorithms when 14 minute data were used.
(a) Convergence Rate (plane 10) (b) Convergence Rate (plane 21)
Figure 7-33: Average convergence rate comparison of the APML algorithm with different numbers of OS-PML iterations for 8 and 24 subsets, and the BSREM-II algorithm with 8 subsets, when 7 minute data were used. For the APML algorithm, ε = 0.01 was used.
(a) Convergence Rate (plane 10) (b) Convergence Rate (plane 21)
Figure 7-34: Average convergence rate comparison of the APML algorithm with different numbers of OS-PML iterations for 8 and 24 subsets, and the BSREM-II algorithm with 8 subsets, when 14 minute data were used. For the APML algorithm, ε = 0.01 was used.
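The normalized-cost-difference plotted in the average convergence rate comparisons can be computed directly from a recorded cost trace. The defining formula was garbled in the source; the form below, (Φ(x) − Φ(x^(100))) / (Φ(x^(0)) − Φ(x^(100))), is an assumption reconstructed from the surrounding description (it equals 1 at the initial estimate and falls to 0 at the reference iterate):

```python
def normalized_cost_difference(cost, cost_init, cost_ref):
    """Assumed normalization of the cost.

    cost      : Phi(x) at the current iterate
    cost_init : Phi(x^(0)), the uniform initial estimate
    cost_ref  : Phi(x^(100)), the 100th APML iterate (treated as converged)
    """
    return (cost - cost_ref) / (cost_init - cost_ref)

# hypothetical cost trace: starts at 100.0 and converges to 10.0
trace = [100.0, 40.0, 16.0, 10.0]
ncd = [normalized_cost_difference(c, trace[0], trace[-1]) for c in trace]
```

Because the normalization removes the realization-dependent offset and scale of the cost, curves from different noise realizations become directly comparable before averaging.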
Table 7-1: Comparison of the computational complexity per iteration.

                      PML            BSREM-II          APML
CPU time (sec)        0.271565       0.58836 (R = 8)   0.49828
Memory read           2P             2P                3P
Mul/Div               2P + I + 24J   2P + I + 12RJ     4P + 8I + 58J
Add/Sub               2P + 31J       2P + I + 16RJ     4P + 3I + 66J
√t calls              J              J                 8J
λ(t) calls            8J             8RJ               16J
N_c for plane 10      791            N/A               359 (ε = 0), 116 (ε = 0.01)
N_c for plane 21      1362           N/A               652 (ε = 0), 136 (ε = 0.01)

The letters P and R denote the number of nonzero elements in the probability matrix P and the number of subsets for the BSREM-II algorithm, respectively. The letters I and J represent the numbers of data points and voxels, respectively. N_c denotes the number of iterations for convergence. All the algorithms were computed on a Dell Inspiron 5150 computer. Convergence in the table means that an algorithm decreases the PML objective function until it equals Φ(x*), where x* is the 5000th PML iterate.

probability matrix. In other words, given the MLEM image based on P_c, the PCiPS algorithm first estimated the probability matrix, referred to as P^true, and then images were reconstructed by the APML algorithm with 100 iterations. In the PCiPS algorithm, two choices (15 and 63) for the parameter τ were examined in the experiments. We used the log-cosh function λ(t) = log(cosh(t/δ)) with δ = 50 as the penalty function. The images in Figures 7-35 (a) and (b) were obtained by the MLEMS and APML algorithms with P_c. The images in Figures 7-35 (c) and (d) were obtained by the APML algorithm using P^true with τ = 15 and τ = 63, respectively. For the APML images in Figure 7-35, β was chosen so that the background standard deviation is approximately the same (approximately 48). Using 5 iterations, the MLEMS image with a background standard deviation of
48 was obtained. Figure 7-35 (a) illustrates that the tumors in the MLEMS image are too smooth. Figures 7-35 (b), (c), and (d) show that the APML images look visually similar to each other. However, they are quite different, especially around the tumors. This claim is supported by Figure 7-36, which is a line plot (the row is shown in Figure 7-13) of the images in Figure 7-35. For the row under consideration, it can be seen from the line plots that the APML images in Figures 7-35 (c) and (d) have a higher degree of contrast than the other images. For the images in Figure 7-35, the large tumor contrast of the APML image in (d) was 59%, 17%, and 7% higher than in the images in (a), (b), and (c), respectively. The increased contrast of the APML image in (d) for the small tumor with respect to the images in (a), (b), and (c) was 60%, 20%, and 14%, respectively. The increased tumor distinguishability of the APML image in (d) with respect to the images in (a), (b), and (c) was 30%, 10%, and 5%, respectively.

Figures 7-37 (a) and (b) are plots of the average contrast of the large tumor and small tumor versus the average background standard deviation using fifteen realizations for plane 21, respectively. Further, plots of the average standard deviation of the large tumor and the average distinguishability of the two tumors versus the average background standard deviation for the fifteen realizations are shown in Figures 7-37 (c) and (d), respectively. The contrast and distinguishability curves of the APML algorithm with the estimated probability matrices lie above the corresponding curves of the other algorithms. Thus, for a fixed level of background noise, the APML images generated using the estimated probability matrices, on average, have the greatest contrast and distinguishability. Now, we consider fifteen realizations of the 7 minute real phantom data for plane 21.
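The percentage comparisons reported above can be reproduced once the contrast metric is fixed. The definitions below, contrast as the tumor-to-background relative difference and "X% higher" as the relative increase between two contrast values, are illustrative assumptions, since the dissertation's exact definitions appear in an earlier chapter not reproduced here:

```python
import numpy as np

def contrast(tumor_pixels, background_pixels):
    """Assumed metric: (mean(tumor) - mean(background)) / mean(background)."""
    b = float(np.mean(background_pixels))
    return (float(np.mean(tumor_pixels)) - b) / b

def percent_higher(c_new, c_old):
    """Relative increase of one contrast value over another, in percent."""
    return 100.0 * (c_new - c_old) / c_old

# toy example with a uniform background at 10 counts
bg = np.full(100, 10.0)
c_low = contrast(np.array([15.0, 15.0]), bg)    # 0.5
c_high = contrast(np.array([18.0, 18.0]), bg)   # 0.8
gain = percent_higher(c_high, c_low)            # 60.0 percent
```

Note that comparisons of this kind are only meaningful at a matched background standard deviation, which is why β was tuned to equalize the background noise first.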
Figures 7-38 (a) and (b) are plots of the average contrast of the large tumor and small tumor versus the average background standard deviation using fifteen realizations, respectively. A plot of the average standard deviation of the large tumor versus the average background standard deviation for the fifteen realizations is shown in Figure 7-38 (c). Also, in Figure 7-38 (d), we provide a plot of the average distinguishability of the two tumors versus the average background standard deviation for the fifteen realizations. As in the experiments with the 14 minute real phantom data, it is evident that the APML algorithm with the estimated probability matrices outperforms the MLEMS and APML algorithms with P_c in terms of contrast recovery and tumor distinguishability. Figures 7-39 (a) and (b) are the images obtained by the MLEMS and APML algorithms using P_c, respectively, for a 7 minute data set. Figures 7-39 (c) and (d) are the images obtained by the APML algorithm using P^true with τ = 15 and τ = 63, respectively, for a 7 minute data set. Figure 7-39 (a) is the 7th MLEM iterate. For the APML algorithm, 100 iterations were used. For the images in Figures 7-39 (b), (c), and (d), β was chosen so that the standard deviation of their backgrounds is approximately 36. As in the experiments with the 14 minute data, the images in Figures 7-39 (b), (c), and (d) look similar to each other. Figure 7-40 is the line plot of the images in Figure 7-39. Also, as in the experiments with the 14 minute data, for the row under consideration, the APML images in Figures 7-39 (c) and (d) have a higher degree of contrast than the other images. For the images in Figure 7-39, the large tumor contrast of the APML image in (d) was 50%, 19%, and 6% higher than in the images in (a), (b), and (c), respectively. The increased contrast of the APML image in (d) for the small tumor with respect to the images in (a), (b), and (c) was 47%, 19%, and 7%, respectively. The increased tumor distinguishability of the APML image in (d) with respect to the images in (a), (b), and (c) was 18%, 10%, and −1%, respectively.
(a) MLEMS (b) APML (c) PCiPS, τ = 15 (d) PCiPS, τ = 63
Figure 7-35: Comparison of emission images when a 14 minute real phantom data set for plane 21 was used: (a) MLEMS image with P_c, (b) APML image with P_c, (c) APML image with P^true and τ = 15, (d) APML image with P^true and τ = 63. The image in (a) is from 5 iterations, while the images in (b), (c), and (d) are from 100 iterations. For the images in (b), (c), and (d), β was chosen in such a way that the standard deviation of the background is approximately 48. Specifically, β = 1/32, 1/32, and 0.028 for the images in (b), (c), and (d), respectively. For display purposes, all the images were adjusted so that they have the same dynamic range.
Figure 7-36: A line plot comparison of the reconstructed images in Figure 7-35.
(a) Large Tumor (b) Small Tumor (c) Large Tumor (d) Distinguishability
Figure 7-37: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifteen realizations were used, and 14 minute real phantom data for plane 21 were used in the study. For the MLEMS curves, the images from iterations 5-160 were used. For the other curves, the images were reconstructed using two hundred iterations for β = 2^−4 to 2^−12.
(a) Large Tumor (b) Small Tumor (c) Large Tumor (d) Distinguishability
Figure 7-38: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifteen realizations were used, and 7 minute real phantom data for plane 21 were used in the study. For the MLEMS curves, the images from iterations 5-160 were used. For the other curves, the images were reconstructed using two hundred iterations for β = 2^−4 to 2^−12.
(a) MLEMS (b) APML (c) PCiPS, τ = 15 (d) PCiPS, τ = 63
Figure 7-39: Comparison of emission images when a 7 minute real phantom data set for plane 21 was used: (a) MLEMS image with P_c, (b) APML image with P_c, (c) APML image with P^true and τ = 15, (d) APML image with P^true and τ = 63. The image in (a) is from 7 iterations, while the images in (b), (c), and (d) are from 100 iterations. For the images in (b), (c), and (d), β was chosen in such a way that the standard deviation of the background is approximately 36. Specifically, β = 1/32, 1/32, and 0.028 for the images in (b), (c), and (d), respectively. For display purposes, all the images were adjusted so that they have the same dynamic range.
Figure 7-40: A line plot comparison of the reconstructed images in Figure 7-39.
CHAPTER 8
CONCLUSIONS AND FUTURE WORK

8.1 Conclusions

The PML algorithm we developed for reconstructing emission images generates nonnegative emission mean estimates and monotonically decreases the PML objective function. The algorithm is straightforward to implement and can incorporate any penalty function that satisfies the mild assumptions (AS3)-(AS8). Under certain conditions (i.e., (AS1)-(AS8)), the PML objective function is strictly convex. For the case where the PML objective function is strictly convex, it has been proven that the PML algorithm converges to the minimizer of the PML objective function. Although the tradeoff between resolution and noise can be controlled by certain regularization hyperparameters (i.e., β and δ), like many researchers, we have not determined a way to choose the parameters so that the data fit and prior knowledge are optimally "balanced".

A fast version of the PML algorithm, called the APML algorithm, was developed that retains the properties of the PML algorithm. The APML algorithm is based on the PML algorithm and the pattern search idea of Hooke and Jeeves. However, we modified the direction vector to account for the PET image reconstruction problem. The APML algorithm generates nonnegative emission mean estimates, monotonically decreases the PML objective function, and can accommodate the same class of penalty functions as the PML algorithm. Importantly, it has been proven that the APML algorithm converges to the minimizer of the PML objective function when the PML objective function is strictly convex. Although the APML algorithm requires an additional parameter ε that is used to define the direction vector, experiments using real phantom data demonstrated that fast convergence rates were obtained over a
wide range of values for ε (e.g., 0 < ε < 0.01). This means that the APML algorithm is robust with respect to the parameter ε. In experiments using real phantom data, it was shown that the APML algorithm decreased the PML objective function about three times faster than the PML algorithm. Specifically, the APML algorithm for 0 < ε < 0.01 needed about one third of the CPU time that was necessary for the PML algorithm to decrease the PML objective function to a "practical minimizer" of the objective function. By practical minimizer, we mean a PML iterate whose PML objective value (i.e., cost) cannot, practically speaking, be appreciably decreased with additional iterations. Specifically, the minimum PML objective function value resulted from the 5000th iteration of the PML algorithm. We compared the convergence rate of the APML algorithm to an ordered-subsets method in [35] because the ordered-subsets based algorithm is said to converge to the nonnegative minimizer of the PML objective function. At the early iterations, the ordered-subsets method decreased the PML objective function at a faster rate than the APML algorithm. However, after about 10 seconds of CPU time, the APML algorithm decreased it faster. It was also shown that when the APML algorithm utilizes an ordered-subsets based PML iteration for a few early iterations, the resulting algorithm decreased the PML objective function more than about 1.2 times faster than the ordered-subsets method at the early iterations (i.e., less than 10 seconds of CPU time).

In addition to the PML and APML algorithms, we also proposed a regularized image reconstruction algorithm we call the QEP algorithm. The QEP algorithm obtains regularized estimates of the emission means through the use of an iteration dependent penalty function that serves to preserve edges in the reconstructed images.
The definition of the penalty function was motivated by an analysis of the surrogate function for a penalty function that is utilized by the PML algorithm. In the QEP algorithm, at each iteration, the iteration dependent penalty function is found and
the next iterate is, in principle, defined to be a nonnegative minimizer of the sum of the negative log-likelihood function and the penalty function. However, since it is not possible to solve this constrained optimization problem directly, the QEP algorithm instead defines the next iterate to be a nonnegative minimizer of the sum of the decoupled surrogate function for the negative log-likelihood function and the iteration dependent penalty function. It is important to understand that the QEP algorithm defines a new objective function to be minimized at each iteration. Thus, unlike the PML and APML algorithms, the QEP algorithm does not minimize a single objective function. This means that the usual mathematical tools for investigating convergence are unavailable. Despite its theoretical drawbacks, the QEP algorithm performed extremely well in experiments with computer-generated phantom data and real thorax phantom data, and outperformed the PML (or APML) algorithm and the PWLS algorithm [42] in terms of contrast recovery. In the experiments, the images produced by the PML (or APML) and QEP algorithms had greater contrast and "distinguishability" than the MLEMS and PWLS images for a fixed level of background noise. The MLEMS images were produced by early termination of the MLEM algorithm [16] and the PWLS images were produced by the PWLS algorithm [42]. Specifically, for a 14 minute real phantom data set, the contrast of the large tumor in the QEP image was 37%, 3%, and 34% higher than in the MLEMS, PML, and PWLS images, respectively. With respect to the MLEMS, PML, and PWLS images, the contrast of the small tumor in the QEP image was 46%, 9%, and 43% higher, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEMS, PML, and PWLS images was 20%, 5%, and 35%, respectively. Moreover, qualitatively speaking, the spatial extent of the tumors was more easily resolved in the PML and QEP images.
Since the QEP algorithm yielded the greatest contrast for a fixed level of background noise, it may be particularly well suited for tumor detection applications.
Errors caused by scatter, noncollinearity, detector penetration, and positron range have the net effect of introducing blur into PET images. In theory, these errors can be corrected by determining a suitable probability matrix. However, it is difficult to determine such a probability matrix because the required modeling is impractical. In Chapter 6, we first assumed that the "true" probability matrix for the observed emission data is a product of an unknown nonnegative matrix, called a scatter matrix, and a "conventional" probability matrix. The conventional probability matrix is generated from the geometry of the PET scanner and the image space to be reconstructed, along with certain standard corrections for errors. We developed a method, referred to as the joint minimum Kullback-Leibler (KL) distance method, that aims to reduce blur in PET images. In the joint minimum KL distance method, the scatter matrix and emission means are jointly estimated by minimizing the KL distance between the data and the model parameters. Because of the difficulty of the minimization problem, the number of unknowns in the scatter matrix is reduced and an alternating minimization algorithm is developed. Thus, given an estimate for the scatter matrix, an estimate for the emission means is obtained. The estimate for the emission means can then be used to generate an improved estimate for the scatter matrix. Alternating between these two steps leads to the desired estimates for the scatter matrix and emission means. Once the estimate of the scatter matrix is obtained, the estimate for the true probability matrix is the product of the estimated scatter matrix and the conventional probability matrix. Then, a regularized image reconstruction algorithm is applied to the emission data using the estimated true probability matrix. In the experiments with real phantom data, the APML algorithm was employed because of its fast convergence.
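The alternating scheme summarized above can be written as a generic two-block loop. Both update callables are placeholders standing in for the actual KL-distance minimizations over the scatter-matrix parameters and the emission means; only the alternation structure is taken from the text:

```python
def alternate(scatter, update_scatter, update_means, n_outer=10):
    """Generic alternating minimization: fix one block, update the other.

    update_means(scatter) -> emission-mean estimate given the scatter matrix
    update_scatter(means) -> scatter-matrix estimate given the emission means
    Each step should not increase the joint objective, so the KL distance
    is monotone along the iterates.
    """
    means = update_means(scatter)
    for _ in range(n_outer):
        scatter = update_scatter(means)
        means = update_means(scatter)
    return scatter, means

# toy stand-ins with a common fixed point at 4.0 (illustration only)
s, m = alternate(0.0,
                 update_scatter=lambda m: 0.5 * (m + 4.0),
                 update_means=lambda s: 0.5 * (s + 4.0))
```

In the toy example the two updates contract toward a shared fixed point, mimicking how the scatter and emission estimates stabilize jointly.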
The contrast of the reconstructed images generated using the estimated probability matrix was more accurate than that of the reconstructed images generated using the conventional probability matrix.
8.2 Future Work

Although the APML algorithm decreases the PML objective function faster than the PML algorithm in experiments, it is not clear how fast the APML algorithm converges in theory. Consequently, it would be worthwhile to determine the theoretical rate of convergence of the APML algorithm. A related research direction would be to determine the parameter ε that maximizes the convergence rate of the APML algorithm. Keeping in mind Hooke and Jeeves's idea, it is possible that the convergence rate of the APML algorithm could be improved by using a "better" direction vector. We say this because, in experiments where the "best" direction vector x^(n+1) − x* was used (x* is the minimizer of the PML objective function), the APML algorithm converged in only a few iterations (e.g., 3 iterations).

A key assumption of the PCiPS algorithm is that the mean number of photon pairs recorded by the ith detector pair is a weighted sum of the mean numbers of photon pairs that would have been recorded by a certain set of detector pairs if there were no errors due to scatter and noncollinearity. Currently, the chosen set of detector pairs is the projection to which the ith detector pair belongs. In Chapter 6, the assumption was justified for the case where the image space has approximately uniform attenuation (e.g., brain). However, for an image space with nonuniform attenuation, another choice for the set of detector pairs may be more appropriate. Thus, it would be worthwhile to revisit the assumptions of the PCiPS algorithm so as to broaden its application.

Increasingly, three dimensional (3D) PET scanners are being used to perform whole-body scans. Thus, it may be beneficial to the PET community to extend the proposed algorithms to 3D. One practical problem of 3D PET is that it takes an extremely long time to reconstruct images because the number of data points is increased. So, with 3D implementations, one should consider parallel computing
techniques such as a multiprocessor approach [70] or a computer cluster implementation [71]. Observe that the proposed algorithms can be parallelized because all the pixel values are updated simultaneously. By contrast, the penalized weighted least-squares algorithm in [42] updates pixel values sequentially, so that algorithm cannot be parallelized.

The proposed algorithms were assessed with real phantom data. However, a more in-depth assessment would include other types of real phantoms (e.g., brain phantoms) and patient data. Another consideration is to further study the proposed algorithms when errors due to detector penetration, noncollinearity of the line of response, and positron range are corrected. We did not consider those errors in this dissertation.
APPENDIX A
HUBER'S SURROGATE FUNCTIONS

In this appendix, we present Huber's proof of the inequality λ^(n)(t) ≥ λ(t) for all t. From (3.14), recall that λ^(n) is defined to be

λ^(n)(t) = λ(t^(n)) + λ̇(t^(n))(t − t^(n)) + (1/2) γ(t^(n)) (t − t^(n))²   (A.1)
        = λ(t^(n)) + λ̇(t^(n))(t − t^(n)) − γ(t^(n)) t t^(n) + (1/2) γ(t^(n)) { t² + (t^(n))² }   (A.2)
        = λ(t^(n)) + λ̇(t^(n))(t − t^(n)) − λ̇(t^(n)) t + (1/2) λ̇(t^(n)) t^(n) + (1/2) γ(t^(n)) t²   (A.3)
        = λ(t^(n)) − (1/2) λ̇(t^(n)) t^(n) + (1/2) γ(t^(n)) t² ,   (A.4)

where we used the fact that γ(t) = λ̇(t)/t. Consider the function f(t) = λ^(n)(t) − λ(t) and its first derivative

ḟ(t) = γ(t^(n)) t − λ̇(t) = t [ γ(t^(n)) − γ(t) ] .   (A.5)

From (A.5) and the assumption that γ(t) is nonincreasing over [0, ∞) (see (AS6)), it follows that ḟ(t) ≤ 0 over [0, t^(n)] and ḟ(t) ≥ 0 over [t^(n), ∞). These inequalities and the fact that f(t^(n)) = 0 imply that f(t) ≥ 0 for t ≥ 0, or equivalently λ^(n)(t) ≥ λ(t) for t ≥ 0. It is clear that λ^(n)(t) ≥ λ(t) for t < 0 because of the symmetry of λ^(n)(t) and λ(t) (see (AS3)).
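The majorization proved in this appendix can be checked numerically. The log-cosh parameterization λ(t) = log(cosh(t/δ)) below is an assumption (the exact scaling in the source is garbled); γ(t) = λ̇(t)/t as in the proof, and the quadratic surrogate (A.1) expanded about t^(n) should dominate λ everywhere, with equality at t^(n):

```python
import numpy as np

delta = 50.0
lam = lambda t: np.log(np.cosh(np.asarray(t, dtype=float) / delta))
dlam = lambda t: np.tanh(np.asarray(t, dtype=float) / delta) / delta

def gam(t):
    """gamma(t) = dlam(t) / t, with the limit 1 / delta**2 at t = 0."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t == 0.0, 1e-9, t)   # avoid 0/0 at the origin
    return dlam(safe) / safe

def surrogate(t, t_n):
    """Quadratic surrogate (A.1) expanded about t_n."""
    return lam(t_n) + dlam(t_n) * (t - t_n) + 0.5 * gam(t_n) * (t - t_n) ** 2

# gamma is nonincreasing on [0, inf), so the surrogate majorizes lam everywhere
t_n = 30.0
grid = np.linspace(-300.0, 300.0, 2001)
gap = surrogate(grid, t_n) - lam(grid)   # >= 0, with equality at t = t_n
```

This is exactly the hypothesis (AS6) at work: since tanh(u)/u is decreasing for u ≥ 0, γ is nonincreasing there and the gap never goes negative.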
APPENDIX B
SURROGATE FUNCTIONS FOR PENALTY FUNCTION

Our objective in this appendix is to demonstrate that the surrogate function Λ^(n) can be expressed as (3.22). First, we make the following observation. Let t^(n) = x_j^(n) − x_k^(n). Since γ(t) = λ̇(t)/t (see (AS6)), g^(n) in (3.17) can be written as

g^(n)(x_j, x_k) = λ(t^(n)) + λ̇(t^(n)) [ (x_j − x_j^(n)) − (x_k − x_k^(n)) ]
  + (1/4) γ(t^(n)) [ (2x_j − 2x_j^(n))² + (2x_k − 2x_k^(n))² ]   (B.1)
= λ(t^(n)) + λ̇(t^(n)) [ (x_j − x_j^(n)) − (x_k − x_k^(n)) ]
  + γ(t^(n)) [ (x_j − x_j^(n))² + (x_k − x_k^(n))² ] .   (B.2)

Expanding the squares and using λ̇(t^(n)) = γ(t^(n)) t^(n), the coefficient of x_j becomes γ(t^(n)) [ t^(n) − 2x_j^(n) ] = −γ(t^(n)) (x_j^(n) + x_k^(n)), and the coefficient of x_k becomes γ(t^(n)) [ −t^(n) − 2x_k^(n) ] = −γ(t^(n)) (x_j^(n) + x_k^(n)). Hence

g^(n)(x_j, x_k) = γ(t^(n)) [ x_j² − (x_j^(n) + x_k^(n)) x_j + x_k² − (x_j^(n) + x_k^(n)) x_k ]
  + λ(t^(n)) + γ(t^(n)) [ (x_j^(n))² + (x_k^(n))² − (t^(n))² ] .   (B.3)

Completing the square in x_j and in x_k separately gives

g^(n)(x_j, x_k) = γ(t^(n)) { [ x_j − (x_j^(n) + x_k^(n))/2 ]² + [ x_k − (x_j^(n) + x_k^(n))/2 ]² } + C_jk^(n) ,   (B.4)

where, using (x_j^(n))² + (x_k^(n))² − (t^(n))² = 2 x_j^(n) x_k^(n),

C_jk^(n) = λ(t^(n)) + 2 γ(t^(n)) x_j^(n) x_k^(n) − (1/2) γ(t^(n)) (x_j^(n) + x_k^(n))² .   (B.5)

Thus, Λ^(n) in (3.18) can be written as

Λ^(n)(x) = Σ_{j=1}^J Σ_{k∈N_j} ω_jk γ(x_j^(n) − x_k^(n)) { [ x_j − (x_j^(n) + x_k^(n))/2 ]² + [ x_k − (x_j^(n) + x_k^(n))/2 ]² } + C^(n) ,   (B.6)

where C^(n) = Σ_{j=1}^J Σ_{k∈N_j} ω_jk C_jk^(n) does not depend on x. This is the expression in (3.22).
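The completed-square algebra in this appendix can be verified numerically: the separable form plus its constant must reproduce the expanded surrogate exactly, and both must dominate the original penalty of the pairwise difference. The log-cosh parameterization is again an assumption:

```python
import numpy as np

delta = 50.0
lam = lambda t: np.log(np.cosh(t / delta))
dlam = lambda t: np.tanh(t / delta) / delta
gam = lambda t: dlam(t) / t            # fine for t != 0

def g_expanded(xj, xk, xjn, xkn):
    """Surrogate in the expanded (pre-completion) form."""
    tn = xjn - xkn
    return (lam(tn) + dlam(tn) * ((xj - xjn) - (xk - xkn))
            + gam(tn) * ((xj - xjn) ** 2 + (xk - xkn) ** 2))

def g_decoupled(xj, xk, xjn, xkn):
    """Completed-square form: separable in xj and xk plus a constant."""
    tn, mid = xjn - xkn, 0.5 * (xjn + xkn)
    c = lam(tn) + 2.0 * gam(tn) * xjn * xkn - 0.5 * gam(tn) * (xjn + xkn) ** 2
    return gam(tn) * ((xj - mid) ** 2 + (xk - mid) ** 2) + c

pts = [(1.0, 2.0, 3.0, 5.0), (-4.0, 7.5, 0.5, 2.0)]
diffs = [abs(g_expanded(*p) - g_decoupled(*p)) for p in pts]
# both forms also majorize the penalty term lam(xj - xk)
gaps = [g_expanded(xj, xk, xjn, xkn) - lam(xj - xk) for xj, xk, xjn, xkn in pts]
```

Agreement of the two forms is exact algebra, so the differences are at the level of floating-point rounding; the positive gaps reflect the majorization inherited from Appendix A plus the convexity split.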
APPENDIX C
STRICT CONVEXITY OF PML OBJECTIVE FUNCTION

We will prove that the PML objective function \Phi is strictly convex over the set {x : x \ge 0}. First, we will show that the Hessian of \Lambda, denoted by S, satisfies the following properties:

- (S1) z'Sz \ge 0 for all z, where ' denotes the transpose operator.
- (S2) z'Sz = 0 only when z = 0 or z = c1 for some c \ne 0.

The (l, m)th element of S is given by

[S]_{lm} = { 2 \sum_{k \in N_l} w_{lk} \ddot\lambda(x_l - x_k),   l = m
           { -2 w_{lm} \ddot\lambda(x_l - x_m),                  l \ne m and m \in N_l
           { 0,                                                  otherwise.    (C.1)

Using (C.1), it follows that

z'Sz = \sum_l z_l \Big[ 2 \sum_{m \in N_l} w_{lm} \ddot\lambda(x_l - x_m) z_l - 2 \sum_{m \in N_l} w_{lm} \ddot\lambda(x_l - x_m) z_m \Big]   (C.2)
     = 2 \sum_l \sum_{m \in N_l} w_{lm} \ddot\lambda(x_l - x_m) z_l (z_l - z_m)    (C.3)
     = 2 \sum_l \sum_{m \in N_l, m > l} w_{lm} \ddot\lambda(x_l - x_m) (z_l - z_m)^2 .   (C.4)

Since w_{lm}\ddot\lambda(x_l - x_m) > 0 for all l and m (see (AS5)), (S1) and (S2) follow from (C.4).

To prove that \Phi is strictly convex, it suffices to show that z'Tz > 0 for z \ne 0, where T = R + \beta S is the Hessian of \Phi and R is the Hessian of the negative log-likelihood term. The (l, m)th element of R equals

[R]_{lm} = \sum_i d_i [P]_{il} [P]_{im} / ([Px + \rho]_i)^2 .   (C.5)

By (S1) and the fact that

z'Rz = \sum_i d_i \Big( \sum_l [P]_{il} z_l \Big) \Big( \sum_m [P]_{im} z_m \Big) / ([Px + \rho]_i)^2   (C.6)
     = \sum_i d_i ( [Pz]_i )^2 / ([Px + \rho]_i)^2 \ge 0 ,   (C.7)

it is true that z'Tz \ge 0 for all z. By (AS1) and (AS2), it follows that

z'Tz \ge z'Rz = c^2 \sum_i d_i \Big( \frac{[P1]_i}{[Px + \rho]_i} \Big)^2 > 0   (C.8)

for z = c1 when c \ne 0. Thus, by (S2), it is concluded that z'Tz > 0 for z \ne 0.
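The argument above can be exercised on a toy problem. The following sketch is not from the dissertation: it builds S for a 1-D chain of pixels using a hypothetical "fair" potential lam(t) = sqrt(1 + t^2) - 1, whose second derivative is strictly positive as (AS5) requires, then checks (S1), (S2), and the positive definiteness of T = R + beta*S. The sizes, weights, and the random system matrix P are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 6, 8                                # pixels and detector pairs (toy sizes)
x = rng.uniform(0.5, 2.0, size=M)          # a positive current image estimate
w = 1.0                                    # uniform neighbor weight (hypothetical)
edges = [(l, l + 1) for l in range(M - 1)] # 1-D chain neighborhood

# Exact Hessian S of the pairwise penalty, with the "fair" potential's second
# derivative ddlam(t) = (1 + t^2)^(-3/2) > 0 playing the role required by (AS5).
ddlam = lambda t: (1.0 + t**2) ** -1.5
S = np.zeros((M, M))
for l, m in edges:
    c = 2.0 * w * ddlam(x[l] - x[m])
    S[l, l] += c; S[m, m] += c
    S[l, m] -= c; S[m, l] -= c

eig = np.linalg.eigvalsh(S)                # ascending eigenvalues
assert np.allclose(S @ np.ones(M), 0.0)    # (S2): constant vectors give z'Sz = 0
assert eig[0] > -1e-10 and eig[1] > 1e-8   # (S1): PSD, null space exactly span{1}

# Hessian of the data term, [R]_{lm} = sum_i d_i P_il P_im / [Px + rho]_i^2,
# with a random strictly positive system matrix standing in for (AS1)/(AS2).
P = rng.uniform(0.1, 1.0, size=(N, M))
d = rng.uniform(1.0, 5.0, size=N)
rho = rng.uniform(0.1, 0.5, size=N)
R = (P.T * (d / (P @ x + rho) ** 2)) @ P   # P' diag(weights) P
beta = 0.5
T = R + beta * S                           # Hessian of the PML objective
assert np.all(np.linalg.eigvalsh(T) > 0)   # strictly positive definite
print("T = R + beta*S is positive definite on this example")
```

Note how the data term removes the constant-vector null space of S, exactly as (C.8) argues.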
APPENDIX D
SOLUTION TO UNCONSTRAINED OPTIMIZATION PROBLEM IN MODIFIED APML LINE SEARCH

We first show in this appendix that the surrogate function \Gamma^{(n+1)} in (4.29) is strictly convex. Then, we prove that the solution to the unconstrained optimization problem in (4.30) can be expressed by (4.31). Consider the second derivative of \Gamma^{(n+1)}:

\ddot\Gamma^{(n+1)} = \sum_i \lambda_i^{(n+1)} + \beta \sum_j \sum_{k \in N_j} w_{jk} \gamma(x_j^{(n+1)} - x_k^{(n+1)}) (v_j^{(n+1)} - v_k^{(n+1)})^2   (D.1)

(observe: \ddot\Gamma^{(n+1)} is independent of \alpha). It is true that {x^{(n)}} is bounded for all n, \gamma(t) > 0 for -\infty < t < \infty, and \gamma is a continuous function over -\infty < t < \infty. Thus, it is clear that the second term on the right-hand side of (D.1) is positive whenever v^{(n+1)} is not of the form c1. For the case where v^{(n+1)} = c1 with c \ne 0, it follows from (4.27) that

\lambda_i^{(n+1)} = { c^2 d_i ([P1]_i)^2 / ( c[P1]_i L^{(n+1)} + [Px^{(n+1)}]_i + \rho_i )^2 ,   c > 0
                   { c^2 d_i ([P1]_i)^2 / ( c[P1]_i U^{(n+1)} + [Px^{(n+1)}]_i + \rho_i )^2 ,   c < 0    (D.2)

for all i. Thus, by (AS1) and (AS2), \sum_i \lambda_i^{(n+1)} > 0 for v^{(n+1)} = c1 when c \ne 0. Consequently, it can be concluded that \ddot\Gamma^{(n+1)} > 0 for v^{(n+1)} \ne 0. This result implies that \Gamma^{(n+1)} is strictly convex for v^{(n+1)} \ne 0.

Now, we will show that the expression in (4.31) is the solution to the unconstrained optimization problem in (4.30). First, recall that

\Lambda^{(n+1)}(x) = \sum_j \sum_{k \in N_j} w_{jk} \lambda^{(n+1)}(x_j - x_k) ,   (D.3)

where

\lambda^{(n+1)}(t) = \lambda(\hat t^{(n+1)}) + \dot\lambda(\hat t^{(n+1)})(t - \hat t^{(n+1)}) + \tfrac{1}{2}\gamma(\hat t^{(n+1)})(t - \hat t^{(n+1)})^2 .   (D.4)

For x^{(n+1)}(\alpha) = x^{(n+1)} + \alpha v^{(n+1)}, the surrogate function \Gamma^{(n+1)} can be expressed as

\Gamma^{(n+1)}(\alpha) = \Theta^{(n+1)}(\alpha) + \beta \Lambda^{(n+1)}(x^{(n+1)} + \alpha v^{(n+1)})   (D.5)
  = \sum_i \theta_i^{(n+1)}(\alpha) + \beta \sum_j \sum_{k \in N_j} w_{jk} \lambda(x_j^{(n+1)} - x_k^{(n+1)}) + C
    + \beta \sum_j \sum_{k \in N_j} w_{jk} \dot\lambda(x_j^{(n+1)} - x_k^{(n+1)}) (v_j^{(n+1)} - v_k^{(n+1)}) \, \alpha
    + \tfrac{\beta}{2} \sum_j \sum_{k \in N_j} w_{jk} \gamma(x_j^{(n+1)} - x_k^{(n+1)}) (v_j^{(n+1)} - v_k^{(n+1)})^2 \, \alpha^2 ,   (D.6)

where C = \sum_i \{ \rho_i + \log(d_i!) \}. Thus, using the definition of \theta_i^{(n+1)} in (4.23), the first derivative of \Gamma^{(n+1)}, denoted by \dot\Gamma^{(n+1)}, is

\dot\Gamma^{(n+1)}(\alpha) = \sum_i \{ \lambda_i^{(n+1)} \alpha + \dot\theta_i^{(n+1)}(0) \}
    + \beta \sum_j \sum_{k \in N_j} w_{jk} \dot\lambda(x_j^{(n+1)} - x_k^{(n+1)}) (v_j^{(n+1)} - v_k^{(n+1)})
    + \beta \sum_j \sum_{k \in N_j} w_{jk} \gamma(x_j^{(n+1)} - x_k^{(n+1)}) (v_j^{(n+1)} - v_k^{(n+1)})^2 \, \alpha .   (D.7)

Since \Gamma^{(n+1)} is strictly convex, the minimizer of \Gamma^{(n+1)} can be found by setting its first derivative to zero:

\dot\Gamma^{(n+1)}(\alpha) = 0 .   (D.8)

Solving (D.8) for \alpha results in the expression in (4.31).
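Because the surrogate is a strictly convex quadratic in alpha, the line search reduces to zeroing an affine derivative. The toy sketch below is not from the dissertation; the coefficients q0, q1, q2 are hypothetical stand-ins for the alpha-free and alpha-linear parts of (D.7), with q2 > 0 playing the role of the curvature shown constant in (D.1).

```python
import numpy as np

# Hypothetical strictly convex 1-D surrogate: Gamma(alpha) = q0 + q1*alpha + 0.5*q2*alpha^2
q0, q1, q2 = 3.0, -4.0, 2.0

gamma_fn  = lambda a: q0 + q1 * a + 0.5 * q2 * a**2
gamma_dot = lambda a: q1 + q2 * a          # the derivative is affine in alpha

# Setting the first derivative to zero, as in (D.8), and solving gives the minimizer:
alpha_star = -q1 / q2
assert np.isclose(gamma_dot(alpha_star), 0.0)

# Strict convexity (q2 > 0) guarantees this stationary point is the global minimum.
grid = np.linspace(alpha_star - 5, alpha_star + 5, 1001)
assert np.all(gamma_fn(grid) >= gamma_fn(alpha_star))
print("alpha* =", alpha_star)              # -> alpha* = 2.0
```

The closed-form step -q1/q2 is the scalar analogue of (4.31): curvature in the denominator, slope at alpha = 0 in the numerator.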
APPENDIX E
CONVEXITY OF SURROGATE FUNCTIONS FOR OBJECTIVE FUNCTIONS IN APML LINE SEARCH

In this appendix, our objective is to show that there exists a symmetric positive definite matrix M, which is independent of n, such that \ddot\Gamma^{(n+1)} \ge 2 (v^{(n+1)})' M (v^{(n+1)}), where \ddot\Gamma^{(n+1)} is the second derivative of the surrogate function \Gamma^{(n+1)} in (4.29). Note that \ddot\Gamma^{(n+1)} is shown in (D.1). It is true that {x^{(n)}} is bounded for all n, \gamma(t) > 0 for -\infty < t < \infty, and \gamma is a continuous function over -\infty < t < \infty. Thus, there exists a number \gamma_0 > 0 such that \gamma(x_j^{(n+1)} - x_k^{(n+1)}) \ge \gamma_0 for all j, k, and n. Hence, it follows that

\ddot\Gamma^{(n+1)} \ge \sum_i \lambda_i^{(n+1)} + \beta\gamma_0 \sum_j \sum_{k \in N_j} w_{jk} (v_j^{(n+1)} - v_k^{(n+1)})^2   (E.1)
                    = \sum_i \lambda_i^{(n+1)} + 2\beta\gamma_0 (v^{(n+1)})' H (v^{(n+1)}) ,   (E.2)

where the (l, m)th element of H is given by

[H]_{lm} = { \sum_{k \in N_l} w_{lk} ,   l = m
           { -w_{lm} ,                  l \ne m and m \in N_l
           { 0 ,                        otherwise.    (E.3)

Note that the matrix H is symmetric because w_{lm} = w_{ml} for all l and m.

Now, we consider the term \sum_i \lambda_i^{(n+1)}. First, note that \lambda_i^{(n+1)} can be expressed as (see (4.27))

\lambda_i^{(n+1)} = { f(s_i^{(n+1)}) \, d_i ([Pv^{(n+1)}]_i)^2 ,   [Pv^{(n+1)}]_i \ne 0
                   { 0 ,                                          [Pv^{(n+1)}]_i = 0 ,    (E.4)

where f(t) = 1/t^2 and

s_i^{(n+1)} = { [Pv^{(n+1)}]_i L^{(n+1)} + [Px^{(n+1)}]_i + \rho_i ,   [Pv^{(n+1)}]_i > 0
             { [Pv^{(n+1)}]_i U^{(n+1)} + [Px^{(n+1)}]_i + \rho_i ,   [Pv^{(n+1)}]_i < 0.    (E.5)

Also, note that s_i^{(n+1)} > 0 for all n and i (see (4.15)). Moreover, if we assume that v^{(n+1)} is bounded for all n, it follows that s_i^{(n+1)} is bounded above for all n and i. To see this fact, we only need to consider L^{(n+1)} and U^{(n+1)}, because x^{(n+1)} is bounded for all n by Proposition 1. When v_j^{(n+1)} < 0 for all j, it follows that L^{(n+1)} = -\infty. For this case, the fact that L^{(n+1)} = -\infty is not a problem because s_i^{(n+1)} = [Pv^{(n+1)}]_i U^{(n+1)} + [Px^{(n+1)}]_i + \rho_i, where U^{(n+1)} is a finite number. Similarly, when v_j^{(n+1)} > 0 for all j, it follows that U^{(n+1)} = \infty. For this case, s_i^{(n+1)} = [Pv^{(n+1)}]_i L^{(n+1)} + [Px^{(n+1)}]_i + \rho_i, where L^{(n+1)} is a finite number. Thus, s_i^{(n+1)} is bounded for all n and i.

Since f(t) > 0 for 0 < t < \infty, the function f is continuous over 0 < t < \infty, and s_i^{(n+1)} > 0 is bounded above, there exists a number f_0 > 0 such that f(s_i^{(n+1)}) \ge f_0 for all n and i. Thus, it follows that

\ddot\Gamma^{(n+1)} \ge f_0 \sum_i d_i ([Pv^{(n+1)}]_i)^2 + 2\beta\gamma_0 (v^{(n+1)})' H (v^{(n+1)})   (E.6)
                    = f_0 (v^{(n+1)})' P'VP (v^{(n+1)}) + 2\beta\gamma_0 (v^{(n+1)})' H (v^{(n+1)})   (E.7)
                    \ge 2 (v^{(n+1)})' M (v^{(n+1)}) ,   (E.8)

where V = diag(d) and M = \tfrac{1}{2} f_0 P'VP + \beta\gamma_0 H. Since P'VP and H are symmetric, the matrix M is symmetric. Now, we only need to show that M is positive definite. From the second term on the right-hand side of (E.1), it is easy to see that z'Hz \ge 0 for all z and z'Hz = 0 only when z = 0 or z = c1 for some c \ne 0. Moreover, by (AS1) and (AS2), it follows that

z' P'VP z = c^2 \sum_i d_i ([P1]_i)^2 > 0   (E.9)

for z = c1 and c \ne 0. Thus, from f_0 > 0 and \beta\gamma_0 > 0, it can be concluded that M is positive definite.
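The curvature bound above can be checked numerically. The sketch below is not from the dissertation; the chain geometry, the weights, and the matrices P and V are illustrative assumptions. It verifies the identity sum_j sum_k w_jk (v_j - v_k)^2 = 2 v'Hv used between (E.1) and (E.2), and the positive definiteness of M = (1/2) f_0 P'VP + beta*gamma_0*H on a toy example.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_det = 5, 7
# Symmetric neighbor weights w_lm on a 1-D chain (hypothetical toy geometry)
w = np.zeros((n_pix, n_pix))
for l in range(n_pix - 1):
    w[l, l + 1] = w[l + 1, l] = rng.uniform(0.5, 1.5)

# H as in (E.3): neighbor-weight row sums on the diagonal, -w_lm off the diagonal
H = np.diag(w.sum(axis=1)) - w

v = rng.normal(size=n_pix)
pair_sum = sum(w[l, m] * (v[l] - v[m]) ** 2
               for l in range(n_pix) for m in range(n_pix))
assert np.isclose(pair_sum, 2 * v @ H @ v)      # the double sum equals 2 v'Hv

# M = (1/2) f0 P'VP + beta*gamma0*H, with f0 and gamma0 positive lower bounds
P = rng.uniform(0.1, 1.0, size=(n_det, n_pix))  # strictly positive system matrix
V = np.diag(rng.uniform(1.0, 3.0, size=n_det))  # V = diag(d)
f0, beta, gamma0 = 0.2, 0.5, 0.1
Mmat = 0.5 * f0 * P.T @ V @ P + beta * gamma0 * H
assert np.all(np.linalg.eigvalsh(Mmat) > 0)     # M is symmetric positive definite
print("curvature bound matrix M is positive definite")
```

As in the proof, H alone is only positive semidefinite (constant vectors lie in its null space); the P'VP term supplies the missing positivity along the all-ones direction.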
BIOGRAPHICAL SKETCH

Ji-Ho Chang was born in Daegu, Korea. He received the B.S. and M.E. degrees in electronic engineering from Kangwon National University, Chuncheon, Korea, in 1997 and 1999, respectively. He received the M.S. degree in electrical and computer engineering from the University of Florida, Gainesville, Florida, in 2001. For the Ph.D. degree, he has been working on developing image reconstruction algorithms for positron emission tomography.
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

J. M. M. Anderson, Chair
Associate Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Jian Li
Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Fred J. Taylor
Professor of Computer and Information Science and Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Bernard A. Mair
Professor of Mathematics
This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 2004

Pramod P. Khargonekar
Dean, College of Engineering

Kenneth J. Gerhardt
Interim Dean, Graduate School

