Citation

## Material Information

Title:
Image reconstruction algorithms for achieving high-resolution positron emission tomography images
Creator:
Chang, Ji-Ho
Publication Date:
Language:
English
Physical Description:
viii, 160 leaves : ill. ; 29 cm.

## Subjects

Subjects / Keywords:
Algorithms ( jstor )
Image contrast ( jstor )
Image reconstruction ( jstor )
Matrices ( jstor )
Objective functions ( jstor )
Penalty function ( jstor )
Pets ( jstor )
Photons ( jstor )
Standard deviation ( jstor )
Tumors ( jstor )
Dissertations, Academic -- Electrical and Computer Engineering -- UF
Electrical and Computer Engineering thesis, Ph. D
Image processing -- Digital techniques ( lcsh )
Imaging systems in medicine -- Mathematical models ( lcsh )
Tomography, Emission ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

## Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 2004.
Bibliography:
Includes bibliographical references.
General Note:
Printout.
General Note:
Vita.
Statement of Responsibility:
by Ji-Ho Chang.

## Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Resource Identifier:
022820658 ( ALEPH )
880637461 ( OCLC )

Full Text

IMAGE RECONSTRUCTION ALGORITHMS
FOR ACHIEVING HIGH-RESOLUTION
POSITRON EMISSION TOMOGRAPHY IMAGES

By

JI-HO CHANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2004

by

Ji-Ho Chang

I dedicate this work to my family.

ACKNOWLEDGMENTS

First of all, I would like to thank my academic advisor, Dr. John M.M. Anderson, for his great guidance of my research. I would also like to thank my committee members, Dr. Jian Li, Dr. Fred J. Taylor, and Dr. Bernard A. Mair, for their valuable comments and corrections on my dissertation. In particular, I would like to thank Dr. Bernard A. Mair for his comments and corrections on the convergence proofs of the proposed algorithms. I thank Yoonchul Kim and Kerkil Choi, who have shared with me valuable discussions related to my research. Finally, I am grateful to my family, who have always prayed for my health.

ACKNOWLEDGMENTS

ABSTRACT

CHAPTER

1 BACKGROUND

1.1 Overview of Positron Emission Tomography (PET)
1.2 PET Scanner
1.3 Sources of Error
1.4 System Model for Emission Data

2 LITERATURE REVIEW

2.1 Penalized Maximum Likelihood Algorithms
2.2 Scatter Correction Methods
2.3 Summary of the Proposed Algorithms

3 PENALIZED MAXIMUM LIKELIHOOD ALGORITHM

3.1 Penalized Maximum Likelihood (PML) Algorithm
3.2 Convergence Proof
3.3 Properties of the PML Algorithm

4 ACCELERATED PENALIZED MAXIMUM LIKELIHOOD ALGORITHM

4.1 Convergence Proof
4.2 Accelerated Penalized Maximum Likelihood (APML) Algorithm
4.3 Direction Vectors
4.4 Properties of the APML Algorithm

5 QUADRATIC EDGE PRESERVING ALGORITHM

6 JOINT EMISSION MEAN AND PROBABILITY MATRIX ESTIMATION

6.1 Scatter Matrix Model
6.2 Joint Minimum Kullback-Leibler Distance Method
6.3 Probability Correction in Projection Space (PCiPS) Algorithm

7 SIMULATIONS AND EXPERIMENTAL STUDY

7.1 Regularized Image Reconstruction Algorithms
7.1.1 Synthetic Data
7.1.2 Real Data
7.2 APML Algorithm
7.3 PCiPS Algorithm

8 CONCLUSIONS AND FUTURE WORK

8.1 Conclusions
8.2 Future Work

APPENDIX

A HUBER'S SURROGATE FUNCTIONS

B SURROGATE FUNCTIONS FOR PENALTY FUNCTION

C STRICT CONVEXITY OF PML OBJECTIVE FUNCTION

D SOLUTION TO UNCONSTRAINED OPTIMIZATION PROBLEM IN MODIFIED APML LINE SEARCH

E CONVEXITY OF SURROGATE FUNCTIONS FOR OBJECTIVE FUNCTIONS IN APML LINE SEARCH

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

IMAGE RECONSTRUCTION ALGORITHMS FOR ACHIEVING HIGH-RESOLUTION POSITRON EMISSION TOMOGRAPHY IMAGES

By

Ji-Ho Chang

August 2004

Chair: John M. M. Anderson
Major Department: Electrical and Computer Engineering

In this dissertation, we present four algorithms for reconstructing high-resolution images in PET. The first algorithm, referred to as the penalized maximum likelihood (PML) algorithm, iteratively minimizes a PML objective function. At each iteration, the PML algorithm generates a function, called a surrogate function, that satisfies certain conditions. The next iterate is defined to be the nonnegative minimizer of the surrogate function. The PML algorithm utilizes standard de-coupled surrogate functions for the maximum likelihood objective function of the data and de-coupled surrogate functions for a certain class of penalty functions. As desired, the PML algorithm guarantees nonnegative estimates and monotonically decreases the PML objective function with increasing iterations. For the case where the PML objective function is strictly convex, which is true for the class of penalty functions under consideration, the PML algorithm has been shown to converge to the minimizer of the PML objective function.

The drawback of the PML algorithm is that it converges slowly. Thus, a "fast" version of the PML algorithm, referred to as the accelerated PML (APML) algorithm, was developed in which an additional search step, called a pattern search step, is performed after each standard PML iteration. In the pattern search step, an accelerated iterate, which has lower cost than the standard PML iterate, is found by solving a certain constrained optimization problem that arises at each pattern search step. The APML algorithm retains the nice properties of the PML algorithm.

The third algorithm, referred to as the quadratic edge preserving (QEP) algorithm, aims to preserve edges in the reconstructed images so that fine details, such as small tumors, are more resolvable. The QEP algorithm is based on an iteration dependent, de-coupled penalty function that introduces smoothing while preserving edges. The penalty function was developed by modifying the surrogate functions of the penalty function for the PML method.

In PET, there are several errors that have the net effect of introducing blur into the reconstructed images. We propose a method that aims to reduce blur in PET images. The method is based on the assumption that the "true" probability matrix for the observed emission data is a product of an unknown nonnegative matrix, called a scatter matrix, and a "conventional" probability matrix. Under the suggested framework, the problem is to jointly estimate the scatter matrix and emission means. We propose an alternating minimization algorithm to estimate them by minimizing a certain distance.

The algorithms are qualitatively and quantitatively assessed using synthetic phantom data and real phantom data.

CHAPTER 1
BACKGROUND
Medical imaging modalities, such as X-ray computed tomography and magnetic resonance imaging, are used to obtain images of anatomical structures within the human body. However, in certain medical applications, it is also important to obtain physiological information, because physiological changes can indicate disease states earlier than anatomical changes [1]. Positron emission tomography (PET) and single-photon emission computed tomography (SPECT) are widely used medical imaging modalities that acquire physiological information on both human and animal subjects.
In SPECT, physiological information is obtained by imaging the distribution of gamma-ray or X-ray emitting radio-isotopes within the human body [1]. After the radio-isotopes are introduced into the human body, they decay and emit single gamma-ray or X-ray photons. A SPECT scanner detects these photons with the help of collimators that are made of lead. Since a large percentage of the photons are absorbed by the lead collimators, the sensitivity and accuracy of SPECT are limited.

In PET, there is no need for lead collimators because collimation is performed by electronic circuits that are connected to the detectors. Consequently, PET possesses relatively high sensitivity and accuracy when compared to SPECT. Although cost is the major limitation of PET, recent research advances, such as less expensive materials for detectors and scanner configurations that need fewer detectors than conventional scanners, have helped decrease its cost [1]. The main advantage of SPECT over PET is its substantially lower cost.


We present a brief overview of PET, the PET scanner, and certain errors in PET in Sections 1.1, 1.2, and 1.3, respectively. Then, in Section 1.4, we provide some background on the image reconstruction problem in PET.

1.1 Overview of Positron Emission Tomography (PET)

Most clinical applications of PET are in oncological cases involving the diagnosis and staging of cancer, treatment planning and monitoring of tumors, detection of recurrent tumors, and localization of biopsy sites in cases when there are tumors in the head or neck [2,3]. PET is also used in the diagnosis of coronary artery disease and other heart diseases [2,3].

In PET, physiological information is acquired by imaging the distribution of positron-emitting isotopes, such as 13N, 15O, 18F, and 11C, within the human body [3]. The positron-emitting isotopes are bound to compounds with known biochemical properties. Compounds that are labeled with positron-emitting isotopes are called radiopharmaceuticals. The choice of the radiopharmaceutical depends on its application. For example, 2-[18F]fluoro-2-deoxy-d-glucose ([18F]FDG) is used for imaging brain tumors while [13N]ammonia is used for the detection of coronary artery disease [3]. After the radiopharmaceutical is introduced into the subject (by injection or inhalation), positrons are emitted as the positron-emitting isotopes decay. An emitted positron annihilates with a nearby electron within the body, generating two high energy (511 keV) photons. The two photons, which can penetrate the subject, travel in almost opposite directions. Ideally, photon pairs that are generated by a positron annihilating with an electron will be detected by a pair of detectors.

If two electronically connected detectors detect a pair of photons within a short time interval (e.g., < 10 ns), then the detection is recorded along the line connecting the two detectors, which is called a line-of-response. In the absence of an error, the detection indicates that there is an annihilation somewhere along the line-of-response. The detection of a photon pair is referred to as a coincidence. In addition, the timing interval of a scanner that is used to define a coincidence is called the scanner's coincidence timing window. Since there are a great many detector pairs in a PET scanner, sufficient information is available for reconstructing a map of the concentration of the radiopharmaceutical. For each detector pair, the coincidences that occur during the scan are summed. The emission data are the coincidence sums for all of the possible detector pairs.

The key idea in PET is that emission data depends on the distribution of the radiopharmaceutical within the subject being scanned, which in turn depends on the metabolism of the subject. Consequently, numerous researchers have developed algorithms that reconstruct PET images whose pixel values represent the distribution of the radiopharmaceutical, and, ultimately, the metabolism of the subject.

1.2 PET Scanner

Typical PET scanners have a diameter of 80-100 cm and an axial extent of 10-20 cm [4]. Figure 1-1 (a) shows a simplified PET scanner and Figures 1-1 (b) and (c) show two-dimensional views of the scanner. Usually, a PET scanner consists of hundreds of rectangular bundles of crystals that are arranged to form between 20 and 30 rings of detectors, where each detector ring contains 300-600 detectors. Each bundle of crystals is connected to a few (e.g., 2-8) photo-multiplier tubes (PMTs). Figure 1-1 (d) shows such a block of crystals coupled to four PMTs. When a photon interacts with a crystal, light photons are emitted and the PMTs collect them. From the collected light, a PET scanner determines the crystal within which the scintillation occurs.

State-of-the-art PET scanners generally provide two scan modes: slice-collimated mode and fully three-dimensional mode [4]. In slice-collimated mode, also called septa-extended mode, thin tungsten rings, called septa, are placed between the detector rings. Figure 1-2 (a) illustrates slice-collimated mode, in which coincidences are collected within a detector ring. In fully three-dimensional mode, the septa are removed from the scanner. Figure 1-2 (b) illustrates fully three-dimensional mode, in which allowable lines-of-response are not restricted to occur within a detector ring.

Figure 1-1: Simplified PET scanner: (a) a simplified full-ring PET scanner with 8 detector rings, (b) two-dimensional view of the scanner at x = 0, (c) two-dimensional view of the scanner at y = 0, and (d) a block detector consisting of an array of 8 x 8 crystals coupled to four PMTs, where the origin of the rectangular coordinate system is at the center of the scanner.

Figure 1-2: Scan modes: (a) septa-extended mode and (b) fully three-dimensional mode. The dotted lines represent lines-of-response.

Figure 1-3: Projection definition using zig-zag scan: (a) a set of detector pairs that define a projection and (b) another projection. A dashed line represents a detector pair.

Figure 1-4: Illustration of positron range and a coincidence. Two concentric circles denote a detector ring.

Within a detector ring, detector pairs are grouped into projections, where a projection is a set of detector pairs that are defined by a particular "zig-zag scan". Figure 1-3 (a) shows a projection and the defining zig-zag scan. Figure 1-3 (b) shows another projection. A sinogram is defined to be an S x T matrix, where each row of the matrix contains a projection, S is the number of detector pairs within the projection, and T is the number of projections in the emission data.

1.3 Sources of Error

After a positron is emitted, it travels a short distance before it annihilates with a nearby electron within the subject. The distance between the locations at which the emission and the annihilation take place is called the positron range. The positron range is proportional to the reciprocal of the density of the material through which the emitted positron travels [5]. Figure 1-4 is an illustration of positron range for a simplified scanner geometry. The positron range depends on the energy an emitted positron deposits (and, consequently, the chosen positron-emitting isotope). For typical positron-emitting isotopes, the full width at half maximum (FWHM) of the distribution of positron range is a few millimeters. For example, the FWHMs of the distribution of the positron range for 18F and 15O are about 2 mm and 8 mm, respectively [5, p. 331].

Figure 1-5: Illustration of non-collinearity of line-of-response. The dashed line indicates the path of the photon pair if they had departed in exactly opposite directions. The arrows show the actual photon paths.

Two photons generated by an annihilation usually do not propagate in exactly opposite directions. This phenomenon is called non-collinearity of line-of-response. Figure 1-5 is an illustration of non-collinearity of line-of-response for a simplified scanner geometry. For a fixed direction of propagation of one of the two photons, we refer to the angle between the exactly opposite direction and the actual direction in which the other photon travels as the "angle of non-collinearity". The distribution of the angle of non-collinearity can be approximated by a Gaussian distribution with a FWHM of 0.5 degrees [5].
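As a small numerical aside, the FWHM quoted above can be converted to a Gaussian standard deviation with the standard relation FWHM = 2 sqrt(2 ln 2) sigma. The sketch below does this and also gives a rough small-angle estimate of the resulting blur for an annihilation at the center of a detector ring. The 900 mm ring diameter and the function names are illustrative assumptions, not values taken from this dissertation.

```python
import math

def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given FWHM."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def center_blur_fwhm_mm(ring_diameter_mm, angle_fwhm_deg=0.5):
    """Small-angle estimate of the line-of-response blur (FWHM, in mm)
    for an annihilation at the center of the ring: one quarter of the
    ring diameter times the angular FWHM expressed in radians."""
    return ring_diameter_mm / 4.0 * math.radians(angle_fwhm_deg)

sigma_deg = fwhm_to_sigma(0.5)          # about 0.212 degrees
blur_mm = center_blur_fwhm_mm(900.0)    # hypothetical 90 cm ring (assumption)
print(f"sigma = {sigma_deg:.3f} deg, blur at center = {blur_mm:.2f} mm")
```

Under these assumptions the blur grows linearly with the ring diameter, which is one reason non-collinearity matters more for large scanners than for small ones.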

The angle between the path along which a photon propagates and the face of a detector when the photon hits the detector is referred to as the incident angle. The way that a photon interacts with a detector depends on the incident angle. If the path that a photon travels is not perpendicular to the face of a detector when the photon hits the detector, then it is possible that the photon does not interact with the detector that it strikes. Instead, the photon may interact with another detector that is near the detector the photon originally hits. This phenomenon is termed detector penetration. Some methods have been proposed to account for detector penetration [6, 7].

Figure 1-6: Single and scatter: (a) illustration of attenuation and a single, and (b) illustration of a scattered coincidence. The arrow penetrating the detector ring denotes that the photon is scattered through an oblique angle such that it does not hit a detector. The dotted line denotes the incorrectly positioned line-of-response.

When a 511 keV photon propagates within a subject and interacts with an electron, the photon may undergo a phenomenon known as Compton scattering: provided the photon has sufficient energy, it gives part of its energy to the electron and is deflected from its original path. Compton scattering can lead to three kinds of error: attenuation, scatter, and accidental coincidence.

Most scattered photons are scattered out of the scanner's field-of-view so that many of them are not detected. This phenomenon is called attenuation. Figure 1-6 (a) illustrates attenuation, where one of the two photons of an annihilation does not hit a detector due to Compton scattering. Figure 1-6 (a) also illustrates an event called a single, where only one of the emitted photons of an annihilation is detected. Note that attenuation leads to an incorrect decrease in emission counts. In order to address attenuation, numerous correction methods have been proposed (see, e.g., [8-10]).

Figure 1-7: Accidental coincidence: (a) illustration of an accidental coincidence due to two annihilations occurring at almost the same time and (b) illustration of an accidental coincidence due to Compton scattering.

Consider an annihilation event where Compton scattering occurs. It is still possible that a detector pair detects both photons even though one or both of the photons may have undergone Compton scattering. Such an event is called a scattered coincidence or scatter, which is illustrated in Figure 1-6 (b). Note that a scattered coincidence leads to an incorrect increase in emission counts. Since a scattered photon loses part of its energy, the energy of detected photons may be used to discriminate between unscattered photons and scattered photons [11, pp. 65-69].
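To make the energy-discrimination idea concrete, the sketch below evaluates the standard Compton formula for the energy of a photon after scattering through a given angle and applies a simple energy window. The window limits (350 keV and 650 keV) and the function names are hypothetical choices for illustration only; they are not taken from this dissertation or from any particular scanner.

```python
import math

ELECTRON_REST_ENERGY_KEV = 511.0  # rounded electron rest energy

def scattered_energy_kev(e_kev, angle_deg):
    """Photon energy after Compton scattering through angle_deg."""
    theta = math.radians(angle_deg)
    return e_kev / (1.0 + (e_kev / ELECTRON_REST_ENERGY_KEV) * (1.0 - math.cos(theta)))

def accepted(e_kev, lower_kev=350.0, upper_kev=650.0):
    """Hypothetical energy window; the limits are assumptions."""
    return lower_kev <= e_kev <= upper_kev

for angle in (0, 30, 90, 180):
    e = scattered_energy_kev(511.0, angle)
    print(f"{angle:3d} deg -> {e:6.1f} keV, accepted: {accepted(e)}")
```

With these assumed limits, a photon scattered through 30 degrees still falls inside the window, which illustrates why energy discrimination alone cannot reject all scattered coincidences.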

Two photons arising from different annihilations can be recorded by a detector pair. This event is called an accidental coincidence. Figure 1-7 (a) illustrates an accidental coincidence due to two annihilations occurring at almost the same time. Sometimes, an accidental coincidence may be due to Compton scattering, as illustrated in Figure 1-7 (b). Like scatter, an accidental coincidence leads to an incorrect increase in emission counts. For many PET scanners, the mean accidental coincidence rate is estimated using a "delayed" timing window technique [12].

The efficiency of a detector pair is defined to be the probability that a coincidence is recorded when a photon pair hits the detectors. Ideally, this probability should be one. However, the efficiencies of detector pairs are non-uniform because of their geometric differences and the non-uniform physical characteristics of detectors. The non-uniformity of detector pairs is referred to as detector inefficiency. To address detector inefficiency, correction methods such as [13-15] have been proposed.

1.4 System Model for Emission Data

In 1982, Shepp and Vardi proposed a Poisson model for PET emission data and an algorithm, known as the maximum likelihood expectation maximization (MLEM) algorithm, for reconstructing maximum likelihood (ML) emission images [16]. In the Poisson model, the region-of-interest is divided into $J$ equal sized volume elements, called voxels. Ultimately, the goal is to estimate the mean number of positrons emitted from each voxel. Let the $i$th component of a vector $d$, $d_i$, represent the observed number of photon pairs recorded by the $i$th detector pair and the $j$th component of a vector $x$, $x_j$, represent the unknown mean number of emissions from the $j$th voxel. Further, let $I$ and $J$ denote the number of detector pairs and number of voxels, respectively. The $I \times 1$ vector $d$ and the $J \times 1$ vector $x$ are the emission data and the unknown emission mean vector, respectively. Let $P_{ij}$ denote the probability that an annihilation in the $j$th voxel leads to a photon pair being recorded by the $i$th detector pair. The $I \times J$ probability matrix $P$ has $P_{ij}$ as its $(i, j)$th element. In the Poisson model, Shepp and Vardi assumed that the emission data $d$ is an observation of a random vector $D$ with elements $\{D_i\}_{i=1}^{I}$ that are Poisson distributed and independent. For all $i$, the mean of the random variable $D_i$ is

$$E\{D_i\} = \sum_{j=1}^{J} P_{ij} x_j . \tag{1.1}$$
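The Poisson model can be illustrated with a toy simulation. The sketch below computes the mean counts of (1.1) for a hypothetical 3 x 2 probability matrix and draws independent Poisson counts for each detector pair. The matrix entries, the emission means, and the simple sampler (Knuth's multiplication method, adequate only for small means) are assumptions made for illustration.

```python
import math
import random

def mean_counts(P, x):
    """Mean of each D_i as in (1.1): sum over j of P[i][j] * x[j]."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]

def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Hypothetical system: 3 detector pairs, 2 voxels (values are assumptions).
P = [[0.5, 0.1],
     [0.3, 0.3],
     [0.1, 0.5]]
x = [20.0, 40.0]

mu = mean_counts(P, x)                        # mean counts per detector pair
rng = random.Random(0)                        # fixed seed for repeatability
d = [sample_poisson(m, rng) for m in mu]      # one realization of the data
print("means:", mu)
print("counts:", d)
```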

In practice, the probability matrix $P$ is unknown and must be estimated. The simplest way to estimate $P$ is the angle-of-view (AOV) method [16]. In the AOV method, a detector ring and a detector are modelled by a circle and an arc on the circle, respectively. Moreover, within a voxel, all emitted positrons are assumed to be emitted from the center of the voxel. Figure 1-8 illustrates a detector ring, a detector pair, and the tube (i.e., spatial extent) that is defined by the detector pair. In the figure, the AOV from the point $g_j$ to the detector pair $(y, z)$ is also shown. Specifically, this AOV is defined to be

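$$\text{AOV from } g_j \text{ to } (y, z) \triangleq \begin{cases} \min\{\angle a_y g_j b_y,\ \angle a_z g_j b_z,\ (\pi - \angle a_y g_j b_z),\ (\pi - \angle b_y g_j a_z)\}, & g_j \in \text{tube}(y, z) \\ 0, & \text{otherwise.} \end{cases} \tag{1.2}$$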

Said another way, the AOV in (1.2) is the maximum angle for which a line that goes through the point $g_j$ will simultaneously intersect both detectors of the detector pair $(y, z)$. In the AOV method, the probability $P_{ij}$ is defined to be

$$P_{ij} \triangleq \frac{\text{AOV from } g_j \text{ to } (y, z)}{\pi} , \tag{1.3}$$

where the detector pair $(y, z)$ is the $i$th detector pair and the point $g_j$ is the center point of the $j$th voxel. In the AOV method, it is assumed that a photon is detected by a detector whenever the photon hits the arc corresponding to the detector. Clearly, the AOV method does not account for detector penetration discussed in Section 1.3. Some methods have been developed to address errors due to detector penetration [6,7].
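The AOV definition can be sketched for a two-dimensional ring as follows. The code assumes that the detector end points are labelled as in the caption of Figure 1-8 (so that the lines through a_y, b_z and through b_y, a_z bound the tube and do not cross inside the ring), and it assumes that the AOV is divided by pi to obtain a probability. The unit-radius ring, the 20-degree detector arcs, and all function names are hypothetical.

```python
import math

def angle(p, g, q):
    """Angle p-g-q at the vertex g, in radians, in the range [0, pi]."""
    ux, uy = p[0] - g[0], p[1] - g[1]
    vx, vy = q[0] - g[0], q[1] - g[1]
    return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)

def side(a, b, g):
    """Cross product (b - a) x (g - a); its sign tells which side of a->b g is on."""
    return (b[0] - a[0]) * (g[1] - a[1]) - (b[1] - a[1]) * (g[0] - a[0])

def aov(g, a_y, b_y, a_z, b_z):
    """Angle-of-view from g to the detector pair, zero outside the tube."""
    # g is inside the tube when it lies between the two bounding lines.
    if side(a_y, b_z, g) * side(b_y, a_z, g) > 0:
        return 0.0
    return min(angle(a_y, g, b_y), angle(a_z, g, b_z),
               math.pi - angle(a_y, g, b_z), math.pi - angle(b_y, g, a_z))

def on_ring(deg, radius=1.0):
    t = math.radians(deg)
    return (radius * math.cos(t), radius * math.sin(t))

# Hypothetical detectors: y spans 80-100 degrees, z spans 260-280 degrees.
a_y, b_y = on_ring(100), on_ring(80)
a_z, b_z = on_ring(280), on_ring(260)

p_center = aov((0.0, 0.0), a_y, b_y, a_z, b_z) / math.pi   # voxel at ring center
p_outside = aov((0.5, 0.0), a_y, b_y, a_z, b_z) / math.pi  # voxel outside the tube
print(p_center, p_outside)
```

For the voxel at the center of the ring all four angles in the minimum coincide with the 20-degree arc of a detector, so the sketch returns a probability of 20/180.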

In this dissertation, we make the following mild assumptions on the probability matrix $P$ and the emission data $d$:

• (AS1) $P$ has no row vector of zeros.

• (AS2) $d' P_j \neq 0$ for all $j$, where $d'$ is the transpose of $d$ and $P_j$ is the $j$th column of $P$.
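Both assumptions are easy to check numerically for a given system. The following sketch tests them for matrices stored as nested lists; because the entries of $P$ and $d$ are nonnegative, a nonzero inner product in (AS2) is the same as a strictly positive one. The function names and the example data are assumptions for illustration.

```python
def satisfies_as1(P):
    """(AS1): every row of P has at least one nonzero entry."""
    return all(any(p_ij > 0 for p_ij in row) for row in P)

def satisfies_as2(P, d):
    """(AS2): the inner product of d with every column of P is nonzero."""
    n_det, n_vox = len(P), len(P[0])
    return all(sum(d[i] * P[i][j] for i in range(n_det)) > 0
               for j in range(n_vox))

# Hypothetical example: voxel 1 is seen only by detector pair 2.
P = [[0.5, 0.0],
     [0.3, 0.0],
     [0.1, 0.5]]
print(satisfies_as1(P))                 # every detector pair sees some voxel
print(satisfies_as2(P, [14, 18, 22]))   # counts were recorded for both voxels
print(satisfies_as2(P, [14, 18, 0]))    # no counts on the only pair seeing voxel 1
```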

Figure 1-8: The angle-of-view from the point $g_j$ to the detector pair $(y, z)$ is shown. The circle denotes a detector ring. The arcs $(a_y, b_y)$ and $(a_z, b_z)$ represent the detectors $y$ and $z$, respectively. The area between the vertical lines $(a_y, b_z)$ and $(b_y, a_z)$ represents the tube defined by the detector pair $(y, z)$.

To see the implication of the second assumption, consider all the detector pairs for which the probability of recording a photon pair generated by an annihilation in the $j$th voxel is non-zero. The second assumption implies that, among this set of detector pairs, there exists at least one detector pair with non-zero emission counts. Assumption (AS2) is expected to hold whenever the duration of the emission scan is a reasonable length of time. In addition to (AS1) and (AS2), we assume that the probability matrix $P$ accounts for errors due to attenuation, detector inefficiency, detector penetration, positron range, non-collinearity of line-of-response, and scatter. In practice, there are correction methods for attenuation [9], detector inefficiency [15], and detector penetration [7]. However, other correction methods for detector penetration, attenuation, and detector inefficiency could be used in conjunction. Note that in Chapter 6, we do not assume that the probability matrix accounts for errors due to scatter and non-collinearity of line-of-response. Instead, we present a method that estimates the probability matrix in such a way that errors due to scatter and non-collinearity of line-of-response are addressed.

When accidental coincidences (i.e., randoms) are considered, the Poisson model must be modified so that now the emission data $d$ is an observation of a random vector $D$ that is Poisson distributed with mean $(Px + \rho)$, where the $i$th component of $\rho$, $\rho_i$, is the mean number of accidental coincidences recorded by the $i$th detector pair, $i = 1, 2, \ldots, I$. Usually, it is assumed that the mean accidental coincidence rate $\rho$ is known. In practice, the mean accidental coincidence rate is estimated using a "delayed" time window technique [12].

Given the emission data $d$, mean accidental coincidence rate $\rho$, and probability matrix $P$, the problem of interest is to estimate the mean number of positrons emitted from each voxel. Since it is assumed that the data are independent, it follows that the likelihood function for emission data $d$ is given by

$$\Pr\{D = d \mid x\} = \prod_{i=1}^{I} \frac{[Px + \rho]_i^{d_i} \, e^{-[Px + \rho]_i}}{d_i!} . \tag{1.4}$$

The ML estimate of $x$ is defined to be the maximizer of the likelihood function over the feasible set. Alternatively, the ML estimate of the emission mean vector is given by

$$\hat{x}_{ML} = \arg\max_{x \geq 0} \Psi(x) , \tag{1.5}$$

where $\Psi(x) \triangleq \log \Pr\{D = d \mid x\}$ is the log likelihood function:

$$\Psi(x) = \sum_{i=1}^{I} d_i \log([Px + \rho]_i) - \sum_{i=1}^{I} [Px]_i + \sum_{i=1}^{I} \{-\rho_i - \log(d_i!)\} \tag{1.6}$$

(note: maximizing the likelihood function and maximizing the log likelihood function are equivalent operations).
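The log likelihood function can be evaluated directly from its definition. The sketch below is a plain-Python illustration in which `rho` stands for the mean accidental coincidence vector and `math.lgamma(d_i + 1)` supplies log(d_i!). It assumes that the mean count is positive for every detector pair with nonzero data; the function name and the one-voxel example are hypothetical.

```python
import math

def log_likelihood(P, x, d, rho):
    """Poisson log likelihood of the data d given emission means x."""
    total = 0.0
    for row, d_i, rho_i in zip(P, d, rho):
        px_i = sum(p_ij * x_j for p_ij, x_j in zip(row, x))
        mean_i = px_i + rho_i
        if d_i > 0:
            # Assumes mean_i > 0 whenever counts were recorded.
            total += d_i * math.log(mean_i)
        total += -px_i - rho_i - math.lgamma(d_i + 1)
    return total

# One detector pair, one voxel: reduces to the log of a Poisson(2) pmf at 3.
value = log_likelihood([[1.0]], [2.0], [3], [0.0])
print(value)
```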

Although the ML estimator has several nice theoretical properties [17, ch. 7], images produced by the ML method (i.e., the MLEM algorithm) have the drawback that they are extremely noisy. This is because the PET image reconstruction problem is ill-posed: (1) scan times for data acquisition are short, (2) emission data contain errors due to attenuation, scatter, and accidental coincidences, and (3) the data obey Poisson statistics. Currently, the most popular way to address the ill-posed nature of the image reconstruction problem is through the use of penalty functions. Numerous penalized maximum likelihood (PML) methods (also known as Bayesian and maximum a posteriori (MAP) methods) have been proposed [18-35].
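For reference, the widely used form of the MLEM update multiplies each voxel estimate by a back-projected ratio of measured to predicted counts and divides by the voxel's sensitivity. The sketch below implements that standard update, including a randoms term `rho`, on a hypothetical noiseless toy problem; it is an illustration of the textbook iteration, not of the algorithms proposed in this dissertation.

```python
def mlem_step(P, d, x, rho):
    """One MLEM update for data d with mean P x + rho."""
    n_det, n_vox = len(P), len(x)
    # Predicted mean counts for the current estimate.
    means = [sum(P[i][j] * x[j] for j in range(n_vox)) + rho[i]
             for i in range(n_det)]
    x_new = []
    for j in range(n_vox):
        sens = sum(P[i][j] for i in range(n_det))            # sensitivity of voxel j
        back = sum(P[i][j] * d[i] / means[i] for i in range(n_det))
        x_new.append(x[j] * back / sens)
    return x_new

# Hypothetical toy system; d is the noiseless data for x = [20, 40].
P = [[0.5, 0.1],
     [0.3, 0.3],
     [0.1, 0.5]]
d = [14.0, 18.0, 22.0]
rho = [0.0, 0.0, 0.0]

x = [1.0, 1.0]                     # strictly positive starting image
for _ in range(200):
    x = mlem_step(P, d, x, rho)
print(x)
```

Because the update is multiplicative, a positive starting image stays nonnegative. With noisy data the same iteration tends to amplify noise as it approaches the ML estimate, which is the behavior described above.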

In the MAP method, $x$ is assumed to be an observation of a random variable $X$ with known distribution and the a posteriori distribution (i.e., conditional probability density function of $X$ given $D = d$) is maximized. After some manipulations [17, p. 351], the MAP estimate is found to be the maximizer of the log likelihood function plus the log of the probability density function of $X$:

$$\hat{x}_{MAP} \triangleq \arg\max_{x \geq 0} \Psi(x) + \log Q(x) , \tag{1.7}$$

where the function $Q$ is the probability density function of $X$ (i.e., prior distribution). It is through the prior distribution $Q$ that MAP methods have the ability to regularize the image reconstruction problem.

The form of the prior distributions commonly used in PET is $Q(x) = C e^{-\beta \Lambda(x)}$, where the function $\Lambda$ is a scalar valued function and $C$ and $\beta$ are constants. The constant $C > 0$ is chosen such that the area under the distribution $Q$ equals one. The constant $\beta > 0$, known as the penalty parameter, controls the penalty function's degree of influence. Since it is known that PET images should be highly correlated, the penalty function $\Lambda$ is designed in such a way that it forces the estimates of neighboring voxels to be similar in value. Given the definition of $Q$, the PML or MAP estimate is the nonnegative minimizer of the PML objective function

$$\Phi(x) = -\Psi(x) + \beta \Lambda(x) . \tag{1.8}$$
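The structure of the PML objective function, a negative log likelihood plus a weighted penalty, can be sketched as follows. The penalty used here, a sum of squared differences between adjacent voxels on a one-dimensional chain, is only a hypothetical stand-in for the class of penalty functions considered in this dissertation, and the constant term involving log(d_i!) is dropped because it does not depend on the image.

```python
import math

def neg_log_likelihood(P, x, d, rho):
    """Negative Poisson log likelihood, up to a constant independent of x."""
    total = 0.0
    for row, d_i, rho_i in zip(P, d, rho):
        mean_i = sum(p_ij * x_j for p_ij, x_j in zip(row, x)) + rho_i
        total += mean_i - (d_i * math.log(mean_i) if d_i > 0 else 0.0)
    return total

def quadratic_penalty(x):
    """Hypothetical penalty: squared differences of adjacent voxels."""
    return sum((x[j] - x[j + 1]) ** 2 for j in range(len(x) - 1))

def pml_objective(P, x, d, rho, beta):
    """Data-fit term plus beta times the penalty."""
    return neg_log_likelihood(P, x, d, rho) + beta * quadratic_penalty(x)

# Hypothetical toy system (values are assumptions).
P = [[0.5, 0.1], [0.3, 0.3], [0.1, 0.5]]
d = [14.0, 18.0, 22.0]
rho = [0.0, 0.0, 0.0]
for beta in (0.0, 0.01, 1.0):
    print(beta, pml_objective(P, [20.0, 40.0], d, rho, beta))
```

Larger values of the penalty parameter make differences between neighboring voxels more costly, so the minimizer moves toward smoother images.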

CHAPTER 2
LITERATURE REVIEW

Although the MLEM algorithm by Shepp and Vardi [16] produces ML estimates of the emission means, the images produced by the ML method are extremely noisy due to the fact that the image reconstruction problem in PET is ill-posed. As discussed in Section 1.4, the reasons are that (1) scan times for data acquisition are short, (2) emission data contain errors due to attenuation, scatter, and accidental coincidences, and (3) the data obey Poisson statistics. One way to obtain PET images with sufficient smoothness is to terminate the MLEM algorithm before the log likelihood function is maximized. Of course, the resulting images are not ML images. Another modification of the MLEM algorithm is to first obtain an ML image and then filter it with a low pass filter. The drawback of this post-filtering is that it is not clear how the filter should be chosen. A variation of the filtering approach just described is to filter every MLEM iterate, which was suggested by Silverman [36]. Silverman did not provide an answer as to how the filter should be chosen. Denoising emission data (i.e., observed data) is another way to regularize the PET image reconstruction problem [37,38].

Currently, the most popular way to introduce regularization is through the use of penalty functions that force the estimates of neighboring voxels to be similar in value. The basis for such penalty functions is that PET images should be highly correlated. The first part of the chapter is devoted to so-called penalized maximum likelihood (PML) algorithms. PML algorithms are algorithms that minimize PML objective functions, which are a sum of the negative log likelihood function and a penalty function. In PET, PML and maximum a posteriori (MAP) are terms that are used for methods that minimize PML objective functions. In Section 2.1, we briefly review existing PML algorithms.

Reconstructed PML images are blurred by errors due to detector penetration, positron range, non-collinearity, and scatter. Errors due to scatter are difficult to correct because scatter depends on the activity and attenuation within the subject and on the scanner design. In Section 2.2, some scatter correction methods are reviewed. The regularized image reconstruction algorithms and scatter correction algorithm we propose are compared with the existing algorithms in Section 2.3.

2.1 Penalized Maximum Likelihood Algorithms

In 1990, Green [23] proposed a PML algorithm, known as the one-step-late (OSL) algorithm. The algorithm can be viewed as a fixed point iteration that is derived from the Kuhn-Tucker equations [39, pp. 36-49] for the PML optimization problem. Incidentally, Shepp and Vardi showed that the MLEM algorithm could be derived from the Kuhn-Tucker conditions in a similar way. The OSL algorithm is straightforward to implement, but nonnegative estimates cannot be guaranteed and, like many existing algorithms, convergence is an open issue. Lange's goal [24] was to modify the OSL algorithm in such a way that the modified algorithm converges to the PML estimate. It should be pointed out that Lange's algorithm requires line searches, which can be computationally expensive.
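The commonly cited form of the OSL update is an MLEM step whose denominator is augmented by the penalty parameter times the gradient of the penalty function, evaluated at the current iterate. The sketch below uses that form with a hypothetical quadratic chain penalty and no randoms term to show how a negative estimate can arise; the toy system, the penalty, and the value of the penalty parameter are assumptions.

```python
def penalty_gradient(x):
    """Gradient of a hypothetical penalty: sum of (x[j] - x[j+1])**2."""
    grad = [0.0] * len(x)
    for j in range(len(x) - 1):
        diff = x[j] - x[j + 1]
        grad[j] += 2.0 * diff
        grad[j + 1] -= 2.0 * diff
    return grad

def osl_step(P, d, x, beta):
    """One OSL update; the penalty gradient uses the current iterate."""
    n_det, n_vox = len(P), len(x)
    means = [sum(P[i][j] * x[j] for j in range(n_vox)) for i in range(n_det)]
    grad = penalty_gradient(x)
    x_new = []
    for j in range(n_vox):
        sens = sum(P[i][j] for i in range(n_det))
        back = sum(P[i][j] * d[i] / means[i] for i in range(n_det))
        # The denominator can be zero or negative when beta * grad[j] <= -sens.
        x_new.append(x[j] * back / (sens + beta * grad[j]))
    return x_new

P = [[0.5, 0.1], [0.3, 0.3], [0.1, 0.5]]
d = [14.0, 18.0, 22.0]
print(osl_step(P, d, [10.0, 50.0], beta=0.0))    # reduces to an MLEM step
print(osl_step(P, d, [10.0, 50.0], beta=0.02))   # first entry comes out negative
```

In the second call the penalty gradient for the first voxel is large and negative, so the denominator changes sign and the update produces a negative value, which illustrates why nonnegativity cannot be guaranteed.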

Alenius et al. [32] suggested a Gaussian "type" prior that depends on the median of voxels within local neighborhoods, and introduced an algorithm called the median-root-prior (MRP) algorithm. The MRP algorithm is based on an iteration dependent objective function. Consequently, it really cannot be considered a PML algorithm. Nevertheless, the MRP algorithm generates "good" images in the sense that the noise level of the reconstructed images is suppressed. It should be mentioned that a PML algorithm was derived by Hsiao et al. [40] that resembles the MRP algorithm and performs similarly to it. The PML algorithm by Hsiao et al. was derived using a prior that is based on a certain auxiliary vector.

Levitan and Herman [21] proposed a PML algorithm based on an assumption that the prior distribution of the true emission means was a multivariate Gaussian distribution. The assumption led to a penalty function that was in the form of a weighted least-squares distance between x and a reference image. However, they did not indicate how the reference image is to be chosen.

An algorithm was proposed by Wu [27] using a wavelet decomposition formulation. Specifically, the author assumed that a vector consisting of the wavelet coefficients of the true emission means is a zero-mean Gaussian random vector with a known covariance matrix. From this assumption, a prior distribution for the emission means was derived. The prior distribution is a zero-mean Gaussian random vector with a covariance matrix that depends on the choice for the wavelet transform and the assumed distribution for the vector of wavelet coefficients. It should be pointed out that the assumption was not clearly justified in the paper.

Researchers have used an optimization algorithm, called the iterative coordinate descent (ICD) algorithm [41, pp. 283-287], to obtain estimates for various penalty functions [31], [42]. Convergence results are given for the penalized weighted least-squares method [42] and both algorithms (i.e., [31], [42]) enforce the nonnegativity constraint. Algorithms based on the ICD algorithm update each voxel in a serial manner, so a parallel implementation of them may not be possible.

De Pierro [25,30] derives PML algorithms that minimize certain surrogate functions that he constructs by exploiting the fact that the log likelihood function is concave and penalty functions, such as the quadratic penalty function, are convex. Except for the quadratic penalty function, closed-form expressions for the minimizers of the surrogate functions do not exist. Consequently, an optimization method, such as Newton's method [43, pp. 201-202], is needed to minimize the surrogate functions. De Pierro presents some convergence results; however, the utility of his methods is unclear because no experimental results were provided. It should be noted that, in the transmission tomography paper by Erdogan and Fessler [8], a quadratic surrogate function was used for a certain class of penalty functions. The quadratic surrogate function was developed by Huber [44, pp. 184-186].

A fast PML method, based on the ordered subset-EM algorithm [45], was proposed by De Pierro and Yamagishi [33]. The authors show that if the sequence generated by the algorithm converges, then it must converge to the true PML solution. Recently, Ahn and Fessler proposed algorithms [35] that are based on the ordered subset-EM algorithm [45], an algorithm by De Pierro and Yamagishi [33], and an algorithm by Erdogan and Fessler [10]. As with other algorithms based on the ordered subset-EM algorithm, there is some uncertainty as to how the subsets are to be chosen. In the paper by Ahn and Fessler, the algorithms are said to converge to the nonnegative minimizer of the PML objective function for certain penalty functions and their accompanying parameters by using a relaxation parameter that diminishes with iterations. Open issues are how the relaxation parameters should be chosen in practice and how they affect the performance of the algorithms. The convergence rate varies with the relaxation parameter.

2.2 Scatter Correction Methods

Many methods have been proposed to correct scattered coincidences [11, ch. 3]. They can be classified into a few categories: (1) energy window method, (2) convolution/deconvolution method, and (3) calculating scatter distribution method.

One of the scatter correction methods is based on the use of multiple (two or more) energy windows [46,47]. Recall that photons lose energy when they undergo Compton scattering. The principle of the method utilizing energy windows is to discard the detection of a photon whenever the energy of the photon is less than 511 keV. Since detectors have finite energy resolution, energy window based methods are limited in how well they can separate scattered photons. Consequently, it is preferred to use another method jointly with the energy window techniques.

Another correction method for scatter is the convolution/deconvolution method [48-51]. The methods in [48,49] assume that scattered coincidences (i.e., scatter) can be approximated by a convolution of unscattered coincidences and a certain scatter function. Under this assumption, the mean scatter in the observed coincidences is estimated, and the estimate can then be subtracted from the emission data or incorporated in the system model for the emission data. The method by McKee et al. assumes that the distribution of scattered annihilations can be approximated by the convolution of the distribution of unscattered annihilations and some scatter response function [50]. The issue with the convolution/deconvolution method is that the distribution of unscattered coincidences (or annihilations) and the scatter response function (or scatter function) are not known.

Ollinger introduced a scatter correction method that calculates the scatter distribution using an analytical equation, transmission images, emission images, and the scanner geometry [52]. The computational cost of the method is very high, so at the time of writing it might be hard for the method to be accepted in clinical use.

2.3 Summary of the Proposed Algorithms

* In Chapter 3, we present an algorithm that obtains PML estimates for a certain class of edge-preserving penalty functions. The PML algorithm is derived by combining the convexity idea of De Pierro [25,30] and Huber's surrogate functions [44]. Combining two existing theories in such a way that the PML algorithm is convergent is new to the PET community. In theory, the algorithm guarantees nonnegative iterates, monotonically decreases the PML objective function with increasing iterations, and converges to the solution of the PML problem. In practice, it is straightforward to implement (i.e., no additional hyperparameters and no need for line searches) and it can incorporate many edge-preserving penalty functions.

* In Chapter 4, we develop an accelerated version of the PML algorithm by using the pattern search idea of Hooke and Jeeves [41, pp. 287-291]. Using this approach, we solve a constrained problem at each pattern search step that leads to improved convergence rates. A modification of Hooke and Jeeves' direction vector is also introduced that improves performance. It should be mentioned that Hooke and Jeeves' method has not been used in PET image reconstruction. The proposed algorithm inherits the nice properties of the PML algorithm and converges to the minimizer of the PML objective function. In experiments, the accelerated algorithm needed less than about one third of the CPU time that was necessary for the PML algorithm to converge.

* In Chapter 5, we propose a regularized image reconstruction algorithm, referred to as the quadratic edge preserving (QEP) algorithm, that aims to preserve edges through the use of certain newly developed de-coupled penalty functions that depend on the current estimate. The QEP algorithm was motivated by the analysis of the PML algorithm. The algorithm by Alenius et al. [32] also uses an iteration dependent objective function. However, it should be mentioned that their algorithm uses the OSL algorithm to generate the next iterate. The drawback of the approach by Alenius et al. is that the OSL algorithm does not guarantee convergence.

* In Chapter 6, we propose a model for emission data where an unknown matrix, called a scatter matrix, is introduced. The model aims to account for errors due to scatter and non-collinearity. Based on the model, a certain minimization problem is constructed that allows for the scatter matrix and emission mean vector to be jointly estimated. Since the minimization problem is impossible to solve, we propose an algorithm that greatly reduces the number of unknowns in the scatter matrix and alternately estimates the scatter matrix and emission mean vector. It should be mentioned that Mumcuoglu et al. [53] used the same model. However, they assumed that the scatter matrix is known and accounts for errors due to detector penetration as well. Their scatter matrix was obtained through Monte Carlo simulations.

CHAPTER 3
PENALIZED MAXIMUM LIKELIHOOD ALGORITHM

Although the ML estimates of the emission means are available by using the MLEM algorithm [16], as discussed in Section 1.4, the resulting PET images are extremely noisy due to the fact that the PET image reconstruction problem is ill-posed. This is because of short scan times, errors in the emission data, and the fact that the data obey Poisson statistics. The most popular way to address the ill-posed nature of the PET image reconstruction problem is through the use of penalty functions. Penalty functions used in PET are designed in such a way that estimates for the emission means of neighboring voxels are forced to be similar in value, unless there is an "edge" within the neighborhood. By an edge, we mean that there is a group of connected voxels that have significantly greater activity than the other voxels in the neighborhood. For example, suppose there is only one voxel with significantly greater activity than the other voxels in its neighborhood. Then, we would say that there is no edge within the neighborhood. Simply stated, penalty functions provide a means for reconstructing PET images that have considerably less noise than MLEM images, yet retain edges (e.g., tumors) which may convey important information.

In Section 3.1, we first derive an algorithm, called the penalized maximum likelihood (PML) algorithm, that incorporates a wide class of edge-preserving penalty functions. Then, we prove that the PML algorithm converges in Section 3.2. Finally, we summarize the properties of the PML algorithm in Section 3.3. It should be mentioned that we presented the PML algorithm in [18] without a proof of convergence. Our proof of convergence can now be found in a recent manuscript [19].

Figure 3-1: One-dimensional illustration of the optimization transfer method. At each iteration, a surrogate function is obtained and a minimizer of the surrogate function is defined as the next iterate. Ideally, it is "easy" to get the minimizer of the surrogate function.

3.1 Penalized Maximum Likelihood (PML) Algorithm

The problem of interest is to determine the nonnegative minimizer of the PML objective function $\Phi(x) = -\Psi(x) + \beta\Lambda(x)$ ($\Psi$ is defined in (1.6)), where $\Lambda$ is a penalty function that forces emission mean estimates of neighboring voxels to be similar in value. In other words, we want to solve the following optimization problem:

\[ (\mathrm{P}) \qquad x_{\mathrm{PML}} = \arg\min_{x \ge 0} \Phi(x) . \]

The penalty functions we consider are of the form

\[ \Lambda(x) = \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\, g(x_j, x_k) , \tag{3.1} \]

where $N_j$ is a set of voxels in a neighborhood of the $j$th voxel, the constants $\{w_{jk}\}$ are positive weights for which $w_{jk} = w_{kj}$ for all $j$ and $k$, and $g(s,t) \triangleq \lambda(s-t)$, whereby the function $\lambda$ satisfies the following assumptions:

* (AS3) $\lambda(t)$ is symmetric

* (AS4) $\lambda(t)$ is everywhere differentiable

* (AS5) $\dot{\lambda}(t) \triangleq \frac{d\lambda(t)}{dt}$ is increasing for all $t$ (assumption implies that $\lambda$ is strictly convex)

* (AS6) $\gamma(t) \triangleq \frac{\dot{\lambda}(t)}{t}$ is nonincreasing for $t > 0$

* (AS7) $\gamma(0) = \lim_{t \to 0} \gamma(t)$ is finite and nonzero

* (AS8) $\lambda(t)$ is bounded below (assumption implies that $\Lambda(x)$ is bounded below).

Examples of functions that satisfy (AS3)-(AS8) are the quadratic function $\lambda(t) = t^2$ and Green's log-cosh function $\lambda(t) = \log(\cosh(t))$ [23]. Regarding the neighborhood $N_j$, the $j$th voxel is excluded from the set $N_j$ and, if the $k$th voxel is in $N_j$, then the $j$th voxel is in $N_k$. A common choice for $N_j$ is the eight nearest neighbors of the $j$th voxel.

Since it is not possible to get a closed-form solution to the minimization problem (P), iterative optimization methods are necessary. The PML algorithm we propose is based on the optimization transfer method [10,25,30,34,54] where, at each iteration, a function that satisfies certain conditions is obtained and the next iterate is defined to be a minimizer of the function. The function found at each iteration is referred to as a surrogate function for the function to be minimized. This idea is illustrated with the one-dimensional example in Figure 3-1. In the figure, the problem is to find the minimizer of the function $f$, which is $t^*$. It is assumed that a closed-form solution is not available to the minimization problem. Given an initial guess $t^{(0)}$, a surrogate function $f^{(0)}$ that depends on $t^{(0)}$ is determined. Then, the next iterate $t^{(1)}$ is generated by finding the minimizer of $f^{(0)}$. To get the following iterate $t^{(2)}$, a surrogate function $f^{(1)}$ that depends on $t^{(1)}$ is obtained and then minimized. These steps are repeated until some convergence criterion is met.

For a vector argument $t$, a surrogate function $f^{(n)}$ satisfies the following conditions:

* (C1) $f^{(n)}(t) \ge f(t)$ for all $t \in \{\text{domain of } f\}$

* (C2) $f^{(n)}(t^{(n)}) = f(t^{(n)})$

* (C3) $\nabla f^{(n)}(t^{(n)}) = \nabla f(t^{(n)})$,

where $t^{(n)}$ is the $n$th iterate, $\nabla$ denotes the gradient of a function, and the superscript $(n)$ indicates that the functions $\{f^{(n)}\}$ and the iterates $\{t^{(n)}\}$ depend on the iteration number. The next iterate $t^{(n+1)}$ is defined to be a minimizer of $f^{(n)}$:

\[ t^{(n+1)} \triangleq \arg\min_{t} f^{(n)}(t) \quad \text{subject to } t \in \{\text{domain of } f\} . \tag{3.2} \]

Defining the iterates in this way ensures that the objective function $f$ decreases monotonically with increasing iterations. To prove this fact, we first note that $f(t^{(n+1)}) \le f^{(n)}(t^{(n+1)})$ by (C1). Since $f^{(n)}(t^{(n+1)}) \le f^{(n)}(t^{(n)})$ by (3.2), it follows by (C2) that for all $n$,

\[ f(t^{(n+1)}) \le f^{(n)}(t^{(n+1)}) \le f^{(n)}(t^{(n)}) = f(t^{(n)}) . \tag{3.3} \]

It should be mentioned that (C3) is not necessary for the monotonicity in (3.3). However, (C3) is often needed in order to prove that an algorithm that utilizes the optimization transfer method converges (see [18,25,30]). Although the optimization transfer method is straightforward in principle, the difficulty in practice is that it may be difficult to find surrogate functions that satisfy the conditions (C1), (C2), and (C3).
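
To make the optimization transfer idea concrete, the following is a minimal one-dimensional sketch. Everything in it is an assumption chosen for illustration and does not come from the text: the objective `f`, the constant `CURVATURE` (an upper bound on the second derivative of `f`), and the quadratic surrogate built from that bound. The surrogate satisfies (C1)-(C3) for this particular `f`, so the recorded objective values should be nonincreasing, as in (3.3).

```python
import math

def f(t):
    # Illustrative objective (hypothetical, not from the text).
    return math.log(math.cosh(t - 2.0)) + 0.1 * t * t

def df(t):
    # First derivative of f.
    return math.tanh(t - 2.0) + 0.2 * t

# f''(t) = sech^2(t - 2) + 0.2 <= 1.2, so 1.2 bounds the curvature of f.
CURVATURE = 1.2

def surrogate(t, tn):
    # Quadratic surrogate at the current iterate tn:
    # equals f at tn, has the same slope at tn, and lies above f everywhere.
    return f(tn) + df(tn) * (t - tn) + 0.5 * CURVATURE * (t - tn) ** 2

def next_iterate(tn):
    # Closed-form minimizer of the surrogate, as in (3.2).
    return tn - df(tn) / CURVATURE

def run(t0, iters=50):
    iterates = [t0]
    for _ in range(iters):
        iterates.append(next_iterate(iterates[-1]))
    return iterates

iterates = run(-3.0)
values = [f(t) for t in iterates]
```

The design choice here is the simplest possible surrogate (a parabola with fixed curvature); the surrogates used for the PML objective function are tailored to the log likelihood and penalty terms instead.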
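De Pierro developed surrogate functions for the negative log likelihood function $-\Psi$ by using the convexity of the negative log function [25,30]. His idea is based on the following property of convex functions [55, pp. 860-861]: for a convex function $f$,

\[ f\Big(\sum_j \alpha_j t_j\Big) \le \sum_j \alpha_j f(t_j) , \tag{3.4} \]

where $\sum_j \alpha_j t_j \in \{\text{domain of } f\}$, $t_j \in \{\text{domain of } f\}$ for all $j$, $\alpha_j \ge 0$ for all $j$, $\sum_j \alpha_j = 1$, and $\{\text{domain of } f\}$ is a convex set. Specifically, De Pierro utilized the following inequality in ML estimation, where $f(t) = -\log(t)$:

\[
\begin{aligned}
f([Px]_i) &= f\Bigg(\sum_{j=1}^{J} \frac{P_{ij} x_j^{(n)}}{[Px^{(n)}]_i} \, \frac{x_j}{x_j^{(n)}} [Px^{(n)}]_i \Bigg) && (3.5)\\
&\le \sum_{j=1}^{J} \frac{P_{ij} x_j^{(n)}}{[Px^{(n)}]_i} \, f\Bigg(\frac{x_j}{x_j^{(n)}} [Px^{(n)}]_i \Bigg) , && (3.6)
\end{aligned}
\]

where $x^{(n)}$ is the $n$th iterate of $x$, $P_{ij} \ge 0$, $x > 0$, and $[Px^{(n)}]_i \neq 0$ for all $i$, $j$, and $n$. Let

\[ f_i(x) \triangleq f([Px]_i) \tag{3.7} \]

\[ f_i^{(n)}(x) \triangleq \sum_{j=1}^{J} \frac{P_{ij} x_j^{(n)}}{[Px^{(n)}]_i} \, f\Bigg(\frac{x_j}{x_j^{(n)}} [Px^{(n)}]_i \Bigg) . \tag{3.8} \]

Then, it is straightforward to see that, for all $i$, (1) $f_i^{(n)}(x) \ge f_i(x)$ for all $x > 0$, (2) $f_i^{(n)}(x^{(n)}) = f_i(x^{(n)})$, and (3) $\nabla f_i^{(n)}(x^{(n)}) = \nabla f_i(x^{(n)})$. Thus, for all $i$, $f_i^{(n)}$ is a surrogate function for $f_i$.

Although De Pierro developed the surrogate functions for the log likelihood function under the assumption of no accidental coincidences (i.e., $\rho = 0$), the surrogate functions can be easily modified to account for accidental coincidences. Observe that $[Px + \rho]_i$ can be written as a convex combination

\[ [Px + \rho]_i = \sum_{j=1}^{J} \frac{P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i} \, \frac{x_j}{x_j^{(n)}} [Px^{(n)} + \rho]_i + \frac{\rho_i}{[Px^{(n)} + \rho]_i} [Px^{(n)} + \rho]_i . \tag{3.9} \]

Using the convexity of the negative log function and the fact that

\[ \sum_{j=1}^{J} \frac{P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i} + \frac{\rho_i}{[Px^{(n)} + \rho]_i} = 1 , \tag{3.10} \]

we have the following inequality for $f(t) = -\log(t)$:

\[ f([Px + \rho]_i) \le \sum_{j=1}^{J} \frac{P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i} \, f\Bigg(\frac{x_j}{x_j^{(n)}} [Px^{(n)} + \rho]_i \Bigg) + \frac{\rho_i}{[Px^{(n)} + \rho]_i} \, f\big([Px^{(n)} + \rho]_i\big) . \tag{3.11} \]

Given the inequality in (3.11), the surrogate function at the $n$th iteration for the negative log likelihood function $-\Psi$ can be expressed as

\[ -\Psi^{(n)}(x) = \sum_{i=1}^{I} \sum_{j=1}^{J} \Bigg\{ P_{ij} x_j - \frac{d_i P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i} \log(x_j) \Bigg\} + C_1^{(n)} , \tag{3.12} \]

where

\[ C_1^{(n)} = \sum_{i=1}^{I} \Bigg\{ \rho_i + \log(d_i!) - d_i \log\big([Px^{(n)} + \rho]_i\big) + \frac{d_i}{[Px^{(n)} + \rho]_i} \sum_{j=1}^{J} P_{ij} x_j^{(n)} \log\big(x_j^{(n)}\big) \Bigg\} . \tag{3.13} \]

It is straightforward to show that the surrogate function $-\Psi^{(n)}$ satisfies (C1), (C2), and (C3): (1) $\Psi^{(n)}(x) \le \Psi(x)$ for all $x > 0$, (2) $\Psi^{(n)}(x^{(n)}) = \Psi(x^{(n)})$, and (3) $\nabla\Psi^{(n)}(x^{(n)}) = \nabla\Psi(x^{(n)})$.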

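Since surrogate functions for the negative log likelihood function $-\Psi$ are available, we only need to find surrogate functions for the penalty function $\Lambda(x) = \sum_j \sum_{k \in N_j} w_{jk} \lambda(x_j - x_k)$. Under assumptions (AS3)-(AS8), Huber developed a surrogate function for $\lambda$ in [44] (see also [8]). Given an arbitrary point $t^{(n)}$, Huber's surrogate function for $\lambda$, which is defined by

\[ \hat{\lambda}^{(n)}(t) \triangleq \lambda(t^{(n)}) + \dot{\lambda}(t^{(n)})(t - t^{(n)}) + \frac{1}{2}\gamma(t^{(n)})(t - t^{(n)})^2 , \tag{3.14} \]

has the property that $\hat{\lambda}^{(n)}(t) \ge \lambda(t)$ for all $t$ (see Appendix A), $\hat{\lambda}^{(n)}(t^{(n)}) = \lambda(t^{(n)})$, and $\dot{\hat{\lambda}}^{(n)}(t^{(n)}) = \dot{\lambda}(t^{(n)})$, where the dot over a function represents its first derivative. For $t \triangleq x_j - x_k$, it follows that a surrogate function for $\Lambda$ is

\[ \hat{\Lambda}^{(n)}(x) = \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\, \hat{g}^{(n)}(x_j, x_k) , \tag{3.15} \]

where $\hat{g}^{(n)}(s,t) \triangleq \hat{\lambda}^{(n)}(s - t)$.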
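
As a numerical sanity check of Huber's surrogate in (3.14), the sketch below evaluates it for Green's log-cosh function, for which $\dot{\lambda}(t) = \tanh(t)$ and $\gamma(t) = \tanh(t)/t$ with $\gamma(0) = 1$. The function names and the test grid are illustrative assumptions; the check itself only restates the majorization and tangency properties quoted above.

```python
import numpy as np

def lam(t):
    # Green's log-cosh function, log(cosh(t)), written in an overflow-safe form.
    a = np.abs(t)
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)

def lam_dot(t):
    # First derivative of log(cosh(t)).
    return np.tanh(t)

def gamma(t):
    # gamma(t) = lam_dot(t) / t, with the limiting value gamma(0) = 1.
    t = np.asarray(t, dtype=float)
    safe = np.where(t == 0.0, 1.0, t)
    return np.where(t == 0.0, 1.0, np.tanh(safe) / safe)

def huber_surrogate(t, tn):
    # Huber's quadratic surrogate (3.14) built at the point tn.
    return lam(tn) + lam_dot(tn) * (t - tn) + 0.5 * gamma(tn) * (t - tn) ** 2

t_grid = np.linspace(-6.0, 6.0, 241)
gaps = {tn: huber_surrogate(t_grid, tn) - lam(t_grid) for tn in (-3.0, -0.5, 0.0, 0.7, 4.0)}
```

Each entry of `gaps` is the difference between the surrogate and the function on the grid; it should be nonnegative up to rounding, and zero at the expansion point.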

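Using $\hat{\Lambda}^{(n)}$ as a starting point, we will now construct a surrogate function for $\Lambda$ that has a more convenient form. By the convexity of the square function, we have the following inequality

\[
\begin{aligned}
\big[(x_j - x_k) - (x_j^{(n)} - x_k^{(n)})\big]^2 &= \Big[\tfrac{1}{2}\big(2x_j - 2x_j^{(n)}\big) + \tfrac{1}{2}\big(-2x_k + 2x_k^{(n)}\big)\Big]^2 \\
&\le \tfrac{1}{2}\big(2x_j - 2x_j^{(n)}\big)^2 + \tfrac{1}{2}\big(2x_k - 2x_k^{(n)}\big)^2 . && (3.16)
\end{aligned}
\]

It should be mentioned that De Pierro [30] and Hsiao et al. [34] utilized this property of the quadratic function in PET, and Erdogan and Fessler [10] applied it to a non-quadratic convex function for transmission tomography. Motivated by (3.16), we define

\[
\begin{aligned}
g^{(n)}(x_j, x_k) \triangleq{}& \lambda\big(x_j^{(n)} - x_k^{(n)}\big) + \dot{\lambda}\big(x_j^{(n)} - x_k^{(n)}\big)\big[(x_j - x_j^{(n)}) - (x_k - x_k^{(n)})\big] \\
&+ \tfrac{1}{2}\gamma\big(x_j^{(n)} - x_k^{(n)}\big)\Big[\tfrac{1}{2}\big(2x_j - 2x_j^{(n)}\big)^2 + \tfrac{1}{2}\big(2x_k - 2x_k^{(n)}\big)^2\Big] . && (3.17)
\end{aligned}
\]

By construction, the following statements are clear: (1) $g^{(n)}(x_j, x_k) \ge \hat{g}^{(n)}(x_j, x_k)$ for all $x_j$ and $x_k$ from (3.16), (2) $g^{(n)}(x_j^{(n)}, x_k^{(n)}) = \hat{g}^{(n)}(x_j^{(n)}, x_k^{(n)})$, (3) $\frac{\partial}{\partial x_j} g^{(n)}(x_j^{(n)}, x_k^{(n)}) = \frac{\partial}{\partial x_j} \hat{g}^{(n)}(x_j^{(n)}, x_k^{(n)})$, and (4) $\frac{\partial}{\partial x_k} g^{(n)}(x_j^{(n)}, x_k^{(n)}) = \frac{\partial}{\partial x_k} \hat{g}^{(n)}(x_j^{(n)}, x_k^{(n)})$. The difference between $\hat{g}^{(n)}$ and $g^{(n)}$ is that $g^{(n)}$ is de-coupled in the sense that it does not have terms of the form $x_j x_k$. This difference is important because, as we show later, using $g^{(n)}$ enables us to construct surrogate functions for $\Phi$ that have closed-form expressions for their minimizers. Using $g^{(n)}$, an alternative surrogate function for $\Lambda$ is

\[ \Lambda^{(n)}(x) = \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\, g^{(n)}(x_j, x_k) . \tag{3.18} \]

It is clear that, by construction, $\Lambda^{(n)}$ satisfies the following properties: (1) $\Lambda^{(n)}(x) \ge \Lambda(x)$ for all $x$, (2) $\Lambda^{(n)}(x^{(n)}) = \Lambda(x^{(n)})$, and (3) $\nabla\Lambda^{(n)}(x^{(n)}) = \nabla\Lambda(x^{(n)})$.

Now, using $\Psi^{(n)}$ and $\Lambda^{(n)}$, the desired surrogate function at the $n$th iteration for $\Phi$ is

\[ \Phi^{(n)}(x) = -\Psi^{(n)}(x) + \beta\Lambda^{(n)}(x) . \tag{3.19} \]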

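From the properties of $\Psi^{(n)}$ and $\Lambda^{(n)}$, it follows that the surrogate function $\Phi^{(n)}$ possesses the requisite properties:

* (P1) $\Phi^{(n)}(x) \ge \Phi(x)$ for all $x > 0$

* (P2) $\Phi^{(n)}(x^{(n)}) = \Phi(x^{(n)})$

* (P3) $\nabla\Phi^{(n)}(x^{(n)}) = \nabla\Phi(x^{(n)})$.

Given $x^{(n)}$, the next iterate $x^{(n+1)}$ is found by minimizing $\Phi^{(n)}$:

\[ x^{(n+1)} \triangleq \arg\min_{x \ge 0} \Phi^{(n)}(x) . \tag{3.20} \]

Defining the iterates in this way ensures that the objective function $\Phi$ decreases with increasing iterations, as shown in (3.3):

\[ \Phi(x^{(n)}) \ge \Phi(x^{(n+1)}) \quad \text{for all } n . \tag{3.21} \]

All that remains now is to solve the optimization problem in (3.20). To this end, we write $\Lambda^{(n)}(x)$ as

\[ \Lambda^{(n)}(x) = 2 \sum_{j=1}^{J} \sum_{k \in N_j} h_{jk}^{(n)}(x_j) + C_2^{(n)} , \tag{3.22} \]

where

\[ h_{jk}^{(n)}(t) \triangleq w_{jk}\, \gamma\big(x_j^{(n)} - x_k^{(n)}\big)\big(t - m_{jk}^{(n)}\big)^2 \tag{3.23} \]

\[ m_{jk}^{(n)} \triangleq \frac{x_j^{(n)} + x_k^{(n)}}{2} \tag{3.24} \]

\[ C_2^{(n)} \triangleq \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk} \Big[ \lambda\big(x_j^{(n)} - x_k^{(n)}\big) - \tfrac{1}{2}\dot{\lambda}\big(x_j^{(n)} - x_k^{(n)}\big)\big(x_j^{(n)} - x_k^{(n)}\big) \Big] \tag{3.25} \]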

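(see proof in Appendix B). Since $\Psi^{(n)}$ and $\Lambda^{(n)}$ are de-coupled, $\Phi^{(n)}$ can be written as

\[
\begin{aligned}
\Phi^{(n)}(x) &= -\Psi^{(n)}(x) + \beta\Lambda^{(n)}(x) && (3.26)\\
&= \sum_{i=1}^{I} \sum_{j=1}^{J} \Bigg\{ P_{ij} x_j - \frac{d_i P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i} \log(x_j) \Bigg\} + 2\beta \sum_{j=1}^{J} \sum_{k \in N_j} h_{jk}^{(n)}(x_j) + C_1^{(n)} + \beta C_2^{(n)} && (3.27)\\
&= \sum_{j=1}^{J} \Bigg\{ -\log(x_j) \sum_{i=1}^{I} \frac{d_i P_{ij} x_j^{(n)}}{[Px^{(n)} + \rho]_i} + x_j \sum_{i=1}^{I} P_{ij} + 2\beta \sum_{k \in N_j} w_{jk}\, \gamma\big(x_j^{(n)} - x_k^{(n)}\big)\Big(x_j^2 - 2 m_{jk}^{(n)} x_j + \big(m_{jk}^{(n)}\big)^2\Big) \Bigg\} + C_1^{(n)} + \beta C_2^{(n)} && (3.28)\\
&= \sum_{j=1}^{J} \Big\{ E_j^{(n)} \log(x_j) + F_j^{(n)} x_j^2 + G_j^{(n)} x_j \Big\} + C_3^{(n)} , && (3.29)
\end{aligned}
\]

where

\[ E_j^{(n)} \triangleq -x_j^{(n)} \sum_{i=1}^{I} \frac{d_i P_{ij}}{[Px^{(n)} + \rho]_i} \tag{3.30} \]

\[ F_j^{(n)} \triangleq 2\beta \sum_{k \in N_j} w_{jk}\, \gamma\big(x_j^{(n)} - x_k^{(n)}\big) \tag{3.31} \]

\[ G_j^{(n)} \triangleq \sum_{i=1}^{I} P_{ij} - 4\beta \sum_{k \in N_j} w_{jk}\, \gamma\big(x_j^{(n)} - x_k^{(n)}\big)\, m_{jk}^{(n)} \tag{3.32} \]

\[ C_3^{(n)} \triangleq C_1^{(n)} + \beta C_2^{(n)} + 2\beta \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\, \gamma\big(x_j^{(n)} - x_k^{(n)}\big)\big(m_{jk}^{(n)}\big)^2 . \tag{3.33} \]

Since $\Phi^{(n)}$ is de-coupled, as seen from (3.29), it follows that the solution to (3.20) is given by

\[ x_j^{(n+1)} = \arg\min_{t \ge 0} \phi_j^{(n)}(t) , \quad j = 1, 2, \ldots, J , \tag{3.34} \]

where

\[ \phi_j^{(n)}(t) \triangleq E_j^{(n)} \log(t) + F_j^{(n)} t^2 + G_j^{(n)} t . \tag{3.35} \]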

Fortunately, the function $\phi_j^{(n)}$ is strictly convex for all $j$ and $n$ under the assumption that $x_j^{(n)} > 0$ for all $j$ and $n$. We will prove this statement by showing that the second derivative of $\phi_j^{(n)}$ is positive when $x_j^{(n)} > 0$ for all $j$ and $n$. First, note that $E_j^{(n)}$ is negative and $F_j^{(n)}$ is positive for all $j$ and $n$. The fact that $E_j^{(n)} < 0$ is due to the fact that $d^T P_j \neq 0$ (see (AS2) in Section 1.4) and the assumption that $x_j^{(n)} > 0$. The fact that $F_j^{(n)} > 0$ follows from the positivity of the function $\gamma$, the weights $\{w_{jk}\}$, and the penalty parameter $\beta$. To see why $\gamma(t) > 0$ for $-\infty < t < \infty$, recall that $\lambda(t)$ is a symmetric, strictly convex function by (AS3) and (AS5). It follows that $\dot{\lambda}(t) > 0$ over $(0, \infty)$ and $\dot{\lambda}(t) < 0$ over $(-\infty, 0)$. Using the fact that $\gamma(0)$ is finite and nonzero (see (AS7)), we have that $\gamma(t) > 0$ for $-\infty < t < \infty$. Now, consider the second derivative of $\phi_j^{(n)}$. Easy calculations show that $\ddot{\phi}_j^{(n)}(t) = -E_j^{(n)}/t^2 + 2F_j^{(n)}$, where the double dot over a function represents the second derivative of a function. Since $F_j^{(n)} > 0$ and $E_j^{(n)} < 0$, it follows that the second derivative of $\phi_j^{(n)}(t)$ is positive for all $j$ and $n$. Thus, $\phi_j^{(n)}(t)$ is strictly convex for all $j$, $n$, and $t > 0$, and, from (3.29), $\Phi^{(n)}$ is strictly convex over the set $\{x : x > 0\}$.

Since $\Phi^{(n)}$ is de-coupled, $\phi_j^{(n)}$ is strictly convex, and $\phi_j^{(n)}(t) \to \infty$ as $t \to 0^+$, it is true that

\[ x_j^{(n+1)} > 0 \quad \text{and} \quad \dot{\phi}_j^{(n)}\big(x_j^{(n+1)}\big) = 0 . \tag{3.36} \]

Note that (3.36) satisfies our assumption that $x_j^{(n)} > 0$ for all $j$ and $n$. To solve (3.34), we compute the first derivative of $\phi_j^{(n)}$ and set it to zero, which gives the quadratic equation $2F_j^{(n)} t^2 + G_j^{(n)} t + E_j^{(n)} = 0$. Since $E_j^{(n)} < 0$ and $F_j^{(n)} > 0$, the root of the quadratic equation that preserves the nonnegativity constraint is

\[ x_j^{(n+1)} = \frac{-G_j^{(n)} + \sqrt{\big(G_j^{(n)}\big)^2 - 8 E_j^{(n)} F_j^{(n)}}}{4 F_j^{(n)}} , \quad j = 1, 2, \ldots, J . \tag{3.37} \]

Observe that, as $\beta \to 0$, (3.37) approaches $-E_j^{(n)} / \sum_i P_{ij}$ by L'Hospital's rule. Thus, the iteration in (3.37) is equivalent to the MLEM algorithm when $\beta = 0$.
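
The limit can also be checked without L'Hospital's rule by rationalizing the numerator of (3.37). The short derivation below is a sketch that uses only the definitions (3.30)-(3.32), under the assumption that $\sum_i P_{ij} > 0$ (no zero columns in $P$); as $\beta \to 0$, $F_j^{(n)} \to 0$ and $G_j^{(n)} \to \sum_i P_{ij}$.

```latex
\begin{aligned}
x_j^{(n+1)}
  &= \frac{-G_j^{(n)} + \sqrt{\big(G_j^{(n)}\big)^2 - 8 E_j^{(n)} F_j^{(n)}}}{4 F_j^{(n)}}
   = \frac{-8 E_j^{(n)} F_j^{(n)}}{4 F_j^{(n)} \Big( G_j^{(n)} + \sqrt{\big(G_j^{(n)}\big)^2 - 8 E_j^{(n)} F_j^{(n)}} \Big)} \\
  &= \frac{-2 E_j^{(n)}}{G_j^{(n)} + \sqrt{\big(G_j^{(n)}\big)^2 - 8 E_j^{(n)} F_j^{(n)}}}
   \;\xrightarrow[\beta \to 0]{}\;
   \frac{-E_j^{(n)}}{\sum_i P_{ij}}
   = \frac{x_j^{(n)}}{\sum_i P_{ij}} \sum_i \frac{d_i P_{ij}}{[P x^{(n)} + \rho]_i} .
\end{aligned}
```

The final expression is the familiar multiplicative MLEM update with the accidental coincidence term $\rho$ included in the denominator.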

In summary, given a strictly positive initial estimate $x^{(0)} > 0$, the steps of the PML algorithm are: for $n = 0, 1, 2, \ldots$

* Step 1: Let $x^{(0)} > 0$ be the initial estimate

* Step 2: Construct the surrogate function $\Phi^{(n)}$ from the current iterate $x^{(n)}$ using (3.29), (3.30), (3.31), and (3.32)

* Step 3: Get $x^{(n+1)}$ using (3.37)

* Step 4: Iterate between Steps 2 and 3 until some chosen stopping criterion is met.
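
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: the neighborhood weights are stored as a dense symmetric matrix `W` with zero diagonal (`W[j, k] > 0` exactly when voxel `k` is in the neighborhood of voxel `j`), every voxel is assumed to have at least one neighbor, the penalty is Green's log-cosh function, and all function names are hypothetical. A dense `W` is only practical for very small problems.

```python
import numpy as np

def lam_logcosh(t):
    # log(cosh(t)) in an overflow-safe form.
    a = np.abs(t)
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)

def gamma_logcosh(t):
    # gamma(t) = tanh(t) / t with gamma(0) = 1.
    safe = np.where(t == 0.0, 1.0, t)
    return np.where(t == 0.0, 1.0, np.tanh(safe) / safe)

def pml_objective(x, P, d, rho, W, beta):
    # Negative log likelihood (up to a constant) plus beta times the penalty (3.1).
    ybar = P @ x + rho
    penalty = (W * lam_logcosh(x[:, None] - x[None, :])).sum()
    return np.sum(ybar - d * np.log(ybar)) + beta * penalty

def pml_step(x, P, d, rho, W, beta):
    # One update of (3.37) from the current strictly positive iterate x.
    ybar = P @ x + rho                                   # [P x + rho]_i
    E = -x * (P.T @ (d / ybar))                          # (3.30)
    Wg = W * gamma_logcosh(x[:, None] - x[None, :])      # w_jk * gamma(x_j - x_k)
    m = 0.5 * (x[:, None] + x[None, :])                  # m_jk in (3.24)
    F = 2.0 * beta * Wg.sum(axis=1)                      # (3.31)
    G = P.sum(axis=0) - 4.0 * beta * (Wg * m).sum(axis=1)  # (3.32)
    if beta == 0.0:
        return -E / G                                    # MLEM limit of (3.37)
    return (-G + np.sqrt(G * G - 8.0 * E * F)) / (4.0 * F)

def pml(x0, P, d, rho, W, beta, n_iter=50):
    # Fixed iteration count stands in for "some chosen stopping criterion".
    x = np.asarray(x0, dtype=float).copy()
    history = [pml_objective(x, P, d, rho, W, beta)]
    for _ in range(n_iter):
        x = pml_step(x, P, d, rho, W, beta)
        history.append(pml_objective(x, P, d, rho, W, beta))
    return x, history
```

A typical call would be `x, history = pml(np.ones(J), P, d, rho, W, beta=0.3)`; `history` records the objective values, which the theory says should be nonincreasing.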

3.2 Convergence Proof
Using (P1)-(P3) and (AS1)-(AS8), we now prove that the PML algorithm converges. The following convergence proof is based on the convergence proof by Lange and Carson [56] (see also [30]) of the MLEM algorithm by Shepp and Vardi [16].

By (3.21), the PML algorithm has the property that it decreases the objective function $\Phi$ with increasing iterations:

* (P4) $\Phi(x^{(n+1)}) \le \Phi(x^{(n)})$ for all $n \ge 0$.

Another property of the algorithm is that

* (P5) the sequence $\{\Phi(x^{(n)})\}$ is convergent.

This property follows from (P4) and the fact that $\Phi$ is bounded below by (AS8) (see [57, Theorem 1.4, p. 6]).

Proposition 1 The sequence $\{x^{(n)}\}$ is bounded.

Proof: From (P4), it follows that $\Phi(x^{(n)}) \le \Phi(x^{(0)})$ for all $n \ge 0$. Consider the set $B = \{x \ge 0 : \Phi(x) \le \Phi(x^{(0)})\}$. Then, clearly $\{x^{(n)}\} \subset B$. So, to prove $\{x^{(n)}\}$ is bounded, we will prove that the set $B$ is bounded. It is straightforward to see that $B$ is bounded below by 0. Now, suppose that $B$ is not bounded above. Then, there exists a point $z \in B$ such that $\|z\| = \infty$. This result means that for some $j$ there exists $z_j$ such that $z_j = \infty$. Since (AS2) implies that $P$ has no column vector of zeros, it follows that $[Pz]_i = \infty$ for some $i$ and $\Phi(z) = \infty$. This implies that $z$ is not an element of the set $B$ because all the elements of $B$ have an objective value that is less than or equal to $\Phi(x^{(0)}) < \infty$, which is a contradiction. Therefore, the set $B$ is bounded above. □

Proposition 2 There exists some constant $c_1 > 0$ such that $\Phi(x^{(n)}) - \Phi(x^{(n+1)}) \ge c_1 \|x^{(n)} - x^{(n+1)}\|^2$ for all $n \ge 0$.

Proof: By (P1), (P2), and (3.29), we have the following inequality

\[ \Phi(x^{(n)}) - \Phi(x^{(n+1)}) \ge \Phi^{(n)}(x^{(n)}) - \Phi^{(n)}(x^{(n+1)}) = \sum_{j=1}^{J} \Big[ \phi_j^{(n)}\big(x_j^{(n)}\big) - \phi_j^{(n)}\big(x_j^{(n+1)}\big) \Big] . \tag{3.38} \]

Suppose, for each $n$, the function $\phi_j^{(n)}(t)$ is expanded into a second-order Taylor series [55, pp. 868-869] about the point $x_j^{(n+1)}$ and evaluated at $t = x_j^{(n)}$. Then, the right hand side of (3.38) can be written as

\[ \Phi^{(n)}(x^{(n)}) - \Phi^{(n)}(x^{(n+1)}) = \sum_{j=1}^{J} \Big[ \big(x_j^{(n)} - x_j^{(n+1)}\big)\, \dot{\phi}_j^{(n)}\big(x_j^{(n+1)}\big) + \tfrac{1}{2}\big(x_j^{(n)} - x_j^{(n+1)}\big)^2\, \ddot{\phi}_j^{(n)}\big(\tilde{x}_j^{(n+1)}\big) \Big] , \tag{3.39} \]

where the double dot over a function represents its second derivative and $\tilde{x}_j^{(n+1)}$ is a point between $x_j^{(n)}$ and $x_j^{(n+1)}$. Since $\dot{\phi}_j^{(n)}(x_j^{(n+1)}) = 0$ by (3.36), it follows that

\[ \Phi^{(n)}(x^{(n)}) - \Phi^{(n)}(x^{(n+1)}) = \frac{1}{2} \sum_{j=1}^{J} \big(x_j^{(n)} - x_j^{(n+1)}\big)^2\, \ddot{\phi}_j^{(n)}\big(\tilde{x}_j^{(n+1)}\big) . \tag{3.40} \]

Now, recall that $\ddot{\phi}_j^{(n)}(t) = -E_j^{(n)}/t^2 + 2F_j^{(n)}$ with $F_j^{(n)} = 2\beta \sum_{k \in N_j} w_{jk}\, \gamma(x_j^{(n)} - x_k^{(n)})$ and $E_j^{(n)} < 0$. Since $\{x^{(n)}\}$ is bounded and $\gamma(t) > 0$ is a continuous function for $-\infty < t < \infty$, there exists a number $\gamma_0 > 0$ such that $\gamma(x_j^{(n)} - x_k^{(n)}) \ge \gamma_0$ for all $j$, $k$, and $n$. Hence, $F_j^{(n)} \ge c_1$ for all $j$ and $n$, where $c_1 = 2\beta\gamma_0 \min_{j,\, k \in N_j} w_{jk} > 0$. Therefore, $\ddot{\phi}_j^{(n)}(\tilde{x}_j^{(n+1)}) \ge 2c_1$ for all $j$ and $n$, and we obtain the desired result

\[ \Phi(x^{(n)}) - \Phi(x^{(n+1)}) \ge c_1 \|x^{(n)} - x^{(n+1)}\|^2 . \tag{3.41} \]

□

From (P5) and Proposition 2, it follows that

* (P6) the sequence $\{x^{(n)} - x^{(n+1)}\}$ converges to 0.

The following proposition will be used later to prove not only that a limit point of the sequence $\{x^{(n)}\}$ satisfies one of the Kuhn-Tucker conditions [55, p. 777] but also that the sequence $\{x^{(n)}\}$ has a finite number of limit points.

Proposition 3 Let $x^*$ be a limit point of the sequence $\{x^{(n)}\}$. Then, for all $j$ such that $x_j^* \neq 0$,

\[ \frac{\partial}{\partial x_j} \Phi(x) \Big|_{x = x^*} = 0 . \tag{3.42} \]

Proof: By Proposition 1, there is a subsequence $\{x^{(n_l)}\}$ that converges to $x^*$ (see the Bolzano-Weierstrass theorem in [58, p. 108]). By (P6), the subsequence $\{x^{(n_l + 1)}\}$ also converges to $x^*$. Recall from (3.29) that $\dot{\phi}_j^{(n)}(t) = E_j^{(n)}/t + 2F_j^{(n)} t + G_j^{(n)}$. If $x_j^* \neq 0$, it follows that $E_j^{(n_l)}/x_j^{(n_l)}$ and $E_j^{(n_l)}/x_j^{(n_l+1)}$ converge to the same limit, and hence

\[ \lim_{l \to \infty} \dot{\phi}_j^{(n_l)}\big(x_j^{(n_l)}\big) = \lim_{l \to \infty} \dot{\phi}_j^{(n_l)}\big(x_j^{(n_l+1)}\big) . \tag{3.43} \]

Since $\dot{\phi}_j^{(n_l)}(x_j^{(n_l+1)}) = 0$ by (3.36), it follows from (P3) that

\[ \frac{\partial}{\partial x_j} \Phi(x) \Big|_{x = x^*} = 0 \quad \text{for all } j \text{ such that } x_j^* \neq 0 . \tag{3.44} \]

□

Using Proposition 3, we can prove the following proposition, which will be used to prove that the sequence $\{x^{(n)}\}$ converges.

Proposition 4 The sequence $\{x^{(n)}\}$ has a finite number of limit points.

Proof: Consider the following sets

\[ Y \triangleq \{1, 2, \ldots, J\} \tag{3.45} \]

\[ Z^* \triangleq \{ j \in Y : x_j^* = 0 \} \tag{3.46} \]

\[ Z^{**} \triangleq \{ j \in Y : x_j^{**} = 0 \} , \tag{3.47} \]

where $x^*$ and $x^{**}$ are limit points of $\{x^{(n)}\}$. Now, let the function $\Phi^*(x)$ be the restriction of $\Phi(x)$ to the reduced parameter set $W^* \triangleq \{x \ge 0 : x_j = 0 \text{ for } j \in Z^*\}$. Since $\Phi^*(x)$ is strictly convex over a convex set, the unique minimizer of $\Phi^*(x)$ is its only stationary point [59, Proposition 2.1.2, p. 87]. Thus, by Proposition 3, there is only one limit point of $\{x^{(n)}\}$ in the set $W^*$. In other words, if $Z^* = Z^{**}$, then $x^* = x^{**}$. Therefore, the number of limit points is bounded above by the number of subsets of $Y$, which is clearly finite. □

Theorem 1 The sequence $\{x^{(n)}\}$ converges to the unique minimizer of $\Phi$.

Proof: Let $x^*$ be a limit point of the sequence $\{x^{(n)}\}$. Using the theorem in [60, p. 173], which says that the set of limit points of a bounded sequence $\{x^{(n)}\}$ is connected if $\{x^{(n)} - x^{(n+1)}\} \to 0$, we obtain the fact that the set of limit points of $\{x^{(n)}\}$ is connected by Proposition 1 and (P6). Since the number of limit points of $\{x^{(n)}\}$ is finite by Proposition 4, $\{x^{(n)}\}$ has only one limit point. Thus, $\{x^{(n)}\} \to x^*$. Now, note that the PML objective function $\Phi$ is strictly convex on the set $\{x : x \ge 0\}$ (see Appendix C), so there is only one minimizer. To prove that the sequence $\{x^{(n)}\}$ converges to the unique minimizer, we need to show that $x^*$ satisfies the Kuhn-Tucker conditions [55, p. 777]: for all $j$,

\[ x_j^* \ge 0 \tag{3.48} \]

\[ x_j^* \, \frac{\partial}{\partial x_j} \Phi(x) \Big|_{x = x^*} = 0 \tag{3.49} \]

\[ \frac{\partial}{\partial x_j} \Phi(x) \Big|_{x = x^*} \ge 0 . \tag{3.50} \]

Since $x^{(n)} > 0$ for all $n$, it must be true that the limit point $x^*$ is nonnegative (i.e., (3.48) is satisfied). By Proposition 3,

\[ \frac{\partial}{\partial x_j} \Phi(x) \Big|_{x = x^*} = 0 \quad \text{for all } j \text{ such that } x_j^* \neq 0 . \tag{3.51} \]

So, (3.49) is satisfied. Now, we consider the case $x_j^* = 0$. For $j$ such that $x_j^* = 0$, suppose

\[ \frac{\partial}{\partial x_j} \Phi(x) \Big|_{x = x^*} < 0 . \tag{3.52} \]

Then, it follows that $\lim_{n \to \infty} \dot{\phi}_j^{(n)}(x_j^{(n)}) < 0$ by (P3), and $\dot{\phi}_j^{(n)}(x_j^{(n)}) < 0$ for sufficiently large $n$. Consider $\dot{\phi}_j^{(n)}(x_j^{(n+1)}) = E_j^{(n)}/x_j^{(n+1)} + 2F_j^{(n)} x_j^{(n+1)} + G_j^{(n)}$. If $x_j^{(n+1)} < x_j^{(n)}$, then

\[ \frac{E_j^{(n)}}{x_j^{(n)}} + 2F_j^{(n)} x_j^{(n+1)} + G_j^{(n)} > 0 \tag{3.53} \]

because $\dot{\phi}_j^{(n)}(x_j^{(n+1)}) = 0$ by (3.36) and $E_j^{(n)}/x_j^{(n+1)} < E_j^{(n)}/x_j^{(n)} < 0$. Moreover, the fact $F_j^{(n)} > 0$ implies that

\[ \frac{E_j^{(n)}}{x_j^{(n)}} + 2F_j^{(n)} x_j^{(n)} + G_j^{(n)} > 0 , \tag{3.54} \]

which is a contradiction. Thus, $x_j^{(n+1)} \ge x_j^{(n)}$ for all sufficiently large $n$. However, this contradicts the fact that $x_j^{(n)} \to 0$. So, it is true that

\[ \frac{\partial}{\partial x_j} \Phi(x) \Big|_{x = x^*} \ge 0 \quad \text{for all } j \text{ such that } x_j^* = 0 . \tag{3.55} \]

This satisfies (3.50). □

3.3 Properties of the PML Algorithm

We now provide a summary of the desirable properties of the PML algorithm:

* The PML algorithm is straightforward to implement because there are no hyperparameters required for the algorithm itself and it has closed-form expressions for the iterates. Some algorithms require hyperparameters, such as relaxation parameters, in addition to the penalty parameter [33,35], while others [24,30] do not have closed-form expressions for the updates.

* The PML algorithm theoretically guarantees nonnegative iterates, whereas some algorithms [33,35] set any negative element of the iterates to a small positive number.

* The PML algorithm monotonically decreases the PML objective function, unlike the algorithms in [23,35].

* The PML algorithm can incorporate a large class of edge-preserving penalty functions, unlike the algorithm by De Pierro [30].

* The PML algorithm converges to the minimizer of the PML objective function. Convergence proofs for the algorithms in [23,33] are not available.

CHAPTER 4
ACCELERATED PENALIZED MAXIMUM LIKELIHOOD ALGORITHM

Although the PML algorithm presented in Chapter 3 converges to the nonnegative minimizer of the PML objective function, it has the drawback that it converges slowly. In PET, a popular way to accelerate iterative image reconstruction algorithms is through the use of so-called ordered subsets [45]. In ordered-subsets based reconstruction algorithms, the observed data, d, is divided into a predefined number of subsets via some chosen rule. Then, the iterative reconstruction algorithm to be accelerated is applied sequentially to each data subset. In [45], Hudson and Larkin developed the first PET image reconstruction algorithm that used the ordered-subsets idea. Since the MLEM algorithm was applied to each data subset, they called their algorithm the ordered-subsets expectation maximization (OS-EM) algorithm. In [61], Browne and De Pierro showed that the OS-EM algorithm did not converge and introduced another ordered-subsets based image reconstruction algorithm that employed a relaxation parameter. It should be pointed out that some convergence results are available for ordered-subsets based algorithms that use relaxation parameters [33,35,61].
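
The ordered-subsets idea described above can be sketched as follows. This is a simplified illustration, not the algorithm of [45] as published: it assumes no accidental coincidences, uses an interleaved rule to split the data (every `n_subsets`-th measurement goes to the same subset), and assumes every subset of rows of `P` has no zero column. The function name `osem` and its parameters are hypothetical.

```python
import numpy as np

def osem(x0, P, d, n_subsets, n_passes):
    # x0: strictly positive initial image, P: I x J system matrix, d: observed data.
    x = np.asarray(x0, dtype=float).copy()
    n_rows = P.shape[0]
    # Interleaved data subsets (the splitting rule is an assumption).
    subsets = [np.arange(s, n_rows, n_subsets) for s in range(n_subsets)]
    for _ in range(n_passes):
        for idx in subsets:
            P_s, d_s = P[idx], d[idx]
            # One MLEM update that only uses the data in the current subset.
            x = x / P_s.sum(axis=0) * (P_s.T @ (d_s / (P_s @ x)))
    return x
```

With `n_subsets=1` the routine reduces to plain MLEM, which makes the speed-up mechanism visible: one pass over the data performs `n_subsets` image updates instead of one.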

With ordered-subsets based algorithms, there is uncertainty as to how many subsets should be used and how the data should be divided. Moreover, it is not clear how relaxation parameters should be chosen in practice because, generally, they depend on the data.

In this chapter, we introduce an accelerated version of the PML algorithm, referred to as the accelerated PML (APML) algorithm, that uses a pattern search suggested by Hooke and Jeeves [41, pp. 287-291]. A pattern search has also been exploited to accelerate an algorithm in transmission tomography [62]. In Section 4.1, using the mathematical ideas in the convergence proof of the PML algorithm, we show that a sequence that satisfies certain conditions converges to the minimizer of the PML objective function. Then, we use this result to prove that the APML algorithm, which is developed in Section 4.2, converges to the nonnegative minimizer of the PML objective function. In Section 4.3, we introduce the direction vector to be used in the pattern search. Finally, we summarize the properties of the APML algorithm in Section 4.4. It should be mentioned that we introduced the APML algorithm in [19].

Figure 4-1: Two-dimensional illustration of the sequence $\{X^{(n)}\}$. The single circles and double circles denote the accelerated iterates $\{\tilde{x}^{(n)}\}$ and standard iterates $\{x^{(n)}\}$, respectively. Each ellipse represents a set of points that have the same cost. The mark × denotes the minimizer of the function subject to the constraints $x_1 \ge 0$ and $x_2 \ge 0$.

4.1 Convergence Proof
In this section, we prove that the sequence $\{X^{(n)}\} \triangleq \{\hat{x}^{(0)}, x^{(1)}, \hat{x}^{(1)}, x^{(2)}, \hat{x}^{(2)}, \ldots\}$ converges to the minimizer of the PML objective function $\Phi$, where $\hat{x}^{(0)} > 0$ is an initial guess, $x^{(n+1)} \triangleq \arg\min \hat{\phi}^{(n)}(x)$ subject to $x \ge 0$, $\hat{\phi}^{(n)}$ is of the same form as the surrogate function for the PML objective function $\Phi$ at the iterate $x^{(n)}$, $\phi^{(n)}$ in (3.29), except that $\hat{\phi}^{(n)}$ is defined at the point $\hat{x}^{(n)}$ instead of $x^{(n)}$, and the point $\hat{x}^{(n)} > 0$ satisfies the following conditions for all $n$:

* (C4) $\Phi(\hat{x}^{(n)}) \le \Phi(x^{(n)})$

* (C5) there exists some constant $c_2 > 0$ such that $\Phi(x^{(n)}) - \Phi(\hat{x}^{(n)}) \ge c_2 \|x^{(n)} - \hat{x}^{(n)}\|_2^2$

This convergence result will form the basis for the convergence proof of the APML algorithm. Note that the strict positivity of $\hat{x}^{(n)}$ for all $n$ is necessary because the surrogate function for the PML objective function, $\hat{\phi}^{(n)}$, is undefined for vectors with zero or negative elements. An example of such a sequence $\{X^{(n)}\}$ is illustrated in Figure 4-1. In the figure, the single circles and double circles represent the accelerated iterates $\{\hat{x}^{(n)}\}$ and standard iterates $\{x^{(n)}\}$, respectively.

First, note that the following convergence proof is mainly based on the convergence proof of the PML algorithm in Chapter 3. Since $\Phi$ is bounded below and the sequence $\{X^{(n)}\}$ monotonically decreases the PML objective function $\Phi$ by (P4) and (C4), it follows that (see [57, Theorem 1.4, p. 6])

* (P7) the sequence $\{\Phi(X^{(n)})\}$ converges.

By (P4) and (C4), it is straightforward to see that $\{X^{(n)}\} \subset \mathcal{B} \triangleq \{x \ge 0 : \Phi(x) \le \Phi(\hat{x}^{(0)})\}$. Since the set $\mathcal{B}$ is bounded (see Proposition 1), it follows that

* (P8) the sequence $\{X^{(n)}\}$ is bounded.

Now, note that $x^{(n+1)}$ is the minimizer of the surrogate function $\hat{\phi}^{(n)}$, which satisfies (1) $\hat{\phi}^{(n)}(x) \ge \Phi(x)$ for all $x \ge 0$, (2) $\hat{\phi}^{(n)}(\hat{x}^{(n)}) = \Phi(\hat{x}^{(n)})$, and (3) $\nabla\hat{\phi}^{(n)}(\hat{x}^{(n)}) = \nabla\Phi(\hat{x}^{(n)})$. Thus, by (P8) (see Proposition 2), we have the following property:

* (P9) there exists some constant $c_3 > 0$ such that $\Phi(\hat{x}^{(n)}) - \Phi(x^{(n+1)}) \ge c_3 \|\hat{x}^{(n)} - x^{(n+1)}\|_2^2$.

By (P7) and (P9), we obtain the property

* (P10) that the sequence $\{\hat{x}^{(n)} - x^{(n+1)}\}$ converges to 0.

Also, by (P7) and (C5), it must be true that the sequence $\{x^{(n)} - \hat{x}^{(n)}\}$ converges to 0. Thus, from the fact that $\|\hat{x}^{(n)} - \hat{x}^{(n+1)}\|_2 \le \|\hat{x}^{(n)} - x^{(n+1)}\|_2 + \|x^{(n+1)} - \hat{x}^{(n+1)}\|_2$, the following property holds:

* (P11) the sequence $\{\hat{x}^{(n)} - \hat{x}^{(n+1)}\}$ converges to 0.

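For the discussion to follow, consider the surrogate function for the PML objective function $\Phi$ at the iterate $\hat{x}^{(n)}$, $\hat{\phi}^{(n)}$ (see (3.29)):

$$\hat{\phi}^{(n)}(x) = \sum_{j=1}^{J} \hat{\phi}_j^{(n)}(x_j) + \hat{C}_3^{(n)},$$ (4.1)

where $\hat{\phi}_j^{(n)}(t) \triangleq \hat{E}_j^{(n)}\log(t) + \hat{F}_j^{(n)} t^2 + \hat{G}_j^{(n)} t$ for $t > 0$, $\hat{C}_3^{(n)}$ is independent of $x$, and

$$\hat{E}_j^{(n)} \triangleq -\hat{x}_j^{(n)} \sum_{i=1}^{I} \frac{d_i \mathcal{P}_{ij}}{[\mathcal{P}\hat{x}^{(n)}]_i + \rho_i}$$ (4.2)

$$\hat{F}_j^{(n)} \triangleq 2\beta \sum_{k \in N_j} w_{jk}\,\gamma(\hat{x}_j^{(n)} - \hat{x}_k^{(n)})$$ (4.3)

$$\hat{G}_j^{(n)} \triangleq \sum_{i=1}^{I} \mathcal{P}_{ij} - 2\beta \sum_{k \in N_j} w_{jk}\,\gamma(\hat{x}_j^{(n)} - \hat{x}_k^{(n)})(\hat{x}_j^{(n)} + \hat{x}_k^{(n)})$$ (4.4)

(note: $\hat{\phi}^{(n)}$, $\hat{E}_j^{(n)}$, $\hat{F}_j^{(n)}$, $\hat{G}_j^{(n)}$, and $\hat{C}_3^{(n)}$ result by substituting $\hat{x}^{(n)}$ for $x^{(n)}$ in (3.29), (3.30), (3.31), (3.32), and (3.33), respectively). To prove that the whole sequence $\{X^{(n)}\}$ converges, as done in the convergence proof of the PML algorithm, we first present the following proposition: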
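Proposition 5 Let $x^*$ be a limit point of the sequence $\{\hat{x}^{(n)}\}$. Then, for all $j$ such that $x_j^* \ne 0$,

$$\left.\frac{\partial \Phi(x)}{\partial x_j}\right|_{x = x^*} = 0.$$ (4.5)

Proof: By (P8), the sequence $\{\hat{x}^{(n)}\}$ is bounded and there is a subsequence $\{\hat{x}^{(n_l)}\}$ that converges to $x^*$. By (P10), the subsequence $\{x^{(n_l+1)}\}$ also converges to $x^*$. Recall that $\dot{\hat{\phi}}_j^{(n_l)}(t) = \hat{E}_j^{(n_l)}/t + 2\hat{F}_j^{(n_l)} t + \hat{G}_j^{(n_l)}$. Now, if $x_j^* \ne 0$, it follows that $\hat{E}_j^{(n_l)}/\hat{x}_j^{(n_l)}$ and $\hat{E}_j^{(n_l)}/x_j^{(n_l+1)}$ converge to the same limit, which in turn implies that

$$\lim_{l \to \infty} \dot{\hat{\phi}}_j^{(n_l)}(\hat{x}_j^{(n_l)}) = \lim_{l \to \infty} \dot{\hat{\phi}}_j^{(n_l)}(x_j^{(n_l+1)}).$$ (4.6)

Since $\dot{\hat{\phi}}_j^{(n_l)}(x_j^{(n_l+1)}) = 0$ by (3.36), and $\nabla\hat{\phi}^{(n_l)}(\hat{x}^{(n_l)}) = \nabla\Phi(\hat{x}^{(n_l)})$, it can be said that

$$\left.\frac{\partial \Phi(x)}{\partial x_j}\right|_{x = x^*} = 0 \text{ for all } j \text{ such that } x_j^* \ne 0.$$ (4.7)

□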
Theorem 2 The sequence $\{X^{(n)}\}$ converges to the unique minimizer of $\Phi$.

Proof: Let $x^*$ be a limit point of the sequence $\{\hat{x}^{(n)}\}$. Since the set of limit points of the sequence $\{\hat{x}^{(n)}\}$ is connected by (P8) and (P11) (see [60, p. 173] and Theorem 1 in Chapter 3) and there are a finite number of limit points of $\{\hat{x}^{(n)}\}$ by Proposition 4 and Proposition 5, it follows that there is only one limit point in the set of limit points. Thus, $\{\hat{x}^{(n)}\} \to x^*$. Since $\lim \|\hat{x}^{(n)} - x^*\|_2 = 0$, by (P10) we have $\lim \|x^{(n+1)} - x^*\|_2 = 0$ (note: $\|x^{(n+1)} - x^*\|_2 \le \|x^{(n+1)} - \hat{x}^{(n)}\|_2 + \|\hat{x}^{(n)} - x^*\|_2$). Hence, $\{x^{(n+1)}\} \to x^*$ (i.e., $\{x^{(n)}\} \to x^*$). Therefore we can deduce that the whole sequence $\{X^{(n)}\} \to x^*$. To prove that the sequence $\{X^{(n)}\}$ converges to the unique minimizer of the PML objective function, we must show that $x^*$ satisfies the Kuhn-Tucker conditions (i.e., (3.48), (3.49), and (3.50)). Since all the points in the sequence $\{X^{(n)}\}$ are positive, it must be true that the limit point $x^*$ is nonnegative (i.e., (3.48) is satisfied). By Proposition 5,

$$\left.\frac{\partial \Phi(x)}{\partial x_j}\right|_{x = x^*} = 0 \text{ for all } j \text{ such that } x_j^* \ne 0.$$ (4.8)

Thus, (3.49) is satisfied. For $j$ such that $x_j^* = 0$, suppose

$$\left.\frac{\partial \Phi(x)}{\partial x_j}\right|_{x = x^*} < 0.$$ (4.9)

Then, it follows that $\lim \dot{\hat{\phi}}_j^{(n)}(\hat{x}_j^{(n)}) < 0$ by the property $\nabla\hat{\phi}^{(n)}(\hat{x}^{(n)}) = \nabla\Phi(\hat{x}^{(n)})$, and $\dot{\hat{\phi}}_j^{(n)}(\hat{x}_j^{(n)}) < 0$ for sufficiently large $n$. Consider $\dot{\hat{\phi}}_j^{(n)}(\hat{x}_j^{(n)}) = \hat{E}_j^{(n)}/\hat{x}_j^{(n)} + 2\hat{F}_j^{(n)}\hat{x}_j^{(n)} + \hat{G}_j^{(n)}$. If $\hat{x}_j^{(n)} \ge x_j^{(n+1)}$, then

$$\frac{\hat{E}_j^{(n)}}{\hat{x}_j^{(n)}} + 2\hat{F}_j^{(n)} x_j^{(n+1)} + \hat{G}_j^{(n)} \ge 0$$ (4.10)

because $\dot{\hat{\phi}}_j^{(n)}(x_j^{(n+1)}) = 0$ by (3.36) and $\hat{E}_j^{(n)}/x_j^{(n+1)} < 0$. Moreover, the fact that $\hat{F}_j^{(n)}$ is positive implies that

$$\frac{\hat{E}_j^{(n)}}{\hat{x}_j^{(n)}} + 2\hat{F}_j^{(n)} \hat{x}_j^{(n)} + \hat{G}_j^{(n)} \ge 0,$$ (4.11)

which is a contradiction. Thus, $x_j^{(n+1)} > \hat{x}_j^{(n)}$ for all sufficiently large $n$. However, this contradicts the fact that $\hat{x}_j^{(n)} \to 0$. Therefore,

$$\left.\frac{\partial \Phi(x)}{\partial x_j}\right|_{x = x^*} \ge 0 \text{ for all } j \text{ such that } x_j^* = 0.$$ (4.12)

This satisfies (3.50). □

4.2 Accelerated Penalized Maximum Likelihood (APML) Algorithm

In Section 4.1, we showed that the sequence $\{X^{(n)}\} = \{\hat{x}^{(0)}, x^{(1)}, \hat{x}^{(1)}, x^{(2)}, \hat{x}^{(2)}, \ldots\}$ converges to the nonnegative minimizer of the PML objective function $\Phi$ if: (1) $\hat{x}^{(0)} > 0$, (2) $x^{(n+1)}$ is the nonnegative minimizer of the surrogate function for the PML objective function $\Phi$ at the iterate $\hat{x}^{(n)}$, $\hat{\phi}^{(n)}$, and (3) $\hat{x}^{(n)} > 0$ satisfies (C4) and (C5). In this section, we present an algorithm that produces such a sequence $\{X^{(n)}\}$.

First, consider the following steps: given an initial guess $\hat{x}^{(0)} > 0$, for $n = 0, 1, 2, \ldots$

* Step 1 Get the standard PML iterate $x^{(n+1)} \triangleq \arg\min \hat{\phi}^{(n)}(x)$ subject to $x \ge 0$.

* Step 2 Get the accelerated PML iterate $\hat{x}^{(n+1)} \triangleq x^{(n+1)} + \alpha^{(n+1)} v^{(n+1)}$, where $v^{(n+1)} \ne 0$ is the chosen search direction (i.e., direction vector) and

$$\alpha^{(n+1)} \triangleq \arg\min_{\alpha} \Phi(x^{(n+1)} + \alpha v^{(n+1)}).$$ (4.13)

* Step 3 Repeat the steps above until some chosen stopping criterion is met.

With $v^{(n+1)} = x^{(n+1)} - \hat{x}^{(n)}$, Step 2 is the pattern search step put forth by Hooke and Jeeves [41, pp. 287-291].
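The structure of Steps 1 to 3 can be illustrated on a toy problem. The sketch below is not the APML algorithm: it assumes an unconstrained least-squares objective in place of $\Phi$, uses a gradient step with step size 1/(largest Hessian eigenvalue) as a stand-in for the PML iterate, and solves the one-dimensional problem in closed form, which is possible only because the toy objective is quadratic. The function name `pattern_search_demo` is hypothetical. Note that with this particular stand-in the direction is parallel to the negative gradient, so the example shows only the mechanics of the base step followed by a line search along the difference of iterates.

```python
import numpy as np

def pattern_search_demo(A, b, n_iter=50):
    """Toy stand-in: gradient step as the base iterate, then a pattern search."""
    H = A.T @ A                          # Hessian of the toy quadratic objective
    lip = np.linalg.eigvalsh(H)[-1]      # largest eigenvalue (gradient Lipschitz constant)
    phi = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
    x_hat = np.zeros(A.shape[1])
    history = [phi(x_hat)]
    for _ in range(n_iter):
        grad = A.T @ (A @ x_hat - b)
        x_new = x_hat - grad / lip       # Step 1 analogue: base iterate
        v = x_new - x_hat                # difference of the two most recent iterates
        vHv = v @ H @ v
        if vHv > 0:
            g_new = A.T @ (A @ x_new - b)
            alpha = -(g_new @ v) / vHv   # exact minimizer of the 1-D slice
        else:
            alpha = 0.0                  # no usable direction; keep the base iterate
        x_hat = x_new + alpha * v        # Step 2 analogue: accelerated iterate
        history.append(phi(x_hat))
    return x_hat, history
```

Because a step size of zero is always admissible, the accelerated iterate is never worse than the base iterate, so the recorded objective values are nonincreasing.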

We now modify Step 2 so that the sequence produced by Steps 1, 2, and 3 converges to the nonnegative minimizer of the PML objective function. Step 2 does not guarantee that the accelerated iterate is positive. Consequently, we modify the optimal step size as follows:

$$\alpha^{(n+1)} \triangleq \arg\min_{\alpha} \Phi(x^{(n+1)} + \alpha v^{(n+1)}) \text{ subject to } \alpha \in A^{(n+1)},$$ (4.14)

where

$$A^{(n+1)} \triangleq \{\alpha : x_j^{(n+1)} + \alpha v_j^{(n+1)} \ge \xi^{(n+1)} \text{ for all } j\}$$ (4.15)
(n+1)
ï¿½+1) A mnxj+*
mi n + m (4.16)

Observe that, for simplicity, we use the same notation for the optimal step sizes defined by (4.13) and (4.14). For $\alpha \in A^{(n+1)}$, it is straightforward to see that $x^{(n+1)} + \alpha v^{(n+1)} > 0$ because $x_j^{(n+1)} > 0$ for all $j$ and $n$ (so that $\xi^{(n+1)} > 0$). Thus, by adding the constraint $\alpha \in A^{(n+1)}$ to the problem in (4.13), it follows that $\hat{x}^{(n)} > 0$ for all $n$. Because $x_j^{(n+1)} \ge \xi^{(n+1)}$ for all $j$ and $n$, it is evident that $\xi^{(n+1)}$ has been chosen in such a way that $0 \in A^{(n+1)}$ for all $n$. The fact that $0 \in A^{(n+1)}$ will be used later to prove that the proposed algorithm monotonically decreases the PML objective function $\Phi$ with increasing iterations.

Remark: If we allow some elements of $\hat{x}^{(n)}$ to be zero, then the feasible region of the function $\Phi(x^{(n+1)} + \alpha v^{(n+1)})$ is $\{\alpha : x_j^{(n+1)} + \alpha v_j^{(n+1)} \ge 0 \text{ for all } j\}$. However, for the sequence $\{X^{(n)}\}$ to converge, we must constrain all elements of $\hat{x}^{(n)}$ to be positive because the surrogate function for the PML objective function $\Phi$, $\hat{\phi}^{(n)}$, is not defined at $x = \hat{x}^{(n)}$ where $\hat{x}_j^{(n)} = 0$ for some $j$. A feasible region of the function $\Phi(x^{(n+1)} + \alpha v^{(n+1)})$ that appears natural is

$$A_0^{(n+1)} \triangleq \{\alpha : x_j^{(n+1)} + \alpha v_j^{(n+1)} > 0 \text{ for all } j\}.$$ (4.17)

Figure 4-2: Illustration of the pattern search step: (a) two-dimensional illustration of $\Phi$, (b) one-dimensional slice of the function along the chosen direction vector. The single circle and double circle denote an accelerated iterate $\hat{x}^{(n)}$ and a PML iterate $x^{(n+1)}$, respectively. The mark × denotes the minimizer of the function subject to the constraints $x_1 \ge 0$ and $x_2 \ge 0$.

However, the set $A_0^{(n+1)}$ is an open set. Consequently, the optimization problem

$$\arg\min_{\alpha} \Phi(x^{(n+1)} + \alpha v^{(n+1)}) \text{ subject to } \alpha \in A_0^{(n+1)}$$ (4.18)

may not have a solution. To see this fact, suppose that the function $\Phi(x^{(n+1)} + \alpha v^{(n+1)})$ is strictly convex and $A_0^{(n+1)} = \{\alpha : L^{(n+1)} < \alpha < U^{(n+1)}\}$ for some $L^{(n+1)}$ and $U^{(n+1)}$. If the function $\Phi(x^{(n+1)} + \alpha v^{(n+1)})$ has as its minimizer $\alpha = L^{(n+1)}$ over the set $\{\alpha : L^{(n+1)} \le \alpha \le U^{(n+1)}\}$, then the optimization problem in (4.18) does not have a solution.

One more point to mention about the set $A^{(n+1)}$ is that, strictly speaking, $\xi^{(n+1)}$ can be any positive number as long as it is chosen in such a way that $0 \in A^{(n+1)}$ for all $n$. However, in addition to the condition $0 \in A^{(n+1)}$, it is reasonable to keep the set $A^{(n+1)}$ as close as possible to $A_0^{(n+1)}$ in (4.17). In this sense, the chosen $\xi^{(n+1)}$ in (4.16) is optimal when $n$ is sufficiently large. □

Another issue with Step 2 is that it is hard to solve the optimization problem in (4.13). A sub-optimal solution can be found by determining a surrogate function for $\Phi^{(n+1)}(\alpha) \triangleq \Phi(x^{(n+1)} + \alpha v^{(n+1)})$, which we denote by $\Gamma^{(n+1)}(\alpha)$, that satisfies the following conditions:

* (C6) $\Gamma^{(n+1)}(\alpha) \ge \Phi^{(n+1)}(\alpha)$ for $\alpha \in A^{(n+1)}$

* (C7) $\Gamma^{(n+1)}(\alpha) = \Phi^{(n+1)}(\alpha)$ for $\alpha = 0$.

By incorporating the surrogate function $\Gamma^{(n+1)}$ with the constraint $\alpha \in A^{(n+1)}$, an alternative to Step 2 is:

* Step 2a Get the accelerated PML iterate $\hat{x}^{(n+1)} \triangleq x^{(n+1)} + \alpha^{(n+1)} v^{(n+1)}$, where

$$\alpha^{(n+1)} \triangleq \arg\min_{\alpha} \Gamma^{(n+1)}(\alpha) \text{ subject to } \alpha \in A^{(n+1)}.$$ (4.19)

It is important to point out that, for convenience, $\hat{x}^{(n+1)}$ has been re-defined in Step 2a. This new definition will be used throughout the rest of the dissertation. In Figure 4-2, the alternative pattern search step with the surrogate function $\Gamma^{(n+1)}$ is illustrated. In Figure 4-2 (a), a two-dimensional example of $\Phi$ is shown with a direction vector $v^{(n+1)}$. In addition, the one-dimensional slice of $\Phi$ along the direction vector $v^{(n+1)}$, which we denote $\Phi^{(n+1)}$, and a surrogate function $\Gamma^{(n+1)}$ that satisfies (C6) and (C7) are shown in Figure 4-2 (b).

By design, the sequence $\{X^{(n)}\}$ produced by Steps 1, 2a, and 3 satisfies the monotonicity condition (C4). To see this fact, note that $\Phi(\hat{x}^{(n+1)}) = \Phi^{(n+1)}(\alpha^{(n+1)})$ by the definition of $\hat{x}^{(n+1)}$. By (C6), it follows that $\Phi^{(n+1)}(\alpha^{(n+1)}) \le \Gamma^{(n+1)}(\alpha^{(n+1)})$. Also, from the definition of $\alpha^{(n+1)}$ in (4.19) and the fact that $0 \in A^{(n+1)}$, we obtain the result $\Gamma^{(n+1)}(\alpha^{(n+1)}) \le \Gamma^{(n+1)}(0)$. Finally, by (C7), it can be concluded that $\Phi(\hat{x}^{(n+1)}) \le \Gamma^{(n+1)}(0) = \Phi^{(n+1)}(0) = \Phi(x^{(n+1)})$ for all $n$.

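We now present our choice for the surrogate function $\Gamma^{(n+1)}$ that satisfies (C6) and (C7). First, note that the negative log likelihood function $-\Psi$ can be expressed as

$$-\Psi(x) = \sum_{i=1}^{I} \left\{ [\mathcal{P}x]_i - d_i \log([\mathcal{P}x]_i + \rho_i) + \rho_i + \log(d_i!) \right\}$$ (4.20)

$$= \sum_{i=1}^{I} \psi_i([\mathcal{P}x]_i) + C_5,$$ (4.21)

where $\psi_i(t) \triangleq t - d_i \log(t + \rho_i)$ and $C_5 \triangleq \sum_{i=1}^{I} \{\rho_i + \log(d_i!)\}$. Suppose a function, $\theta_i^{(n+1)}$, can be found such that

* (C8) $\theta_i^{(n+1)}(\alpha) \ge \psi_i^{(n+1)}(\alpha)$ for $\alpha \in A^{(n+1)}$

* (C9) $\theta_i^{(n+1)}(\alpha) = \psi_i^{(n+1)}(\alpha)$ for $\alpha = 0$,

where $\psi_i^{(n+1)}(\alpha) \triangleq \psi_i([\mathcal{P}x^{(n+1)}]_i + \alpha[\mathcal{P}v^{(n+1)}]_i)$. Then, the function

$$\Theta^{(n+1)}(\alpha) \triangleq \sum_{i=1}^{I} \theta_i^{(n+1)}(\alpha) + C_5$$ (4.22)

will satisfy the conditions

* (C10) $\Theta^{(n+1)}(\alpha) \ge -\Psi(x^{(n+1)} + \alpha v^{(n+1)})$ for $\alpha \in A^{(n+1)}$

* (C11) $\Theta^{(n+1)}(\alpha) = -\Psi(x^{(n+1)} + \alpha v^{(n+1)})$ for $\alpha = 0$.

A function that satisfies (C8) and (C9) is

$$\theta_i^{(n+1)}(\alpha) \triangleq \frac{\mu_i^{(n+1)}}{2}\alpha^2 + \dot{\psi}_i^{(n+1)}(0)\,\alpha + \psi_i^{(n+1)}(0),$$ (4.23)

where $\mu_i^{(n+1)} \triangleq \max\{\ddot{\psi}_i^{(n+1)}(\alpha) \text{ subject to } \alpha \in A^{(n+1)}\}$. From the definition of $\theta_i^{(n+1)}$ it is obvious that $\theta_i^{(n+1)}(0) = \psi_i^{(n+1)}(0)$. Thus, (C9) is satisfied. To see that $\theta_i^{(n+1)}$ satisfies (C8), consider the function $z_i^{(n+1)}(\alpha) \triangleq \theta_i^{(n+1)}(\alpha) - \psi_i^{(n+1)}(\alpha)$. From the definition of $\mu_i^{(n+1)}$, it is clear that $\ddot{z}_i^{(n+1)}(\alpha) \ge 0$ for $\alpha \in A^{(n+1)}$. Thus, it follows that $z_i^{(n+1)}$ is a convex function on $A^{(n+1)}$. Moreover, $\alpha = 0$ is a minimizer of $z_i^{(n+1)}$ because $\dot{z}_i^{(n+1)}(0) = 0$ by the definition of $\theta_i^{(n+1)}$. Since $z_i^{(n+1)}(0) = 0$ by (C9), it is straightforward to see that $z_i^{(n+1)}(\alpha) \ge 0$. This result implies that $\theta_i^{(n+1)}(\alpha) \ge \psi_i^{(n+1)}(\alpha)$ for all $\alpha \in A^{(n+1)}$. So, (C8) is satisfied. To calculate $\mu_i^{(n+1)}$, it is worthwhile to note that the set $A^{(n+1)}$ can be written as $A^{(n+1)} = \{\alpha : L^{(n+1)} \le \alpha \le U^{(n+1)}\}$, where

$$L^{(n+1)} \triangleq \max_j \left\{ \frac{\xi^{(n+1)} - x_j^{(n+1)}}{v_j^{(n+1)}} \text{ such that } v_j^{(n+1)} > 0 \right\}$$ (4.24)

$$U^{(n+1)} \triangleq \min_j \left\{ \frac{\xi^{(n+1)} - x_j^{(n+1)}}{v_j^{(n+1)}} \text{ such that } v_j^{(n+1)} < 0 \right\}.$$ (4.25)

Observe that $L^{(n+1)} \le 0$ and $U^{(n+1)} \ge 0$. Since the second derivative of $\psi_i^{(n+1)}$ is

$$\ddot{\psi}_i^{(n+1)}(\alpha) = \frac{d_i([\mathcal{P}v^{(n+1)}]_i)^2}{([\mathcal{P}v^{(n+1)}]_i\,\alpha + [\mathcal{P}x^{(n+1)}]_i + \rho_i)^2},$$ (4.26)

the maximum second derivative of $\psi_i^{(n+1)}$ for $\alpha \in A^{(n+1)}$ is

$$\mu_i^{(n+1)} = \begin{cases} \dfrac{d_i([\mathcal{P}v^{(n+1)}]_i)^2}{([\mathcal{P}v^{(n+1)}]_i L^{(n+1)} + [\mathcal{P}x^{(n+1)}]_i + \rho_i)^2}, & [\mathcal{P}v^{(n+1)}]_i > 0 \\[2ex] \dfrac{d_i([\mathcal{P}v^{(n+1)}]_i)^2}{([\mathcal{P}v^{(n+1)}]_i U^{(n+1)} + [\mathcal{P}x^{(n+1)}]_i + \rho_i)^2}, & [\mathcal{P}v^{(n+1)}]_i < 0 \\[2ex] 0, & [\mathcal{P}v^{(n+1)}]_i = 0. \end{cases}$$ (4.27)

It should be noted that $[\mathcal{P}v^{(n+1)}]_i\,\alpha + [\mathcal{P}x^{(n+1)}]_i + \rho_i > 0$ for $\alpha \in A^{(n+1)}$ (see (AS1)).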
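The majorization property (C8) and the tangency property (C9) of the quadratic in (4.23) can be checked numerically for a single detector pair. The sketch below assumes scalar inputs, where `pv` and `px` stand for $[\mathcal{P}v^{(n+1)}]_i$ and $[\mathcal{P}x^{(n+1)}]_i$; the function names and the sample values in the usage note are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

def psi(t, d, rho):
    """psi_i(t) = t - d_i * log(t + rho_i)."""
    return t - d * np.log(t + rho)

def curvature(d, pv, px, rho, L, U):
    """Largest second derivative of the slice psi_i(px + alpha*pv) over [L, U]."""
    if pv > 0:
        a = L      # denominator is smallest at the lower end of the interval
    elif pv < 0:
        a = U      # denominator is smallest at the upper end of the interval
    else:
        return 0.0
    return d * pv ** 2 / (pv * a + px + rho) ** 2

def theta(alpha, d, pv, px, rho, mu):
    """Quadratic surrogate: matches the slice in value and slope at alpha = 0."""
    slope0 = pv * (1.0 - d / (px + rho))   # first derivative of the slice at 0
    return 0.5 * mu * alpha ** 2 + slope0 * alpha + psi(px, d, rho)
```

For example, with `d=7`, `pv=2`, `px=5`, `rho=1`, `L=-1`, and `U=3`, evaluating `theta` and the slice on a grid over `[L, U]` shows the surrogate lying on or above the slice, with equality at `alpha = 0`.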
At this point, we need a surrogate function for the penalty function $\Lambda$ once again. As mentioned in Chapter 3, under assumptions (AS3)-(AS8), Huber developed a surrogate function for $\lambda$, denoted by $\lambda^{(n)}$ (see (3.14)). By the properties of the surrogate function $\lambda^{(n)}$, it is clear that the surrogate function $\Lambda^{(n)}$, which is defined by

$$\Lambda^{(n)}(x) \triangleq \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\,\lambda^{(n)}(x_j - x_k),$$ (4.28)

satisfies $\Lambda^{(n)}(x) \ge \Lambda(x)$ for all $x$ and $\Lambda^{(n)}(x^{(n)}) = \Lambda(x^{(n)})$. Thus, $\Lambda^{(n+1)}(x^{(n+1)} + \alpha v^{(n+1)})$ will satisfy

* (C12) $\Lambda^{(n+1)}(x^{(n+1)} + \alpha v^{(n+1)}) \ge \Lambda(x^{(n+1)} + \alpha v^{(n+1)})$ for $\alpha \in A^{(n+1)}$

* (C13) $\Lambda^{(n+1)}(x^{(n+1)} + \alpha v^{(n+1)}) = \Lambda(x^{(n+1)} + \alpha v^{(n+1)})$ for $\alpha = 0$.

Finally, by (C10)-(C13), it is clear that the function

$$\Gamma^{(n+1)}(\alpha) \triangleq \Theta^{(n+1)}(\alpha) + \beta\,\Lambda^{(n+1)}(x^{(n+1)} + \alpha v^{(n+1)})$$ (4.29)

satisfies (C6) and (C7).

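To solve (4.19), we first determine the unconstrained minimizer of $\Gamma^{(n+1)}$:

$$\tilde{\alpha}^{(n+1)} \triangleq \arg\min_{\alpha} \Gamma^{(n+1)}(\alpha).$$ (4.30)

Since $\Gamma^{(n+1)}$ is strictly convex by $\gamma(t) > 0$ for $-\infty < t < \infty$, $v^{(n+1)} \ne 0$, (AS1), and (AS2) (see Appendix D), the expression for $\tilde{\alpha}^{(n+1)}$ can be found by simply computing the first derivative of $\Gamma^{(n+1)}$ and setting it to zero (see Appendix D):

$$\tilde{\alpha}^{(n+1)} = \frac{-\sum_{i=1}^{I} \dot{\psi}_i^{(n+1)}(0) - \beta \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n+1)} - x_k^{(n+1)})(x_j^{(n+1)} - x_k^{(n+1)})(v_j^{(n+1)} - v_k^{(n+1)})}{\sum_{i=1}^{I} \mu_i^{(n+1)} + \beta \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n+1)} - x_k^{(n+1)})(v_j^{(n+1)} - v_k^{(n+1)})^2}.$$ (4.31)

Given (4.24), (4.25), and (4.31), the solution to the constrained optimization problem in (4.19) is

$$\alpha^{(n+1)} = \begin{cases} U^{(n+1)}, & \text{if } \tilde{\alpha}^{(n+1)} > U^{(n+1)} \\ L^{(n+1)}, & \text{if } \tilde{\alpha}^{(n+1)} < L^{(n+1)} \\ \tilde{\alpha}^{(n+1)}, & \text{otherwise.} \end{cases}$$ (4.32)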
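The interval endpoints in (4.24) and (4.25) and the clipping rule in (4.32) translate directly into code. The sketch below assumes NumPy arrays `x` and `v` and a scalar threshold `xi` supplied by the caller (the specific choice of the threshold in (4.16) is not reproduced here). Returning an infinite endpoint when no coordinate constrains that side is an assumption added for the example, as are the function names.

```python
import numpy as np

def step_interval(x, v, xi):
    """Endpoints of {alpha : x_j + alpha * v_j >= xi for all j}."""
    pos = v > 0
    neg = v < 0
    # Coordinates with v_j > 0 bound alpha from below; v_j < 0 bound it from above.
    L = np.max((xi - x[pos]) / v[pos]) if pos.any() else -np.inf
    U = np.min((xi - x[neg]) / v[neg]) if neg.any() else np.inf
    return float(L), float(U)

def constrained_step(alpha_tilde, L, U):
    """Clip the unconstrained minimizer of a convex 1-D surrogate to [L, U]."""
    return float(np.clip(alpha_tilde, L, U))
```

Clipping is sufficient here because the surrogate is a strictly convex function of a single variable, so its constrained minimizer over an interval is the unconstrained minimizer projected onto that interval.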

All that remains now is to show that the sequence $\{X^{(n)}\}$ produced by Step 1, Step 2a with $\Gamma^{(n+1)}$ in (4.29), and Step 3 satisfies (C5). To see this, note that by (C6), (C7), and the definition of $\hat{x}^{(n+1)}$ we have the following inequality:

$$\Phi(x^{(n+1)}) - \Phi(\hat{x}^{(n+1)}) \ge \Gamma^{(n+1)}(0) - \Gamma^{(n+1)}(\alpha^{(n+1)}).$$ (4.33)

Suppose, for each $n$, the function $\Gamma^{(n+1)}$ is expanded into a second-order Taylor series about the point $\alpha^{(n+1)}$ and evaluated at $\alpha = 0$. Then, the right hand side of (4.33) can be written as

$$\Gamma^{(n+1)}(0) - \Gamma^{(n+1)}(\alpha^{(n+1)}) = \dot{\Gamma}^{(n+1)}(\alpha^{(n+1)})(-\alpha^{(n+1)}) + \frac{1}{2}\ddot{\Gamma}^{(n+1)}(\bar{\alpha}^{(n+1)})(-\alpha^{(n+1)})^2,$$ (4.34)

where $\bar{\alpha}^{(n+1)}$ is a point between 0 and $\alpha^{(n+1)}$. By the strict convexity of $\Gamma^{(n+1)}$, $L^{(n+1)} \le 0$, and $U^{(n+1)} \ge 0$, it follows that $\dot{\Gamma}^{(n+1)}(\alpha^{(n+1)})(-\alpha^{(n+1)}) \ge 0$ for all $n$. Thus,

$$\Gamma^{(n+1)}(0) - \Gamma^{(n+1)}(\alpha^{(n+1)}) \ge \frac{1}{2}\ddot{\Gamma}^{(n+1)}(\bar{\alpha}^{(n+1)})(\alpha^{(n+1)})^2.$$ (4.35)

Since there exists a symmetric positive definite matrix $M$, which is independent of $n$, such that $\ddot{\Gamma}^{(n+1)}(\bar{\alpha}^{(n+1)}) \ge 2\,(v^{(n+1)})' M v^{(n+1)}$ (see Appendix E), it follows that

$$\Gamma^{(n+1)}(0) - \Gamma^{(n+1)}(\alpha^{(n+1)}) \ge (v^{(n+1)})' M v^{(n+1)}\,(\alpha^{(n+1)})^2.$$ (4.36)

Hence, to prove (C5), we show that there exists some constant $c_2 > 0$ such that

$$(v^{(n+1)})' M v^{(n+1)} \ge c_2 \|v^{(n+1)}\|_2^2,$$ (4.37)

where we used the fact that $\|x^{(n+1)} - \hat{x}^{(n+1)}\|_2^2 = (\alpha^{(n+1)})^2 \|v^{(n+1)}\|_2^2$ by the definition of $\hat{x}^{(n+1)}$. Since $M$ is a symmetric matrix, it can be factored as $M = TDT'$ by the Spectral theorem [63, p.309], where the columns of the matrix $T$ contain orthonormal eigenvectors of $M$ and the diagonal matrix $D$ contains the corresponding eigenvalues along its diagonal. Using the fact that $T'T$ produces the $J \times J$ identity matrix and Rayleigh's quotient with $v^{(n+1)} = Tz^{(n+1)}$ (i.e., $z^{(n+1)} = T'v^{(n+1)}$) [63, p.348], it follows that

$$\frac{(v^{(n+1)})' M v^{(n+1)}}{\|v^{(n+1)}\|_2^2} = \frac{(Tz^{(n+1)})' M (Tz^{(n+1)})}{(Tz^{(n+1)})'(Tz^{(n+1)})}$$ (4.38)

$$= \frac{(z^{(n+1)})' D z^{(n+1)}}{(z^{(n+1)})' z^{(n+1)}}$$ (4.39)

$$= \frac{\sum_{j=1}^{J} e_j (z_j^{(n+1)})^2}{\sum_{j=1}^{J} (z_j^{(n+1)})^2}$$ (4.40)

$$\ge e_{\min},$$ (4.41)

where $\{e_j\}_{j=1}^{J}$ are the eigenvalues of $M$ and $e_{\min}$ is the smallest eigenvalue. Since $M$ is positive definite, $e_{\min}$ is positive. Therefore, with $c_2 = e_{\min} > 0$, we obtain $(v^{(n+1)})' M v^{(n+1)} \ge c_2 \|v^{(n+1)}\|_2^2$ and

$$\Phi(x^{(n+1)}) - \Phi(\hat{x}^{(n+1)}) \ge c_2 \|x^{(n+1)} - \hat{x}^{(n+1)}\|_2^2.$$ (4.42)
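The Rayleigh-quotient bound used in the last step is easy to check numerically. The sketch below is only an illustration of the inequality for a symmetric positive definite matrix; the function name and the randomly generated matrix in the usage note are assumptions and have no connection to the matrix $M$ of Appendix E.

```python
import numpy as np

def rayleigh_bound(M, v):
    """Return (v' M v, e_min * ||v||^2) for a symmetric matrix M."""
    e_min = np.linalg.eigvalsh(M)[0]   # eigvalsh returns eigenvalues in ascending order
    return float(v @ M @ v), float(e_min * (v @ v))
```

For instance, building `M = B.T @ B + np.eye(J)` from any real matrix `B` gives a symmetric positive definite matrix, and for any vector `v` the first returned value is at least as large as the second.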

4.3 Direction Vectors

In some algorithms, the gradient of a function is used as a direction vector, as in the method of steepest descent [55, ch. 14]. Suppose the direction vector $v^{(n+1)}$ was chosen to be the gradient of the PML objective function $\Phi$ evaluated at the PML iterate $x^{(n+1)}$ (i.e., $v^{(n+1)} = \nabla\Phi(x^{(n+1)})$). Then, the gradient must be calculated at each step. However, the computational cost of the gradient of $\Phi$ is on the same order as the computational cost of a single PML iteration. Moreover, in experiments, the APML algorithm with $v^{(n+1)} = \nabla\Phi(x^{(n+1)})$ decreased the PML objective function more slowly than the PML algorithm.

The current direction vector in the pattern search step by Hooke and Jeeves is the difference of the two most recent iterates. Specifically, the direction vector is defined by $\tilde{v}^{(n+1)} \triangleq x^{(n+1)} - \hat{x}^{(n)}$. This choice can be justified in a reasonable manner. To see why, assume that a closed-form expression is available for the minimizer of $\Phi(x^{(n+1)} + \alpha v^{(n+1)})$. Then, the "best" direction vector is $(x^{(n+1)} - x^*)$ because $\Phi(x^{(n+1)} + \alpha v^{(n+1)})$ "contains" the point $x^*$ (note: $x^*$ would result with $\alpha = -1$), where $x^*$ is the minimizer of the PML objective function. However, $x^*$ is not known at the $n$th iteration. The "best" estimate of $x^*$ at the $n$th iteration is $\hat{x}^{(n)}$, namely the accelerated iterate at the $(n-1)$th iteration. In this section, we introduce a direction vector that works better than Hooke and Jeeves' direction vector in terms of convergence rate.

The direction vector we choose is a simple variation of $\tilde{v}^{(n+1)}$ that is not computationally expensive (note: $J$ subtractions are required for $\tilde{v}^{(n+1)}$). Since there are no positrons emitted outside the subject being scanned, the PML estimate will contain many values near zero. This claim is supported by Figure 4-3.

Figure 4-3: PML iterates are shown: (a) the 13th PML iterate, (b) the 14th PML iterate, (c) the 15th PML iterate, and (d) the 1000th PML iterate. The images were generated using real thorax phantom data. The plane considered contains activity due to the heart, lungs, spine, and background. For display purposes, all the images were adjusted so that they have the same dynamic range.

In the figure, PML images corresponding to different iteration numbers are shown. The images were generated by applying the PML algorithm to real thorax phantom data (scan duration was 14 minutes) with a uniform initial estimate. The penalty parameter was $\beta = 0.02$ and $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 50$ was the penalty function. As can be seen in Figure 4-3 (a), (b), and (c), the early iterates contain values near zero outside of the body. Consider Figure 4-3 (d), which is the 1000th PML iterate (practically speaking, the iterate is the minimizer of $\Phi$ because the objective function did not decrease after the 791st iteration up to the 5000th iteration). The 1000th iterate also contains many values near zero outside the body. Inside the body, on the other hand, the 1000th iterate is very different from the early iterates. From the above observations, it can be said that the convergence rate varies significantly between voxels inside the body and voxels outside the body.

The above observations indicate that the voxels outside the body converge faster than the voxels inside the body, and that the voxels outside the body converge to values near zero, which is the boundary of the set $\{x : x \ge 0\}$. We therefore claim that it is better to search for an accelerated iterate along the boundary whenever the current iterate is "near" the boundary. By "boundary", we mean the set $\{x \ge 0 : x_j = 0 \text{ for some } j\}$. This can be explained by the example shown in Figure 4-4. In Figure 4-4 (a), the second PML iterate is heading toward the $x_2$-axis. If we perform the acceleration step with the direction vector $\tilde{v}^{(n+1)}$, then the accelerated iterate will lie on the $x_2$-axis as shown in Figure 4-4 (b). A better direction to be searched is the one that is parallel to the $x_2$-axis as shown in Figure 4-4 (c). The principle of the proposed direction vector is to exclude the coordinates corresponding to the voxel values that are "near" the boundary. An easy way to incorporate this idea is to determine the voxels whose values are less than a small positive value, $\epsilon$.

Figure 4-4: Direction vectors: (a) standard PML iterates are shown, (b) an accelerated iterate (the single solid circle) is shown by using the direction vector $\tilde{v}^{(n+1)}$, and (c) an accelerated iterate (the single dotted circle) is shown by using the proposed direction vector $v^{(n+1)}$. The double circles denote PML iterates. The mark × denotes the minimizer of the function subject to the constraints $x_1 \ge 0$ and $x_2 \ge 0$.

Specifically, the proposed direction, $v^{(n+1)}$, is

$$v_j^{(n+1)} \triangleq \begin{cases} 0, & \text{if } x_j^{(n+1)} < \epsilon \\ x_j^{(n+1)} - \hat{x}_j^{(n)}, & \text{otherwise} \end{cases}$$

where $\epsilon > 0$ is a user-defined parameter (note: $v^{(n+1)} = \tilde{v}^{(n+1)}$ if $\epsilon = 0$) and $j = 1, 2, \ldots, J$.
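The proposed direction vector amounts to masking the difference of iterates. A minimal sketch, assuming NumPy arrays for the current PML iterate and the previous accelerated iterate (the function and argument names are hypothetical):

```python
import numpy as np

def proposed_direction(x_new, x_hat_prev, eps):
    """Difference of iterates, zeroed where the current iterate is below eps."""
    v = x_new - x_hat_prev          # pattern-search direction (new array)
    v[x_new < eps] = 0.0            # drop coordinates "near" the boundary
    return v
```

With `eps = 0` and positive iterates, no coordinate is masked and the result is the plain difference of iterates. If every coordinate is masked, the returned vector is zero, so a caller would need to handle that case separately since the algorithm requires a nonzero direction.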

4.4 Properties of the APML Algorithm

We now provide a summary of the desirable properties of the APML algorithm:

* The APML algorithm needs only one additional parameter (i.e., $\epsilon$), whereas ordered-subsets based algorithms [33, 35] need at least three extra parameters (i.e., a relaxation parameter, the number of subsets, and a small positive number that is set to any nonpositive elements of an iterate).

* In experiments, the proposed direction vector performed better than the direction vector put forth by Hooke and Jeeves [41, pp. 287-291].

* The APML algorithm monotonically decreases the PML objective function unlike the algorithms in [23, 35].

* The APML algorithm theoretically guarantees nonnegative iterates, whereas some algorithms [33, 35] set any negative element of the iterates to a small positive number.

* The APML algorithm can incorporate a large class of edge-preserving penalty functions unlike the algorithm by De Pierro [30].

* The APML algorithm converges to the minimizer of the PML objective function. Convergence proofs for the algorithms in [23, 33] are not available.

CHAPTER 5
In this chapter, we present a regularized image reconstruction algorithm that aims to preserve edges in the reconstructed images so that fine details are more resolvable. We refer to the proposed algorithm as the quadratic edge preserving (QEP) algorithm. The QEP algorithm results from a certain modification of the surrogate function $\phi^{(n)}$ for the PML objective function. It should be mentioned that the QEP algorithm was first introduced in [18].
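Recall that the $n$th surrogate function for the PML objective function $\Phi$, $\phi^{(n)}$, can be expressed as $\phi^{(n)}(x) = \sum_{j=1}^{J} \phi_j^{(n)}(x_j) + C_3^{(n)}$ (see (3.29) and (3.35)). For the discussion to follow, it will be convenient to express $\phi_j^{(n)}$ in (3.35) in a different manner:

$$\phi_j^{(n)}(t) = E_j^{(n)}\log(t) + F_j^{(n)} t^2 + G_j^{(n)} t$$ (5.1)

$$= E_j^{(n)}\log(t) + 2\beta\Big(\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\Big)t^2 + \Big(\sum_{i=1}^{I}\mathcal{P}_{ij} - 4\beta\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,m_{jk}^{(n)}\Big)t$$ (5.2)

$$= E_j^{(n)}\log(t) + t\sum_{i=1}^{I}\mathcal{P}_{ij} + 2\beta\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,(t^2 - 2m_{jk}^{(n)}t)$$ (5.3)

$$= E_j^{(n)}\log(t) + t\sum_{i=1}^{I}\mathcal{P}_{ij} + 2\beta\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,\big[(t - m_{jk}^{(n)})^2 - (m_{jk}^{(n)})^2\big]$$ (5.4)

$$= \psi_j^{(n)}(t) + 2\beta\sum_{k \in N_j} h_{jk}^{(n)}(t) - 2\beta\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,(m_{jk}^{(n)})^2,$$ (5.5)

where

$$\psi_j^{(n)}(t) \triangleq E_j^{(n)}\log(t) + t\sum_{i=1}^{I}\mathcal{P}_{ij}$$ (5.6)

$$h_{jk}^{(n)}(t) \triangleq w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,(t - m_{jk}^{(n)})^2$$ (5.7)

and $m_{jk}^{(n)}$, $E_j^{(n)}$, $F_j^{(n)}$, and $G_j^{(n)}$ are defined in (3.24), (3.30), (3.31), and (3.32), respectively. Note that the $j$th element of the next PML iterate $x^{(n+1)}$, $x_j^{(n+1)}$, is defined to be the nonnegative minimizer of the function $\phi_j^{(n)}$. The function $h_{jk}^{(n)}$ is quadratic, so its aperture, $w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})$, and minimizer, $m_{jk}^{(n)}$, are expected to play key roles in the regularization process (for a quadratic function, $f(t) = at^2 + bt + c$, the constant $a$ is called the aperture of the function). To see the role of the function $h_{jk}^{(n)}$ in determining $x_j^{(n+1)}$, suppose $\beta = 0$ in (5.5). Then, $\phi_j^{(n)}(t) = \psi_j^{(n)}(t)$ and the minimizer of the function $\phi_j^{(n)}(t)$ is the "pure" log likelihood iterate (i.e., the minimizer of $\psi_j^{(n)}$). For $\beta > 0$, intuitively speaking, it is evident that the functions $\{h_{jk}^{(n)}\}_{k \in N_j}$ act to bias $x_j^{(n+1)}$ from the pure log likelihood iterate towards the minimizers of $\{h_{jk}^{(n)}\}_{k \in N_j}$ (observe: the last term in (5.5) is independent of $x$). The degree of influence that the functions $\{h_{jk}^{(n)}\}_{k \in N_j}$ possess is controlled by their apertures and the penalty parameter $\beta$.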
To highlight the role of the function $\gamma$, we fix $\beta$ and let $w_{jk} = 1$ for all $j$ and $k$. In this case, the aperture of $h_{jk}^{(n)}$ equals $\gamma(x_j^{(n)} - x_k^{(n)})$. For the quadratic penalty function (i.e., $\lambda(t) = t^2$), the aperture of $h_{jk}^{(n)}$ is a constant for all $j$, $k$, and $n$. Consequently, for all $k \in N_j$, the function $h_{jk}^{(n)}$ has the same degree of influence regardless of the absolute difference between $x_j^{(n)}$ and $x_k^{(n)}$. Practically speaking, this means that the quadratic penalty will overly smooth edges. To preserve edges, it would be helpful to lessen the degree of influence of the functions $\{h_{jk}^{(n)}\}_{k \in N_j}$ whenever the absolute difference between $x_j^{(n)}$ and $x_k^{(n)}$ is sufficiently large. Figure 5-1 is a plot of $\gamma$ when $\lambda(t) = \log(\cosh(t/\delta))$ (recall: $\gamma(t) = \dot{\lambda}(t)/t$ in (AS6)) is used in the penalty function with $\delta = 10, 50, 100, 500$. From the figure, it can be seen that $\gamma$ becomes very small compared with $\gamma(0)$ for $|t| \gg \delta$. This means that, when $|x_j^{(n)} - x_k^{(n)}| \gg \delta$, $h_{jk}^{(n)}$ will have a "very small" aperture, and consequently the function $h_{jk}^{(n)}$ will not have much influence on $x_j^{(n+1)}$. Said another way, the log-cosh penalty function helps preserve edges whose "heights" are on the order of $\delta$.

Figure 5-1: The function $\gamma$ is shown when $\lambda(t) = \log(\cosh(t/\delta))$ is used as the penalty function with: (a) $\delta = 10$, (b) $\delta = 50$, (c) $\delta = 100$, and (d) $\delta = 500$. In each panel the horizontal axis is $t$, ranging from -2000 to 2000.
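The behavior of $\gamma$ for the log-cosh penalty can be reproduced with a short computation. Assuming $\lambda(t) = \log(\cosh(t/\delta))$, differentiation gives $\dot{\lambda}(t) = \tanh(t/\delta)/\delta$, so $\gamma(t) = \tanh(t/\delta)/(\delta t)$ with limiting value $\gamma(0) = 1/\delta^2$. The sketch below evaluates this expression; the function name is hypothetical.

```python
import numpy as np

def gamma_logcosh(t, delta):
    """gamma(t) = lambda'(t) / t for lambda(t) = log(cosh(t / delta))."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.full_like(t, 1.0 / delta ** 2)   # limiting value at t = 0
    nz = t != 0
    out[nz] = np.tanh(t[nz] / delta) / (delta * t[nz])
    return out
```

For $\delta = 10$ the peak value is $1/\delta^2 = 0.01$, and for large $|t|$ the function decays like $1/(\delta|t|)$, which is the loss of aperture that lets large differences between neighboring voxels survive the smoothing.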

We will now move the discussion from the aperture of the function $h_{jk}^{(n)}$ to the minimizer of $h_{jk}^{(n)}$. As stated previously, the minimizers of the functions $\{h_{jk}^{(n)}\}_{k \in N_j}$ play an important role in the regularization procedure. More specifically, when the aperture of $h_{jk}^{(n)}$ is sufficiently large, the $(n+1)$st iterate is biased towards the minimizer of $h_{jk}^{(n)}$, which is $m_{jk}^{(n)} = (x_j^{(n)} + x_k^{(n)})/2$. As a result, there is inherent averaging that takes place with the PML algorithm.

Consider a penalty function, where, for certain functions $\{\sigma_{jk}^{(n)}\}$, it is of the form

$$\Lambda_{ep}^{(n)}(x) \triangleq 2\sum_{j=1}^{J}\sum_{k \in N_j} \sigma_{jk}^{(n)}(x_j).$$ (5.8)

In order to better preserve edges, we believe an improvement would be to construct $\sigma_{jk}^{(n)}$ so that it has the same aperture as $h_{jk}^{(n)}$ but a different minimizer. The minimizer of $\sigma_{jk}^{(n)}$, which would depend on whether an edge is present, is chosen to be

(5.9)

? (n)xn) I](x(.n) - (n))
j~k j- 3 X

where il is a function such that

'I(0, { t < - (5.10) t, otherwise

and is a parameter that represents the "heights" of the edges that are to be preserved. An example of a function that can be used is qj(t) - { tanh( ). In order for 1ujk to be the minimizer of ("), we define to be 'jk (7jk (to (be

$$\sigma_{jk}^{(n)}(t) \triangleq w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,(t - u_{jk}^{(n)})^2.$$ (5.11)
Using oj . , when the absolute difference between x( and x1k is less than (i.e., no edge present), smoothing takes place because u(n) ( On the other hand, when the absolute difference between x1 and x( is greater than (i.e., edge present), less smoothing takes place because there are upper and lower bounds for .k(n) To help make our idea clearer, consider Figure 5 2. In the figure, ,(n) and u(") are plotted as a function of ?7), where xt 800 (value is chosen arbitrarily) and

77(t) = 250 tanh(-o). It can be observed that u is approximately x) + 250 when Xk ) is larger than ( 0 and approxiately --250 when Xn) is less than x () - 250. In other words, the function q prevents xjnï¿½l) from being overly biased towards x7) when the absolute difference between x (n) and x() is greater than 250.

Using the edge preserving penalty function $\Lambda_{ep}^{(n)}$, an algorithm for obtaining regularized estimates of the emission means follows:

* Step 1 Let $x^{(0)} > 0$ be the initial estimate.

* Step 2 Construct $\Lambda_{ep}^{(n)}(x)$ from the current estimate $x^{(n)}$ using (5.8), (5.9), (5.10), and (5.11).

* Step 3 Compute $x^{(n+1)} = \arg\min \Xi^{(n)}(x)$ subject to $x \ge 0$, where $\Xi^{(n)}(x) \triangleq -\Psi(x) + \beta\Lambda_{ep}^{(n)}(x)$.

* Step 4 Iterate between Steps 2 and 3.


Figure 5 2: Plots of rn5") and )versus xk) are shown, where x" = 800 is fixed and 71(t)= 250 tanh(2- ).

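It turns out that a closed form expression for the solution of the problem in Step 3 is not available. An alternative to Step 3 is:

* Step 3a Find $x^{(n+1)}$ such that $\Xi^{(n)}(x^{(n+1)}) \le \Xi^{(n)}(x^{(n)})$.

The problem in Step 3a can be solved by defining $x^{(n+1)}$ as

$$x^{(n+1)} \triangleq \arg\min_{x \ge 0} \Phi_{ep}^{(n)}(x),$$ (5.12)

where

$$\Phi_{ep}^{(n)}(x) \triangleq -\Psi^{(n)}(x) + \beta\Lambda_{ep}^{(n)}(x)$$ (5.13)

and $\Psi^{(n)}$ is defined in (3.12). Defining the iterates according to (5.12) and (5.13) ensures that $\Xi^{(n)}(x^{(n+1)}) \le \Xi^{(n)}(x^{(n)})$. To see this fact, note that $\Xi^{(n)}(x^{(n+1)}) \le \Phi_{ep}^{(n)}(x^{(n+1)})$ because $\Psi^{(n)}(x) \le \Psi(x)$ for all $x > 0$. Since $\Phi_{ep}^{(n)}(x^{(n+1)}) \le \Phi_{ep}^{(n)}(x^{(n)})$ by (5.12), it follows that

$$\Xi^{(n)}(x^{(n+1)}) \le \Phi_{ep}^{(n)}(x^{(n+1)}) \le \Phi_{ep}^{(n)}(x^{(n)}) = \Xi^{(n)}(x^{(n)}).$$ (5.14)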

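All that remains now is to solve the optimization problem in (5.12). Observe that $\Phi_{ep}^{(n)}$ can be written as

$$\Phi_{ep}^{(n)}(x) = -\Psi^{(n)}(x) + \beta\Lambda_{ep}^{(n)}(x)$$ (5.15)

$$= \sum_{i=1}^{I}\Big\{[\mathcal{P}x]_i - d_i\sum_{j=1}^{J}\frac{\mathcal{P}_{ij}x_j^{(n)}}{[\mathcal{P}x^{(n)}]_i + \rho_i}\log(x_j)\Big\} + C^{(n)} + 2\beta\sum_{j=1}^{J}\sum_{k \in N_j}\sigma_{jk}^{(n)}(x_j)$$ (5.16)

$$= \sum_{j=1}^{J}\Big\{\Big(\sum_{i=1}^{I}\mathcal{P}_{ij}\Big)x_j - \Big(x_j^{(n)}\sum_{i=1}^{I}\frac{d_i\mathcal{P}_{ij}}{[\mathcal{P}x^{(n)}]_i + \rho_i}\Big)\log(x_j)\Big\} + C^{(n)} + 2\beta\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\big(x_j^2 - 2u_{jk}^{(n)}x_j + (u_{jk}^{(n)})^2\big)$$ (5.17)

$$= \sum_{j=1}^{J}\big\{E_j^{(n)}\log(x_j) + F_j^{(n)}x_j^2 + H_j^{(n)}x_j\big\} + C_6^{(n)},$$ (5.18)

where

$$H_j^{(n)} \triangleq \sum_{i=1}^{I}\mathcal{P}_{ij} - 4\beta\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,u_{jk}^{(n)}$$ (5.19)

$$C_6^{(n)} \triangleq C^{(n)} + 2\beta\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n)} - x_k^{(n)})\,(u_{jk}^{(n)})^2,$$ (5.20)

and $C^{(n)}$, $E_j^{(n)}$, and $F_j^{(n)}$ are defined in (3.13), (3.30), and (3.31), respectively. Since $\Phi_{ep}^{(n)}$ is de-coupled, it follows that the solution to (5.12) is given by

$$x_j^{(n+1)} = \arg\min_{x_j \ge 0}\big\{E_j^{(n)}\log(x_j) + F_j^{(n)}x_j^2 + H_j^{(n)}x_j\big\}, \quad j = 1, 2, \ldots, J.$$ (5.21)

Repeating the steps used to derive (3.37), it is straightforward to see that the solution to the optimization problem in (5.21) is

$$x_j^{(n+1)} = \frac{-H_j^{(n)} + \sqrt{(H_j^{(n)})^2 - 8E_j^{(n)}F_j^{(n)}}}{4F_j^{(n)}}, \quad j = 1, 2, \ldots, J.$$ (5.22)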
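The closed-form update in (5.22) is the positive root of the quadratic obtained by setting the derivative of the one-dimensional objective in (5.21) to zero, namely $2F x^2 + H x + E = 0$. The sketch below assumes the sign convention used in this chapter's expansions, in which the coefficient of the logarithm is negative ($E < 0$) and $F > 0$; the function name and the numbers in the usage note are illustrative only.

```python
import numpy as np

def qep_update(E, F, H):
    """Positive root of 2*F*x**2 + H*x + E = 0, assuming E < 0 and F > 0."""
    E, F, H = (np.asarray(a, dtype=float) for a in (E, F, H))
    return (-H + np.sqrt(H ** 2 - 8.0 * E * F)) / (4.0 * F)
```

For example, `qep_update(-3.0, 2.0, 1.0)` gives 0.75, and substituting back confirms that the derivative `E/x + 2*F*x + H` vanishes there. The update accepts arrays, so all voxels can be updated at once because the objective is de-coupled across voxels.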

CHAPTER 6
JOINT EMISSION MEAN AND PROBABILITY MATRIX ESTIMATION

In Chapters 3, 4, and 5, we assumed that the probability matrix $\mathcal{P}$ accounts for errors due to attenuation, detector inefficiency, detector penetration, non-collinearity, and scatter. However, we now assume that an $I \times J$ corrected matrix $\mathcal{P}^c$ is available that accounts for errors due to attenuation, detector inefficiency, and detector penetration, and we develop a method for correcting errors caused by scatter and non-collinearity. Before presenting the proposed method, we briefly review standard approaches for obtaining $\mathcal{P}^c$ in practice.

The $(i, j)$th element of $\mathcal{P}^c$, $\mathcal{P}^c_{ij}$, denotes the probability that an annihilation in the $j$th voxel leads to a photon pair being recorded by the $i$th detector pair when there are no errors due to scatter and non-collinearity. In practice, one can reliably estimate the probability matrix $\mathcal{P}^c$ by using the detector penetration correction method in [7] with an attenuation correction method [9] and the detector inefficiency correction method in [15]. However, other correction methods for detector penetration, attenuation, and detector inefficiency could be used in conjunction.

To address attenuation errors, two scans, known as a transmission scan and a blank scan, are taken. During a transmission scan, one to three rods filled with positron-emitting isotopes rotate outside the subject, and the transmission data are the resulting coincidence sums. A blank scan is taken in the same way as the transmission scan except that there is no subject inside the scanner. The coincidence sums during the blank scan form the blank scan data. Given the transmission and blank scan data, an estimate of the attenuation map can be generated. Using the estimated attenuation map, attenuation correction factors are computed and used for attenuation correction. Numerous attenuation estimation methods have been proposed [8-10].


Figure 6-1: A simple example to illustrate the geometry of PET image reconstruction with three voxels ($v_1$, $v_2$, and $v_3$) and three detector pairs (($a_1$,$b_1$), ($a_2$,$b_2$), and ($a_3$,$b_3$)). The dashed lines define the tubes of the detector pairs.

To address errors due to detector inefficiency, a blank scan of extremely long duration is performed. From the blank scan, the relative efficiency of the detector pairs can be estimated. Recall that the efficiency of a detector pair is defined to be the probability that a coincidence is recorded when a photon pair is incident on the detectors. Consider a set of detector pairs where each detector pair has the same spatial extent. One simple way to estimate the efficiency of a detector pair in the set is to define it as the ratio of the number of photon pairs recorded by that detector pair to the mean number of photon pairs recorded per detector pair in the set. Once the estimates of the efficiencies for all detector pairs are available, they can be incorporated into the probability matrix. More sophisticated correction methods for detector inefficiency, such as [13-15], have also been proposed.

As discussed in Section 1.3, the two photons generated by an annihilation usually do not propagate in exactly opposite directions, which is called non-collinearity of the line-of-response [5]. As illustrated in Figure 1-5, it is possible that an annihilation is recorded by detector pair $i_1$ given that the annihilation would have been recorded by detector pair $i_2$ if both photons had propagated in exactly opposite directions, where $i_1 \neq i_2$.

Since scatter depends on the activity and attenuation within the subject and on the scanner design, it is not straightforward to correct for errors due to scatter. Photons lose energy when they undergo Compton scattering. Thus, the energy of detected photons may be used to discriminate between unscattered photons and scattered photons [11, pp. 65-69]. Scatter correction methods that are based on multiple (two or more) energy windows have a limitation because detectors have finite energy resolution [46, 47]. As discussed in Chapter 2, state-of-the-art scatter correction methods, such as the method in [52] where the scatter distribution is calculated using an analytical equation, are not practical because of their extremely high computational cost. Consequently, a simple scatter correction method would be beneficial to the PET community. In this chapter, we propose a method that estimates the probability matrix in such a way that errors due to scatter and non-collinearity are addressed.

In Section 6.1, we first propose a model for emission data that accounts for errors due to scatter and non-collinearity. Then, in Section 6.2, we present a method where the unknown emission mean vector and an unknown matrix in the proposed model are jointly estimated. Finally, in Section 6.3, we propose an algorithm, referred to as the probability correction in projection space (PCiPS) algorithm, that estimates the unknown emission mean vector and the unknown matrix in the proposed model.

6.1 Scatter Matrix Model

Consider Figure 6-1, in which a simplified PET scanner consisting of three detector pairs is depicted. For this discussion, we will focus on the voxels $v_1$, $v_2$, and $v_3$ shown in Figure 6-1. Let the detector pairs ($a_1$,$b_1$), ($a_2$,$b_2$), and ($a_3$,$b_3$) be the first, second, and third detector pairs, respectively. For the scanner in Figure 6-1, we assume that

$$\mathcal{P}^c = \begin{bmatrix} 0.04 & 0 & 0 \\ 0 & 0.04 & 0 \\ 0.03 & 0.03 & 0.02 \end{bmatrix}. \tag{6.1}$$

Suppose that during a scan there are no errors due to scatter and non-collinearity. Then, the mean number of photon pairs recorded by the detector pair ($a_1$,$b_1$) is $\mathcal{P}^c_{11} x_1$, where $x_1$ is the mean number of positrons emitted from voxel $v_1$. The mean numbers of photon pairs recorded by the detector pairs ($a_2$,$b_2$) and ($a_3$,$b_3$) are $\mathcal{P}^c_{22} x_2$ and $(\mathcal{P}^c_{31} x_1 + \mathcal{P}^c_{32} x_2 + \mathcal{P}^c_{33} x_3)$, respectively, where $x_2$ and $x_3$ are the mean numbers of positrons emitted from voxels $v_2$ and $v_3$, respectively. As discussed in Section 1.3, a photon's original flight path is altered when it undergoes Compton scattering. Consequently, an annihilation in $v_1$ that would have been recorded by detector pair ($a_1$,$b_1$) if both photons had not undergone Compton scattering may instead be recorded by detector pair ($a_2$,$b_2$) when scatter occurs. When we consider non-collinearity, it is also possible that an annihilation in $v_1$ is recorded by detector pair ($a_2$,$b_2$) given that the annihilation would have been recorded by detector pair ($a_1$,$b_1$) if both photons had propagated in exactly opposite directions.

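Let $\mathcal{K}_{i_1 i_2}$ denote the conditional probability that an annihilation is recorded by detector pair $i_1$ given that the annihilation would have been recorded by detector pair $i_2$ if both photons produced by the annihilation had not undergone Compton scattering and both photons had propagated in exactly opposite directions, where $i_1 = 1, 2, \ldots, I$ and $i_2 = 1, 2, \ldots, I$. It is through these unknown probabilities $\{\mathcal{K}_{i_1 i_2}\}$ that we model scatter and non-collinearity. Now, accounting for scatter and non-collinearity, the mean number of photon pairs recorded by detector pair ($a_1$,$b_1$) is $\{\mathcal{K}_{11}\mathcal{P}^c_{11}x_1 + \mathcal{K}_{12}\mathcal{P}^c_{22}x_2 + \mathcal{K}_{13}(\mathcal{P}^c_{31}x_1 + \mathcal{P}^c_{32}x_2 + \mathcal{P}^c_{33}x_3)\}$. Moreover, the mean numbers of photon pairs recorded by detector pairs ($a_2$,$b_2$) and ($a_3$,$b_3$) are $\{\mathcal{K}_{21}\mathcal{P}^c_{11}x_1 + \mathcal{K}_{22}\mathcal{P}^c_{22}x_2 + \mathcal{K}_{23}(\mathcal{P}^c_{31}x_1 + \mathcal{P}^c_{32}x_2 + \mathcal{P}^c_{33}x_3)\}$ and $\{\mathcal{K}_{31}\mathcal{P}^c_{11}x_1 + \mathcal{K}_{32}\mathcal{P}^c_{22}x_2 + \mathcal{K}_{33}(\mathcal{P}^c_{31}x_1 + \mathcal{P}^c_{32}x_2 + \mathcal{P}^c_{33}x_3)\}$, respectively. In matrix notation, the mean of the data $E[D]$ can be expressed as

$$E[D] = \begin{bmatrix} \mathcal{K}_{11} & \mathcal{K}_{12} & \mathcal{K}_{13} \\ \mathcal{K}_{21} & \mathcal{K}_{22} & \mathcal{K}_{23} \\ \mathcal{K}_{31} & \mathcal{K}_{32} & \mathcal{K}_{33} \end{bmatrix} \begin{bmatrix} 0.04 & 0 & 0 \\ 0 & 0.04 & 0 \\ 0.03 & 0.03 & 0.02 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \tag{6.2}$$

$$= \mathcal{K}\mathcal{P}^c x, \tag{6.3}$$

where $x = [x_1\ x_2\ x_3]'$ and the $I \times I$ matrix $\mathcal{K}$ denotes the first matrix on the right-hand side of (6.2). We will refer to $\mathcal{K}$ as the scatter matrix. Since the mean of the emission data can be expressed as $\mathcal{K}\mathcal{P}^c x$ in this model, we define the "true" probability matrix as

$$\mathcal{P}^{\mathrm{true}} \triangleq \mathcal{K}\mathcal{P}^c, \tag{6.4}$$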

where the $(i, j)$th element of $\mathcal{P}^{\mathrm{true}}$, $\mathcal{P}^{\mathrm{true}}_{ij}$, denotes the probability that an annihilation in the $j$th voxel leads to a photon pair being recorded by the $i$th detector pair. To our knowledge, the scatter matrix model we propose has not appeared in the literature. However, factorizations of $\mathcal{P}^{\mathrm{true}}$ similar to the one in (6.4) can be found in [28, 53, 64, 65].

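With the definition of $\mathcal{P}^{\mathrm{true}}$ in (6.4), the mean number of photon pairs recorded by the $i$th detector pair can be expressed as

$$\sum_{j=1}^{J} \mathcal{P}^{\mathrm{true}}_{ij} x_j = \sum_{j=1}^{J} \sum_{i_2=1}^{I} \mathcal{K}_{i i_2} \mathcal{P}^c_{i_2 j} x_j \tag{6.5}$$

$$= \sum_{i_2=1}^{I} \mathcal{K}_{i i_2} \sum_{j=1}^{J} \mathcal{P}^c_{i_2 j} x_j \tag{6.6}$$

$$= \sum_{i_2=1}^{I} \mathcal{K}_{i i_2} [\mathcal{P}^c x]_{i_2}. \tag{6.7}$$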

In other words, the mean number of photon pairs recorded by the $i$th detector pair is a weighted sum of the mean numbers of photon pairs that would be recorded by all detector pairs if there were no errors due to scatter and non-collinearity. Regarding the matrix $\mathcal{K}$, two constraints are necessary. Since $\mathcal{K}$ is a probability matrix, it must be true that $0 \le \mathcal{K}_{i_1 i_2} \le 1$ for all $i_1$ and $i_2$. Recall that $\mathcal{K}_{i_1 i_2}$ is the conditional probability that an annihilation is recorded by detector pair $i_1$ given that the annihilation would have been recorded by detector pair $i_2$ if both photons produced by the annihilation had not undergone Compton scattering and both photons had propagated in exactly opposite directions. Since a photon pair that is recorded by detector pair $i_1$ cannot be recorded by the other detector pairs, we require that

$$\sum_{i_1=1}^{I} \mathcal{K}_{i_1 i_2} \le 1 \quad \text{for all } i_2. \tag{6.8}$$
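To make the three-detector example concrete, the sketch below evaluates $E[D] = \mathcal{K}\mathcal{P}^c x$ for the matrix $\mathcal{P}^c$ in (6.1). The scatter matrix `K` and the emission means `x` are hypothetical values chosen only for illustration; `K` is chosen so that it satisfies the constraints $0 \le \mathcal{K}_{i_1 i_2} \le 1$ and (6.8).

```python
import numpy as np

# Corrected probability matrix from (6.1).
Pc = np.array([[0.04, 0.00, 0.00],
               [0.00, 0.04, 0.00],
               [0.03, 0.03, 0.02]])

# Hypothetical scatter matrix: entry (i1, i2) is the probability that an
# event belonging to detector pair i2 is recorded by detector pair i1.
K = np.array([[0.90, 0.05, 0.02],
              [0.05, 0.90, 0.03],
              [0.03, 0.03, 0.90]])

# Hypothetical emission means for voxels v1, v2, v3.
x = np.array([1000.0, 2000.0, 500.0])

ideal_mean = Pc @ x      # means without scatter or non-collinearity
P_true = K @ Pc          # "true" probability matrix, cf. (6.4)
mean_data = P_true @ x   # E[D], cf. (6.3)

column_sums = K.sum(axis=0)  # each must be at most one, cf. (6.8)
```

With these values, each entry of `mean_data` is a weighted sum of the entries of `ideal_mean`, which is the interpretation given by (6.7).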

6.2 Joint Minimum Kullback-Leibler Distance Method

We now consider the Poisson model by Shepp and Vardi [16]. In their model, the emission data $d$ is an observation of a random vector $D$ that is Poisson distributed with mean $(\mathcal{P}^{\mathrm{true}} x + \rho)$, where $x$ is the unknown emission mean vector and $\rho$ is the known mean accidental coincidence rate.

Since $d$ is the ML estimate of the unknown mean $(\mathcal{P}^{\mathrm{true}} x + \rho)$ $^1$, it can be said that $d \approx (\mathcal{P}^{\mathrm{true}} x + \rho)$. Using the definition in (6.4), we can state that

$$d \approx \mathcal{K}\mathcal{P}^c x + \rho. \tag{6.9}$$

It is important to point out that there are two unknowns in (6.9): the scatter matrix $\mathcal{K}$ and the emission mean vector $x$. Thus, given the emission data $d$, mean accidental coincidence rate $\rho$, and probability matrix $\mathcal{P}^c$, the problem of interest is to estimate the emission mean $x$ and scatter matrix $\mathcal{K}$. We define the estimate $(\hat{x}, \hat{\mathcal{K}})$ to be a minimizer of the Kullback-Leibler (KL) distance [66] between $d$ and $(\mathcal{K}\mathcal{P}^c x + \rho)$:

$$(\hat{x}, \hat{\mathcal{K}}) \triangleq \arg\min_{x, \mathcal{K}} \mathrm{KL}(d, \mathcal{K}\mathcal{P}^c x + \rho) \quad \text{subject to } x \ge 0,\ \mathcal{K} \ge 0, \text{ and } \sum_{i_1=1}^{I} \mathcal{K}_{i_1 i_2} \le 1 \text{ for all } i_2, \tag{6.10}$$

where the KL distance between $a$ and $b$ is defined by

$$\mathrm{KL}(a, b) \triangleq \sum_{i} \left\{ a_i \log\frac{a_i}{b_i} - a_i + b_i \right\} \tag{6.11}$$

and $\mathcal{K} \ge 0$ means that $\mathcal{K}_{i_1 i_2} \ge 0$ for all $i_1$ and $i_2$. The definition of the estimate of $(x, \mathcal{K})$ is motivated in part by the fact that in [26], Byrne derived Shepp and Vardi's MLEM algorithm [16] by minimizing the KL distance between the emission data $d$ and $\mathcal{P}^{\mathrm{true}} x$ using the alternating projection algorithm by Csiszár and Tusnády [67], where $\mathcal{P}^{\mathrm{true}}$ is assumed to be known. It should be mentioned that Byrne derived the MLEM algorithm when $\rho = 0$.

$^1$ If $d$ is an observation of a random vector $D$ that is Poisson distributed with unknown mean $\lambda$, then the likelihood function for $d$ is $\Pr\{D = d \mid \lambda\} = \prod_i e^{-\lambda_i} \lambda_i^{d_i} / d_i!$. Since $d$ maximizes the log likelihood function $\log \Pr\{D = d \mid \lambda\}$, $d$ is the ML estimate of $\lambda$.
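The KL distance in (6.11) is straightforward to evaluate numerically. The following is a minimal sketch; the function name is hypothetical, and the convention $0 \log 0 = 0$ for bins with zero counts is an assumption of this sketch rather than something stated in the text.

```python
import numpy as np

def kl_distance(a, b):
    """KL distance sum_i { a_i*log(a_i/b_i) - a_i + b_i }, cf. (6.11).

    Assumes a >= 0 and b > 0. Bins with a_i = 0 contribute only b_i,
    using the convention 0*log(0) = 0 (an assumption of this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    log_term = np.zeros_like(a)
    positive = a > 0
    log_term[positive] = a[positive] * np.log(a[positive] / b[positive])
    return float(np.sum(log_term - a + b))

# Hypothetical data vector and two candidate mean vectors.
d = np.array([42.0, 77.0, 94.0])
same = kl_distance(d, d)                   # zero when the vectors match
off = kl_distance(d, [40.0, 80.0, 100.0])  # positive otherwise
```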

6.3 Probability Correction in Projection Space (PCiPS) Algorithm

Since it is difficult to solve the problem in (6.10), we propose an alternating minimization algorithm where, in a repetitive fashion, one of the two unknowns (i.e., $\mathcal{K}$ and $x$) is estimated while the other is fixed. Suppose an initial estimate for the scatter matrix $\mathcal{K}$ (e.g., an $I \times I$ identity matrix), denoted by $\mathcal{K}^{(0)}$, and an initial estimate $x^{(0)} > 0$ are available. Then, for $n = 0, 1, 2, \ldots$, the steps of the proposed algorithm are

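* Step 1 Get the current estimate of the emission mean vector $x^{(n+1)}$ using $\mathcal{K}^{(n)}$, the current estimate of the scatter matrix $\mathcal{K}$:

$$x^{(n+1)} \triangleq \arg\min_{x \ge 0} \mathrm{KL}(d, \mathcal{K}^{(n)}\mathcal{P}^c x + \rho). \tag{6.12}$$

* Step 2 Get the current estimate of the scatter matrix $\mathcal{K}^{(n+1)}$ using $x^{(n+1)}$, the current estimate of the emission mean vector $x$:

$$\mathcal{K}^{(n+1)} \triangleq \arg\min_{\mathcal{K} \ge 0} \mathrm{KL}(d, \mathcal{K}\mathcal{P}^c x^{(n+1)} + \rho) \quad \text{subject to } \sum_{i_1=1}^{I} \mathcal{K}_{i_1 i_2} \le 1 \text{ for all } i_2. \tag{6.13}$$

* Step 3 Repeat the steps above until some chosen stopping criterion is met.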

Note that the problem in Step 1 can be solved using the MLEM algorithm [16, 26], because Byrne showed that Shepp and Vardi's MLEM algorithm [16] minimizes the KL distance between the emission data $d$ and $\mathcal{P}^{\mathrm{true}} x$, where $\mathcal{P}^{\mathrm{true}}$ is known. Specifically, given an initial guess $x^{(n,0)} > 0$, the iteration for obtaining $x^{(n+1)}$ is

$$x_j^{(n,m+1)} = \frac{x_j^{(n,m)}}{\sum_{i=1}^{I} [\mathcal{K}^{(n)}\mathcal{P}^c]_{ij}} \sum_{i=1}^{I} \frac{d_i\, [\mathcal{K}^{(n)}\mathcal{P}^c]_{ij}}{[\mathcal{K}^{(n)}\mathcal{P}^c x^{(n,m)}]_i + \rho_i}, \quad j = 1, 2, \ldots, J, \tag{6.14}$$

for $m = 0, 1, 2, \ldots, M_1$. The current estimate of $x$ is defined to be $x^{(n+1)} \triangleq x^{(n,M_1+1)}$, the $(M_1+1)$th iterate of (6.14).
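The iteration in (6.14) can be sketched in a few lines. In the sketch below, `A` stands for the fixed system matrix $\mathcal{K}^{(n)}\mathcal{P}^c$; the function names, the identity choice for the scatter matrix, the noise-free data, and the value of the accidental coincidence rate are all hypothetical and serve only to illustrate that the KL distance decreases over the iterations.

```python
import numpy as np

def kl_distance(a, b):
    """KL distance of (6.11) for strictly positive a and b."""
    return float(np.sum(a * np.log(a / b) - a + b))

def mlem(A, d, rho, x0, n_iter):
    """MLEM iterations of the form (6.14) with system matrix A."""
    x = np.asarray(x0, dtype=float).copy()
    sensitivity = A.sum(axis=0)           # sum_i A_ij, assumed positive
    for _ in range(n_iter):
        ratio = d / (A @ x + rho)         # d_i / ([A x]_i + rho_i)
        x = x / sensitivity * (A.T @ ratio)
    return x

Pc = np.array([[0.04, 0.00, 0.00],
               [0.00, 0.04, 0.00],
               [0.03, 0.03, 0.02]])
A = np.eye(3) @ Pc                        # scatter matrix fixed to identity
x_true = np.array([1000.0, 2000.0, 500.0])
rho = np.full(3, 5.0)                     # hypothetical accidental rate
d = A @ x_true + rho                      # noise-free data for illustration

x0 = np.full(3, d.sum() / 3)              # uniform positive initial guess
x_est = mlem(A, d, rho, x0, n_iter=50)

kl_start = kl_distance(d, A @ x0 + rho)
kl_end = kl_distance(d, A @ x_est + rho)
```

The update is multiplicative, so a strictly positive initial guess keeps every iterate positive without any explicit projection.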

One of the issues surrounding the constrained minimization problem in (6.13) is that it is impossible to estimate the scatter matrix $\mathcal{K}$ when all of the elements in the matrix are assumed to be unknown. The reason is that there would be too many unknowns, which would result in the problem being under-determined (note: the dimension of $\mathcal{K}$ is $I \times I$, while there are only $I$ data points). To reduce the number of unknowns in $\mathcal{K}$, we assume that $\mathcal{K}_{i_1 i_2} = 0$ if the detector pairs $i_1$ and $i_2$ are not in the same projection (see Figure 1-3 for the definition of a projection). This assumption means that an annihilation that would have been recorded by a detector pair within a certain projection if both photons had not undergone Compton scattering cannot be recorded by a detector pair within some other projection. Under the stated assumption, the number of unknown parameters in the matrix $\mathcal{K}$ is dramatically reduced.

Moreover, $\mathcal{K}$ is a block diagonal matrix with $T \times T$ sub-matrices along its diagonal:

$$\mathcal{K} = \begin{bmatrix} \mathcal{K}_1 & 0 & \cdots & 0 \\ 0 & \mathcal{K}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathcal{K}_S \end{bmatrix}, \tag{6.15}$$

where $T$ is the number of detector pairs within each projection (e.g., 160), $S$ is the number of projection angles (e.g., 192), and it is assumed that emission counts of the detector pairs within a projection are placed as a "chunk" in the emission data $d$ (i.e., $d$ = [(1st projection), (2nd projection), ..., ($S$th projection)]').

Since $\mathcal{K}$ is a block diagonal matrix, the minimization problem in (6.13) can be broken into $S$ sub-problems. Consider the sinogram of $d$, denoted by $Y$, which is a $T \times S$ matrix. The $s$th column of the sinogram $Y$ is the projection that corresponds to a certain projection angle. Figure 6-2 (a) shows an example of a sinogram. In the figure, the sinogram of the 14 minute emission data for plane 21 is shown. The $I \times 1$ vector $\mathcal{P}^c x^{(n+1)}$ is known in the PET community as the forward projection of $x^{(n+1)}$. We define a matrix $W^{(n+1)}$ where the first column is the first $T$ elements of $\mathcal{P}^c x^{(n+1)}$, the second column contains the $(T+1)$th to $(2T)$th elements of $\mathcal{P}^c x^{(n+1)}$, and so on. In other words, the ordering of the detector pairs associated with $W^{(n+1)}$ and $Y$ match. Figure 6-2 (b) shows an example of $W^{(n+1)}$, where $x^{(n+1)}$ was generated by running the MLEM algorithm for 1000 iterations on the emission data in Figure 6-2 (a). Under the assumption that $\mathcal{K}_{i_1 i_2} = 0$ if the detector pairs $i_1$ and $i_2$ are not in the same projection, the minimization problem in (6.13) can be expressed as: for $s = 1, 2, \ldots, S$,

$$\mathcal{K}_s^{(n+1)} \triangleq \arg\min_{\mathcal{K}_s \ge 0} \mathrm{KL}(y_s, \mathcal{K}_s w_s^{(n+1)} + z_s) \quad \text{subject to } \sum_{i_1=1}^{T} [\mathcal{K}_s]_{i_1 i_2} \le 1 \text{ for } i_2 = 1, 2, \ldots, T, \tag{6.16}$$


Figure 6-2: Sinograms: (a) 14 minute emission data for plane 21 (i.e., $Y$) and (b) forward projection of $x^{(n+1)}$ (i.e., $W^{(n+1)} = \mathcal{P}^c x^{(n+1)}$), where $x^{(n+1)}$ is the 1000th MLEM iterate using the emission data in (a). Note that each image is displayed using its own dynamic range.

where $z_s \triangleq [\rho_{(s-1)T+1}\ \rho_{(s-1)T+2}\ \cdots\ \rho_{sT}]'$, $\mathcal{K}_s$ is the $s$th sub-matrix of $\mathcal{K}$, and $y_s$ and $w_s^{(n+1)}$ are the $s$th columns of $Y$ and $W^{(n+1)}$, respectively.
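The relation between the stacked data vector and the sinogram can be illustrated with a toy example. The sketch below assumes, as described above, that each projection occupies a contiguous chunk of $T$ entries, so that a column-major reshape places the $s$th projection in the $s$th column; the sizes and values are hypothetical.

```python
import numpy as np

T, S = 4, 3                       # toy sizes (the text uses e.g. 160 and 192)
d = np.arange(1.0, T * S + 1.0)   # stacked vector: projection 1, then 2, ...

# Column-major ("Fortran") reshape puts each chunk of T entries in a column.
Y = d.reshape((T, S), order="F")

y_2 = Y[:, 1]                     # second projection: entries T+1 .. 2T of d
d_back = Y.flatten(order="F")     # the stacking is recovered column by column
```

The same reshape applied to the forward projection or to the accidental coincidence rates would give the corresponding columns used in the sub-problems.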

Since there are $T \times T$ unknowns in each minimization problem in (6.16), there would still be too many unknowns. To reduce the number of unknowns, we assume that, for all $s$, $\mathcal{K}_s$ is a Toeplitz matrix. An example of a Toeplitz matrix is

$$\begin{bmatrix} a_3 & a_2 & a_1 & 0 & 0 & 0 \\ a_4 & a_3 & a_2 & a_1 & 0 & 0 \\ a_5 & a_4 & a_3 & a_2 & a_1 & 0 \\ 0 & a_5 & a_4 & a_3 & a_2 & a_1 \\ 0 & 0 & a_5 & a_4 & a_3 & a_2 \\ 0 & 0 & 0 & a_5 & a_4 & a_3 \end{bmatrix}.$$

The assumption that $\mathcal{K}_s$ is a Toeplitz matrix implies that there are at most $T$ unknowns in each sub-matrix $\mathcal{K}_s$. Moreover, the assumption means that a single kernel can account for scatter within each projection.

The assumption that $\mathcal{K}_s$ is a Toeplitz matrix can be justified for regions with approximately uniform attenuation, such as the brain. Consider Figure 6-3, in which a simplified PET scanner consisting of four detector pairs is depicted. In the figure, the dotted circle defines a region with uniform attenuation. We refer to the detector pairs ($a_1$,$b_1$), ($a_2$,$b_2$), ($a_3$,$b_3$), and ($a_4$,$b_4$) as the first, second, third, and fourth detector pairs, respectively. Note that the geometry of the first and second detector pairs and the geometry of the second and third detector pairs are approximately the same. Because of the approximately uniform attenuation of the subject and the geometric similarity of the detector pairs, it can be said that the conditional probabilities $\mathcal{K}_{21}$ and $\mathcal{K}_{32}$ are approximately the same. Using this rationale, for a projection of a PET scanner, it can be assumed that $\mathcal{K}_{i_2 i_1} \approx \mathcal{K}_{i_3 i_2}$ when $(i_2 - i_1) = (i_3 - i_2)$. Thus, $\mathcal{K}_s$ can be approximated as a Toeplitz matrix.


Figure 6-3: Geometry of a simplified PET image reconstruction problem: three voxels ($v_1$, $v_2$, and $v_3$) and four detector pairs (($a_1$,$b_1$), ($a_2$,$b_2$), ($a_3$,$b_3$), and ($a_4$,$b_4$)). The dashed lines define the tubes of the detector pairs. The dotted circle defines a region with uniform attenuation.

Now, consider the following functions: for an integer $t$,

$$y_s(t) = \begin{cases} [y_s]_t, & t = 1, 2, \ldots, T \\ 0, & \text{otherwise} \end{cases} \tag{6.17}$$

$$w_s^{(n+1)}(t) = \begin{cases} [w_s^{(n+1)}]_t, & t = 1, 2, \ldots, T \\ 0, & \text{otherwise} \end{cases} \tag{6.18}$$

$$z_s(t) = \begin{cases} [z_s]_t, & t = 1, 2, \ldots, T \\ 0, & \text{otherwise.} \end{cases} \tag{6.19}$$

Under the assumption that $\mathcal{K}_s$ is a Toeplitz matrix, $y_s(t)$ is approximately equal to the convolution of $w_s^{(n+1)}(t)$ and an unknown nonnegative function, denoted by $k_{(s,\tau)}(t)$, that depends on the $s$th projection angle. Thus, for $t = 1, 2, \ldots, T$,

$$y_s(t) \approx (k_{(s,\tau)} * w_s^{(n+1)})(t) + z_s(t) = \sum_{u=-\tau+1}^{\tau-1} k_{(s,\tau)}(u)\, w_s^{(n+1)}(t-u) + z_s(t), \tag{6.20}$$

where $k_{(s,\tau)}(t) \in [0, 1]$ for an integer $t$ and $k_{(s,\tau)}(t) = 0$ for $|t| \ge \tau$. The parameter $\tau$ is defined by the user, but must satisfy the constraint $\tau \le \lfloor (T+1)/2 \rfloor$. Observing (6.20), the product $k_{(s,\tau)}(u)\, w_s^{(n+1)}(t-u)$ equals the proportion of photon pairs that would have been recorded by detector pair $(t-u)$ if both photons produced by the annihilation had not undergone Compton scattering and had propagated in exactly opposite directions, but instead are recorded by detector pair $t$. The parameter $\tau$ restricts the number of detector pairs to be convolved. In principle, the parameter $\tau$ is to be chosen in such a way that $\tau$ is proportional to the mean number of scatter events. Since the mean number of scatter events is proportional to the attenuation of the material, it is possible that the attenuation map could be used to determine the parameter $\tau$.

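The convolution in (6.20) can be expressed in matrix notation: for $t = 1, 2, \ldots, T$,

$$(k_{(s,\tau)} * w_s^{(n+1)})(t) = [B_s^{(n+1)} k_{(s,\tau)}]_t, \tag{6.21}$$

where $k_{(s,\tau)} = [k_{(s,\tau)}(-\tau+1)\ \ k_{(s,\tau)}(-\tau+2)\ \cdots\ k_{(s,\tau)}(\tau-1)]'$ and $B_s^{(n+1)}$ is a known $T \times (2\tau - 1)$ matrix that is of the form

$$B_s^{(n+1)} = \begin{bmatrix}
[w_s^{(n+1)}]_{\tau} & [w_s^{(n+1)}]_{\tau-1} & \cdots & [w_s^{(n+1)}]_{1} & 0 & \cdots & 0 \\
[w_s^{(n+1)}]_{\tau+1} & [w_s^{(n+1)}]_{\tau} & \cdots & [w_s^{(n+1)}]_{2} & [w_s^{(n+1)}]_{1} & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
[w_s^{(n+1)}]_{T} & [w_s^{(n+1)}]_{T-1} & \cdots & [w_s^{(n+1)}]_{T-\tau+1} & [w_s^{(n+1)}]_{T-\tau} & \cdots & [w_s^{(n+1)}]_{T-2\tau+2} \\
0 & [w_s^{(n+1)}]_{T} & \cdots & [w_s^{(n+1)}]_{T-\tau+2} & [w_s^{(n+1)}]_{T-\tau+1} & \cdots & [w_s^{(n+1)}]_{T-2\tau+3} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & [w_s^{(n+1)}]_{T} & [w_s^{(n+1)}]_{T-1} & \cdots & [w_s^{(n+1)}]_{T-\tau+1}
\end{bmatrix},$$

that is, the element in row $t$ and column $c$ of $B_s^{(n+1)}$ is $w_s^{(n+1)}(t + \tau - c)$, with $w_s^{(n+1)}(\cdot)$ as defined in (6.18). Thus, $y_s$ can be approximated as

$$y_s \approx B_s^{(n+1)} k_{(s,\tau)} + z_s. \tag{6.22}$$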
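The equivalence between the convolution in (6.20) and the matrix-vector product in (6.21) can be checked numerically. The sketch below builds the $T \times (2\tau - 1)$ matrix from the zero-extended forward projection, under the assumption that the kernel vector is ordered as $[k(-\tau+1), \ldots, k(\tau-1)]'$, and compares the product with NumPy's centered convolution. The function name and all numerical values are hypothetical.

```python
import numpy as np

def convolution_matrix(w, tau):
    """T x (2*tau - 1) matrix whose (t, c) entry is w(t + tau - c).

    Indices t and c are 1-based in the comments, and w(.) is taken to be
    zero outside 1..T, as in (6.18).
    """
    w = np.asarray(w, dtype=float)
    T = w.size
    B = np.zeros((T, 2 * tau - 1))
    for t in range(1, T + 1):
        for c in range(1, 2 * tau):       # column c corresponds to u = c - tau
            idx = t + tau - c             # 1-based index of w(t - u)
            if 1 <= idx <= T:
                B[t - 1, c - 1] = w[idx - 1]
    return B

w = np.array([1.0, 4.0, 2.0, 0.0, 3.0, 5.0])  # hypothetical forward projection
tau = 2
k = np.array([0.1, 0.7, 0.2])                 # hypothetical k(-1), k(0), k(1)

B = convolution_matrix(w, tau)
via_matrix = B @ k
via_convolve = np.convolve(w, k, mode="same")  # centered, length T
```

The `mode="same"` comparison relies on the kernel having odd length $2\tau - 1$ and on $T \ge 2\tau - 1$, which is the constraint placed on $\tau$ in the text.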

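With (6.22), the minimization problem in (6.16) is now

$$k_{(s,\tau)}^{(n+1)} \triangleq \arg\min_{k_{(s,\tau)} \ge 0} \mathrm{KL}(y_s, B_s^{(n+1)} k_{(s,\tau)} + z_s) \quad \text{subject to } \sum_{t=1}^{2\tau-1} [k_{(s,\tau)}]_t \le 1. \tag{6.23}$$

The constrained optimization problem in (6.23) is difficult to solve because of the constraint $\sum_{t=1}^{2\tau-1} [k_{(s,\tau)}]_t \le 1$. Consequently, we first solve the following optimization problem:

$$\tilde{k}_{(s,\tau)}^{(n+1)} = \arg\min_{k_{(s,\tau)} \ge 0} \mathrm{KL}(y_s, B_s^{(n+1)} k_{(s,\tau)} + z_s). \tag{6.24}$$

Then, to get $k_{(s,\tau)}^{(n+1)}$, we normalize $\tilde{k}_{(s,\tau)}^{(n+1)}$ so that the sum of its elements equals one:

$$[k_{(s,\tau)}^{(n+1)}]_t \triangleq \frac{[\tilde{k}_{(s,\tau)}^{(n+1)}]_t}{\sum_{t'=1}^{2\tau-1} [\tilde{k}_{(s,\tau)}^{(n+1)}]_{t'}}, \quad t = 1, 2, \ldots, (2\tau - 1). \tag{6.25}$$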

The minimization problem in (6.24) can be solved by the MLEM algorithm [16, 26] because it has the same form as the optimization problem in Step 1. Specifically, given an initial estimate $k_{(s,\tau)}^{(n,0)} > 0$, the iteration is as follows: for $m = 0, 1, \ldots, M_2$,

$$[k_{(s,\tau)}^{(n,m+1)}]_j = \frac{[k_{(s,\tau)}^{(n,m)}]_j}{\sum_{i=1}^{T} [B_s^{(n+1)}]_{ij}} \sum_{i=1}^{T} \frac{[B_s^{(n+1)}]_{ij}\, [y_s]_i}{[B_s^{(n+1)} k_{(s,\tau)}^{(n,m)}]_i + [z_s]_i}, \quad j = 1, 2, \ldots, (2\tau - 1). \tag{6.26}$$

We define $\tilde{k}_{(s,\tau)}^{(n+1)} \triangleq k_{(s,\tau)}^{(n,M_2+1)}$ and normalize $\tilde{k}_{(s,\tau)}^{(n+1)}$ using (6.25) to get $k_{(s,\tau)}^{(n+1)}$.
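The kernel update, i.e., the multiplicative iteration in (6.26) followed by the normalization in (6.25), might be sketched as follows. The function name, the toy forward projection, the "true" kernel used to synthesize noise-free data, and the uniform initial kernel are all hypothetical choices made for illustration.

```python
import numpy as np

def estimate_kernel(B, y, z, k0, n_iter):
    """MLEM-type iterations of the form (6.26), then normalization (6.25)."""
    k = np.asarray(k0, dtype=float).copy()
    column_sums = B.sum(axis=0)               # sum_i B_ij, assumed positive
    for _ in range(n_iter):
        k = k / column_sums * (B.T @ (y / (B @ k + z)))
    return k / k.sum()                        # elements now sum to one

# Toy problem: T = 6 detector pairs in the projection, tau = 2.
T, tau = 6, 2
w = np.array([10.0, 40.0, 20.0, 5.0, 30.0, 50.0])  # hypothetical forward projection
B = np.zeros((T, 2 * tau - 1))
for t in range(T):
    for c in range(2 * tau - 1):
        idx = t + tau - 1 - c                 # 0-based index of w(t - u)
        if 0 <= idx < T:
            B[t, c] = w[idx]

k_true = np.array([0.1, 0.7, 0.2])            # hypothetical kernel
z = np.full(T, 1.0)                           # hypothetical accidental rates
y = B @ k_true + z                            # noise-free projection data

k0 = np.full(2 * tau - 1, 1.0 / (2 * tau - 1))  # uniform positive start
k_est = estimate_kernel(B, y, z, k0, n_iter=200)
```

Normalizing after the unconstrained iterations is a simple way to satisfy the sum constraint, at the cost of no longer exactly minimizing the KL distance in (6.23).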

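Given $k_{(s,\tau)}^{(n+1)}$, we determine $\mathcal{K}_s^{(n+1)}$ by using the following equation:

$$\mathcal{K}_s^{(n+1)} = \begin{bmatrix}
[k_{(s,\tau)}^{(n+1)}]_{\tau} & \cdots & [k_{(s,\tau)}^{(n+1)}]_{1} & & 0 \\
\vdots & \ddots & & \ddots & \\
[k_{(s,\tau)}^{(n+1)}]_{2\tau-1} & & \ddots & & [k_{(s,\tau)}^{(n+1)}]_{1} \\
 & \ddots & & \ddots & \vdots \\
0 & & [k_{(s,\tau)}^{(n+1)}]_{2\tau-1} & \cdots & [k_{(s,\tau)}^{(n+1)}]_{\tau}
\end{bmatrix}, \tag{6.27}$$

that is, the $(i_1, i_2)$th element of the $T \times T$ matrix $\mathcal{K}_s^{(n+1)}$ is $[k_{(s,\tau)}^{(n+1)}]_{i_1 - i_2 + \tau}$ when $|i_1 - i_2| < \tau$ and zero otherwise.

Finally, $\mathcal{K}^{(n+1)}$ is defined as a block diagonal matrix with $\{\mathcal{K}_s^{(n+1)}\}$ along its diagonal:

$$\mathcal{K}^{(n+1)} = \begin{bmatrix} \mathcal{K}_1^{(n+1)} & 0 & \cdots & 0 \\ 0 & \mathcal{K}_2^{(n+1)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathcal{K}_S^{(n+1)} \end{bmatrix}. \tag{6.28}$$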
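The construction of a Toeplitz sub-matrix from a kernel and the assembly of the block diagonal scatter matrix can be sketched as below. The sketch assumes the $(i_1, i_2)$th element of a sub-matrix equals $k(i_1 - i_2)$ for $|i_1 - i_2| < \tau$, which reproduces the $6 \times 6$ Toeplitz example given earlier in this section when the entries $a_1, \ldots, a_5$ are replaced by 1, ..., 5. The function names are hypothetical, and the kernel is left unnormalized purely to make the layout visible.

```python
import numpy as np

def toeplitz_from_kernel(k, T):
    """T x T banded Toeplitz matrix with entry (i1, i2) equal to k(i1 - i2)."""
    k = np.asarray(k, dtype=float)
    tau = (k.size + 1) // 2               # kernel has 2*tau - 1 elements
    K_s = np.zeros((T, T))
    for i1 in range(T):
        for i2 in range(T):
            u = i1 - i2
            if abs(u) < tau:
                K_s[i1, i2] = k[u + tau - 1]  # k(u) is stored at u + tau - 1
    return K_s

def block_diagonal(blocks):
    """Place equally sized square blocks along the diagonal of a zero matrix."""
    T = blocks[0].shape[0]
    K = np.zeros((len(blocks) * T, len(blocks) * T))
    for s, block in enumerate(blocks):
        K[s * T:(s + 1) * T, s * T:(s + 1) * T] = block
    return K

kernel = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # stands in for a_1, ..., a_5
K_1 = toeplitz_from_kernel(kernel, T=6)
K = block_diagonal([K_1, K_1])                # S = 2 identical blocks
```

For realistic sizes (e.g., 160 detector pairs and 192 angles) a sparse representation would likely be preferable to the dense array used here.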

Repeating Steps 1 and 2 generates the estimates of $\mathcal{K}$ and $x$. Once the estimate of $\mathcal{K}$, denoted by $\hat{\mathcal{K}}$, is available, a regularized image reconstruction algorithm, such as the PML, APML, or QEP algorithm, can be used to estimate the emission mean vector $x$. More specifically, we first let our estimate for $\mathcal{P}^{\mathrm{true}}$ be $\hat{\mathcal{P}}^{\mathrm{true}} = \hat{\mathcal{K}}\mathcal{P}^c$. Then, the probability matrix $\hat{\mathcal{P}}^{\mathrm{true}}$ is used in the image reconstruction algorithm of choice.
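The alternating scheme described in this section can be sketched end to end on a toy problem. This is only an illustrative sketch under several assumptions that are not taken from the text: dense matrices, a uniform initial kernel in every outer iteration, fixed iteration counts instead of a stopping criterion, and hypothetical function and parameter names (`pcips`, `n_outer`, `m1`, `m2`).

```python
import numpy as np

def mlem(A, y, z, x0, n_iter):
    """Multiplicative MLEM-type iterations for y ~ A x + z."""
    x = np.asarray(x0, dtype=float).copy()
    column_sums = A.sum(axis=0)
    for _ in range(n_iter):
        x = x / column_sums * (A.T @ (y / (A @ x + z)))
    return x

def conv_matrix(w, tau):
    """T x (2*tau - 1) matrix so that conv_matrix(w, tau) @ k convolves w with k."""
    T = w.size
    B = np.zeros((T, 2 * tau - 1))
    for t in range(T):
        for c in range(2 * tau - 1):
            idx = t + tau - 1 - c
            if 0 <= idx < T:
                B[t, c] = w[idx]
    return B

def toeplitz_block(k, T):
    """T x T banded Toeplitz block with entry (i1, i2) equal to k(i1 - i2)."""
    tau = (k.size + 1) // 2
    K_s = np.zeros((T, T))
    for i1 in range(T):
        for i2 in range(T):
            if abs(i1 - i2) < tau:
                K_s[i1, i2] = k[i1 - i2 + tau - 1]
    return K_s

def pcips(Pc, d, rho, T, S, tau, n_outer, m1, m2):
    """Alternate between updating x (K fixed) and K (x fixed)."""
    I, J = Pc.shape
    K = np.eye(I)                                  # identity initial scatter matrix
    x = np.full(J, d.sum() / J)                    # uniform positive initial image
    for _ in range(n_outer):
        x = mlem(K @ Pc, d, rho, x, m1)            # emission mean update
        forward = Pc @ x                           # forward projection
        for s in range(S):
            rows = slice(s * T, (s + 1) * T)       # entries of projection s
            B = conv_matrix(forward[rows], tau)
            k0 = np.full(2 * tau - 1, 1.0 / (2 * tau - 1))
            k = mlem(B, d[rows], rho[rows], k0, m2)
            k = k / k.sum()                        # normalize the kernel
            K[rows, rows] = toeplitz_block(k, T)   # fill the s-th diagonal block
    return x, K

# Toy usage with hypothetical sizes and noise-free data.
rng = np.random.default_rng(0)
T, S, J = 5, 2, 4
Pc = rng.uniform(0.01, 0.1, size=(T * S, J))
x_true = rng.uniform(50.0, 150.0, size=J)
rho = np.full(T * S, 1.0)
d = Pc @ x_true + rho
x_hat, K_hat = pcips(Pc, d, rho, T, S, tau=2, n_outer=3, m1=20, m2=20)
P_true_hat = K_hat @ Pc                            # estimate of the "true" matrix
```

The resulting `P_true_hat` plays the role of the estimated probability matrix that would then be passed to a regularized reconstruction algorithm.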

In summary, the steps of the PCiPS algorithm are: for $n = 0, 1, 2, \ldots$

* Step 1 Get an initial estimate for the emission mean vector $x^{(0)} > 0$ and an initial estimate for the scatter matrix $\mathcal{K}^{(0)}$.

* Step 2 Get the current estimate of the emission mean vector $x^{(n+1)}$ using $\mathcal{K}^{(n)}$, which is the current estimate of the scatter matrix $\mathcal{K}$:

$$x^{(n+1)} \triangleq \arg\min_{x \ge 0} \mathrm{KL}(d, \mathcal{K}^{(n)}\mathcal{P}^c x + \rho). \tag{6.29}$$

* Step 3 For $s = 1, 2, \ldots, S$, get $\tilde{k}_{(s,\tau)}^{(n+1)}$ using (6.26).

* Step 4 For $s = 1, 2, \ldots, S$, normalize $\tilde{k}_{(s,\tau)}^{(n+1)}$ using (6.25) to get $k_{(s,\tau)}^{(n+1)}$.

* Step 5 For $s = 1, 2, \ldots, S$, get $\mathcal{K}_s^{(n+1)}$ using (6.27).

* Step 6 Get $\mathcal{K}^{(n+1)}$ using (6.28).

* Step 7 Repeat Steps 2 through 6 for a chosen number, $M$, of iterations.

* Step 8 Define $\hat{\mathcal{P}}^{\mathrm{true}} \triangleq \mathcal{K}^{(M+1)}\mathcal{P}^c$ and use $\hat{\mathcal{P}}^{\mathrm{true}}$ in the APML algorithm or some other algorithm of choice.

CHAPTER 7
SIMULATIONS AND EXPERIMENTAL STUDY

To evaluate the algorithms in Chapters 3, 4, and 5, and the method in Chapter 6, we applied them to real thorax phantom data and compared them quantitatively and qualitatively to certain existing algorithms. Also, in Section 7.1 simulation studies with computer-generated synthetic data are presented for the PML and QEP algorithms in Chapters 3 and 5, respectively. It should be noted that simulation results with the synthetic data have limitations because they are generated under the assumption that the system model for emission data in Section 1.4 is exactly correct. However, the system model is not perfect due to errors discussed in Section 1.3.

Thorax phantom data were obtained from the PET laboratory at the Emory University School of Medicine. The phantom was filled with 2-[$^{18}$F]fluoro-2-deoxy-D-glucose ([$^{18}$F]FDG) and scanned using a Siemens-CTI ECAT EXACT [model 921] scanner in slice-collimated mode (i.e., septa-extended mode). Thirty independent data sets were generated from multiple scans of duration 7 minutes. Fifteen realizations of 14 min data were generated by adding two non-overlapping 7 min data sets. The [$^{18}$F]FDG concentrations for the heart wall, heart cavity, liver, three tumors, and thorax cavity of the thorax phantom by Data Spectrum Inc. were 0.72 μCi/ml, 0.23 μCi/ml, 0.72 μCi/ml, 2.01 μCi/ml, and 0.24 μCi/ml, respectively. The lungs, which contained styrofoam beads, were filled with a 0.25 μCi/ml solution of [$^{18}$F]FDG. The concentrations were chosen to mimic those observed in whole-body scans. The tumors were of size 1 cm, 1.5 cm, and 2 cm. The sinogram consists of 160 radial bins and 192 angles. The physical dimensions of the image space are 43.9 x 43.9 cm$^2$ and the reconstructed images contain 128 x 128 voxels (voxel size is 3.43 x 3.43 mm$^2$). Two planes (10 and 21) were considered in the experiments. Plane 10 contains activity due to the heart, lungs, spine, and background, while plane 21 contains activity due to the heart, two tumors (1.5 cm and 2.0 cm), and background. The total number of prompts for planes 10 and 21 was 397,000 and 340,000, respectively, for 14 minute data. The randoms make up about 10% and 12% of the data for planes 10 and 21, respectively.

The probability matrix $\mathcal{P}$ was computed using the angle-of-view method [16] with corrections for errors due to attenuation and detector inefficiency. To get the attenuation correction factors, post-injection transmission scan data were collected for three minutes and the attenuation correction method by Anderson et al. [9] was employed. A normalization file was used to correct for detector inefficiency. Finally, the randoms were used as noise-free estimates of the mean numbers of accidental coincidences.

For all of the experiments and simulations, we used a uniform initial estimate (all voxels equal $\sum_i d_i / J$), the eight nearest neighbors of the $j$th voxel were used for $N_j$, and the weights, $\{w_{jk}\}$, are one for horizontal and vertical nearest neighbors and $1/\sqrt{2}$ for diagonal nearest neighbors.
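The initialization and neighborhood weights just described can be written down directly. In the minimal sketch below, the function names and the representation of neighbors as (row offset, column offset) pairs are hypothetical, and the handling of voxels on the image border is not addressed.

```python
import numpy as np

def neighbor_weights():
    """Weights for the eight nearest neighbors, keyed by (row, column) offset."""
    weights = {}
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue                  # a voxel is not its own neighbor
            is_axial = dr == 0 or dc == 0 # horizontal or vertical neighbor
            weights[(dr, dc)] = 1.0 if is_axial else 1.0 / np.sqrt(2.0)
    return weights

def uniform_initial_estimate(d, J):
    """All J voxels set to sum_i d_i / J."""
    return np.full(J, np.sum(d) / J)

w_jk = neighbor_weights()
x0 = uniform_initial_estimate(np.array([10.0, 20.0, 30.0]), J=4)
```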

In Section 7.1, we apply the PML and QEP algorithms to real thorax phantom data and computer-generated synthetic data, and compare them quantitatively and qualitatively. Then, the performance of the APML algorithm is evaluated in Section 7.2. Finally, experimental results with the probability estimation method in Chapter 6 are presented in Section 7.3.

7.1 Regularized Image Reconstruction Algorithms

In this section, we compare the PML and QEP algorithms, quantitatively and qualitatively, to the MLEM algorithm and a penalized weighted least-squares (PWLS) algorithm [42]. Two ad-hoc forms of regularization include the post-filtering of MLEM estimates and early termination of the MLEM algorithm (usually quite far from the MLEM estimate). Given their simplicity, we also compared the post-filtering (MLEM-F) and early-stopping (MLEM-S) strategies to the proposed algorithms.

Quantitative comparisons were made using contrast as a figure-of-merit:

$$\text{Contrast} \triangleq \frac{|M_{ROI} - M_B|}{M_B}, \tag{7.1}$$

where $M_{ROI}$ and $M_B$ denote the mean of a chosen region-of-interest (ROI) and the background, respectively. We define another figure-of-merit that quantifies the distinguishability of the two tumors:

$$\text{Distinguishability} \triangleq \frac{M_T - M_I}{M_T - M_B}, \tag{7.2}$$

where $M_T$ and $M_I$ denote the mean activity of the two tumor regions and of the intermediate region between the two tumors, respectively. If the two tumors overlap each other, then $M_I = M_T$ and the distinguishability will be zero. On the other hand, if the intermediate region between the two tumors has the same mean as the background, then the distinguishability will be one.
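The two figures-of-merit can be computed from region masks as in the following sketch. The function names, the toy image, and the mask layout are hypothetical; the intensities 74 and 7 x 74 are borrowed from the tumor phantom described later in this chapter, and the absolute value in the contrast follows the reading of (7.1) adopted here.

```python
import numpy as np

def contrast(image, roi_mask, background_mask):
    """|M_ROI - M_B| / M_B, cf. (7.1)."""
    m_roi = image[roi_mask].mean()
    m_b = image[background_mask].mean()
    return abs(m_roi - m_b) / m_b

def distinguishability(image, tumor_mask, intermediate_mask, background_mask):
    """(M_T - M_I) / (M_T - M_B), cf. (7.2)."""
    m_t = image[tumor_mask].mean()
    m_i = image[intermediate_mask].mean()
    m_b = image[background_mask].mean()
    return (m_t - m_i) / (m_t - m_b)

# Toy 1 x 7 image: background, tumor, gap, tumor, background.
image = np.array([[74.0, 74.0, 518.0, 74.0, 518.0, 74.0, 74.0]])
tumor = np.zeros_like(image, dtype=bool)
tumor[0, [2, 4]] = True
gap = np.zeros_like(image, dtype=bool)
gap[0, 3] = True
background = np.zeros_like(image, dtype=bool)
background[0, [0, 1, 5, 6]] = True

c = contrast(image, tumor, background)
dist = distinguishability(image, tumor, gap, background)
```

In this idealized image the gap between the tumors is at the background level, so the distinguishability equals one, matching the limiting case described above.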

For image comparisons, converged images were used for the MLEM, PML, and PWLS algorithms. The QEP images were obtained by running the QEP algorithm for the same number of iterations as the PML algorithm. This was necessary because the QEP algorithm does not have a single objective function. Recall that the QEP algorithm defines a new objective function to be minimized at each iteration. For the PML, QEP, and PWLS images, the penalty parameter, $\beta$, was chosen in such a way that the standard deviations of their soft-tissue (i.e., background) regions were equal. In this sense, the algorithms are "balanced" with respect to $\beta$. For the MLEM-S and MLEM-F images, early MLEM iterates and filtered converged MLEM images were obtained that matched the standard deviation of the soft-tissue regions in the PML, QEP, and PWLS images. To filter the MLEM image, we used 5 x 5 Gaussian filters with different standard deviation values.
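A 5 x 5 Gaussian post-filter of the kind mentioned above might be built as follows. This is a sketch under assumptions not stated in the text: the kernel is sampled at integer voxel offsets and normalized to sum to one, borders are zero-padded, and the function names are hypothetical. The standard deviation of 1.27 voxels is the value quoted later in this chapter for one of the MLEM-F images.

```python
import numpy as np

def gaussian_kernel_5x5(sigma):
    """Separable 5 x 5 Gaussian kernel, normalized to unit sum."""
    offsets = np.arange(-2, 3)
    g = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def filter_image(image, kernel):
    """Apply the 5 x 5 kernel with zero padding at the borders."""
    rows, cols = image.shape
    padded = np.pad(image, 2, mode="constant")
    out = np.zeros_like(image, dtype=float)
    for r in range(5):
        for c in range(5):
            out += kernel[r, c] * padded[r:r + rows, c:c + cols]
    return out

kernel = gaussian_kernel_5x5(sigma=1.27)
flat = np.full((9, 9), 74.0)          # constant test image
smoothed = filter_image(flat, kernel)
```

Away from the border, a constant image is left unchanged because the kernel sums to one; near the border, the zero padding lowers the values, which is one reason a different border rule might be preferred in practice.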

7.1.1 Synthetic Data

In this subsection, we present simulation results for software phantoms. Fifty realizations were used in the simulation study. To generate emission data, a software phantom was forward projected (i.e., $\mathcal{P}x$) using the $\mathcal{P}$ matrix, where it was assumed that there are no errors except accidental coincidences. Then, for each bin, the prompts and randoms were generated using pseudo-random Poisson variates with mean $[\mathcal{P}x]_i + \rho$ and $\rho$, respectively. The constant $\rho$ was chosen such that the mean accidental coincidence rate was approximately 10%. The mean numbers of prompts and randoms are about 550,000 and 50,000, respectively. For the simulation studies, the dimensions of the image space follow those of the real phantom image space. The total number of intensities within a software phantom was about 500,000.
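The data generation procedure described above can be sketched as follows. The system matrix, phantom, sizes, and random seed are hypothetical stand-ins; the constant accidental rate is set so that the mean randoms total is 10% of the mean forward-projected total, which is one reading of the "approximately 10%" figure that is consistent with the quoted counts (about 50,000 randoms against about 500,000 phantom intensities).

```python
import numpy as np

rng = np.random.default_rng(0)

I, J = 200, 50                                # hypothetical numbers of bins and voxels
P = rng.uniform(0.0, 0.02, size=(I, J))       # stand-in for the probability matrix
x = rng.uniform(50.0, 150.0, size=J)          # stand-in for the software phantom

ideal = P @ x                                 # forward projection [P x]_i
rho = np.full(I, 0.1 * ideal.sum() / I)       # constant mean accidental rate

prompts = rng.poisson(ideal + rho)            # Poisson with mean [P x]_i + rho
randoms = rng.poisson(rho)                    # Poisson with mean rho

randoms_fraction = rho.sum() / ideal.sum()    # 0.1 by construction
```

Repeating the two `rng.poisson` calls with the same means yields independent realizations, which is how a multi-realization study of the kind described here could be simulated.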

We first consider a tumor phantom. Figure 7-1 shows a software phantom that consists of two tumors (1.7 cm and 2.4 cm in diameter) in a uniform circular background with a diameter of 30.5 cm, where the tumor and background intensities are 7 x 74 and 74, respectively (tumor contrast is 6). The image also depicts the regions used for the contrast and distinguishability calculations. The intermediate region between the two tumors consists of 14 voxels, and the large and small tumors consist of 45 and 21 voxels, respectively.

For the tumor phantom in Figure 7-1, we used the log-cosh penalty function $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 20$ in the PML and QEP algorithms, while the quadratic penalty function was used in the PWLS algorithm. For the QEP algorithm, a hyperbolic tangent (tanh) function was used with its parameter set to 150 unless noted. Both parameters were chosen experimentally.

Figure 7-2 is a plot of the images obtained by the MLEM, MLEM-S, MLEM-F, PML, QEP, and PWLS algorithms when the tumor phantom in Figure 7-1 was used. The MLEM image is the 500th MLEM iterate, and 200 iterations were used to reconstruct the PML, QEP, and PWLS images. For the PML, QEP, and PWLS images, $\beta$ was chosen such that the standard deviation of the background was approximately 12. The MLEM-S image is the 20th MLEM iterate. To get the MLEM-F image, the MLEM image in Figure 7-2 (a) was filtered once using a 5 x 5 Gaussian filter with a standard deviation of 1.27 voxels. As stated, all of the images in Figure 7-2 have the same background standard deviation except for the MLEM image. The MLEM image in Figure 7-2 (a) is considerably noisier (background standard deviation is about 78) than the other images. Figures 7-2 (d) and (e) illustrate that the PML image and the QEP image are smooth and, at the same time, the tumors in the images are resolvable and differ greatly from the background. On the other hand, Figures 7-2 (c) and (f) demonstrate that the images generated by the MLEM-F and PWLS algorithms are too smooth, especially near the boundary of the tumors. Figure 7-2 (b) shows that the edges of the tumors in the MLEM-S image are not as clear as the ones in the PML and QEP images. In Figure 7-3, the images in Figure 7-2 are plotted with their own dynamic range.

For the images in Figure 7-2, the contrast of the QEP image was -1%, 12%, 21%, 4%, and 22% higher than that of the MLEM, MLEM-S, MLEM-F, PML, and PWLS images, respectively, for the large tumor. The increased contrast of the QEP image for the small tumor with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was -4%, 16%, 31%, 5%, and 34%, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was -5%, 16%, 34%, 5%, and 40%, respectively. The MLEM image outperformed the QEP image in the contrast and distinguishability comparisons. However, as can be seen in Figure 7-2 (a), the MLEM image is extremely noisy.

Figures 7-4 (a) and (b) are line plots (the row is shown in Figure 7-1) of the images in Figure 7-2. For the row under consideration, it can be seen from the line plots that the PML and QEP images have a higher degree of contrast than the other images, except for the MLEM image. In addition, the edges in the QEP image are sharper than those in the PML image. As expected, the line plot shows that the MLEM image is excessively noisy.

Figures 7-5 (a) and (b) are plots of the average contrast of the large tumor and small tumor, respectively, versus the average background standard deviation using fifty realizations, when the tumor phantom in Figure 7-1 was used. Further, a plot of the average standard deviation of the large tumor versus the average background standard deviation for the fifty realizations is shown in Figure 7-5 (c). Finally, in Figure 7-5 (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation for fifty realizations is shown. As can be seen from Figures 7-5 (a), (b), and (d), the contrast curves and the distinguishability curve of the QEP algorithm lie above the curves of the other algorithms for comparable "background noise". Thus, for a fixed level of comparable background noise, the QEP images, on average, have the greatest contrast and distinguishability. The average standard deviation curves of the PML and QEP algorithms in Figure 7-5 (c) generally lie below the corresponding curves of the other algorithms for reasonably small background noise.

To see where the PML and QEP algorithms break down in terms of contrast, we performed simulations using the synthetic tumor phantom in Figure 7-1 with four different tumor contrast values (tumor contrast equals 3, 1.5, 0.75, and 0.5). For the PML and QEP algorithms, $\beta = 1/16$ and $1/32$ were used, respectively, and 200 iterations were used. For the QEP algorithm, the parameter of the tanh function was set to 150 for a tumor contrast of 3 and to 80 for the other tumor contrast values (i.e., tumor contrast equals 1.5, 0.75, and 0.5). These parameters were chosen experimentally. Figures 7-6 (a) and (b) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 3. As can be seen in the figures, when the tumor contrast was 3, the tumors in the PML and QEP images are clearly resolvable. Figures 7-6 (c) and (d) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 1.5. From the figures, the tumors in the PML and QEP images are still clear when the tumor contrast was 1.5. It should be mentioned that the images in Figure 7-6 have the same background standard deviation, which is approximately 10. Figures 7-7 (a) and (b) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 0.75. From the figures, the tumors in the PML and QEP images are still resolvable when the tumor contrast was 0.75. Consider the PML and QEP images in Figures 7-7 (c) and (d), respectively, which were obtained by using the phantom in Figure 7-1 with a tumor contrast of 0.5. The tumors in the PML and QEP images are hardly resolvable, which implies that the PML and QEP algorithms break down when the tumor contrast is 0.5. The images in Figure 7-7 have the same background standard deviation, which is approximately 9.

Figures 7-8 (a), (b), (c), and (d) are plots of the average contrast of the small tumor versus the average background standard deviation over fifty realizations, when the tumor contrast of the phantom in Figure 7-1 was 3, 1.5, 0.75, and 0.5, respectively. As can be seen from Figures 7-8 (a) and (b), the contrast curves of the QEP algorithm lie above the curves of the PML algorithm for a fixed background noise when the tumor contrast equals 3 and 1.5. For tumor contrasts of 0.75 and 0.5, the curves of the PML and QEP algorithms coincide, as can be seen in Figures 7-8 (c) and (d), which implies that the two algorithms perform similarly when the tumor contrast is small enough.

To see where the PML and QEP algorithms break down in terms of the spacing between two tumors, we performed simulations using four different synthetic tumor phantoms, each with a different tumor spacing. We compared the PML and QEP images both visually and in terms of distinguishability. Figures 7-9 (a), (b), and (c) show software phantoms that consist of two tumors (each tumor is 3 x 3 voxels). For these phantoms, the tumor contrast is 3 and the spacing between the two tumors is 2, 3, and 4 voxels, respectively. Figure 7-9 (d) shows a software phantom that consists of two tumors (each 3 x 3 voxels) with a spacing of 2 voxels, where the tumor contrast is 6. The images in Figure 7-9 also depict the regions used for the distinguishability calculation. The intermediate regions between the two tumors consist of 6, 8, 10, and 6 voxels for the phantoms in Figures 7-9 (a), (b), (c), and (d), respectively.

Figure 7-10 shows the PML and QEP images obtained by using the phantoms in Figures 7-9 (a) and (b). Figure 7-11 shows the PML and QEP images obtained by using the phantoms in Figures 7-9 (c) and (d). The PML and QEP images in Figures 7-10 and 7-11 are from 200 iterations, and they have a background standard deviation of approximately 10 and 9, respectively. For the PML and QEP images in Figures 7-10 and 7-11, $\beta = 1/16$ and $1/32$ were used, respectively. Figure 7-10 indicates that the PML and QEP algorithms generate images where the tumors are clearly separated when the spacing between the tumors is 3 and 4 voxels. As can be seen in Figures 7-11 (a) and (b), the PML and QEP algorithms were not able to resolve the two tumors when the spacing between the tumors is 2 voxels. However, when the tumor contrast was increased, the PML and QEP algorithms worked well for a spacing of 2 voxels, as shown in Figures 7-11 (c) and (d).

Figure 7-12 is a plot of the average distinguishability of two tumors versus the average background standard deviation over fifty realizations, when the tumor phantoms in Figure 7-9 were used. As can be seen from Figures 7-12 (a), (b), (c), and (d), the distinguishability curves of the QEP algorithm lie above the curves of the PML algorithm for a fixed background noise. Thus, for a fixed level of background noise, the QEP images, on average, have greater distinguishability.

Figure 7-1: A software tumor phantom is shown. Contrast of the tumors in the phantom is 6. The regions surrounded by the dotted and dashed lines define the tumor intermediate region (i.e., MI) and background region, respectively.

7.1.2 Real Data

In this subsection, we present experimental results using real phantom data for plane 21. Unless noted, the data are from 14 minute scans. The image in Figure 7-13, which was produced by averaging fifteen converged MLEM images, depicts the regions that were used in the contrast and distinguishability calculations. The large tumor ROI, the small tumor ROI, and the intermediate region contain 24, 12, and 16 voxels, respectively.

In the PML and QEP algorithms, we used the log-cosh penalty function $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 50$. Since noise in real data is stronger than in the synthetic data due to errors (e.g., scatter), we increased the value of the parameter $\delta$ to reduce background noise (observe that we used $\delta = 20$ for the synthetic phantom in Figure 7-1). For the QEP algorithm, a tanh-based function $r(t)$ was used, with its parameter set to 500 and 1000 for the 7 minute and 14 minute real data, respectively. We varied this parameter because the data sets lead to reconstructed images that have different edge "heights". As in the simulations

Figure 7-2: Comparison of emission images when the synthetic phantom in Figure 7-1 was used: (a) MLEM image, (b) MLEM-S image, (c) MLEM-F image, (d) PML image, (e) QEP image, and (f) PWLS image. The images in (a) and (b) are from 500 and 20 iterations, respectively, while the images in (d), (e), and (f) are from 200 iterations. The image in (c) was obtained by filtering the MLEM image once with a 5 x 5 Gaussian filter with a standard deviation of 1.27 voxels. For the PML, QEP, and PWLS images, $\beta$ was chosen such that the standard deviation of the background is approximately 12. Specifically, $\beta = 0.0415$, $0.021$, and $0.006$ for the PML, QEP, and PWLS images, respectively. The standard deviation of the background of the images in (b) and (c) is also approximately 12. For display purposes, all the images were adjusted so that they have the same dynamic range.
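The MLEM-F image is described as an MLEM image filtered once with a 5 x 5 Gaussian filter whose standard deviation is 1.27 voxels. The following is a minimal sketch of such a filter, not the dissertation's implementation. The function names are hypothetical, and the edge-replicating boundary handling is an assumption, since the source does not say how image borders were treated.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.27):
    """Sampled, normalized 2-D Gaussian kernel (size x size, sigma in voxels)."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    g = np.exp(-ax**2 / (2.0 * sigma**2))
    k = np.outer(g, g)          # separable Gaussian as an outer product
    return k / k.sum()          # normalize so the filter preserves the mean

def filter_image(img, kernel):
    """Apply the kernel once; borders are padded by edge replication (assumed)."""
    half = kernel.shape[0] // 2
    rows, cols = img.shape
    padded = np.pad(img, half, mode="edge")
    out = np.zeros((rows, cols), dtype=float)
    # Accumulate shifted copies of the image weighted by the kernel entries.
    for di in range(kernel.shape[0]):
        for dj in range(kernel.shape[1]):
            out += kernel[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out
```

Because the kernel is symmetric, this correlation-style loop gives the same result as a convolution. Sweeping `sigma` traces out different noise levels, which is how a filtered-MLEM noise curve could be generated.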

Figure 7-3: The images in Figure 7-2 are shown with their own dynamic range.

Figure 7-4: A line plot comparison of the reconstructed images in Figures 7-2 (a), (b), and (c) is shown in (a). A line plot comparison of the reconstructed images in Figures 7-2 (d), (e), and (f) is shown in (b).

Figure 7-5: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively, when the synthetic phantom in Figure 7-1 was used. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of two tumors versus the average background standard deviation is shown. Fifty synthetic data realizations were used in the study. For the MLEM-S curves, the images from iterations 5 to 160 were used. For the MLEM-F curves, the MLEM images were filtered once by 5 x 5 Gaussian filters with standard deviations ranging from 0.44 to 3.0 voxels (each voxel is 3.43 x 3.43 mm$^2$). For the PML, QEP, and PWLS algorithms, the images were reconstructed using two hundred iterations for $\beta$ ranging from $2^{-1}$ to $2^{-9}$, from $2^{-1}$ to $2^{-9}$, and over a range ending at $2^{-12}$, respectively.

Figure 7-6: Comparison of emission images: (a) PML image and (b) QEP image when the tumor contrast of the phantom in Figure 7-1 was 3, and (c) PML image and (d) QEP image when the tumor contrast of the phantom in Figure 7-1 was 1.5. All images are from 200 iterations and have a background standard deviation of approximately 10. For the PML and QEP images, $\beta = 1/16$ and $1/32$ were used, respectively. For the QEP images in (b) and (d), the additional QEP parameter was set to 150 and 80, respectively. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that the images have the same dynamic range.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT

CHAPTER
1 BACKGROUND
1.1 Overview of Positron Emission Tomography (PET)
1.2 PET Scanner
1.3 Sources of Error
1.4 System Model for Emission Data
2 LITERATURE REVIEW
2.1 Penalized Maximum Likelihood Algorithms
2.2 Scatter Correction Methods
2.3 Summary of the Proposed Algorithms
3 PENALIZED MAXIMUM LIKELIHOOD ALGORITHM
3.1 Penalized Maximum Likelihood (PML) Algorithm
3.2 Convergence Proof
3.3 Properties of the PML Algorithm
4 ACCELERATED PENALIZED MAXIMUM LIKELIHOOD ALGORITHM
4.1 Convergence Proof
4.2 Accelerated Penalized Maximum Likelihood (APML) Algorithm
4.3 Direction Vectors
4.4 Properties of the APML Algorithm
5 QUADRATIC EDGE PRESERVING ALGORITHM
6 JOINT EMISSION MEAN AND PROBABILITY MATRIX ESTIMATION
6.1 Scatter Matrix Model
6.2 Joint Minimum Kullback-Leibler Distance Method
6.3 Probability Correction in Projection Space (PCiPS) Algorithm
7 SIMULATIONS AND EXPERIMENTAL STUDY
7.1 Regularized Image Reconstruction Algorithms
7.1.1 Synthetic Data
7.1.2 Real Data
7.2 APML Algorithm
7.3 PCiPS Algorithm
8 CONCLUSIONS AND FUTURE WORK
8.1 Conclusions
8.2 Future Work

APPENDIX
A HUBER'S SURROGATE FUNCTIONS
B SURROGATE FUNCTIONS FOR PENALTY FUNCTION
C STRICT CONVEXITY OF PML OBJECTIVE FUNCTION
D SOLUTION TO UNCONSTRAINED OPTIMIZATION PROBLEM IN MODIFIED APML LINE SEARCH
E CONVEXITY OF SURROGATE FUNCTIONS FOR OBJECTIVE FUNCTIONS IN APML LINE SEARCH

REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IMAGE RECONSTRUCTION ALGORITHMS FOR ACHIEVING HIGH-RESOLUTION POSITRON EMISSION TOMOGRAPHY IMAGES

By Ji-Ho Chang

August 2004

Chair: John M. M. Anderson
Major Department: Electrical and Computer Engineering

In this dissertation, we present four algorithms for reconstructing high-resolution images in PET. The first algorithm, referred to as the penalized maximum likelihood (PML) algorithm, iteratively minimizes a PML objective function. At each iteration, the PML algorithm generates a function, called a surrogate function, that satisfies certain conditions. The next iterate is defined to be the nonnegative minimizer of the surrogate function. The PML algorithm utilizes standard de-coupled surrogate functions for the maximum likelihood objective function of the data and de-coupled surrogate functions for a certain class of penalty functions. As desired, the PML algorithm guarantees nonnegative estimates and monotonically decreases the PML objective function with increasing iterations. For the case where the PML objective function is strictly convex, which is true for the class of penalty functions under consideration, the PML algorithm has been shown to converge to the minimizer of the PML objective function.

The drawback of the PML algorithm is that it converges slowly. Thus, a "fast" version of the PML algorithm, referred to as the accelerated PML (APML) algorithm, was developed, in which an additional search step, called a pattern search step, is performed after each standard PML iteration. In the pattern search step, an accelerated iterate, which has lower cost than the standard PML iterate, is found by solving a certain constrained optimization problem that arises at each pattern search step. The APML algorithm retains the nice properties of the PML algorithm.

The third algorithm, referred to as the quadratic edge preserving (QEP) algorithm, aims to preserve edges in the reconstructed images so that fine details, such as small tumors, are more resolvable. The QEP algorithm is based on an iteration dependent, de-coupled penalty function that introduces smoothing while preserving edges. The penalty function was developed by modifying the surrogate functions of the penalty function for the PML method.

In PET, there are several errors that have the net effect of introducing blur into the reconstructed images. We propose a method that aims to reduce blur in PET images. The method is based on the assumption that the "true" probability matrix for the observed emission data is a product of an unknown nonnegative matrix, called a scatter matrix, and a "conventional" probability matrix. Under the suggested framework, the problem is to jointly estimate the scatter matrix and emission means. We propose an alternating minimization algorithm to estimate them by minimizing a certain distance.

The algorithms are qualitatively and quantitatively assessed using synthetic phantom data and real phantom data.

CHAPTER 1
BACKGROUND

Medical imaging modalities, such as X-ray computed tomography and magnetic resonance imaging, are used to obtain images of anatomical structures within the human body. However, in certain medical applications, it is also important to obtain physiological information, because physiological changes can indicate disease states earlier than anatomical changes [1]. Positron emission tomography (PET) and single-photon emission computed tomography (SPECT) are widely used medical imaging modalities that acquire physiological information on both human and animal subjects.

In SPECT, physiological information is obtained by imaging the distribution of gamma-ray or X-ray emitting radio-isotopes within the human body [1]. After the radio-isotopes are introduced into the human body, they decay and emit gamma-ray or X-ray single photons. A SPECT scanner detects these photons with the help of collimators that are made of lead. Since a large percentage of the photons are absorbed by the lead collimators, the sensitivity and accuracy of SPECT are limited. In PET, there is no need for lead collimators because collimation is performed by electronic circuits that are connected to the detectors. Consequently, PET possesses relatively high sensitivity and accuracy when compared to SPECT. Although cost is the major limitation of PET, recent research advances, such as less expensive detector materials and scanner configurations that need fewer detectors than conventional scanners, have helped decrease its cost [1]. The main advantage of SPECT over PET is its substantially lower cost.

We present a brief overview of PET, the PET scanner, and certain errors in PET in Sections 1.1, 1.2, and 1.3, respectively. Then, we provide some background on the image reconstruction problem in PET.

1.1 Overview of Positron Emission Tomography (PET)

Most clinical applications of PET are in oncological cases involving the diagnosis and staging of cancer, treatment planning and monitoring of tumors, detection of recurrent tumors, and localization of biopsy sites in cases when there are tumors in the head or neck [2,3]. PET is also used in the diagnosis of coronary artery disease and other heart diseases [2,3].

In PET, physiological information is acquired by imaging the distribution of positron-emitting isotopes, such as $^{13}$N, $^{15}$O, $^{18}$F, and $^{11}$C, within the human body [3]. The positron-emitting isotopes are bound to compounds with known biochemical properties. Compounds that are labeled with positron-emitting isotopes are called radiopharmaceuticals. The choice of the radiopharmaceutical depends on its application. For example, 2-[$^{18}$F]fluoro-2-deoxy-D-glucose ([$^{18}$F]FDG) is used for imaging brain tumors while [$^{13}$N]ammonia is used for the detection of coronary artery disease [3].

After the radiopharmaceutical is introduced into the subject (by injection or inhalation), positrons are emitted as the positron-emitting isotopes decay. An emitted positron annihilates with a nearby electron within the body, generating two high energy (511 keV) photons. The two photons, which can penetrate the subject, travel in almost opposite directions. Ideally, photon pairs that are generated by a positron annihilating with an electron will be detected by a pair of detectors. If two electronically connected detectors detect a pair of photons within a short time interval (e.g., < 10 ns), then the detection is recorded along the line connecting the two detectors, which is called a line-of-response. In the absence of an error, the detection indicates that there was an annihilation somewhere along the line-of-response. The detection of a photon pair is referred to as a coincidence. In addition, the timing interval of a scanner that is used to define a coincidence is called the scanner's coincidence timing window.

Since there are a great many detector pairs in a PET scanner, sufficient information is available for reconstructing a map of the concentration of the radiopharmaceutical. For each detector pair, the number of coincidences that occur during the scan is summed. The emission data are the coincidence sums for all of the possible detector pairs. The key idea in PET is that the emission data depend on the distribution of the radiopharmaceutical within the subject being scanned, which in turn depends on the metabolism of the subject. Consequently, numerous researchers have developed algorithms that reconstruct PET images whose pixel values represent the distribution of the radiopharmaceutical and, ultimately, the metabolism of the subject.

1.2 PET Scanner

Typical PET scanners have a diameter of 80-100 cm and an axial extent of 10-20 cm [4]. Figure 1-1 (a) shows a simplified PET scanner and Figures 1-1 (b) and (c) show two-dimensional views of the scanner. Usually, a PET scanner consists of hundreds of rectangular bundles of crystals that are arranged to form 20-30 rings of detectors, where each detector ring contains 300-600 detectors. Each bundle of crystals is connected to a few (e.g., 2-8) photo-multiplier tubes (PMTs). Figure 1-1 (d) shows such a block of crystals coupled to four PMTs. When a photon interacts with a crystal, light photons are emitted and the PMTs collect them. From the collected light, a PET scanner determines the crystal within which the scintillation occurred.

Figure 1-1: Simplified PET scanner: (a) a simplified full-ring PET scanner with 8 detector rings, (b) two-dimensional view of the scanner at x = 0, (c) two-dimensional view of the scanner at y = 0, and (d) a block detector consisting of an array of 8 x 8 crystals coupled to four PMTs, where the origin of the rectangular coordinate system is at the center of the scanner.

State-of-the-art PET scanners generally provide two scan modes: slice-collimated mode and fully three-dimensional mode [4]. In slice-collimated mode, also called septa-extended mode, thin tungsten rings, called septa, are placed between the detector rings. Figure 1-2 (a) illustrates slice-collimated mode, in which coincidences are collected within a detector ring. In fully three-dimensional mode, the septa are removed from the scanner. Figure 1-2 (b) illustrates fully three-dimensional mode, in which allowable lines-of-response are not restricted to lie within a detector ring.

Figure 1-2: Scan modes: (a) septa-extended mode and (b) fully three-dimensional mode. The dotted lines represent lines-of-response.

Within a detector ring, detector pairs are grouped into projections, where a projection is a set of detector pairs that are defined by a particular "zig-zag scan". Figure 1-3 (a) shows a projection and the defining zig-zag scan. Figure 1-3 (b) shows another projection. A sinogram is defined to be an $S \times T$ matrix, where each row of the matrix contains a projection, $S$ is the number of detector pairs within the projection, and $T$ is the number of projections in the emission data.

Figure 1-3: Projection definition using zig-zag scan: (a) a set of detector pairs that define a projection and (b) another projection. A dashed line represents a detector pair.

1.3 Sources of Error

After a positron is emitted, it travels a short distance before it annihilates with a nearby electron within a subject. The distance between the locations at which the emission and the annihilation take place is called the positron range. The positron range is proportional to the reciprocal of the density of the material through which an emitted positron travels [5]. Figure 1-4 is an illustration of positron range for a simplified scanner geometry. The positron range depends on the energy an emitted positron deposits (and, consequently, on the chosen positron-emitting isotope). For typical positron-emitting isotopes, the full width at half maximum (FWHM) of the distribution of positron range is a few millimeters. For example, the FWHMs of the distribution of the positron range for $^{18}$F and $^{15}$O are about 2 mm and 8 mm, respectively [5, p. 331].

Figure 1-4: Illustration of positron range and a coincidence. Two concentric circles denote a detector ring.

Two photons generated by an annihilation usually do not propagate in exactly opposite directions. This phenomenon is called non-collinearity of line-of-response. Figure 1-5 is an illustration of non-collinearity of line-of-response for a simplified scanner geometry. For a fixed direction in which one of the two photons propagates, we refer to the angle between the opposite of that direction and the actual direction in which the other photon travels as the "angle of non-collinearity". The distribution of the angle of non-collinearity can be approximated by a Gaussian distribution with a FWHM of 0.5 degree [5].

Figure 1-5: Illustration of non-collinearity of line-of-response. The dashed line indicates the path of the photon pair if they had departed in exactly opposite directions. The arrows show the actual photon paths.

The angle between the path along which a photon propagates and the face of a detector when the photon hits the detector is referred to as the incident angle. The way that a photon interacts with a detector depends on the incident angle. If the path that a photon travels is not perpendicular to the face of a detector when the photon hits the detector, then it is possible that the photon does not interact with the detector that it strikes. Instead, the photon may interact with another detector that is near the detector it originally hit. This phenomenon is termed detector penetration. Some methods have been proposed to account for detector penetration [6,7].

When a 511 keV photon propagates within a subject and interacts with an electron, the photon may undergo a phenomenon known as Compton scattering. When a photon hits an electron, the photon gives part of its energy to the electron and, if the photon has sufficient energy, deflects from its original path. Compton scattering can lead to three kinds of error: attenuation, scatter, and accidental coincidence.

Most scattered photons are scattered out of the scanner's field-of-view, so that many of them are not detected. This phenomenon is called attenuation. Figure 1-6 (a) illustrates attenuation, where one of the two photons of an annihilation does not hit a detector due to Compton scattering. Figure 1-6 (a) also illustrates an event called a single, where only one of the emitted photons of an annihilation is detected. Note that attenuation leads to an incorrect decrease in emission counts. In order to address attenuation, numerous correction methods have been proposed (see, e.g., [8-10]).

Figure 1-6: Single and scatter: (a) illustration of attenuation and a single, and (b) illustration of a scattered coincidence. The arrow penetrating the detector ring denotes that the photon is scattered through an oblique angle such that it does not hit a detector. The dotted line denotes the incorrectly positioned line-of-response.

Consider an annihilation event where Compton scattering occurs. It is still possible that a detector pair detects both photons even though one or both of the photons may have undergone Compton scattering. Such an event is called a scattered coincidence or scatter, which is illustrated in Figure 1-6 (b). Note that a scattered coincidence leads to an incorrect increase in emission counts. Since a scattered photon loses part of its energy, the energy of detected photons may be used to discriminate between unscattered photons and scattered photons [11, pp. 65-69].

Two photons arising from different annihilations can be recorded by a detector pair. This event is called an accidental coincidence. Figure 1-7 (a) illustrates an accidental coincidence due to two annihilations occurring at almost the same time. Sometimes, an accidental coincidence may be due to Compton scattering, as illustrated in Figure 1-7 (b). Like scatter, an accidental coincidence leads to an incorrect increase in emission counts. For many PET scanners, the mean accidental coincidence rate is estimated using a "delayed" timing window technique [12].

Figure 1-7: Accidental coincidence: (a) illustration of an accidental coincidence due to two annihilations occurring at almost the same time and (b) illustration of an accidental coincidence due to Compton scattering.

The efficiency of a detector pair is defined to be the probability that a coincidence is recorded when a photon pair hits the detectors. Ideally, this probability should be one. However, the efficiencies of detector pairs are non-uniform because of their geometric differences and the non-uniform physical characteristics of detectors. The non-uniformity of detector pairs is referred to as detector inefficiency. To address detector inefficiency, correction methods such as [13-15] have been proposed.

1.4 System Model for Emission Data

In 1982, Shepp and Vardi proposed a Poisson model for PET emission data and an algorithm, known as the maximum likelihood expectation maximization (MLEM) algorithm, for reconstructing maximum likelihood (ML) emission images [16]. In the Poisson model, the region-of-interest is divided into $J$ equal sized volume elements, called voxels. Ultimately, the goal is to estimate the mean number of positrons emitted from each voxel. Let the $i$th component of a vector $d$, $d_i$, represent the observed number of photon pairs recorded by the $i$th detector pair, and let the $j$th component of a vector $x$, $x_j$, represent the unknown mean number of emissions from the $j$th voxel. Further, let $I$ and $J$ denote the number of detector pairs and the number of voxels, respectively. The $I \times 1$ vector $d$ and the $J \times 1$ vector $x$ are the emission data and the unknown emission mean vector, respectively. Let $\mathcal{P}_{ij}$ denote the probability that an annihilation in the $j$th voxel leads to a photon pair being recorded by the $i$th detector pair. The $I \times J$ probability matrix $\mathcal{P}$ has $\mathcal{P}_{ij}$ as its $(i,j)$th element. In the Poisson model, Shepp and Vardi assumed that the emission data $d$ is an observation of a random vector $D$ with elements $\{D_i\}_{i=1}^{I}$ that are Poisson distributed and independent. For all $i$, the mean of the random variable $D_i$ is

$$E\{D_i\} = \sum_{j=1}^{J} \mathcal{P}_{ij} x_j . \qquad (1.1)$$
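As a minimal sketch of the Poisson model in (1.1), the following draws one synthetic emission data vector from a small, made-up system. The sizes, the random probability matrix, and the emission means are all assumptions chosen for illustration; in particular, the columns of the toy matrix are normalized to sum to one, which corresponds to every emission being detected and is not claimed for a real scanner.

```python
import numpy as np

rng = np.random.default_rng(0)

I, J = 6, 4                       # toy numbers of detector pairs and voxels (assumed)
P = rng.random((I, J))            # stand-in for the probability matrix
P /= P.sum(axis=0, keepdims=True) # toy choice: each column sums to one

x = np.array([50.0, 200.0, 10.0, 120.0])  # assumed emission means, one per voxel

mean_d = P @ x                    # E{D_i} = sum_j P_ij x_j, as in (1.1)
d = rng.poisson(mean_d)           # independent Poisson counts, one per detector pair
```

Repeating the last line with fresh random draws yields independent noisy realizations of the same underlying emission means.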

In practice, the probability matrix $\mathcal{P}$ is unknown and must be estimated. The simplest way to estimate the probability matrix $\mathcal{P}$ is the angle-of-view (AOV) method [16]. In the AOV method, a detector ring and a detector are modelled by a circle and an arc on the circle, respectively. Moreover, within a voxel, all emitted positrons are assumed to be emitted from the center of the voxel. Figure 1-8 illustrates a detector ring, a detector pair, and the tube (i.e., spatial extent) that is defined by the detector pair. In the figure, the AOV from the point $g_j$ to the detector pair $(y, z)$ is also shown. Specifically, this AOV is defined to be

$$\text{AOV from } g_j \text{ to } (y,z) = \begin{cases} \min\{\angle a_y g_j b_y,\ \angle a_z g_j b_z,\ (\pi - \angle a_y g_j b_z),\ (\pi - \angle b_y g_j a_z)\}, & g_j \in \text{tube}(y,z) \\ 0, & \text{otherwise.} \end{cases} \qquad (1.2)$$

Said another way, the AOV in (1.2) is the maximum angle for which a line that goes through the point $g_j$ will simultaneously intersect both detectors of the detector pair $(y, z)$. In the AOV method, the probability $\mathcal{P}_{ij}$ is defined to be

$$\mathcal{P}_{ij} = \frac{\text{AOV from } g_j \text{ to } (y,z)}{\pi} , \qquad (1.3)$$

where the detector pair $(y, z)$ is the $i$th detector pair and the point $g_j$ is the center point of the $j$th voxel. In the AOV method, it is assumed that a photon is detected by a detector whenever the photon hits the arc corresponding to the detector. Clearly, the AOV method does not account for the detector penetration discussed in Section 1.3. Some methods have been developed to address errors due to detector penetration [6,7].

In this dissertation, we make the following mild assumptions on the probability matrix $\mathcal{P}$ and the emission data $d$.

• (AS1) $\mathcal{P}$ has no row vector of zeros.
• (AS2) $d^{T}\mathcal{P}_j \neq 0$ for all $j$, where $d^{T}$ is the transpose of $d$ and $\mathcal{P}_j$ is the $j$th column of $\mathcal{P}$.
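The two assumptions above are easy to check numerically for a given probability matrix and data vector. The following is a minimal sketch; the function name `check_assumptions` and the dense-array representation are assumptions made for illustration.

```python
import numpy as np

def check_assumptions(P, d):
    """Return (AS1, AS2) as booleans for a nonnegative matrix P and data vector d.

    AS1: no row of P is entirely zero.
    AS2: d^T P_j is nonzero for every column P_j of P.
    """
    as1 = bool(np.all(np.any(P != 0, axis=1)))
    as2 = bool(np.all(d @ P != 0))
    return as1, as2
```

For example, with a matrix whose second column is only "seen" by detector pairs that recorded no counts, (AS2) fails for that voxel even though (AS1) holds.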

Figure 1-8: The angle-of-view from the point $g_j$ to the detector pair $(y, z)$ is shown. The circle denotes a detector ring. The arcs $(a_y, b_y)$ and $(a_z, b_z)$ represent the detectors $y$ and $z$, respectively. The area between the vertical lines $(a_y, b_z)$ and $(b_y, a_z)$ represents the tube defined by the detector pair $(y, z)$.

PAGE 21

13 To see the implication of the second assumption, consider all the detector pairs where the probability of recording a photon pair generated by an annihilation in the j th voxel is non-zero. Among the set of detector pairs, the second assumption implies that there exists at least one detector pair with non-zero emission counts. The ( AS2) is expected to hold whenever the duration of the emission scan is a reasonable length of time. In addition to (ASl) and (AS2), we assume that the probability matrix V accounts for errors due to attenuation, detector inefficiency, detector penetration, positron range, non-collinearity of line-of-response, and scatter. In practice, there are correction methods for attenuation [9], detector inefficiency [15], and detector penetration [7]. However, other correction methods for detector penetration, attenuation, and detector inefficiency could be used in conjunction. Note that in Chapter 6, we do not assume that the probability matrix accounts for errors due to scatter and non-collinearity of line-of-response. Instead, we present a method that estimates the probability matrix in such a way that errors due to scatter non-collinearity of line-of-response are addressed. When accidental coincidences (i.e., randoms) are considered, the Poisson model must be modified so that now the emission data d is an observation of a random vector D that is Poisson distributed with mean (Vx + p), where the i th component of p, pi , is the mean number of accidental coincidences recorded by the i th detector pair, i = 1,2 Usually, it is assumed that the mean accidental coincidence rate p is known. In practice, the mean accidental coincidence rate is estimated using a Â“delayedÂ” time window technique [12]. Given the emission data d , mean accidental coincidence rate p, and probability matrix V, the problem of interest is to estimate the mean number of positrons emitted from each voxel. Since it is assumed that the data are independent, it follows that

PAGE 22

the likelihood function for emission data $d$ is given by

$$\Pr\{D = d \mid x\} = \prod_{i=1}^{I} \frac{[\mathcal{P}x + \rho]_i^{d_i}}{d_i!}\, e^{-[\mathcal{P}x + \rho]_i} . \tag{1.4}$$

The ML estimate of $x$ is defined to be the maximizer of the likelihood function over the feasible set. Equivalently, the ML estimate of the emission mean vector is given by

$$x_{ML} = \arg\max_{x \ge 0} \Psi(x) , \tag{1.5}$$

where $\Psi(x) = \log \Pr\{D = d \mid x\}$ is the log likelihood function:

$$\Psi(x) = \sum_{i=1}^{I} d_i \log([\mathcal{P}x + \rho]_i) - \sum_{i=1}^{I} [\mathcal{P}x + \rho]_i - \sum_{i=1}^{I} \log(d_i!) \tag{1.6}$$

(note: maximizing the likelihood function and maximizing the log likelihood function are equivalent operations).

Although the ML estimator has several nice theoretical properties [17, ch. 7], images produced by the ML method (i.e., the MLEM algorithm) have the drawback that they are extremely noisy. This is because the PET image reconstruction problem is ill-posed: (1) scan times for data acquisition are short, (2) emission data contain errors due to attenuation, scatter, and accidental coincidences, and (3) the data obey Poisson statistics. Currently, the most popular way to address the ill-posed nature of the image reconstruction problem is through the use of penalty functions. Numerous penalized maximum likelihood (PML) methods (also known as Bayesian and maximum a posteriori (MAP) methods) have been proposed [18-35]. In the MAP method, $x$ is assumed to be an observation of a random variable $X$ with known distribution, and the a posteriori distribution (i.e., the conditional probability density function of $X$ given $D = d$) is maximized. After some manipulations [17, p. 351], the MAP estimate is found to be the maximizer of the log likelihood function

PAGE 23

plus the log of the probability density function of $X$:

$$x_{MAP} = \arg\max_{x \ge 0} \; \Psi(x) + \log \Omega(x) , \tag{1.7}$$

where the function $\Omega$ is the probability density function of $X$ (i.e., the prior distribution). It is through the prior distribution $\Omega$ that MAP methods have the ability to regularize the image reconstruction problem. The form of the prior distributions commonly used in PET is

$$\Omega(x) = C e^{-\beta \Lambda(x)} ,$$

where $\Lambda$ is a scalar valued function and $C$ and $\beta$ are constants. The constant $C > 0$ is chosen such that the area under the distribution $\Omega$ equals one. The constant $\beta > 0$, known as the penalty parameter, controls the penalty function's degree of influence. Since it is known that PET images should be highly correlated, the penalty function $\Lambda$ is designed in such a way that it forces the estimates of neighboring voxels to be similar in value. Given the definition of $\Omega$, the PML or MAP estimate is the nonnegative minimizer of the PML objective function

$$\Phi(x) = -\Psi(x) + \beta \Lambda(x) . \tag{1.8}$$
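To make (1.6) and (1.8) concrete, the following is a minimal sketch that evaluates the PML objective for a toy problem. The 3-by-3 matrix `P`, the data `d`, the randoms `rho`, the 1-D neighbor chain, and the quadratic penalty $\lambda(t) = t^2$ with unit weights are all illustrative assumptions, not values from this work.

```python
import math

def log_likelihood(P, x, d, rho):
    """Poisson log likelihood Psi(x) of (1.6) for data with mean P x + rho."""
    total = 0.0
    for P_i, d_i, rho_i in zip(P, d, rho):
        mean_i = sum(p * x_j for p, x_j in zip(P_i, x)) + rho_i
        # math.lgamma(d_i + 1) equals log(d_i!)
        total += d_i * math.log(mean_i) - mean_i - math.lgamma(d_i + 1)
    return total

def quadratic_penalty(x, neighbors, weight=1.0):
    """Lambda(x) = sum_j sum_{k in N_j} weight * (x_j - x_k)^2 (assumed penalty)."""
    return sum(weight * (x[j] - x[k]) ** 2
               for j in range(len(x)) for k in neighbors[j])

def pml_objective(P, x, d, rho, neighbors, beta):
    """Phi(x) = -Psi(x) + beta * Lambda(x), as in (1.8)."""
    return -log_likelihood(P, x, d, rho) + beta * quadratic_penalty(x, neighbors)

# Hypothetical toy problem: 3 detector pairs, 3 voxels on a line.
P = [[0.5, 0.2, 0.0], [0.3, 0.5, 0.3], [0.0, 0.2, 0.5]]
d = [4, 9, 5]
rho = [0.1, 0.1, 0.1]
neighbors = [[1], [0, 2], [1]]   # symmetric neighborhoods N_j

flat = [8.0, 8.0, 8.0]
rough = [2.0, 14.0, 8.0]
print(pml_objective(P, flat, d, rho, neighbors, beta=0.1))
print(pml_objective(P, rough, d, rho, neighbors, beta=0.1))
```

Because each neighboring pair appears twice in the double sum, a difference between two neighbors is penalized twice. A flat image incurs no penalty, so its objective value does not depend on $\beta$.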

PAGE 24

CHAPTER 2
LITERATURE REVIEW

Although the MLEM algorithm by Shepp and Vardi [16] produces ML estimates of the emission means, the images produced by the ML method are extremely noisy because the image reconstruction problem in PET is ill-posed. As discussed in Section 1.4, the reasons are that (1) scan times for data acquisition are short, (2) emission data contain errors due to attenuation, scatter, and accidental coincidences, and (3) the data obey Poisson statistics.

One way to obtain PET images with sufficient smoothness is to terminate the MLEM algorithm before the log likelihood function is maximized. Of course, the resulting images are not ML images. Another modification of the MLEM algorithm is to first obtain an ML image and then filter it with a low pass filter. The drawback of this post-filtering is that it is not clear how the filter should be chosen. A variation of the filtering approach just described is to filter every MLEM iterate, which was suggested by Silverman [36]. Silverman did not provide an answer as to how the filter should be chosen. Denoising the emission data (i.e., the observed data) is another way to regularize the PET image reconstruction problem [37,38].

Currently, the most popular way to introduce regularization is through the use of penalty functions that force the estimates of neighboring voxels to be similar in value. The basis for such penalty functions is that PET images should be highly correlated. The first part of the chapter is devoted to so-called penalized maximum likelihood (PML) algorithms. PML algorithms are algorithms that minimize PML objective functions, which are the sum of the negative log likelihood function and a penalty function. In PET, PML and maximum a posteriori (MAP) are terms that are used for methods that minimize PML objective functions. In Section 2.1, we briefly review existing PML algorithms.

PAGE 25

Reconstructed PML images are blurred by errors due to detector penetration, positron range, non-collinearity, and scatter. Errors due to scatter are difficult to correct because scatter depends on the activity and attenuation within the subject and on the scanner design. In Section 2.2, some scatter correction methods are reviewed. The regularized image reconstruction algorithms and scatter correction algorithm we propose are compared with the existing algorithms in Section 2.3.

2.1 Penalized Maximum Likelihood Algorithms

In 1990, Green [23] proposed a PML algorithm, known as the one-step-late (OSL) algorithm. The algorithm can be viewed as a fixed point iteration that is derived from the Kuhn-Tucker equations [39, pp. 36-49] for the PML optimization problem. Incidentally, Shepp and Vardi showed that the MLEM algorithm could be derived from the Kuhn-Tucker conditions in a similar way. The OSL algorithm is straightforward to implement, but nonnegative estimates cannot be guaranteed and, like many existing algorithms, convergence is an open issue. Lange's goal [24] was to modify the OSL algorithm in such a way that the modified algorithm converges to the PML estimate. It should be pointed out that Lange's algorithm requires line searches, which can be computationally expensive.

Alenius et al. [32] suggested a Gaussian "type" prior that depends on the median of voxels within local neighborhoods, and introduced an algorithm called the median-root-prior (MRP) algorithm. The MRP algorithm is based on an iteration dependent objective function. Consequently, it really cannot be considered a PML algorithm. Nevertheless, the MRP algorithm generates "good" images in the sense that the noise level of the reconstructed images is suppressed. It should be mentioned that a PML algorithm was derived by Hsiao et al. [40] that resembles the MRP algorithm and performs similarly to it. The PML algorithm by Hsiao et al. was derived using a prior that is based on a certain auxiliary vector.
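The OSL iteration discussed in Section 2.1 can be sketched as follows. This is a minimal illustration, assuming the standard form of Green's update (the MLEM update with the penalty gradient, evaluated at the current iterate, added to the denominator) and the log-cosh penalty with unit weights. The toy matrix `P`, the randoms `rho`, and the neighbor chain are hypothetical.

```python
import math

def osl_step(P, x, d, rho, neighbors, beta, weight=1.0):
    """One one-step-late (OSL) update, assuming the log-cosh penalty.

    The penalty gradient is evaluated at the current iterate x ("one step
    late"). With beta = 0 the update reduces to the MLEM update.
    """
    I, J = len(P), len(x)
    mean = [sum(P[i][j] * x[j] for j in range(J)) + rho[i] for i in range(I)]
    new_x = []
    for j in range(J):
        sensitivity = sum(P[i][j] for i in range(I))
        backprojection = sum(P[i][j] * d[i] / mean[i] for i in range(I))
        # d/dx_j of sum_j sum_{k in N_j} log(cosh(x_j - x_k)), symmetric N_j
        penalty_grad = 2.0 * sum(weight * math.tanh(x[j] - x[k])
                                 for k in neighbors[j])
        # The denominator is not guaranteed to be positive.
        new_x.append(x[j] * backprojection / (sensitivity + beta * penalty_grad))
    return new_x

# Hypothetical toy problem.
P = [[0.5, 0.2, 0.0], [0.3, 0.5, 0.3], [0.0, 0.2, 0.5]]
rho = [0.1, 0.1, 0.1]
neighbors = [[1], [0, 2], [1]]

print(osl_step(P, [1.0, 10.0, 1.0], [4, 9, 5], rho, neighbors, beta=1.0))
```

In this toy case the first voxel is much smaller than its neighbor, so the penalty gradient is negative and, for a large enough `beta`, outweighs the sensitivity term. The updated value is then negative, which illustrates why nonnegative estimates cannot be guaranteed.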

PAGE 26

Levitan and Herman [21] proposed a PML algorithm based on the assumption that the prior distribution of the true emission means is a multivariate Gaussian distribution. The assumption led to a penalty function in the form of a weighted least-squares distance between $x$ and a reference image. However, they did not indicate how the reference image is to be chosen.

An algorithm was proposed by Wu [27] using a wavelet decomposition formulation. Specifically, the author assumed that a vector consisting of the wavelet coefficients of the true emission means is a zero-mean Gaussian random vector with a known covariance matrix. From this assumption, a prior distribution for the emission means was derived. The prior distribution is that of a zero-mean Gaussian random vector with a covariance matrix that depends on the choice of the wavelet transform and the assumed distribution for the vector of wavelet coefficients. It should be pointed out that the assumption was not clearly justified in the paper.

Researchers have used an optimization algorithm, called the iterative coordinate descent (ICD) algorithm [41, pp. 283-287], to obtain estimates for various penalty functions [31], [42]. Convergence results are given for the penalized weighted least-squares method [42], and both algorithms (i.e., [31], [42]) enforce the nonnegativity constraint. Algorithms based on the ICD algorithm update each voxel in a serial manner, so a parallel implementation of them may not be possible.

De Pierro [25,30] derives PML algorithms that minimize certain surrogate functions that he constructs by exploiting the fact that the log likelihood function is concave and penalty functions, such as the quadratic penalty function, are convex. Except for the quadratic penalty function, closed form expressions for the minimizers of the surrogate functions do not exist. Consequently, an optimization method, such as Newton's method [43, pp. 201-202], is needed to minimize the surrogate functions. De Pierro presents some convergence results; however, the utility of his methods is unclear because no experimental results were provided. It should be noted that, in

PAGE 27

the transmission tomography paper by Erdoğan and Fessler [8], a quadratic surrogate function was used for a certain class of penalty functions. The quadratic surrogate function was developed by Huber [44, pp. 184-186].

A fast PML method, based on the ordered subset-EM algorithm [45], was proposed by De Pierro and Yamagishi [33]. The authors show that if the sequence generated by the algorithm converges, then it must converge to the true PML solution. Recently, Ahn and Fessler proposed algorithms [35] that are based on the ordered subset-EM algorithm [45], an algorithm by De Pierro and Yamagishi [33], and an algorithm by Erdoğan and Fessler [10]. Like other algorithms based on the ordered subset-EM algorithm, there is some uncertainty as to how the subsets are to be chosen. In the paper by Ahn and Fessler, the algorithms are said to converge to the nonnegative minimizer of the PML objective function, for certain penalty functions and their accompanying parameters, by using a relaxation parameter that diminishes with iterations. Open issues are how the relaxation parameters should be chosen in practice and how they affect the performance of the algorithms, since the convergence rate varies with the relaxation parameter.

2.2 Scatter Correction Methods

Many methods have been proposed to correct for scattered coincidences [11, ch. 3]. They can be classified into a few categories: (1) energy window methods, (2) convolution/deconvolution methods, and (3) methods that calculate the scatter distribution.

One of the scatter correction methods is based on the use of multiple (two or more) energy windows [46,47]. Recall that photons lose energy when they undergo Compton scattering. The principle of the energy window method is to discard the detection of a photon whenever the energy of the photon is less than 511 keV. Since detectors have finite energy resolution, energy window based methods are limited. Consequently, it is preferred to use another method jointly with the energy window techniques.

PAGE 28

Another correction method for scatter is the convolution/deconvolution method [48-51]. The methods in [48,49] assume that scattered coincidences (i.e., scatter) can be approximated by a convolution of unscattered coincidences and a certain scatter function. Under this assumption, the mean scatter in the observed coincidences is estimated, and the estimate can be subtracted from the emission data or incorporated in the system model for the emission data. The method by McKee et al. assumes that the distribution of scattered annihilations can be approximated by the convolution of the distribution of unscattered annihilations and some scatter response function [50]. The issue with the convolution/deconvolution method is that the distribution of unscattered coincidences (or annihilations) and the scatter response function (or scatter function) are not known.

Ollinger introduced a scatter correction method that calculates the scatter distribution using an analytical equation, transmission images, emission images, and the scanner geometry [52]. The computational cost of the method is excessive, so it might be hard for it to be accepted in clinical use at the time of writing.

2.3 Summary of the Proposed Algorithms

• In Chapter 3, we present an algorithm that obtains PML estimates for a certain class of edge-preserving penalty functions. The PML algorithm is derived by combining the convexity idea by De Pierro [25,30] and Huber's surrogate functions [44]. Combining two existing theories in such a way that the PML algorithm is convergent is new to the PET community. In theory, the algorithm guarantees nonnegative iterates, monotonically decreases the PML objective function with increasing iterations, and converges to the solution of the PML problem. In practice, it is straightforward to implement (i.e., no additional hyperparameter and no need for a line search) and it can incorporate many edge-preserving penalty functions.

PAGE 29

• In Chapter 4, we develop an accelerated version of the PML algorithm by using the pattern search idea of Hooke and Jeeves [41, pp. 287-291]. Using this approach, we solve a constrained problem at each pattern search step that leads to improved convergence rates. A modification of Hooke and Jeeves' direction vector is also introduced that improves performance. It should be mentioned that Hooke and Jeeves' method has not been used in PET image reconstruction. The proposed algorithm inherits the nice properties of the PML algorithm and converges to the minimizer of the PML objective function. In experiments, the accelerated algorithm needed less than about one third of the CPU time that was necessary for the PML algorithm to converge.

• In Chapter 5, we propose a regularized image reconstruction algorithm, referred to as the quadratic edge preserving (QEP) algorithm, that aims to preserve edges through the use of certain newly developed de-coupled penalty functions that depend on the current estimate. The QEP algorithm was motivated by the analysis of the PML algorithm. The algorithm by Alenius et al. [32] also uses an iteration dependent objective function. However, it should be mentioned that their algorithm uses the OSL algorithm to generate the next iterate. The drawback of the approach by Alenius et al. is that the OSL algorithm does not guarantee convergence.

• In Chapter 6, we propose a model for emission data where an unknown matrix, called a scatter matrix, is introduced. The model aims to account for errors due to scatter and non-collinearity. Based on the model, a certain minimization problem is constructed that allows for the scatter matrix and emission mean vector to be jointly estimated. Since the minimization problem is impossible to solve, we propose an algorithm that greatly reduces the number of unknowns in the scatter matrix and alternately estimates the scatter matrix and emission mean vector. It should be mentioned that Mumcuoglu et al. [53] used the same

PAGE 30

model. However, they assumed that the scatter matrix is known and accounts for errors due to detector penetration as well. Their scatter matrix was obtained through Monte-Carlo simulations.

PAGE 31

CHAPTER 3
PENALIZED MAXIMUM LIKELIHOOD ALGORITHM

Although the ML estimates of the emission means are available by using the MLEM algorithm [16], as discussed in Section 1.4, the resulting PET images are extremely noisy because the PET image reconstruction problem is ill-posed. This is because of short scan times, errors in the emission data, and the fact that the data obey Poisson statistics. The most popular way to address the ill-posed nature of the PET image reconstruction problem is through the use of penalty functions. Penalty functions used in PET are designed in such a way that estimates for the emission means of neighboring voxels are forced to be similar in value, unless there is an "edge" within the neighborhood. By an edge, we mean that there is a group of connected voxels that have significantly greater activity than the other voxels in the neighborhood. For example, suppose there is only one voxel with significantly greater activity than the other voxels in its neighborhood. Then, we would say that there is no edge within the neighborhood. Simply stated, penalty functions provide a means for reconstructing PET images that have considerably less noise than MLEM images, yet retain edges (e.g., tumors) which may convey important information.

In Section 3.1, we first derive an algorithm, called the penalized maximum likelihood (PML) algorithm, that incorporates a wide class of edge-preserving penalty functions. Then, we prove that the PML algorithm converges in Section 3.2. Finally, we summarize the properties of the PML algorithm in Section 3.3. It should be mentioned that we presented the PML algorithm in [18] without a proof of convergence. Our proof of convergence can now be found in a recent manuscript [19].

PAGE 32

Figure 3-1: One-dimensional illustration of the optimization transfer method. At each iteration, a surrogate function is obtained and a minimizer of the surrogate function is defined as the next iterate. Ideally, it is "easy" to get the minimizer of the surrogate function.

3.1 Penalized Maximum Likelihood (PML) Algorithm

The problem of interest is to determine the nonnegative minimizer of the PML objective function $\Phi(x) = -\Psi(x) + \beta\Lambda(x)$ ($\Psi$ is defined in (1.6)), where $\Lambda$ is a penalty function that forces emission mean estimates of neighboring voxels to be similar in value. In other words, we want to solve the following optimization problem:

$$(P) \qquad x_{PML} = \arg\min_{x \ge 0} \Phi(x) .$$

The penalty functions we consider are of the form

$$\Lambda(x) = \sum_{j=1}^{J} \sum_{k \in N_j} \omega_{jk}\, g(x_j, x_k) , \tag{3.1}$$

where $N_j$ is a set of voxels in a neighborhood of the $j$th voxel, the constants $\{\omega_{jk}\}$ are positive weights for which $\omega_{jk} = \omega_{kj}$ for all $j$ and $k$, and $g(s, t) = \lambda(s - t)$, whereby the function $\lambda$ satisfies the following assumptions:

• (AS3) $\lambda(t)$ is symmetric
• (AS4) $\lambda(t)$ is everywhere differentiable

PAGE 33

• (AS5) $\dot{\lambda}(t) \triangleq \frac{d}{dt}\lambda(t)$ is increasing for all $t$ (assumption implies that $\lambda$ is strictly convex)
• (AS6) $\gamma(t) \triangleq \dot{\lambda}(t)/t$ is nonincreasing for $t > 0$
• (AS7) $\gamma(0) = \lim_{t \to 0} \gamma(t)$ is finite and nonzero
• (AS8) $\lambda(t)$ is bounded below (assumption implies that $\Lambda(x)$ is bounded below).

Examples of functions that satisfy (AS3)-(AS8) are the quadratic function $\lambda(t) = t^2$ and Green's log-cosh function $\lambda(t) = \log(\cosh(t))$ [23]. Regarding the neighborhood $N_j$, the $j$th voxel is excluded from the set $N_j$ and, if the $k$th voxel is in $N_j$, then the $j$th voxel is in $N_k$. A common choice for $N_j$ is the eight nearest neighbors of the $j$th voxel.

Since it is not possible to get a closed-form solution to the minimization problem (P), iterative optimization methods are necessary. The PML algorithm we propose is based on the optimization transfer method [10,25,30,34,54] where, at each iteration, a function that satisfies certain conditions is obtained and the next iterate is defined to be a minimizer of the function. The function found at each iteration is referred to as a surrogate function for the function to be minimized. This idea is illustrated with the one-dimensional example in Figure 3-1. In the figure, the problem is to find the minimizer of the function $f$, which is $t^*$. It is assumed that a closed-form solution is not available to the minimization problem. Given an initial guess $t^{(0)}$, a surrogate function $f^{(0)}$ that depends on $t^{(0)}$ is determined. Then, the next iterate $t^{(1)}$ is generated by finding the minimizer of $f^{(0)}$. To get the following iterate $t^{(2)}$, a surrogate function $f^{(1)}$ that depends on $t^{(1)}$ is obtained and then minimized. These steps are repeated until some convergence criterion is met. For a vector argument $t$, a surrogate function $f^{(n)}$ satisfies the following conditions:

• (C1) $f^{(n)}(t) \ge f(t)$ for all $t \in \{\text{domain of } f\}$
• (C2) $f^{(n)}(t^{(n)}) = f(t^{(n)})$

PAGE 34

• (C3) $\nabla f^{(n)}(t^{(n)}) = \nabla f(t^{(n)})$,

where $t^{(n)}$ is the $n$th iterate, $\nabla$ denotes the gradient of a function, and the superscript $(n)$ indicates that the functions $\{f^{(n)}\}$ and the iterates $\{t^{(n)}\}$ depend on the iteration number. The next iterate $t^{(n+1)}$ is defined to be a minimizer of $f^{(n)}$:

$$t^{(n+1)} = \arg\min_{t} f^{(n)}(t) \quad \text{subject to } t \in \{\text{domain of } f\} . \tag{3.2}$$

Defining the iterates in this way ensures that the objective function $f$ decreases monotonically with increasing iterations. To prove this fact, we first note that $f(t^{(n+1)}) \le f^{(n)}(t^{(n+1)})$ by (C1). Since $f^{(n)}(t^{(n+1)}) \le f^{(n)}(t^{(n)})$ by (3.2), it follows by (C2) that for all $n$,

$$f(t^{(n+1)}) \le f^{(n)}(t^{(n+1)}) \le f^{(n)}(t^{(n)}) = f(t^{(n)}) . \tag{3.3}$$

It should be mentioned that (C3) is not necessary for the monotonicity in (3.3). However, (C3) is often needed in order to prove that an algorithm that utilizes the optimization transfer method converges (see [18,25,30]).

Although the optimization transfer method is straightforward in principle, the difficulty in practice is that it may be hard to find surrogate functions that satisfy the conditions (C1), (C2), and (C3). De Pierro developed surrogate functions for the negative log likelihood function by using the convexity of the negative log function [25,30]. His idea is based on the following property of convex functions [55, pp. 860-861]: for a convex function $f$,

$$f\Big(\sum_j \alpha_j t_j\Big) \le \sum_j \alpha_j f(t_j) , \tag{3.4}$$

where $\sum_j \alpha_j t_j \in \{\text{domain of } f\}$, $t_j \in \{\text{domain of } f\}$ for all $j$, $\alpha_j \ge 0$ for all $j$, $\sum_j \alpha_j = 1$, and $\{\text{domain of } f\}$ is a convex set. Specifically, De Pierro utilized the

PAGE 35

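following inequality in ML estimation where $f(t) = -\log(t)$:

$$f([\mathcal{P}x]_i) = f\Big( \sum_j \frac{\mathcal{P}_{ij} x_j^{(n)}}{[\mathcal{P}x^{(n)}]_i} \, \frac{x_j}{x_j^{(n)}} [\mathcal{P}x^{(n)}]_i \Big) \tag{3.5}$$

$$\le \sum_j \frac{\mathcal{P}_{ij} x_j^{(n)}}{[\mathcal{P}x^{(n)}]_i} \, f\Big( \frac{x_j}{x_j^{(n)}} [\mathcal{P}x^{(n)}]_i \Big) , \tag{3.6}$$

where $x^{(n)}$ is the $n$th iterate of $x$, $\mathcal{P}_{ij} \ge 0$, $x_j^{(n)} > 0$, and $[\mathcal{P}x^{(n)}]_i \ne 0$ for all $i$, $j$, and $n$. Let

$$f_i(x) = f([\mathcal{P}x]_i) \tag{3.7}$$

$$f_i^{(n)}(x) = \sum_j \frac{\mathcal{P}_{ij} x_j^{(n)}}{[\mathcal{P}x^{(n)}]_i} \, f\Big( \frac{x_j}{x_j^{(n)}} [\mathcal{P}x^{(n)}]_i \Big) . \tag{3.8}$$

Then, it is straightforward to see that, for all $i$, (1) $f_i^{(n)}(x) \ge f_i(x)$ for all $x \ge 0$, (2) $f_i^{(n)}(x^{(n)}) = f_i(x^{(n)})$, and (3) $\nabla f_i^{(n)}(x^{(n)}) = \nabla f_i(x^{(n)})$. Thus, for all $i$, $f_i^{(n)}$ is a surrogate function for $f_i$.

Although De Pierro developed the surrogate functions for the log likelihood function under the assumption of no accidental coincidences (i.e., $\rho = 0$), the surrogate functions can be easily modified to account for accidental coincidences. Observe that $[\mathcal{P}x + \rho]_i$ can be written as a convex combination:

$$[\mathcal{P}x + \rho]_i = \sum_j \frac{\mathcal{P}_{ij} x_j^{(n)}}{[\mathcal{P}x^{(n)} + \rho]_i} \, \frac{x_j}{x_j^{(n)}} [\mathcal{P}x^{(n)} + \rho]_i + \frac{\rho_i}{[\mathcal{P}x^{(n)} + \rho]_i} [\mathcal{P}x^{(n)} + \rho]_i . \tag{3.9}$$

Using the convexity of the negative log function and the fact that

$$\sum_j \frac{\mathcal{P}_{ij} x_j^{(n)}}{[\mathcal{P}x^{(n)} + \rho]_i} + \frac{\rho_i}{[\mathcal{P}x^{(n)} + \rho]_i} = 1 , \tag{3.10}$$

we have the following inequality for $f(t) = -\log(t)$:

$$f([\mathcal{P}x + \rho]_i) \le \sum_j \frac{\mathcal{P}_{ij} x_j^{(n)}}{[\mathcal{P}x^{(n)} + \rho]_i} \, f\Big( \frac{x_j}{x_j^{(n)}} [\mathcal{P}x^{(n)} + \rho]_i \Big) + \frac{\rho_i}{[\mathcal{P}x^{(n)} + \rho]_i} \, f([\mathcal{P}x^{(n)} + \rho]_i) . \tag{3.11}$$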
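The inequality (3.11) can be checked numerically for a single detector pair. The following is a minimal sketch; the row `P_i`, the randoms mean `rho_i`, the current iterate `x_n`, and the function names are made-up illustrations, not quantities from this work.

```python
import math

def neg_log_exact(P_i, rho_i, x):
    """Left side of (3.11): f([P x + rho]_i) with f(t) = -log(t)."""
    return -math.log(sum(p * x_j for p, x_j in zip(P_i, x)) + rho_i)

def neg_log_surrogate(P_i, rho_i, x, x_n):
    """Right side of (3.11), built around the current iterate x_n > 0."""
    mean_n = sum(p * xn_j for p, xn_j in zip(P_i, x_n)) + rho_i
    # Term for the randoms: weight rho_i / mean_n times f(mean_n).
    value = (rho_i / mean_n) * -math.log(mean_n)
    for p, x_j, xn_j in zip(P_i, x, x_n):
        if p > 0.0:
            # Weight P_ij x_j^(n) / mean_n times f((x_j / x_j^(n)) * mean_n).
            value += (p * xn_j / mean_n) * -math.log(x_j / xn_j * mean_n)
    return value

# Hypothetical single row of the probability matrix.
P_i = [0.5, 0.2, 0.3]
rho_i = 0.1
x_n = [2.0, 3.0, 4.0]

for x in ([1.0, 1.0, 1.0], [5.0, 0.5, 2.0], x_n):
    print(neg_log_exact(P_i, rho_i, x), neg_log_surrogate(P_i, rho_i, x, x_n))
```

The surrogate value should never fall below the exact value, and the two should agree when `x` equals `x_n`, which correspond to conditions (C1) and (C2).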

PAGE 36

Given the inequality in (3.11), the surrogate function at the $n$th iteration for the negative log likelihood function $-\Psi$ can be expressed as

$$-\Psi^{(n)}(x) = \sum_{i=1}^{I} \Big\{ [\mathcal{P}x]_i - d_i \sum_{j=1}^{J} \frac{\mathcal{P}_{ij} x_j^{(n)}}{[\mathcal{P}x^{(n)} + \rho]_i} \log(x_j) \Big\} + C^{(n)} , \tag{3.12}$$

where

$$C^{(n)} = \sum_{i=1}^{I} \Big\{ \rho_i + \log(d_i!) - d_i \log([\mathcal{P}x^{(n)} + \rho]_i) + d_i \sum_{j=1}^{J} \frac{\mathcal{P}_{ij} x_j^{(n)}}{[\mathcal{P}x^{(n)} + \rho]_i} \log(x_j^{(n)}) \Big\} . \tag{3.13}$$

It is straightforward to show that the surrogate function $-\Psi^{(n)}$ satisfies (C1), (C2), and (C3): (1) $\Psi^{(n)}(x) \le \Psi(x)$ for all $x \ge 0$, (2) $\Psi^{(n)}(x^{(n)}) = \Psi(x^{(n)})$, and (3) $\nabla\Psi^{(n)}(x^{(n)}) = \nabla\Psi(x^{(n)})$.

Since surrogate functions for the negative log likelihood function $-\Psi$ are available, we only need to find surrogate functions for the penalty function $\Lambda(x) = \sum_j \sum_{k \in N_j} \omega_{jk} \lambda(x_j - x_k)$. Under assumptions (AS3)-(AS8), Huber developed a surrogate function for $\lambda$ in [44] (see also [8]). Given an arbitrary point $t^{(n)}$, Huber's surrogate function for $\lambda$, which is defined by

$$\lambda^{(n)}(t) = \lambda(t^{(n)}) + \dot{\lambda}(t^{(n)})(t - t^{(n)}) + \frac{1}{2}\gamma(t^{(n)})(t - t^{(n)})^2 , \tag{3.14}$$

has the property that $\lambda^{(n)}(t) \ge \lambda(t)$ for all $t$ (see Appendix A), $\lambda^{(n)}(t^{(n)}) = \lambda(t^{(n)})$, and $\dot{\lambda}^{(n)}(t^{(n)}) = \dot{\lambda}(t^{(n)})$, where the dot over a function represents its first derivative. For $t^{(n)} = x_j^{(n)} - x_k^{(n)}$, it follows that a surrogate function for $\Lambda$ is

$$\hat{\Lambda}^{(n)}(x) = \sum_{j=1}^{J} \sum_{k \in N_j} \omega_{jk}\, \hat{g}^{(n)}(x_j, x_k) , \tag{3.15}$$

where $\hat{g}^{(n)}(s, t) = \lambda^{(n)}(s - t)$. Using $\hat{\Lambda}^{(n)}$ as a starting point, we will now construct a surrogate function for $\Lambda$ that has a more convenient form. By the convexity of the square function, we have

PAGE 37

the following inequality:

$$\big[(x_j - x_j^{(n)}) - (x_k - x_k^{(n)})\big]^2 \le \frac{1}{2}\big(2x_j - 2x_j^{(n)}\big)^2 + \frac{1}{2}\big(2x_k - 2x_k^{(n)}\big)^2 . \tag{3.16}$$

It should be mentioned that De Pierro [30] and Hsiao et al. [34] utilized this property of the quadratic function in PET, and Erdoğan and Fessler [10] applied it to a non-quadratic convex function for transmission tomography. Motivated by (3.16), we define

$$g^{(n)}(x_j, x_k) = \lambda(x_j^{(n)} - x_k^{(n)}) + \dot{\lambda}(x_j^{(n)} - x_k^{(n)})\big[(x_j - x_j^{(n)}) - (x_k - x_k^{(n)})\big] + \frac{1}{4}\gamma(x_j^{(n)} - x_k^{(n)})\big[(2x_j - 2x_j^{(n)})^2 + (2x_k - 2x_k^{(n)})^2\big] . \tag{3.17}$$

By construction, the following statements are clear: (1) $g^{(n)}(x_j, x_k) \ge g(x_j, x_k)$ for all $x_j$ and $x_k$ from (3.16), (2) $g^{(n)}(x_j^{(n)}, x_k^{(n)}) = g(x_j^{(n)}, x_k^{(n)})$, (3) $\frac{\partial}{\partial x_j} g^{(n)}(x_j^{(n)}, x_k^{(n)}) = \frac{\partial}{\partial x_j} g(x_j^{(n)}, x_k^{(n)})$, and (4) $\frac{\partial}{\partial x_k} g^{(n)}(x_j^{(n)}, x_k^{(n)}) = \frac{\partial}{\partial x_k} g(x_j^{(n)}, x_k^{(n)})$. The difference between $\hat{g}^{(n)}$ and $g^{(n)}$ is that $g^{(n)}$ is de-coupled in the sense that it does not have terms of the form $x_j x_k$. This difference is important because, as we show later, using $g^{(n)}$ enables us to construct surrogate functions for $\Phi$ that have closed form expressions for their minimizers. Using $g^{(n)}$, an alternative surrogate function for $\Lambda$ is

$$\Lambda^{(n)}(x) = \sum_{j=1}^{J} \sum_{k \in N_j} \omega_{jk}\, g^{(n)}(x_j, x_k) . \tag{3.18}$$

It is clear that, by construction, $\Lambda^{(n)}$ satisfies the following properties: (1) $\Lambda^{(n)}(x) \ge \Lambda(x)$ for all $x$, (2) $\Lambda^{(n)}(x^{(n)}) = \Lambda(x^{(n)})$, and (3) $\nabla\Lambda^{(n)}(x^{(n)}) = \nabla\Lambda(x^{(n)})$. Now, using $\Psi^{(n)}$ and $\Lambda^{(n)}$, the desired surrogate function at the $n$th iteration for $\Phi$ is

$$\Phi^{(n)}(x) = -\Psi^{(n)}(x) + \beta\Lambda^{(n)}(x) . \tag{3.19}$$
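As a numerical illustration of the two penalty surrogates, the sketch below evaluates Huber's surrogate (3.14) and the de-coupled function of (3.17) for Green's log-cosh penalty and compares them with the penalty itself. The function names and the test points are assumptions made for the example.

```python
import math

def lam(t):
    """Green's log-cosh penalty, lambda(t) = log(cosh(t))."""
    return math.log(math.cosh(t))

def lam_dot(t):
    """First derivative of the log-cosh penalty."""
    return math.tanh(t)

def gamma(t):
    """gamma(t) = lam_dot(t) / t, with the limit gamma(0) = 1."""
    return 1.0 if t == 0.0 else math.tanh(t) / t

def huber_surrogate(t, t_n):
    """Quadratic surrogate (3.14) for lambda, built at the point t_n."""
    return lam(t_n) + lam_dot(t_n) * (t - t_n) + 0.5 * gamma(t_n) * (t - t_n) ** 2

def g_decoupled(xj, xk, xj_n, xk_n):
    """De-coupled surrogate (3.17); it contains no product term xj * xk."""
    t_n = xj_n - xk_n
    return (lam(t_n)
            + lam_dot(t_n) * ((xj - xj_n) - (xk - xk_n))
            + 0.25 * gamma(t_n) * ((2 * xj - 2 * xj_n) ** 2
                                   + (2 * xk - 2 * xk_n) ** 2))

# Hypothetical current iterate for one pair of neighboring voxels.
xj_n, xk_n = 3.0, 1.5
for xj, xk in [(3.0, 1.5), (2.0, 2.0), (0.5, 4.0)]:
    print(lam(xj - xk),
          huber_surrogate(xj - xk, xj_n - xk_n),
          g_decoupled(xj, xk, xj_n, xk_n))
```

At each point the three printed values should be nondecreasing from left to right, and they should coincide at the current iterate. The price of removing the cross term is a looser bound away from the current iterate.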

PAGE 38

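From the properties of $\Psi^{(n)}$ and $\Lambda^{(n)}$, it follows that the surrogate function $\Phi^{(n)}$ possesses the requisite properties:

• (P1) $\Phi^{(n)}(x) \ge \Phi(x)$ for all $x \ge 0$
• (P2) $\Phi^{(n)}(x^{(n)}) = \Phi(x^{(n)})$
• (P3) $\nabla\Phi^{(n)}(x^{(n)}) = \nabla\Phi(x^{(n)})$.

Given $x^{(n)}$, the next iterate $x^{(n+1)}$ is found by minimizing $\Phi^{(n)}$ subject to the nonnegativity constraint:

$$x^{(n+1)} = \arg\min_{x \ge 0} \Phi^{(n)}(x) , \tag{3.20}$$

so that, by the argument that led to (3.3),

$$\Phi(x^{(n)}) \ge \Phi(x^{(n+1)}) \quad \text{for all } n . \tag{3.21}$$

Using the symmetry of the weights and the neighborhoods, the surrogate function $\Lambda^{(n)}$ can be written in the de-coupled form

$$\Lambda^{(n)}(x) = 2 \sum_{j=1}^{J} \sum_{k \in N_j} \omega_{jk}\, \gamma(x_j^{(n)} - x_k^{(n)}) \big[ x_j^2 - (x_j^{(n)} + x_k^{(n)})\, x_j \big] + \tilde{C}^{(n)} , \tag{3.22--3.25}$$

where $\tilde{C}^{(n)}$ collects the terms that do not depend on $x$

PAGE 39

(see proof in Appendix B). Since $\Psi^{(n)}$ and $\Lambda^{(n)}$ are de-coupled,

$$\Phi^{(n)}(x) = -\Psi^{(n)}(x) + \beta\Lambda^{(n)}(x) \tag{3.26}$$

$$= \sum_{i=1}^{I} \Big\{ [\mathcal{P}x]_i - d_i \sum_{j=1}^{J} \frac{\mathcal{P}_{ij} x_j^{(n)}}{[\mathcal{P}x^{(n)} + \rho]_i} \log(x_j) \Big\} + 2\beta \sum_{j=1}^{J} \sum_{k \in N_j} \omega_{jk}\, \gamma(x_j^{(n)} - x_k^{(n)}) \big[ x_j^2 - (x_j^{(n)} + x_k^{(n)})\, x_j \big] + C^{(n)} + \beta\tilde{C}^{(n)} \tag{3.27--3.28}$$

$$= \sum_{j=1}^{J} \Big\{ E_j^{(n)} \log(x_j) + F_j^{(n)} x_j^2 + G_j^{(n)} x_j \Big\} + C^{(n)} + \beta\tilde{C}^{(n)} , \tag{3.29}$$

where

$$E_j^{(n)} = -x_j^{(n)} \sum_{i=1}^{I} \frac{d_i \mathcal{P}_{ij}}{[\mathcal{P}x^{(n)} + \rho]_i} \tag{3.30}$$

$$F_j^{(n)} = 2\beta \sum_{k \in N_j} \omega_{jk}\, \gamma(x_j^{(n)} - x_k^{(n)}) \tag{3.31}$$

$$G_j^{(n)} = \sum_{i=1}^{I} \mathcal{P}_{ij} - 2\beta \sum_{k \in N_j} \omega_{jk}\, \gamma(x_j^{(n)} - x_k^{(n)}) (x_j^{(n)} + x_k^{(n)}) . \tag{3.32}$$

Consequently, the minimization in (3.20) separates into $J$ one-dimensional problems:

$$x_j^{(n+1)} = \arg\min_{t \ge 0} \phi_j^{(n)}(t) , \quad j = 1, 2, \ldots, J , \tag{3.33--3.34}$$

where

$$\phi_j^{(n)}(t) = E_j^{(n)} \log(t) + F_j^{(n)} t^2 + G_j^{(n)} t . \tag{3.35}$$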

PAGE 40

Fortunately, the function $\phi_j^{(n)}$ is strictly convex for all $j$ and $n$ under the assumption that $x_j^{(n)} > 0$ for all $j$ and $n$. We will prove this statement by showing that the second derivative of $\phi_j^{(n)}$ is positive when $x_j^{(n)} > 0$ for all $j$ and $n$. First, note that $E_j^{(n)}$ is negative and $F_j^{(n)}$ is positive for all $j$ and $n$. The fact that $E_j^{(n)} < 0$ is due to the fact that $d^T \mathcal{P}_j \ne 0$, where $\mathcal{P}_j$ is the $j$th column of $\mathcal{P}$ (see (AS2) in Section 1.4), and the assumption that $x_j^{(n)} > 0$. The fact that $F_j^{(n)} > 0$ follows from the positivity of the function $\gamma$, the weights $\{\omega_{jk}\}$, and the penalty parameter $\beta$. To see why $\gamma(t) > 0$ for $-\infty < t < \infty$, recall that $\lambda(t)$ is a symmetric, strictly convex function by (AS3) and (AS5). It follows that $\dot{\lambda}(t) > 0$ over $(0, \infty)$ and $\dot{\lambda}(t) < 0$ over $(-\infty, 0)$. Using the fact that $\gamma(0)$ is finite and nonzero (see (AS7)), we have that $\gamma(t) > 0$ for $-\infty < t < \infty$.

Now, consider the second derivative of $\phi_j^{(n)}$. Easy calculations show that $\ddot{\phi}_j^{(n)}(t) = -E_j^{(n)}/t^2 + 2F_j^{(n)}$. Since $F_j^{(n)} > 0$ and $E_j^{(n)} < 0$, it follows that the second derivative of $\phi_j^{(n)}$ is positive for $t > 0$ and, from (3.29), $\Phi^{(n)}$ is strictly convex over the set $\{x : x > 0\}$. Since $\Phi^{(n)}$ is de-coupled, $\phi_j^{(n)}$ is strictly convex, and $\phi_j^{(n)}(t) \to \infty$ as $t \to 0^+$, it is true that $x^{(n+1)} > 0$ and

$$\dot{\phi}_j^{(n)}(x_j^{(n+1)}) = 0 . \tag{3.36}$$

Note that the result $x^{(n+1)} > 0$ is consistent with our assumption that $x_j^{(n)} > 0$ for all $j$ and $n$. To solve (3.34), we compute the first derivative of $\phi_j^{(n)}$ and set it to zero. Since $E_j^{(n)} < 0$ and $F_j^{(n)} > 0$, the root of the resulting quadratic equation that preserves the nonnegativity constraint is

$$x_j^{(n+1)} = \frac{-G_j^{(n)} + \sqrt{\big(G_j^{(n)}\big)^2 - 8 E_j^{(n)} F_j^{(n)}}}{4 F_j^{(n)}} , \quad j = 1, 2, \ldots, J . \tag{3.37}$$

Observe that, as $\beta \to 0$, (3.37) approaches $-E_j^{(n)} / \sum_i \mathcal{P}_{ij}$ by L'Hospital's rule. Thus, the iteration in (3.37) is equivalent to the MLEM algorithm when $\beta = 0$.
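The closed-form update (3.37) can be sketched in a few lines. The following is an illustrative implementation under stated assumptions: the log-cosh penalty with unit weights, a hypothetical 3-by-3 toy problem, and coefficients `E`, `F`, and `G` formed as the coefficients of $\log(t)$, $t^2$, and $t$ in the de-coupled surrogate (worked out here from (3.12) and (3.17), so they should be read as a reconstruction). The root is computed in an algebraically equivalent form that avoids cancellation when `G` is positive.

```python
import math

def gamma(t):
    """gamma(t) = tanh(t) / t for the log-cosh penalty, with gamma(0) = 1."""
    return 1.0 if t == 0.0 else math.tanh(t) / t

def pml_objective(P, x, d, rho, neighbors, beta, weight=1.0):
    """Phi(x) up to the constant sum_i log(d_i!), with the log-cosh penalty."""
    I, J = len(P), len(x)
    value = 0.0
    for i in range(I):
        mean_i = sum(P[i][j] * x[j] for j in range(J)) + rho[i]
        value += mean_i - d[i] * math.log(mean_i)
    for j in range(J):
        for k in neighbors[j]:
            value += beta * weight * math.log(math.cosh(x[j] - x[k]))
    return value

def pml_step(P, x, d, rho, neighbors, beta, weight=1.0):
    """One PML update: minimize E log(t) + F t^2 + G t for every voxel."""
    I, J = len(P), len(x)
    mean = [sum(P[i][j] * x[j] for j in range(J)) + rho[i] for i in range(I)]
    new_x = []
    for j in range(J):
        E = -x[j] * sum(d[i] * P[i][j] / mean[i] for i in range(I))
        F = 2.0 * beta * sum(weight * gamma(x[j] - x[k]) for k in neighbors[j])
        G = (sum(P[i][j] for i in range(I))
             - 2.0 * beta * sum(weight * gamma(x[j] - x[k]) * (x[j] + x[k])
                                for k in neighbors[j]))
        root = math.sqrt(G * G - 8.0 * E * F)
        if G > 0.0:
            # Same root as (3.37), rewritten; gives -E / G (MLEM) when F = 0.
            new_x.append(-2.0 * E / (G + root))
        else:
            new_x.append((-G + root) / (4.0 * F))
    return new_x

# Hypothetical toy problem.
P = [[0.5, 0.2, 0.0], [0.3, 0.5, 0.3], [0.0, 0.2, 0.5]]
d = [4, 9, 5]
rho = [0.1, 0.1, 0.1]
neighbors = [[1], [0, 2], [1]]

x = [1.0, 1.0, 1.0]
for n in range(20):
    x = pml_step(P, x, d, rho, neighbors, beta=0.5)
    print(n, pml_objective(P, x, d, rho, neighbors, beta=0.5))
```

The printed objective values should be nonincreasing, and the iterates should stay strictly positive without any clipping.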

PAGE 41

In summary, given a strictly positive initial estimate $x^{(0)} > 0$, the steps of the PML algorithm are: for $n = 0, 1, 2, \ldots$

• Step 1 Let $x^{(0)} > 0$ be the initial estimate
• Step 2 Construct the surrogate function $\Phi^{(n)}$ from the current iterate $x^{(n)}$ using (3.29), (3.30), (3.31), and (3.32)
• Step 3 Get $x^{(n+1)}$ using (3.37)
• Step 4 Iterate between Steps 2 and 3 until some chosen stopping criterion is met.

3.2 Convergence Proof

Using (P1)-(P3) and (AS1)-(AS8), we now prove that the PML algorithm converges. The following convergence proof is based on the convergence proof by Lange and Carson [56] (see also [30]) of the MLEM algorithm by Shepp and Vardi [16]. By (3.21), the PML algorithm has the property that it decreases the objective function $\Phi$ with increasing iterations:

• (P4) $\Phi(x^{(n+1)}) \le \Phi(x^{(n)})$ for all $n \ge 0$.

Another property of the algorithm is that

• (P5) the sequence $\{\Phi(x^{(n)})\}$ is convergent.

This property follows from (P4) and the fact that $\Phi$ is bounded below by (AS8) (see [57, Theorem 1.4, p. 6]).

Proposition 1 The sequence $\{x^{(n)}\}$ is bounded.

Proof: From (P4), it follows that $\Phi(x^{(n)}) \le \Phi(x^{(0)})$ for all $n \ge 0$. Consider the set $\mathbb{B} = \{x \ge 0 : \Phi(x) \le \Phi(x^{(0)})\}$. Then, clearly $\{x^{(n)}\} \subset \mathbb{B}$. So, to prove that $\{x^{(n)}\}$ is bounded, we will prove that the set $\mathbb{B}$ is bounded. It is straightforward to see that $\mathbb{B}$ is bounded below by 0. Now, suppose that $\mathbb{B}$ is not bounded above. Then, there exists a point $z \in \mathbb{B}$ such that $\|z\| \to \infty$. This result means that for some $j$ there exists $z_j$ such that $z_j \to \infty$. Since (AS2) implies that $\mathcal{P}$ has no column vector of zeros, it follows that $[\mathcal{P}z]_i \to \infty$ for some $i$ and $\Phi(z) \to \infty$. This implies that $z$ is not

PAGE 42

an element of the set $\mathbb{B}$ because all the elements of $\mathbb{B}$ have an objective value that is less than or equal to $\Phi(x^{(0)}) < \infty$, which is a contradiction. Therefore, the set $\mathbb{B}$ is bounded above.

Proposition 2 There exists some constant $c_1 > 0$ such that $\Phi(x^{(n)}) - \Phi(x^{(n+1)}) \ge c_1 \|x^{(n)} - x^{(n+1)}\|^2$ for all $n \ge 0$.

Proof: By (P1), (P2), and (3.29), we have the following inequality:

$$\Phi(x^{(n)}) - \Phi(x^{(n+1)}) \ge \Phi^{(n)}(x^{(n)}) - \Phi^{(n)}(x^{(n+1)}) = \sum_{j=1}^{J} \Big\{ \phi_j^{(n)}(x_j^{(n)}) - \phi_j^{(n)}(x_j^{(n+1)}) \Big\} . \tag{3.38}$$

Suppose, for each $n$, the function $\phi_j^{(n)}(t)$ is expanded into a second-order Taylor series [55, pp. 868-869] about the point $x_j^{(n+1)}$ and evaluated at $t = x_j^{(n)}$. Then, the right hand side of (3.38) can be written as

$$\Phi^{(n)}(x^{(n)}) - \Phi^{(n)}(x^{(n+1)}) = \sum_{j=1}^{J} \Big\{ (x_j^{(n)} - x_j^{(n+1)})\, \dot{\phi}_j^{(n)}(x_j^{(n+1)}) + \frac{1}{2} (x_j^{(n)} - x_j^{(n+1)})^2\, \ddot{\phi}_j^{(n)}(\tilde{x}_j^{(n+1)}) \Big\} , \tag{3.39}$$

where the double dot over a function represents its second derivative and $\tilde{x}_j^{(n+1)}$ is a point between $x_j^{(n)}$ and $x_j^{(n+1)}$. Since $\dot{\phi}_j^{(n)}(x_j^{(n+1)}) = 0$ by (3.36), it follows that

$$\Phi^{(n)}(x^{(n)}) - \Phi^{(n)}(x^{(n+1)}) = \frac{1}{2} \sum_{j=1}^{J} (x_j^{(n)} - x_j^{(n+1)})^2\, \ddot{\phi}_j^{(n)}(\tilde{x}_j^{(n+1)}) . \tag{3.40}$$

Now, recall that $\ddot{\phi}_j^{(n)}(t) = -E_j^{(n)}/t^2 + 2F_j^{(n)}$ with $F_j^{(n)} = 2\beta \sum_{k \in N_j} \omega_{jk}\, \gamma(x_j^{(n)} - x_k^{(n)})$ and $E_j^{(n)} < 0$. Since $\{x^{(n)}\}$ is bounded and $\gamma(t) > 0$ is a continuous function for $-\infty < t < \infty$, there exists a number $\gamma_0 > 0$ such that $\gamma(x_j^{(n)} - x_k^{(n)}) \ge \gamma_0$ for all $j$, $k$, and $n$. Hence, $F_j^{(n)} \ge c_1$ for all $j$ and $n$, where $c_1 = 2\beta\gamma_0 \min_j \sum_{k \in N_j} \omega_{jk} > 0$. Therefore, $\ddot{\phi}_j^{(n)}(t) \ge 2c_1$ for all $j$ and $n$, and we obtain the desired result

$$\Phi(x^{(n)}) - \Phi(x^{(n+1)}) \ge c_1 \|x^{(n)} - x^{(n+1)}\|^2 . \tag{3.41}$$

PAGE 43

From (P5) and Proposition 2, it follows that

• (P6) the sequence $\{x^{(n)} - x^{(n+1)}\}$ converges to 0.

The following proposition will be used later to prove not only that a limit point of the sequence $\{x^{(n)}\}$ satisfies one of the Kuhn-Tucker conditions [55, p. 777] but also that the sequence $\{x^{(n)}\}$ has a finite number of limit points.

Proposition 3 Let $x^*$ be a limit point of the sequence $\{x^{(n)}\}$. Then, for all $j$ such that $x_j^* \ne 0$,

$$\frac{\partial}{\partial x_j} \Phi(x) \Big|_{x = x^*} = 0 . \tag{3.42}$$

Proof: By Proposition 1, there is a subsequence $\{x^{(n_l)}\}$ that converges to $x^*$ (see the Bolzano-Weierstrass theorem in [58, p. 108]). By (P6), the subsequence $\{x^{(n_l + 1)}\}$ also converges to $x^*$. Recall from (3.29) that $\dot{\phi}_j^{(n_l)}(t) = E_j^{(n_l)}/t + 2F_j^{(n_l)} t + G_j^{(n_l)}$. If $x_j^* \ne 0$, it follows that $E_j^{(n_l)}/x_j^{(n_l)}$ and $E_j^{(n_l)}/x_j^{(n_l + 1)}$ converge to the same limit, and hence

$$\lim_{l \to \infty} \dot{\phi}_j^{(n_l)}(x_j^{(n_l)}) = \lim_{l \to \infty} \dot{\phi}_j^{(n_l)}(x_j^{(n_l + 1)}) . \tag{3.43}$$

Since $\dot{\phi}_j^{(n_l)}(x_j^{(n_l + 1)}) = 0$ by (3.36), it follows from (P3) that

$$\frac{\partial}{\partial x_j} \Phi(x) \Big|_{x = x^*} = 0 \quad \text{for all } j \text{ such that } x_j^* \ne 0 . \tag{3.44}$$

Using Proposition 3, we can prove the following proposition, which will be used to prove that the sequence $\{x^{(n)}\}$ converges.

Proposition 4 The sequence $\{x^{(n)}\}$ has a finite number of limit points.

Proof: Consider the following sets:

$$Y = \{1, 2, \ldots, J\} \tag{3.45}$$

$$Z^* = \{j \in Y : x_j^* = 0\} \tag{3.46}$$

$$Z^{**} = \{j \in Y : x_j^{**} = 0\} , \tag{3.47}$$


So, (3.49) is satisfied. Now, we consider the case $x_j^* = 0$. For $j$ such that $x_j^* = 0$, suppose

$$\left. \frac{\partial}{\partial x_j} \Phi(x) \right|_{x = x^*} < 0 . \qquad (3.52)$$

Then, it follows that $\lim_{n \to \infty} \dot{\phi}_j^{(n)}(x_j^{(n)}) < 0$ by (P3), and $\dot{\phi}_j^{(n)}(x_j^{(n)}) < 0$ for sufficiently large $n$. Consider $\dot{\phi}_j^{(n)}(x_j^{(n+1)}) = E_j^{(n)}/x_j^{(n+1)} + 2F_j^{(n)} x_j^{(n+1)} + G_j^{(n)}$. If $x_j^{(n+1)} < x_j^{(n)}$, then

$$\frac{E_j^{(n)}}{x_j^{(n)}} + 2F_j^{(n)} x_j^{(n+1)} + G_j^{(n)} > 0 \qquad (3.53)$$

because $\dot{\phi}_j^{(n)}(x_j^{(n+1)}) = 0$ by (3.36) and $E_j^{(n)}/x_j^{(n+1)} < 0$. Moreover, the fact $F_j^{(n)} > 0$ implies that

$$\frac{E_j^{(n)}}{x_j^{(n)}} + 2F_j^{(n)} x_j^{(n)} + G_j^{(n)} = \dot{\phi}_j^{(n)}(x_j^{(n)}) > 0 , \qquad (3.54)$$

which is a contradiction. Thus, $x_j^{(n+1)} \geq x_j^{(n)}$ for all sufficiently large $n$. However, this contradicts the fact $x_j^{(n)} \to 0$. So, it is true that

$$\left. \frac{\partial}{\partial x_j} \Phi(x) \right|_{x = x^*} \geq 0 \quad \text{for all } j \text{ such that } x_j^* = 0 . \qquad (3.55)$$

This satisfies (3.50).

3.3 Properties of the PML Algorithm

We now provide a summary of the desirable properties of the PML algorithm:

• The PML algorithm is straightforward to implement because there are no hyperparameters required for the algorithm itself and it has closed-form expressions for the iterates. Some algorithms require hyperparameters, such as relaxation parameters, in addition to the penalty parameter [33,35], while others [24,30] do not have closed-form expressions for the updates.
• The PML algorithm theoretically guarantees nonnegative iterates, whereas some algorithms [33,35] set any negative element of the iterates to a small positive number.


• The PML algorithm monotonically decreases the PML objective function, unlike the algorithms in [23,35].
• The PML algorithm can incorporate a large class of edge-preserving penalty functions, unlike the algorithm by De Pierro [30].
• The PML algorithm converges to the minimizer of the PML objective function. Convergence proofs for the algorithms in [23,33] are not available.


CHAPTER 4
ACCELERATED PENALIZED MAXIMUM LIKELIHOOD ALGORITHM

Although the PML algorithm presented in Chapter 3 converges to the nonnegative minimizer of the PML objective function, it has the drawback that it converges slowly. In PET, a popular way to accelerate iterative image reconstruction algorithms is through the use of so-called ordered subsets [45]. In ordered-subsets based reconstruction algorithms, the observed data, $d$, is divided into a predefined number of subsets via some chosen rule. Then, the iterative reconstruction algorithm to be accelerated is applied sequentially to each data subset. In [45], Hudson and Larkin developed the first PET image reconstruction algorithm that used the ordered-subsets idea. Since the MLEM algorithm was applied to each data subset, they called their algorithm the ordered-subsets expectation maximization (OS-EM) algorithm. In [61], Browne and De Pierro showed that the OS-EM algorithm did not converge and introduced another ordered-subsets based image reconstruction algorithm that employed a relaxation parameter. It should be pointed out that some convergence results are available for ordered-subsets based algorithms that use relaxation parameters [33,35,61]. With ordered-subsets based algorithms, there is uncertainty as to how many subsets should be used and how the data should be divided. Moreover, it is not clear how relaxation parameters should be chosen in practice because, generally, they depend on the data.

In this chapter, we introduce an accelerated version of the PML algorithm, referred to as the accelerated PML (APML) algorithm, that uses a pattern search suggested by Hooke and Jeeves [41, pp. 287-291]. A pattern search has also been exploited to accelerate an algorithm in transmission tomography [62]. In Section 4.1, using the mathematical ideas in the convergence proof of the PML algorithm, we show that a sequence that satisfies certain conditions converges to the minimizer of the PML objective function. Then, we use this result to prove that the APML algorithm, which is developed in Section 4.2, converges to the nonnegative minimizer of the PML objective function. In Section 4.3, we introduce the direction vector to be used in the pattern search. Finally, we summarize the properties of the APML algorithm in Section 4.4. It should be mentioned that we introduced the APML algorithm in [19].

Figure 4-1: Two-dimensional illustration of the sequence $\{\bar{x}^{(n)}\}$. The single circles and double circles denote the accelerated iterates $\{\bar{x}^{(n)}\}$ and standard iterates $\{x^{(n)}\}$, respectively. Each ellipse represents a set of points that have the same cost. The mark × denotes the minimizer of the function subject to the constraints $x_1 \geq 0$ and $x_2 \geq 0$.

4.1 Convergence Proof

In this section, we prove that the sequence $\{\bar{x}^{(0)}, \bar{x}^{(1)}, \bar{x}^{(2)}, \ldots\}$ converges to the minimizer of the PML objective function $\Phi$, where $\bar{x}^{(0)} > 0$ is an initial guess, $x^{(n+1)} = \arg\min \bar{\Phi}^{(n)}(x)$ subject to $x \geq 0$, $\bar{\Phi}^{(n)}$ is of the same form as the surrogate function for the PML objective function $\Phi$ at the iterate $x^{(n)}$, $\Phi^{(n)}$ in (3.29), except that $\bar{\Phi}^{(n)}$ is defined at the point $\bar{x}^{(n)}$ instead of $x^{(n)}$, and the point $\bar{x}^{(n)} > 0$ satisfies the following conditions for all $n$:

• (C4) $\Phi(\bar{x}^{(n)}) \leq \Phi(x^{(n)})$


• (C5) there exists some constant $c_2 > 0$ such that $\Phi(x^{(n)}) - \Phi(\bar{x}^{(n)}) \geq c_2 \|x^{(n)} - \bar{x}^{(n)}\|^2$.

This convergence result will form the basis for the convergence proof of the APML algorithm. Note, the strict positivity of $\bar{x}^{(n)}$ for all $n$ is necessary because the surrogate function for the PML objective function, $\bar{\Phi}^{(n)}$, is undefined for vectors with zero or negative elements. An example of such a sequence $\{\bar{x}^{(n)}\}$ is illustrated in Figure 4-1. In the figure, the single circles and double circles represent the accelerated iterates $\{\bar{x}^{(n)}\}$ and standard iterates $\{x^{(n)}\}$, respectively.

First, note that the following convergence proof is mainly based on the convergence proof of the PML algorithm in Chapter 3. Since $\Phi$ is bounded below and the sequence $\{\bar{x}^{(n)}\}$ monotonically decreases the PML objective function [...] $\{x \geq 0 : \Phi(x) \leq \Phi(\bar{x}^{(0)})\}$. Since the set $B$ is bounded (see Proposition 1), it follows that

• (P8) the sequence $\{\bar{x}^{(n)}\}$ is bounded.

Now, note that $x^{(n+1)}$ is the minimizer of the surrogate function $\bar{\Phi}^{(n)}$, which satisfies (1) $\bar{\Phi}^{(n)}(x) \geq \Phi(x)$ for all $x \geq 0$, (2) $\bar{\Phi}^{(n)}(\bar{x}^{(n)}) = \Phi(\bar{x}^{(n)})$, and (3) $\nabla\bar{\Phi}^{(n)}(\bar{x}^{(n)}) = \nabla\Phi(\bar{x}^{(n)})$. Thus, by (P8) (see Proposition 2), we have the following property:

• (P9) there exists some constant $c_3 > 0$ such that $\Phi(\bar{x}^{(n)}) - \Phi(x^{(n+1)}) \geq c_3 \|\bar{x}^{(n)} - x^{(n+1)}\|^2$.

By (P7) and (P9), we obtain the property

• (P10) that the sequence $\{\bar{x}^{(n)} - x^{(n+1)}\}$ converges to 0.

Also, by (P7) and (C5), it must be true that the sequence $\{x^{(n)} - \bar{x}^{(n)}\}$ converges to 0. Thus, from the fact that $\|\bar{x}^{(n)} - \bar{x}^{(n+1)}\|_2 \leq \|\bar{x}^{(n)} - x^{(n+1)}\|_2 + \|x^{(n+1)} - \bar{x}^{(n+1)}\|_2$, the following property holds:

• (P11) the sequence $\{\bar{x}^{(n)} - \bar{x}^{(n+1)}\}$ converges to 0.


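For the discussion to follow, consider the surrogate function for the PML objective function at the iterate $\bar{x}^{(n)}$,

$$\bar{\Phi}^{(n)}(x) = \sum_{j=1}^{J} \bar{\phi}_j^{(n)}(x_j) + \bar{C}^{(n)} , \qquad (4.1)$$

where $\bar{\phi}_j^{(n)}(t) = \bar{E}_j^{(n)} \log(t) + \bar{F}_j^{(n)} t^2 + \bar{G}_j^{(n)} t$ for $t > 0$, $\bar{C}^{(n)}$ is independent of $x$, and

$$\bar{E}_j^{(n)} = -\sum_{i=1}^{I} \frac{d_i \mathcal{P}_{ij} \bar{x}_j^{(n)}}{[\mathcal{P}\bar{x}^{(n)} + \rho]_i} \qquad (4.2)$$
$$\bar{F}_j^{(n)} = 2\beta \sum_{k \in N_j} \omega_{jk} \gamma(\bar{x}_j^{(n)} - \bar{x}_k^{(n)}) \qquad (4.3)$$
$$\bar{G}_j^{(n)} = \sum_{i=1}^{I} \mathcal{P}_{ij} - 2\beta \sum_{k \in N_j} \omega_{jk} \gamma(\bar{x}_j^{(n)} - \bar{x}_k^{(n)})(\bar{x}_j^{(n)} + \bar{x}_k^{(n)}) \qquad (4.4)$$

(note: $\bar{\Phi}^{(n)}$, $\bar{E}_j^{(n)}$, $\bar{F}_j^{(n)}$, $\bar{G}_j^{(n)}$, and $\bar{C}^{(n)}$ result by substituting $\bar{x}^{(n)}$ for $x^{(n)}$ in (3.29), (3.30), (3.31), (3.32), and (3.33), respectively). To prove that the whole sequence $\{\bar{x}^{(n)}\}$ converges, as done in the convergence proof of the PML algorithm, we first present the following proposition:

Proposition 5 Let $x^*$ be a limit point of the sequence $\{\bar{x}^{(n)}\}$. Then, for all $j$ such that $x_j^* \neq 0$,

$$\left. \frac{\partial}{\partial x_j} \Phi(x) \right|_{x = x^*} = 0 . \qquad (4.5)$$

Proof: By (P8), the sequence $\{\bar{x}^{(n)}\}$ is bounded and there is a subsequence $\{\bar{x}^{(n_l)}\}$ that converges to $x^*$. By (P10), the subsequence $\{x^{(n_l+1)}\}$ also converges to $x^*$. Recall that $\dot{\bar{\phi}}_j^{(n_l)}(t) = \bar{E}_j^{(n_l)}/t + 2\bar{F}_j^{(n_l)} t + \bar{G}_j^{(n_l)}$. As in the proof of Proposition 3, if $x_j^* \neq 0$, it follows that

$$\lim_{l \to \infty} \dot{\bar{\phi}}_j^{(n_l)}(\bar{x}_j^{(n_l)}) = \lim_{l \to \infty} \dot{\bar{\phi}}_j^{(n_l)}(x_j^{(n_l+1)}) . \qquad (4.6)$$

Since $\dot{\bar{\phi}}_j^{(n_l)}(x_j^{(n_l+1)}) = 0$ by (3.36), and $\nabla\bar{\Phi}^{(n_l)}(\bar{x}^{(n_l)}) = \nabla\Phi(\bar{x}^{(n_l)})$, it can be said that

$$\left. \frac{\partial}{\partial x_j} \Phi(x) \right|_{x = x^*} = 0 \quad \text{for all } j \text{ such that } x_j^* \neq 0 . \qquad (4.7)$$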


Theorem 2 The sequence $\{\bar{x}^{(n)}\}$ converges to the unique minimizer of $\Phi$.

Proof: Let $x^*$ be a limit point of the sequence $\{\bar{x}^{(n)}\}$. Since the set of limit points of the sequence $\{\bar{x}^{(n)}\}$ is connected by (P8) and (P11) (see [60, p. 173] and Theorem 1 in Chapter 3) and there are a finite number of limit points of $\{\bar{x}^{(n)}\}$ by Proposition 4 and Proposition 5, it follows that there is only one limit point in the set of limit points. Thus, $\{\bar{x}^{(n)}\} \to x^*$. Since $\lim\{\|\bar{x}^{(n)} - x^*\|_2\} = 0$, by (P10) we have $\lim\{\|x^{(n+1)} - x^*\|_2\} = 0$ (note: $\|x^{(n+1)} - x^*\|_2 \leq \|x^{(n+1)} - \bar{x}^{(n)}\|_2 + \|\bar{x}^{(n)} - x^*\|_2$). Hence, $\{x^{(n+1)}\} \to x^*$ (i.e., $\{x^{(n)}\} \to x^*$). Therefore, we can deduce that the whole sequence converges to $x^*$.

To prove the sequence $\{\bar{x}^{(n)}\}$ converges to the unique minimizer of the PML objective function, we must show that $x^*$ satisfies the Kuhn-Tucker conditions (i.e., (3.48), (3.49), and (3.50)). Since all the points in the sequence $\{\bar{x}^{(n)}\}$ are positive, it must be true that the limit point $x^*$ is nonnegative (i.e., (3.48) is satisfied). By Proposition 5,

$$\left. \frac{\partial}{\partial x_j} \Phi(x) \right|_{x = x^*} = 0 \quad \text{for all } j \text{ such that } x_j^* \neq 0 . \qquad (4.8)$$

Thus, (3.49) is satisfied. For $j$ such that $x_j^* = 0$, suppose

$$\left. \frac{\partial}{\partial x_j} \Phi(x) \right|_{x = x^*} < 0 . \qquad (4.9)$$

Then, it follows that $\lim_{n \to \infty} \dot{\bar{\phi}}_j^{(n)}(\bar{x}_j^{(n)}) < 0$ by the property $\nabla\bar{\Phi}^{(n)}(\bar{x}^{(n)}) = \nabla\Phi(\bar{x}^{(n)})$, and $\dot{\bar{\phi}}_j^{(n)}(\bar{x}_j^{(n)}) < 0$ for sufficiently large $n$. Consider $\dot{\bar{\phi}}_j^{(n)}(x_j^{(n+1)}) = \bar{E}_j^{(n)}/x_j^{(n+1)} + 2\bar{F}_j^{(n)} x_j^{(n+1)} + \bar{G}_j^{(n)}$. If $x_j^{(n+1)} < \bar{x}_j^{(n)}$, then

$$\frac{\bar{E}_j^{(n)}}{\bar{x}_j^{(n)}} + 2\bar{F}_j^{(n)} x_j^{(n+1)} + \bar{G}_j^{(n)} > 0 \qquad (4.10)$$


because $\dot{\bar{\phi}}_j^{(n)}(x_j^{(n+1)}) = 0$ by (3.36) and $\bar{E}_j^{(n)}/x_j^{(n+1)} < 0$. Moreover, the fact that $\bar{F}_j^{(n)}$ is positive implies that

$$\frac{\bar{E}_j^{(n)}}{\bar{x}_j^{(n)}} + 2\bar{F}_j^{(n)} \bar{x}_j^{(n)} + \bar{G}_j^{(n)} = \dot{\bar{\phi}}_j^{(n)}(\bar{x}_j^{(n)}) > 0 , \qquad (4.11)$$

which is a contradiction. Thus, $x_j^{(n+1)} \geq \bar{x}_j^{(n)}$ for all sufficiently large $n$. However, this contradicts the fact $\bar{x}_j^{(n)} \to 0$. Therefore,

$$\left. \frac{\partial}{\partial x_j} \Phi(x) \right|_{x = x^*} \geq 0 \quad \text{for all } j \text{ such that } x_j^* = 0 . \qquad (4.12)$$

This satisfies (3.50).

4.2 Accelerated Penalized Maximum Likelihood (APML) Algorithm

In Section 4.1, we showed that the sequence $\{\bar{x}^{(n)}\} = \{\bar{x}^{(0)}, \bar{x}^{(1)}, \bar{x}^{(2)}, \ldots\}$ converges to the nonnegative minimizer of the PML objective function $\Phi$ if: (1) $\bar{x}^{(0)} > 0$, (2) $x^{(n+1)}$ is the nonnegative minimizer of the surrogate function for the PML objective function at the iterate $\bar{x}^{(n)}$, $\bar{\Phi}^{(n)}$, and (3) $\bar{x}^{(n)} > 0$ satisfies (C4) and (C5). In this section, we present an algorithm that produces such a sequence $\{\bar{x}^{(n)}\}$. First, consider the following steps: given an initial guess $\bar{x}^{(0)} > 0$, for $n = 0, 1, 2, \ldots$

• Step 1 Get the standard PML iterate $x^{(n+1)} = \arg\min \bar{\Phi}^{(n)}(x)$ subject to $x \geq 0$.
• Step 2 Get the accelerated PML iterate $\bar{x}^{(n+1)} = x^{(n+1)} + \alpha^{(n+1)} v^{(n+1)}$, where $v^{(n+1)} \neq 0$ is the chosen search direction (i.e., direction vector) and

$$\alpha^{(n+1)} = \arg\min_{\alpha} \Phi(x^{(n+1)} + \alpha v^{(n+1)}) . \qquad (4.13)$$

• Step 3 Repeat the steps above until some chosen stopping criterion is met.

With $v^{(n+1)} = x^{(n+1)} - \bar{x}^{(n)}$, Step 2 is the pattern search step put forth by Hooke and Jeeves [41, pp. 287-291].
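The structure of Steps 1, 2, and 3 can be illustrated on a toy problem. The sketch below is not the PML algorithm: as a stand-in for the standard PML iterate it uses one gradient step on a hypothetical convex quadratic, it ignores the nonnegativity constraint, and it exploits the fact that an exact line search has a closed form for quadratics. The matrix `A`, the vector `b`, and all function names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical strictly convex quadratic f(x) = 0.5 x'Ax - b'x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def base_step(x):
    """Stand-in for Step 1: one gradient step with step size 1/L."""
    L = np.linalg.eigvalsh(A)[-1]   # largest eigenvalue of A
    return x - (A @ x - b) / L

def accelerate(x_new, x_bar_prev):
    """Step 2 with v = x_new - x_bar_prev and an exact line search along v."""
    v = x_new - x_bar_prev
    curvature = v @ A @ v
    if curvature == 0.0:            # no movement, nothing to accelerate
        return x_new
    alpha = -(v @ (A @ x_new - b)) / curvature
    return x_new + alpha * v

x_bar = np.array([5.0, -4.0])       # initial guess
history = [f(x_bar)]
for n in range(20):                 # Step 3: a fixed iteration budget
    x_new = base_step(x_bar)        # Step 1
    x_bar = accelerate(x_new, x_bar)  # Step 2
    history.append(f(x_bar))
x_star = np.linalg.solve(A, b)
```

Because the line search includes $\alpha = 0$, the accelerated point is never worse than the base iterate, so `history` is nonincreasing; this mirrors the role that condition (C4) plays in the text.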


We now modify Step 2 so that the sequence produced by Steps 1, 2, and 3 converges to the nonnegative minimizer of the PML objective function. Step 2 does not guarantee that the accelerated iterate is positive. Consequently, we modify the optimal step size as follows

$$\alpha^{(n+1)} = \arg\min_{\alpha} \Phi(x^{(n+1)} + \alpha v^{(n+1)}) \quad \text{subject to } \alpha \in A^{(n+1)} , \qquad (4.14)$$

where

$$A^{(n+1)} = \{\alpha : x_j^{(n+1)} + \alpha v_j^{(n+1)} \geq \zeta^{(n+1)} \text{ for all } j\} \qquad (4.15)$$
$$\zeta^{(n+1)} = \min_j \frac{x_j^{(n+1)}}{n + 2} . \qquad (4.16)$$

Observe, for simplicity, we use the same notation for the optimal step sizes defined by (4.13) and (4.14). For $\alpha \in A^{(n+1)}$, it is straightforward to see that $x^{(n+1)} + \alpha v^{(n+1)} > 0$ because $x_j^{(n+1)} > 0$ for all $j$ and $n$ (so that $\zeta^{(n+1)} > 0$). Thus, by adding the constraint $\alpha \in A^{(n+1)}$ to the problem in (4.13), it follows that $\bar{x}^{(n)} > 0$ for all $n$. Because $x_j^{(n+1)} > \zeta^{(n+1)}$ for all $j$ and $n$, it is evident that $\zeta^{(n+1)}$ has been chosen in such a way that $0 \in A^{(n+1)}$ for all $n$. The fact that $0 \in A^{(n+1)}$ will be used later to prove that the proposed algorithm monotonically decreases the PML objective function $\Phi$ with increasing iterations.

Remark: If we allow some elements of $\bar{x}^{(n+1)}$ to be zero, then the feasible region of the function $\Phi(x^{(n+1)} + \alpha v^{(n+1)})$ is $\{\alpha : x_j^{(n+1)} + \alpha v_j^{(n+1)} \geq 0 \text{ for all } j\}$. However, for the sequence $\{\bar{x}^{(n)}\}$ to converge, we must constrain all elements of $\bar{x}^{(n)}$ to be positive because the surrogate function for the PML objective function $\Phi$, $\bar{\Phi}^{(n)}$, is not defined at $x = \bar{x}^{(n)}$ where $\bar{x}_j^{(n)} = 0$ for some $j$. A feasible region of the function $\Phi(x^{(n+1)} + \alpha v^{(n+1)})$ that appears natural is

$$\tilde{A}^{(n+1)} = \{\alpha : x_j^{(n+1)} + \alpha v_j^{(n+1)} > 0 \text{ for all } j\} . \qquad (4.17)$$
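The constraint set in (4.15) is an interval of step sizes, and its endpoints can be computed directly from the iterate and the direction vector. The following is a minimal sketch under that reading; the function names `step_interval` and `clip_step`, the endpoint names `lower` and `upper`, and the example vectors are hypothetical.

```python
import numpy as np

def step_interval(x_next, v, n):
    """Return (zeta, lower, upper) such that x_next + a*v >= zeta
    elementwise exactly when lower <= a <= upper."""
    zeta = x_next.min() / (n + 2)                  # threshold from (4.16)
    safe_v = np.where(v != 0, v, 1.0)              # avoid division by zero
    ratios = (zeta - x_next) / safe_v
    # Coordinates with v_j > 0 bound the step from below,
    # coordinates with v_j < 0 bound it from above.
    lower = ratios[v > 0].max() if np.any(v > 0) else -np.inf
    upper = ratios[v < 0].min() if np.any(v < 0) else np.inf
    return zeta, lower, upper

def clip_step(alpha, lower, upper):
    """Project a trial step size onto the interval [lower, upper]."""
    return float(min(max(alpha, lower), upper))

# Hypothetical positive iterate and direction vector.
x_next = np.array([0.5, 2.0, 1.0])
v = np.array([-1.0, 0.5, 0.0])
zeta, lower, upper = step_interval(x_next, v, n=0)
alpha = clip_step(1.0, lower, upper)
x_acc = x_next + alpha * v
```

Since every element of `x_next` exceeds `zeta`, the interval always contains zero, and the clipped point `x_acc` stays strictly positive.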


Figure 4-2: Illustration of the pattern search step: (a) two-dimensional illustration of $\Phi$, (b) one-dimensional slice of the function along the chosen direction vector. The single circle and double circle denote an accelerated iterate $\bar{x}^{(n)}$ and a PML iterate $x^{(n+1)}$, respectively. The mark × denotes the minimizer of the function subject to the constraints $x_1 \geq 0$ and $x_2 \geq 0$.

However, the set $\tilde{A}^{(n+1)}$ is an open set. Consequently, the optimization problem $\arg\min$ [...]

[...] for $\Phi_\alpha^{(n+1)}(\alpha) = \Phi(x^{(n+1)} + \alpha v^{(n+1)})$, which we denote by $\Gamma^{(n+1)}(\alpha)$, that satisfies the following conditions:

• (C6) $\Gamma^{(n+1)}(\alpha) \geq \Phi_\alpha^{(n+1)}(\alpha)$ for $\alpha \in A^{(n+1)}$
• (C7) $\Gamma^{(n+1)}(\alpha) = \Phi_\alpha^{(n+1)}(\alpha)$ for $\alpha = 0$.

By incorporating the surrogate function $\Gamma^{(n+1)}$ with the constraint $\alpha \in A^{(n+1)}$, an alternative to Step 2 is:

• Step 2a Get the accelerated PML iterate $\bar{x}^{(n+1)} = x^{(n+1)} + \alpha^{(n+1)} v^{(n+1)}$, where

$$\alpha^{(n+1)} = \arg\min_{\alpha} \Gamma^{(n+1)}(\alpha) \quad \text{subject to } \alpha \in A^{(n+1)} . \qquad (4.19)$$

It is important to point out that, for convenience, $\bar{x}^{(n+1)}$ has been re-defined in Step 2a. This new definition will be used throughout the rest of the dissertation. In Figure 4-2, the alternative pattern search step with the surrogate function $\Gamma^{(n+1)}$ is illustrated. In Figure 4-2 (a), a two-dimensional example of $\Phi$ is shown with a direction vector $v^{(n+1)}$. In addition, the one-dimensional slice of $\Phi$ along the direction vector, which we denote $\Phi_\alpha^{(n+1)}$, and a surrogate function $\Gamma^{(n+1)}$ that satisfies (C6) and (C7) are shown in Figure 4-2 (b).

By design, the sequence $\{\bar{x}^{(n)}\}$ produced by Steps 1, 2a, and 3 satisfies the monotonicity condition (C4). To see this fact, note that $\Phi(\bar{x}^{(n+1)}) = \Phi_\alpha^{(n+1)}(\alpha^{(n+1)})$ by the definition of $\Phi_\alpha^{(n+1)}$. By (C6), it follows that $\Phi_\alpha^{(n+1)}(\alpha^{(n+1)}) \leq \Gamma^{(n+1)}(\alpha^{(n+1)})$. Also, from the definition of $\alpha^{(n+1)}$ in (4.19) and the fact that $0 \in A^{(n+1)}$, we obtain the result $\Gamma^{(n+1)}(\alpha^{(n+1)}) \leq \Gamma^{(n+1)}(0)$. Finally, by (C7), it can be concluded that $\Phi(\bar{x}^{(n+1)}) \leq \Gamma^{(n+1)}(0) = \Phi_\alpha^{(n+1)}(0) = \Phi(x^{(n+1)})$ for all $n$.

We now present our choice for the surrogate function $\Gamma^{(n+1)}$ that satisfies (C6) and (C7). First, note that the negative log likelihood function $-\Psi$ can be expressed


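as

$$-\Psi(x) = \sum_{i=1}^{I} \left\{ [\mathcal{P}x]_i - d_i \log([\mathcal{P}x + \rho]_i) \right\} + \sum_{i=1}^{I} \left\{ \rho_i + \log(d_i!) \right\} \qquad (4.20)$$
$$= \sum_{i=1}^{I} \psi_i([\mathcal{P}x]_i) + C_5 , \qquad (4.21)$$

where $\psi_i(t) = t - d_i \log(t + \rho_i)$ and $C_5 = \sum_{i=1}^{I} \{\rho_i + \log(d_i!)\}$. Suppose a function, $\theta_i^{(n+1)}$, can be found such that

• (C8) $\theta_i^{(n+1)}(\alpha) \geq \psi_i^{(n+1)}(\alpha)$ for $\alpha \in A^{(n+1)}$
• (C9) $\theta_i^{(n+1)}(\alpha) = \psi_i^{(n+1)}(\alpha)$ for $\alpha = 0$,

where $\psi_i^{(n+1)}(\alpha) = \psi_i([\mathcal{P}x^{(n+1)}]_i + \alpha[\mathcal{P}v^{(n+1)}]_i)$. Then, the function

$$\Theta^{(n+1)}(\alpha) = \sum_{i=1}^{I} \theta_i^{(n+1)}(\alpha) + C_5 \qquad (4.22)$$

will satisfy the conditions

• (C10) $\Theta^{(n+1)}(\alpha) \geq -\Psi(x^{(n+1)} + \alpha v^{(n+1)})$ for $\alpha \in A^{(n+1)}$
• (C11) $\Theta^{(n+1)}(\alpha) = -\Psi(x^{(n+1)} + \alpha v^{(n+1)})$ for $\alpha = 0$.

A function that satisfies (C8) and (C9) is

$$\theta_i^{(n+1)}(\alpha) = \frac{\mu_i^{(n+1)}}{2} \alpha^2 + \dot{\psi}_i^{(n+1)}(0)\,\alpha + \psi_i^{(n+1)}(0) , \qquad (4.23)$$

where $\mu_i^{(n+1)} = \max\{\ddot{\psi}_i^{(n+1)}(\alpha) \text{ subject to } \alpha \in A^{(n+1)}\}$. From the definition of $\theta_i^{(n+1)}$, it is obvious that $\theta_i^{(n+1)}(0) = \psi_i^{(n+1)}(0)$. Thus, (C9) is satisfied. To see that $\theta_i^{(n+1)}$ satisfies (C8), consider the function $z_i^{(n+1)}(\alpha) = \theta_i^{(n+1)}(\alpha) - \psi_i^{(n+1)}(\alpha)$. From the definition of $\mu_i^{(n+1)}$, it is clear that $\ddot{z}_i^{(n+1)}(\alpha) \geq 0$ for $\alpha \in A^{(n+1)}$. Thus, it follows that $z_i^{(n+1)}$ is a convex function. Moreover, $\alpha = 0$ is a minimizer of $z_i^{(n+1)}$ because $\dot{z}_i^{(n+1)}(0) = 0$ by the definition of $\theta_i^{(n+1)}$. Since $z_i^{(n+1)}(0) = 0$ by (C9), it is straightforward to see that $z_i^{(n+1)}(\alpha) \geq 0$. This result implies that $\theta_i^{(n+1)}(\alpha) \geq \psi_i^{(n+1)}(\alpha)$ for all $\alpha \in A^{(n+1)}$. So, (C8) is satisfied. To calculate $\mu_i^{(n+1)}$, it is worthwhile to note that the set $A^{(n+1)}$ can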


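be written as $A^{(n+1)} = \{\alpha : L^{(n+1)} \leq \alpha \leq U^{(n+1)}\}$, where

$$L^{(n+1)} = \max_j \left\{ \frac{\zeta^{(n+1)} - x_j^{(n+1)}}{v_j^{(n+1)}} : j \text{ such that } v_j^{(n+1)} > 0 \right\} \qquad (4.24)$$
$$U^{(n+1)} = \min_j \left\{ \frac{\zeta^{(n+1)} - x_j^{(n+1)}}{v_j^{(n+1)}} : j \text{ such that } v_j^{(n+1)} < 0 \right\} . \qquad (4.25)$$

Observe that $L^{(n+1)} < 0$ and $U^{(n+1)} > 0$. Since the second derivative of $\psi_i^{(n+1)}$ is

$$\ddot{\psi}_i^{(n+1)}(\alpha) = \frac{d_i ([\mathcal{P}v^{(n+1)}]_i)^2}{([\mathcal{P}v^{(n+1)}]_i \alpha + [\mathcal{P}x^{(n+1)}]_i + \rho_i)^2} , \qquad (4.26)$$

the maximum second derivative of $\psi_i^{(n+1)}$ for $\alpha \in A^{(n+1)}$ is

$$\mu_i^{(n+1)} = \begin{cases} \dfrac{d_i ([\mathcal{P}v^{(n+1)}]_i)^2}{([\mathcal{P}v^{(n+1)}]_i L^{(n+1)} + [\mathcal{P}x^{(n+1)}]_i + \rho_i)^2} , & [\mathcal{P}v^{(n+1)}]_i > 0 \\[2ex] \dfrac{d_i ([\mathcal{P}v^{(n+1)}]_i)^2}{([\mathcal{P}v^{(n+1)}]_i U^{(n+1)} + [\mathcal{P}x^{(n+1)}]_i + \rho_i)^2} , & [\mathcal{P}v^{(n+1)}]_i < 0 \\[2ex] 0 , & [\mathcal{P}v^{(n+1)}]_i = 0 . \end{cases} \qquad (4.27)$$

It should be noted that $[\mathcal{P}v^{(n+1)}]_i \alpha + [\mathcal{P}x^{(n+1)}]_i + \rho_i > 0$ for $\alpha \in A^{(n+1)}$ (see (AS1)).

At this point, we need a surrogate function for the penalty function $\Lambda$ once again. As mentioned in Chapter 3, under assumptions (AS3)-(AS8), Huber developed a surrogate function for $\lambda$, denoted by $\lambda^{(n)}$ (see (3.14)). By the properties of the surrogate function $\lambda^{(n)}$, it is clear that the surrogate function $\Lambda^{(n)}$, which is defined by

$$\Lambda^{(n)}(x) = \sum_{j=1}^{J} \sum_{k \in N_j} \omega_{jk} \lambda^{(n)}(x_j - x_k) , \qquad (4.28)$$

satisfies $\Lambda^{(n)}(x) \geq \Lambda(x)$ for all $x$ and $\Lambda^{(n)}(x^{(n)}) = \Lambda(x^{(n)})$. Thus, $\Lambda^{(n+1)}(x^{(n+1)} + \alpha v^{(n+1)})$ will satisfy

• (C12) $\Lambda^{(n+1)}(x^{(n+1)} + \alpha v^{(n+1)}) \geq \Lambda(x^{(n+1)} + \alpha v^{(n+1)})$ for $\alpha \in A^{(n+1)}$
• (C13) $\Lambda^{(n+1)}(x^{(n+1)} + \alpha v^{(n+1)}) = \Lambda(x^{(n+1)} + \alpha v^{(n+1)})$ for $\alpha = 0$.

Finally, by (C10)-(C13), it is clear that the function

$$\Gamma^{(n+1)}(\alpha) = \Theta^{(n+1)}(\alpha) + \beta \Lambda^{(n+1)}(x^{(n+1)} + \alpha v^{(n+1)}) \qquad (4.29)$$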


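satisfies (C6) and (C7). To solve (4.19), we first determine the unconstrained minimizer of $\Gamma^{(n+1)}$:

$$\hat{\alpha}^{(n+1)} = \arg\min_{\alpha} \Gamma^{(n+1)}(\alpha) . \qquad (4.30)$$

Since $\Gamma^{(n+1)}$ is strictly convex by $\gamma(t) > 0$ for $-\infty < t < \infty$, $v^{(n+1)} \neq 0$, (AS1), and (AS2) (see Appendix D), the expression for $\hat{\alpha}^{(n+1)}$ can be found by simply computing the first derivative of $\Gamma^{(n+1)}$ and setting it to zero (see Appendix D):

$$\hat{\alpha}^{(n+1)} = - \frac{\sum_{i=1}^{I} \dot{\psi}_i^{(n+1)}(0) + \beta \sum_{j=1}^{J} \sum_{k \in N_j} \omega_{jk} \dot{\lambda}(x_j^{(n+1)} - x_k^{(n+1)})(v_j^{(n+1)} - v_k^{(n+1)})}{\sum_{i=1}^{I} \mu_i^{(n+1)} + \beta \sum_{j=1}^{J} \sum_{k \in N_j} \omega_{jk} \gamma(x_j^{(n+1)} - x_k^{(n+1)})(v_j^{(n+1)} - v_k^{(n+1)})^2} . \qquad (4.31)$$

Given (4.24), (4.25), and (4.31), the solution to the constrained optimization problem in (4.19) is

$$\alpha^{(n+1)} = \begin{cases} U^{(n+1)} , & \text{if } \hat{\alpha}^{(n+1)} > U^{(n+1)} \\ L^{(n+1)} , & \text{if } \hat{\alpha}^{(n+1)} < L^{(n+1)} \\ \hat{\alpha}^{(n+1)} , & \text{otherwise.} \end{cases} \qquad (4.32)$$

All that remains now is to show that the sequence $\{\bar{x}^{(n)}\}$ produced by Step 1, Step 2a with $\Gamma^{(n+1)}$ in (4.29), and Step 3 satisfies (C5). To see this, note that by (C6), (C7), and the definition of $\alpha^{(n+1)}$, we have the following inequality

$$\Phi(x^{(n+1)}) - \Phi(\bar{x}^{(n+1)}) \geq \Gamma^{(n+1)}(0) - \Gamma^{(n+1)}(\alpha^{(n+1)}) . \qquad (4.33)$$

Suppose, for each $n$, the function $\Gamma^{(n+1)}$ is expanded into a second-order Taylor series about the point $\alpha^{(n+1)}$ and evaluated at $\alpha = 0$. Then, the right hand side of (4.33) can be written as

$$\Gamma^{(n+1)}(0) - \Gamma^{(n+1)}(\alpha^{(n+1)}) = \dot{\Gamma}^{(n+1)}(\alpha^{(n+1)})(-\alpha^{(n+1)}) + \frac{1}{2} \ddot{\Gamma}^{(n+1)}(\tilde{\alpha}^{(n+1)})(\alpha^{(n+1)})^2 , \qquad (4.34)$$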


where $\tilde{\alpha}^{(n+1)}$ is a point between 0 and $\alpha^{(n+1)}$. By the strict convexity of $\Gamma^{(n+1)}$, $L^{(n+1)} < 0$, and $U^{(n+1)} > 0$, it follows that $\dot{\Gamma}^{(n+1)}(\alpha^{(n+1)})(-\alpha^{(n+1)}) \geq 0$ for all $n$. Thus,

$$\Gamma^{(n+1)}(0) - \Gamma^{(n+1)}(\alpha^{(n+1)}) \geq \frac{1}{2} \ddot{\Gamma}^{(n+1)}(\tilde{\alpha}^{(n+1)})(\alpha^{(n+1)})^2 . \qquad (4.35)$$

Since there exists a symmetric positive definite matrix $M$, which is independent of $n$, such that $\ddot{\Gamma}^{(n+1)}(\tilde{\alpha}^{(n+1)}) \geq 2 (v^{(n+1)})' M (v^{(n+1)})$ (see Appendix E), it follows that

$$\Gamma^{(n+1)}(0) - \Gamma^{(n+1)}(\alpha^{(n+1)}) \geq (v^{(n+1)})' M (v^{(n+1)}) (\alpha^{(n+1)})^2 . \qquad (4.36)$$

Hence, to prove (C5), we show that there exists some constant $c_2 > 0$ such that

$$(v^{(n+1)})' M (v^{(n+1)}) \geq c_2 \|v^{(n+1)}\|^2 , \qquad (4.37)$$

where we used the fact that $\|x^{(n+1)} - \bar{x}^{(n+1)}\|^2 = (\alpha^{(n+1)})^2 \|v^{(n+1)}\|^2$ by the definition of $\bar{x}^{(n+1)}$. Since $M$ is a symmetric matrix, it can be factored as $M = \Upsilon D \Upsilon'$ by the Spectral theorem [63, p. 309], where the columns of the matrix $\Upsilon$ contain orthonormal eigenvectors of $M$ and the diagonal matrix $D$ contains the corresponding eigenvalues along its diagonal. Using the fact that $\Upsilon'\Upsilon$ produces the $J \times J$ identity matrix and Rayleigh's quotient with $v^{(n+1)} = \Upsilon z^{(n+1)}$ (i.e., $z^{(n+1)} = \Upsilon' v^{(n+1)}$) [63, p. 348], it follows that

$$\frac{(v^{(n+1)})' M (v^{(n+1)})}{\|v^{(n+1)}\|^2} = \frac{(\Upsilon z^{(n+1)})' M (\Upsilon z^{(n+1)})}{(\Upsilon z^{(n+1)})' (\Upsilon z^{(n+1)})} \qquad (4.38)$$
$$= \frac{(z^{(n+1)})' D (z^{(n+1)})}{(z^{(n+1)})' (z^{(n+1)})} \qquad (4.39)$$
$$= \frac{\sum_{j=1}^{J} e_j (z_j^{(n+1)})^2}{\sum_{j=1}^{J} (z_j^{(n+1)})^2} \qquad (4.40)$$
$$\geq e_m , \qquad (4.41)$$

where $\{e_j\}_{j=1}^{J}$ are the eigenvalues of $M$ and $e_m$ is the smallest eigenvalue. Since $M$ is positive definite, $e_m$ is positive. Therefore, with $c_2 = e_m > 0$, we obtain


$(v^{(n+1)})' M (v^{(n+1)}) \geq c_2 \|v^{(n+1)}\|^2$ and

$$\Phi(x^{(n+1)}) - \Phi(\bar{x}^{(n+1)}) \geq c_2 \|x^{(n+1)} - \bar{x}^{(n+1)}\|^2 . \qquad (4.42)$$

4.3 Direction Vectors

In some algorithms, the gradient of a function is used as a direction vector, as in the method of steepest descent [55, ch. 14]. Suppose the direction vector $v^{(n+1)}$ was chosen to be the gradient of the PML objective function evaluated at the PML iterate (i.e., $v^{(n+1)} = \nabla\Phi(x^{(n+1)})$). Then, the gradient must be calculated at each step. However, the computational cost of the gradient of $\Phi$ is on the same order as the computational cost of a single PML iteration. Moreover, in experiments, the APML algorithm with $v^{(n+1)} = \nabla\Phi(x^{(n+1)})$ decreased the PML objective function more slowly than the PML algorithm.

The direction vector in the pattern search step by Hooke and Jeeves is the difference of the two most recent iterates. Specifically, the direction vector is defined by $\hat{v}^{(n+1)} = x^{(n+1)} - \bar{x}^{(n)}$. This choice can be justified in a reasonable manner. To see why, assume that a closed-form expression is available for the minimizer of $\Phi(x^{(n+1)} + \alpha v^{(n+1)})$. Then, the "best" direction vector is $(x^{(n+1)} - x^*)$ because $(x^{(n+1)} + \alpha v^{(n+1)})$ "contains" the point $x^*$ (note: $x^*$ would result with $\alpha = -1$), where $x^*$ is the minimizer of the PML objective function. However, $x^*$ is not known at the $n$th iteration. The "best" estimate of $x^*$ at the $n$th iteration is $\bar{x}^{(n)}$, namely the accelerated iterate at the $(n-1)$th iteration.

In this section, we introduce a direction vector that works better than Hooke and Jeeves' direction vector in terms of convergence rate. The direction vector we choose is a simple variation of $\hat{v}^{(n+1)}$ that is not computationally expensive (note: $J$ subtractions are required for $\hat{v}^{(n+1)}$). Since there are no positrons emitted outside the subject being scanned, the PML estimate will contain many values near zero. This claim is supported by Figure 4-3. In the figure,


Figure 4-3: PML iterates are shown: (a) the 13th PML iterate, (b) the 14th PML iterate, (c) the 15th PML iterate, and (d) the 1000th PML iterate. The images were generated using real thorax phantom data. The plane considered contains activity due to the heart, lungs, spine, and background. For display purposes, all the images were adjusted so that they have the same dynamic range.


PML images corresponding to different iteration numbers are shown. The images were generated by applying the PML algorithm to real thorax phantom data (scan duration was 14 minutes) with a uniform initial estimate. The penalty parameter was $\beta = 0.02$ and $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 50$ was the penalty function. As can be seen in Figure 4-3 (a), (b), and (c), the early iterates contain values near zero outside of the body. Consider Figure 4-3 (d), which is the 1000th PML iterate (practically speaking, the iterate is the minimizer of $\Phi$ because the objective function did not decrease after the 791st iteration up to the 5000th iteration). The 1000th iterate also contains many values near zero outside the body. Inside the body, on the other hand, the 1000th iterate is very different from the early iterates. From these observations, it can be said that the convergence rate varies significantly between voxels inside the body and voxels outside the body.

Because the voxels outside the body converge faster than the voxels inside the body, and converge to values near zero (i.e., near the boundary of the set $\{x : x \geq 0\}$), we claim that it is better to search for an accelerated iterate along the boundary whenever the current iterate is "near" the boundary. By "boundary", we mean the set $\{x \geq 0 : x_j = 0 \text{ for some } j\}$. This can be explained by the example shown in Figure 4-4. In Figure 4-4 (a), the second PML iterate is heading toward the $x_2$-axis. If we perform the acceleration step with the direction vector $\hat{v}^{(n+1)}$, then the accelerated iterate will lie on the $x_2$-axis as shown in Figure 4-4 (b). A better direction to be searched is the one that is parallel to the $x_2$-axis as shown in Figure 4-4 (c). The principle of the proposed direction vector is to exclude the coordinates corresponding to the voxel values that are "near" the boundary.
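The principle of excluding near-boundary coordinates can be sketched in a few lines. This is an illustrative reading only: the function name `masked_direction`, the threshold name `eps`, and the choice to test the most recent standard iterate against the threshold are assumptions made here, and the example vectors are hypothetical.

```python
import numpy as np

def masked_direction(x_next, x_bar_prev, eps):
    """Difference of the two most recent iterates, with the coordinates of
    near-boundary voxels (x_next < eps) set to zero."""
    v_hat = x_next - x_bar_prev            # pattern-search direction
    return np.where(x_next < eps, 0.0, v_hat)

# Hypothetical iterates: voxels 0 and 2 are already close to zero.
x_next = np.array([0.001, 3.0, 0.0005, 1.2])
x_bar_prev = np.array([0.01, 2.5, 0.002, 1.0])
v = masked_direction(x_next, x_bar_prev, eps=0.01)
```

With `eps=0.0` no coordinate is masked and the result reduces to the plain difference of iterates, so the extra cost over the unmasked direction is one comparison per voxel.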
An easy way to incorporate this idea is to determine the voxels whose values are less than a small positive value, $\epsilon$. Specifically, the proposed direction, $v^{(n+1)}$, is

$$v_j^{(n+1)} = \begin{cases} 0 , & \text{if } x_j^{(n+1)} < \epsilon \\ \hat{v}_j^{(n+1)} , & \text{otherwise,} \end{cases} \qquad (4.43)$$

where $\epsilon \geq 0$ is a user-defined parameter (note: $v^{(n+1)} = \hat{v}^{(n+1)}$ if $\epsilon = 0$) and $j = 1, 2, \ldots, J$.

Figure 4-4: Direction vectors: (a) standard PML iterates are shown, (b) an accelerated iterate (the single solid circle) is shown by using the direction vector $\hat{v}^{(n+1)}$, and (c) an accelerated iterate (the single dotted circle) is shown by using the proposed direction vector $v^{(n+1)}$. The double circles denote PML iterates. The mark × denotes the minimizer of the function subject to the constraints $x_1 \geq 0$ and $x_2 \geq 0$.

4.4 Properties of the APML Algorithm

We now provide a summary of the desirable properties of the APML algorithm:

• The APML algorithm needs only one additional parameter (i.e., $\epsilon$), whereas ordered-subsets based algorithms [33,35] need at least three extra parameters (i.e., a relaxation parameter, the number of subsets, and a small positive number that is assigned to any nonpositive elements of an iterate).
• In experiments, the proposed direction vector performed better than the direction vector put forth by Hooke and Jeeves [41, pp. 287-291].
• The APML algorithm monotonically decreases the PML objective function, unlike the algorithms in [23,35].


• The APML algorithm theoretically guarantees nonnegative iterates, whereas some algorithms [33,35] set any negative element of the iterates to a small positive number.
• The APML algorithm can incorporate a large class of edge-preserving penalty functions, unlike the algorithm by De Pierro [30].
• The APML algorithm converges to the minimizer of the PML objective function. Convergence proofs for the algorithms in [23,33] are not available.
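As a closing numerical check for this chapter, the quadratic surrogate for a single log-likelihood term along the search direction can be verified to lie above that term on the whole step interval. The sketch below assumes one detector pair with hypothetical values: `a` stands for the projection of the current iterate, `b` for the projection of the direction vector, `d` for the measured count, `rho` for the background term, and `lower` and `upper` for the ends of the step interval. All names are illustrative.

```python
import numpy as np

def psi(alpha, a, b, d, rho):
    """psi(alpha) = t - d*log(t + rho) with t = a + alpha*b."""
    t = a + alpha * b
    return t - d * np.log(t + rho)

def majorizer(alpha, a, b, d, rho, lower, upper):
    """Quadratic that matches psi and its slope at alpha = 0 and uses the
    largest curvature of psi on [lower, upper]."""
    # The curvature d*b**2/(a + alpha*b + rho)**2 peaks where the
    # denominator is smallest: at lower if b > 0, at upper if b < 0.
    end = lower if b > 0 else upper
    mu = 0.0 if b == 0 else d * b**2 / (b * end + a + rho) ** 2
    slope = b * (1.0 - d / (a + rho))      # derivative of psi at alpha = 0
    return 0.5 * mu * alpha**2 + slope * alpha + psi(0.0, a, b, d, rho)

# Hypothetical values; a + lower*b + rho must stay positive.
a, b, d, rho = 4.0, 1.5, 6.0, 0.5
lower, upper = -2.0, 1.0
alphas = np.linspace(lower, upper, 201)
gap = majorizer(alphas, a, b, d, rho, lower, upper) - psi(alphas, a, b, d, rho)
gap_at_zero = majorizer(0.0, a, b, d, rho, lower, upper) - psi(0.0, a, b, d, rho)
```

The array `gap` is nonnegative over the interval and `gap_at_zero` vanishes, which are the two properties required of each term of the surrogate.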


CHAPTER 5
QUADRATIC EDGE PRESERVING ALGORITHM

In this chapter, we present a regularized image reconstruction algorithm that aims to preserve edges in the reconstructed images so that fine details are more resolvable. We refer to the proposed algorithm as the quadratic edge preserving (QEP) algorithm. The QEP algorithm results via a certain modification of the surrogate function $\Phi^{(n)}$ for the PML objective function. It should be mentioned that the QEP algorithm was first introduced in [18].

Recall that the $n$th surrogate function for the PML objective function $\Phi$, $\Phi^{(n)}$, can be expressed as $\Phi^{(n)}(x) = \sum_{j=1}^{J} \phi_j^{(n)}(x_j) + C^{(n)}$ (see (3.29) and (3.35)). For the discussion to follow, it will be convenient to express $\phi_j^{(n)}$ in (3.35) in a different manner

$$\phi_j^{(n)}(t) = E_j^{(n)} \log(t) + F_j^{(n)} t^2 + G_j^{(n)} t \qquad (5.1)$$
$$= E_j^{(n)} \log(t) + t \sum_{i=1}^{I} \mathcal{P}_{ij} + 2\beta \sum_{k \in N_j} [\ldots]$$

and $m_{jk}^{(n)}$, $E_j^{(n)}$, $F_j^{(n)}$, and $G_j^{(n)}$ are defined in (3.24), (3.30), (3.31), and (3.32), respectively. Note that the $j$th element of the next PML iterate $x^{(n+1)}$, $x_j^{(n+1)}$, is defined to be the nonnegative minimizer of the function $\phi_j^{(n)}$. The function $h_{jk}^{(n)}$ is quadratic so its aperture, $\omega_{jk}\gamma(x_j^{(n)} - x_k^{(n)})$, and minimizer, $m_{jk}^{(n)}$, are expected to play key roles in the regularization process (for a quadratic function, $f(t) = at^2 + bt + c$, the constant $a$ is called the aperture of the function). To see the role of the function $h_{jk}^{(n)}$ in determining $x_j^{(n+1)}$, suppose $\beta = 0$ in (5.5). Then, $\phi_j^{(n)}(t) = l_j^{(n)}(t)$ and the minimizer of the function $\phi_j^{(n)}(t)$ is the "pure" log likelihood iterate (i.e., the minimizer of $l_j^{(n)}$). For $\beta \neq 0$, intuitively speaking, it is evident that the functions $\{h_{jk}^{(n)}\}_{k \in N_j}$ act to bias $x_j^{(n+1)}$ from the pure log likelihood iterate towards the minimizers of $\{h_{jk}^{(n)}\}_{k \in N_j}$ (observe: the last term in (5.5) is independent of $x$). The degree of influence that the functions $\{h_{jk}^{(n)}\}_{k \in N_j}$ possess is controlled by their apertures and the penalty parameter $\beta$.

To highlight the role of the function $\gamma$, we let $\beta = \frac{1}{2}$ and $\omega_{jk} = 1$ for all $j$ and $k$. In this case, the aperture of $h_{jk}^{(n)}$ equals $\gamma(x_j^{(n)} - x_k^{(n)})$. For the quadratic penalty function (i.e., $\lambda(t) = t^2$), the aperture of $h_{jk}^{(n)}$ is a constant for all $j$, $k$, and $n$. Consequently, for all $k \in N_j$, the function $h_{jk}^{(n)}$ has the same degree of influence regardless of the absolute difference between $x_j^{(n)}$ and $x_k^{(n)}$. Practically speaking, this means that the quadratic penalty will overly smooth edges. To preserve edges, it would be helpful to lessen the degree of influence of the functions $\{h_{jk}^{(n)}\}_{k \in N_j}$ whenever the absolute difference between $x_j^{(n)}$ and $x_k^{(n)}$ is sufficiently large. Figure 5-1 is a plot of $\gamma$ when $\lambda(t) = \log(\cosh(t/\delta))$ (recall: $\gamma(t) = \dot{\lambda}(t)/t$ in (AS6)) is used in the penalty function with $\delta = 10, 50, 100, 500$. From the figure, it can be seen that $\gamma$ becomes very small compared with $\gamma(0)$ for $|t| \gg \delta$. This means that, when $|x_j^{(n)} - x_k^{(n)}| \gg \delta$, $h_{jk}^{(n)}$ will have a "very small" aperture, and consequently the function will not have much influence on $x_j^{(n+1)}$. Said another way, the log-cosh penalty function helps preserve edges whose "heights" are on the order of $\delta$.
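The decay of $\gamma$ for the log-cosh penalty can be made concrete. Since $\dot{\lambda}(t) = \tanh(t/\delta)/\delta$ for $\lambda(t) = \log(\cosh(t/\delta))$, the relation $\gamma(t) = \dot{\lambda}(t)/t$ gives $\gamma(t) = \tanh(t/\delta)/(\delta t)$ with limit $\gamma(0) = 1/\delta^2$. The sketch below evaluates this; the function name `gamma_logcosh` is an illustrative choice.

```python
import numpy as np

def gamma_logcosh(t, delta):
    """gamma(t) = tanh(t/delta) / (delta * t), with gamma(0) = 1/delta**2."""
    t = np.asarray(t, dtype=float)
    safe_t = np.where(t == 0.0, 1.0, t)     # placeholder to avoid 0/0
    return np.where(t == 0.0,
                    1.0 / delta**2,
                    np.tanh(safe_t / delta) / (delta * safe_t))

# Relative aperture gamma(t)/gamma(0) at a difference of ten times delta.
ratios = {}
for delta in (10.0, 50.0, 100.0, 500.0):
    ratios[delta] = float(gamma_logcosh(10.0 * delta, delta)
                          / gamma_logcosh(0.0, delta))
```

The ratio $\gamma(t)/\gamma(0)$ depends only on $t/\delta$ and equals $\tanh(s)/s$ with $s = t/\delta$, so at $s = 10$ the aperture has dropped to about one tenth of its value at zero for every $\delta$.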


Figure 5-1: The function $\gamma$ is shown when $\lambda(t) = \log(\cosh(t/\delta))$ is used as the penalty function with: (a) $\delta = 10$, (b) $\delta = 50$, (c) $\delta = 100$, and (d) $\delta = 500$.

We will now move the discussion from the aperture of the function $h_{jk}^{(n)}$ to the minimizer of $h_{jk}^{(n)}$. As stated previously, the minimizers of the functions $\{h_{jk}^{(n)}\}_{k \in N_j}$ play an important role in the regularization procedure. More specifically, when the aperture of $h_{jk}^{(n)}$ is sufficiently large, the $(n+1)$st iterate is biased towards the minimizer of $h_{jk}^{(n)}$, which is $m_{jk}^{(n)} = (x_j^{(n)} + x_k^{(n)})/2$. As a result, there is inherent averaging that takes place with the PML algorithm.

Consider a penalty function, [...] where, for certain functions [...]. In order to better preserve edges, we believe an improvement would be to construct $\sigma_{jk}^{(n)}$ so that it has the same aperture as $h_{jk}^{(n)}$, but a different minimizer. The minimizer of $\sigma_{jk}^{(n)}$, which would depend on whether an edge is present, is chosen to be

$$u_{jk}^{(n)} = x_j^{(n)} + \eta(x_k^{(n)} - x_j^{(n)}) , \qquad (5.9)$$


where $\eta$ is a function such that [...] $= 800$ (value is chosen arbitrarily) and $\eta(t) = 250\tanh([\ldots])$. It can be observed that $u_{jk}^{(n)}$ is approximately $x_j^{(n)} + 250$ when $x_k^{(n)}$ is larger than $x_j^{(n)} + 250$ and is approximately $x_j^{(n)} - 250$ when $x_k^{(n)}$ is less than $x_j^{(n)} - 250$. In other words, the function $\eta$ prevents $u_{jk}^{(n)}$ from being overly biased towards $x_k^{(n)}$ when the absolute difference between $x_j^{(n)}$ and $x_k^{(n)}$ is greater than 250.

Using the edge preserving penalty function $\Lambda_{ep}^{(n)}$, an algorithm for obtaining regularized estimates of the emission means follows:

• Step 1 Let $x^{(0)} > 0$ be the initial estimate.
• Step 2 Construct $\Lambda_{ep}^{(n)}(x)$ from the current estimate $x^{(n)}$ using (5.8), (5.9), (5.10), and (5.11).
• Step 3 Compute $x^{(n+1)} = \arg\min E^{(n)}(x)$ subject to $x \geq 0$, where $E^{(n)}(x) = -\Psi(x) + \beta\Lambda_{ep}^{(n)}(x)$.
• Step 4 Iterate between Steps 2 and 3.


Figure 5-2: Plots of $m_{jk}^{(n)}$ and $u_{jk}^{(n)}$ versus $x_k^{(n)}$ are shown, where $x_j^{(n)} = 800$ is fixed and $\eta(t) = 250\tanh([\ldots])$.

It turns out that a closed-form expression for the problem in Step 3 is not possible. An alternative to Step 3 is:

• Step 3a Find $x^{(n+1)} > 0$ such that $E^{(n)}(x^{(n+1)}) \leq E^{(n)}(x^{(n)})$.

The problem in Step 3a can be solved by defining $x^{(n+1)}$ as

$$x^{(n+1)} = \arg\min_{x \geq 0} \Phi_{ep}^{(n)}(x) , \qquad (5.12)$$

where

$$\Phi_{ep}^{(n)}(x) = -\Psi^{(n)}(x) + \beta\Lambda_{ep}^{(n)}(x) \qquad (5.13)$$

and $\Psi^{(n)}$ is defined in (3.12). Defining the iterates according to (5.12) and (5.13) ensures that $E^{(n)}(x^{(n+1)}) \leq E^{(n)}(x^{(n)})$. To see this fact, note that $E^{(n)}(x^{(n+1)}) \leq \Phi_{ep}^{(n)}(x^{(n+1)})$ because $\Psi^{(n)}(x) \leq \Psi(x)$ for all $x \geq 0$. Since $\Phi_{ep}^{(n)}(x^{(n+1)}) \leq \Phi_{ep}^{(n)}(x^{(n)})$ by (5.12), it follows that

$$E^{(n)}(x^{(n+1)}) \leq \Phi_{ep}^{(n)}(x^{(n+1)}) \leq \Phi_{ep}^{(n)}(x^{(n)}) = E^{(n)}(x^{(n)}) . \qquad (5.14)$$


All that remains now is to solve the optimization problem in (5.12). Observe that $\Phi_{ep}^{(n)}$ can be written as

$$\Phi_{ep}^{(n)}(x) = -\psi^{(n)}(x) + \beta\Lambda_{ep}^{(n)}(x) \quad (5.15)$$

and that, once the terms are collected voxel by voxel in (5.16) and (5.17), it takes the form

$$\Phi_{ep}^{(n)}(x) = \sum_{j=1}^{J}\left\{E_j^{(n)}\log(x_j) + F_j^{(n)}x_j^2 + H_j^{(n)}x_j\right\} + C^{(n)}, \quad (5.18)$$

where $C^{(n)}$ does not depend on $x$, the coefficient $H_j^{(n)}$ is given in (5.19) and (5.20), and the remaining quantities are those defined in (3.13), (3.30), and (3.31). Since $\Phi_{ep}^{(n)}$ is de-coupled, it follows that the solution to (5.12) is given by

$$x_j^{(n+1)} = \arg\min_{x_j \ge 0}\left\{E_j^{(n)}\log(x_j) + F_j^{(n)}x_j^2 + H_j^{(n)}x_j\right\}, \quad j = 1, 2, \ldots, J. \quad (5.21)$$

Repeating the steps used to derive (3.37), it is straightforward to see that the solution to the optimization problem in (5.21) is

$$x_j^{(n+1)} = \frac{-H_j^{(n)} + \sqrt{(H_j^{(n)})^2 - 8E_j^{(n)}F_j^{(n)}}}{4F_j^{(n)}}, \quad j = 1, 2, \ldots, J. \quad (5.22)$$
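The closed-form update (5.22) can be checked numerically. The sketch below assumes, as an illustration only, that $E_j^{(n)} < 0$ and $F_j^{(n)} > 0$ (so the one-dimensional objective in (5.21) is convex and the positive root is the minimizer); the function names and the numerical coefficients are hypothetical and are not taken from the dissertation.

```python
import math

def qep_voxel_update(E, F, H):
    """Minimize E*log(x) + F*x**2 + H*x over x > 0 (assumes E < 0 and F > 0)."""
    if not (E < 0 and F > 0):
        raise ValueError("this sketch assumes E < 0 and F > 0")
    # Setting the derivative E/x + 2*F*x + H to zero gives 2*F*x**2 + H*x + E = 0.
    disc = H * H - 8.0 * E * F  # larger than H**2 because E*F < 0
    return (-H + math.sqrt(disc)) / (4.0 * F)

def objective(x, E, F, H):
    """One-dimensional objective of (5.21) for a single voxel."""
    return E * math.log(x) + F * x * x + H * x

# Hypothetical coefficients for one voxel.
E, F, H = -50.0, 0.01, -3.0
x_star = qep_voxel_update(E, F, H)
```

Because each voxel is handled independently, the same function would simply be applied to every $j$ in one sweep.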


CHAPTER 6
JOINT EMISSION MEAN AND PROBABILITY MATRIX ESTIMATION

In Chapters 3, 4, and 5, we assumed that the probability matrix $\mathcal{P}$ accounts for errors due to attenuation, detector inefficiency, detector penetration, non-collinearity, and scatter. In this chapter, we instead assume that an $I \times J$ corrected matrix $\mathcal{P}^c$ is available that accounts for errors due to attenuation, detector inefficiency, and detector penetration, and we develop a method for correcting errors caused by scatter and non-collinearity. Before presenting the proposed method, we briefly review standard approaches for obtaining $\mathcal{P}^c$ in practice.

The $(i,j)$th element of $\mathcal{P}^c$, $\mathcal{P}^c_{ij}$, denotes the probability that an annihilation in the $j$th voxel leads to a photon pair being recorded by the $i$th detector pair when there are no errors due to scatter and non-collinearity. In practice, one can reliably estimate the probability matrix $\mathcal{P}^c$ by using the detector penetration correction method in [7] with the attenuation correction method in [9] and the detector inefficiency correction method in [15]. However, other correction methods for detector penetration, attenuation, and detector inefficiency could be used in conjunction.

To address attenuation errors, two scans, known as a transmission scan and a blank scan, are taken. During a transmission scan, one to three rotating rods filled with positron-emitting isotopes rotate outside the subject, and the transmission data are the coincidence sums. A blank scan is taken in the same way as the transmission scan except that there is no subject inside the scanner. The coincidence sums during the blank scan form the blank scan data. Given the transmission and blank scan data, an estimate of the attenuation map can be generated. Using the estimated attenuation map, attenuation correction factors are computed and used for attenuation correction. Numerous attenuation estimation methods have been proposed [8-10].


Figure 6-1: A simple example to illustrate the geometry of PET image reconstruction with three voxels ($v_1$, $v_2$, and $v_3$) and three detector pairs ($(a_1,b_1)$, $(a_2,b_2)$, and $(a_3,b_3)$). The dashed lines define the tubes of the detector pairs.

To address errors due to detector inefficiency, a blank scan of extremely long duration is performed. From the blank scan, the relative efficiency of the detector pairs can be estimated. Recall that the efficiency of a detector pair is defined to be the probability that a coincidence is recorded when a photon pair is incident on the detectors. Consider a set of detector pairs where each detector pair has the same spatial extent. One simple way to estimate the efficiency of a detector pair in the set is to define its efficiency to be the ratio of the number of photon pairs recorded by that detector pair to the mean number of photon pairs recorded by a detector pair in the set. Once the estimates of the efficiencies for all detector pairs are available, they can be incorporated into the probability matrix. More complicated correction methods for detector inefficiency, such as those in [13-15], have also been proposed.

As discussed in Section 1.3, two photons generated by an annihilation usually do not propagate in exactly opposite directions, which is called non-collinearity of the line-of-response [5]. As illustrated in Figure 1-5, it is possible that an annihilation


is recorded by detector pair $i_1$ given that the annihilation would have been recorded by detector pair $i_2$ if both photons propagate in exactly opposite directions, where $i_1 \ne i_2$.

Since scatter depends on the activity and attenuation within the subject and on the scanner design, it is not straightforward to correct for errors due to scatter. Photons lose energy when they undergo Compton scattering. Thus, the energy of detected photons may be used to discriminate between unscattered photons and scattered photons [11, pp. 65-69]. Scatter correction methods that are based on multiple (two or more) energy windows have a limitation because detectors have finite energy resolution [46,47]. As discussed in Chapter 2, state-of-the-art scatter correction methods, such as the method in [52] where the scatter distribution is calculated using an analytical equation, are not practical because of their extremely high computational cost. Consequently, a simple scatter correction method would be beneficial to the PET community.

In this chapter, we propose a method that estimates the probability matrix in such a way that errors due to scatter and non-collinearity are addressed. In Section 6.1, we first propose a model for emission data that accounts for errors due to scatter and non-collinearity. Then, in Section 6.2, we present a method where the unknown emission mean vector and an unknown matrix in the proposed model are jointly estimated. Finally, in Section 6.3, we propose an algorithm, referred to as the probability correction in projection space (PCiPS) algorithm, that estimates the unknown emission mean vector and the unknown matrix in the proposed model.

6.1 Scatter Matrix Model

Consider Figure 6-1, in which a simplified PET scanner consisting of three detector pairs is depicted. For this discussion, we will focus on the voxels $v_1$, $v_2$, and $v_3$ shown in Figure 6-1. Let the detector pairs $(a_1,b_1)$, $(a_2,b_2)$, and $(a_3,b_3)$ be the first, second, and third detector pairs, respectively. For the scanner in Figure 6-1, we


assume that

$$\mathcal{P}^c = \begin{bmatrix} 0.04 & 0 & 0 \\ 0 & 0.04 & 0 \\ 0.03 & 0.03 & 0.02 \end{bmatrix}. \quad (6.1)$$

Suppose that during a scan there are no errors due to scatter and non-collinearity. Then, the mean number of photon pairs recorded by the detector pair $(a_1,b_1)$ is $\mathcal{P}^c_{11}x_1$, where $x_1$ is the mean number of positrons emitted from voxel $v_1$. The mean numbers of photon pairs recorded by the detector pairs $(a_2,b_2)$ and $(a_3,b_3)$ are $\mathcal{P}^c_{22}x_2$ and $(\mathcal{P}^c_{31}x_1 + \mathcal{P}^c_{32}x_2 + \mathcal{P}^c_{33}x_3)$, respectively, where $x_2$ and $x_3$ are the mean numbers of positrons emitted from voxels $v_2$ and $v_3$, respectively.

As discussed in Section 1.3, a photon's original flight path is altered when it undergoes Compton scattering. Consequently, an annihilation in $v_1$ that would have been recorded by detector pair $(a_1,b_1)$ if both photons had not undergone Compton scattering may instead be recorded by detector pair $(a_2,b_2)$ when scatter occurs. When we consider non-collinearity, it is also possible that an annihilation in $v_1$ is recorded by detector pair $(a_2,b_2)$ given that the annihilation would have been recorded by detector pair $(a_1,b_1)$ if both photons propagate in exactly opposite directions. Let $\mathcal{K}_{i_1 i_2}$ denote the conditional probability that an annihilation is recorded by detector pair $i_1$ given that the annihilation would have been recorded by detector pair $i_2$ if both photons produced by the annihilation had not undergone Compton scattering and both photons propagate in exactly opposite directions, where $i_1 = 1, 2, \ldots, I$ and $i_2 = 1, 2, \ldots, I$. It is through these unknown probabilities $\{\mathcal{K}_{i_1 i_2}\}$ that we model scatter and non-collinearity. Now, accounting for scatter and non-collinearity, the mean number of photon pairs recorded by detector pair $(a_1,b_1)$ is $\{\mathcal{K}_{11}\mathcal{P}^c_{11}x_1 + \mathcal{K}_{12}\mathcal{P}^c_{22}x_2 + \mathcal{K}_{13}(\mathcal{P}^c_{31}x_1 + \mathcal{P}^c_{32}x_2 + \mathcal{P}^c_{33}x_3)\}$. Moreover, the mean numbers of photon pairs recorded by detector pairs $(a_2,b_2)$ and $(a_3,b_3)$ are $\{\mathcal{K}_{21}\mathcal{P}^c_{11}x_1 + \mathcal{K}_{22}\mathcal{P}^c_{22}x_2 + \mathcal{K}_{23}(\mathcal{P}^c_{31}x_1 + \mathcal{P}^c_{32}x_2 + \mathcal{P}^c_{33}x_3)\}$ and $\{\mathcal{K}_{31}\mathcal{P}^c_{11}x_1 + \mathcal{K}_{32}\mathcal{P}^c_{22}x_2 + \mathcal{K}_{33}(\mathcal{P}^c_{31}x_1 + \mathcal{P}^c_{32}x_2 + \mathcal{P}^c_{33}x_3)\}$,


respectively. In matrix notation, the mean of the data $E[D]$ can be expressed as

$$E[D] = \begin{bmatrix} \mathcal{K}_{11} & \mathcal{K}_{12} & \mathcal{K}_{13} \\ \mathcal{K}_{21} & \mathcal{K}_{22} & \mathcal{K}_{23} \\ \mathcal{K}_{31} & \mathcal{K}_{32} & \mathcal{K}_{33} \end{bmatrix} \begin{bmatrix} 0.04 & 0 & 0 \\ 0 & 0.04 & 0 \\ 0.03 & 0.03 & 0.02 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad (6.2)$$

$$= \mathcal{K}\mathcal{P}^c x, \quad (6.3)$$

where $x = [x_1\ x_2\ x_3]'$ and the $I \times I$ matrix $\mathcal{K}$ denotes the first matrix on the right-hand side of (6.2). We will refer to $\mathcal{K}$ as the scatter matrix. Since the mean of the data can be expressed as $\mathcal{K}\mathcal{P}^c x$ in this model, we define the "true" probability matrix as

$$\mathcal{P}^{true} \triangleq \mathcal{K}\mathcal{P}^c, \quad (6.4)$$

where the $(i,j)$th element of $\mathcal{P}^{true}$, $\mathcal{P}^{true}_{ij}$, denotes the probability that an annihilation in the $j$th voxel leads to a photon pair being recorded by the $i$th detector pair. To our knowledge, the scatter matrix model we propose is not available in the literature. However, a similar factorization of $\mathcal{P}^{true}$ in (6.4) can be found in [28,53,64,65]. With the definition of $\mathcal{P}^{true}$ in (6.4), the mean number of photon pairs recorded by the $i$th detector pair can be expressed as

$$[\mathcal{P}^{true}x]_i = \sum_{j=1}^{J}\mathcal{P}^{true}_{ij}x_j \quad (6.5)$$

$$= \sum_{j=1}^{J}\sum_{i_2=1}^{I}\mathcal{K}_{i i_2}\mathcal{P}^c_{i_2 j}x_j \quad (6.6)$$

$$= \sum_{i_2=1}^{I}\mathcal{K}_{i i_2}[\mathcal{P}^c x]_{i_2}. \quad (6.7)$$

In other words, the mean number of photon pairs recorded by the $i$th detector pair is a weighted sum of the mean numbers of photon pairs recorded by all detector pairs when there are no errors due to scatter and non-collinearity.

Regarding the matrix $\mathcal{K}$, two constraints are necessary. Since $\mathcal{K}$ is a probability matrix, it must be true that $0 \le \mathcal{K}_{i_1 i_2} \le 1$ for all $i_1$ and $i_2$. Recall that $\mathcal{K}_{i_1 i_2}$ is the conditional probability that


an annihilation is recorded by detector pair $i_1$ given that the annihilation would have been recorded by detector pair $i_2$ if both photons produced by the annihilation had not undergone Compton scattering and both photons propagate in exactly opposite directions. Since a photon pair that is recorded by detector pair $i_1$ cannot be recorded by the other detector pairs, we require that

$$\sum_{i_1=1}^{I}\mathcal{K}_{i_1 i_2} \le 1 \quad \text{for all } i_2. \quad (6.8)$$

6.2 Joint Minimum Kullback-Leibler Distance Method

We now consider the Poisson model by Shepp and Vardi [16]. In their model, the emission data $d$ is an observation of a random vector $D$ that is Poisson distributed with mean $(\mathcal{P}^{true}x + \rho)$, where $x$ is the unknown emission mean vector and $\rho$ is the known mean accidental coincidence rate. Since $d$ is the ML estimate of the unknown mean $(\mathcal{P}^{true}x + \rho)$ $^1$, it can be said that $d \approx (\mathcal{P}^{true}x + \rho)$. Using the definition in (6.4), we can state that

$$d \approx \mathcal{K}\mathcal{P}^c x + \rho. \quad (6.9)$$

It is important to point out that there are two unknowns in (6.9): the scatter matrix $\mathcal{K}$ and the emission mean vector $x$. Thus, given the emission data $d$, mean accidental coincidence rate $\rho$, and probability matrix $\mathcal{P}^c$, the problem of interest is to estimate the emission mean $x$ and scatter matrix $\mathcal{K}$. We define the estimate $(\hat{x}, \hat{\mathcal{K}})$ to be a

$^1$ If $d$ is an observation of a random vector $D$ that is Poisson distributed with unknown mean $\lambda$, then the likelihood function for $d$ is $\Pr\{D = d \mid \lambda\} = \prod_i \lambda_i^{d_i}e^{-\lambda_i}/d_i!$. Since $\lambda = d$ maximizes the log likelihood function $\log\Pr\{D = d \mid \lambda\}$, $d$ is the ML estimate of $\lambda$.


minimizer of the Kullback-Leibler (KL) distance [66] between $d$ and $(\mathcal{K}\mathcal{P}^c x + \rho)$:

$$(\hat{x}, \hat{\mathcal{K}}) = \arg\min_{(x,\mathcal{K})} KL(d, \mathcal{K}\mathcal{P}^c x + \rho) \quad \text{subject to } x \ge 0,\ \mathcal{K} \ge 0, \text{ and } \sum_{i_1=1}^{I}\mathcal{K}_{i_1 i_2} \le 1 \text{ for all } i_2, \quad (6.10)$$

where the KL distance between $a$ and $b$ is defined by

$$KL(a, b) = \sum_i\left\{a_i\log\frac{a_i}{b_i} - a_i + b_i\right\} \quad (6.11)$$

and $\mathcal{K} \ge 0$ means that $\mathcal{K}_{i_1 i_2} \ge 0$ for all $i_1$ and $i_2$. The definition of the estimate $(\hat{x}, \hat{\mathcal{K}})$ is motivated in part by the fact that, in [26], Byrne derived Shepp and Vardi's MLEM algorithm [16] by minimizing the KL distance between the emission data $d$ and $\mathcal{P}^{true}x$ using the alternating projection algorithm by Csiszar and Tusnady [67], where $\mathcal{P}^{true}$ is assumed to be known. It should be mentioned that Byrne derived the MLEM algorithm for the case $\rho = 0$.

6.3 Probability Correction in Projection Space (PCiPS) Algorithm

Since it is difficult to solve the problem in (6.10), we propose an alternating minimization algorithm where, in a repetitive fashion, one of the two unknowns (i.e., $\mathcal{K}$ and $x$) is estimated while the other is fixed. Suppose an initial estimate for the scatter matrix $\mathcal{K}$ (e.g., an $I \times I$ identity matrix), denoted by $\mathcal{K}^{(0)}$, and an initial estimate $x^{(0)} > 0$ are available. Then, for $n = 0, 1, 2, \ldots$, the steps of the proposed algorithm are
• Step 1 Get the current estimate of the emission mean vector, $x^{(n+1)}$, using $\mathcal{K}^{(n)}$, the current estimate of the scatter matrix $\mathcal{K}$:

$$x^{(n+1)} = \arg\min_{x \ge 0} KL(d, \mathcal{K}^{(n)}\mathcal{P}^c x + \rho). \quad (6.12)$$


• Step 2 Get the current estimate of the scatter matrix, $\mathcal{K}^{(n+1)}$, using $x^{(n+1)}$, the current estimate of the emission mean vector $x$:

$$\mathcal{K}^{(n+1)} = \arg\min_{\mathcal{K} \ge 0} KL(d, \mathcal{K}\mathcal{P}^c x^{(n+1)} + \rho) \quad \text{subject to } \sum_{i_1=1}^{I}\mathcal{K}_{i_1 i_2} \le 1 \text{ for all } i_2. \quad (6.13)$$

• Step 3 Repeat the steps above until some chosen stopping criterion is met.

Note that the problem in Step 1 can be solved using the MLEM algorithm [16,26]. This is because Byrne showed that Shepp and Vardi's MLEM algorithm [16] minimizes the KL distance between the emission data $d$ and $\mathcal{P}^{true}x$, where $\mathcal{P}^{true}$ is known. Specifically, given an initial guess $x^{(n,0)} > 0$, the iteration for obtaining $x^{(n+1)}$ is

$$x_j^{(n,m+1)} = \frac{x_j^{(n,m)}}{\sum_{i=1}^{I}[\mathcal{K}^{(n)}\mathcal{P}^c]_{ij}}\sum_{i=1}^{I}\frac{d_i[\mathcal{K}^{(n)}\mathcal{P}^c]_{ij}}{[\mathcal{K}^{(n)}\mathcal{P}^c x^{(n,m)} + \rho]_i}, \quad j = 1, 2, \ldots, J, \quad (6.14)$$

for $m = 0, 1, 2, \ldots, M_1$. The current estimate of $x$ is defined to be $x^{(n+1)} = x^{(n,M_1+1)}$, the final iterate of (6.14).

One of the issues surrounding the constrained minimization problem in (6.13) is that it is impossible to estimate the scatter matrix $\mathcal{K}$ when all of the elements in the matrix are assumed to be unknown. This is because there would be too many unknowns, which would result in the problem being under-determined (note: the dimension of $\mathcal{K}$ is $I \times I$, while there are only $I$ data points). To reduce the number of unknowns in $\mathcal{K}$, we assume that $\mathcal{K}_{i_1 i_2} = 0$ if the detector pairs $i_1$ and $i_2$ are not in the same projection (see Figure 1-3 for the definition of a projection). This assumption means that an annihilation that would have been recorded by a detector pair within a certain projection if both photons had not undergone Compton scattering cannot be recorded by a detector pair within some other projection. Under the stated assumption, the number of unknown parameters in the matrix $\mathcal{K}$ is dramatically reduced.
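As an illustration of Step 1, the following minimal sketch applies an update of the form (6.14) to the three-detector example of (6.1), with the scatter matrix taken to be the identity so that the system matrix equals $\mathcal{P}^c$. The data vector, the accidental coincidence means, and the function names are hypothetical values chosen only to make the example runnable; the KL distance of (6.11) is recorded at each iteration so that its non-increasing behavior can be inspected.

```python
import math

def kl_distance(a, b):
    """KL distance of (6.11): sum_i a_i*log(a_i/b_i) - a_i + b_i, with 0*log(0) = 0."""
    total = 0.0
    for ai, bi in zip(a, b):
        if ai > 0:
            total += ai * math.log(ai / bi)
        total += bi - ai
    return total

def forward(A, x, rho):
    """Model mean A x + rho for a system matrix A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) + r for row, r in zip(A, rho)]

def mlem_step(A, d, rho, x):
    """One multiplicative update of the form (6.14) for the system matrix A."""
    model = forward(A, x, rho)
    new_x = []
    for j, xj in enumerate(x):
        sensitivity = sum(row[j] for row in A)  # column sum of A
        backprojection = sum(row[j] * di / mi for row, di, mi in zip(A, d, model))
        new_x.append(xj * backprojection / sensitivity)
    return new_x

# System matrix: K = identity, so K Pc = Pc from (6.1).
A = [[0.04, 0.0, 0.0], [0.0, 0.04, 0.0], [0.03, 0.03, 0.02]]
d = [5.0, 3.0, 9.0]      # hypothetical emission data
rho = [0.5, 0.5, 0.5]    # hypothetical accidental coincidence means
x = [100.0, 100.0, 100.0]

history = []
for _ in range(50):
    history.append(kl_distance(d, forward(A, x, rho)))
    x = mlem_step(A, d, rho, x)
```

The same update serves for any nonnegative system matrix with nonzero column sums, which is why it reappears later for the kernel estimate.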


Moreover, $\mathcal{K}$ is a block diagonal matrix with $T \times T$ sub-matrices along its diagonal,

$$\mathcal{K} = \begin{bmatrix} \mathcal{K}_1 & 0 & \cdots & 0 \\ 0 & \mathcal{K}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathcal{K}_S \end{bmatrix}, \quad (6.15)$$

where $T$ is the number of detector pairs within each projection (e.g., 160), $S$ is the number of projection angles (e.g., 192), and it is assumed that the emission counts of the detector pairs within a projection are placed as a "chunk" in the emission data $d$ (i.e., $d$ = [(1st projection), (2nd projection), ..., ($S$th projection)]'). Since $\mathcal{K}$ is a block diagonal matrix, the minimization problem in (6.13) can be broken into $S$ sub-problems.

Consider the sinogram of $d$, denoted by $\mathcal{Y}$, which is a $T \times S$ matrix. The $s$th column of the sinogram $\mathcal{Y}$ is the projection that corresponds to a certain projection angle. Figure 6-2 (a) shows an example of a sinogram. In the figure, the sinogram of the 14 minute emission data for plane 21 is shown. The $I \times 1$ vector $\mathcal{P}^c x^{(n+1)}$ is known in the PET community as the forward projection of $x^{(n+1)}$. We define a matrix $\mathcal{W}^{(n+1)}$ where the first column is the first $T$ elements of $\mathcal{P}^c x^{(n+1)}$, the second column contains the $(T+1)$th to $(2T)$th elements of $\mathcal{P}^c x^{(n+1)}$, and so on. In other words, the orderings of the detector pairs associated with $\mathcal{W}^{(n+1)}$ and $\mathcal{Y}$ match. Figure 6-2 (b) shows an example of $\mathcal{W}^{(n+1)}$, where $x^{(n+1)}$ was generated by running the MLEM algorithm for 1000 iterations on the emission data in Figure 6-2 (a).

Under the assumption that $\mathcal{K}_{i_1 i_2} = 0$ if the detector pairs $i_1$ and $i_2$ are not in the same projection, the minimization problem in (6.13) can be expressed as: for $s = 1, 2, \ldots, S$,

$$\mathcal{K}_s^{(n+1)} = \arg\min_{\mathcal{K}_s \ge 0} KL(y_s, \mathcal{K}_s w_s^{(n+1)} + z_s) \quad \text{subject to } \sum_{i_1=1}^{T}[\mathcal{K}_s]_{i_1 i_2} \le 1 \text{ for } i_2 = 1, 2, \ldots, T, \quad (6.16)$$


[Figure panels: (a) Emission Data, (b) Forward Projection]

Figure 6-2: Sinograms: (a) 14 minute emission data for plane 21 (i.e., $\mathcal{Y}$) and (b) forward projection of $x^{(n+1)}$ (i.e., $\mathcal{W}^{(n+1)} = \mathcal{P}^c x^{(n+1)}$), where $x^{(n+1)}$ is the 1000th MLEM iterate using the emission data in (a). Note that the images were adjusted with their own dynamic range.


where $z_s = [\rho_{(s-1)T+1}\ \rho_{(s-1)T+2}\ \cdots\ \rho_{sT}]'$, $\mathcal{K}_s$ is the $s$th sub-matrix of $\mathcal{K}$, and $y_s$ and $w_s^{(n+1)}$ are the $s$th columns of $\mathcal{Y}$ and $\mathcal{W}^{(n+1)}$, respectively.

Since there are $T \times T$ unknowns in each minimization problem in (6.16), there would still be too many unknowns. To reduce the number of unknowns, we assume that, for all $s$, $\mathcal{K}_s$ is a Toeplitz matrix. An example of a Toeplitz matrix is

$$\begin{bmatrix} a_3 & a_2 & a_1 & 0 & 0 & 0 \\ a_4 & a_3 & a_2 & a_1 & 0 & 0 \\ a_5 & a_4 & a_3 & a_2 & a_1 & 0 \\ 0 & a_5 & a_4 & a_3 & a_2 & a_1 \\ 0 & 0 & a_5 & a_4 & a_3 & a_2 \\ 0 & 0 & 0 & a_5 & a_4 & a_3 \end{bmatrix}.$$

The assumption that $\mathcal{K}_s$ is a Toeplitz matrix implies that there are at most $T$ unknowns in each sub-matrix $\mathcal{K}_s$. Moreover, the assumption means that a single kernel can account for scatter within each projection.

The assumption that $\mathcal{K}_s$ is a Toeplitz matrix can be justified for regions with approximately uniform attenuation, such as the brain. Consider Figure 6-3, in which a simplified PET scanner consisting of four detector pairs is depicted. In the figure, the dotted circle defines a region with uniform attenuation. We refer to the detector pairs $(a_1,b_1)$, $(a_2,b_2)$, $(a_3,b_3)$, and $(a_4,b_4)$ as the first, second, third, and fourth detector pairs, respectively. Note that the geometry of the first and second detector pairs and the geometry of the second and third detector pairs are approximately the same. Because of the approximately uniform attenuation of the subject and the geometric similarity of the detector pairs, it can be said that the conditional probabilities $\mathcal{K}_{21}$ and $\mathcal{K}_{32}$ are approximately the same. Using this rationale, for a projection of a PET scanner, it can be assumed that $\mathcal{K}_{i_2 i_1} \approx \mathcal{K}_{i_3 i_2}$ when $(i_2 - i_1) = (i_3 - i_2)$. Thus, we can construct $\mathcal{K}_s$ so that it is approximately a Toeplitz matrix.
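The banded Toeplitz structure of the example above can be generated from its coefficients alone. The sketch below is illustrative: the function name is hypothetical, the coefficient list is ordered $a_1, \ldots, a_5$, and the middle coefficient is assumed to sit on the main diagonal, as in the example matrix.

```python
def toeplitz_from_kernel(k, T):
    """Return a T x T banded Toeplitz matrix whose main diagonal holds the middle
    element of k; entries above the diagonal use earlier elements of k and
    entries below it use later ones, with zeros outside the band."""
    if len(k) % 2 == 0:
        raise ValueError("the kernel is assumed to have odd length")
    tau = (len(k) + 1) // 2
    K = [[0.0] * T for _ in range(T)]
    for t in range(T):
        for tp in range(T):
            idx = tau - 1 + t - tp  # position in k for row t, column tp
            if 0 <= idx < len(k):
                K[t][tp] = k[idx]
    return K

# Coefficients a_1..a_5 replaced by the numbers 1..5 so the pattern is visible.
K_example = toeplitz_from_kernel([1, 2, 3, 4, 5], 6)
```

With this convention a $T \times T$ block is described by an odd-length coefficient list rather than $T^2$ free entries, which is the reduction in unknowns the text relies on.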


Figure 6-3: Geometry of a simplified PET image reconstruction problem: three voxels ($v_1$, $v_2$, and $v_3$) and four detector pairs ($(a_1,b_1)$, $(a_2,b_2)$, $(a_3,b_3)$, and $(a_4,b_4)$). The dashed lines define the tubes of the detector pairs. The dotted circle defines a region with uniform attenuation.

Now, consider the following functions: for an integer $t$,

$$y_s(t) \triangleq \begin{cases}[y_s]_t, & t = 1, 2, \ldots, T \\ 0, & \text{otherwise}\end{cases} \quad (6.17)$$

$$w_s^{(n+1)}(t) \triangleq \begin{cases}[w_s^{(n+1)}]_t, & t = 1, 2, \ldots, T \\ 0, & \text{otherwise}\end{cases} \quad (6.18)$$

$$z_s(t) \triangleq \begin{cases}[z_s]_t, & t = 1, 2, \ldots, T \\ 0, & \text{otherwise.}\end{cases} \quad (6.19)$$

Under the assumption that $\mathcal{K}_s$ is a Toeplitz matrix, $y_s(t)$ is approximately equal to the convolution of $w_s^{(n+1)}(t)$ and an unknown nonnegative function, denoted by $k_{(s,\tau)}(t)$, that depends on the $s$th projection angle, plus $z_s(t)$. Thus, for $t = 1, 2, \ldots, T$,

$$y_s(t) \approx \{k_{(s,\tau)} * w_s^{(n+1)}\}(t) + z_s(t) = \sum_{u=-\tau+1}^{\tau-1}k_{(s,\tau)}(u)\,w_s^{(n+1)}(t-u) + z_s(t), \quad (6.20)$$


where $k_{(s,\tau)}(t) \in [0, 1]$ for an integer $t$ and $k_{(s,\tau)}(t) = 0$ for $|t| \ge \tau$. The parameter $\tau$ is defined by the user, but must satisfy the constraint $\tau \le \lfloor (T+1)/2 \rfloor$. Observing (6.20), the product $k_{(s,\tau)}(u)\,w_s^{(n+1)}(t-u)$ equals the proportion of photon pairs that would have been recorded by detector pair $(t-u)$ if both photons produced by the annihilation had not undergone Compton scattering and they had propagated in exactly opposite directions, but instead are recorded by detector pair $t$. The parameter $\tau$ restricts the number of detector pairs to be convolved. In principle, the parameter $\tau$ is to be chosen in such a way that $\tau$ is proportional to the mean number of scatter events. Since the mean number of scatter events is proportional to the attenuation of the material, it is possible that the attenuation map could be used to determine the parameter $\tau$.

The convolution in (6.20) can be expressed in matrix notation:

$$\{k_{(s,\tau)} * w_s^{(n+1)}\}(t) = [B_s^{(n+1)}k_{(s,\tau)}]_t, \quad t = 1, 2, \ldots, T, \quad (6.21)$$

where $k_{(s,\tau)} = [k_{(s,\tau)}(-\tau+1)\ k_{(s,\tau)}(-\tau+2)\ \cdots\ k_{(s,\tau)}(\tau-1)]'$ and $B_s^{(n+1)}$ is a known $T \times (2\tau-1)$ matrix whose $(t,c)$th element is $w_s^{(n+1)}(t+\tau-c)$, so that elements whose argument falls outside $1, 2, \ldots, T$ are zero. Thus, $y_s$ can be approximated as

$$y_s \approx B_s^{(n+1)}k_{(s,\tau)} + z_s. \quad (6.22)$$
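The equivalence between the convolution sum in (6.20) and the matrix form in (6.21) can be checked on a small example. In the sketch below, the forward projection `w`, the kernel `k`, and the helper names are hypothetical; the matrix is built entry by entry from the zero-padded function in (6.18).

```python
def padded(w, t):
    """w_s(t) as in (6.18): 1-based index t, zero outside 1..T."""
    return w[t - 1] if 1 <= t <= len(w) else 0.0

def convolution_matrix(w, tau):
    """T x (2*tau - 1) matrix whose (t, c) entry is w(t - u) with u = -tau + c,
    using 1-based t and c."""
    T = len(w)
    return [[padded(w, t - (-tau + c)) for c in range(1, 2 * tau)]
            for t in range(1, T + 1)]

def convolve(k, w, tau):
    """Direct evaluation of the sum in (6.20), without the z_s term.
    k[0] holds k(-tau + 1) and k[-1] holds k(tau - 1)."""
    T = len(w)
    return [sum(k[u + tau - 1] * padded(w, t - u) for u in range(-tau + 1, tau))
            for t in range(1, T + 1)]

w = [4, 1, 3, 2, 5]      # hypothetical forward projection, T = 5
tau = 2                  # satisfies tau <= floor((T + 1) / 2)
k = [0.1, 0.7, 0.2]      # hypothetical kernel of length 2*tau - 1

B = convolution_matrix(w, tau)
matrix_form = [sum(b * kc for b, kc in zip(row, k)) for row in B]
direct_form = convolve(k, w, tau)
```

Writing the convolution as a matrix acting on the kernel is what lets the kernel be estimated with the same multiplicative update used for the emission means.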


With (6.22), the minimization problem in (6.16) is now

$$\hat{k}_{(s,\tau)}^{(n+1)} \triangleq \arg\min_{k_{(s,\tau)} \ge 0} KL(y_s, B_s^{(n+1)}k_{(s,\tau)} + z_s) \quad \text{subject to } \sum_{t=1}^{2\tau-1}[k_{(s,\tau)}]_t \le 1. \quad (6.23)$$

The constrained optimization problem in (6.23) is difficult to solve because of the constraint $\sum_{t=1}^{2\tau-1}[k_{(s,\tau)}]_t \le 1$. Consequently, we first solve the following optimization problem:

$$\tilde{k}_{(s,\tau)}^{(n+1)} = \arg\min_{k_{(s,\tau)} \ge 0} KL(y_s, B_s^{(n+1)}k_{(s,\tau)} + z_s). \quad (6.24)$$

Then, to get $k_{(s,\tau)}^{(n+1)}$, we normalize $\tilde{k}_{(s,\tau)}^{(n+1)}$ so that the sum of its elements equals one:

$$[k_{(s,\tau)}^{(n+1)}]_t \triangleq \frac{[\tilde{k}_{(s,\tau)}^{(n+1)}]_t}{\sum_{t'=1}^{2\tau-1}[\tilde{k}_{(s,\tau)}^{(n+1)}]_{t'}}, \quad t = 1, 2, \ldots, (2\tau-1). \quad (6.25)$$

The minimization problem in (6.24) can be solved by the MLEM algorithm [16,26] because it has the same form as the optimization problem in Step 1. Specifically, given an initial estimate $k_{(s,\tau)}^{(n,0)} > 0$, the iteration is as follows: for $m = 0, 1, \ldots, M_2$,

$$[k_{(s,\tau)}^{(n,m+1)}]_j = \frac{[k_{(s,\tau)}^{(n,m)}]_j}{\sum_{t=1}^{T}[B_s^{(n+1)}]_{tj}}\sum_{t=1}^{T}\frac{[y_s]_t[B_s^{(n+1)}]_{tj}}{[B_s^{(n+1)}k_{(s,\tau)}^{(n,m)} + z_s]_t}, \quad j = 1, 2, \ldots, (2\tau-1). \quad (6.26)$$

We define $\tilde{k}_{(s,\tau)}^{(n+1)} = k_{(s,\tau)}^{(n,M_2+1)}$ and normalize $\tilde{k}_{(s,\tau)}^{(n+1)}$ using (6.25) to get $k_{(s,\tau)}^{(n+1)}$.


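Given $k_{(s,\tau)}^{(n+1)}$, we determine $\mathcal{K}_s^{(n+1)}$ by using the following equation: for $t, t' = 1, 2, \ldots, T$,

$$[\mathcal{K}_s^{(n+1)}]_{t,t'} = \begin{cases}[k_{(s,\tau)}^{(n+1)}]_{\tau+t-t'}, & |t - t'| < \tau \\ 0, & \text{otherwise,}\end{cases} \quad (6.27)$$

that is, $\mathcal{K}_s^{(n+1)}$ is the $T \times T$ Toeplitz matrix whose first row is $([k_{(s,\tau)}^{(n+1)}]_\tau, [k_{(s,\tau)}^{(n+1)}]_{\tau-1}, \ldots, [k_{(s,\tau)}^{(n+1)}]_1, 0, \ldots, 0)$ and whose first column is $([k_{(s,\tau)}^{(n+1)}]_\tau, [k_{(s,\tau)}^{(n+1)}]_{\tau+1}, \ldots, [k_{(s,\tau)}^{(n+1)}]_{2\tau-1}, 0, \ldots, 0)'$. Finally, $\mathcal{K}^{(n+1)}$ is defined as a block diagonal matrix with $\{\mathcal{K}_s^{(n+1)}\}$ along its diagonal:

$$\mathcal{K}^{(n+1)} = \begin{bmatrix} \mathcal{K}_1^{(n+1)} & 0 & \cdots & 0 \\ 0 & \mathcal{K}_2^{(n+1)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathcal{K}_S^{(n+1)} \end{bmatrix}. \quad (6.28)$$

Repeating Steps 1 and 2 generates the estimates of $\mathcal{K}$ and $x$. Once the estimate of $\mathcal{K}$, denoted by $\hat{\mathcal{K}}$, is available, a regularized image reconstruction algorithm, such as the PML, APML, or QEP algorithm, can be used to estimate the emission mean vector $x$. More specifically, we first let our estimate for $\mathcal{P}^{true}$ be $\hat{\mathcal{P}} = \hat{\mathcal{K}}\mathcal{P}^c$. Then, the probability matrix $\hat{\mathcal{P}}$ is used in the image reconstruction algorithm of choice.

In summary, the steps of the PCiPS algorithm are: for $n = 0, 1, 2, \ldots$
• Step 1 Get an initial estimate for the emission mean vector, $x^{(0)} > 0$, and an initial estimate for the scatter matrix, $\mathcal{K}^{(0)}$.
• Step 2 Get the current estimate of the emission mean vector, $x^{(n+1)}$, using $\mathcal{K}^{(n)}$, which is the current estimate of the scatter matrix $\mathcal{K}$:

$$x^{(n+1)} = \arg\min_{x \ge 0} KL(d, \mathcal{K}^{(n)}\mathcal{P}^c x + \rho). \quad (6.29)$$

• Step 3 For $s = 1, 2, \ldots, S$, get $\tilde{k}_{(s,\tau)}^{(n+1)}$ using (6.26).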


• Step 4 For $s = 1, 2, \ldots, S$, normalize $\tilde{k}_{(s,\tau)}^{(n+1)}$ using (6.25) to get $k_{(s,\tau)}^{(n+1)}$.
• Step 5 For $s = 1, 2, \ldots, S$, get $\mathcal{K}_s^{(n+1)}$ using (6.27).
• Step 6 Get $\mathcal{K}^{(n+1)}$ using (6.28).
• Step 7 Repeat Steps 2 through 6 for a chosen number, $M$, of iterations.
• Step 8 Define $\hat{\mathcal{P}} = \mathcal{K}^{(M+1)}\mathcal{P}^c$ and use $\hat{\mathcal{P}}$ in the APML algorithm or some other algorithm of choice.
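The steps above can be sketched for a toy problem with a single projection ($S = 1$), so that the scatter matrix is one Toeplitz block. This is an illustration under stated assumptions, not the implementation used in the dissertation: the matrix `Pc`, the data `d`, the accidental means `rho`, the iteration counts, and all function names are hypothetical, and the kernel update is restarted from a uniform positive kernel at every outer iteration because a multiplicative update cannot move an element away from zero.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def mlem(A, d, rho, x, iters):
    """Multiplicative updates of the form (6.14)/(6.26) for KL(d, A x + rho)."""
    sens = [sum(row[j] for row in A) for j in range(len(x))]
    for _ in range(iters):
        model = [m + r for m, r in zip(matvec(A, x), rho)]
        ratio = [di / mi for di, mi in zip(d, model)]
        x = [xj * sum(row[j] * q for row, q in zip(A, ratio)) / sj if sj > 0 else 0.0
             for j, (xj, sj) in enumerate(zip(x, sens))]
    return x

def toeplitz(k, T):
    """Toeplitz block built from the kernel, with the middle element on the diagonal."""
    tau = (len(k) + 1) // 2
    return [[k[tau - 1 + t - tp] if 0 <= tau - 1 + t - tp < len(k) else 0.0
             for tp in range(T)] for t in range(T)]

def conv_matrix(w, tau):
    """Matrix B of (6.21), 0-based: entry (t, c) is w[t + tau - 1 - c] or zero."""
    T = len(w)
    return [[w[t + tau - 1 - c] if 0 <= t + tau - 1 - c < T else 0.0
             for c in range(2 * tau - 1)] for t in range(T)]

def pcips(Pc, d, rho, tau, outer=5, inner=20):
    T, J = len(Pc), len(Pc[0])
    k = [0.0] * (2 * tau - 1)
    k[tau - 1] = 1.0                       # Step 1: identity scatter matrix
    x = [1.0] * J                          # Step 1: positive initial image
    for _ in range(outer):                 # Step 7: fixed number of sweeps
        x = mlem(matmul(toeplitz(k, T), Pc), d, rho, x, inner)   # Step 2
        B = conv_matrix(matvec(Pc, x), tau)
        uniform = [1.0 / (2 * tau - 1)] * (2 * tau - 1)
        k_tilde = mlem(B, d, rho, uniform, inner)                # Step 3
        total = sum(k_tilde)
        k = [v / total for v in k_tilde]                         # Step 4
    return x, toeplitz(k, T), k            # Steps 5 and 6 with S = 1

Pc = [[0.05, 0.01, 0.0], [0.03, 0.03, 0.0], [0.01, 0.05, 0.01],
      [0.0, 0.03, 0.03], [0.0, 0.01, 0.05]]
d = [12.0, 15.0, 18.0, 14.0, 11.0]
rho = [0.5] * 5
x_hat, K_hat, k_hat = pcips(Pc, d, rho, tau=2)
```

For Step 8, the product of `K_hat` and `Pc` would then be handed to the reconstruction algorithm of choice; with more than one projection angle, the kernel update would be repeated per angle and the blocks placed along the diagonal as in (6.28).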


CHAPTER 7
SIMULATIONS AND EXPERIMENTAL STUDY

To evaluate the algorithms in Chapters 3, 4, and 5, and the method in Chapter 6, we applied them to real thorax phantom data and compared them quantitatively and qualitatively to certain existing algorithms. Also, in Section 7.1, simulation studies with computer-generated synthetic data are presented for the PML and QEP algorithms in Chapters 3 and 5, respectively. It should be noted that simulation results with the synthetic data have limitations because the data are generated under the assumption that the system model for emission data in Section 1.4 is exactly correct. However, the system model is not perfect due to the errors discussed in Section 1.3.

Thorax phantom data was obtained from the PET laboratory at the Emory University School of Medicine. The phantom was filled with 2-[$^{18}$F]fluoro-2-deoxy-D-glucose ([$^{18}$F]FDG) and scanned using a Siemens-CTI ECAT EXACT [model 921] scanner in slice-collimated mode (i.e., septa-extended mode). Thirty independent data sets were generated from multiple scans of duration 7 minutes. Fifteen realizations of 14 min data were generated by adding pairs of non-overlapping 7 min data sets. The [$^{18}$F]FDG concentrations for the heart wall, heart cavity, liver, three tumors, and thorax cavity of the thorax phantom by Data Spectrum Inc. were 0.72 μCi/ml, 0.23 μCi/ml, 0.72 μCi/ml, 2.01 μCi/ml, and 0.24 μCi/ml, respectively. The lungs, which contained styrofoam beads, were filled with a 0.25 μCi/ml solution of [$^{18}$F]FDG. The concentrations were chosen to mimic those observed in whole-body scans. The tumors were of size 1 cm, 1.5 cm, and 2 cm. The sinogram consists of 160 radial bins and 192 angles. The physical dimensions of the image space are 43.9 x 43.9 cm$^2$ and the reconstructed images contain 128 x 128 voxels (voxel size is 3.43 x 3.43


mm$^2$). Two planes (10 and 21) were considered in the experiments. Plane 10 contains activity due to the heart, lungs, spine, and background, while plane 21 contains activity due to the heart, two tumors (1.5 cm and 2.0 cm), and background. The total numbers of prompts for planes 10 and 21 were 397,000 and 340,000, respectively, for 14 minute data. The randoms make up about 10% and 12% of the data for planes 10 and 21, respectively.

The probability matrix $\mathcal{P}$ was computed using the angle-of-view method [16] with corrections for errors due to attenuation and detector inefficiency. To get the attenuation correction factors, post-injection transmission scan data was collected for three minutes and the attenuation correction method by Anderson et al. [9] was employed. A normalization file was used to correct for detector inefficiency. Finally, the randoms were used as noise-free estimates of the mean numbers of accidental coincidences.

For all of the experiments and simulations, we used a uniform initial estimate (all voxels equal J), the eight nearest neighbors of the $j$th voxel were used for $N_j$, and the weights $\{w_{jk}\}$ are one for horizontal and vertical nearest neighbors and $1/\sqrt{2}$ for diagonal nearest neighbors.

In Section 7.1, we apply the PML and QEP algorithms to real thorax phantom data and computer-generated synthetic data, and compare them quantitatively and qualitatively. Then, the performance of the APML algorithm is evaluated in Section 7.2. Finally, experimental results with the probability estimation method in Chapter 6 are presented in Section 7.3.

7.1 Regularized Image Reconstruction Algorithms

In this section, we compare the PML and QEP algorithms, quantitatively and qualitatively, to the MLEM algorithm and a penalized weighted least-squares (PWLS) algorithm [42]. Two ad-hoc forms of regularization include the post-filtering of MLEM estimates and early termination of the MLEM algorithm (usually quite far from the


MLEM estimate). Given their simplicity, we also compared the post-filtering (MLEM-F) and early-stopping (MLEM-S) strategies to the proposed algorithms.

Quantitative comparisons were made using contrast as a figure-of-merit:

$$\text{Contrast} \triangleq \frac{|M_{ROI} - M_B|}{M_B}, \quad (7.1)$$

where $M_{ROI}$ and $M_B$ denote the mean of a chosen region-of-interest (ROI) and the background, respectively. We define another figure-of-merit that quantifies the distinguishability of the two tumors:

$$\text{Distinguishability} \triangleq \frac{|M_T - M_I|}{|M_T - M_B|}, \quad (7.2)$$

where $M_T$ and $M_I$ denote the mean activity of the two tumor regions and of the intermediate region between the two tumors, respectively. If the two tumors overlap each other, then $M_I = M_T$ and the distinguishability will be zero. On the other hand, if the intermediate region between the two tumors has the same mean as the background, then the distinguishability will be one.

For image comparisons, converged images were used for the MLEM, PML, and PWLS algorithms. The QEP images result from running the QEP algorithm for the same number of iterations as the PML algorithm. This was necessary because the QEP algorithm does not have a single objective function. Recall that the QEP algorithm defines a new objective function to be minimized at each iteration. For the PML, QEP, and PWLS images, the penalty parameter, $\beta$, was chosen in such a way that the standard deviations of their soft-tissue (i.e., background) regions were equal. In this sense, the algorithms are "balanced" with respect to $\beta$. For the MLEM-S and MLEM-F images, early MLEM iterates and filtered converged MLEM images were obtained that matched the standard deviation of the soft-tissue regions in the PML, QEP, and PWLS images. To filter the MLEM image, we used 5 x 5 Gaussian filters with different standard deviation values.
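The two figures-of-merit in (7.1) and (7.2) are simple ratios of region means. The sketch below assumes each region is supplied as a flat list of voxel values; the function names and the sample values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def contrast(roi, background):
    """Contrast of (7.1): |M_ROI - M_B| / M_B."""
    m_roi, m_b = mean(roi), mean(background)
    return abs(m_roi - m_b) / m_b

def distinguishability(tumors, intermediate, background):
    """Distinguishability of (7.2): |M_T - M_I| / |M_T - M_B|."""
    m_t, m_i, m_b = mean(tumors), mean(intermediate), mean(background)
    return abs(m_t - m_i) / abs(m_t - m_b)

# Noise-free values: tumor intensity 7 x 74 on a background of 74.
tumor_voxels = [518.0] * 3
background_voxels = [74.0] * 4
tumor_contrast = contrast(tumor_voxels, background_voxels)
```

The two limiting cases described in the text follow directly: an intermediate region equal to the tumor mean gives zero, and one equal to the background mean gives one.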


7.1.1 Synthetic Data

In this subsection, we present simulation results for software phantoms. Fifty realizations were used in the simulation study. To generate emission data, a software phantom was forward projected (i.e., $\mathcal{P}x$) using the $\mathcal{P}$ matrix, where it was assumed that there are no errors except accidental coincidences. Then, for each bin $i$, the prompts and randoms were generated using pseudo-random Poisson variates with means $[\mathcal{P}x]_i + \rho$ and $\rho$, respectively. The constant $\rho$ was chosen such that the mean accidental coincidence rate was approximately 10%. The mean numbers of prompts and randoms are about 550,000 and 50,000, respectively. For the simulation studies, the dimensions of the image space follow the corresponding ones of the real phantom image space. The total number of intensities within a software phantom was about 500,000.

We first consider a tumor phantom. Figure 7-1 shows a software phantom that consists of two tumors (1.7 cm and 2.4 cm in diameter) in a uniform circular background with a diameter of 30.5 cm, where the tumor and background intensities are 7 x 74 and 74, respectively (tumor contrast is 6). The image also depicts the regions used for the contrast and distinguishability calculations. The intermediate region between the two tumors consists of 14 voxels, and the large and small tumors consist of 45 and 21 voxels, respectively.

For the tumor phantom in Figure 7-1, we used the log-cosh penalty function $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 20$ in the PML and QEP algorithms, while the quadratic penalty function was used in the PWLS algorithm. For the QEP algorithm, $\eta(t) = \xi\tanh(t/\xi)$ was used with $\xi = 150$ unless noted. The parameters $\delta$ and $\xi$ were chosen experimentally.

Figure 7-2 is a plot of the images obtained by the MLEM, MLEM-S, MLEM-F, PML, QEP, and PWLS algorithms when the tumor phantom in Figure 7-1 was used.
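The data generation described at the start of this subsection can be sketched as follows. The Poisson sampler uses Knuth's multiplication method, which is an assumption of this sketch (the text does not say which generator was used) and is only suitable for modest per-bin means; the matrix `P`, the phantom `x`, and the value of `rho` are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    if mean <= 0:
        return 0
    limit = math.exp(-mean)
    count = 0
    prod = rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simulate(P, x, rho, rng):
    """Prompts with mean [P x]_i + rho and randoms with mean rho for every bin i."""
    means = [sum(p * xj for p, xj in zip(row, x)) for row in P]
    randoms = [poisson(rho, rng) for _ in P]
    prompts = [poisson(m + rho, rng) for m in means]
    return prompts, randoms

# Hypothetical 3-bin, 3-voxel system with tumor and background intensities.
P = [[0.04, 0.0, 0.0], [0.0, 0.04, 0.0], [0.03, 0.03, 0.02]]
x = [518.0, 74.0, 74.0]
prompts, randoms = simulate(P, x, 1.0, random.Random(1))
```

Repeating the call with different seeds would give independent realizations of the same phantom, in the spirit of the fifty realizations used in the study.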


The MLEM image is the 500th MLEM iterate, and 200 iterations were used to reconstruct the PML, QEP, and PWLS images. For the PML, QEP, and PWLS images, $\beta$ was chosen such that the standard deviation of the background was approximately 12. The MLEM-S image is the 20th MLEM iterate. To get the MLEM-F image, the MLEM image in Figure 7-2 (a) was filtered once using a 5 x 5 Gaussian filter with a standard deviation of 1.27 voxels. As stated, all of the images in Figure 7-2 have the same background standard deviation except for the MLEM image. The MLEM image in Figure 7-2 (a) is considerably noisier (background standard deviation is about 78) than the other images. Figures 7-2 (d) and (e) illustrate that the PML image and the QEP image are smooth and, at the same time, the tumors in the images are resolvable and differ greatly from the background. On the other hand, Figures 7-2 (c) and (f) demonstrate that the images generated by the MLEM-F and PWLS algorithms are too smooth, especially near the boundary of the tumors. Figure 7-2 (b) shows that the edges of the tumors in the MLEM-S image are not as clear as the ones in the PML and QEP images. In Figure 7-3, the images in Figure 7-2 are plotted with their own dynamic range.

For the images in Figure 7-2, the contrast of the QEP image was -1%, 12%, 21%, 4%, and 22% higher than that of the MLEM, MLEM-S, MLEM-F, PML, and PWLS images, respectively, for the large tumor. The increased contrast of the QEP image for the small tumor with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was -4%, 16%, 31%, 5%, and 34%, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was -5%, 16%, 34%, 5%, and 40%, respectively. The MLEM image outperformed the QEP image in the contrast and distinguishability comparisons. However, as can be seen in Figure 7-2 (a), the MLEM image is extremely noisy.


Figures 7-4 (a) and (b) are line plots (the row is shown in Figure 7-1) of the images in Figure 7-2. For the row under consideration, it can be seen from the line plots that the PML and QEP images have a higher degree of contrast than the other images, except for the MLEM image. In addition, the edges in the QEP image are sharper than those in the PML image. As expected, the line plot shows that the MLEM image is excessively noisy.

Figures 7-5 (a) and (b) are plots of the average contrast of the large tumor and small tumor, respectively, versus the average background standard deviation using fifty realizations, when the tumor phantom in Figure 7-1 was used. Further, a plot of the average standard deviation of the large tumor versus the average background standard deviation for the fifty realizations is shown in Figure 7-5 (c). Finally, in Figure 7-5 (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation for fifty realizations is shown. As can be seen from Figures 7-5 (a), (b), and (d), the contrast curves and the distinguishability curve of the QEP algorithm lie above the curves of the other algorithms for comparable "background noise". Thus, for a fixed level of comparable background noise, the QEP images, on average, have the greatest contrast and distinguishability. The average standard deviation curves of the PML and QEP algorithms in Figure 7-5 (c) generally lie below the corresponding curves of the other algorithms for reasonably small background noise.

To see where the PML and QEP algorithms break down in terms of contrast, we performed simulations using the synthetic tumor phantom in Figure 7-1 with four different tumor contrast values (3, 1.5, 0.75, and 0.5). For the PML and QEP algorithms, $\beta = 1/16$ and $1/32$ were used, respectively, and 200 iterations were used.
For the QEP algorithm, $\xi = 150$ was used for a tumor contrast of 3, whereas $\xi = 80$ was used for the other tumor contrast values (i.e., tumor contrasts of 1.5, 0.75, and 0.5). The parameters $\beta$ and $\xi$ were chosen


experimentally. Figures 7-6 (a) and (b) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 3. As can be seen in the figures, when the tumor contrast was 3, the tumors in the PML and QEP images are clearly resolvable. Figures 7-6 (c) and (d) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 1.5. From the figures, the tumors in the PML and QEP images are clear enough when the tumor contrast was 1.5. It should be mentioned that the images in Figure 7-6 have the same background standard deviation, which is approximately 10.

Figures 7-7 (a) and (b) are the PML and QEP images, respectively, obtained by using the phantom in Figure 7-1 with a tumor contrast of 0.75. From the figures, the tumors in the PML and QEP images are still resolvable when the tumor contrast was 0.75. Consider the PML and QEP images in Figures 7-7 (c) and (d), respectively, which were obtained by using the phantom in Figure 7-1 with a tumor contrast of 0.5. The tumors in the PML and QEP images are hardly resolvable, which implies that the PML and QEP algorithms break down when the tumor contrast is 0.5. The images in Figure 7-7 have the same background standard deviation, which is approximately 9.

Figures 7-8 (a), (b), (c), and (d) are plots of the average contrast of the small tumor versus the average background standard deviation using fifty realizations, when the tumor contrast of the phantom in Figure 7-1 was 3, 1.5, 0.75, and 0.5, respectively. As can be seen from Figures 7-8 (a) and (b), the contrast curves of the QEP algorithm lie above the curves of the PML algorithm for a fixed background noise when the tumor contrast equals 3 and 1.5. For tumor contrasts of 0.75 and 0.5, the curves of the PML and QEP algorithms coincide, as can be seen in Figures 7-8 (c) and (d), which implies that the PML and QEP algorithms perform similarly when the tumor contrast is small enough.
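Each point on a contrast-versus-noise curve pairs an average contrast with an average background standard deviation over the realizations. The sketch below shows one way to compute such a point. The contrast formula (tumor mean minus background mean, divided by background mean) is an assumption chosen to agree with the phantom description, where intensities of 7 × 74 and 74 correspond to a contrast of 6; the masks, function names, and the Gaussian noise used as a stand-in for reconstructed images are hypothetical.

```python
import numpy as np

def roi_contrast(image, tumor_mask, background_mask):
    """Assumed contrast definition: (tumor mean - background mean) / background mean."""
    background_mean = image[background_mask].mean()
    return (image[tumor_mask].mean() - background_mean) / background_mean

def background_std(image, background_mask):
    """Standard deviation of the voxels inside the background region."""
    return image[background_mask].std()

def curve_point(realizations, tumor_mask, background_mask):
    """Average (background std, contrast) over a list of reconstructed images."""
    contrasts = [roi_contrast(im, tumor_mask, background_mask) for im in realizations]
    stds = [background_std(im, background_mask) for im in realizations]
    return float(np.mean(stds)), float(np.mean(contrasts))

# Toy phantom: background 74, tumor 7 * 74, so the true contrast is 6.
phantom = np.full((32, 32), 74.0)
tumor_mask = np.zeros((32, 32), dtype=bool)
tumor_mask[14:18, 14:18] = True
background_mask = np.zeros((32, 32), dtype=bool)
background_mask[2:8, 2:30] = True
phantom[tumor_mask] = 7 * 74.0

# Stand-in for fifty reconstructions: the phantom plus Gaussian noise of std 12.
rng = np.random.default_rng(2)
noisy = [phantom + rng.normal(0.0, 12.0, phantom.shape) for _ in range(50)]
avg_std, avg_contrast = curve_point(noisy, tumor_mask, background_mask)
```

Sweeping the regularization parameter (or the iteration number, or the filter width) and recording one such point per setting would trace out a full curve.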
To see where the PML and QEP algorithms break down in terms of spacing between two tumors, we performed simulations using four different synthetic tumor


phantoms, where each phantom has a different tumor spacing. We compared the PML and QEP images both visually and in terms of distinguishability. Figures 7-9 (a), (b), and (c) show software phantoms that consist of two tumors (each tumor is 3 × 3 voxels). For the phantoms in Figures 7-9 (a), (b), and (c), the tumor contrast is 3 and the spacing between the two tumors is 4, 3, and 2 voxels, respectively. Figure 7-9 (d) shows a software phantom that consists of two tumors (each tumor is 3 × 3 voxels) with a spacing of 2 voxels, where the tumor contrast is 6. The images in Figure 7-9 also depict the regions used for the distinguishability calculation. The intermediate regions between the two tumors consist of 10, 8, 6, and 6 voxels for the phantoms in Figures 7-9 (a), (b), (c), and (d), respectively.

Figure 7-10 shows the PML and QEP images obtained by using the phantoms in Figures 7-9 (a) and (b). Figure 7-11 shows the PML and QEP images obtained by using the phantoms in Figures 7-9 (c) and (d). The PML and QEP images in Figures 7-10 and 7-11 are from 200 iterations. The images in Figures 7-10 and 7-11 have the same background standard deviation, approximately 10 and 9, respectively. For the PML and QEP images in Figures 7-10 and 7-11, $\beta = 1/16$ and $1/32$ were used, respectively. Figure 7-10 indicates that the PML and QEP algorithms generate images where the tumors are clearly separated when the spacing between the tumors is 3 or 4 voxels. As can be seen in Figures 7-11 (a) and (b), the PML and QEP algorithms were not able to resolve the two tumors when the spacing between the tumors is 2 voxels. However, when the tumor contrast was increased, the PML and QEP algorithms worked well at a spacing of 2 voxels, as shown in Figures 7-11 (c) and (d).
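A two-tumor spacing phantom of the kind described above is simple to construct. The sketch below is illustrative only: it uses a rectangular uniform background rather than a circular one, the image size and background value are placeholders, and `two_tumor_phantom` is a hypothetical helper. The tumor intensity is set to (1 + contrast) times the background, an assumption consistent with intensities of 7 × 74 and 74 giving a contrast of 6.

```python
import numpy as np

def two_tumor_phantom(shape=(64, 64), size=3, spacing=2, contrast=3.0, background=74.0):
    """Return an image with two size-by-size tumors separated by `spacing` voxels."""
    image = np.full(shape, background)
    tumor_value = (1.0 + contrast) * background
    row0 = shape[0] // 2 - size // 2
    total_width = 2 * size + spacing                 # both tumors plus the gap
    col0 = shape[1] // 2 - total_width // 2
    left = (slice(row0, row0 + size), slice(col0, col0 + size))
    right = (slice(row0, row0 + size),
             slice(col0 + size + spacing, col0 + total_width))
    image[left] = tumor_value
    image[right] = tumor_value
    return image, left, right

# Spacing of 2 voxels with a tumor contrast of 6.
phantom, left, right = two_tumor_phantom(spacing=2, contrast=6.0)
```

Varying `spacing` and `contrast` reproduces the four configurations considered in the spacing study, up to the simplifications noted above.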
Figure 7-12 is a plot of the average distinguishability of the two tumors versus the average background standard deviation using fifty realizations, when the tumor phantoms in Figure 7-9 were used. As can be seen from Figures 7-12 (a), (b), (c), and (d), the distinguishability curves of the QEP algorithm lie above the curves of the PML algorithm for a fixed background noise. Thus, for a fixed level of background noise, the QEP images, on average, have greater distinguishability.

Figure 7-1: A software tumor phantom is shown. Contrast of the tumors in the phantom is 6. The regions surrounded by the dotted and dashed lines define the tumor intermediate region (i.e., $M_I$) and background region, respectively.

7.1.2 Real Data

In this subsection, we present experimental results using real phantom data for plane 21. Unless noted, the data are from 14 minute scans. The image in Figure 7-13, which was produced by averaging fifteen converged MLEM images, depicts the regions that were used in the contrast and distinguishability calculations. The large tumor ROI, small tumor ROI, and intermediate region contain 24, 12, and 16 voxels, respectively. In the PML and QEP algorithms, we used the log-cosh penalty function $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 50$. Since noise in the real data is stronger than in the synthetic data due to errors (e.g., scatter), we increased the value of the parameter $\delta$ to reduce background noise (observe that we used $\delta = 20$ for the synthetic phantom in Figure 7-1). For the QEP algorithm, $\eta(t) = \xi \tanh(t/\xi)$ was used with $\xi = 500$ and $1000$ for the 7 minute and 14 minute real data, respectively. We varied $\xi$ because the data sets lead to reconstructed images that have different edge "heights". As in the simulations


Figure 7-2: Comparison of emission images when the synthetic phantom in Figure 7-1 was used: (a) MLEM image, (b) MLEM-S image, (c) MLEM-F image, (d) PML image, (e) QEP image, and (f) PWLS image. The images in (a) and (b) are from 500 and 20 iterations, respectively, while the images in (d), (e), and (f) are from 200 iterations. The image in (c) was obtained by filtering the MLEM image once with a 5 × 5 Gaussian filter with a standard deviation of 1.27 voxels. For the PML, QEP, and PWLS images, $\beta$ was chosen in such a way that the standard deviation of the background is approximately 12. Specifically, $\beta = 0.0415$, $0.021$, and $0.006$ for the PML, QEP, and PWLS images, respectively. The standard deviation of the background of the images in (b) and (c) is also approximately 12. For display purposes, all the images were adjusted so that they have the same dynamic range.


Figure 7-3: The images in Figure 7-2 are shown with their own dynamic range. Panels: (a) MLEM, (b) MLEM-S, (c) MLEM-F, (d) PML, (e) QEP, (f) PWLS.


Figure 7-4: A line plot comparison of the reconstructed images in Figures 7-2 (a), (b), and (c) is shown in (a). A line plot comparison of the reconstructed images in Figures 7-2 (d), (e), and (f) is shown in (b).


Figure 7-5: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively, when the synthetic phantom in Figure 7-1 was used. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifty synthetic data realizations were used in the study. For the MLEM-S curves, the images from iterations 5 to 160 were used. For the MLEM-F curves, the MLEM images were filtered once by 5 × 5 Gaussian filters with standard deviations ranging from 0.44 to 3.0 voxels (each voxel is 3.43 mm × 3.43 mm). For the PML, QEP, and PWLS algorithms, the images were reconstructed using two hundred iterations for $\beta = 2^{-1}$ to $2^{-9}$, $2^{-4}$ to $2^{-9}$, and $2^{-4}$ to $2^{-12}$, respectively.


Figure 7-6: Comparison of emission images: (a) PML image and (b) QEP image when the tumor contrast of the phantom in Figure 7-1 was 3, and (c) PML image and (d) QEP image when the tumor contrast of the phantom in Figure 7-1 was 1.5. All images are from 200 iterations and they have the same background standard deviation, which is approximately 10. For the PML and QEP images, $\beta = 1/16$ and $1/32$ were used, respectively. For the QEP images in (b) and (d), $\xi = 150$ and $\xi = 80$ were used, respectively. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that the images have the same dynamic range.


Figure 7-7: Comparison of emission images: (a) PML image and (b) QEP image when the tumor contrast of the phantom in Figure 7-1 was 0.75, and (c) PML image and (d) QEP image when the tumor contrast of the phantom in Figure 7-1 was 0.5. All images are from 200 iterations and they have the same background standard deviation, which is approximately 10. For the PML and QEP images, $\beta = 1/16$ and $1/32$ were used, respectively. For the QEP images in (b) and (d), $\xi = 80$ was used. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that the images have the same dynamic range.


Figure 7-8: Plots of the average contrast of the small tumor versus the average background standard deviation are shown in (a), (b), (c), and (d) when the synthetic phantom in Figure 7-1 was used with a tumor contrast of 3, 1.5, 0.75, and 0.5, respectively. Fifty synthetic data realizations were used in the study. For the PML and QEP algorithms, the images were reconstructed using two hundred iterations for $\beta = 2^{-1}$ to $2^{-9}$. For the QEP curves in (b), (c), and (d), $\xi = 80$ was used, whereas $\xi = 150$ was used for the QEP curves in (a).


Figure 7-9: Software tumor phantoms are shown for different spacings between two tumors. For the phantoms in (a), (b), (c), and (d), the spacing between the tumors is 4, 3, 2, and 2 voxels, respectively. Tumor contrast is 3, 3, 3, and 6 for the phantoms in (a), (b), (c), and (d), respectively. The regions surrounded by the dotted and dashed lines define the tumor intermediate region (i.e., $M_I$) and background region, respectively. The images are shown with their own dynamic range. Panels: (a) Distance=4, Contrast=3; (b) Distance=3, Contrast=3; (c) Distance=2, Contrast=3; (d) Distance=2, Contrast=6.


Figure 7-10: Comparison of emission images: (a) PML image and (b) QEP image when the phantom in Figure 7-9 (a) was used, and (c) PML image and (d) QEP image when the phantom in Figure 7-9 (b) was used. All images are from 200 iterations and they have the same background standard deviation, which is approximately 10. For the PML and QEP images, $\beta = 1/16$ and $1/32$ were used, respectively. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that the images have the same dynamic range. Panels: (a) PML, Distance=4, Contrast=3; (b) QEP, Distance=4, Contrast=3; (c) PML, Distance=3, Contrast=3; (d) QEP, Distance=3, Contrast=3.


Figure 7-11: Comparison of emission images: (a) PML image and (b) QEP image when the phantom in Figure 7-9 (c) was used, and (c) PML image and (d) QEP image when the phantom in Figure 7-9 (d) was used. All images are from 200 iterations and they have the same background standard deviation, which is approximately 9. For the PML and QEP images, $\beta = 1/16$ and $1/32$ were used, respectively. For display purposes, each set of images (i.e., (a) and (b), (c) and (d)) was adjusted so that the images have the same dynamic range. Panels: (a) PML, Distance=2, Contrast=3; (b) QEP, Distance=2, Contrast=3; (c) PML, Distance=2, Contrast=6; (d) QEP, Distance=2, Contrast=6.


Figure 7-12: Plots of the average distinguishability of the two tumors versus the average background standard deviation are shown in (a), (b), (c), and (d) when the synthetic phantoms in Figures 7-9 (a), (b), (c), and (d) were used, respectively. Fifty synthetic data realizations were used in the study. For the PML and QEP algorithms, the images were reconstructed using two hundred iterations for $\beta = 2^{-1}$ to $2^{-9}$. Panels: (a) Distance=4, Contrast=3; (b) Distance=3, Contrast=3; (c) Distance=2, Contrast=3; (d) Distance=2, Contrast=6.


with the synthetic phantom, the quadratic penalty function was used in the PWLS algorithm. Note that the parameters $\delta$ and $\xi$ were chosen experimentally.

Figure 7-14 shows the images for plane 21 that were obtained by the MLEM, MLEM-S, MLEM-F, PML, QEP, and PWLS algorithms. The number of iterations used to reconstruct the MLEM, PML, and PWLS images was 500, 200, and 200, respectively. Figure 7-15 shows the PML and QEP images for different iteration numbers when a 14 minute data set for plane 21 was used. As can be seen in Figure 7-15, early iterates of the PML and QEP algorithms can be used because the 100th iterates are not much different from the 500th iterates. As with the simulated data, the number of iterations for the QEP algorithm was set to the number of iterations used for the PML algorithm. For the PML, QEP, and PWLS images in Figure 7-14, $\beta$ was chosen so that the standard deviations of their backgrounds are approximately the same (the background standard deviation is approximately 68). Using 9 iterations and repeating the application of a 5 × 5 Gaussian filter two times yielded the MLEM-S and MLEM-F images, respectively, with a background standard deviation of 68. Figures 7-14 (d) and (e) illustrate that the tumors in the PML image and the QEP image are resolvable and differ significantly from the background. On the other hand, the tumors are not as distinct in the images produced by the MLEM-S, MLEM-F, and PWLS algorithms (see Figures 7-14 (b), (c), and (f)). As expected, the MLEM image in Figure 7-14 (a) is considerably noisier and grainier than the other images. In Figure 7-16, the images in Figure 7-14 are plotted with their own dynamic range.

The true contrast of the small and large tumors was 7.38. Due to finite resolution effects, all of the algorithms underestimate the contrast [68]. Consequently, the algorithm that produces the greatest contrast is to be preferred.
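The post-filtering step behind the MLEM-F images can be sketched as repeated smoothing with a normalized 5 × 5 Gaussian kernel. The following is an illustrative sketch rather than the filter used in the experiments: the standard deviation of 1.95 voxels is the value quoted in the caption of Figure 7-14, while the edge-replicating boundary handling, the function names, and the noisy test image are assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.95):
    """Normalized size-by-size Gaussian kernel with standard deviation in voxels."""
    axis = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-0.5 * (axis / sigma) ** 2)
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def filter_image(image, kernel, passes=1):
    """Apply the kernel `passes` times; borders are handled by edge replication."""
    half = kernel.shape[0] // 2
    out = np.asarray(image, dtype=float)
    for _ in range(passes):
        padded = np.pad(out, half, mode="edge")
        acc = np.zeros_like(out)
        for dr in range(kernel.shape[0]):
            for dc in range(kernel.shape[1]):
                acc += kernel[dr, dc] * padded[dr:dr + out.shape[0],
                                               dc:dc + out.shape[1]]
        out = acc
    return out

# Hypothetical noisy image standing in for a late MLEM iterate.
kernel = gaussian_kernel(5, 1.95)
rng = np.random.default_rng(3)
noisy = 74.0 + rng.normal(0.0, 10.0, size=(64, 64))
smoothed = filter_image(noisy, kernel, passes=2)   # two applications of the filter
```

Because the kernel sums to one, a uniform region keeps its mean while its standard deviation drops, which is how the filter width or number of passes can be tuned to match a target background standard deviation.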
For the images in Figure 7-14, the large tumor contrast of the QEP image was -14%, 37%, 32%, 3%, and 34% higher than that of the MLEM, MLEM-S, MLEM-F, PML, and PWLS images,


respectively. The increased contrast of the QEP image for the small tumor with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was -28%, 46%, 46%, 9%, and 43%, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was -8%, 20%, 49%, 5%, and 35%, respectively. Although the contrast for "true" MLEM images (i.e., images obtained when the MLEM algorithm is not terminated early) can be quite large, the noise level of these images limits their practical use.

Figure 7-17 is a line plot (the row is shown in Figure 7-13) of the images in Figures 7-14 (b), (d), (e), and (f) (the image in Figure 7-14 (a) is too noisy and the image in Figure 7-14 (c) is too smooth). For the row under consideration, it can be seen from the line plots that the PML and QEP images have a higher degree of contrast than the other images. In addition, the edges in the QEP image are sharper than those in the PML image.

Figures 7-18 (a) and (b) are plots of the average contrast of the large tumor and small tumor, respectively, versus the average background standard deviation using fifteen realizations for plane 21. Further, plots of the average standard deviation of the large tumor and the average distinguishability of the two tumors versus the average background standard deviation for the fifteen realizations are shown in Figures 7-18 (c) and (d), respectively. Just like the contrast curves in the synthetic data simulations, the contrast curves of the QEP algorithm lie above the corresponding curves of the other algorithms. Thus, for a fixed level of background noise, the QEP images, on average, have the greatest contrast. Not surprisingly, the large tumor average standard deviation curves of the QEP algorithm in Figure 7-18 (c) generally lie above the corresponding curves of the PML algorithm.
The PWLS algorithm produced large tumor standard deviation curves that lie below the corresponding curves generated by the other algorithms. However, the degree of smoothing produced by the PWLS algorithm was too great. This claim is supported qualitatively by Figure


7-14 (f) and quantitatively by the contrast plots in Figures 7-18 (a) and (b). The distinguishability curves of the PML and QEP algorithms lie above the curves of the other algorithms when the background standard deviation is less than 300. Above that point, the distinguishability curves of the PML and QEP algorithms lie below the curves of the other algorithms, except for the PWLS curve. However, images with a background standard deviation greater than 300 are too noisy for practical use. Thus, for practical noise levels, the PML and QEP images, on average, have the greatest contrast and distinguishability.

To see how the algorithms would perform in shorter duration protocols, we now consider fifteen realizations of 7 minute real phantom data for plane 21. Figures 7-19 (a) and (b) are plots of the average contrast of the large tumor and small tumor, respectively, versus the average background standard deviation using fifteen realizations. A plot of the average standard deviation of the large tumor versus the average background standard deviation for the fifteen realizations is shown in Figure 7-19 (c). Also, in Figure 7-19 (d), we provide a plot of the average distinguishability of the two tumors versus the average background standard deviation for fifteen realizations. As in the experiments with the 14 minute real phantom data, it is evident that the PML and QEP algorithms outperform the MLEM-S, MLEM-F, and PWLS algorithms in terms of contrast recovery and tumor distinguishability.

Figure 7-20 shows the images obtained by the MLEM, MLEM-S, MLEM-F, PML, QEP, and PWLS algorithms for a 7 minute data set for plane 21. The number of iterations used to reconstruct the MLEM, PML, QEP, and PWLS images was 500, 200, 200, and 200, respectively. For the PML, QEP, and PWLS images in Figure 7-20, $\beta$ was chosen so that the standard deviation of their backgrounds is approximately 38.
Using 8 iterations and repeating the application of a 5 × 5 Gaussian filter two times yielded the MLEM-S and MLEM-F images, respectively, with a background standard deviation of 38. As in the experiments with the 14 minute data, the tumors in the PML image and the QEP image are resolvable and differ significantly from the background. In Figure 7-21, the images in Figure 7-20 are plotted with their own dynamic range.

For the images in Figure 7-20, the large tumor contrast of the QEP image was -20%, 37%, 25%, 5%, and 25% higher than that of the MLEM, MLEM-S, MLEM-F, PML, and PWLS images, respectively. The increased contrast of the QEP image for the small tumor with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was -58%, 34%, 27%, 5%, and 23%, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEM, MLEM-S, MLEM-F, PML, and PWLS images was -30%, 12%, 42%, 6%, and 25%, respectively. Figure 7-22 is a line plot (the row is shown in Figure 7-13) of the images in Figures 7-20 (b), (d), (e), and (f). As in the experiments with the 14 minute data, it can be seen from the line plots that the PML and QEP images have a higher degree of contrast than the other images.

Figure 7-13: The mean of 15 MLEM emission images reconstructed from 14 minute data for plane 21. The boxes indicate the regions used to compute the contrast and distinguishability of the tumors. The solid and dashed lines define the tumor and background regions, respectively. The region surrounded by the dotted lines defines the tumor intermediate region (i.e., $M_I$). The dotted line indicates the row chosen for the line plots.


Figure 7-14: Comparison of emission images when 14 minute real phantom data for plane 21 were used: (a) MLEM image, (b) MLEM-S image, (c) MLEM-F image, (d) PML image, (e) QEP image, and (f) PWLS image. The images in (a) and (b) are from 500 and 9 iterations, respectively, while the images in (d), (e), and (f) are from 200 iterations. The image in (c) was obtained by filtering the MLEM image two times with a 5 × 5 Gaussian filter with a standard deviation of 1.95 voxels. For the PML, QEP, and PWLS images, $\beta$ was chosen in such a way that the standard deviation of the background is approximately 68. Specifically, $\beta = 2^{-6}$, $2^{-7}$, and $0.00017$ for the PML, QEP, and PWLS images, respectively. Note that the standard deviation of the background of the images in (b) and (c) is also approximately 68. For display purposes, all the images were adjusted so that they have the same dynamic range, except for the MLEM image because it has a very wide dynamic range.


Figure 7-15: Iteration comparison of emission images reconstructed from 14 minute real phantom data for plane 21: (a) PML image using 100 iterations, (b) QEP image using 100 iterations, (c) PML image using 200 iterations, (d) QEP image using 200 iterations, (e) PML image using 500 iterations, and (f) QEP image using 500 iterations. For the PML and QEP images, $\beta = 2^{-6}$ and $2^{-7}$, respectively. For display purposes, all the images were adjusted so that they have the same dynamic range.


Figure 7-16: Emission images in Figure 7-14 are shown with their own dynamic range. Panels: (a) MLEM, (b) MLEM-S, (c) MLEM-F, (d) PML, (e) QEP, (f) PWLS.


Figure 7-17: A line plot comparison of the reconstructed images in Figures 7-14 (a), (b), and (c) is shown in (a). A line plot comparison of the reconstructed images in Figures 7-14 (d), (e), and (f) is shown in (b).


Figure 7-18: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifteen realizations of 14 minute real phantom data for plane 21 were used in the study. For the MLEM-S curves, the images from iterations 5 to 160 were used. For the MLEM-F curves, the MLEM images were filtered once by 5 × 5 Gaussian filters with standard deviations ranging from 0.44 to 3.0 voxels (each voxel is 3.43 mm × 3.43 mm). For the PML, QEP, and PWLS algorithms, the images were reconstructed using two hundred iterations for $\beta = 2^{-4}$ to $2^{-12}$, $2^{-4}$ to $2^{-12}$, and $2^{-12}$ to $2^{-20}$, respectively.


Figure 7-19: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifteen realizations of 7 minute real phantom data for plane 21 were used in the study. For the MLEM-S curves, the images from iterations 5 to 160 were used. For the MLEM-F curves, the MLEM images were filtered once by 5 × 5 Gaussian filters with standard deviations ranging from 0.44 to 3.0 voxels (each voxel is 3.43 mm × 3.43 mm). For the PML, QEP, and PWLS algorithms, the images were reconstructed using two hundred iterations for $\beta = 2^{-4}$ to $2^{-12}$, $2^{-4}$ to $2^{-12}$, and $2^{-12}$ to $2^{-20}$, respectively.


Figure 7-20: Comparison of emission images when 7 minute real phantom data for plane 21 were used: (a) MLEM image, (b) MLEM-S image, (c) MLEM-F image, (d) PML image, (e) QEP image, and (f) PWLS image. The images in (a) and (b) are from 500 and 8 iterations, respectively, while the images in (d), (e), and (f) are from 200 iterations. The image in (c) was obtained by filtering the MLEM image two times with a 5 × 5 Gaussian filter with a standard deviation of 1.95 voxels. For the PML, QEP, and PWLS images, $\beta$ was chosen in such a way that the standard deviation of the background is approximately 38. Specifically, $\beta = 0.029$, $0.0145$, and $0.000155$ for the PML, QEP, and PWLS images, respectively. Note that the standard deviation of the background of the images in (b) and (c) is also approximately 38. For display purposes, all the images were adjusted so that they have the same dynamic range, except for the MLEM image because it has a very wide dynamic range.


Figure 7-21: Emission images in Figure 7-20 are shown with their own dynamic range. Panels: (a) MLEM, (b) MLEM-S, (c) MLEM-F, (d) PML, (e) QEP, (f) PWLS.


Figure 7-22: A line plot comparison of the reconstructed images in Figures 7-20 (a), (b), and (c) is shown in (a). A line plot comparison of the reconstructed images in Figures 7-20 (d), (e), and (f) is shown in (b).


7.2 APML algorithm

To evaluate the APML algorithm in Chapter 4, we applied it to plane 21 from the real thorax phantom data (unless noted, the data are from 14 minute scans) and compared it to the PML algorithm and an algorithm by Ahn and Fessler [35] called the block sequential regularized expectation-maximization (BSREM)-II algorithm. The BSREM-II algorithm is a straightforward modification of Ahn and Fessler's BSREM-I algorithm [35]: it results from setting any negative element of a BSREM-I iterate to a small positive number. The BSREM-I algorithm is based on the BSREM algorithm by De Pierro and Yamagishi [33] and the ordered-subsets idea originally put forth by Hudson and Larkin [45]. We used the penalty parameter $\beta = 0.02$ and $0.04$ for the 14 minute data and 7 minute data, respectively, and the log-cosh function $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 50$ as the penalty function. The values of $\beta$ and $\delta$ were chosen experimentally such that the reconstructed images were visually "good". For the BSREM-II algorithm, 8 and 24 subsets were used, together with the ordering rule suggested by Ahn and Fessler [35] (i.e., make the projections in two successive subsets as perpendicular as possible to each other). The ordering rule was originally introduced by Herman and Meyer [69]. Additionally, the relaxation parameter rule specified by Ahn and Fessler [35] was used in the implementation.

In Figure 7-23, plots of the cost versus CPU time are shown for the APML algorithm for different values of $\epsilon$ when 14 minute data were used. As can be seen in the figure, the convergence rate of the APML algorithm depends on $\epsilon$. Moreover, $\epsilon = 0$ generated the slowest convergence rate for the considered planes. These claims are supported by Figure 7-24, which plots, versus $\epsilon$, the number of iterations required for the APML algorithm to decrease the PML objective function to $\Phi(x^*)$, where $x^*$ is the 5000th iterate of the PML algorithm.
Practically speaking, for the planes considered, x* is the minimizer of the PML objective function because the PML


algorithm did not decrease the PML objective function any further from about 1000 iterations up to 5000 iterations. For $0 < \epsilon < 0.01$, the number of iterations required for the APML algorithm to decrease the PML objective function to $\Phi(x^*)$ varied from 83 to 125 and from 113 to 185 for planes 10 and 21, respectively. For $\epsilon = 0$, the number of iterations required was 359 and 652 for planes 10 and 21, respectively. For $0 < \epsilon < 0.01$, the reduction in convergence time (in CPU seconds) of the APML algorithm relative to the PML algorithm was 71% to 81% and 75% to 85% for planes 10 and 21, respectively. For $\epsilon = 0$, the reduction in convergence time of the APML algorithm relative to the PML algorithm was 17% and 12% for planes 10 and 21, respectively.

In Figure 7-25, plots of the cost versus CPU time are shown for the PML, APML, and BSREM-II algorithms. As can be seen in the figure, the APML algorithm for $\epsilon = 0.01$ decreased the PML objective function more than about three times faster than the PML algorithm. The APML algorithm for $\epsilon = 0.01$ needed about 27% and 18% of the CPU time that was necessary for the PML algorithm to decrease the PML objective function to $\Phi(x^*)$ for planes 10 and 21, respectively. At the early iterations (CPU time less than about 10 seconds), the BSREM-II algorithm with 8 subsets produced the greatest decrease. However, after about 10 seconds of CPU time, the APML algorithm for $\epsilon = 0.01$ decreased the PML objective function faster than the other algorithms. It should be pointed out that the BSREM-II algorithm did not decrease the PML objective function to $\Phi(x^*)$ within 10,000 iterations, whereas $\Phi(x^*)$ was reached by the APML algorithm for $\epsilon = 0.01$ in 116 and 136 iterations for planes 10 and 21, respectively. Moreover, the convergence rate of the BSREM-II algorithm depends significantly on the number of subsets, as shown in Figure 7-25.
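The two kinds of numbers reported above, iterations needed to reach a target cost and the percentage reduction in convergence time, can be computed from recorded cost and timing traces as follows. This is a small bookkeeping sketch; the helper names and the sample cost values are hypothetical and do not come from the experiments.

```python
def iterations_to_reach(costs, target):
    """Return the 1-based index of the first iterate whose cost is <= target, or None."""
    for n, cost in enumerate(costs, start=1):
        if cost <= target:
            return n
    return None

def percent_time_saved(t_fast, t_slow):
    """Reduction in convergence time of the fast method relative to the slow one."""
    return 100.0 * (t_slow - t_fast) / t_slow

# Hypothetical cost trace; the target plays the role of the reference cost.
costs = [10.0, 5.0, 3.0, 2.5, 2.4]
n_needed = iterations_to_reach(costs, target=3.0)

# A method that needs 27% of the reference CPU time saves 73% of it.
saved = percent_time_saved(t_fast=27.0, t_slow=100.0)
```

Under this convention, needing about 27% of the reference CPU time corresponds to a reduction of about 73%, which falls inside the 71% to 81% range quoted for plane 10.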
To decrease the PML objective function faster at the early iterations, an alternative would be to first use the ordered subsets algorithm for a few iterations and then switch to the APML algorithm. To do this, we divide the emission data into a few subsets (e.g., 8 or 24) according to the ordering rule by Herman and Meyer [69].

For the $r$th subset, we define a sub-objective function as in [35]:

$$\Phi_r(x) = \sum_{i \in M_r} \left\{ [\mathcal{P}x]_i - d_i \log([\mathcal{P}x + \rho]_i) + \rho_i + \log(d_i!) \right\} + \frac{\beta}{R}\Lambda(x), \qquad (7.3)$$

where $R$ is the number of subsets, $M_r$ is the set of indices corresponding to emission data within the $r$th subset, and $r = 1, 2, \ldots, R$. Note that $\Phi(x) = \sum_{r=1}^{R} \Phi_r(x)$. For each subset and the corresponding sub-objective function, a sub-iteration is performed sequentially. At the $(n, r)$th sub-iteration, a surrogate function for the $r$th sub-objective function is constructed using (3.29), (3.30), (3.31), and (3.32) at the $(n, r)$th sub-iterate $x^{(n,r)}$. The next sub-iterate $x^{(n,r+1)}$ is defined to be the nonnegative minimizer of the surrogate function. After one pass through all the sub-iterations, we define $x^{(n+1)} = x^{(n,R+1)}$ and $x^{(n+1,1)} = x^{(n+1)}$ for the next pass of sub-iterations. We refer to the above iteration as the ordered subset PML (OS-PML) iteration.

Figure 7-26 shows plots of the cost versus CPU-time for the BSREM-II algorithm for 8 subsets and the APML algorithm with a few OS-PML iterations (2, 3, and 4) for 8 and 24 subsets. As can be seen in the figure, the APML algorithm with a few OS-PML iterations decreases the PML objective function faster than the BSREM-II algorithm for the early iterations. Specifically, the APML algorithm with the ordered-subsets idea needed less than about 8 seconds of CPU-time to decrease the PML objective function to the cost that the BSREM-II algorithm reached in 10 seconds. The convergence rate depends on the number of OS-PML iterations and the number of subsets, but the dependence is slight, as shown in the figure. In Figure 7-27, iterates for plane 10 produced by the BSREM-II algorithm for 8 subsets and the APML algorithm with 2 OS-PML iterations for 24 subsets are shown for different CPU-times. As shown in the figure, the iterates of the APML and BSREM-II algorithms look very similar to each other even though their associated costs are different.
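The relation between the sub-objective functions and the full PML objective function can be illustrated numerically. The sketch below is not the implementation used in the experiments: it assumes a tiny one-dimensional problem, unit neighborhood weights, a log-cosh penalty, a penalty share of $\beta/R$ per subset, and an arbitrary split of the data into two subsets (the experiments use the ordering rule of Herman and Meyer). All function names and numbers are hypothetical.

```python
import math


def penalty(x, delta=50.0):
    """Lambda(x): log-cosh of differences over 1-D nearest neighbours (weights = 1)."""
    total = 0.0
    for j in range(len(x)):
        for k in (j - 1, j + 1):
            if 0 <= k < len(x):
                total += math.log(math.cosh((x[j] - x[k]) / delta))
    return total


def data_term(P, d, rho, x, rows):
    """Negative Poisson log likelihood restricted to the data indices in `rows`."""
    total = 0.0
    for i in rows:
        mean = sum(P[i][j] * x[j] for j in range(len(x))) + rho[i]
        # [Px]_i - d_i log([Px + rho]_i) + rho_i + log(d_i!)
        total += mean - d[i] * math.log(mean) + math.lgamma(d[i] + 1)
    return total


def sub_objective(P, d, rho, x, rows, beta, num_subsets):
    """Phi_r(x): data term over one subset plus a 1/R share of the penalty."""
    return data_term(P, d, rho, x, rows) + (beta / num_subsets) * penalty(x)


def full_objective(P, d, rho, x, beta):
    """Phi(x): data term over all data plus the full penalty."""
    return data_term(P, d, rho, x, range(len(d))) + beta * penalty(x)


# Toy problem: 4 data points, 3 voxels (illustrative numbers only).
P = [[0.5, 0.2, 0.0], [0.1, 0.6, 0.1], [0.0, 0.3, 0.4], [0.2, 0.2, 0.2]]
d = [3, 5, 2, 4]
rho = [0.1, 0.1, 0.1, 0.1]
x = [4.0, 6.0, 3.0]
beta = 1.0 / 32.0
subsets = [[0, 2], [1, 3]]  # hypothetical split into R = 2 subsets

total = sum(sub_objective(P, d, rho, x, rows, beta, len(subsets)) for rows in subsets)
assert abs(total - full_objective(P, d, rho, x, beta)) < 1e-9
```

Because each subset carries a $1/R$ share of the penalty, the sub-objective functions add up to the full objective function for any disjoint partition of the data indices.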
Figure 7-28 shows the corresponding iterates for plane 21, produced by the BSREM-II algorithm for 8 subsets and the APML algorithm with 2 OS-PML iterations for 24 subsets, for different CPU-times.

Now, we consider 7 minute real phantom data. In Figure 7-29, plots of the cost versus CPU-time are shown for the PML, APML, and BSREM-II algorithms when 7 minute data were used. As can be seen in the figure, the convergence rates of the algorithms are similar to those for the 14 minute data, except for the BSREM-II algorithm with 24 subsets, which converged much more slowly than the other algorithms. Again, this indicates that the convergence rate of the BSREM-II algorithm depends significantly on the number of subsets. Figure 7-30 shows plots of the cost versus CPU-time for the BSREM-II algorithm for 8 subsets and the APML algorithm with a few OS-PML iterations (2, 3, and 4) for 8 and 24 subsets when 7 minute data were used. As in the experiments with the 14 minute data, the APML algorithm with a few OS-PML iterations decreases the PML objective function faster than the BSREM-II algorithm with 8 subsets for the early iterations. For the 7 minute data, we do not include the corresponding images because the visual comparisons were similar to those observed in the images for the 14 minute data.

To see whether the convergence rates in Figures 7-25, 7-26, 7-29, and 7-30 are "typical", we averaged the convergence rates over fifteen realizations. Since the cost varies for each realization, we used the normalized-cost-difference, which is defined as

$$\frac{\Phi(x^{(n)}) - \Phi(x^{(100)})}{\Phi(x^{(0)}) - \Phi(x^{(100)})},$$

where $x^{(n)}$ is the $n$th iterate of the algorithm under consideration, and $x^{(0)}$ and $x^{(100)}$ are the uniform initial estimate and the 100th APML iterate, respectively. The reason why we used $x^{(100)}$ is that the APML algorithm "almost" converges within 100 iterations, which means that the PML objective function does not decrease appreciably after 100 APML iterations. In Figures 7-31 and 7-32, plots of the average normalized-cost-difference (the normalized-cost-difference averaged over fifteen realizations) versus CPU-time are shown for the PML, APML, and BSREM-II algorithms when fifteen 7 and 14 minute data sets were used, respectively. Figures 7-33 and 7-34 show plots of the average normalized-cost-difference versus CPU-time for the BSREM-II algorithm for 8 subsets and the APML algorithm with a few OS-PML iterations (2, 3, and 4) for 8 and 24 subsets when fifteen 7 and 14 minute data sets were used, respectively. The figures confirm that the observed convergence rates in Figures 7-25, 7-26, 7-29, and 7-30 are "typical".

Table 7-1 contains the CPU-times, memory accesses, floating point operations, and the number of function calls ($\sqrt{t}$ and $\lambda(t)$) per iteration for the algorithms used in the experiments. The table shows that the APML algorithm greatly reduced the number of iterations needed for convergence compared with the PML algorithm. Convergence in the table means that an algorithm decreases the PML objective function until it equals $\Phi(x^*)$, where $x^*$ is the 5000th iterate of the PML algorithm. Although the BSREM-II algorithm did not converge within 10,000 iterations, the algorithm "almost" converged at that point because $|\Phi(x^{(10000)}) - \Phi(x^*)| \approx 1$, where $x^{(10000)}$ is the 10000th BSREM-II iterate.
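The averaging over realizations described for Figures 7-31 through 7-34 can be sketched as follows. This is only an illustration, assuming that the normalized-cost-difference takes the usual form (cost minus reference cost, divided by initial cost minus reference cost) and that all realizations are sampled at the same points; the function names are hypothetical.

```python
def normalized_cost_difference(costs, reference_cost):
    """Map a cost sequence to (cost - reference) / (initial cost - reference).

    `costs[0]` is the cost of the uniform initial estimate and `reference_cost`
    plays the role of the cost of the 100th APML iterate.
    """
    span = costs[0] - reference_cost
    if span == 0:
        raise ValueError("initial cost equals the reference cost")
    return [(c - reference_cost) / span for c in costs]


def average_curves(curves):
    """Pointwise average of equally long curves, one curve per realization."""
    count = len(curves)
    return [sum(values) / count for values in zip(*curves)]


# Two made-up realizations with different absolute costs.
curve_a = normalized_cost_difference([10.0, 6.0, 4.0], reference_cost=2.0)
curve_b = normalized_cost_difference([100.0, 40.0, 25.0], reference_cost=20.0)
mean_curve = average_curves([curve_a, curve_b])
```

The normalization maps every realization to a curve that starts at 1 and approaches 0 as the cost approaches the reference cost, so realizations with different absolute costs can be averaged on a common scale.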
Figure 7-23: Convergence rate comparison of the APML algorithm for different values of $\epsilon$ when 14 minute data were used. Panels: (a) plane 10, (b) plane 21. Vertical axis: cost ($\times 10^5$).

Figure 7-24: Number of iterations for the APML algorithm to decrease the PML objective function to $\Phi(x^*)$, shown for different values of $\epsilon$ when 14 minute data were used, where $x^*$ is the 5000th iterate of the PML algorithm. Panels: (a) plane 10, (b) plane 21. Horizontal axis: $\epsilon$ (0 to 0.01); vertical axis: number of iterations.

Figure 7-25: Convergence rate comparison of the PML, APML, and BSREM-II algorithms when 14 minute data were used. Panels: (a) plane 10, (b) plane 21. Vertical axis: cost.

Figure 7-26: Convergence rate comparison of the APML algorithm with different numbers of OS-PML iterations for 8 and 24 subsets, and the BSREM-II algorithm for 8 subsets, when 14 minute data were used. For the APML algorithm, $\epsilon = 0.01$ was used. Panels: (a) plane 10, (b) plane 21.

Figure 7-27: APML and BSREM-II iterates for plane 10, shown for different CPU-times when 14 minute data were used. The APML iterates were generated by the APML algorithm with $\epsilon = 0.01$ and 2 OS-PML iterations for 24 subsets. The BSREM-II iterates were generated by the BSREM-II algorithm for 8 subsets. For display purposes, all the images were adjusted so that they have the same dynamic range. Panels: (a) APML, 5 sec; (b) BSREM-II, 5 sec; (c) APML, 15 sec; (d) BSREM-II, 15 sec; (e) APML, 25 sec; (f) BSREM-II, 25 sec; (g) APML, 35 sec; (h) BSREM-II, 35 sec.

Figure 7-28: APML and BSREM-II iterates for plane 21, shown for different CPU-times when 14 minute data were used. The APML iterates were generated by the APML algorithm with $\epsilon = 0.01$ and 2 OS-PML iterations for 24 subsets. The BSREM-II iterates were generated by the BSREM-II algorithm for 8 subsets. For display purposes, all the images were adjusted so that they have the same dynamic range. Panels: (a) APML, 5 sec; (b) BSREM-II, 5 sec; (c) APML, 15 sec; (d) BSREM-II, 15 sec; (e) APML, 25 sec; (f) BSREM-II, 25 sec; (g) APML, 35 sec; (h) BSREM-II, 35 sec.

Figure 7-29: Convergence rate comparison of the PML, APML, and BSREM-II algorithms when 7 minute data were used. Panels: (a) plane 10, (b) plane 21. Vertical axis: cost.

Figure 7-30: Convergence rate comparison of the APML algorithm with different numbers of OS-PML iterations for 8 and 24 subsets, and the BSREM-II algorithm for 8 subsets, when 7 minute data were used. For the APML algorithm, $\epsilon = 0.01$ was used. Panels: (a) plane 10, (b) plane 21.

Figure 7-31: Average convergence rate comparison of the PML, APML, and BSREM-II algorithms when 7 minute data were used. Panels: (a) plane 10, (b) plane 21. Vertical axis: normalized-cost-difference.

Figure 7-32: Average convergence rate comparison of the PML, APML, and BSREM-II algorithms when 14 minute data were used. Panels: (a) plane 10, (b) plane 21. Vertical axis: normalized-cost-difference.

Figure 7-33: Average convergence rate comparison of the APML algorithm with different numbers of OS-PML iterations for 8 and 24 subsets, and the BSREM-II algorithm for 8 subsets, when 7 minute data were used. For the APML algorithm, $\epsilon = 0.01$ was used. Panels: (a) plane 10, (b) plane 21.

Figure 7-34: Average convergence rate comparison of the APML algorithm with different numbers of OS-PML iterations for 8 and 24 subsets, and the BSREM-II algorithm for 8 subsets, when 14 minute data were used. For the APML algorithm, $\epsilon = 0.01$ was used. Panels: (a) plane 10, (b) plane 21.

Table 7-1: Comparison of the computational complexity per iteration.

| | PML | BSREM-II | APML |
| CPU-time (sec) | 0.271565 | 0.58836 (R = 8) | 0.49828 |
| Memory Read | 2P | 2P | 3P |
| Mul/Div | 2P + I + 24J | 2P + I + 12RJ | 4P + 8I + 58J |
| Add/Sub | 2P + 31J | 2P + I + 16RJ | 4P + 3I + 66J |
| Function calls, $\sqrt{t}$ | J | J | [illegible] |
| Function calls, $\lambda(t)$ | 8J | 8RJ | 16J |
| $N_c$ for plane 10 | 791 | N/A | 359 ($\epsilon = 0$), 116 ($\epsilon = 0.01$) |
| $N_c$ for plane 21 | 1362 | N/A | 652 ($\epsilon = 0$), 136 ($\epsilon = 0.01$) |

The letters P and R denote the number of non-zero elements in the probability matrix $\mathcal{P}$ and the number of subsets for the BSREM-II algorithm, respectively. The letters I and J represent the number of data points and voxels, respectively. $N_c$ denotes the number of iterations for convergence. All the algorithms were run on a Dell Inspiron-5150 computer. Convergence in the table means that an algorithm decreases the PML objective function until it equals $\Phi(x^*)$, where $x^*$ is the 5000th PML iterate.

probability matrix. In other words, given the MLEM image based on $\mathcal{P}^{c}$, the PCiPS algorithm first estimated the probability matrix, referred to as $\mathcal{P}^{\mathrm{true}}$, and then images were reconstructed by the APML algorithm with 100 iterations. In the PCiPS algorithm, two choices (15 and 63) for the parameter $\tau$ were examined in the experiments. We used the log-cosh function $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 50$ as the penalty function.

The images in Figures 7-35 (a) and (b) were obtained by the MLEM-S and APML algorithms with $\mathcal{P}^{c}$. The images in Figures 7-35 (c) and (d) were obtained by the APML algorithm using $\mathcal{P}^{\mathrm{true}}$ with $\tau = 15$ and $\tau = 63$, respectively. For the APML images in Figure 7-35, $\beta$ was chosen so that the background standard deviations are approximately the same (approximately 48). Using 5 iterations, the MLEM-S image with a background standard deviation of 48 was obtained. Figure 7-35 (a) illustrates that the tumors in the MLEM-S image are too smooth. Figures 7-35 (b), (c), and (d) show that the APML images look visually similar to each other. However, they are quite different, especially around the tumors. This claim is supported by Figure 7-36, which is a line plot (the row is shown in Figure 7-13) of the images in Figure 7-35. For the row under consideration, it can be seen from the line plots that the APML images in Figures 7-35 (c) and (d) have a higher degree of contrast than the other images. For the images in Figure 7-35, the large tumor contrast of the APML image in (d) was 59%, 17%, and 7% higher than that of the images in (a), (b), and (c), respectively. The increased contrast of the APML image in (d) for the small tumor with respect to the images in (a), (b), and (c) was 60%, 20%, and 14%, respectively. The increased tumor distinguishability of the APML image in (d) with respect to the images in (a), (b), and (c) was 30%, 10%, and 5%, respectively.

Figures 7-37 (a) and (b) are plots of the average contrast of the large tumor and small tumor, respectively, versus the average background standard deviation using fifteen realizations for plane 21. Further, plots of the average standard deviation of the large tumor and the average distinguishability of the two tumors versus the average background standard deviation for the fifteen realizations are shown in Figures 7-37 (c) and (d), respectively. The contrast and distinguishability curves of the APML algorithm with the estimated probability matrices lie above the corresponding curves of the other algorithms. Thus, for a fixed level of background noise, the APML images generated using the estimated probability matrices have, on average, the greatest contrast and distinguishability.

Now, we consider fifteen realizations of 7 minute real phantom data for plane 21.
Figures 7-38 (a) and (b) are plots of the average contrast of the large tumor and small tumor, respectively, versus the average background standard deviation using fifteen realizations. A plot of the average standard deviation of the large tumor versus the average background standard deviation for the fifteen realizations is shown in Figure 7-38 (c). Also, in Figure 7-38 (d), we provide a plot of the average distinguishability of the two tumors versus the average background standard deviation for the fifteen realizations. As in the experiments with the 14 minute real phantom data, it is evident that the APML algorithm with the estimated probability matrices outperforms the MLEM-S and APML algorithms with $\mathcal{P}^{c}$ in terms of contrast recovery and tumor distinguishability.

Figures 7-39 (a) and (b) are the images obtained by the MLEM-S and APML algorithms using $\mathcal{P}^{c}$, respectively, for a 7 minute data set. Figures 7-39 (c) and (d) are the images obtained by the APML algorithm using $\mathcal{P}^{\mathrm{true}}$ with $\tau = 15$ and $\tau = 63$, respectively, for a 7 minute data set. Figure 7-39 (a) is the 7th MLEM iterate. For the APML algorithm, 100 iterations were used. For the images in Figures 7-39 (b), (c), and (d), $\beta$ was chosen so that the standard deviation of their backgrounds is approximately 36. As in the experiments with the 14 minute data, the images in Figures 7-39 (b), (c), and (d) look similar to each other. Figure 7-40 is the line plot of the images in Figure 7-39. Also, as in the experiments with the 14 minute data, for the row under consideration, the APML images in Figures 7-39 (c) and (d) have a higher degree of contrast than the other images. For the images in Figure 7-39, the large tumor contrast of the APML image in (d) was 50%, 19%, and 6% higher than that of the images in (a), (b), and (c), respectively. The increased contrast of the APML image in (d) for the small tumor with respect to the images in (a), (b), and (c) was 47%, 19%, and 7%, respectively. The increased tumor distinguishability of the APML image in (d) with respect to the images in (a), (b), and (c) was 18%, 10%, and $-1$%, respectively.

Figure 7-35: Comparison of emission images when a 14 minute real phantom data set for plane 21 was used: (a) MLEM-S image with $\mathcal{P}^{c}$, (b) APML image with $\mathcal{P}^{c}$, (c) APML image with $\mathcal{P}^{\mathrm{true}}$ and $\tau = 15$, (d) APML image with $\mathcal{P}^{\mathrm{true}}$ and $\tau = 63$. The image in (a) is from 5 iterations, while the images in (b), (c), and (d) are from 100 iterations. For the images in (b), (c), and (d), $\beta$ was chosen in such a way that the standard deviation of the background is approximately 48. Specifically, $\beta = 1/32$, $1/32$, and $0.028$ for the images in (b), (c), and (d), respectively. For display purposes, all the images were adjusted so that they have the same dynamic range.

Figure 7-36: A line plot comparison of the reconstructed images in Figure 7-35.

Figure 7-37: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifteen realizations of 14 minute real phantom data for plane 21 were used in the study. For the MLEM-S curves, the images from iterations 5 to 160 were used. For the other curves, the images were reconstructed using two hundred iterations for $\beta$ ranging from $2^{-4}$ to $2^{-12}$.

Figure 7-38: Plots of the average contrast of the large and small tumors versus the average background standard deviation are shown in (a) and (b), respectively. A plot of the average standard deviation of the large tumor versus the average background standard deviation is shown in (c). In (d), a plot of the average distinguishability of the two tumors versus the average background standard deviation is shown. Fifteen realizations of 7 minute real phantom data for plane 21 were used in the study. For the MLEM-S curves, the images from iterations 5 to 160 were used. For the other curves, the images were reconstructed using two hundred iterations for $\beta$ ranging from $2^{-4}$ to $2^{-12}$.

Figure 7-39: Comparison of emission images when a 7 minute real phantom data set for plane 21 was used: (a) MLEM-S image with $\mathcal{P}^{c}$, (b) APML image with $\mathcal{P}^{c}$, (c) APML image with $\mathcal{P}^{\mathrm{true}}$ and $\tau = 15$, (d) APML image with $\mathcal{P}^{\mathrm{true}}$ and $\tau = 63$. The image in (a) is from 7 iterations, while the images in (b), (c), and (d) are from 100 iterations. For the images in (b), (c), and (d), $\beta$ was chosen in such a way that the standard deviation of the background is approximately 36. Specifically, $\beta = 1/32$, $1/32$, and $0.028$ for the images in (b), (c), and (d), respectively. For display purposes, all the images were adjusted so that they have the same dynamic range.

Figure 7-40: A line plot comparison of the reconstructed images in Figure 7-39.

CHAPTER 8
CONCLUSIONS AND FUTURE WORK

8.1 Conclusions

The PML algorithm we developed for reconstructing emission images generates nonnegative emission mean estimates and monotonically decreases the PML objective function. The algorithm is straightforward to implement and can incorporate any penalty function that satisfies the mild assumptions (AS3)-(AS8). Under certain conditions (i.e., (AS1)-(AS8)), the PML objective function is strictly convex, and for this case it has been proven that the PML algorithm converges to the minimizer of the PML objective function. Although the tradeoff between resolution and noise can be controlled by certain regularization hyperparameters (i.e., $\beta$ and $\delta$), we, like many researchers, have not determined a way to choose the parameters so that the data fit and the a priori knowledge are optimally "balanced".

A fast version of the PML algorithm, called the APML algorithm, was developed that retains the properties of the PML algorithm. The APML algorithm is based on the PML algorithm and the pattern search idea of Hooke and Jeeves. However, we modified the direction vector to account for the PET image reconstruction problem. The APML algorithm generates nonnegative emission mean estimates, monotonically decreases the PML objective function, and can accommodate the same class of penalty functions as the PML algorithm. Importantly, it has been proven that the APML algorithm converges to the minimizer of the PML objective function when the PML objective function is strictly convex. Although the APML algorithm requires an additional parameter $\epsilon$ that is used to define the direction vector, experiments using real phantom data demonstrated that fast convergence rates were obtained over a wide range of values for $\epsilon$ (e.g., $0 < \epsilon < 0.01$). This means that the APML algorithm is robust with respect to the parameter $\epsilon$.

In experiments using real phantom data, it was shown that the APML algorithm decreased the PML objective function about three times faster than the PML algorithm. Specifically, the APML algorithm for $0 < \epsilon < 0.01$ needed about one third of the CPU-time that was necessary for the PML algorithm to decrease the PML objective function to a "practical minimizer" of the objective function. By practical minimizer, we mean a PML iterate that has a PML objective value (i.e., cost) that, practically speaking, cannot be appreciably decreased with increasing iterations. Specifically, the minimum PML objective function value was taken from the 5000th iteration of the PML algorithm. We compared the convergence rate of the APML algorithm to an ordered-subsets method in [35] because the ordered-subsets based algorithm is said to converge to the nonnegative minimizer of the PML objective function. At the early iterations, the ordered-subsets method decreased the PML objective function at a faster rate than the APML algorithm. However, after about 10 seconds of CPU-time, the APML algorithm decreased the PML objective function faster. It was also shown that when the APML algorithm utilizes an ordered-subsets based PML iteration for a few early iterations, the resulting algorithm decreased the PML objective function more than about 1.2 times faster than the ordered-subsets method for the early iterations (i.e., less than 10 seconds of CPU-time).

In addition to the PML and APML algorithms, we also proposed a regularized image reconstruction algorithm that we call the QEP algorithm. The QEP algorithm obtains regularized estimates of the emission means through the use of an iteration dependent penalty function that serves to preserve edges in the reconstructed images.
The definition of the penalty function was motivated by an analysis of the surrogate function for a penalty function that is utilized by the PML algorithm. In the QEP algorithm, at each iteration, the iteration dependent penalty function is found and the next iterate is originally defined to be a nonnegative minimizer of the sum of the negative log likelihood function and the penalty function. However, since it is not possible to solve the constrained optimization problem, the QEP algorithm alternatively defines the next iterate to be a nonnegative minimizer of the sum of the de-coupled surrogate function for the negative log likelihood function and the iteration dependent penalty function. It is important to understand that the QEP algorithm defines a new objective function to be minimized at each iteration. Thus, unlike the PML and APML algorithms, the QEP algorithm does not minimize a single objective function. This means that the usual mathematical tools for investigating convergence are unavailable. Despite its theoretical drawbacks, the QEP algorithm performed extremely well in experiments with computer-generated phantom data and real thorax phantom data, and outperformed the PML (or APML) algorithm and the PWLS algorithm [42] in terms of contrast recovery.

In experiments, the images produced by the PML (or APML) and QEP algorithms had greater contrast and "distinguishability" than the MLEM-S and PWLS images for a fixed level of background noise. The MLEM-S images were produced by early termination of the MLEM algorithm [16] and the PWLS images were produced by the PWLS algorithm [42]. Specifically, for a 14 minute real phantom data set, the contrast of the large tumor in the QEP image was 37%, 3%, and 34% higher than in the MLEM-S, PML, and PWLS images, respectively. With respect to the MLEM-S, PML, and PWLS images, the contrast of the small tumor in the QEP image was 46%, 9%, and 43% higher, respectively. The increased tumor distinguishability of the QEP image with respect to the MLEM-S, PML, and PWLS images was 20%, 5%, and 35%, respectively. Moreover, qualitatively speaking, the spatial extent of the tumors was more easily resolved with the PML and QEP images.
Since the QEP algorithm yielded the greatest contrast for a fixed level of background noise, it may be particularly well suited for tumor detection applications.

Errors caused by scatter, non-collinearity, detector penetration, and positron range have the net effect of introducing blur into PET images. In theory, these errors can be corrected by determining a suitable probability matrix. However, it is difficult to determine such a probability matrix because the required modelling is impractical. In Chapter 6, we first assume that the "true" probability matrix for the observed emission data is a product of an unknown nonnegative matrix, called a scatter matrix, and a "conventional" probability matrix. The conventional probability matrix is generated from the geometry of the PET scanner and the image space to be reconstructed, along with certain standard corrections for errors. We developed a method, referred to as the joint minimum Kullback-Leibler (KL) distance method, that aims to reduce blur in PET images. In the joint minimum KL distance method, the scatter matrix and emission means are jointly estimated by minimizing the KL distance between the data and the model. Because of the difficulty of the minimization problem, the number of unknowns in the scatter matrix is reduced and an alternating minimization algorithm is developed. Thus, given an estimate for the scatter matrix, an estimate for the emission means is obtained. The estimate for the emission means can then be used to generate an improved estimate for the scatter matrix. Alternating between these two steps leads to the desired estimates for the scatter matrix and emission means. Once the estimate of the scatter matrix is obtained, the estimate for the true probability matrix is the product of the estimated scatter matrix and the conventional probability matrix. Then, a regularized image reconstruction algorithm is applied to the emission data using the estimated true probability matrix. In experiments with real phantom data, the APML algorithm is employed because of its fast convergence.
The contrast of the reconstructed images generated using the estimated probability matrix was more accurate than that of the reconstructed images generated using the conventional probability matrix.

8.2 Future Work

Although the APML algorithm decreases the PML objective function faster than the PML algorithm in experiments, it is not clear how fast the APML algorithm converges in theory. Consequently, it would be worthwhile to determine the theoretical rate of convergence of the APML algorithm. A related research direction would be to determine the parameter $\epsilon$ that maximizes the convergence rate of the APML algorithm. Keeping in mind Hooke and Jeeves' idea, it is possible that the convergence rate of the APML algorithm could be improved by using a "better" direction vector. We say this because, in experiments where the "best" direction vector $x^{(n+1)} - x^*$ was used ($x^*$ is the minimizer of the PML objective function), the APML algorithm converged in only a few iterations (e.g., 3 iterations).

A key assumption of the PCiPS algorithm is that the mean number of photon pairs recorded by the $i$th detector pair is a weighted sum of the mean number of photon pairs that would have been recorded by a certain set of detector pairs if there were no errors due to scatter and non-collinearity. Currently, the chosen set of detector pairs is the projection to which the $i$th detector pair belongs. In Chapter 6, the assumption was justified for the case where the image space had approximately uniform attenuation (e.g., brain). However, for an image space with non-uniform attenuation, another choice for the set of detector pairs may be more appropriate. Thus, it would be worthwhile to revisit the assumptions of the PCiPS algorithm so as to broaden its application.

Increasingly, three dimensional (3D) PET scanners are being used to perform whole-body scans. Thus, it may be beneficial to the PET community to extend the proposed algorithms to 3D. One practical problem of 3D PET is that it takes an extremely long time to reconstruct images because the number of data points is increased. So, with 3D implementations, one should consider parallel computing techniques such as a multiprocessor approach [70] or a computer cluster implementation [71]. Observe that the proposed algorithms can be parallelized because all the pixel values are updated simultaneously. By contrast, the penalized weighted least-squares algorithm in [42] updates pixel values sequentially, so that algorithm cannot be parallelized.

The proposed algorithms were assessed with real phantom data. However, a more in-depth assessment would include other types of real phantoms (e.g., brain phantoms) and patient data. Another consideration is to study the proposed algorithms further when errors due to detector penetration, non-collinearity of the line-of-response, and positron range are corrected. We did not consider those errors in the dissertation.

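APPENDIX A
HUBER'S SURROGATE FUNCTIONS

In this appendix, we present Huber's proof of the inequality $\lambda^{(n)}(t) \geq \lambda(t)$ for all $t$. From (3.14), recall that $\lambda^{(n)}$ is defined to be

$$
\begin{aligned}
\lambda^{(n)}(t) &= \lambda(t^{(n)}) + \dot{\lambda}(t^{(n)})(t - t^{(n)}) + \tfrac{1}{2}\gamma(t^{(n)})(t - t^{(n)})^2 && \text{(A.1)} \\
&= \lambda(t^{(n)}) + \dot{\lambda}(t^{(n)})(t - t^{(n)}) - \gamma(t^{(n)})\, t\, t^{(n)} + \tfrac{1}{2}\gamma(t^{(n)})\{t^2 + (t^{(n)})^2\} && \text{(A.2)} \\
&= \lambda(t^{(n)}) + \dot{\lambda}(t^{(n)})(t - t^{(n)}) - \dot{\lambda}(t^{(n)})\, t + \tfrac{1}{2}\dot{\lambda}(t^{(n)})\, t^{(n)} + \tfrac{1}{2}\gamma(t^{(n)})\, t^2 && \text{(A.3)} \\
&= \lambda(t^{(n)}) - \tfrac{1}{2}\dot{\lambda}(t^{(n)})\, t^{(n)} + \tfrac{1}{2}\gamma(t^{(n)})\, t^2, && \text{(A.4)}
\end{aligned}
$$

where we used that $\gamma(t) = \dot{\lambda}(t)/t$. Consider the function $f(t) = \lambda^{(n)}(t) - \lambda(t)$ and its first derivative

$$\dot{f}(t) = [\gamma(t^{(n)}) - \gamma(t)]\, t. \qquad \text{(A.5)}$$

From (A.5) and the assumption that $\gamma(t)$ is nonincreasing over $[0, \infty)$ (see (AS6)), it follows that $\dot{f}(t) \leq 0$ over $[0, t^{(n)}]$ and $\dot{f}(t) \geq 0$ over $[t^{(n)}, \infty)$. These inequalities and the fact that $f(t^{(n)}) = 0$ imply that $f(t) \geq 0$ for $t \geq 0$, or equivalently $\lambda^{(n)}(t) \geq \lambda(t)$ for $t \geq 0$. It is clear that $\lambda^{(n)}(t) \geq \lambda(t)$ for $t < 0$ because of the symmetry of $\lambda^{(n)}(t)$ and $\lambda(t)$ (see (AS3)).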
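The inequality proved in this appendix can be spot-checked numerically. The sketch below is only an illustration, assuming the log-cosh penalty $\lambda(t) = \log(\cosh(t/\delta))$ with $\delta = 50$ that was used in the experiments; the function names and the test grid are hypothetical.

```python
import math

DELTA = 50.0  # penalty parameter delta, as used in the experiments


def lam(t):
    """Log-cosh penalty: lambda(t) = log(cosh(t / delta))."""
    return math.log(math.cosh(t / DELTA))


def lam_dot(t):
    """First derivative of the log-cosh penalty."""
    return math.tanh(t / DELTA) / DELTA


def gamma(t):
    """gamma(t) = lam_dot(t) / t, extended by its limit 1 / delta^2 at t = 0."""
    if t == 0.0:
        return 1.0 / DELTA ** 2
    return lam_dot(t) / t


def surrogate(t, t_n):
    """Quadratic surrogate in the form (A.4), built at the point t_n."""
    return lam(t_n) - 0.5 * lam_dot(t_n) * t_n + 0.5 * gamma(t_n) * t * t


# Check tangency at t_n and majorization on a grid covering [-200, 200].
grid = [0.5 * k for k in range(-400, 401)]
for t_n in (0.0, 10.0, 75.0, -120.0):
    assert abs(surrogate(t_n, t_n) - lam(t_n)) < 1e-12
    assert all(surrogate(t, t_n) >= lam(t) - 1e-12 for t in grid)
```

For this penalty, $\gamma(t) = \tanh(t/\delta)/(\delta t)$ is nonincreasing on $[0, \infty)$, so the check is consistent with the role of (AS6) in the proof.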

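APPENDIX B
SURROGATE FUNCTIONS FOR PENALTY FUNCTION

Our objective in this appendix is to demonstrate that the surrogate function $\Lambda^{(n)}$ can be expressed as (3.22). First, we make the following observation: Since $\gamma(t) = \dot{\lambda}(t)/t$ (see (AS6)), $g^{(n)}$ in (3.17) can be written as

$$
\begin{aligned}
g^{(n)}(x_j, x_k) &= \lambda(x_j^{(n)} - x_k^{(n)}) + \dot{\lambda}(x_j^{(n)} - x_k^{(n)})\left[(x_j - x_j^{(n)}) - (x_k - x_k^{(n)})\right] \\
&\quad + \tfrac{1}{4}\gamma(x_j^{(n)} - x_k^{(n)})\left[(2x_j - 2x_j^{(n)})^2 + (2x_k - 2x_k^{(n)})^2\right] && \text{(B.1)} \\
&= \lambda(x_j^{(n)} - x_k^{(n)}) + \gamma(x_j^{(n)} - x_k^{(n)})\left[(x_j - x_j^{(n)})^2 + (x_k - x_k^{(n)})^2\right] \\
&\quad + \gamma(x_j^{(n)} - x_k^{(n)})(x_j^{(n)} - x_k^{(n)})\left[(x_j - x_j^{(n)}) - (x_k - x_k^{(n)})\right] && \text{(B.2)} \\
&= \lambda(x_j^{(n)} - x_k^{(n)}) + \gamma(x_j^{(n)} - x_k^{(n)})\left[x_j^2 - 2x_j x_j^{(n)} + (x_j^{(n)})^2 + x_k^2 - 2x_k x_k^{(n)} + (x_k^{(n)})^2\right] \\
&\quad + \gamma(x_j^{(n)} - x_k^{(n)})\left[-(x_j^{(n)})^2 + x_j^{(n)} x_j - x_j^{(n)} x_k + x_k^{(n)} x_k - x_k^{(n)} x_j + 2x_j^{(n)} x_k^{(n)} - (x_k^{(n)})^2\right] && \text{(B.3)} \\
&= \gamma(x_j^{(n)} - x_k^{(n)})\left[x_j^2 - (x_j^{(n)} + x_k^{(n)})x_j + x_k^2 - (x_j^{(n)} + x_k^{(n)})x_k\right] \\
&\quad + 2\gamma(x_j^{(n)} - x_k^{(n)})\, x_j^{(n)} x_k^{(n)} + \lambda(x_j^{(n)} - x_k^{(n)}) && \text{(B.4)} \\
&= \gamma(x_j^{(n)} - x_k^{(n)})\left\{\left(x_j - \frac{x_j^{(n)} + x_k^{(n)}}{2}\right)^2 + \left(x_k - \frac{x_j^{(n)} + x_k^{(n)}}{2}\right)^2\right\} \\
&\quad + \lambda(x_j^{(n)} - x_k^{(n)}) + 2\gamma(x_j^{(n)} - x_k^{(n)})\, x_j^{(n)} x_k^{(n)} - \tfrac{1}{2}\gamma(x_j^{(n)} - x_k^{(n)})(x_j^{(n)} + x_k^{(n)})^2 && \text{(B.5)} \\
&= \gamma(x_j^{(n)} - x_k^{(n)})\left\{\left(x_j - \frac{x_j^{(n)} + x_k^{(n)}}{2}\right)^2 + \left(x_k - \frac{x_j^{(n)} + x_k^{(n)}}{2}\right)^2\right\} \\
&\quad - \tfrac{1}{2}\gamma(x_j^{(n)} - x_k^{(n)})(x_j^{(n)} - x_k^{(n)})^2 + \lambda(x_j^{(n)} - x_k^{(n)}) && \text{(B.6)} \\
&= \gamma(x_j^{(n)} - x_k^{(n)})\left\{\left(x_j - \frac{x_j^{(n)} + x_k^{(n)}}{2}\right)^2 + \left(x_k - \frac{x_j^{(n)} + x_k^{(n)}}{2}\right)^2\right\} + C_{jk}^{(n)}, && \text{(B.7)}
\end{aligned}
$$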

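where $C_{jk}^{(n)} = \lambda(x_j^{(n)} - x_k^{(n)}) - \tfrac{1}{2}\dot{\lambda}(x_j^{(n)} - x_k^{(n)})(x_j^{(n)} - x_k^{(n)})$. Thus, $\Lambda^{(n)}$ in (3.18) can be written as

$$\Lambda^{(n)}(x) = \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\, \gamma(x_j^{(n)} - x_k^{(n)})\left\{\left(x_j - \frac{x_j^{(n)} + x_k^{(n)}}{2}\right)^2 + \left(x_k - \frac{x_j^{(n)} + x_k^{(n)}}{2}\right)^2\right\} + C^{(n)}, \qquad \text{(B.8)}$$

where $C^{(n)} = \sum_{j=1}^{J} \sum_{k \in N_j} w_{jk}\, C_{jk}^{(n)}$.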
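APPENDIX C
STRICT CONVEXITY OF PML OBJECTIVE FUNCTION

We will prove that the PML objective function $\Phi$ is strictly convex over the set $\{\boldsymbol{x} : \boldsymbol{x} \geq \boldsymbol{0}\}$. First, we will show that the Hessian of $\Lambda$, denoted by $\mathcal{S}$, satisfies the following properties:

• (S1) $\boldsymbol{z}'\mathcal{S}\boldsymbol{z} \geq 0$ for all $\boldsymbol{z}$, where $'$ denotes the transpose operator
• (S2) $\boldsymbol{z}'\mathcal{S}\boldsymbol{z} = 0$ only when $\boldsymbol{z} = \boldsymbol{0}$ or $\boldsymbol{z} = c\boldsymbol{1}$ for some $c \neq 0$.

The $(l, m)$th element of $\mathcal{S}$ is given by

\begin{align*}
[\mathcal{S}]_{lm} = \begin{cases}
2\sum_{k \in N_l} w_{lk}\,\ddot{\lambda}(x_l - x_k), & l = m \\
-2 w_{lm}\,\ddot{\lambda}(x_l - x_m), & l \neq m \text{ and } m \in N_l \\
0, & \text{otherwise.}
\end{cases} \tag{C.1}
\end{align*}

Using (C.1), it follows that

\begin{align*}
\boldsymbol{z}'\mathcal{S}\boldsymbol{z} &= \sum_{l=1}^{J} z_l \left(2 z_l \sum_{m \in N_l} w_{lm}\,\ddot{\lambda}(x_l - x_m) - 2\sum_{m \in N_l} w_{lm}\,\ddot{\lambda}(x_l - x_m) z_m\right) \tag{C.2} \\
&= 2\sum_{l=1}^{J}\sum_{m \in N_l} w_{lm}\,\ddot{\lambda}(x_l - x_m)\, z_l (z_l - z_m) \tag{C.3} \\
&= 2\sum_{l=1}^{J}\sum_{m \in N_l,\, m > l} w_{lm}\,\ddot{\lambda}(x_l - x_m)(z_l - z_m)^2. \tag{C.4}
\end{align*}

Since $w_{lm}\,\ddot{\lambda}(x_l - x_m) > 0$ for all $l$ and $m$ (see (AS5)), (S1) and (S2) follow from (C.4).

To prove that $\Phi$ is strictly convex, it suffices to show that $\boldsymbol{z}'\mathcal{T}\boldsymbol{z} > 0$ for $\boldsymbol{z} \neq \boldsymbol{0}$, where $\mathcal{T}$ is the Hessian of $\Phi$. Write $\mathcal{T} = \mathcal{R} + \beta\mathcal{S}$, where $\mathcal{R}$ is the Hessian of the negative log-likelihood term of $\Phi$. The $(l, m)$th element of $\mathcal{R}$ equals

\begin{align*}
[\mathcal{R}]_{lm} = \sum_{i=1}^{I} \frac{d_i \mathcal{P}_{il}\mathcal{P}_{im}}{([\mathcal{P}\boldsymbol{x} + \boldsymbol{\rho}]_i)^2}. \tag{C.5}
\end{align*}

By (S1) and the fact that

\begin{align*}
\boldsymbol{z}'\mathcal{R}\boldsymbol{z} &= \sum_{i=1}^{I} \frac{d_i}{([\mathcal{P}\boldsymbol{x} + \boldsymbol{\rho}]_i)^2}\left(\sum_{l=1}^{J} \mathcal{P}_{il} z_l\right)\left(\sum_{m=1}^{J} \mathcal{P}_{im} z_m\right) \tag{C.6} \\
&= \sum_{i=1}^{I} d_i \left(\frac{[\mathcal{P}\boldsymbol{z}]_i}{[\mathcal{P}\boldsymbol{x} + \boldsymbol{\rho}]_i}\right)^2 \geq 0, \tag{C.7}
\end{align*}

it is true that $\boldsymbol{z}'\mathcal{T}\boldsymbol{z} \geq 0$ for all $\boldsymbol{z}$. By (AS1) and (AS2), it follows that

\begin{align*}
\boldsymbol{z}'\mathcal{R}\boldsymbol{z} = c^2 \sum_{i=1}^{I} d_i \left(\frac{[\mathcal{P}\boldsymbol{1}]_i}{[\mathcal{P}\boldsymbol{x} + \boldsymbol{\rho}]_i}\right)^2 > 0 \tag{C.8}
\end{align*}

for $\boldsymbol{z} = c\boldsymbol{1}$ when $c \neq 0$. Thus, by (S2), it is concluded that $\boldsymbol{z}'\mathcal{T}\boldsymbol{z} > 0$ for $\boldsymbol{z} \neq \boldsymbol{0}$.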
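The structure of the penalty Hessian can be illustrated on a toy problem. The sketch below is an illustration under stated assumptions rather than the dissertation's code: it assumes a one-dimensional chain of pixels whose neighbors are the adjacent pixels, a common weight `w`, and the example penalty $\lambda(t) = \sqrt{t^2 + \delta^2} - \delta$; the function names are hypothetical. It builds the penalty Hessian entry by entry and compares its quadratic form with the sum of squared differences over neighbor pairs counted once.

```python
DELTA = 1.0  # hypothetical smoothing parameter of the example penalty


def lam_ddot(t):
    """Second derivative of the example penalty sqrt(t^2 + DELTA^2) - DELTA."""
    return DELTA ** 2 / (t * t + DELTA ** 2) ** 1.5


def neighbors(l, n):
    """Neighbors of pixel l on a 1-D chain of n pixels (an assumed layout)."""
    return [m for m in (l - 1, l + 1) if 0 <= m < n]


def penalty_hessian(x, w=1.0):
    """Hessian of the penalty: diagonal sums, negative off-diagonal entries."""
    n = len(x)
    S = [[0.0] * n for _ in range(n)]
    for l in range(n):
        for m in neighbors(l, n):
            c = 2.0 * w * lam_ddot(x[l] - x[m])
            S[l][l] += c   # contribution to the diagonal entry
            S[l][m] -= c   # off-diagonal entry for the neighbor m
    return S


def quad_form(A, z):
    """Compute z' A z for a dense matrix stored as nested lists."""
    n = len(z)
    return sum(z[l] * A[l][m] * z[m] for l in range(n) for m in range(n))


def quad_form_pairs(x, z, w=1.0):
    """Same quadratic form as a sum over neighbor pairs with m > l."""
    n = len(x)
    return 2.0 * sum(
        w * lam_ddot(x[l] - x[m]) * (z[l] - z[m]) ** 2
        for l in range(n)
        for m in neighbors(l, n)
        if m > l
    )
```

On this toy layout the two quadratic forms agree, the form is positive for a non-constant vector, and it vanishes for a constant vector, which is why the likelihood term is needed to obtain strict convexity.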


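APPENDIX D
SOLUTION TO UNCONSTRAINED OPTIMIZATION PROBLEM IN MODIFIED APML LINE SEARCH

We first show in this appendix that the surrogate function $\Gamma^{(n+1)}$ in (4.29) is strictly convex. Then, we prove that the solution to the unconstrained optimization problem in (4.30) can be expressed by (4.31).

Consider the second derivative of $\Gamma^{(n+1)}$:

\begin{align*}
\ddot{\Gamma}^{(n+1)} = \sum_{i=1}^{I} A_i^{(n+1)} + \beta\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n+1)} - x_k^{(n+1)})(v_j^{(n+1)} - v_k^{(n+1)})^2 \tag{D.1}
\end{align*}

(observe: $\ddot{\Gamma}^{(n+1)}$ is independent of $\alpha$). It is true that $\{\boldsymbol{x}^{(n)}\}$ is bounded for all $n$, $\gamma(t) > 0$ for $-\infty < t < \infty$, and $\gamma$ is a continuous function over $-\infty < t < \infty$. Thus, it is clear that the second term in the right hand side of (D.1) is positive when $\boldsymbol{v}^{(n+1)}$ is neither $\boldsymbol{0}$ nor of the form $c\boldsymbol{1}$ with $c \neq 0$. For the case where $\boldsymbol{v}^{(n+1)} = c\boldsymbol{1}$ with $c \neq 0$, it follows from (4.27) that

\begin{align*}
A_i^{(n+1)} = \begin{cases}
\dfrac{c^2 d_i ([\mathcal{P}\boldsymbol{1}]_i)^2}{\left(c[\mathcal{P}\boldsymbol{1}]_i L^{(n+1)} + [\mathcal{P}\boldsymbol{x}^{(n+1)}]_i + \rho_i\right)^2}, & c > 0 \\[2ex]
\dfrac{c^2 d_i ([\mathcal{P}\boldsymbol{1}]_i)^2}{\left(c[\mathcal{P}\boldsymbol{1}]_i U^{(n+1)} + [\mathcal{P}\boldsymbol{x}^{(n+1)}]_i + \rho_i\right)^2}, & c < 0
\end{cases} \tag{D.2}
\end{align*}

for all $i$. Thus, by (AS1) and (AS2), $\sum_{i=1}^{I} A_i^{(n+1)} > 0$ for $\boldsymbol{v}^{(n+1)} = c\boldsymbol{1}$, when $c \neq 0$. Consequently, it can be concluded that $\ddot{\Gamma}^{(n+1)} > 0$ for $\boldsymbol{v}^{(n+1)} \neq \boldsymbol{0}$. This result implies that $\Gamma^{(n+1)}$ is strictly convex for $\boldsymbol{v}^{(n+1)} \neq \boldsymbol{0}$.

Now, we will show that the expression in (4.31) is the solution to the unconstrained optimization problem in (4.30). First, recall that

\begin{align*}
\Lambda^{(n+1)}(\boldsymbol{x}) = \sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\lambda^{(n+1)}(x_j - x_k), \tag{D.3}
\end{align*}

where

\begin{align*}
\lambda^{(n+1)}(t) = \lambda(t^{(n+1)}) + \dot{\lambda}(t^{(n+1)})(t - t^{(n+1)}) + \frac{1}{2}\gamma(t^{(n+1)})(t - t^{(n+1)})^2. \tag{D.4}
\end{align*}

For $\boldsymbol{x} = \boldsymbol{x}^{(n+1)} + \alpha\boldsymbol{v}^{(n+1)}$, the surrogate function $\Gamma^{(n+1)}$ can be expressed as

\begin{align*}
\Gamma^{(n+1)}(\alpha) &= \Theta^{(n+1)}(\alpha) + \beta\Lambda^{(n+1)}(\boldsymbol{x}^{(n+1)} + \alpha\boldsymbol{v}^{(n+1)}) \tag{D.5} \\
&= \sum_{i=1}^{I} \theta_i^{(n+1)}(\alpha) + \beta\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\lambda(x_j^{(n+1)} - x_k^{(n+1)}) + C \\
&\quad + \beta\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\dot{\lambda}(x_j^{(n+1)} - x_k^{(n+1)})(v_j^{(n+1)} - v_k^{(n+1)})\,\alpha \\
&\quad + \frac{\beta}{2}\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n+1)} - x_k^{(n+1)})(v_j^{(n+1)} - v_k^{(n+1)})^2\,\alpha^2, \tag{D.6}
\end{align*}

where $C = \sum_{i=1}^{I}\{\rho_i + \log(d_i!)\}$. Thus, using the definition of $\theta_i^{(n+1)}$ in (4.23), the first derivative of $\Gamma^{(n+1)}$, denoted by $\dot{\Gamma}^{(n+1)}$, is

\begin{align*}
\dot{\Gamma}^{(n+1)}(\alpha) &= \sum_{i=1}^{I}\left\{A_i^{(n+1)}\alpha + \dot{\theta}_i^{(n+1)}(0)\right\} \\
&\quad + \beta\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\dot{\lambda}(x_j^{(n+1)} - x_k^{(n+1)})(v_j^{(n+1)} - v_k^{(n+1)}) \\
&\quad + \beta\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}\,\gamma(x_j^{(n+1)} - x_k^{(n+1)})(v_j^{(n+1)} - v_k^{(n+1)})^2\,\alpha. \tag{D.7}
\end{align*}

Since $\Gamma^{(n+1)}$ is strictly convex, the minimizer of $\Gamma^{(n+1)}$ can be found by setting its first derivative to zero:

\begin{align*}
\dot{\Gamma}^{(n+1)}(\alpha) = 0. \tag{D.8}
\end{align*}

Solving (D.8) results in the expression in (4.31).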
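Because the surrogate is a strictly convex quadratic in the step size, setting its derivative to zero gives the step as minus the slope at zero divided by the constant curvature. The sketch below illustrates that computation; it is not taken from the dissertation and does not reproduce (4.31). It assumes a one-dimensional chain of pixels, a common weight `w`, the example penalty $\lambda(t) = \sqrt{t^2 + \delta^2} - \delta$, and that the per-measurement slopes at zero (`theta_dot0`) and curvatures (`curvatures`) are supplied by the caller. All names are hypothetical.

```python
import math

DELTA = 1.0  # hypothetical smoothing parameter of the example penalty


def lam_dot(t):
    """First derivative of the example penalty sqrt(t^2 + DELTA^2) - DELTA."""
    return t / math.sqrt(t * t + DELTA * DELTA)


def gamma(t):
    """gamma(t) = lam_dot(t) / t, written in a form that is safe at t = 0."""
    return 1.0 / math.sqrt(t * t + DELTA * DELTA)


def neighbor_pairs(n):
    """Ordered neighbor pairs (j, k) on a 1-D chain of n pixels (assumed layout)."""
    return [(j, k) for j in range(n) for k in (j - 1, j + 1) if 0 <= k < n]


def surrogate_curvature(curvatures, x, v, beta, w=1.0):
    """Second derivative of the surrogate; it does not depend on the step size."""
    pairs = neighbor_pairs(len(x))
    penalty = sum(w * gamma(x[j] - x[k]) * (v[j] - v[k]) ** 2 for j, k in pairs)
    return sum(curvatures) + beta * penalty


def surrogate_slope(alpha, theta_dot0, curvatures, x, v, beta, w=1.0):
    """First derivative of the surrogate at step size alpha."""
    pairs = neighbor_pairs(len(x))
    slope0 = sum(theta_dot0) + beta * sum(
        w * lam_dot(x[j] - x[k]) * (v[j] - v[k]) for j, k in pairs
    )
    return slope0 + surrogate_curvature(curvatures, x, v, beta, w) * alpha


def line_search_step(theta_dot0, curvatures, x, v, beta, w=1.0):
    """Step size at which the surrogate derivative vanishes."""
    curv = surrogate_curvature(curvatures, x, v, beta, w)
    if curv <= 0.0:
        raise ValueError("surrogate is not strictly convex along this direction")
    return -surrogate_slope(0.0, theta_dot0, curvatures, x, v, beta, w) / curv
```

A caller would evaluate `line_search_step` once per iteration and then, in a constrained variant, clip the result to the feasible interval of step sizes; that clipping is outside the scope of this sketch.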


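APPENDIX E
CONVEXITY OF SURROGATE FUNCTIONS FOR OBJECTIVE FUNCTIONS IN APML LINE SEARCH

In this appendix, our objective is to show that there exists a symmetric positive definite matrix $\mathcal{M}$, which is independent of $n$, such that $\ddot{\Gamma}^{(n+1)} \geq 2(\boldsymbol{v}^{(n+1)})'\mathcal{M}(\boldsymbol{v}^{(n+1)})$, where $\ddot{\Gamma}^{(n+1)}$ is the second derivative of the surrogate function $\Gamma^{(n+1)}$ in (4.29). Note that $\ddot{\Gamma}^{(n+1)}$ is shown in (D.1).

It is true that $\{\boldsymbol{x}^{(n)}\}$ is bounded for all $n$, $\gamma(t) > 0$ for $-\infty < t < \infty$, and $\gamma$ is a continuous function over $-\infty < t < \infty$. Thus, there exists a number $\gamma_0 > 0$ such that $\gamma(x_j^{(n+1)} - x_k^{(n+1)}) \geq \gamma_0$ for all $j$, $k$, and $n$. Hence, it follows that

\begin{align*}
\ddot{\Gamma}^{(n+1)} &\geq \sum_{i=1}^{I} A_i^{(n+1)} + \beta\gamma_0\sum_{j=1}^{J}\sum_{k \in N_j} w_{jk}(v_j^{(n+1)} - v_k^{(n+1)})^2 \tag{E.1} \\
&\geq \sum_{i=1}^{I} A_i^{(n+1)} + 2\beta\gamma_0 (\boldsymbol{v}^{(n+1)})'\mathcal{H}(\boldsymbol{v}^{(n+1)}), \tag{E.2}
\end{align*}

where the $(l, m)$th element of $\mathcal{H}$ is given by

\begin{align*}
[\mathcal{H}]_{lm} = \begin{cases}
\sum_{k \in N_l} w_{lk}, & l = m \\
-w_{lm}, & l \neq m \text{ and } m \in N_l \\
0, & \text{otherwise.}
\end{cases} \tag{E.3}
\end{align*}

Note that the matrix $\mathcal{H}$ is symmetric because $w_{lm} = w_{ml}$ for all $l$ and $m$.

Now, we consider the term $\sum_{i} A_i^{(n+1)}$. First, note that $A_i^{(n+1)}$ can be expressed as (see (4.27))

\begin{align*}
A_i^{(n+1)} = \begin{cases}
f(s_i^{(n+1)})\, d_i ([\mathcal{P}\boldsymbol{v}^{(n+1)}]_i)^2, & [\mathcal{P}\boldsymbol{v}^{(n+1)}]_i \neq 0 \\
0, & [\mathcal{P}\boldsymbol{v}^{(n+1)}]_i = 0,
\end{cases} \tag{E.4}
\end{align*}

where $f(t) = 1/t^2$ and

\begin{align*}
s_i^{(n+1)} = \begin{cases}
[\mathcal{P}\boldsymbol{v}^{(n+1)}]_i L^{(n+1)} + [\mathcal{P}\boldsymbol{x}^{(n+1)}]_i + \rho_i, & [\mathcal{P}\boldsymbol{v}^{(n+1)}]_i > 0 \\
[\mathcal{P}\boldsymbol{v}^{(n+1)}]_i U^{(n+1)} + [\mathcal{P}\boldsymbol{x}^{(n+1)}]_i + \rho_i, & [\mathcal{P}\boldsymbol{v}^{(n+1)}]_i < 0.
\end{cases} \tag{E.5}
\end{align*}

Also, note that $s_i^{(n+1)}$ is bounded below by 0 for all $n$ and $i$ (see (4.15)). Moreover, if we assume that $\boldsymbol{v}^{(n+1)}$ is bounded for all $n$, it follows that $s_i^{(n+1)}$ is bounded above for all $n$ and $i$. To see this fact, we only need to consider $L^{(n+1)}$ and $U^{(n+1)}$ because $\boldsymbol{x}^{(n+1)}$ is bounded for all $n$ by Proposition 1. When $v_j^{(n+1)} < 0$ for all $j$, it follows that $L^{(n+1)} = -\infty$. For this case, the fact that $L^{(n+1)} = -\infty$ is not a problem because $s_i^{(n+1)} = [\mathcal{P}\boldsymbol{v}^{(n+1)}]_i U^{(n+1)} + [\mathcal{P}\boldsymbol{x}^{(n+1)}]_i + \rho_i$, where $U^{(n+1)}$ is a finite number. Similarly, when $v_j^{(n+1)} > 0$ for all $j$, it follows that $U^{(n+1)} = \infty$. For this case, $s_i^{(n+1)} = [\mathcal{P}\boldsymbol{v}^{(n+1)}]_i L^{(n+1)} + [\mathcal{P}\boldsymbol{x}^{(n+1)}]_i + \rho_i$, where $L^{(n+1)}$ is a finite number. Thus, $s_i^{(n+1)}$ is bounded for all $n$ and $i$.

Since $f(t) > 0$ for $0 < t < \infty$, the function $f$ is a continuous function over $0 < t < \infty$, and $s_i^{(n+1)} > 0$ is bounded above, there exists a number $f_0 > 0$ such that $f(s_i^{(n+1)}) \geq f_0$ for all $n$ and $i$. Thus, it follows that

\begin{align*}
\ddot{\Gamma}^{(n+1)} &\geq f_0\sum_{i=1}^{I} d_i ([\mathcal{P}\boldsymbol{v}^{(n+1)}]_i)^2 + 2\beta\gamma_0 (\boldsymbol{v}^{(n+1)})'\mathcal{H}(\boldsymbol{v}^{(n+1)}) \tag{E.6} \\
&\geq f_0 (\boldsymbol{v}^{(n+1)})'\mathcal{P}'\mathcal{D}\mathcal{P}(\boldsymbol{v}^{(n+1)}) + 2\beta\gamma_0 (\boldsymbol{v}^{(n+1)})'\mathcal{H}(\boldsymbol{v}^{(n+1)}) \tag{E.7} \\
&\geq 2(\boldsymbol{v}^{(n+1)})'\mathcal{M}(\boldsymbol{v}^{(n+1)}), \tag{E.8}
\end{align*}

where $\mathcal{D} = \mathrm{diag}(\boldsymbol{d})$ and $\mathcal{M} = \frac{f_0}{2}\mathcal{P}'\mathcal{D}\mathcal{P} + \beta\gamma_0\mathcal{H}$. Since $\mathcal{P}'\mathcal{D}\mathcal{P}$ and $\mathcal{H}$ are symmetric, the matrix $\mathcal{M}$ is symmetric.

Now, we only need to show that $\mathcal{M}$ is positive definite. From the second term in the right hand side of (E.1), it is easy to see that $\boldsymbol{z}'\mathcal{H}\boldsymbol{z} \geq 0$ for all $\boldsymbol{z}$ and $\boldsymbol{z}'\mathcal{H}\boldsymbol{z} = 0$ only when $\boldsymbol{z} = \boldsymbol{0}$ or $\boldsymbol{z} = c\boldsymbol{1}$ for some $c \neq 0$. Moreover, by (AS1) and (AS2), it follows that

\begin{align*}
\boldsymbol{z}'\mathcal{P}'\mathcal{D}\mathcal{P}\boldsymbol{z} = c^2\sum_{i=1}^{I} d_i ([\mathcal{P}\boldsymbol{1}]_i)^2 > 0 \tag{E.9}
\end{align*}

for $\boldsymbol{z} = c\boldsymbol{1}$ and $c \neq 0$. Thus, from $f_0 > 0$ and $\beta\gamma_0 > 0$, it can be concluded that $\mathcal{M}$ is positive definite.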
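The positive definiteness argument can be illustrated on a small example. The sketch below is an illustration under stated assumptions, not the dissertation's code: it assumes a one-dimensional chain of four pixels with unit neighbor weights, a toy 2-by-4 system matrix `P`, toy data `d`, and arbitrary positive constants standing in for the lower bounds on $f$ and $\gamma$; all names are hypothetical. It forms the neighbor matrix, checks the identity relating it to the sum of squared neighbor differences, and uses a Cholesky test to show that the neighbor matrix alone is singular while the combined matrix is positive definite.

```python
import math


def neighbors(l, n):
    """Neighbors of pixel l on a 1-D chain of n pixels (an assumed layout)."""
    return [m for m in (l - 1, l + 1) if 0 <= m < n]


def build_h(n, w=1.0):
    """Neighbor matrix: weight sums on the diagonal, -w for neighbor pairs."""
    H = [[0.0] * n for _ in range(n)]
    for l in range(n):
        for m in neighbors(l, n):
            H[l][l] += w
            H[l][m] = -w
    return H


def mat_quad(A, z):
    """Compute z' A z for a dense matrix stored as nested lists."""
    n = len(z)
    return sum(z[l] * A[l][m] * z[m] for l in range(n) for m in range(n))


def build_m(P, d, f0, beta, gamma0, w=1.0):
    """Combined matrix (f0 / 2) P' D P + beta * gamma0 * H with D = diag(d)."""
    n = len(P[0])
    H = build_h(n, w)
    M = [[0.0] * n for _ in range(n)]
    for l in range(n):
        for m in range(n):
            pdp = sum(d[i] * P[i][l] * P[i][m] for i in range(len(P)))
            M[l][m] = 0.5 * f0 * pdp + beta * gamma0 * H[l][m]
    return M


def is_positive_definite(M):
    """Return True if a Cholesky factorization succeeds with positive pivots."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

In this toy case neither term is positive definite on its own: the neighbor matrix annihilates constant vectors and the data term has rank two. Their weighted sum is positive definite because a constant vector is not in the null space of the data term.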




BIOGRAPHICAL SKETCH

Ji-Ho Chang was born in Daegu, Korea. He received the B.S. and M.E. degrees in electronic engineering from Kangwon National University, Chuncheon, Korea, in 1997 and 1999, respectively. He received the M.S. degree in electrical and computer engineering from the University of Florida, Gainesville, Florida, in 2001. For the Ph.D. degree, he has been working on developing image reconstruction algorithms for positron emission tomography.


I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

John M. M. Anderson, Chair
Associate Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Jian Li
Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Fred J. Taylor
Professor of Computer and Information Science and Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Bernard A. Mair
Professor of Mathematics


This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 2004

Pramod P. Khargonekar
Dean, College of Engineering

Kenneth J. Gerhardt
Interim Dean, Graduate School