A UNIFIED SUPERRESOLUTION APPROACH FOR OPTICAL AND SYNTHETIC APERTURE RADAR IMAGES

By

FRANK M. CANDOCIA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
1998

To Yudit

ACKNOWLEDGEMENTS

It is easy to feel overwhelmed by the amount of time and effort required of the Ph.D. degree. These feelings were, oftentimes, all too familiar. With patience, persistence and the help of many individuals, my doctoral studies provided for a very memorable and fruitful four years. I would like to thank these people for their help and friendship throughout my studies.

First, I wish to acknowledge and thank Dr. Jose C. Principe for his role as advisor and mentor throughout my doctoral studies. Our discussions and exchanges of ideas were fundamental in shaping this work. I would also like to thank him for providing a stimulating work environment in CNEL, through which my engineering horizons were expanded.

Thanks go to my supervisory committee members: Dr. John M. M. Anderson, Dr. A. Antonio Arroyo, Dr. John G. Harris and Dr. Janice C. Honeyman; your interest and suggestions were appreciated. Dr. John G. Harris deserves additional thanks, particularly for our fruitful discussions and his suggestion that I work on optical images.

I would like to thank Dr. Leslie Novak of MIT Lincoln Laboratory for providing me the opportunity to work with him during the summer of 1997. It was a truly rewarding experience.

My colleagues in and around CNEL also deserve recognition for their help and friendship. These include Victor Brennan, Craig Fancourt, John Fisher, Chen-Hsien Wu, Dongxin Xu and Li-Kang Yen. Thanks for your insight into all the issues and questions I came to you with.

I wish to give special thanks to my parents and the Rodriguez family. All of the pastelitos you sent from Miami were a definite morale boost.
Your love, encouragement, and support are indelibly etched in my memory. Last, but certainly not least, I wish to thank Yudit. Words cannot fully describe how special you have been. Your love, support, patience and sacrifice have been immeasurable.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT

CHAPTERS

1 INTRODUCTION

2 AN APPROACH TO SUPERRESOLUTION
  2.1 Introducing the Problem
  2.2 Comments and Observations
  2.3 Motivation and Description of the Superresolution Architecture

3 EXISTING APPROACHES TO RECONSTRUCTION OF OPTICAL AND SYNTHETIC APERTURE RADAR IMAGES
  3.1 Optical Image Interpolation with a Single Kernel
    3.1.1 Common Kernels
    3.1.2 Increasing the Sample Density
    3.1.3 Interpolation as Projections
  3.2 Review of Reconstruction for Optical Images
  3.3 SAR Imaging via Nonparametric Spectral Estimation
    3.3.1 Phase History Data
    3.3.2 Standard Approaches
    3.3.3 Minimum Variance Approaches
    3.3.4 Estimating a Full Rank Autocorrelation Matrix
  3.4 Review of Reconstruction for Synthetic Aperture Radar Images

4 SUPERRESOLUTION OF OPTICAL IMAGES
  4.1 Optical Image Acquisition Model
  4.2 Using Local Information
    4.2.1 Relation Between Correlation and Euclidean Distance
    4.2.2 Existence of Locally Correlated Information in Images
      4.2.2.1 Interblock Correlation
      4.2.2.2 Scale Interdependencies
  4.3 Architecture Specifics
  4.4 Training Phase
    4.4.1 The Preprocessing
      4.4.1.1 Decimation
      4.4.1.2 Neighborhood Extraction
    4.4.2 Hard Partitioning the Input Space
      4.4.2.1 Kohonen's Self-Organizing Feature Map
      4.4.2.2 Cluster Formation
    4.4.3 Associative Memories
      4.4.3.1 Least Mean Squares
      4.4.3.2 Multilayer Perceptrons
      4.4.3.3 Association of Neighborhoods
  4.5 Reconstruction Phase
  4.6 Relation to the Mixture of Experts

5 EXPERIMENTAL RESULTS FOR OPTICAL IMAGES
  5.1 Analysis and Comparison
    5.1.1 Linear Associative Memories
      5.1.1.1 Standard Images
      5.1.1.2 Texture Images
      5.1.1.3 Additional Results and Comments
    5.1.2 Nonlinear Associative Memories
    5.1.3 Mixture of Experts
  5.2 Feature Extraction
    5.2.1 Image Features
    5.2.2 Correlated Image Structure

6 SUPERRESOLUTION OF SYNTHETIC APERTURE RADAR IMAGES
  6.1 Imaging with Multiple Models
  6.2 Architecture Specifics
  6.3 Models Used
  6.4 The Methodology
    6.4.1 The Imaging Criteria
    6.4.2 Selecting Between Models
    6.4.3 The Superresolution Process
  6.5 Experimental Results
    6.5.1 MSTAR Slicy Data
    6.5.2 MSTAR Target Data
    6.5.3 Detection Performance

7 CONCLUSIONS AND FUTURE WORK

REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A UNIFIED SUPERRESOLUTION APPROACH FOR OPTICAL AND SYNTHETIC APERTURE RADAR IMAGES

By Frank M. Candocia

May 1998

Chairman: Dr. Jose C. Principe
Major Department: Electrical and Computer Engineering

This work addresses the issue of superresolving optical and synthetic aperture radar (SAR) images. Superresolution is the process of obtaining an image at a resolution higher than that afforded by the sensor used in the imaging. The concept of signal resolution is shown to be intimately related to the notion of perfect reconstruction. As such, the major issue addressed in this work is establishing an appropriate set of bases for the reconstruction of images in these domains in terms of a finite set of collected samples. The existing theoretical foundations for perfect reconstruction in the aforementioned domains are expressed in terms of linear projections and infinite-extent data. However, practical restrictions leading to finite collected data limit the resolution afforded by the theoretically established bases: the sinc bases in the spatial domain and the Fourier bases in the frequency domain. Superresolution deals with this issue by incorporating a priori information into the process of establishing an appropriate set of projections for reconstruction. In the optical domain, a priori information is extracted locally in an unsupervised manner from low and high resolution versions of digital images of the same scene. This is in agreement with the local weighting of the theoretically established sinc bases for perfect reconstruction. In the SAR domain, a priori assumed models are inherent to the reconstruction.
The information is provided globally in terms of a best match to these models. This is also in agreement with the global weighting of the theoretically established Fourier bases for perfect frequency reconstruction. The superresolution of images is accomplished by means of a modular architecture using a two-step approach. First, each image neighborhood is assigned to a priori determined clusters for the image class. Second, each cluster is linked to its optimal reconstructor, which was trained from the collected set of samples. The results obtained herein are compared against several accepted sets of bases. Optical images of real scenes as well as textures are tested. SAR data from real metallic polyhedral objects as well as U.S. military tanks are imaged and tested for automatic target detection performance. It is shown that by appropriate incorporation of a priori information into the superresolution procedure, better reconstructed images result.

CHAPTER 1
INTRODUCTION

The intrinsic resolution limits of a sensor can typically be attributed to its physical properties. One way of increasing this resolution is to physically alter the sensor. Another approach involves the appropriate signal processing of collected data. The work herein is concerned with the latter within the specific domains of optical and synthetic aperture radar (SAR) images. Superresolution is the term given to the signal processing of collected information which achieves a resolution higher than the one afforded by the physical sensor. Applications of this work include obtaining high quality image prints from digital images, increased detection of targets in military or civilian imaged scenes, and detection of small tumors in medical imaging.

Superresolution of optical images is generally viewed as seeking a perfect reconstruction. As such, a comparison to the interpolation of signals is natural and warranted.
In sampling theory we can define the resolution of a signal as the minimum number of samples sufficient to completely represent the signal [Vetterli and Kovacevic, 1995]. To this end, one important result regarding the perfect reconstruction of a signal from its samples comes from the Shannon sampling theory [Marks, 1991; Oppenheim and Schafer, 1989]. In this theory, the original signal can theoretically be reconstructed from its samples provided certain conditions regarding the signal and its samples are met, irrespective of the statistics of the signal. If these conditions are violated, then common engineering knowledge says that we can never recover information beyond half the sampling frequency. In fact, this is not necessarily so. A priori knowledge regarding the original signal can be used to optimally recover a signal from its samples.

Superresolution of SAR images is generally viewed as the problem of detectability of complex sinusoids embedded in nonsinusoidal signals ("noise"). The desire to locate radar scattering elements (trihedral reflectors) implies the detection of these complex sinusoids. It is the detection of these reflectors upon which SAR imaging is presently based. Having said this, we can define the resolution of a signal in the SAR domain as the minimum number of samples sufficient to distinguish between the two complex sinusoids in that signal that are closest in 2D frequency. This resolution, or spectral resolution as it is also called, is established in the frequency domain and is determined by the main lobe of the rectangular window [Kay, 1988; Stoica and Moses, 1997]. In SAR imaging, the superresolution of objects other than trihedral reflectors would benefit the automatic target detection and recognition (ATD/R) problem, and it is the focus of the superresolution approach presented herein.
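The rectangular-window limit on spectral resolution can be illustrated with a small numerical sketch. The example below is not from the dissertation; the record length and bin spacings are hypothetical. Two complex sinusoids separated by two DFT bins produce two distinct spectral peaks, while a half-bin separation falls inside the window's main lobe and the two merge into one peak.

```python
import cmath
import math

def dft_mag(x):
    """Magnitude of the N-point DFT of x (finite data = rectangular window)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N)]

def count_peaks(mag, thresh):
    """Count local maxima of the magnitude spectrum that rise above thresh."""
    return sum(1 for k in range(1, len(mag) - 1)
               if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1] and mag[k] > thresh)

N = 64  # number of collected samples; main-lobe width is on the order of 2*pi/N

# Sinusoids two DFT bins apart: separation exceeds the main-lobe width.
far = [cmath.exp(2j * math.pi * 10 * n / N) + cmath.exp(2j * math.pi * 12 * n / N)
       for n in range(N)]
# Sinusoids half a bin apart: the rectangular window merges them into one lobe.
near = [cmath.exp(2j * math.pi * 10 * n / N) + cmath.exp(2j * math.pi * 10.5 * n / N)
        for n in range(N)]

peaks_far = count_peaks(dft_mag(far), N / 2)    # 2: resolved
peaks_near = count_peaks(dft_mag(near), N / 2)  # 1: merged
```

Collecting more samples (larger N) narrows the main lobe and restores resolvability; with a fixed N, no amount of zero padding separates the merged pair.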
The phase history of a typical radar-illuminated scene contains, among other things, reflected energy from nonmetallic objects such as trees and open field. A priori knowledge of objects within the scene, as well as how they ideally manifest themselves in the phase history, can be used to enhance the performance of ATD/R systems.

The issue of superresolution is generally viewed as different in the aforementioned image domains. I submit that it can be regarded as treating the same problem, the perfect reconstruction problem, in each image's respective domain: space for optical images and 2D frequency for SAR images. Owing to this, an architecture for the superresolution of images is described in this work which unifies the superresolution problem in both domains. An image is superresolved based on several modules. Each module is "tuned" to the specific character of the local image content at the point to be reconstructed. In order to determine the appropriate module for the point being imaged, a simple clustering on the signal samples is performed.

Chapter 2 introduces and describes the superresolution problem more formally. The goal of the chapter is to motivate the superresolution architecture presented at the chapter's end. With this in mind, issues regarding the practical limitations of the perfect reconstruction of a signal from a set of samples are addressed, and several observations are made regarding the mitigation of the problems faced. Chapter 3 serves to present methods in common use for the reconstruction of a signal from its samples in both the optical and SAR image domains. It is these methods that the superresolution procedure of this study is compared against. Chapter 4 concentrates on the details regarding the superresolution of optical images. Here we discuss the reasons for our choice of modules within the architecture presented.
It is shown here that the architecture implements a convolution with a family of kernels. This is equivalent to convolving with a shift-varying finite impulse response (FIR) filter where the filter weights are determined according to the local image characteristics. Chapter 5 is dedicated to experimental results regarding the superresolution of optical images. The results obtained are analyzed in detail and comparisons to the results of the reconstruction approaches presented in Chapter 3 are provided. Among the results analyzed are the use of linear and nonlinear associative memories in our superresolution methodology as well as a comparison to the performance of the mixture of experts architecture. Chapter 6 reports on the methodology for superresolving SAR images. The superresolution is based on two models: the point scatterer or trihedral reflector model for targets, and a noise model for clutter. The superresolution here is also a shift-varying FIR filtering scheme. The filter weights are based on minimum variance estimation; that is, they are dependent on the minimization of expected image power subject to how well the assumed model resides in the SAR phase history samples. Experimental results with this method are reported which increase automatic target detection performance over the conventional methods. Finally, Chapter 7 concludes this work with comments regarding the material presented herein. The contributions of this work are stated and a commentary on future research directions is given.

CHAPTER 2
AN APPROACH TO SUPERRESOLUTION

The resolution of a signal addresses the ability to perfectly reconstruct a continuous signal from a set of samples. It is dependent on two things: the sampling period for obtaining the samples and the number of samples collected. Accomplishing this perfect reconstruction with a universal set of linear projections defines signal resolution.
If the conditions for perfect reconstruction from this universal set are not satisfied, then the reconstruction can only be approximated. Having said this, superresolution is the ability to reconstruct a signal (from a set of samples) beyond the limit imposed by sampling theory. This chapter introduces a methodology for the superresolution of images. We begin by introducing and defining the concepts of signal resolution. Then comments and observations on the practical limitations associated with perfect reconstruction are presented. The chapter concludes with a motivation for the superresolution of images via the methodology presented in this work, as well as a description of the architecture which implements it.

2.1 Introducing the Problem

Webster's dictionary defines resolution as the process of separating into constituent or elementary parts [Webster, 1989]. Implied in this definition is that, ideally, all of the elementary parts completely describe the object they are derived from. This work is concerned with the elementary "parts" of signals and the ability to fully describe them from a set of samples. As such, we are necessarily addressing the issue of perfect signal reconstructability. In this sense, resolution is synonymous with reconstruction. This is now elucidated.

Let $x(a)$, where $-\infty < a < \infty$, be a continuous signal with maximum frequency content $\Omega_c$ rad/sec; thus our analysis is based on bandlimited signals. We can represent $x(a)$ as a linear combination of an orthogonal set of basis functions as

$$x(a) = \sum_{n} y[n]\, k_a[n] \qquad (2\text{-}1)$$

where the linear weighting is given by the samples $y[n]$ and $k_a[n]$ represents our set of basis functions. Eqn. (2-1) addresses the perfect reconstruction of $x(a)$ from a set of samples. For the sake of simplicity, our development is performed with 1D signals. The extensions to 2D should be clear.
For a signal of time, $x(t)$, the perfect reconstruction of bandlimited signals from a set of samples is addressed by the Shannon sampling theory [Marks, 1991; Oppenheim and Schafer, 1989]. Here $y[n] = x(nT_s)$ for all integers $n$, that is $-\infty < n < \infty$, with sampling period $T_s$. The equation describing this perfect reconstructability is

$$x(t) = \sum_{n=-\infty}^{\infty} x(nT_s)\,\mathrm{sinc}\!\left(\frac{t}{T_s} - n\right) \qquad (2\text{-}2)$$

where, by definition, $\mathrm{sinc}(t) \equiv \frac{\sin(\pi t)}{\pi t}$. We see that our basis functions are given by $k_t[n] = \mathrm{sinc}\left(\frac{t}{T_s} - n\right)$, and the basis functions in this infinite set are orthogonal [Woodward, 1964], that is,

$$\int_{-\infty}^{\infty} \mathrm{sinc}\!\left(\frac{t}{T_s} - n\right)\mathrm{sinc}\!\left(\frac{t}{T_s} - m\right) dt = T_s\,\delta[n-m].$$

The perfect reconstruction can be obtained if the sampling period satisfies $T_s \le T_c$, where $T_c = \frac{\pi}{\Omega_c}$. For time signals, $\frac{1}{T_c}$ is the critical sampling rate (also known as the Nyquist rate). Therefore, every sample of $x(t)$ can be exactly resolved from an infinite set of samples provided the sampling period is sufficient.

For a signal in frequency, $x(\Omega)$, the perfect reconstruction of bandlimited signals from a set of samples is addressed by the discrete-time Fourier transform (DTFT). The set of samples $y[n] = y(nT_s)$ comes from the sampling of a superposition of complex exponentials. The equation describing this perfect reconstructability is

$$x(\Omega) = \sum_{n=-\infty}^{\infty} y(nT_s)\, e^{-j\Omega T_s n} \qquad (2\text{-}3)$$

Here we see that our basis functions are given by $k_\Omega[n] = e^{-j\Omega T_s n}$, and the infinite set of these basis functions is orthogonal, that is,

$$\int_{-\pi}^{\pi} e^{j\omega n}\, e^{-j\omega m}\, d\omega = 2\pi\,\delta[n-m]$$

with $\omega = \Omega T_s$. The perfect reconstruction can be obtained if the sampling period satisfies $T_s \le T_c$, where $T_c = \frac{\pi}{\Omega_c}$ as before. For frequency signals, $T_c$ is the critical sampling rate. We have seen that every instant of our bandlimited time and frequency signals, $x(t)$ and $x(\Omega)$ respectively, can be exactly recovered (resolved) with an infinite set of samples provided the sampling period is sufficient. Thus the sampling period $T_s$ provides the limit to which our signals can be perfectly reconstructed (resolved) from an orthogonal set of linear projections and an infinite number of samples.
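Eqn. (2-2) can be exercised numerically. The sketch below uses a hypothetical 3 Hz cosine and a 20 Hz sampling rate (values not from the dissertation) and reconstructs the signal at an off-grid instant from a truncated sinc series; the error is small only because many samples surround the reconstruction point, a first hint of the finite-data limitation discussed next.

```python
import math

def sinc(t):
    """sinc(t) = sin(pi t) / (pi t), the interpolation kernel of eqn. (2-2)."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def shannon_reconstruct(samples, Ts, t):
    """Eqn. (2-2), truncated to the finitely many samples actually collected."""
    return sum(s * sinc(t / Ts - n) for n, s in enumerate(samples))

f = 3.0      # hypothetical 3 Hz cosine, so the Nyquist rate is 6 Hz
Ts = 0.05    # 20 samples/sec: comfortably above the critical rate
x = lambda t: math.cos(2 * math.pi * f * t)

samples = [x(n * Ts) for n in range(2001)]  # long record keeps truncation error low
t0 = 50.017  # an off-grid instant near the middle of the record
err = abs(shannon_reconstruct(samples, Ts, t0) - x(t0))
```

Because the sinc kernel decays only as $1/t$, the truncation error grows as the reconstruction point approaches either end of the record, which is the windowing effect formalized in eqns. (2-4) and (2-5).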
It is in this manner that the sinc and Fourier bases are the common and universal sets of linear projections capable of perfectly reconstructing bandlimited signals in the time (space) and frequency domains, respectively. These sets of bases are universal in that all appropriately sampled infinite-extent bandlimited signals can be reconstructed with them. For finite-extent data, eqns. (2-2) and (2-3) can be expressed respectively as

$$\hat{x}(t) = \sum_{n=-\infty}^{\infty} \left[x(nT_s)\,w[n]\right] \mathrm{sinc}\!\left(\frac{t}{T_s} - n\right) \qquad (2\text{-}4)$$

and

$$\hat{x}(\Omega) = \sum_{n=-\infty}^{\infty} \left[y(nT_s)\,w[n]\right] e^{-j\Omega T_s n} \qquad (2\text{-}5)$$

where $w[n]$ describes samples of our window function,

$$w[n] = \begin{cases} 1 & 0 \le n \le N-1 \\ 0 & \text{otherwise} \end{cases}$$

and $N$ is the extent of our window. Notice that the hat character in $\hat{x}$ has been used purposely to denote the approximation afforded by a finite number of data samples.

The finite number of data samples reduces the resolvability of the signal. We can see this by examining eqns. (2-4) and (2-5) in the frequency domain. The continuous Fourier transform (FT) of eqn. (2-4) yields

$$\hat{X}(\Omega) = \left[X(\omega) \circledast W(\omega)\right] \Pi(\Omega)$$

and eqn. (2-5) can be equivalently expressed as

$$\hat{x}(\Omega) = Y(\omega) \circledast W(\omega)$$

where $\circledast$ is the periodic convolution operator, $\omega = \Omega T_s$ is angular frequency in radians, $Y(\omega)$ is the DTFT of $y(nT_s)$ and is periodic with period $2\pi$ in $\omega$ and $\Omega_s$ in $\Omega$ (similarly for $X(\omega)$ and $W(\omega)$), and

$$\Pi(\Omega) = \begin{cases} 1 & |\Omega| \le \frac{\Omega_s}{2} \\ 0 & \text{otherwise.} \end{cases}$$

The effect of the windowing function $w[n]$ on eqns. (2-4) and (2-5) is to smear (distort) the true frequency spectrum $X(\Omega)$ of $x(a)$. This results in a decrease in the ability to properly resolve the signal $x(a)$. Thus the resolution of a signal is also limited by the number of samples we have to describe the signal. The object of superresolution is to reconstruct $x(a)$ more faithfully than the resolution afforded by eqns. (2-4) and (2-5), that is, the resolution afforded by a finite set of observed data obtained at a sampling rate $T_s$.

2.2 Comments and Observations

Practical issues may preclude the satisfaction of the conditions necessary for perfect signal reconstruction as given in the previous section.
In particular: (1) there must be no noise associated with the collected samples (e.g. no quantization error); (2) the sampling period must not exceed the critical sampling period of the signal; and (3) the signal must have been sampled to infinite extent.

In the practical sampling of SAR data, the issue concerning sampling rate is usually not critical. This is because the bandwidth and frequencies at which the radar operates are known at the time of data collection. The issue of noise is much more critical. For example, SAR data is typically imaged from a rectangular grid of equispaced radar returns. This data, however, is sometimes processed by interpolating through the actual collected return grid. This is particularly true for inverse SAR and spotlight-mode SAR data collection, where a typical grid consists of radially equispaced samples swept out through several angular increments [Odendaal et al., 1994; Mensa, 1991]. As such, the accuracy with which frequency information from the imaged scene can be produced is reduced.

In the practical sampling of optical images, the issue concerning quantization error is usually not critical. The standard use of an 8-bit dynamic range usually yields highly acceptable and pleasing images. The issue of sampling frequency is much more critical. The information of a natural scene typically has very high spatial frequency content. The sharp contrast we perceive in order to delineate objects (object boundaries), as well as the textural character of those objects, are but some of the attributes inherent to the high frequency information content of optical images. As such, the sampling frequency used in collecting optical images is generally not large enough to fully describe a "continuous image" in the sense of the Shannon sampling theory.

An interesting attribute of optical images is their highly structured nature.
This structure appears locally and can be used to characterize objects in these images; that is, portions of objects can be described as smooth, edgy, etc. Information such as this is not considered in the sampling theorem.

Let us now make a few observations. Eqns. (2-2) and (2-3) specify a universal set of basis functions which are linearly weighted by a set of collected samples corresponding to the signal $x(a)$. There is an uncountable infinity of basis functions in this universal set (one for each instant of $a$ in our original signal to reconstruct) and they are given by

$$S_s = \left\{ \mathrm{sinc}\!\left(\tfrac{t}{T_s} - n\right) \;\middle|\; -\infty < n < \infty,\; -\infty < t < \infty \right\}$$

for spatial signals and, for frequency signals, by

$$S_f = \left\{ e^{-j\Omega T_s n} \;\middle|\; -\infty < n < \infty,\; -\infty < \Omega < \infty \right\}.$$

These sets are universal with respect to those signals that are sampled beyond their critical rates, and the resolution afforded by these sets is given by this rate. Hence, if samples were collected at a sampling rate $T_s$ that did not meet the critical sampling rate, the sets of sinc or Fourier basis functions could not be linearly weighted as in eqns. (2-2) and (2-3) to perfectly reconstruct our signal. This does not preclude the existence of other sets of basis functions which could be linearly combined by samples collected at a rate below the critical sampling rate to still yield the signal's perfect reconstruction.

As an example, consider the aliased spectrum of fig. (2-1). Here, the original signal's scaled spectrum is given by the dashed triangular region centered at the origin. The original spectrum is

$$X(\Omega) = \begin{cases} 1 - \frac{|\Omega|}{\Omega_c} & |\Omega| \le \Omega_c \\ 0 & \text{otherwise.} \end{cases}$$

The aliased spectrum of fig. (2-1) for $|\Omega| \le \Omega_c$ when $\Omega_c < \Omega_s < 2\Omega_c$ is

$$X_a(\Omega) = \frac{1}{T_s}\left[ X(\Omega) + X(\Omega - \Omega_s) + X(\Omega + \Omega_s) \right]$$

and is given by the solid line. It can easily be seen that our contrived signal $X(\Omega)$ can be recovered by linear filtering (frequency multiplication) with the filter

$$K(\Omega) = \begin{cases} \dfrac{X(\Omega)}{X_a(\Omega)} & |\Omega| \le \Omega_c \\ 0 & |\Omega| > \Omega_c \end{cases}$$

as pictured in fig. (2-1), that is, $X(\Omega) = X_a(\Omega)\,K(\Omega)$. This example shows that the original signal can be recovered via a linear filtering even if it has been aliased, as long as the aliased spectrum has no null frequencies and the original signal's spectrum is known. In practice this information is not typically known, making the recovery of "improperly" sampled signals usually a difficult task.
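The recovery pictured in fig. (2-1) can be checked numerically. The sketch below uses hypothetical values with $\Omega_c < \Omega_s < 2\Omega_c$ (not taken from the dissertation), builds the triangular spectrum, forms the aliased spectrum from its shifted copies, verifies that the overlap leaves no spectral nulls, and applies the known-spectrum filter $K = X / X_a$ on the signal band.

```python
import math

wc = 1.0   # hypothetical signal bandwidth (rad/s)
ws = 1.5   # hypothetical sampling rate with wc < ws < 2*wc, so aliasing occurs
Ts = 2 * math.pi / ws

def X(w):
    """The triangular example spectrum: 1 - |w|/wc on the band, zero outside."""
    return max(0.0, 1.0 - abs(w) / wc)

def X_alias(w):
    """Aliased spectrum: sum of shifted copies of X, scaled by 1/Ts."""
    return sum(X(w + n * ws) for n in (-1, 0, 1)) / Ts

grid = [-wc + k * (2 * wc) / 400 for k in range(401)]  # the band |w| <= wc

# With wc < ws < 2*wc the overlapping triangles never sum to zero ...
no_nulls = min(X_alias(w) for w in grid) > 0

# ... so the filter K = X / X_alias is well defined and recovers X exactly.
K = lambda w: X(w) / X_alias(w)
max_err = max(abs(X_alias(w) * K(w) - X(w)) for w in grid)
```

If $\Omega_s$ were pushed below $\Omega_c$, the copies would cancel at some frequencies, `X_alias` would have nulls, and no linear filter could undo the aliasing.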
This example shows that the original signal can be recovered via a linear filtering even if it has been aliased as long as the aliased spectrum has no null frequencies and the original signal's spectrum is known. In practice this information is not typically known making the recovery of "improperly" sampled signals usually a difficult task. 12 K(n) T, / .............. nsC K2 ac K2 f+"c 0 as, 2 nc Qs 9s+c Q 2 2 Figure 21. Recovering a signal from an aliased representation. Aliasing results when sampling below the Nyquist rate. Copies of the true signal spectrum are given by the dashed triangular regions. The aliased spectrum is given by the solid line. Here, the true signal can be recovered from the aliased signal by linear filtering (frequency multiplication) via the filter K(fQ) designed from a priori knowledge of the true spectrum. A priori knowledge of the character of a signal, rather than its frequency spectrum, could also be used to appropriately reconstruct the signal from its observed samples. As a simple example, let's consider the set of piecewise constant continuous time functions where each constant segment in the signals has duration T seconds (e.g. signals quantized in time). An illustration of such a function is provided in fig. (22a). Note that this signal has infinite frequency content due to its piecewise constant nature. If our observed samples were obtained by sampling this function every T seconds, then clearly the convolution of a zeroorderhold kernel with the observed samples would be optimal for recovering the piecewise constant function, that is, it would yield perfect reconstruction even though the function in question was grossly undersampled according to the Shannon sampling theory. This convolution is pictured in fig. (22b). The set of piecewise constant signals is not typically encountered in practice so the basis set resulting from the zeroorderhold kernel is of limited use. 13 T T x(t) 000 000 r T t (a) 000 T 000 1 (b) T Figure 22. 
Simple example illustrating how perfect reconstruction is possible when a priori knowledge of the relation between signal samples and the corresponding continuous function is available. (a) Piecewise constant continuous function grossly undersampled according to the Nyquist criterion. (b) Recovering the continuous function in (a) from its samples requires a simple convolution of the samples with a zero-order-hold kernel.

However, this very simple example illustrates that superresolution could be based on a priori knowledge about the signal to reconstruct given the observed samples, irrespective of the frequency content of the signal. It also illustrates the fact that the Shannon sampling theory only establishes sufficient conditions for the perfect reconstruction, from their samples, of signals characterized by frequency. If a priori knowledge of the signal corresponding to the observed samples is available, then this can be used to develop a technique for superresolving a signal. Therefore, superresolving images is directly associated with methods to acquire a priori information about the signal of interest.

Recently, optimal reconstruction of signals sampled below the Nyquist rate was proved possible by using a priori knowledge of the signal statistics [Ruderman and Bialek, 1992]. The statistics of the continuous signal were assumed known rather than any specific relation between the signal and its samples. In the results reported, the signal $x(t)$
The authors obtained the same reconstruction filter when using two different optimization approaches: maximum likelihood and conditional average estimation. The optimal signal reconstruction was given by X(n) = K(fn)Y(Q) (26) where the filter K(D) was K( F()S(Q) K(Q) = 2 + F(+n n,)2 S( +nQ,) n=oo Notice that Y(K2) is the DTFT of the sequence y[n] so that Y(9) = Y(Q + n,) is periodic with period C, = and T, was the sampling period. S(Q) is the power spectrum of x(t) and F(KQ) is the FT of the point spread function f(t). From eqn. (26), the most likely signal i(t) is obtained through a linear filtering of the collected samples y[n]. In the case when the noise vanishes, the point spread function becomes an ideal sampler, the signal was bandlimited and the sampling frequency is above the Nyquist rate, that is, (2 = 0, F(n) = 1, X(9) = 0 for 101 > c, and 0, > 20c, the filter K(92) reduces to 0 otherwise 15 which is the FT of the sinc function. This result shows that the signal statistics play no role in perfect reconstruction when the Shannon sampling conditions are met. More importantly, this result shows that a priori knowledge of the statistics of a signal can be used to superresolve a signal from a collected set of samples. However, this analytic result is only valid for stationary Gaussian signal statistics and, in practice, signals are typically nonstationary and have very complex statistics. Our lack of knowledge of these statistics and the analytical intractability of determining the optimal filters for complicated density functions, as commented by the authors, limits the practical use of this method. 2.3 Motivation and Description of the Superresolution Architecture The motivation for the superresolution methodology stems from our goal of more efficiently using the information contained in available image samples. This requires projections onto data specific sets of bases instead of the bases established in the reconstruction theories. 
Naturally, learning or adaptive system theories play a crucial role in this methodology of designing data-specific projections. The manner in which these models are realized must be consistent with the information character of images and how this relates to the superresolution problem. These issues are now addressed. The universality of the perfect reconstruction theories is an amazing result, but the price paid is a strict limitation on the required resolution. The practical problem is to do the best we can with the available samples, that is, to superresolve the images. To yield better reconstructed images, we must find alternate sets of projections from which to reconstruct our images. It has been shown that perfect reconstruction can be achieved if a priori knowledge is available, but in practice this knowledge is absent. So a central problem is how to capture a priori knowledge about the domain. In determining the set of projections to use, we must either make assumptions regarding our data or we must learn this a priori knowledge from the available data using nonparametric models. We are concerned with the latter. We propose in this dissertation a novel technique of extracting a priori information from an image by performing a down sampling operation on the original image. The high resolution image becomes the desired response to a learning system that receives the low resolution image as input. The highly structured and localized nature of images can be exploited in modeling the relation between low and high resolution versions of an image. It seems reasonable to consider several local models in the imaging process. This is due to the various structures present in images resulting from the objects in the imaged scene. Two particular image traits can be used in this modeling:

1. there is much similarity in local structure throughout an image
2. this structure is typically maintained across scales

Local structure.
A direct example of the first trait in optical images can be provided by considering an image of a person's face. Local neighborhoods about the person's cheeks and forehead are generally indistinguishable when viewed independently. We have assumed that the effects of lighting and other "special" attributes (scars, moles, birth marks, etc.) are absent in this comparison. An easy method to test this observation is to locate these similar image portions in an image and randomly swap them to form a new image. If the new image resembles the original one, then our observation is correct. In this manner, all neighborhoods exhibiting a particular characteristic can be treated in practically the same manner. These neighborhoods can be considered to be generated by the same statistical process. It will be shown that the models which we compare our neighborhoods against can be interpreted as the mean of each statistical process, one for each model used. Examples of the existence of this first trait abound in images. It has recently been exploited to increase gains in compression schemes [Dony and Haykin, 1995a; Kambhatla and Leen, 1997]. The use of image compression has been popular for some time now and is evidenced by the widespread JPEG still image compression standard [Wallace, 1991] and PCA compression schemes [Dony and Haykin, 1995b]. The standard schemes for lossy image compression are based around the highly redundant information generally present in small image blocks, which could be described in a more efficient and compact manner. In the case of compression via PCA, a single representation is established for all of the blocks in the image. The recent compression approaches exploit the first image trait by grouping the small blocks of an image into clusters that are most similar (correlated) with one another. In this way, each cluster can be described with an efficient representation of its own, separate from those corresponding to other clusters.
Overall, this results in a more efficient image representation than that afforded by the approaches of the standard compression schemes. The first image trait can also be seen in SAR images, but in the frequency domain. It results from the radar backscattering of objects of similar composition and orientation. SAR imaging, however, is typically based solely on the backscatter model of the trihedral reflector. This reflector is ideally imaged as a bright point in a SAR image. However, the formation of SAR images should be based on a variety of models. In this manner, objects exhibiting a particular backscatter characteristic can be imaged with the appropriate model.

Across scale similarity. The second trait can be used in obtaining a high resolution image from a low resolution one. To accomplish this we must have knowledge of the relation between similar neighborhoods in a low resolution image and their higher resolution counterparts. In optical images, this relation can be established from available data using the down sampling operation. This is done by optimally associating homologous low and high resolution neighborhoods. In SAR images, the neighborhood relation is in the frequency domain. However, establishing the relation between SAR data and the a priori models used in the imaging is performed in the phase history domain. This is done by minimum variance model matching between the phase history data and the model used for the point being imaged. The manner in which we superresolve images exploits the aforementioned traits. Superresolving based on these two traits requires establishing:

1. which neighborhoods of an image are similar in local structure
2. how these neighborhoods relate across scale

The superresolution architecture of fig. (2-3) operates on knowledge of these two premises.
It exploits the relations between collected samples in establishing the basis functions for reconstruction, since the conditions for perfect reconstruction of the sampling theorem cannot be met in practice. The neighborhood selection is established by a comparison of the distance between the received data and learned local models representative of the data. This is performed by the analysis and decision block of fig. (2-3). This comparison is used to select the module most appropriate for the reconstruction of a particular image point from the C available modules. The reconstruction modules in the architecture for optical images are associative memories (optimal mappings) which produce the output pattern best associated with the input. They describe the basis functions for the reconstruction of image samples. They are established via training from available data of homologous low and high resolution neighborhoods. As such, the basis functions are highly localized in space. This is because the information contributing most to the character of an image sample resides about that sample. It is in agreement with the support properties of the sinc kernel in the sampling theorem. For SAR images, the modules of the architecture are nonparametric model estimators. The basis functions derived from these modules are global in extent. This affords the most frequency resolution in the reconstructed image. They are also in agreement with the global extent of the basis functions of the FT. One can expect that this methodology will yield better reconstruction than methods based on the sampling theorem. However, unlike the universal character of the sampling theorem, this superresolution method is specific to the character of images, that is, bases obtained for one class of images may perform poorly when reconstructing another class.
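The data flow just described can be sketched in a few lines. This is an illustrative toy (random stand-in "models" and "memories", made-up dimensions), not the trained system of later chapters: C local models drive the analysis-and-decision step, and the selected module's linear associative memory maps a low resolution neighborhood to its high resolution counterpart.

```python
import numpy as np

rng = np.random.default_rng(0)

C = 2                   # number of modules (local models)
lo_dim, hi_dim = 4, 16  # low/high resolution neighborhood sizes (toy values)

# Stand-ins for quantities learned from homologous low/high-res pairs:
models = rng.normal(size=(C, lo_dim))            # one mean per statistical process
memories = rng.normal(size=(C, hi_dim, lo_dim))  # one associative memory per module

def superresolve_patch(lo_patch):
    # Analysis and decision: pick the module whose model is nearest.
    c = np.argmin(np.linalg.norm(models - lo_patch, axis=1))
    # Reconstruction: the selected associative memory produces the output.
    return memories[c] @ lo_patch, c

lo_patch = models[1] + 0.01 * rng.normal(size=lo_dim)  # patch near model 1
hi_patch, chosen = superresolve_patch(lo_patch)
print(chosen, hi_patch.shape)
```

A patch lying near a given model is routed to that model's module, which is exactly the role of the analysis and decision block in fig. (2-3).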
Because of this, establishing the appropriate models against which to compare our data is important to the successful superresolution of an image. With the appropriate models established, the collected data is transformed via the projections corresponding to each model. A recent approach in the neural network literature that is used in modeling the probability density function (PDF) of a set of data conditioned on another set of data is the mixture of experts (MOE) [Jacobs et al., 1991; Jacobs and Jordan, 1991]. The conditional PDF in the MOE is modeled as a mixture Gaussian distribution. This architecture can be used in modeling the transformation between low and high resolution neighborhoods of images. The MOE will be compared to our superresolution methodology. Comparisons of the theoretical issues between the MOE and our approach, as well as superresolution performance, will be provided in a later chapter. Note that the reconstructed images from the architecture of fig. (2-3) are not continuous but are a set of samples at a higher density than the collected samples. Further details regarding this architecture for each image domain will appear in their respective chapters.

[Figure: low resolution image -> analysis and decision -> modules 1 through C -> arranging of local image portions -> superresolved image]

Figure 2-3. The general architecture for the superresolution of optical and SAR images.

CHAPTER 3
EXISTING APPROACHES TO RECONSTRUCTION OF OPTICAL AND SYNTHETIC APERTURE RADAR IMAGES

This chapter presents background on common approaches to image reconstruction from collected samples. To address the perfect reconstruction issue, we require knowledge of the ideal image to be formed. The ideal optical image typically describes functionally the reflected light at each spatial location from the camera's viewpoint and within its field of view. This function describes the imaged environment as perceived by the human eye.
The ideal SAR image does not have such a clear description. Among the possibilities, the ideal SAR image could represent the environment as perceived by the human eye (without the effects of atmospheric occlusions such as clouds, fog, precipitation, etc.). It could also describe functionally the electromagnetic reflectivity at each spatial location from the radar's viewpoint and within its "field of view" such that closely spaced scattering elements in the illuminated scene could be discriminated. In fact, SAR imaging currently attempts to address both of these issues. A description and formulation of the common approaches for reconstruction of optical and SAR images from collected samples is presented in this chapter. Also included herein are reviews of the recent work on the reconstruction and superresolution of optical and SAR images. The information presented in this chapter is the platform upon which the details of the superresolution methodology for both optical and SAR images will be developed in the appropriate chapters. These chapters will show that the common approaches presented here are special cases of the superresolution methodology of this work.

3.1 Optical Image Interpolation with a Single Kernel

A common manner in which optical images are reconstructed from a set of collected samples is by interpolation with a single 2-D kernel. The process of interpolation was alluded to in the previous chapter when discussing perfect reconstruction via convolution with the sinc kernel. We will formally introduce the problem here for the sake of clarity. The interpolation methods of this section utilize the most commonly encountered 2-D kernels. Interpolation is seen to take the form of a convolution between the observed image samples and a 2-D kernel. The samples of an analog image x(t₁,t₂) are denoted by x[n₁,n₂] = x(t₁,t₂)|_{t₁=n₁T₁, t₂=n₂T₂}. Each interpolated point x̂(t₁,t₂) in these techniques is linearly related to samples of the image in question.
The procedures of this section for interpolating continuous images take the form

x̂(t₁,t₂) = Σ_{n₁} Σ_{n₂} x[n₁,n₂] k(t₁/T₁ − n₁, t₂/T₂ − n₂)   (3-1)

where the sums run over all integers n₁ and n₂, k(t₁,t₂) is the continuous interpolation kernel that defines the weighting of the signal samples, and T₁, T₂ denote the sampling periods in their respective axes. The kernels that are presented here are prototypes, that is, T₁ = T₂ = 1. A simple scaling by the appropriate sampling rate, as in eqn. (3-1), yields the proper kernel. There are two important properties which are common to interpolation. They will be referred to as interpolation properties 1 and 2. These are:

1. x̂(t₁,t₂) must pass through the image samples when t₁ is an integer multiple of T₁ and t₂ is an integer multiple of T₂. This is mathematically equivalent to constraining the interpolation kernel as follows:

k(t₁,t₂) = 1 for t₁ = 0, t₂ = 0
k(t₁,t₂) = 0 for t₁ = m₁T₁, t₂ = m₂T₂; m₁, m₂ ∈ Z, (m₁,m₂) ≠ (0,0)

where Z is the set of integers [Hamming, 1973].

2. Signal sample weighting diminishes as the distance from the point being interpolated increases. This property constrains the kernel's envelope to be a decreasing function as we move away from its center. If the sample weighting does not diminish, then it is finite within a "small" support region.

Property 1 is a consequence of the definition of interpolation. That is, samples are only constructed between the given sample locations since, according to the Shannon sampling theorem, the observed samples must be free of quantization error and noise for perfect signal reconstruction. Property 2 is usually invoked by drawing a parallel with the Shannon sampling theory. If samples of a function are drawn at small enough intervals, then it is known that any function value between these samples can be reconstructed with the use of an asymptotically decaying function, the sinc function. Finite extent kernels do not necessarily need to adhere to this. Specifying finite sample weighting within a small region of support is sufficient for computational numerical stability.
3.1.1 Common Kernels

We will now present and briefly describe four common kernels particular to image interpolation. These are the zero-order-hold, bilinear, bicubic and sinc kernels.

Zero-Order-Hold. In general, this is the simplest of interpolation kernels. Interpolation with this kernel is commonly referred to as nearest neighbor interpolation or replication. Using this kernel, the interpolated point x̂(t₁,t₂) is chosen as x[n₁,n₂] at the signal sample closest to (t₁,t₂). The interpolation kernel is a "rectangular box" given by

k(t₁,t₂) = 1 for |t₁| < 1/2 and |t₂| < 1/2, and 0 otherwise.   (3-2)

The region of support here is only one sample in extent. The interpolated image usually has a very "blocky" appearance due to the repeating of image samples. This kernel satisfies property 1 but is not in strict compliance with the decay concept of property 2. The signal sample weighting does not diminish with distance; rather, it is constant within a small rectangular region. This interpolation kernel is illustrated in fig. (3-1).

Bilinear. Interpolation with this kernel is the 2-D counterpart of 1-D linear interpolation. The bilinear interpolation kernel is separable and is composed of two 1-D linear interpolation kernels as shown in the equation below:

k(t₁,t₂) = (1 − |t₁|)(1 − |t₂|) for |t₁| < 1 and |t₂| < 1, and 0 otherwise.   (3-3)

The region of support for this method is local. Its support region is comprised of the four samples nearest the point of interpolation. It satisfies both interpolation properties listed above. This interpolation kernel is illustrated in fig. (3-1).

Bicubic. Unlike the interpolation kernels of eqns. (3-2) and (3-3), this one is smooth. Interpolation with this kernel locally fits a separable polynomial through our sample points. The kernel is constrained so that the polynomial's first derivative must agree at the endpoints of adjacent polynomials.
The 1-D prototype kernel developed to meet such a criterion for the bicubic kernel is

k(t) = (|t| − 1)(|t|² − |t| − 1)   for 0 ≤ |t| < 1
k(t) = −(|t| − 1)(|t| − 2)²       for 1 ≤ |t| < 2   (3-4)
k(t) = 0                          otherwise.

The separable 2-D kernel can be easily generated by considering the sampling rates T₁ and T₂ of each axis in the above equation, scaling appropriately, and multiplying the resulting 1-D kernels. Properties 1 and 2 both hold for the above interpolation kernel. The separable 2-D kernel is shown in fig. (3-1).

Sinc. This kernel was established in the Shannon sampling theory. It is well known that any bandlimited signal can be reconstructed from its samples provided that the signal was sampled at a rate higher than the Nyquist rate. Here the interpolation kernel is given by

k(t₁,t₂) = sinc(t₁)sinc(t₂) = [sin(πt₁)/(πt₁)][sin(πt₂)/(πt₂)]   (3-5)

and the region of support is global. This is given by the extent of the kernel, which is seen to be infinite due to the sinusoids in the kernel. Note that the weighting function k(t₁,t₂) in this case is separable and corresponds to an ideal low pass filter. This ideal kernel is illustrated in fig. (3-1). Notice that each of these interpolation kernels can be used in the perfect reconstruction of a continuous signal from its samples. Convolving a kernel with the signal samples results in perfect reconstruction when the continuous signal is known (or assumed) to be locally constant for the zero-order-hold kernel, locally linear between samples for the bilinear kernel, locally smooth up to the first derivative for the bicubic kernel, and globally smooth up to a given cutoff frequency for the sinc kernel.

[Figure: surface plots of the zero-order-hold, bilinear, bicubic and sinc kernels.]

Figure 3-1. Common interpolation kernels. The kernels depicted here are specified in eqns. (3-2)-(3-5). All are separable and eightfold symmetric (T₁ = T₂).
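The four 1-D prototype kernels can be written down directly. The sketch below is illustrative (the cubic is the a = −1 variant of the cubic convolution kernel, consistent with the factored form of eqn. (3-4) above); the final loop checks interpolation property 1, namely that each kernel is 1 at t = 0 and 0 at the other integer sample locations.

```python
import numpy as np

def zoh(t):
    # Eqn. (3-2) profile: a unit box of half-width 1/2.
    return (np.abs(t) < 0.5).astype(float)

def linear(t):
    # Eqn. (3-3) profile: a triangle supported on |t| < 1.
    return np.clip(1.0 - np.abs(t), 0.0, None)

def cubic(t):
    # Cubic convolution kernel (a = -1 variant): two polynomial pieces whose
    # first derivatives agree where the pieces meet (at |t| = 1 and |t| = 2).
    a = np.abs(t)
    out = np.zeros_like(a)
    m1 = a < 1
    m2 = (a >= 1) & (a < 2)
    out[m1] = (a[m1] - 1) * (a[m1] ** 2 - a[m1] - 1)
    out[m2] = -(a[m2] - 1) * (a[m2] - 2) ** 2
    return out

def sinc(t):
    return np.sinc(t)  # numpy's sinc is sin(pi t)/(pi t)

t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for k in (zoh, linear, cubic, sinc):
    print(k(t))  # each row: 1 at t = 0, ~0 at the other integers
```

All four satisfy property 1; only the first three have finite support, which is the point of the paragraph that follows.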
Note that all kernels are of finite extent except the sinc kernel. A typical property, though not essential, of common interpolation kernels is the separability of the kernel itself. This property usually simplifies the task of kernel composition and analysis and could also be useful for efficient implementation of the interpolation process. Symmetry is also an attribute usually associated with interpolation kernels. In the case of 2-D interpolation, the symmetry is usually fourfold, that is, k(t₁,t₂) = k(−t₁,t₂) = k(t₁,−t₂) = k(−t₁,−t₂). If the sampling rate is the same in each of the two independent axes, this fourfold symmetry becomes eightfold. In this case, k(t₁,t₂) = k(−t₁,t₂) = k(t₁,−t₂) = k(−t₁,−t₂) = k(t₂,t₁). These additional properties are shared by the kernels presented in this section. Kernels of finite extent necessarily use local information in the construction of a point, whereas kernels of infinite extent are required to use the global content of the given signal samples.

3.1.2 Increasing the Sample Density

The continuous image obtained via the linear filtering of eqn. (3-1) must be discretized to view or store it on a digital computer. This discrete representation allows interpolation to be viewed as the process of increasing the sampling density of a signal. In this case, the interpolated signal can be obtained by expanding the given signal samples x[n₁,n₂] and convolving with a sampled interpolation kernel [Schafer and Rabiner, 1973]. This form of interpolation is a linear filtering process used in interpolating discrete images. For an expansion rate of G₁ x G₂, our expanded signal to convolve is given by

xₑ[n₁,n₂] = x[n₁/G₁, n₂/G₂] for n₁ = 0, ±G₁, ±2G₁, ... and n₂ = 0, ±G₂, ±2G₂, ..., and 0 otherwise   (3-6)

and the corresponding sampled interpolation kernel, obtained by sampling a continuous kernel, would be k[n₁,n₂] = k(t₁,t₂)|_{t₁=n₁/G₁, t₂=n₂/G₂}, where G₁ and G₂ are both integers greater than 1. The interpolated signal x̂[n₁,n₂] at our new sampling density is

x̂[n₁,n₂] = xₑ[n₁,n₂] ** k[n₁,n₂]   (3-7)

where ** denotes 2-D convolution.
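The expansion-plus-convolution procedure of eqns. (3-6) and (3-7) can be sketched in 1-D (an illustrative toy with made-up sample values): zero-stuff the samples at rate G, then convolve with the sampled kernel k[n] = k(n/G).

```python
import numpy as np

def upsample(x, G, kernel, half_width):
    # Expansion (eqn. (3-6) in 1-D): insert G-1 zeros between samples.
    xe = np.zeros(len(x) * G)
    xe[::G] = x
    # Sampled kernel k[n] = k(n / G) on a finite support.
    n = np.arange(-half_width * G, half_width * G + 1)
    k = kernel(n / G)
    # Convolution (eqn. (3-7) in 1-D).
    return np.convolve(xe, k, mode="same")

linear = lambda t: np.clip(1.0 - np.abs(t), 0.0, None)
x = np.array([0.0, 2.0, 4.0, 6.0])
y = upsample(x, G=2, kernel=linear, half_width=1)
print(y)  # interior: originals pass through, midpoints appear between them
```

With the linear kernel and G = 2, the original samples survive at the even output indices (interpolation property 1) and the new odd-index samples are midpoint averages; the last output sample shows the usual boundary effect of finite-length convolution.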
Eqn. (3-7) is the discrete counterpart of eqn. (3-1). It will be shown that eqn. (3-7) is a special case of the superresolution methodology for optical images.

3.1.3 Interpolation as Projections

The interpolation paradigm can be cast in several different forms. The form most common in the signal processing literature is that given in eqns. (3-1) and (3-7). It is cast as the linear filtering (convolution) of the signal samples with an interpolation kernel (FIR filter). An equivalent and convenient form of expressing the interpolation operation is via matrix multiplication. This form is compact and directly accessible by the optical image superresolution methodology. We define a collection of 2-D image samples to be a matrix: X = [x_{n₁,n₂}] where x_{n₁,n₂} = x[n₁,n₂] = x(t₁,t₂)|_{t₁=n₁T₁, t₂=n₂T₂}. Similarly, X̂ is a matrix of interpolated samples, that is, samples at the new sampling density. Notice that X̂ = [x̂_{n₁,n₂}] where x̂_{n₁,n₂} = x̂[n₁,n₂], and these are estimates of x(t₁,t₂)|_{t₁=n₁T₁′, t₂=n₂T₂′} for T₁′ = T₁/G₁ and T₂′ = T₂/G₂. Note that G₁, G₂ > 1 when increasing the sampling density, that is, interpolating. For separable interpolation kernels we define two interpolation kernel matrices, K₁ and K₂, whose entries are samples of the 1-D kernel profiles k(·,0) and k(0,·): the rows of K₁ hold the shifted samples [K₁]_{p₁,n₁} = k(p₁T₁′/T₁ − n₁, 0) and the columns of K₂ hold the shifted samples [K₂]_{n₂,p₂} = k(0, p₂T₂′/T₂ − n₂). It is readily seen that eqn. (3-1), with t₁ = p₁T₁′, t₂ = p₂T₂′ and p₁, p₂ ∈ Z, is equivalently expressed as

X̂ = K₁XK₂.   (3-8)

Notice that if G₁ = G₂, then K₂ = K₁ᵀ. If the interpolation kernel is of finite extent then K₁ and K₂ are also finite extent matrices. Another useful form of expressing eqns. (3-7) and (3-8) results when representing the image samples in X as a vector rather than as a matrix. In this case, we draw from the results of Kronecker products in linear algebra to establish our expression [Graham, 1981].
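As a numerical sanity check (an illustrative sketch with toy sizes, not the dissertation's code), the matrix form of eqn. (3-8) can be built from a sampled 1-D kernel and compared against the direct double sum of eqn. (3-1):

```python
import numpy as np

linear = lambda t: np.clip(1.0 - np.abs(t), 0.0, None)

N, G = 4, 2                      # N x N input, sampling density increased G-fold
P = N * G
# [K1]_{p,n} = k(p/G - n, 0); with equal rates on both axes, K2 = K1^T.
K1 = np.array([[linear(p / G - n) for n in range(N)] for p in range(P)])
K2 = K1.T
X = np.arange(N * N, dtype=float).reshape(N, N)
X_hat = K1 @ X @ K2              # eqn. (3-8)

# Direct evaluation of eqn. (3-1) at the new sample locations t = p/G.
direct = np.zeros((P, P))
for p1 in range(P):
    for p2 in range(P):
        for n1 in range(N):
            for n2 in range(N):
                direct[p1, p2] += X[n1, n2] * linear(p1 / G - n1) * linear(p2 / G - n2)

print(np.allclose(X_hat, direct))         # the two forms agree
print(np.allclose(X_hat[::G, ::G], X))    # property 1: originals pass through
```

The matrix product reproduces the convolution sum exactly, and the original samples reappear at every G-th output location, as interpolation property 1 requires.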
The Kronecker product A⊗B of any two matrices A and B is the matrix obtained from the matrix A by replacing each entry a_{ij} of A with the matrix a_{ij}B. If A is any M x N matrix and a_j is its jth column, then vec(A) is the MN x 1 vector vec(A) = [a₁ᵀ, a₂ᵀ, ..., a_Nᵀ]ᵀ. Thus the vec operation transforms a matrix into a column vector by placing the columns of the matrix one underneath the other. Let us also define an operator uvec which undoes the vec operation. That is, the uvec operation transforms a column vector into a matrix by partitioning the vector into adjacent sections of equal length and placing them side by side to form a matrix. A result from linear algebra which is useful to us here is that vec(ABCᵀ) = (C⊗A)vec(B). With this, eqn. (3-8) is equivalently expressed in vector form as

vec(X̂) = (K₂ᵀ ⊗ K₁)vec(X).   (3-9)

Notice that in this expression the equivalent of one interpolation kernel is being used on a vector of data.

3.2 Review of Reconstruction for Optical Images

Work related to the notion of reconstruction has been described in the literature as zooming, magnification, enlargement, resampling, sample expansion, interpolation and superresolution. The foundation of this work, leading up to the present, lies in interpolation theory. Below is a brief exposition of the core of classical interpolation methodologies. This is followed by a comprehensive account of the recent work into the reconstruction and superresolution of optical images of the last decade or so. Interpolation has an extensive history and has been applied in a variety of fields. An early use of such procedures was to determine function values between mathematically tabulated data [Hamming, 1973; Kreyszig, 1993]. Interpolation has also found use in the design of car and ship bodies. More recently, interpolation has found applications in image coding and video compression [Lim, 1990; Cramer et al., 1996].
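The identity vec(ABCᵀ) = (C ⊗ A)vec(B) underlying eqn. (3-9) is easy to verify numerically; a quick sketch with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))
C = rng.normal(size=(6, 5))

# vec stacks the columns of a matrix one underneath the other
# (column-major flattening).
vec = lambda M: M.T.reshape(-1)

lhs = vec(A @ B @ C.T)
rhs = np.kron(C, A) @ vec(B)
print(np.allclose(lhs, rhs))  # True
```

Applying this with A = K₁, B = X and Cᵀ = K₂ gives eqn. (3-9) directly, and shows why the Kronecker-product matrix acts as the single equivalent interpolation kernel on the vectorized image.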
The object of classical interpolation is to find polynomials that fit "well" through sample points in the data set. These polynomials generally fall into one of two categories: unconstrained and constrained polynomials. Procedures such as Lagrange interpolation and Newton's divided difference are classic examples of the former type of interpolating polynomials [Kreyszig, 1993]. These two approaches only make use of the given data samples. The polynomials can be fit through the data locally or globally. In either case, a polynomial of degree N − 1 must be used in order to fit through N data samples (this applies to the case of one-dimensional interpolation). Spline interpolation is an example of the latter. The name is derived from thin elastic rods, called splines, which engineers used to fit smooth curves through given sample points. Today it finds use in the design of car and ship bodies as mentioned earlier. It is a procedure whereby contiguous chunks of data samples are each fit with a polynomial. Further, the derivative(s) at the endpoints of adjacent polynomials are required to agree [Hamming, 1973]. This constraint results in a smooth polynomial fit through the data set. The resulting interpolation is generally not as oscillatory between data samples as compared with an unconstrained polynomial fit [Kreyszig, 1993]. It has been observed that the quality of interpolation is not generally dependent on the polynomial order of the interpolating functions [Kreyszig, 1993]. In many cases, it is
These men introduced what is known today as the Shannon-Whittaker-Kotel'nikov Sampling Theorem. The fundamental result of the Shannon sampling theory, as it is referred to in this work, is that a bandlimited signal is uniquely specified by its sufficiently close, equally spaced samples. In practice, interpolating through a set of samples using the sinc function of the Shannon sampling theory is not possible. Instead, it is necessary to integrate other functions and/or approaches into the interpolation process. This case has been extensively and effectively treated for the case of bandlimited signals sampled at a rate exceeding the Nyquist rate [Oppenheim and Schafer, 1989; Roberts and Mullis, 1987]. It is known that oversampling can be utilized in easing the constraints on the interpolation function and for the recovery of the original signal to highly acceptable levels of accuracy [Hauser, 1991]. This is evidenced by the compact digital audio technology used today. The interpolation work that is particular to images used the aforementioned polynomials a great deal up into the early 1980s. The 1-D interpolation techniques were extended to the two dimensions corresponding to images by simply considering a separable interpolation polynomial. The most used interpolation polynomials, or kernels, consisted of either zero, first or third order polynomials. These were the zero-order-hold (nearest neighbor), bilinear and bicubic kernels [Netravali and Haskell, 1995]. The zero-order-hold and bilinear kernels do not produce a smooth transition between image samples. The bicubic is better in that its polynomial does have a continuous first derivative. This leads to a smoother transition between interpolated samples. The work of Hou and Andrews [1978] considered the use of a cubic B-spline as the interpolating kernel and Keys [1981] used a cubic spline. An analysis and comparison of these image interpolation methods is given by Parker et al. [1983].
There have been recent studies into the development of the theory of interpolating kernels [Lucke, 1993; Lanzavecchia and Bellon, 1994, 1995; Schaum, 1993; Appledorn, 1996; Unser et al., 1992; Knoll, 1991]. These works focus on developing an "optimal" single interpolation kernel as a substitute for the sinc kernel. The kernels to be developed are local. The assumption made in these works is that the signal has been sampled, at least, at the Nyquist rate. As mentioned earlier, the observed sample sequence is of finite extent; hence, the sinc kernel cannot be practically implemented. In particular, kernels have been composed of trigonometric functions [Lucke, 1993; Lanzavecchia and Bellon, 1994 and 1995], linear sums of Gaussian functions [Appledorn, 1996], polynomial B-splines [Unser et al., 1992] and trigonometric as well as low order polynomial functions [Schaum, 1993]. It is interesting to note that some of the developed local kernels approach the sinc function in the limit of an infinite data sequence [Unser et al., 1992; Lucke, 1993]. Interpolation filters for restoring highly subsampled images have also been studied [Knoll, 1991]. The method is based on the iterative minimization of an error weighting function in a constrained m-dimensional coefficient space. The use of a single linear interpolation kernel has proven to be very popular for image interpolation for two main reasons: (1) the design of linear kernels (kernels that linearly combine sample points) is a mathematically tractable problem and (2) their implementation is nothing more than a convolution, for which fast algorithms exist via the fast Fourier transform [Oppenheim and Schafer, 1989]. However, there are important considerations as to why these methods might not be best for reconstructing images. Images are typically characterized by the objects contained within the imaged scene.
The figural continuity of objects, which leads to an "instantaneous" transition of measured light intensity at objects' boundaries, is probably the most prevalent characteristic particular to optical images. It is this abrupt change in reflected light which conveys the most information about a viewed scene. Other striking characteristics are the textures and light reflections from the objects which comprise the scene. All of this manifests itself as high spatial frequency information which is to be imaged. The observed image samples from a scene typically cannot represent the scene well in the sense of the Shannon sampling theory, that is, the aforementioned high spatial frequencies are aliased due to an insufficient sampling rate which comes from practical restrictions on the physical sensor. Recent trends in the literature show an effort in developing algorithms to better cope with this problem. There has been a shift from single kernel approaches, which cannot recover aliased and/or filtered high spatial frequencies, to research efforts which attempt to account for them. The methodologies developed thus far for addressing this problem can be roughly categorized as follows: (1) directional/edge based, (2) multiple kernel based, (3) orthogonal transform based and (4) variational/statistically based. The theoretical and practical issues characteristic of these methodologies are by no means mutually exclusive. As such, numerous algorithms cannot strictly be categorized as listed above. However, for the sake of organization, these algorithms will be presented as categorically listed. Directional interpolation techniques recognize that high detail areas in images most often have a definite geometric structure or pattern, such as in edge regions [Lee and Paik, 1993; Algazi et al., 1991; Bayrakeri and Mersereau, 1995; Marsi et al., 1996; Jensen and Anastassiou, 1995].
In such cases, interpolation in the low frequency direction (along the edge) is much better than interpolation in the high frequency direction (across the edge). Thus, a directional interpolation scheme has to perform a local analysis of the image structure first and then base the interpolation on the local structure if a low frequency direction does exist. The algorithm by Lee and Paik uses an adaptive combination of zero-order-hold and moving average filters. By combining these two kernels the authors show that an adaptive version of the fast B-spline algorithm is obtained. The work by Algazi et al. focuses on high contrast directional structures such as edges, thin lines and corners. The authors use rank order statistics to detect the presence of these local image structures. They do this because they state that each of these local image structures is characterized by a bimodal distribution. Jensen and Anastassiou have a method that first detects the presence of a high contrast edge, then estimates its location and orientation. If an edge is detected, it fits an ideal step edge in the region. If no edge was detected, the algorithm reverts to bilinear interpolation. The algorithm by Bayrakeri and Mersereau considers weighted directional interpolation. The interpolated values along various directions are combined using directional weights which depend on the variation in that direction. The interpolation value for each direction is assigned based on the magnitude of its directional derivative. Marsi et al. create a simple edge-sensitive interpolation filter. It is a separable filter with one adjustable parameter. The parameter is selected adaptively according to the sample values neighboring the point to be interpolated. The filter is shown to default to a linear interpolator when this adjustable parameter equals zero. It should be noted that these algorithms tend to be computationally cheap.
Multiple kernel approaches analyze local image statistics in order to determine an appropriate interpolation kernel for the region under consideration [Wang and Mitra, 1988; Winans and Walowit, 1992; Darwish and Bedair, 1996; Mahmoodi, 1993]. The choice of kernels is typically ad hoc, and typically two to three classic interpolation kernels are employed by these algorithms. The actual interpolation kernel used in each of these techniques can be interpreted as a shift-varying kernel whose form is dictated by an image's local structure. The work of Wang and Mitra bases its interpolation on three local models: constant, oriented and irregular. The constant model describes a block whose intensity variation is negligible to the human eye, the oriented model represents a block where the intensity variation is along a particular direction, and the irregular model is used on blocks with more than one pronounced direction present (such as corners). Winans and Walowit base their work on a cubic spline kernel with an adjustable parameter. The authors preset this parameter to one of three values, hence resulting in three interpolation kernels. The parameter is chosen according to the edge information in the neighborhood of the interpolated pixels. Darwish and Bedair also use three interpolation kernels: the zero-order-hold, a cubic spline and an additional kernel established by the authors. All three are separable. The zero-order-hold kernel is used in edge regions, the cubic spline is used in homogeneous (smooth) regions and their third kernel is used in regions classified as neither smooth nor edgy. Mahmoodi describes a method using two interpolation kernels: a fifth-order kernel with a 2x2 region of support in high contrast regions and a cubic spline with a 4x4 region of support otherwise.
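The kernel-switching logic these methods share can be sketched as a local statistic thresholded to select a kernel. In this sketch the statistic (sample standard deviation) and threshold are invented for illustration; the cited algorithms use their own, more refined classifiers.

```python
import numpy as np

def choose_kernel(block, thresh=20.0):
    """Pick an interpolation kernel for a local block: replicate
    (zero-order hold) near edges so they stay sharp, use a smoother
    kernel otherwise. The statistic and threshold are illustrative."""
    return "zero-order-hold" if np.std(block) > thresh else "cubic-spline"

smooth = np.array([[10.0, 11.0], [10.0, 12.0]])
edgy = np.array([[10.0, 200.0], [12.0, 205.0]])
print(choose_kernel(smooth))  # -> cubic-spline
print(choose_kernel(edgy))    # -> zero-order-hold
```

The resulting overall kernel is shift-varying in exactly the sense described above: its form at each point depends on the local block statistics.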
In order to select a kernel, a threshold is applied to the ratio of the sample deviations of the 2x2 and 4x4 blocks around the point to be interpolated. The use of orthogonal transforms for interpolation [Koltracht et al., 1996] has focused on the discrete cosine transform (DCT) [Martucci, 1995; Shinbori and Takagi, 1994; Hong et al., 1996] and the wavelet transform [Keshavmurthy et al., 1996; Chang et al., 1995]. Work involving the wavelet transform utilizes information across image scales to predict information at finer scales. This exploits the fact that evolution across resolution scales characterizes the local regularity of a signal [Mallat and Zhong, 1992]. The work of Martucci performs the interpolation completely in the DCT domain. Previous work by the author has shown that it is possible to use the DCT to implement digital filters [Martucci, 1993]; as such, all spatial domain filtering can be equivalently performed in the DCT domain. Shinbori and Takagi use an iterative application of the Gerchberg-Papoulis (GP) algorithm with the DCT to obtain a magnified image. This technique uses the GP algorithm as the basic principle for extending the frequency band. The spatial high frequency components are restored during the iterative forward and inverse DCT process using two constraints: the spatial extent of the image is finite, and the low frequency components are already known to be correct. Hong et al. use the coefficients of a locally performed DCT to extract edge information. Five different edge types are identified and each type has a corresponding interpolation technique. Once the edge type is determined, the zero-order-hold and bilinear interpolations are performed and combined. Then five different Gaussian low pass filters adaptively reduce the discontinuities which resulted from this interpolation combination. The wavelet based work by Keshavmurthy et al.
decomposes the image into an overcomplete (or redundant) representation via an autocorrelation shell derived from a Daubechies wavelet. The wavelet detail information across known scales is used to extrapolate the details at finer scales. Chang et al. implement a method that extrapolates the wavelet transform of the higher resolution image based on the evolution of the wavelet transform extrema across scales. By identifying three constraints which the higher resolution information needs to obey, they enhance the reconstructed image through alternating projections onto the sets defined by these constraints. Koltracht et al. outline a method based on parametric families of orthogonal transforms which arise from the wavelet multiresolution paradigm. The idea's development considers the original image as a low frequency component of a family of potential enlargements. Interpolation based on variational and/or statistical principles presents a different view of interpolation from those previously discussed [Karyiannis and Venetsanopoulos, 1991; Saul and Jordan, 1996; Schultz and Stevenson, 1994; Ruderman and Bialek, 1992]. Despite this, some interesting connections between these recent methodologies and some previously mentioned have been made; in fact, certain interpolation techniques previously discussed have been found to be special cases of these new methodologies. The variational methods formulate the interpolation problem as the constrained minimization of a certain functional. The choice of the functional to be minimized varies and is based on a combination of the restrictions imposed by the mathematical problem and certain considerations related to the physical problem. The work of Karyiannis and Venetsanopoulos concerns itself with the application of variational principles to the image interpolation problem.
They formulated the problem as the constrained minimization of a quadratic functional, with particular emphasis on the class of quadratic functionals whose minimization amounts to 2-D generalized L-splines. The specific functionals used were based on certain image models. It is stated in this paper that the previously discussed method of cubic splines results from the minimization of a certain quadratic functional. The work of Saul and Jordan incorporates variational and statistical principles for the interpolation problem. Their approach considers a multidimensional input and a model of its density. They do not specifically concern themselves with images but focus on how to optimally interpolate between two points in space. They do this by assigning a cost to each path through space, based on two competing goals: one to interpolate through regions of high density, the other to minimize arc length. They present an efficient solution (in high dimensions) to this problem for Gaussian, Dirichlet and mixture models. Schultz and Stevenson describe an approach which considers variational and statistical principles for image interpolation. Their methodology results in the optimization of convex (not necessarily quadratic) functionals. A statistical model for the image was assumed which incorporates the convex Huber function. This function is quadratic up to a threshold and linear beyond it; use of this cost function helps preserve the discontinuities of an image in its expanded version. The work of Ruderman and Bialek shows that if the probability density of a signal is known a priori, this knowledge can be used to de-alias information contained in samples of that signal. In fact, this knowledge can be used to estimate frequencies beyond the Nyquist limit. They posed the superresolution problem as one of finding the "best" estimate of the original signal given the original signal's PDF and the observed signal samples.
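For concreteness, one common parameterization of the Huber cost just described can be written as follows (the threshold T is a free parameter, and the exact scaling used by Schultz and Stevenson may differ):

```python
def huber(r, T=1.0):
    """Convex Huber cost: quadratic for |r| <= T, linear beyond, so
    large discontinuities are penalized less severely than under a
    pure quadratic. T is a free threshold parameter."""
    if abs(r) <= T:
        return r * r
    return 2.0 * T * abs(r) - T * T  # continuous, matching slope at |r| = T

print(huber(0.5))  # quadratic region -> 0.25
print(huber(3.0))  # linear region -> 5.0
```

Because the penalty grows only linearly for residuals beyond T, a sharp edge in the expanded image is not smoothed away as it would be under a purely quadratic functional.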
The most useful estimator will depend on the costs for making different kinds of errors. The two natural choices for an estimator are the maximum likelihood (ML) estimate and the conditional average. Their observed signal samples are assumed to be obtained by a general point spread function (PSF), not just sampled by a train of impulses, and corrupted by Gaussian noise; the noise for each sample is assumed to be statistically independent. The example presented in their work assumes the original signal is drawn from a Gaussian ensemble, and under these assumptions both estimators yield the same result. They show that the optimal superresolution procedure for a Gaussian signal results in a convolution of the signal samples with an interpolation kernel which is a function of the noise variance, the point spread function, the original signal power spectrum and the sampling period. They obtain a very satisfying result using this interpolation methodology: for the case of zero noise variance, a delta PSF and a bandlimited original signal sampled above the Nyquist rate, the interpolation kernel defaults to the sinc kernel. There is other work that differs in approach from the above stated categories but is nonetheless noteworthy. The work of Cheung and Zeng uses only a single interpolation kernel; however, this kernel is derived using a least squares minimization process and has a four sample diamond-like region of support [Cheung and Zeng, 1995]. Zeng and Venetsanopoulos use the median filter with local image features for their interpolation [Zeng and Venetsanopoulos, 1992]. The median filter is known to remove outliers in the local distribution of pixels and to preserve certain types of image structure [Huang and Tang, 1979]. They attempt to exploit this property of the median filter to produce more accurate interpolation results in edge regions. Fathima and Yegnanarayana have used a maximum entropy approach to interpolation.
They specifically address the problem of recovering the missing samples of a signal from a few of its randomly distributed samples, and they also discuss the appropriateness of their method for different distributions of the known samples [Fathima and Yegnanarayana, 1990]. Herodotou et al. have used a simple Gibbs random field model for interpolation. In their paper, a binary based Gibbs random field model is used for image interpolation; images are interpolated from their downsampled versions along with a number of texture parameters that are estimated within smaller image blocks [Herodotou et al., 1995]. The use of wavelet bases for the interpolation of sparse data has been addressed by Pentland. This work introduces good approximate solutions to this problem in only a few iterative steps, where each step requires O(n) operations and storage locations; the author illustrates the use of this technique for interpolating sparse range data [Pentland, 1994]. Work by DeNatale et al. uses a bilinear interpolation scheme based on the least squares optimization of a spline-like kernel. This work incorporates adaptive image sampling and interpolation for data compression [DeNatale et al., 1994]. Tom and Katsaggelos address the problem of obtaining a high resolution image from multiple low resolution images that have been subsampled and displaced by different subpixel shifts. The paper poses the low to high resolution problem as an ML problem which is solved by the expectation maximization (EM) algorithm. By exploiting the structure of the matrices involved, the problem can be worked in the discrete frequency domain [Tom and Katsaggelos, 1995]. Typically, multiple displaced versions of a single image are not available for interpolation. For a video sequence, however, adjacent temporal frames tend to satisfy this assumption particularly well. A recent book addresses this problem [Tekalp, 1995].
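As a point of reference for the adaptive schemes reviewed above, the plain bilinear rule that several of them fall back to can be written directly (the image and coordinates below are invented for the example):

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinear interpolation at real-valued coordinates (y, x): a
    weighted average of the four surrounding samples, with weights
    given by the fractional offsets. The common fallback kernel in
    the adaptive schemes reviewed above."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0, x0]
            + (1 - dy) * dx * img[y0, x0 + 1]
            + dy * (1 - dx) * img[y0 + 1, x0]
            + dy * dx * img[y0 + 1, x0 + 1])

img = np.array([[0.0, 10.0], [20.0, 30.0]])
print(bilinear(img, 0.5, 0.5))  # center of the four samples -> 15.0
```

Its separable, low-pass nature is precisely why the directional and multiple-kernel methods above replace it near edges.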
3.3 SAR Imaging via Nonparametric Spectral Estimation

The remainder of this chapter is devoted to the reconstruction of SAR images from the collected samples of the radar backscattering. This set of collected samples is typically termed the phase history corresponding to a SAR image. An introduction to the information provided by the phase history data is presented in this section. SAR images are commonly formed via spectral estimation techniques; the reason for this is elucidated herein. The "ideal" SAR image is typically assumed to discriminate between radar scattering elements in the illuminated scene. As such, the ideal image is considered a juxtaposition of delta functions in the frequency domain, one for each scattering element. The SAR imaging process is generally performed with this in mind. Owing to this, spectral estimation serves as the basis for SAR imaging. Spectral estimation methods can be broadly classified as nonparametric or parametric. The nonparametric approaches generally perform a narrowband filtering over a band of frequencies. The parametric approaches postulate a model for the data, and the parameters of this model must be estimated. The minimum variance (MV) superresolution approaches, to which our superresolution methodology belongs, postulate a local model for the data to produce an optimal filter matched to that model. In a sense, this can be considered a semiparametric approach. Nonetheless, since it does produce a filter and model parameters are not estimated, it is considered in this work as a nonparametric approach.

3.3.1 Phase History Data

The phase history data samples collected from a radar which electromagnetically illuminates a scene are used in the formation of SAR images. This section presents an introduction to the information provided by the phase history samples. It is provided so that the unfamiliar reader might better understand the premises underlying the imaging of SAR data.
Interested readers are referred to the books by Hovanessian [1980], Mensa [1991] and Skolnik [1990] for detailed explanations of the SAR data collection process. The SAR phase history represents the reflected radar information in the downrange and crossrange (Doppler) directions. The downrange data is obtained by transmitting a stepped-frequency waveform of known frequency step size and bandwidth in the downrange direction. The range extent is a function of the frequency step size, and the range resolution is a function of the total bandwidth of the stepped-frequency waveform. By varying the frequency step size and total bandwidth, arbitrary range depth and resolution can be achieved. The reflected radar information in the downrange direction is contained in the composite wave received by the radar; it contains the shifted incident waves reflected by the objects present in the radar illuminated scene. To obtain data in the crossrange direction, the complete stepped-frequency waveform is transmitted at equidistant intervals along the radar's flight path. The crossrange resolution is a function of the transmit wavelength and the relative velocity between the radar and the scene. Along the crossrange direction, objects also reflect the incident waves with different phase shifts. The formation of a SAR image from this information involves determining the reflectivity function of objects in the radar illuminated scene in both the downrange and crossrange directions. The reflectivity at a given range is obtained by narrowband filtering at each of the frequencies transmitted by the radar. The filtered signal's magnitude about the center frequency in the narrowband is an estimate of the reflectivity value we seek. The reflectivity function is typically estimated using the fast Fourier transform (FFT).
This is done because, first, each of the basis functions of the FFT is a narrowband filter [Stoica and Moses, 1997] and, second, the reflectivity function at all frequencies of interest can be computed efficiently via the FFT. The FFT is typically used in both the downrange and crossrange directions. The reflectivity function for points in space, that is, for distances from the radar, is typically computed by taking the magnitude of the aforementioned FFT computation. This operation is one way to estimate the power spectral density (PSD) of a signal [Stoica and Moses, 1997]. In this manner, the formation of SAR images is a direct result of PSD estimation from phase history samples. The illuminated radar scene can also be assumed to consist of a finite number of prominent scattering elements. In this manner, the phase history is modeled as the superposition of 2-D complex exponentials, one for each scattering element. The amplitude of each complex exponential is proportional to the amount of scattering produced by its element. Spectral estimation techniques can also be used in detecting these scattering elements. Common approaches to spectral estimation (and hence SAR imaging) are now discussed.

3.3.2 Standard Approaches

Spectral estimators are used to detect complex sinusoids embedded in noise from a finite number of observations [Stoica and Moses, 1997]. In 1-D, we denote the N observed signal samples by x[n], n = 0, ..., N-1, where x[n] = m[n] + e[n] and m[n] = a e^(jωn) is the complex sinusoid of frequency ω and complex amplitude a which is additively embedded in the noise e[n]. In 2-D, the complex sinusoid's counterpart is the complex plane wave, denoted m_ω[n1, n2] = a e^(j(ω1 n1 + ω2 n2)) where ω = [ω1, ω2]. Here too, the coefficient a is complex and accounts for the amplitude and phase shift of each wave. The standard methods for detecting these waves (and the consequent scattering elements in a SAR image) are based on the FT.
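This FT-based detection can be illustrated numerically. In the sketch below (the grid size, frequencies and noise level are invented for the example), the two largest FFT magnitudes of a simulated noisy phase history recover the two plane waves that were embedded in it:

```python
import numpy as np

N1, N2 = 32, 32
n1, n2 = np.meshgrid(np.arange(N1), np.arange(N2), indexing="ij")
rng = np.random.default_rng(0)

# Phase history: two complex plane waves plus complex Gaussian noise.
freqs = [(5, 9), (12, 20)]  # on-grid frequencies, chosen for the example
x = sum(np.exp(2j * np.pi * (k1 * n1 / N1 + k2 * n2 / N2)) for k1, k2 in freqs)
x = x + 0.1 * (rng.standard_normal((N1, N2)) + 1j * rng.standard_normal((N1, N2)))

# Projecting onto the FT basis: large magnitudes flag the waves present.
X = np.abs(np.fft.fft2(x))
peaks = np.argsort(X, axis=None)[-2:]
found = sorted(tuple(map(int, np.unravel_index(p, X.shape))) for p in peaks)
print(found)  # -> [(5, 9), (12, 20)]
```

Each FFT bin is the output of a narrowband filter matched to one plane wave of the basis, which is exactly why the projection concentrates the wave energy into single bins.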
This is due to the basis functions of the FT being the waves we seek to detect. The discrete space FT (DSFT) of a finite set of data is defined as

X(ω1, ω2) = Σ_{n1=0}^{N1-1} Σ_{n2=0}^{N2-1} x[n1, n2] e^(-j(ω1 n1 + ω2 n2))    (3-10)

where ω1 and ω2 are in radians. Thus the DSFT of the noisy signal serves to project this signal onto the complex plane waves we seek. The presence of complex plane waves in the noisy signal would yield large amplitudes at those frequencies of the DSFT corresponding to the waves present in the noisy signal, relative to all other frequencies. The ability to distinguish between several sinusoids spaced closely in frequency is limited by the number of data samples available; this was alluded to in the previous chapter. The resolution afforded is given by Δω1 = 2π/N1 and Δω2 = 2π/N2, that is, the approximate half-power (3 dB) width of the main lobe of the 2-D rectangular window. In other words, two waves of 2-D frequency (ω1^(1), ω2^(1)) and (ω1^(2), ω2^(2)), identified by the superscripts (1) and (2) respectively, cannot be distinguished using a direct application of the DSFT if |ω1^(1) - ω1^(2)| < Δω1 and |ω2^(1) - ω2^(2)| < Δω2. Side lobe artifacts which result from Fourier transforming a finite number of data samples can also affect the ability to accurately detect the waves present in the noisy signal [Stoica and Moses, 1997]. Side lobes resulting from the presence of several waves can also sum to create the appearance of a wave that is truly not present in the signal. These side lobes can be reduced using special window functions w[n1, n2] where ni = 0, ..., Ni-1, (i = 1, 2) [Harris, 1978]. The reduction of these side lobes comes at the expense of a wider main lobe width of the window function, thus reducing the ability to distinguish between closely spaced waves. Because spectral estimation methods of this type, such as the Welch, Burg and Daniell methods [Stoica and Moses, 1997], are not able to resolve these waves, they will not be explicitly discussed here.
Instead, the interested reader is referred to the books by Stoica and Moses or Kay for more details [Stoica and Moses, 1997; Kay, 1988].

3.3.3 Minimum Variance Approaches

It has been discussed that the ability to distinguish (or resolve) between two closely spaced frequencies is limited by the number of observed samples. The MV approaches have the ability to resolve beyond this inherent resolution of our data; hence, they are superresolution approaches. The MV approaches are so called because they seek to minimize the power from additive noise (typically zero-mean Gaussian noise) present in the observed signal while preserving the sinusoids sought. The power of this type of noise is known to be given by its variance [Stoica and Moses, 1997]. These methods are considered adaptive filtering methods which seek to minimize the output power of the filtered signal with respect to a given model. Specifically, let X = [x_{n1,n2}] = x[n1, n2] and W = [w_{n1,n2}] = w[n1, n2] for ni = 0, ..., Ni-1, (i = 1, 2), denote the matrices of our 2-D phase history samples and filter coefficients, respectively. Let us also define the vectors associated with these samples as x = vec(X) and w = vec(W). Each output image sample s_q is defined by

s_q = min_{w_q} E[ |x^H w_q|^2 ] = min_{w_q} E[ w_q^H x x^H w_q ] = min_{w_q} w_q^H R w_q    (3-11)

where q = [q1, q2], qi = 0, ..., Qi-1, (i = 1, 2), defines the location on the frequency grid from which we form our SAR image and R is the autocorrelation matrix of the vector x. From this formulation, the filter w_q is dependent on the point to image, hence the subscript q. It must obviously be constrained, or the solution to eqn. (3-11) is the trivial, and useless, w_q = 0. Several constraints can be placed in order to arrive at a meaningful solution. The constraints and corresponding solutions of a few of the most used MV superresolution techniques are now discussed. Capon's Method. This is also referred to as the MV method.
A single equality constraint is imposed here, known as the unit boresight constraint. Specifically, w_q^H m_q = 1, where w_q is the vector of complex filter coefficients which weights the phase history samples. The point scatterer model is m_q = vec(M), where the matrix M = [m_{n1,n2}] = m_q[n1, n2] = e^(j(ω_q1 n1 + ω_q2 n2)) for ni = 0, ..., Ni-1 and ω_q = [ω_q1, ω_q2] with ω_qi = 2π qi / Qi, (i = 1, 2). It is the specific wave model we seek at frequency grid coordinates q = [q1, q2]. This constraint allows the model to pass unattenuated if it is indeed present in the noisy observed signal. For each output sample, the optimization problem is

min_{w_q} w_q^H R w_q subject to w_q^H m_q = 1.

This optimization problem is readily solved with the use of Lagrange multipliers. The optimal filter in Capon's method is given by [Stoica and Moses, 1997]

w_q = R^(-1) m_q / (m_q^H R^(-1) m_q)    (3-12)

and the spectral estimate S, which is our superresolved SAR image for phase history X, is composed of the spectral estimates s_q at each frequency grid location. These are obtained by plugging eqn. (3-12) into eqn. (3-11) to yield

s_q = 1 / (m_q^H R^(-1) m_q)    (3-13)

and the resultant SAR image is described by S = [s_q] = s[q1, q2] for qi = 0, ..., Qi-1, (i = 1, 2). MUSIC and Eigenvector. These two approaches follow the same approach taken by Capon's method; where they differ is in their use of the autocorrelation matrix. In order to more clearly differentiate sinusoids from the noise they are embedded in, these methods seek to drive the spectral estimate to infinity when there is an exact match between the model vector and its presence in the observed signal. This is accomplished by considering the signal and noise subspaces to be orthogonal. Specifically, if we consider the eigendecomposition of the autocorrelation matrix R, then we can write

R = U Λ U^H = Σ_m λ_m u_m u_m^H = Σ_{m ∈ signal} λ_m u_m u_m^H + Σ_{m ∈ noise} λ_m u_m u_m^H    (3-14)

where U = [u_1, ..., u_M] is the matrix composed of the eigenvectors u_m of matrix R and Λ is the diagonal matrix of its eigenvalues λ_m. The eigenvector (EV) method directly uses the noise subspace in eqn. (3-14) for its spectral estimate.
Then by substituting R^(-1) in eqn. (3-13) with the "inverse" of the noise subspace, the EV spectral estimate is written

s_q = 1 / ( m_q^H [ Σ_{m ∈ noise} (1/λ_m) u_m u_m^H ] m_q ).

The MUSIC method exploits the white noise model by replacing the measured noise eigenvalues with the average noise power σ². The spectral estimate for MUSIC is

s_q = 1 / ( m_q^H [ Σ_{m ∈ noise} (1/σ²) u_m u_m^H ] m_q ).

3.3.4 Estimating a Full Rank Autocorrelation Matrix

Many MV related methods require knowledge of the inverse of the autocorrelation matrix R of our phase history samples. This matrix must be estimated from the available samples; in practice, we usually have one phase history observation. Given a matrix of phase history samples, we can compute a full rank autocorrelation matrix by averaging subwindows of the given phase history samples. Specifically, let X = x[n1, n2] for n1, n2 = 0, 1, ..., N-1 be an NxN matrix of our phase history samples. Also, let X_ij = x[i : i+P-1, j : j+P-1] be the subwindows of size PxP in matrix X. Note that there will be (N-P+1)² subwindows of size PxP in matrix X, where i, j = 0, 1, ..., N-P. The autocorrelation matrix of a block x of phase history samples is defined as R = E[x x^H]. This matrix is typically estimated by averaging outer products over independent observations; however, with a single phase history observation and no subwindowing, such an estimate is rank one and hence not full rank. With the subwindows we have described, the autocorrelation matrix estimate is

R = (1/(N-P+1)²) Σ_{i=0}^{N-P} Σ_{j=0}^{N-P} x_ij x_ij^H    (3-15)

where x_ij = vec(X_ij). The size P of the subwindows should be chosen so that they are as large as possible while also ensuring that R is full rank. It can be shown that the maximum possible P for which R can be full rank satisfies P ≤ (N+1)/2, where P must be an integer. The resolution of the estimated matrix R could be increased if we had more subwindows to average.
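The subwindow averaging of eqn. (3-15) can be sketched directly (the data below are random samples, used only to check the size and rank of the estimate):

```python
import numpy as np

def autocorr_subwindows(X, P):
    """Estimate the autocorrelation matrix of eqn. (3-15) by averaging
    vectorized P x P subwindows of the N x N phase history X."""
    N = X.shape[0]
    R = np.zeros((P * P, P * P), dtype=complex)
    for i in range(N - P + 1):
        for j in range(N - P + 1):
            v = X[i:i + P, j:j + P].reshape(-1, 1)  # x_ij = vec(X_ij)
            R += v @ v.conj().T
    return R / (N - P + 1) ** 2

rng = np.random.default_rng(1)
N = 9
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
P = (N + 1) // 2                 # largest P for which R can be full rank
R = autocorr_subwindows(X, P)
print(R.shape, np.linalg.matrix_rank(R))  # -> (25, 25) 25
```

At this maximal P the (N-P+1)² = P² subwindow vectors just suffice, generically, to make the P² x P² estimate full rank; more subwindows would give a better-conditioned estimate.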
This is done by also averaging 'backward' versions of the subwindows x_ij. The backward subwindow corresponding to subwindow x_ij (in vector form) is given by J x_ij^*, where * denotes conjugation and J is a permutation matrix with ones on the antidiagonal. The resultant matrix is the forward-backward averaged autocorrelation matrix. Here we have twice the number of subwindows to average compared with the 'forward only' autocorrelation matrix of eqn. (3-15), so the size of P can increase. The 'forward-backward' estimate of the autocorrelation matrix is then

R = (1/(2(N-P+1)²)) Σ_{i=0}^{N-P} Σ_{j=0}^{N-P} ( x_ij x_ij^H + J x_ij^* x_ij^T J )    (3-16)

Now the maximum possible P for which R can be full rank satisfies P ≤ (2-√2)(N+1), where P must be an integer. The use of subwindows necessarily decreases the resolution of our spectral estimate; but, in order to compute an MV spectral estimate, this must be done.

3.4 Review of Reconstruction for Synthetic Aperture Radar Images

SAR images are reconstructed from phase history data. This reconstruction can, in general, be described as model based. These models characterize the backscatter of radar energy reflected from known objects. By far, the trihedral reflector is the most commonly considered model in SAR image formation. This is because the reflected energy of the trihedral reflector manifests itself as a complex plane wave (a 2-D complex sinusoid) in the phase history of the radar return and, as such, techniques based on the ubiquitous FFT can be used in the imaging process of signals in the SAR domain. SAR imaging based on other models is, to the author's knowledge, very scarce. Imaging based on trihedral as well as dihedral corner reflectors has been studied [Liu and Li, 1996]. These authors also state that dihedral feature extraction for imaging has, to their knowledge, never been considered. This being the case, the remainder of this section introduces methods for the imaging of SAR data based on the trihedral reflector model.
Some of these methods were discussed in the previous section but are included here for a more coherent review. Ideally, when Fourier transforming the phase history, the trihedral reflector would appear as a point in the image corresponding to the true reflector's location in space. However, the finite number of collected samples (yielding a truncation of the complex plane wave) tends to smear or distort the actual location of the object [Stoica and Moses, 1997]. Another undesirable effect in SAR image formation via the Fourier transform is the presence of side lobes corresponding to each reflector in the image. This is a well-known consequence of Fourier transforming a truncated sinusoidal wave. It is known that these side lobes can be reduced or suppressed through the use of appropriately designed windows, at the expense of increased main lobe width [Harris, 1978]. The problem in the superresolution of SAR imagery is one of correctly detecting the presence of the models which the objects in the imaged scene are assumed to be composed of. Typically, the superresolved image has increased sample density, decreased main lobe width (relative to the resolution afforded by the number of phase history samples given) and suppressed side lobe artifacts. SAR superresolution techniques can be grouped into parametric and nonparametric approaches. Many spectral estimation books exist which describe well the theory and techniques used in SAR imaging [Kay, 1988; Marple, 1987; Stoica and Moses, 1997]. The parametric approaches essentially perform signal detection and model parameter estimation. The nonparametric techniques are generally based on adaptive filtering schemes. It was mentioned that our superresolution methodology is nonparametric in nature. As such, a review of the most common nonparametric SAR superresolution approaches is presented herein. Before this, a quick review on the use of windows and related schemes will be presented.
Also, a brief discussion on parametric SAR superresolution is presented. The windowing of a signal for side lobe suppression is well established and studied [Harris, 1978; Skolnik, 1990]. Fourier transforming the windowed signal can produce a marked suppression of the side lobes which result from a limited number of collected data samples. As stated before, this is accomplished at the expense of an increase in main lobe width, which results in a decrease in spectral resolvability between closely spaced sinusoids. Stankwitz et al. have developed "smart" windowing schemes which eliminate the side lobes that would result from the FFT of the collected phase history of a trihedral corner reflector [Stankwitz et al., 1995]. These techniques are referred to as spatially variant apodization (SVA); apodization is a term borrowed from optics which refers to the suppression of diffraction side lobes. SVA is based on deciding which window from the family of cosine-on-pedestal window functions would be most appropriate to use at the point being imaged. An adjustable parameter a, ranging from 0 to 0.5, selects which of the cosine-on-pedestal functions to use at the point being imaged. At the extremes of this family of windows are the rectangular window for a = 0 (all pedestal, no cosine) and the Hanning window for a = 0.5 (all cosine, no pedestal). Hamming weighting is a special case of cosine-on-pedestal (a = 0.43) which nulls the first side lobe. The advantage of this technique is that it is very computationally efficient; it can be implemented easily with a few FFTs. The drawbacks of this method are that it has only been developed to model trihedral reflectors and that the best resolution it can afford is that of the main lobe width of the rectangular window, hence its resolution is dependent on the number of collected samples. As such, it is not a superresolution technique per se.
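The cosine-on-pedestal family underlying SVA can be written, in one common unnormalized centered form (an illustrative parameterization), as w[n] = 1 + 2a cos(2πn/N):

```python
import numpy as np

def cosine_on_pedestal(N, a):
    """Cosine-on-pedestal window family (unnormalized, centered form):
    a = 0 gives the rectangular window, a = 0.5 the Hanning window up
    to scale, and a = 0.43 approximates Hamming weighting."""
    n = np.arange(N) - N // 2
    return 1.0 + 2.0 * a * np.cos(2.0 * np.pi * n / N)

print(cosine_on_pedestal(8, 0.0))                         # rectangular: all ones
print(np.isclose(cosine_on_pedestal(8, 0.5).min(), 0.0))  # Hanning tapers to 0 -> True
```

SVA in effect chooses a per-sample value of a from this one-parameter family, rather than applying a single fixed window to the whole phase history.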
A variant of this technique, which involves an iterative procedure of "SVA windowing" and phase history extrapolation, has been reported to potentially increase resolution beyond that afforded by the number of observed samples, owing to an extrapolation phase which increases the overall number of image samples [Stankwitz and Kosek, 1996]. This approach has been termed super SVA. Superresolution can be performed parametrically by estimating the parameters of the models assumed to comprise the image [Bi et al., 1997]. In this case, the number of reflectors for each assumed model must be known a priori or must be estimated using some model order selection strategy such as the Akaike information criterion [Soderstrom and Stoica, 1989]. This method tends to be slow in practice because the number of corner reflectors is generally not known, and solutions for several model orders must be obtained and compared before an appropriate model order, that is, number of scatterers, can be decided upon. Parameter estimates for each assumed model are typically obtained using least squares approaches. For the case of a complex plane wave, its complex amplitude and 2-D frequency must be estimated. This must be done for each plane wave assumed to comprise the signal. The nonparametric adaptive filtering schemes are based on determining the optimal linear projection of the observed phase history samples for the point being imaged. Essentially, the image is formed on a grid of equispaced frequency coordinates; thus each imaged point corresponds to the 2-D frequency of its associated grid location. The frequency samples lie in the rectangular interval [0, 2π) x [0, 2π). The optimal filter is the one that minimizes the output power at the point (frequency) being imaged subject to certain constraints. These techniques, generally referred to as minimum variance techniques, will be discussed here.
What differentiates each of them are the constraints imposed and/or the assumptions made concerning the autocorrelation matrix of the observed samples. The first of these SAR imaging methods can be attributed to Capon [Capon, 1969]. This approach is typically referred to as Capon's method or the minimum variance (MV) method. It imposes what is known as the unit boresight constraint: the filter should pass a complex plane wave of a specified frequency unimpeded while maximally suppressing signal power at all other frequencies. The disadvantage of this method is that the inverse of the autocorrelation matrix must be computed, though there is typically only one phase history observation to work with for the scene being imaged. What is typically done is that subwindows within the observed phase history are averaged in order to obtain a full rank correlation matrix which can be inverted. This approach sacrifices resolution (when considering subwindows) in order to obtain the minimum variance solution. A reduced rank minimum variance (RRMVM) method has been proposed which does not sacrifice as much resolution as its full rank counterpart [Benitz, 1993]. Here, the subwindows are larger than with the MV method but the autocorrelation matrix is not full rank. The unit boresight constraint is imposed, as well as an inequality constraint on the norm of the filter. It turns out that the optimal filter is based on an optimal diagonal loading of the singular autocorrelation matrix, leading to an autocorrelation matrix which can now be inverted to yield the optimal filter. Methods which assume orthogonality between the signal and noise subspaces are also popular [Odendaal et al., 1994]. These are the MUSIC and EV methods. In both of these methods, a full rank correlation matrix is needed and the unit boresight constraint is enforced. The spectral estimate, though, is determined by the noise subspace.
The MUSIC algorithm explicitly whitens, or equalizes, the noise subspace eigenvalues while the EV method does not. This whitening tends to destroy the spatial inhomogeneities associated with terrain clutter; hence the method is not generally suitable for SAR imaging. In contrast, the EV method does tend to preserve clutter inhomogeneities. Autoregressive linear prediction (ARLP) can also be used in determining the optimal filter [Jackson and Chien, 1979]. Here, the optimal filter which linearly combines phase history samples to predict a "future" sample is the one that minimizes the average prediction error. Based on the assumption that the prediction error signal is an innovations process (white noise), the spectral estimate is the square root of the minimized prediction error energy divided by the magnitude squared of the transfer function. The filter need not be causal, but the white noise assumption of the prediction error signal is generally invalid, leading to spiky and spurious behavior in the final estimate. Instead, it has been found useful to average images based on several prediction steps. The RMS average of ARLP images based on all possible prediction steps reduces to one of Pisarenko's spectral estimates [Pisarenko, 1972]. Other recent and noteworthy adaptive filtering approaches are APES and HDVI. APES is a recent technique which generalizes the MV method [Li and Stoica, 1996]. The optimal filter in this method has the exact functional form of Capon's filter. The difference is that the autocorrelation matrix in the Capon filter is replaced with the sum of this same autocorrelation matrix and an additional matrix which results from the optimization carried out. The approach is shown to yield more accurate spectral estimates overall than Capon's method. High definition vector imaging (HDVI) is another recent technique developed at MIT Lincoln Labs for superresolution imaging [Benitz, 1997].
A typical effect of superresolving with the adaptive filtering techniques based on the trihedral model is that any radar illuminated object which does not satisfy the model is either greatly suppressed or distorted; hence most background information can be modified in an undesired manner, depending on what this information will be used for. The HDVI technique imposes constraints that distort the background less than most common adaptive filtering schemes. The constraints require the optimal filter's norm to equal that of the assumed model's projection onto the space spanned by the autocorrelation matrix. Also, the optimal filter is constrained to be within a given distance from the assumed model's projection onto the space spanned by the rows or columns of the autocorrelation matrix. CHAPTER 4 SUPERRESOLUTION OF OPTICAL IMAGES The superresolution of optical images requires establishing an appropriate set of bases by which the collected image samples can be weighted for the reconstruction of new image points (samples). The bases established are learned from available data, and the basis function used in the creation of an image point depends on the local character of the image. The need for such local consideration will be discussed throughout this chapter. We will examine the interdependence of local information in optical images and make observations which serve to validate the approach taken to the superresolution of optical images. The architecture which embodies our approach will be shown to be equivalent to a convolution with a family of kernels. The image reconstruction is formulated as x̂[n1,n2] = x_l[n1,n2] * k_c,l[n1,n2] (4-1) Let us note that the subscripts c and l are used to distinguish which basis function to use in the formation of an image point. These subscripts index a kernel based on the local image characteristics about the point to reconstruct. The family of kernels is given by {k_c,l[n1,n2] : c = 1,2,...,C; l = 1,2,...,L}, where C represents the number of established local image characteristics (or features) from which to compare local neighborhood information and L is the number of kernels created per feature. In summary, eqn. (4-1) is a convolution with a shift-varying kernel whose form is dictated by local image information. It is a generalization of eqn. (3-7) and defaults to the standard convolution of eqn. (3-7) when C = L = 1. Hence, the common approaches to reconstruction which utilize eqn. (3-7) are a special case of the reconstruction implemented herein, which is described by eqn. (4-1). There are two main issues to address regarding the superresolution of optical images: how to design the members of the kernel family, and how to choose among them once they have been established. These issues will be addressed in this chapter. In approaching the superresolution problem, we first model the image acquisition process. The resulting model is used to simulate low resolution representations of available images. In this manner, data at different resolutions can be used in developing the basis functions for reconstruction that we seek. We will examine and make observations concerning the existence of interdependent information in optical images. This is followed by a detailed explanation of a superresolution procedure which exploits the observations made. 4.1 Optical Image Acquisition Model Let the function x(t,t1,t2) represent a continuous, time-varying image impinging on a sensor plane. The spatial plane is referenced by the t1, t2 coordinate axes and time is referenced by the variable t. The imaging sensor plane is assumed to be a grid of contiguous N1 x N2 rectangular sensor elements. These elements serve to sample the spatial plane within the camera's field of view. Each of these elements is said to have a physical size of p1 x p2 units of area. The output of each element is proportional to the amount of light which impinges on that sensor during a given time interval.
The output of each sensor, given by x_l[n1,n2] where n1 = 0,1,...,N1-1 and n2 = 0,1,...,N2-1, can then be expressed as x_l[n1,n2] = ∫_t ∫_{p1·n1}^{p1·(n1+1)} ∫_{p2·n2}^{p2·(n2+1)} x(t,t1,t2) dt2 dt1 dt where the integration over time is one time-unit in duration. The subscript l is used to denote a "low" resolution image. To obtain a higher resolution image, a finer sensor grid encompassing the same field of view used in obtaining x_l[n1,n2] would have to be employed. Let the resolution be increased by factors of G1 and G2 in the respective spatial dimensions. The physical size of the sensor elements now becomes (p1/G1) x (p2/G2) units of area. The high resolution image is then given by x_h[m1,m2], where m1 = 0,1,...,M1-1 and m2 = 0,1,...,M2-1 and Mi = Gi·Ni (i = 1,2). The output of each of the M1 x M2 sensor elements for the high resolution image can be described by x_h[m1,m2] = ∫_t ∫_{p1·m1/G1}^{p1·(m1+1)/G1} ∫_{p2·m2/G2}^{p2·(m2+1)/G2} x(t,t1,t2) dt2 dt1 dt Notice that the integration limits over time have been extended from one time-unit to G1·G2 time-units in order to maintain the average intensity value for each pixel in the image. The superresolution process this work addresses is to estimate the high resolution image x_h[m1,m2] from the low resolution image x_l[n1,n2]. One can notice that the process of acquiring x_l[n1,n2] from x_h[m1,m2] is given by x_l[n1,n2] = (1/(G1·G2)) Σ_{m1=G1·n1}^{G1·(n1+1)-1} Σ_{m2=G2·n2}^{G2·(n2+1)-1} x_h[m1,m2] (4-2) The decimation model in the above equation produces a low resolution image by averaging the pixels of G1 x G2 nonoverlapping neighborhoods in the high resolution image. 4.2 Using Local Information Support for the local characterization of images comes from several sources. Image interpolation kernels typically have very local support regions, as discussed in Chapter 3. In the Shannon sampling theory, the weighting of samples was seen to be inversely related to distance from the point to interpolate.
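Returning briefly to the acquisition model, the decimation of eqn. (4-2) amounts to averaging non-overlapping G1 x G2 blocks of the high resolution image. A minimal sketch, with plain nested lists standing in for images and an illustrative function name:

```python
def decimate_average(xh, G1, G2):
    """Low resolution image per eqn. (4-2): average each non-overlapping
    G1 x G2 block of the high resolution image xh (a list of rows)."""
    M1, M2 = len(xh), len(xh[0])
    N1, N2 = M1 // G1, M2 // G2
    xl = [[0.0] * N2 for _ in range(N1)]
    for n1 in range(N1):
        for n2 in range(N2):
            s = sum(xh[m1][m2]
                    for m1 in range(G1 * n1, G1 * (n1 + 1))
                    for m2 in range(G2 * n2, G2 * (n2 + 1)))
            xl[n1][n2] = s / (G1 * G2)
    return xl
```

Applied to a 256x256 image with G1 = G2 = 2, this produces the 128x128 counterparts used in the experiments of this chapter.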
As such, a high resolution image sample is, for all practical purposes, generated from local low resolution samples. Also, the use of finite support region kernels is attractive due to their numerical stability and computational efficiency. If we assume an image can be described locally by different statistical models, then we must somehow characterize useful information concerning each of the models. In this section we experimentally study the interdependencies of local information in images and discuss their application to the superresolution problem. Note that we are referring to relations that exist among image neighborhoods when we speak of the local interdependencies of images. 4.2.1 Relation Between Correlation and Euclidean Distance In the sections to follow we will exploit the locally correlated neighborhoods in images for our superresolution paradigm. This section serves to detail the correlation under study and how it relates to Euclidean distance. Here, we introduce a correspondence between statistical correlations of random variables and inner products of vectors, and also between statistical mean-square differences and squared distances between vectors. A good development of this topic can be found in the text by Helstrom [1991]. This correspondence will serve to identify the analogous relations between samples of our observed data and the true statistical character from which they were drawn. The relations exploit the principle of orthogonality (as defined for both Cartesian space and random variables) as we shall see. Let u,v denote two random variables and u,v two vectors in Cartesian space. The inner product of u,v is u^T v and the length of a vector u is given by its 2-norm ||u||_2 = (u^T u)^(1/2).
The Cartesian space distance D between vectors u,v based at the origin is defined by D^2 = (u - v)^T (u - v) = u^T u - 2 u^T v + v^T v and in the "space" of our random variables the distance D is defined correspondingly as D^2 = E[(u - v)^2] = E[u^2] - 2 E[uv] + E[v^2] where E[.] denotes the statistical expectation operator. We will make use of the Schwartz inequality [Helstrom, 1991] shortly. For vectors, this inequality is given by (u^T v)^2 <= (u^T u)(v^T v) and for expected values of random variables it is (E[uv])^2 <= E[u^2] E[v^2]. In an ordinary vector space, the triangle inequality ||u + v||_2 <= ||u||_2 + ||v||_2 is a consequence of the Schwartz inequality. The triangle inequality can be expressed in the following manner by noting that cos θ <= 1: ||u + v||_2^2 = u^T u + v^T v + 2 ||u||_2 ||v||_2 cos θ <= u^T u + v^T v + 2 ||u||_2 ||v||_2 = ||u||_2^2 + 2 ||u||_2 ||v||_2 + ||v||_2^2 In the space of random variables, the corresponding triangle inequality yields the analogous relation E[(u + v)^2] <= E[u^2] + 2 (E[u^2] E[v^2])^(1/2) + E[v^2] We notice above the analogous relations between E[u^2] and ||u||_2^2 = u^T u. These relations justify considering the mean square value E[u^2] as the square of the "length" of a random variable and the correlation E[uv] as a scalar product. Also note that the definitions of orthogonality apply for these relations too. Two vectors are said to be orthogonal if u^T v = 0. Similarly, one says that random variables are orthogonal if E[uv] = 0. We now discuss the distance measure previously defined for vectors and random variables. We will show an important relationship between inner products and distance in Cartesian space, as well as between correlations and distance in the analogous "probability space." Let us now consider the relationship between the minimization of Euclidean distance and the maximization of inner products in a vector space. Consider a set of vectors x, f_1, ..., f_N where f_i is closest in Euclidean distance to x, that is, ||f_i - x||_2^2 <= ||f_j - x||_2^2, i ≠ j.
(4-3) We can rewrite this as f_i^T f_i - 2 f_i^T x <= f_j^T f_j - 2 f_j^T x. Notice that if all the f_k vectors are of equal length, we can equivalently express eqn. (4-3) as f_i^T x >= f_j^T x, i ≠ j. (4-4) If the f_k vectors are not of equal length, then eqn. (4-3) is equivalent to f_i^T x - (1/2) f_i^T f_i >= f_j^T x - (1/2) f_j^T f_j. In either case, minimizing the Euclidean distance between the x and f_k vectors is equivalent to maximizing the inner product of those vectors if the lengths of the f_k vectors are all equal. If the f_k vectors are not of equal length, then the square of the vectors' lengths must be considered. A similar result arises when minimizing the mean squared distance of random variables. Consider the random variables x, f_1, ..., f_N where f_i results in the minimum mean squared difference between the random variables f_1, ..., f_N and x, that is, E[(f_i - x)^2] <= E[(f_j - x)^2], i ≠ j. (4-5) We can rewrite this as E[f_i^2] - 2 E[f_i x] <= E[f_j^2] - 2 E[f_j x]. Notice that if all the random variables f_k have equivalent second order moments or squared "lengths," then we can equivalently express eqn. (4-5) as E[f_i x] >= E[f_j x]. If the random variables f_k do not have equivalent second order moments, then eqn. (4-5) is equivalent to E[f_i x] - (1/2) E[f_i^2] >= E[f_j x] - (1/2) E[f_j^2]. (4-6) We see that minimizing the mean squared difference between x and the random variables f_1, ..., f_N is equivalent to maximizing their correlation if the second order moments of f_1, ..., f_N are equivalent. If the second order moments of the random variables f_1, ..., f_N are not all equivalent, then their squared "lengths" must also be considered. Let us now note that the minimization of Euclidean distance between vectors is the template matching problem [Duda and Hart, 1973]. Template matching is a scheme used for classification purposes. Let vector x be a portion of an image with support size equal to that of the template. The f_k vectors are the set of N templates from which we attempt to classify image regions.
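The equivalence between minimizing Euclidean distance and maximizing a length-corrected inner product, eqns. (4-3) and (4-4), is easy to verify numerically. A small sketch with illustrative names:

```python
def closest_by_distance(x, templates):
    """Index of the template nearest to x in squared Euclidean distance."""
    d2 = [sum((fi - xi) ** 2 for fi, xi in zip(f, x)) for f in templates]
    return min(range(len(templates)), key=lambda k: d2[k])

def closest_by_correlation(x, templates):
    """Equivalent rule: maximize f^T x - (1/2) f^T f.  Expanding
    ||f - x||^2 = f^T f - 2 f^T x + x^T x shows the two rules agree."""
    score = [sum(fi * xi for fi, xi in zip(f, x)) -
             0.5 * sum(fi * fi for fi in f) for f in templates]
    return max(range(len(templates)), key=lambda k: score[k])
```

When the templates all have the same norm, the (1/2) f^T f term is a constant offset and can be dropped, recovering the pure correlation rule of eqn. (4-4).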
The best match is found to be the template maximally correlated with the image region of interest. This correlation measure is nonstatistical and is described in Chapter 7 of the text by Duda and Hart [1973]. A statistical interpretation of the template matching problem can also be developed. Assume a random vector x = [x_1, ..., x_M]^T can be described by a mixture model as follows p_x(x) = Σ_{k=1}^{N} P(k) p_x(x|k) (4-7) where Σ_{k=1}^{N} P(k) = 1 and 0 <= P(k) <= 1 for k = 1,...,N. Also, let the mean of each conditional density function p_x(x|k) be f_k = E[x_k], where x_k = [x_k1, ..., x_kM]^T is the random vector with conditional density p_x(x|k). The template matching problem can be posed as finding the mean f_j closest in Euclidean distance to an observation x from the mixture density function. Specifically, let us define the template matching problem as J = argmin_j ( tr{ E[(x_k - f_j)(x_k - f_j)^T] } ) (4-8) where tr{.} is the trace of a matrix and the best matched template is given by f_J. Expanding eqn. (4-8) and evaluating terms, we can equivalently determine the best matched template from J = argmin_j ( f_j^T f_j - 2 f_j^T f_k ) (4-9) from which it is easy to show that J = k is the solution. Specifically, the criterion of eqn. (4-9) evaluated at j, minus its value at j = k, is f_j^T f_j - 2 f_j^T f_k + f_k^T f_k = (f_j - f_k)^T (f_j - f_k) >= 0, where j = k achieves the equality. So the best matched template corresponds to the mean of the conditional density function which generated the observation x from p_x(x). If the templates are now interpreted statistically, then the template matching problem of eqn. (4-8) can be expressed as J = argmin_j ( tr{ E[(x - f_j)(x - f_j)^T] } ) given a random vector x. This can be shown to be equivalent to J = argmax_j ( E[f_j^T x] - (1/2) E[f_j^T f_j] ) (4-10) which is seen to be analogous to eqn. (4-6). The solution to the template matching problem here involves the template most correlated with the input. The two issues to stress here are the analogous relationships between random variables and vectors involving distance metrics in each of their respective spaces.
The minimization of a squared distance measure is directly related to the maximization of a correlation measure; that is, the distance measure is inversely related to the correlation measure. In this way, observations from highly correlated random variables or random vectors are close in Euclidean space. In particular, if the random variables and/or random vectors are statistically independent and their covariances are assumed equal, then maximal correlation implies minimal Euclidean distance in the mean sense of those random variables. This can be seen by comparing eqn. (4-10) with eqn. (4-4) under these assumptions. 4.2.2 Existence of Locally Correlated Information in Images We can provide experimental evidence for the existence of different types of local interdependencies in images which can be exploited for the superresolution of images. The optical images for which results will be reported in this chapter are given in fig. (4-1). Interdependent information in real-world images is known to appear locally and has been exploited in compression schemes [Dony and Haykin, 1995a; Kambhatla and Leen, 1997]. These methods attempt to exploit the correlation amongst nonoverlapping image blocks. Specifically, the blocks of the image are grouped into disjoint sets and compression is performed on each set individually. The next two sections will serve to illustrate interblock and scale interdependencies that exist within the images of fig. (4-1). These interdependencies form the basis from which our superresolution methodology arises. It is the general existence of these interdependencies in real-world optical imagery which substantiates the use of our superresolution methodology. The methodology is described in detail later in this chapter. 4.2.2.1 Interblock correlation The interblock dependencies of images will be illustrated here.
The benefit of the existence of such dependencies is that the samples from highly correlated blocks tend to be very similar, as discussed in section 4.2.1. The similarity between these blocks, which can be located throughout an image, leads us to believe that the local statistics governing their "generation" are also similar. The similar blocks are assumed here to have been generated by the same statistical process. The marriage of interblock correlation and their interdependencies across scales makes the superresolution problem approachable from such local information. In order to substantiate the notion that an image can be described by different local statistical models, a simple experiment was performed. Figure 4-1. Images used in this chapter. (a) 256x256 images: (top left) Lena, (top right) Peppers, (bottom left) left Pentagon image of a stereo pair, (bottom right) right Pentagon image of a stereo pair; (b) 128x128 images obtained by the image acquisition model: (left to right) Lena, Peppers, left Pentagon, right Pentagon. All of the overlapping neighborhoods of size H1 x H2 for the Lena 128x128 image of fig. (4-1b) were extracted. Each neighborhood then had its sample mean subtracted from it, yielding what we refer to as a "structural" neighborhood. It is the local structure of images that we seek to statistically describe. By considering structural neighborhoods, an image x[n1,n2] is then composed of two sets of ordered neighborhoods: one set contains the neighborhoods' means m[n1,n2] and the other set the neighborhoods' structure s[n1,n2], such that x[n1,n2] = m[n1,n2] + s[n1,n2]. The extracted structural neighborhoods which comprise s[n1,n2] were then grouped into K disjoint clusters following a vector quantization (VQ) of these neighborhoods. The clustering was done a la eqn. (4-3), where x now denotes a structural neighborhood and f_1,...,f_K are the feature (codebook) vectors or Voronoi centers resulting from the VQ.
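The decomposition x[n1,n2] = m[n1,n2] + s[n1,n2] used in this experiment begins by extracting every overlapping neighborhood and removing its sample mean. A minimal sketch (the VQ step itself is omitted; names are illustrative):

```python
def structural_neighborhoods(img, H1, H2):
    """All overlapping H1 x H2 neighborhoods of img (a list of rows),
    each with its sample mean removed, so that x = m + s as in the text.
    Returns the list of means and the list of structural vectors."""
    N1, N2 = len(img), len(img[0])
    means, structs = [], []
    for n1 in range(N1 - H1 + 1):
        for n2 in range(N2 - H2 + 1):
            block = [img[n1 + i][n2 + j] for i in range(H1) for j in range(H2)]
            m = sum(block) / len(block)
            means.append(m)
            structs.append([b - m for b in block])
    return means, structs
```

The structural vectors returned here are what would be handed to the vector quantizer to form the K disjoint clusters.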
The feature vectors can each be interpreted as corresponding to the mean of one of the various statistical models which describe the structural neighborhoods, as discussed with eqn. (4-7). After the structural neighborhoods were clustered, the neighborhoods within each cluster were randomly shuffled, thus yielding a new set of ordered neighborhoods' structure ŝ[n1,n2]. If the clustered neighborhoods are truly similar in structure, then their random shuffling should still produce a pleasing image. Finally, the composition of a new image results from x̂[n1,n2] = m[n1,n2] + ŝ[n1,n2]. This type of operation could consider overlapping and nonoverlapping neighborhoods. When overlapping neighborhoods are considered, the overlapping samples from each neighborhood are averaged to yield the new image samples. Note that the "reconstruction" performance obtained using this approach is analogous to a test set performance for validating the existence of interblock correlations. Fig. (4-2) illustrates the results of performing such a test with overlapping 3x3 (H1=H2=3) neighborhoods on the Lena (N1=128)x(N2=128) image. Figure 4-2. Illustration of the effect of characterizing the Lena image based on the interblock correlation of 3x3 neighborhoods. The image used was 128x128. Increasing the number of clusters leads to increased correlation within the neighborhoods of a cluster, thus yielding a more accurate local representation of the image. (a) K=1 cluster, (b) K=15 clusters, (c) K=30 clusters. With K=1 in fig. (4-2a), no clustering is being performed since all neighborhoods from the image are assumed to belong to the same cluster. The result of creating an image in this manner yields a poor (and noisy looking) image when compared to the original. In fig. (4-2b), K=15 clusters were formed from the structural neighborhoods of the image. The resulting image produces a better image representation. Fig. (4-2c) illustrates the result with K=30 clusters.
This resulted in the most faithfully recomposed image upon comparison of their peak signal-to-noise ratios (PSNRs). The PSNR is defined as PSNR = 10·log10(1/e) where e = (1/(N1·N2)) Σ_{n1=0}^{N1-1} Σ_{n2=0}^{N2-1} (x[n1,n2] - x̂[n1,n2])^2 (4-11) and x and x̂ take values in [0,1] and are N1 x N2 in size. Fig. (4-3) illustrates the recomposition performance for the Lena 128x128 image using regions of support (ROS) of size 3x3 and 5x5 for K = 1, ..., 30 clusters. We can see that the PSNR of the new images generally increases with an increasing number of clusters. The smaller the neighborhood, the more accurate we can consider our image model when based on interblock correlations. Though we only report these results on one image, it is generally appropriate to characterize optical images in this manner. 4.2.2.2 Scale interdependencies We have just seen how an image can be segmented locally into pieces that are most correlated with each other. Naturally, we can inquire about the existence of similar information across image scales. If a strong similarity exists between homologous and highly correlated regions of the low and high resolution images, then it seems feasible that a simple transformation of the low resolution image neighborhoods within each cluster can yield a corresponding high resolution neighborhood very similar to the low resolution neighborhood's homologous counterpart. This transformation is the link in associating low and high resolution neighborhoods. In this section, we wish to analyze experimentally whether such highly correlated information across image scales exists. That is, we wish to examine the correspondence of highly correlated neighborhood information between a low resolution image and its high resolution counterpart. We will show examples using the images shown in fig. (4-1). The experiment will report on the percentage of homologous neighborhoods from the low and high resolution counterparts of these images that were similarly clustered.
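The PSNR criterion of eqn. (4-11), used above to compare the recomposed images, is straightforward to compute. A minimal sketch for images with values in [0,1] (the function name is illustrative):

```python
import math

def psnr(x, xhat):
    """PSNR per eqn. (4-11): 10*log10(1/e), e the mean squared error.
    Assumes xhat differs from x somewhere (e > 0) and values lie in [0,1]."""
    N1, N2 = len(x), len(x[0])
    e = sum((x[i][j] - xhat[i][j]) ** 2
            for i in range(N1) for j in range(N2)) / (N1 * N2)
    return 10.0 * math.log10(1.0 / e)
```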
If a high percentage of these neighborhoods is found, then a strong scale interdependence among neighborhoods is said to exist. We now explain the details for establishing this percentage. Figure 4-3. Quantitative illustration of the effect of interblock correlation on the recomposition of the Lena 128x128 image (PSNR versus number of clusters for ROSs of 3x3 and 5x5). The structural neighborhoods in each of the clusters from this image were randomly shuffled and the image was "pieced" back together. A very high interblock correlation results in an image very similar to the original. Given a low and high resolution version of an image, x_l[n1,n2] and x_h[n1,n2] respectively, the homologous structural neighborhoods in each of these images were clustered to form K independent groups. The H1 x H2 neighborhoods in the N1 x N2 image x_l[n1,n2] form the set of neighborhoods X = { x_l[m1 : m1+H1-1, m2 : m2+H2-1] } for m1 = 0,...,N1-H1 and m2 = 0,...,N2-H2. The homologous neighborhoods in the high resolution image are defined as the G1·H1 x G2·H2 neighborhoods in the (M1 = G1·N1) x (M2 = G2·N2) image x_h[n1,n2], which form the set D = { x_h[G1·m1 : G1·m1+G1·H1-1, G2·m2 : G2·m2+G2·H2-1] } over the same ranges of m1 and m2, where we recall that x_l[n1,n2] was obtained from x_h[n1,n2] through decimation by a factor of G1 x G2 according to our image acquisition model. Now the neighborhoods in X are clustered to form K disjoint groups X_1,...,X_K and the neighborhoods in D are separately clustered to form K disjoint groups D_1,...,D_K. If the homologous neighborhoods in X and D form similar clusters in their respective images, then the information content of the low and high resolution images must be similar in some sense. As alluded to earlier, the image is then said to exhibit scale interdependence. To determine how well clustered information from the same image relates across scales, we form a confusion matrix as shown in fig. (4-4).
The entry in location (j,k) of the matrix is the number of homologous neighborhoods common to clusters X_j and D_k, j,k = 1,...,K. The interdependence across scales is determined as the maximum number of homologous neighborhoods common to the clusters formed. Since the ordering of the true clusters or "classes" between the X_j and D_k is not known, we cannot simply examine the contents of the diagonal of the confusion matrix. Figure 4-4. Confusion matrix for clustered homologous neighborhoods within their low and high resolution images. The X_j are the disjoint sets of clustered neighborhoods in the low resolution image and the D_k are the disjoint sets of clustered neighborhoods in the high resolution image; j,k = 1,...,K. Instead, we must search for the most likely ordering. This in turn yields the number which reveals a measure of how similarly homologous information in the low and high resolution images was clustered. This number is easily found with the following simple algorithm:
Step 1: Initialize N = 0.
Step 2: Find the largest number L in the confusion matrix and save its row and column coordinates (r,c).
Step 3: Perform N <- N + L.
Step 4: Remove row r and column c from the confusion matrix to form a new confusion matrix with one less row and column.
Step 5: If the confusion matrix has no more rows and columns, STOP; else go to Step 2.
The variable N represents the number of homologous neighborhoods common to similar clusters from the low and high resolution images. The percentage of such clustered neighborhoods is P = N / ((N1-H1+1)(N2-H2+1)), since there are a total of (N1-H1+1)(N2-H2+1) homologous neighborhoods to cluster in each image. In the plots below, this percentage is plotted as a function of the number of clusters. This is done for three images: Lena, Peppers and the left Pentagon. The high resolution images clustered are 256x256 and their low resolution counterparts, which were obtained through (G1 = 2) x (G2 = 2) decimation, are 128x128.
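The greedy search of Steps 1-5 can be implemented directly; the sketch below operates on a confusion matrix stored as a list of lists (the function name is ours):

```python
def greedy_match_total(C):
    """Steps 1-5 from the text: repeatedly take the largest entry of the
    confusion matrix C, accumulate it into N, and delete its row and
    column until the matrix is exhausted."""
    C = [row[:] for row in C]           # work on a copy
    N = 0
    while C and C[0]:
        r, c = max(((i, j) for i in range(len(C)) for j in range(len(C[0]))),
                   key=lambda rc: C[rc[0]][rc[1]])
        N += C[r][c]
        # drop row r and column c in one pass
        C = [row[:c] + row[c + 1:] for i, row in enumerate(C) if i != r]
    return N
```

Dividing the returned N by (N1-H1+1)(N2-H2+1) gives the percentage P plotted in the figures that follow.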
The plots also report on two different neighborhood sizes tested: H1 x H2 = 3x3 and H1 x H2 = 5x5. Figs. (4-5)-(4-7) illustrate that there is a very strong interdependency of homologous neighborhoods across image scales for these three images, even as the number of clusters increases towards K=30. The case of K=1 always yields an interdependency of 1. This is because for K=1 no clustering is actually being performed, that is, all neighborhoods are assumed to belong to the same cluster. As such, the "disjoint" sets (or single set in this case) have all the homologous neighborhoods in common. The interdependency in general decreases as the number of clusters increases. This is intuitively expected, as an increase in the number of clusters results in the clustering of information increasingly specific to a particular image and scale. Because the frequency content of the low and high resolution counterpart images differs, the greater specialization of information within an image is expected to result in less interdependency between them. Figure 4-5. Scale interdependency plot for the Lena image. The interdependency of homologous neighborhoods across image scales is reported. The high resolution image was 256x256 and the corresponding low resolution image was 128x128. Two ROSs are reported for computing the interdependency: 3x3 and 5x5. A very high interdependency among homologous neighborhoods is seen for this image even when considering K=30 clusters. Figure 4-6. Scale interdependency plot for the Peppers image. The interdependency of homologous neighborhoods across image scales is reported. The high resolution image was 256x256 and the corresponding low resolution image was 128x128.
Two ROSs are reported for computing the interdependency: 3x3 and 5x5. A very high interdependency among homologous neighborhoods is seen for this image even when considering K=30 clusters. Figure 4-7. Scale interdependency plot for the left Pentagon image. The interdependency of homologous neighborhoods across image scales is reported. The high resolution image was 256x256 and the corresponding low resolution image was 128x128. Two ROSs are reported for computing the interdependency: 3x3 and 5x5. The interdependencies among homologous neighborhoods are not as strong in this image as in the Lena and Peppers images, but they still remain above 50% even when considering K=30 clusters. Fig. (4-5) illustrates the scale interdependencies for the Lena image. These remain very high even when considering K=30 clusters; in this case, the interdependencies achieved were all above 75%. Fig. (4-6) reports on the Peppers image. This image yields interdependency percentages very much like those of the Lena image. Fig. (4-7) illustrates the interdependencies of the left Pentagon image. Here, the interdependencies are seen to drop more rapidly with an increasing number of clusters than with the previous two images. Further, the left Pentagon image exhibits noticeably less interdependency than the Lena and Peppers images, going down to about 55% for the case of K=30 as compared to 75% for the Lena and Peppers images. 4.3 Architecture Specifics The modular design of the superresolution architecture to be presented is supported by the observations of the local interdependencies in images that have been discussed.
Portions of images exhibiting different statistical characteristics are treated differently in order to fully utilize the local image information and provide for a reconstruction not afforded by the sampling theorem. The procedure for learning the a priori information needed for the superresolution of optical images is shown in fig. (4-8). This procedure, which will be discussed in detail in the next section, establishes the basis functions for reconstruction from the available image data. The information extracted is used in the superresolution of images that are similar in class to those from which our basis functions were derived.

Figure 4-8. Training process for the superresolution of optical images. The high resolution image is decimated by G1 x G2 to yield its low resolution counterpart; neighborhoods are extracted from both images; the low resolution neighborhoods are self-organized to form C clusters; and corresponding subblocks are associated cluster by cluster.

Fig. (4-9) illustrates the architecture used for superresolving images once the training is complete, that is, once the necessary information has been established. It is equivalent to a convolution with a family of kernels as presented in eqn. (4-1). Here, we wish to initially present this paradigm in a fashion which facilitates the explanation of our family of kernels later on. As we proceed, we will explain any information not cast in a manner directly conforming to eqn. (4-1) and relate it to that equation's form. There are two fundamental steps specific to the superresolution of optical images: (1) clustering of local low resolution image data and (2) construction of local high resolution data. The first step can be viewed as the process of determining the neighborhoods most correlated with the mean (or "template," as discussed in section 4.2.1) of the underlying distribution from which those neighborhoods were drawn.
It encompasses the operations of extracting all of the image neighborhoods of a predetermined size from the low resolution image and then clustering them. This clustering addresses the main issue of choosing a kernel. The predetermined neighborhood size describes the region of support of our family of kernels, the location of each neighborhood relates to the location of the convolution output, and the cluster associated with each neighborhood will be used to select the kernel with which to compute superresolved neighborhoods. The second step exploits the interdependent information across scales which has been seen to exist in images. It involves the depicted linear associative memories (LAMs) and the proper arranging of superresolved neighborhoods. Here we have addressed the main issue of designing the members of the kernel family.

Figure 4-9. The superresolution architecture for optical images. Neighborhoods are extracted from the low resolution image and grouped into C clusters; each cluster's LAM (LAM 1 through LAM C) superresolves its neighborhoods, which are then arranged into the superresolved image. This paradigm performs the equivalent operation of a convolution with a family of kernels.

Each LAM is encoded with the mapping that locally creates high resolution neighborhoods from low resolution neighborhoods via L transformation kernels. There are C LAMs, and each is tuned to specific local image characteristics. Note that each LAM superresolves a neighborhood rather than just a single pixel.

4.4 Training Phase

The basis functions onto which to project our image samples are specified by the LAMs in fig. (4-9). They are determined via a training process from available data. The training procedure for the superresolution paradigm was pictured in fig. (4-8). It will now be discussed in detail. This procedure can be broken into two main sections: a data preprocessing section and a high resolution construction section.
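As a concrete illustration of the fig. (4-9) architecture, the following NumPy sketch performs the two fundamental steps — assigning each low resolution neighborhood to its nearest cluster center and applying that cluster's LAM — and then arranges (averages) the overlapping superresolved neighborhoods into the output image. The function name, the data layout (cluster centers as rows of `centers`, one LAM weight matrix per cluster in `lams`), and the fallback pixel replication at border pixels no neighborhood constructs are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

def superresolve(x_low, centers, lams, H=3, G=2):
    """Sketch of the fig. (4-9) pipeline: cluster each H x H low resolution
    neighborhood, apply that cluster's LAM, and average the overlapping
    (2G-1) x (2G-1) superresolved neighborhoods into the output image.

    centers : (C, H*H) cluster (feature) vectors
    lams    : (C, (2G-1)**2, H*H) one linear associative memory per cluster
    """
    N1, N2 = x_low.shape
    M1, M2 = G * N1, G * N2
    acc = np.zeros((M1, M2))   # sum of high resolution estimates
    cnt = np.zeros((M1, M2))   # number of estimates obtained per pixel
    d = (H - 1) // 2           # half-width of the low resolution ROS
    S = 2 * G - 1              # side of the constructed neighborhood
    for m1 in range(N1 - H + 1):
        for m2 in range(N2 - H + 1):
            nb = x_low[m1:m1 + H, m2:m2 + H].reshape(-1)
            c = np.argmin(((centers - nb) ** 2).sum(axis=1))  # nearest center
            hi = (lams[c] @ nb).reshape(S, S)                 # cluster's LAM
            # place about the high resolution pixel homologous to the center
            r = G * (m1 + d) - (G - 1)
            s = G * (m2 + d) - (G - 1)
            acc[r:r + S, s:s + S] += hi
            cnt[r:r + S, s:s + S] += 1
    # average the multiple estimates; replicate pixels at unconstructed borders
    rep = np.kron(x_low, np.ones((G, G)))
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), rep)
```

Averaging `acc / cnt` implements the later observation that overlapping neighborhoods construct some of the same high resolution samples, and that several estimates per sample are averaged for reliability.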
The preprocessing takes all of the local neighborhoods of a fixed size in the low resolution image and clusters them. Each low resolution neighborhood has a corresponding homologous high resolution neighborhood with which it is associated. Each cluster consists of a training set and a corresponding desired data set which is used to train a LAM. The high resolution construction is responsible for associating the training and desired sets in each cluster independently of the others. This information is now detailed.

4.4.1 The Preprocessing

Two preprocessing steps are performed prior to the construction of high resolution images from low resolution information. The preprocessing consists of simulating low resolution images followed by neighborhood extraction for pairing homologous neighborhoods.

4.4.1.1 Decimation

Ideally, the low and high resolution data sets used to train the aforementioned LAMs would each encompass the same scene and would have been physically obtained by hardware with different, but known, resolution settings. Such data collection is not very common. Instead, the low resolution counterparts of the given signals are simulated. Decimation is only needed to simulate the low resolution counterparts to our high resolution images. If we had, a priori, the necessary two sets of images, this block would be removed from the training process depicted in fig. (4-8). Here, the simulated low resolution images will be obtained following the image acquisition model discussed earlier. In other words, decimation will refer to the process of averaging nonoverlapping image neighborhoods in order to obtain a lower resolution image, a la eqn. (4-2).

4.4.1.2 Neighborhood extraction

Each H1 x H2 neighborhood of our low resolution image x_l[n1, n2] will be used in training our superresolution system. A low resolution image of size N1 x N2 will then have (N1 − H1 + 1)(N2 − H2 + 1) neighborhoods. Note that N_i = M_i / G_i, where M_i, (i = 1, 2), describes the size of our original image.
Each low resolution neighborhood will be associated with its (2G1 − 1) x (2G2 − 1) homologous high resolution neighborhood. We have chosen to associate the low and high resolution neighborhood information in a symmetrical fashion. The homologous neighborhoods we are associating are defined as follows. The set of (N1 − H1 + 1)(N2 − H2 + 1) neighborhoods of size H1 x H2 in the low resolution image is given by

X = { x_l[m1 : m1 + H1 − 1, m2 : m2 + H2 − 1] },  m1 = 0, ..., N1 − H1,  m2 = 0, ..., N2 − H2    (4-12)

The homologous neighborhoods to associate with these are described by

X_h = { x_h[G1(m1 + d1) − G1 + 1 : G1(m1 + d1) + G1 − 1, G2(m2 + d2) − G2 + 1 : G2(m2 + d2) + G2 − 1] },  m1 = 0, ..., N1 − H1,  m2 = 0, ..., N2 − H2    (4-13)

where d_i = (H_i − 1)/2 and (i = 1, 2). H1 and H2 are restricted to being odd numbers so that we can construct about each low resolution neighborhood's center. We also only construct high resolution information that is contained within the boundary of a low resolution neighborhood. The association of homologous neighborhood information described by eqns. (4-12) and (4-13) is depicted in fig. (4-10).

Figure 4-10. Local image neighborhoods and the pixels they reconstruct. Each circle represents a 2-D high resolution image pixel. The shaded circles are the low resolution image pixels obtained via decimation of the high resolution image. The gray pixel is the center of the low resolution neighborhood. Each H1 x H2 low resolution neighborhood constructs a (2G1 − 1) x (2G2 − 1) high resolution neighborhood about the low resolution neighborhood's center; these are depicted by the crossed circles. The numbers are a convention used to distinguish between constructed pixels in this neighborhood. (a) Decimation factor G1 = G2 = 2. (b) Decimation factor G1 = G2 = 3.

Consider the center pixel of an H1 x H2 low resolution neighborhood. For the case of G1 = G2 = 2 in fig. (4-10a), we are constructing the crossed circles about our center (gray) pixel.
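The decimation of section 4.4.1.1 and the homologous pairing of eqns. (4-12) and (4-13) can be sketched as below. The function names and the zero-based slice arithmetic (with d_i written as `d`) are illustrative assumptions based on the index limits given above, not code from the dissertation.

```python
import numpy as np

def decimate(x_high, G1=2, G2=2):
    """Average nonoverlapping G1 x G2 neighborhoods, a la eqn. (4-2)."""
    M1, M2 = x_high.shape
    assert M1 % G1 == 0 and M2 % G2 == 0
    # group pixels into G1 x G2 blocks and average each block
    return x_high.reshape(M1 // G1, G1, M2 // G2, G2).mean(axis=(1, 3))

def homologous_pairs(x_high, H=3, G=2):
    """Pair each H x H low resolution neighborhood (eqn. 4-12) with its
    (2G - 1) x (2G - 1) homologous high resolution neighborhood (eqn. 4-13)."""
    assert H % 2 == 1, "H must be odd to construct about the neighborhood center"
    x_low = decimate(x_high, G, G)
    N1, N2 = x_low.shape
    d = (H - 1) // 2                    # neighborhood half-width d_i
    X, Xh = [], []
    for m1 in range(N1 - H + 1):
        for m2 in range(N2 - H + 1):
            X.append(x_low[m1:m1 + H, m2:m2 + H].ravel())
            r = G * (m1 + d) - (G - 1)  # top-left of the homologous block
            s = G * (m2 + d) - (G - 1)
            Xh.append(x_high[r:r + 2 * G - 1, s:s + 2 * G - 1].ravel())
    # columns are exemplars: X forms the training set, Xh the desired set
    return np.array(X).T, np.array(Xh).T
```

For a 256x256 high resolution image and G1 = G2 = 2, `decimate` yields the 128x128 counterpart used in the experiments above.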
By not constructing the center pixel, we are strictly interpolating about our observed image samples. If we allow for construction of the center pixel, we can change a 'noisy' observed sample. Fig. (4-10b) similarly illustrates this for the case of G1 = G2 = 3. The selection of the low resolution neighborhood size is a tradeoff between the amount of local support to consider and the amount of information needed to construct. As a rule of thumb, one should set H_i >= 2G_i − 1 but should keep H_i, (i = 1, 2), reasonably small to preserve sufficient low resolution support associated with the high resolution neighborhood. As G_i increases, there is more missing information to construct; hence more low resolution sample support is needed. H_i should not be set too high, as we want to keep the support as local as possible. We will see the effects of this later when the superresolution results are presented. Also, the smaller the H_i, the smaller the number of free parameters needed in the LAMs. An important note to consider here is that overlapping neighborhoods construct some of the same high resolution samples. We must remember that our high resolution sample construction is a procedure whereby we estimate the samples of a high resolution image from a low resolution representation. In essence, we would like to obtain more than one estimate of each high resolution sample point. By averaging several high resolution sample estimates, we believe the final high resolution sample estimate is more reliable.

4.4.2 Hard Partitioning the Input Space

The input space consists of the neighborhoods in our low resolution image to be superresolved. Because we assume no prior information about our image, we cluster (or hard partition) our input space based solely on these neighborhoods. The segmentation is based on the interblock correlation among neighborhoods of our image.
In order to group (cluster) the image neighborhoods that are most correlated with one another, we interpret each H1 x H2 sized neighborhood as a point in H1H2-dimensional Euclidean space, that is, as a vector from the origin of our space to the coordinates given by the neighborhood values. As previously discussed, the most correlated neighborhoods will be closest to each other in our H1H2-dimensional vector space. The hard segmentation we seek is equivalent to a vector quantization (VQ) of our input space based on minimizing an L2 norm criterion. Techniques which have been developed to perform such VQ include the LBG [Linde et al., 1980] and Neural Gas [Martinetz et al., 1993] algorithms as well as the topologically ordered Kohonen self-organizing feature map (SOFM) approach [Kohonen, 1990]. These approaches are considered to be unsupervised learning techniques which produce a set of feature vectors, also referred to in the literature as quantization nodes or codebook vectors. The results reported in this work utilize Kohonen's approach, which is now described.

4.4.2.1 Kohonen's self-organizing feature map

The principal goal of Kohonen's self-organizing feature map (SOFM) is to transform a set of input exemplars into a discrete set of representative vectors. The algorithm is a topologically ordered vector quantization (VQ) on the set { x_r } of exemplars. The representative vectors are typically termed codebook vectors, feature vectors and/or quantization nodes. This ordering serves two purposes: during training it encourages a full utilization of the codebook vectors while they are evolving, and once the vectors have been established, it allows for "close" input exemplars to also be "close" in the representative set.

The feature vectors in the SOFM can be obtained using a simple iterative scheme. After randomly initializing a set of feature vectors f_c, c = 1, ..., C, the input exemplars are compared with the feature vectors one at a time.
The best matching feature vector, in the Euclidean sense, is sought. This is given by eqn. (4-14) as

γ(x_r) = arg min_c ( ||x_r − f_c||² );  c = 1, 2, ..., C    (4-14)

This "winning" vector and its neighbors are then updated by

f_c(n+1) = f_c(n) + μ(n)[x_r − f_c(n)],  if c ∈ Λ_γ(x_r)(n)
f_c(n+1) = f_c(n),  otherwise

where μ is the learning rate parameter, Λ_γ(x_r) is the neighborhood function centered around the winning feature vector γ(x_r), and n describes the discrete time step (iteration). The learning rate and neighborhood function are gradually decreased during training for best results.

4.4.2.2 Cluster formation

The feature vectors completely define the hard partitioning of our input space. All image neighborhoods closest in Euclidean distance to one of the feature vectors are said to belong to a cluster. Mathematically, given a set of N vectors { x_r } from which C feature vectors { f_c } have been extracted, the C clusters K_c, c = 1, 2, ..., C, formed by the vectors are given by

K_c = { x_r : γ(x_r) = c;  r = 1, 2, ..., N }

and the hard partitioning function spoken of earlier is γ(x_r) of eqn. (4-14). Generally, the feature vectors established from the available data reflect the statistics of the distribution of the data vectors. Regions in the input space of the data vectors with high probability will be represented by a subset of the feature vectors. This leads to a natural yet compressed representation of our data against which to compare test data sets. In this work, our feature vectors are obtained by performing a VQ on the local structure of neighborhoods x_r of our low resolution training images using the SOFM approach.

4.4.3 Associative Memories

The associative memories considered here are essentially parametric mappings that optimally associate between a set of input exemplars X = { x_r }, r = 1, ..., N, and a set of desired exemplars D = { d_r }, r = 1, ..., N [Haykin, 1994]. These sets of exemplars can be represented as the columns of the matrices X and D, respectively.
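The SOFM iteration of eqn. (4-14) and the winner-plus-neighbors update can be sketched as below. A 1-D map topology and linearly decaying schedules for the learning rate and neighborhood radius are illustrative assumptions here; the dissertation's actual map dimensions and schedules are not specified in this passage.

```python
import numpy as np

def sofm(X, C=8, iters=2000, seed=0):
    """Sketch of Kohonen's SOFM: eqn. (4-14) selects the winner, and the
    winner and its map neighbors move toward the exemplar.  X holds one
    exemplar per column; a 1-D map with decaying learning rate and
    neighborhood radius is assumed."""
    rng = np.random.default_rng(seed)
    Q, N = X.shape
    F = X.mean(axis=1) + 0.01 * rng.standard_normal((C, Q))  # init near data
    for n in range(iters):
        x = X[:, rng.integers(N)]                     # one exemplar at a time
        win = np.argmin(((F - x) ** 2).sum(axis=1))   # eqn. (4-14)
        mu = 0.5 * (1.0 - n / iters) + 1e-3           # decaying learning rate
        radius = (C / 2.0) * (1.0 - n / iters)        # decaying neighborhood
        hood = np.abs(np.arange(C) - win) <= radius   # c in Lambda_win(n)
        F[hood] += mu * (x - F[hood])                 # SOFM update rule
    # hard partition gamma(x_r): nearest feature vector for every exemplar
    labels = np.argmin(((F[:, None, :] - X.T[None, :, :]) ** 2).sum(-1), axis=0)
    return F, labels
```

The returned `labels` realize the cluster formation of section 4.4.2.2: cluster K_c collects the columns of X with label c.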
The optimal association is typically defined in the least squares or mean squared sense. The process of arriving at the optimal parameters is accomplished iteratively when no closed form solution is available. This iterative process for optimal association between data sets (matrices) X and D is known as supervised learning in neural networks. The two supervised learning topologies discussed in this section are the linear network and the multilayer perceptron. Results using these topologies will be presented later.

4.4.3.1 Least mean squares

Consider the minimization of the following cost function

J(W) = ||D − WX||²    (4-15)

where D ∈ R^(P×N), W ∈ R^(P×Q), X ∈ R^(Q×N) and ||·|| is the Frobenius matrix norm [Golub and Van Loan, 1989]. We are trying to find the matrix W such that matrices X and D are associated optimally in the least squares sense. Let us denote the rth column of matrices D and X by d_r and x_r, respectively, where r = 1, ..., N. Also, let Y = WX; the rth column of Y is then y_r = W x_r. We will now solve for W. Rewriting eqn. (4-15), we have

J = Σ_r ||d_r − y_r||²
  = Σ_r (d_r − y_r)ᵀ(d_r − y_r)
  = Σ_r (d_rᵀ d_r − 2 d_rᵀ y_r + y_rᵀ y_r)    (4-16)
  = Σ_r (d_rᵀ d_r − 2 d_rᵀ W x_r + x_rᵀ Wᵀ W x_r)

where the sums run over r = 1, ..., N and the dependence of J on W is understood. To solve for W, we must take the derivative of J with respect to W and set the resulting equation to 0, that is,

∂J/∂W = 0    (4-17)

Notice that w_pq, where p = 1, ..., P and q = 1, ..., Q, is an element of matrix W, that is, W = [w_pq]. Then, from eqns. (4-16) and (4-17),

∂J/∂W = Σ_r ∂/∂W (d_rᵀ d_r − 2 d_rᵀ W x_r + x_rᵀ Wᵀ W x_r)
      = Σ_r (−2 d_r x_rᵀ + 2 W x_r x_rᵀ)    (4-18)

where it can be shown that ∂(aᵀWb)/∂W = abᵀ and ∂(aᵀWᵀWa)/∂W = 2Waaᵀ. Setting eqn. (4-18) to 0 and dividing both sides by 2, we obtain

W Σ_r x_r x_rᵀ = Σ_r d_r x_rᵀ    (4-19)

The summations of the outer products in eqn. (4-19) are equivalent to matrix multiplications. Eqn. (4-19) is rewritten as

W(XXᵀ) = DXᵀ.    (4-20)

Solving for W, we obtain our solution as

W = DXᵀ(XXᵀ)⁻¹.    (4-21)

A more commonly used approach to obtain the closed form solution of eqn. (4-21) makes use of the singular value decomposition (SVD) of Xᵀ [Golub and Van Loan, 1989]. This approach is more stable since it does not require finding the numerical inverse of a matrix. In this case,

W = DUΣ⁺Vᵀ    (4-22)

where Xᵀ = UΣVᵀ is the SVD of Xᵀ, Σ⁺ = diag(σ₁⁻¹, ..., σ_s⁻¹, 0, ..., 0) and the nonzero singular values of X are denoted by σ_i, i = 1, ..., s. The solution to the least squares problem can also be arrived at iteratively via the least mean squares (LMS) algorithm [Haykin, 1994]. The weights are iteratively updated as follows:

W(n+1) = W(n) + μ(D − Y)Xᵀ    (4-23)

where μ is the learning rate, a small positive constant, and n is the discrete time step. It is easy to see that this iterative scheme yields the solution of eqn. (4-21). When eqn. (4-23) reaches its global (and only) minimum, then at "steady state" W(n+1) = W(n), which implies that (D − Y)Xᵀ = 0. Now, by substituting Y = WX as defined earlier, we can solve for W, which yields eqn. (4-21).
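The closed form solution of eqn. (4-21), its SVD-based computation of eqn. (4-22), and the iterative update of eqn. (4-23) can be sketched as follows. NumPy's `pinv` computes the SVD-based pseudoinverse, so `D @ pinv(X)` realizes eqn. (4-22) directly; the step size heuristic in the iterative version is an assumption chosen so the batch update converges, not a value from the text.

```python
import numpy as np

def lam_closed_form(X, D):
    """W = D X^T (X X^T)^(-1), eqn. (4-21), computed stably via the
    SVD-based pseudoinverse of eqn. (4-22): pinv(X) = U Sigma+ V^T."""
    return D @ np.linalg.pinv(X)

def lam_lms(X, D, iters=800, mu=None):
    """Batch iteration W(n+1) = W(n) + mu (D - W(n) X) X^T, eqn. (4-23)."""
    P, Q = D.shape[0], X.shape[0]
    W = np.zeros((P, Q))
    if mu is None:
        # assumed heuristic: the update converges for mu below
        # 2 / lambda_max(X X^T); take the largest singular value of X X^T
        mu = 1.0 / np.linalg.norm(X @ X.T, 2)
    for _ in range(iters):
        W = W + mu * (D - W @ X) @ X.T
    return W
```

At the fixed point of the iteration, (D − WX)Xᵀ = 0, which recovers eqn. (4-21), so the two routines agree on well-conditioned data.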