A unified superresolution approach for optical and synthetic aperture radar images


Material Information

A unified superresolution approach for optical and synthetic aperture radar images
Physical Description:
ix, 178 leaves : ill. ; 29 cm.
Candocia, Frank M., 1967-
Publication Date:


bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )


Thesis (Ph. D.)--University of Florida, 1998.
Includes bibliographical references (leaves 171-177).
General Note:
General Note:
Statement of Responsibility:
by Frank M. Candocia.

Record Information

Source Institution:
University of Florida
Rights Management:
The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. §107) for non-profit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
Resource Identifier:
oclc - 39540341
System ID:

Full Text







To Yudit


It is easy to feel overwhelmed by the amount of time and effort required of the

Ph.D. degree. These feelings were, often times, too familiar. With patience, persistence

and the help of many individuals, my doctoral studies provided for a very memorable and

fruitful four years. I would like to thank these people for their help and friendship

throughout my studies.

First, I wish to acknowledge and thank Dr. Jose C. Principe for his role as advisor

and mentor throughout my doctoral studies. Our discussions and exchanging of ideas

were fundamental in the shaping of this work. I would also like to thank him for providing

a stimulating work environment in CNEL through which my engineering horizons were

expanded. Thank you to my supervisory committee members: Dr. John M. M. Anderson,

Dr. A. Antonio Arroyo, Dr. John G. Harris and Dr. Janice C. Honeyman your interest

and suggestions were appreciated. Dr. John G. Harris deserves additional thanks,

particularly for our fruitful discussions and his suggestion to me to work on optical

images. I would like to thank Dr. Leslie Novak of MIT Lincoln Laboratory for providing

me the opportunity to work with him during the summer of 1997. It was a truly

rewarding experience.

My colleagues in and around CNEL also deserve recognition for their help and

friendship. These include Victor Brennan, Craig Fancourt, John Fisher, Chen-Hsien Wu,


Dongxin Xu and Li-Kang Yen. Thanks for your insight into all the issues and questions I

came to you with.

I wish to give a special thanks to my parents and the Rodriguez family. All of the

pastelitos you sent from Miami were a definite morale boost. Your love, encouragement,

and support are indelibly etched in my memory.

Last, but certainly not least, I wish to thank Yudit. Words cannot fully describe

how special you have been. Your love, support, patience and sacrifice have been





A CKN OW LED GEM EN TS .................................................................................. ii

A B ST R A C T ...................................................................................................... viii


1 IN TR O D U CTIO N .............................................................................. 1

2 AN APPROACH TO SUPERRESOLUTION ..................................... 5

2.1 Introducing the Problem ........................................................ 5
2.2 Comments and Observations .................................................. 9
2.3 Motivation and Description of the Superresolution
A rchitecture ........................................................................... 15

AND SYNTHETIC APERTURE RADAR IMAGES ........................ 22

3.1 Optical Image Interpolation with a Single Kernel ................... 23
3.1.1 Com m on Kernels ......................................................... 24
3.1.2 Increasing the Sample Density ..................................... 28
3.1.3 Interpolation as Projections ......................................... 29
3.2 Review of Reconstruction for Optical Images ........................ 30
3.3 SAR Imaging via Nonparametric Spectral Estimation ............ 42
3.3.1 Phase History Data ...................................................... 43
3.3.2 Standard Approaches ................................................... 45
3.3.3 Minimum Variance Approaches .................................... 46
3.3.4 Estimating a Full Rank Autocorrelation Matrix ............. 49
3.4 Review of Reconstruction for Synthetic Aperture Radar
Im ages ................................................................................... 5 1

4 SUPERRESOLUTION OF OPTICAL IMAGES ............................... 58

4.1 Optical Image Acquisition Model .......................................... 59
4.2 Using Local Information ....................................................... 61
4.2.1 Relation Between Correlation and Euclidean Distance .... 61


4.2.2 Existence of Locally Correlated Information in Images ... 66 Interblock Correlation .......................................... 67 Scale Interdependencies ........................................ 71
4.3 Architecture Specifics ........................................................... 79
4.4 Training Phase ...................................................................... 83
4.4.1 The Preprocessing ......................................................... 83 Decimation .......................................................... 84 Neighborhood Extraction ...................................... 84
4.4.2 Hard Partitioning the Input Space ................................. 86 Kohonen's Self-Organizing Feature Map .............. 87 Cluster Formation ................................................. 88
4.4.3 Associative Memories ................................................... 89 Least Mean Squares ............................................. 89 Multilayer Perceptrons .......................................... 93 Association of Neighborhoods .............................. 94
4.5 Reconstruction Phase ........................................................... 97
4.6 Relation to the Mixture of Experts ........................................ 99


5.1 Analysis and Comparison ..................................................... 103
5.1.1 Linear Associative Memories ....................................... 106 Standard Images ................................................. 106 Texture Images ................................................... 119 Additional Results and Comments ....................... 124
5.1.2 Nonlinear Associative Memories .................................. 129
5.1.3 Mixture of Experts ...................................................... 134
5.2 Feature Extraction ................................................................ 139
5.2.1 Image Features ............................................................ 139
5.2.2 Correlated Image Structure ......................................... 142

IM A G E S .......................................................................................... 144

6.1 Imaging with Multiple Models .............................................. 145
6.2 Architecture Specifics .......................................................... 146
6.3 Models Used ........................................................................ 148
6.4 The Methodology ................................................................. 151
6.4.1 The Imaging Criteria ................................................... 151
6.4.2 Selecting Between Models ........................................... 152
6.4.3 The Superresolution Process ....................................... 153
6.5 Experimental Results ............................................................ 155
6.5.1 MSTAR Slicy Data ..................................................... 155
6.5.2 MSTAR Target Data ................................................... 157
6.5.3 Detection Performance ................................................ 158


7 CONCLUSIONS AND FUTURE W ORK ......................................... 166

REFERENCES ..................................................................................................... 171

BIOGRAPHICAL SKETCH ................................................................................. 178


Abstract of Dissertation Presented to the Graduate School
Of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



Frank M. Candocia

May 1998

Chairman: Dr. Jose C. Principe
Major Department: Electrical and Computer Engineering

This work addresses the issue of superresolving optical and synthetic aperture

radar (SAR) images. Superresolution is the process of obtaining an image at a resolution

higher than that afforded by the sensor used in the imaging. The concept of signal

resolution is shown to be intimately related to the notion of perfect reconstruction. As

such, the major issue addressed in this work is in establishing an appropriate set of bases

for the reconstruction of images in these domains in terms of a finite set of collected


The existing theoretical foundations for perfect reconstruction in the

aforementioned domains is expressed in terms of linear projections and infinite extent data.

However, practical restrictions leading to finite collected data limits the resolution


afforded by the theoretically established bases the sinc bases in the spatial domain and

the Fourier bases in the frequency domain. Superresolution deals with this issue by

incorporating a priori information into the process of establishing an appropriate set of

projections for reconstruction.

In the optical domain, a priori information is extracted locally in an unsupervised

manner from low and high resolution versions of digital images of the same scene. This is

in agreement with the local weighting of the theoretically established sinc bases for perfect

reconstruction. In the SAR domain, a priori assumed models are inherent to the

reconstruction. The information is provided globally in terms of a best match to these

models. This is also in agreement with the global weighting of the theoretically established

Fourier bases for perfect frequency reconstruction.

The superresolution of images is accomplished by means of a modular architecture

using a two-step approach. First, each image neighborhood is assigned to a priori

determined clusters for the image class. Second, each cluster is linked to its optimal

reconstructor which was trained from the collected set of samples. The results obtained

herein are compared against several accepted sets of bases. Optical images of real scenes

as well as textures are tested. SAR data from real metallic polyhedral objects as well as

U.S. military tanks are imaged and tested for automatic target detection performance. It is

shown that by appropriate incorporation of a priori information into the superresolution

procedure, better reconstructed images result.




The intrinsic resolution limits of a sensor can typically be attributed to its physical

properties. One way of increasing this resolution is to physically alter the sensor. Another

approach would involve the appropriate signal processing of collected data. The work

herein is concerned with the latter within the specific domains of optical and synthetic

aperture radar (SAR) images. Superresolution is the term given to the signal processing

of collected information which achieves a resolution higher than the one afforded by the

physical sensor. Applications of this work include obtaining high quality image prints

from digital images, increased detection of targets in military or civilian imaged scenes and

detection of small tumors in medical imaging.

Superresolution of optical images is generally viewed as seeking a perfect

reconstruction. As such, a comparison to the interpolation of signals is natural and

warranted. In sampling theory we can define the resolution of a signal as the minimum

number of samples sufficient to completely represent our signal [Vetterli and Kovacevic,

1995]. To this end, one important result regarding the perfect reconstruction of a signal

from its samples comes from the Shannon sampling theory [Marks, 1991; Oppenheim and

Schafer, 1989]. In this theory, the original signal can theoretically be reconstructed from

its samples provided certain conditions regarding the signal and its samples are met -

irrespective of the statistics of the signal. If these conditions are violated, then common



engineering knowledge says that we can never recover information beyond half the

sampling frequency. In fact, this is not necessarily so. A priori knowledge regarding the

original signal can be used to optimally recover a signal from its samples.

Superresolution of SAR images is generally viewed as the problem of detectability

of complex sinusoids embedded in nonsinusoidal signals ("noise"). The desire to locate

radar scattering elements (trihedral reflectors) implies the detection of these complex

sinusoids. It is the detection of these reflectors upon which SAR imaging is presently

based. Having said this, we can define the resolution of a signal in the SAR domain as the

minimum number of samples sufficient to distinguish between the two complex sinusoids

in that signal that are closest in 2D frequency. This resolution, or spectral resolution as it

is also referred, is established in the frequency domain and is determined by the main lobe

of the rectangular window [Kay, 1988; Stoica and Moses, 1997].

In SAR imaging, the superresolution of objects other than trihedral reflectors

would benefit the automatic target detection and recognition (ATD/R) problem and it is

the focus of the superresolution approach presented herein. The phase history of a typical

radar illuminated scene contains, among other things, reflected energy from nonmetallic

objects such as trees and open field. A priori knowledge of objects within the scene as

well as how they ideally manifest themselves in the phase history can be used to enhance

the performance of automatic target detection and recognition (ATD/R) systems.

The issue of superresolution is generally viewed as different in the aforementioned

image domains. I submit that it can be regarded as treating the same problem- the perfect

reconstruction problem in each image's respective domain which is space for optical

images and 2D frequency for SAR images. Owing to this, an architecture for the


superresolution of images is described in this work which unifies the superresolution

problem in both domains. An image is superresolved based on several modules. Each

module is "tuned" to the specific character of the local image content at the point to be

reconstructed. In order to determine the appropriate module for the point being imaged, a

simple clustering on the signal samples is performed.

Chapter 2 introduces and describes the superresolution problem more formally.

The goal of the chapter is to motivate the superresolution architecture presented at the

chapter's end. With this in mind, issues regarding the practical limitations of the perfect

reconstruction of a signal from a set of samples are addressed and several observations are

made regarding the mitigation of the problems faced.

Chapter 3 serves to present methods in common use for the reconstruction of a

signal from its samples in both the optical and SAR image domains. It is these methods

that the superresolution procedure of this study is compared against.

Chapter 4 concentrates on the details regarding the superresolution of optical

images. Here we discuss the reasons for our choice of modules within the architecture

presented. It is shown here that the architecture is implementing a convolution with a

family of kernels. This is equivalent to convolving with a shift-varying finite impulse

response (FIR) filter where the filter weights are determined according to the local image


Chapter 5 is dedicated to experimental results regarding the superresolution of

optical images. The results obtained are analyzed in detail and comparisons to the results

of the reconstruction approaches presented in Chapter 3 are provided. Among the results

analyzed are the use of linear associative memories and nonlinear associative memories in


our superresolution methodology as well as a comparison to the performance of the

mixture of experts architecture.

Chapter 6 reports on the methodology for superresolving SAR images. The

superresolution is based on two models: the point scatter or trihedral reflector model for

targets and a noise model for clutter. The superresolution here is also a shift-varying FIR

filtering scheme. The filter weights are based on minimum variance estimation, that is,

they are dependent on the minimization of expected image power subject to how well the

assumed model resided in the SAR phase history samples. Experimental results with this

method are reported which increase automatic target detection performance over the

conventional methods.

Finally, Chapter 7 concludes this work with comments regarding the material

presented herein. The contributions of this work are stated and a commentary on future

research directions is given.



The resolution of a signal addresses the ability to perfectly reconstruct a

continuous signal from a set of samples. It is dependent on two things: the sampling

period for obtaining the samples and the number of samples collected. Accomplishing this

perfect reconstruction with a universal set of linear projections defines signal resolution.

If the conditions for perfect reconstruction from this universal set are not satisfied, then

the reconstruction can only be approximated. Having said this, superresolution is the

ability to reconstruct a signal (from a set of samples) beyond the limit imposed by

sampling theory.

This chapter introduces a methodology for the superresolution of images. We

begin by introducing and defining the concepts of signal resolution. Then comments and

observations on the practical limitations associated with perfect reconstruction are

presented. The chapter concludes with a motivation for the superresolution of images via

the methodology presented in this work as well as a description of the architecture which

implements it.

2.1 Introducing the Problem

Webster's dictionary defines resolution as the process of separating into

constituent or elementary parts [Webster, 1989]. Implied in this definition is that, ideally,

all of the elementary parts completely describe the object they are derived from. This



work is concerned with the elementary "parts" of signals and the ability to fully describe

them from a set of samples. As such, we are necessarily addressing the issue of perfect

signal reconstructability. In this sense, resolution is synonymous with reconstruction.

This is now elucidated.

Let x(a), where oo< < a< be a continuous signal with maximum frequency

content Qc rad/sec. Thus our analysis is based on bandlimited signals. We can represent

x(a) as a linear combination of an orthogonal set of basis functions as

x(a) = Iy[n]ka,[n] (2-1)

where the linear weighting is given by the samples in y[n] and ka [n] represents our set of

basis functions. Eqn. (2-1) addresses the perfect reconstruction of x(a) from a set of

samples. For the sake of simplicity, our development is performed with ID signals. The

extensions to 2D should be clear.

For a signal of time, x(t), the perfect reconstruction of bandlimited signals from a

set of samples is addressed in the Shannon sampling theory [Marks, 1991; Oppenheim and

Schafer, 1989]. Here y[n] = x(nT,) for all integers n, that is -oo < n < Go, and sampling

period Ts. The equation describing this perfect reconstructability is

x(t) = x(n)sinc --n (2-2)

where, by definition, sinc(t)- s7. We see that our basis functions are given by

k,[n] = sinc( n) and the basis functions in this infinite set are orthogonal [Woodward,

1964], that is,

Ssinc(# -n)sinc(#--m)dt= T,6[n-m].


The perfect reconstruction can be obtained if the sampling period satisfied T, < where

TC = For time signals, L is the critical sampling rate (also known as the Nyquist

rate). Therefore, every sample of x(t) can be exactly resolved with an infinite set of

samples provided the sampling period is sufficient.

For a signal in frequency, x(Q), the perfect reconstruction of bandlimited signals

from a set of samples is addressed by the discrete-time Fourier transform (DTFT). The set

of samples y[n]= y(nT,) comes from the sampling of a superposition of complex

exponentials. The equation describing this perfect reconstructability is

x()= Zy(nT7)e-in'" (2-3)

Here we see that our basis functions are given by k[Jn] = e-jfT'" and the infinite set of

these basis functions are orthogonal, that is,

Sejn e-J"md d = 27rt[n m].

The perfect reconstruction can be obtained if the sampling period satisfied T, < T, where

Tc = I as before. For frequency signals, Tc is the critical sampling rate.

We have seen that every instant of our bandlimited time and frequency signals, x(t)

and x(i) respectively, can be exactly recovered (resolved) with an infinite set of samples

provided the sampling period is sufficient. Thus the sampling period T, provides the limit

to which our signals can be perfectly reconstructed (resolved) from an orthogonal set of

linear projections and an infinite number of samples. It is in this manner that the sinc and

Fourier bases are the common and universal set of linear projections capable of perfectly

reconstructing bandlimited signals in the time (space) and frequency domains, respectively.


These sets of bases are universal in that all appropriately sampled infinite extent

bandlimited signals can be reconstructed with these bases.

For finite extent data, eqns. (2-2) and (2-3) can be expressed respectively as

x(t) = [x(n T)w[n]]sinc t n (2-4)
n=-0o T,


i(Q) = [y(nT)w[n]]e-jr' (2-5)

where w[n] describes samples of our window function

wn fl; 0 0; otherwise

and N is the extent of our window. Notice that the hat character in i has been used

purposely to denote the approximation afforded by a finite number of data samples. The

finite number of data reduces the resolvability of the signal. We can see this by examining

eqns. (2-4) and (2-5) in the frequency domain. The continuous Fourier transform (FT) of

eqn. (2-4) yields

i(n) = [ X(CO) W(co)]n(O )

and eqn. (2-5) can be equivalently expressed as

(9) = [Y(o)) W(co)]

where is the periodic convolution operator, co = QT, is angular frequency in radians,

Y(co) is the DTFT of y(nT,) and is periodic of period 27n in co and in C (similarly for

X(a) and W(co)) and F(n)={ 1 |10|< ; 0 otherwise }. The effect of the

windowing function w[n] on eqns. (2-4) and (2-5) is to smear (distort) the true frequency


spectrum X(Q) of x(ac). This results in a decrease in the ability to properly resolve the

signal x(a).

Thus the resolution of a signal is also limited by the number of samples we have to

describe the signal. The object of superresolution is to be able to reconstruct x(c) more

faithfully than the resolution afforded by eqns. (2-4) and (2-5), that is, the resolution

afforded from a finite set of observed data obtained at a sampling rate T.

2.2 Comments and Observations

Practical issues may preclude the satisfaction of the conditions necessary for

perfect signal reconstruction as given in the previous section. In particular, (1) there must

be no noise associated with the collected samples (e.g. no quantization error), (2) the

sampling rate must be less than the critical sampling rate of the signal and (3) the signal

must have been sampled to infinite extent.

In the practical sampling of SAR data, the issue concerning sampling rate is usually

not critical. This is because the bandwidth and frequencies at which the radar operates are

known at the time of data collection. The issue of noise is much more critical. For

example, SAR data is typically imaged from a rectangular grid of equispaced radar returns.

This data, however, is sometimes processed by interpolating through the actual collected

return grid. This is particularly true for inverse SAR and spotlight mode SAR data

collection where a typical grid consists of radially equispaced samples swept out through

several angular increments [Odendaal et al., 1994; Mensa, 1991]. As such, the accuracy

with which frequency information from the imaged scene can be produced is reduced.


In the practical sampling of optical images, the issue concerning quantization error

is usually not critical. The standard use of an 8-bit dynamic range usually yields highly

acceptable and pleasing images. The issue of sampling frequency is much more critical.

The information of a natural scene has typically very high spatial frequency content. The

sharp contrast we perceive in order to delineate objects (object boundaries) as well as the

textural character of those objects are but some of the attributes inherent to the high

frequency information content of optical images. As such, the sampling frequency used in

collecting optical images is generally not large enough to fully describe a "continuous

image" in the sense of the Shannon sampling theory. An interesting attribute of optical

images is their highly structured nature. This structure appears locally and can be used to

characterize objects in these images, that is, portions of objects can be described as

smooth, edgy, etc. Information such as this is not considered in the sampling theorem.

Let us now make a few observations. Eqns. (2-2) and (2-3) specify a universal set

of basis functions which are linearly weighted by a set of collected samples corresponding

to signal x(ca). There is an uncountable infinity of basis functions in this universal set (one

for each instant of a in our original signal to reconstruct) and they are given by

s, = {sinc (- n) | oo < n < 0o, oo < t < oo} for spatial signals and, for frequency signals,

by so = {e- I < n < 0, oo < < oo}. These sets are universal with respect to those

signals that are sampled beyond their critical rates and the resolution afforded by these sets

is given by this rate. Hence, if samples were collected at a sampling rate Ts that did not

meet the critical sampling rate, the sets of sinc or Fourier basis functions could not be

linearly weighted as in eqns. (2-2) and (2-3) to perfectly reconstruct our signal. This does

not preclude the existence of other sets of basis functions which could be linearly


combined by samples collected at a rate below the critical sampling rate to still yield the

signal's perfect reconstruction.

As an example, consider the aliased spectrum of fig. (2-1). Here, the original

signal's scaled spectrum is given by the dashed triangular region centered at the origin.

The original spectrum is

X(p) c
0; otherwise

The aliased spectrum of fig. (2-1) for I| | < Kc when c, < C, < 2Q, is

1 |n|.

19n n nc< n f
T, TOk

and is given by the solid line. It can be easily seen that our contrived signal X(C) can be

recovered by linear filtering (frequency multiplication) with the filter

Mc 2 1 c Q, |
0 o; I l > 9

as pictured in fig. (2-1), that is, X(2)= X, (Q)K(Q). This example shows that the

original signal can be recovered via a linear filtering even if it has been aliased as long as

the aliased spectrum has no null frequencies and the original signal's spectrum is known.

In practice this information is not typically known making the recovery of "improperly"

sampled signals usually a difficult task.


K(n) T,
/ ..............

-ns-C -K2 -ac K2 -f+"c 0 as-, 2 nc Qs 9s+c Q
2 2

Figure 2-1. Recovering a signal from an aliased representation. Aliasing results when
sampling below the Nyquist rate. Copies of the true signal spectrum are given by the
dashed triangular regions. The aliased spectrum is given by the solid line. Here, the true
signal can be recovered from the aliased signal by linear filtering (frequency multiplication)
via the filter K(fQ) designed from a priori knowledge of the true spectrum.

A priori knowledge of the character of a signal, rather than its frequency spectrum,

could also be used to appropriately reconstruct the signal from its observed samples. As a

simple example, let's consider the set of piecewise constant continuous time functions

where each constant segment in the signals has duration T seconds (e.g. signals quantized

in time). An illustration of such a function is provided in fig. (2-2a). Note that this signal

has infinite frequency content due to its piecewise constant nature.

If our observed samples were obtained by sampling this function every T seconds,

then clearly the convolution of a zero-order-hold kernel with the observed samples would

be optimal for recovering the piecewise constant function, that is, it would yield perfect

reconstruction even though the function in question was grossly undersampled according

to the Shannon sampling theory. This convolution is pictured in fig. (2-2b).

The set of piecewise constant signals is not typically encountered in practice so the

basis set resulting from the zero-order-hold kernel is of limited use.


T T x(t)
000 000
T t


000 T 000 1

(b) T

Figure 2-2. Simple example illustrating how perfect reconstruction is possible when a
priori knowledge of the relation between signal samples and its corresponding continuous
function is known. (a) Piecewise constant continuous function grossly undersampled
according to the Nyquist criterion, (b) Recovering the continuous function in (a) from its
samples requires a simple convolution of the samples with a zero-order-hold kernel.

However, this very simple example illustrates that superresolution could be based

on a priori knowledge about the signal to reconstruct given the observed samples -

irrespective of the frequency content of the signal. It also illustrates the fact that perfect

reconstruction of a signal according to the Shannon sampling theory only establishes

sufficient conditions for the perfect reconstruction of signals characterized by frequency

from their samples. If a priori knowledge of the signal corresponding to the observed

samples is available, then this can be used to develop a technique for superresolving a

signal. Therefore, superresolving images is directly associated with methods to acquire a

priori information about the signal of interest.

Recently, optimal reconstruction of signals sampled below the Nyquist rate was

proved possible by using a priori knowledge of the signal statistics [Ruderman and Bialek,

1992]. The statistics of the continuous signal were assumed known rather than any

specific relation between the signal and its samples. In the results reported, the signal x(t)


was assumed to be bandlimited, Gaussian, zero-mean and stationary. The samples x[n]

from this signal were obtained via a point spread function f(t) that was not an ideal

sampling Dirac delta and these samples were corrupted by zero-mean, additive,

independent and identically distributed (i.i.d.) Gaussian noise r7[n] of power a2 to yield

the collected samples y[n] = x[n] + rl[n].

The authors obtained the same reconstruction filter when using two different

optimization approaches: maximum likelihood and conditional average estimation. The

optimal signal reconstruction was given by

X(n) = K(fn)Y(Q) (2-6)

where the filter K(D) was

K( F()S(Q)
K(Q) =
2 + F(+n n,)2 S( +nQ,)

Notice that Y(K2) is the DTFT of the sequence y[n] so that Y(9) = Y(Q + n,) is periodic

with period C, = and T, was the sampling period. S(Q) is the power spectrum of x(t)

and F(KQ) is the FT of the point spread function f(t). From eqn. (2-6), the most likely

signal i(t) is obtained through a linear filtering of the collected samples y[n].

In the case when the noise vanishes, the point spread function becomes an ideal

sampler, the signal was bandlimited and the sampling frequency is above the Nyquist rate,

that is, (2 = 0, F(n) = 1, X(9) = 0 for 101 > c, and 0, > 20c, the filter K(92) reduces


0 otherwise


which is the FT of the sinc function. This result shows that the signal statistics play no

role in perfect reconstruction when the Shannon sampling conditions are met.

More importantly, this result shows that a priori knowledge of the statistics of a

signal can be used to superresolve a signal from a collected set of samples. However, this

analytic result is only valid for stationary Gaussian signal statistics and, in practice, signals

are typically nonstationary and have very complex statistics. Our lack of knowledge of

these statistics and the analytical intractability of determining the optimal filters for

complicated density functions, as commented by the authors, limits the practical use of this


2.3 Motivation and Description of the Superresolution Architecture

The motivation for the superresolution methodology stems from our goal of more

efficiently using the information contained in available image samples. This requires

projections onto data specific sets of bases instead of the bases established in the

reconstruction theories. Naturally, learning or adaptive system theories play a crucial role

in this methodology of designing data specific projections. The manner in which these

models are realized must be consistent with the information character of images and how

this relates to the superresolution problem. These issues are now addressed.

The universality of the perfect reconstruction theories is an amazing result but the

price paid is a strict limitation on the required resolution. The practical problem is to do

the best we can with the available samples, that is, to superresolve the images. To yield

better reconstructed images, we must find alternate sets of projections from which to

reconstruct our images. It has been shown that perfect reconstruction can be achieved if a


priori knowledge is available, but in practice this knowledge is absent. So a central

problem is how to capture a priori knowledge about the domain. In determining the set of

projections to use, we must either make assumptions regarding our data or we must learn

this a priori knowledge from the available data using nonparametric models. We are

concerned with the latter.

We propose in this dissertation a novel technique of extracting a priori information

from an image by performing a down sampling operation on the original image. The high

resolution image becomes the desired response to a learning system that receives the low

resolution image as input. The highly structured and localized nature of images can be

exploited in modeling the relation between low and high resolution versions of an image.

It seems reasonable to consider several local models in the imaging process. This is due to

the various structures present in images resulting from the objects in the imaged scene.

Two particular image traits can be used in this modeling:

there is much similarity in local structure throughout an image

this structure is typically maintained across scales

Local structure. A direct example of the first trait in optical images can be

provided by considering an image of a person's face. Local neighborhoods about the

person's cheeks and forehead are generally indistinguishable when viewed independently.

We have assumed that the effects of lighting and other "special" attributes (scars, moles,

birth marks, etc.) are absent in this comparison. An easy method to test this observation is

to locate these similar image portions in an image and randomly swap them to form a new

image. If the new image resembles the original one then our observation is correct. In

this manner, all neighborhoods exhibiting a particular characteristic can be treated in


practically the same manner. These neighborhoods can be considered generated by the

same statistical process. It will be shown that the models which we compare our

neighborhoods against can be interpreted as the mean of each statistical process one for

each model used.

Examples of the existence of this first trait abound in images. It has recently been

exploited to increase gains in compression schemes [Dony and Haykin, 1995a; Kambhatla

and Leen, 1997]. The use of image compression has been popular for some time now and

is evidenced by the widespread JPEG still image compression standard [Wallace, 1991]

and PCA compression schemes [Dony and Haykin, 1995b]. The standard schemes for

lossy image compression are based around the highly redundant information generally

present in small image blocks which could be described in a more efficient and compact

manner. In the case of compression via PCA, a single representation is established for all

of the blocks in the image. The recent compression approaches exploit the first image trait

by grouping the small blocks of an image into clusters that are most similar (correlated)

with one another. In this way, each cluster can be described with an efficient

representation of its own separate from those corresponding to other clusters. Overall,

this results in a more efficient image representation than that afforded by the approaches

of the standard compression schemes.

The first image trait can also be seen in SAR images but in the frequency

domain. It results from the radar backscattering of objects of similar composition and

orientation. SAR imaging typically, however, is based solely on the backscatter model of

the trihedral reflector. This reflector is ideally imaged as a bright point in a SAR image.

However, the formation of SAR images should be based on a variety of models. In this


manner, objects exhibiting a particular backscatter characteristic can be imaged with the

appropriate model.

Across scale similarity. The second trait can be used in obtaining a high

resolution image from a low resolution one. To accomplish this we must have knowledge

of the relation between similar neighborhoods in a low resolution image and their higher

resolution counterparts. In optical images, this relation can be established from available

data using the down sampling operation. This is done by optimally associating

homologous low and high resolution neighborhoods. In SAR images, the neighborhood

relation is in the frequency domain. However, establishing the relation between SAR data

and the a priori models used in the imaging is performed in the phase history domain. This

is done by minimum variance model matching between the phase history data and the

model used for the point being imaged.

The manner in which we superresolve images exploits the aforementioned traits.

Superresolving based on these two traits necessitates the need for establishing

which neighborhoods of an image are similar in local structure

how these neighborhoods relate across scale

The superresolution architecture of fig. (2-3) operates on knowledge of these two

premises. It exploits the relations between collected samples in establishing the basis

functions for reconstruction since the conditions for perfect reconstruction of the sampling

theorem cannot be met in practice. The neighborhood selection is established by a

comparison of the distance between the received data and learned local models

representative of the data. This is performed by the analysis and decision block of fig.


(2-3). This comparison is used to select the module most appropriate for the

reconstruction of a particular image point from the C available modules.

The reconstruction modules in the architecture for optical images are associative

memories (optimal mappings) which produce the output pattern best associated with the

input. They describe the basis functions for the reconstruction of image samples. They

are established via training from available data of homologous low and high resolution

neighborhoods. As such, the basis functions are highly localized in space. This is because

the information most contributing to the character of an image sample resides about that

sample. It is in agreement with the support properties of the sinc kernel in the sampling

theorem. For SAR images, the modules of the architecture are nonparametric model

estimators. The basis functions derived from these modules are global in extent. This

affords the most frequency resolution in the reconstructed image. They are also in

agreement with the global extent of the basis functions of the FT.

One can expect that this methodology will yield better reconstruction than methods

based on the sampling theorem. However, unlike the universal character of the sampling

theorem, this superresolution method is specific to the character of images, that is, bases

obtained for one class of images may perform poorly when reconstructing another class.

Because of this, establishing the appropriate models with which to compare our data

against is important to the successful superresolution of an image. With the appropriate

models established, the collected data is transformed via the projections corresponding to

each model. A recent approach in the neural network literature that is used in modeling

the probability density function (PDF) of a set of data conditioned on another set of data is

the mixture of experts (MOE) [Jacobs et al., 1991; Jacobs and Jordan, 1991]. The


conditional PDF in the MOE is modeled as a mixture Gaussian distribution. This

architecture can be used in modeling the transformation between low and high resolution

neighborhoods of images. The MOE will be compared to our superresolution

methodology. Comparisons of the theoretical issues between the MOE and our approach

as well as superresolution performance will be provided in a later chapter.

Note that the reconstructed images from the architecture of fig. (2-3) are not

continuous but are a set of samples at a higher density than the collected samples. Further

details regarding this architecture for each image domain will appear in their respective



Low Resolution Analysis and Decision Arranging of Local Superresolved
Image Image Portions Image
---------...---------------O-- J:. .


Figure 2-3. The general architecture for the superresolution of optical and SAR images.




This chapter presents background on common approaches to image reconstruction

from collected samples. To address the perfect reconstruction issue, we require

knowledge of the ideal image to be formed. The ideal optical image typically describes

functionally the reflected light at each spatial location from the camera's viewpoint and

within its field of view. This function describes the imaged environment as perceived by

the human eye. The ideal SAR image does not have such a clear description. Among the

possibilities, the ideal SAR image could represent the environment as perceived by the

human eye (without the effects of atmospheric occlusions such as clouds, fog,

precipitation, etc.). It could also describe functionally the electromagnetic reflectivity at

each spatial location from the radar's viewpoint and within its "field of view" such that

closely spaced scattering elements in the illuminated scene could be discriminated. In fact,

SAR imaging currently attempts to address both of these issues.

A description and formulation of the common approaches for reconstruction of

optical and SAR images from collected samples is presented in this chapter. Also included

herein are reviews of the recent work on the reconstruction and superresolution of optical

and SAR images. The information presented in this chapter is the platform upon which

the details of the superresolution methodology for both optical and SAR images will be

developed in the appropriate chapters. These chapters will show that the common



approaches presented here are special cases of the superresolution methodology of this


3.1 Optical Image Interpolation with a Single Kernel

A common manner in which optical images are reconstructed from a set of

collected samples is by interpolation with a single 2D kernel. The process of interpolation

was alluded to in the previous chapter when discussing perfect reconstruction via

convolution with the sinc kernel. We will formally introduce the problem here for the sake

of clarity. The interpolation methods of this section utilize the most commonly

encountered 2D kernels. Interpolation is seen to take the form of a convolution between

the observed image samples and a 2D kernel. The samples of an analog image x(t,, t2)

are denoted by x[nl,n2]=x(tl,t2),=n ,,T=,r2 Each interpolated point i(t,,t2) in these

techniques is linearly related to samples of the image in question. The procedures of this

section for interpolating continuous images take the form

x(tl,t2)= x[nl, 2]k -n -n2) (3-1)
nl =-oo n2 =--o

where k(t, t2) is the continuous interpolation kernel that defines the weighting of the

signal samples and T, T2 denote the sampling periods in their respective axes. The kernels

that are presented here are prototypes, that is, T7 = T2 = 1. A simple scaling by the

appropriate sampling rate, as in eqn. (3-1), yields the proper kernel.

There are two important properties which are common to interpolation. They will

be referred to as interpolation properties 1 and 2. These are:


1. x(tl,t2) must pass through the image samples when tj is an integer multiple

of T, and t2 is an integer multiple of T2. This is mathematically equivalent

to constraining the interpolation kernel as follows

k(t1t2 =l t, =0,t2 =0 1
0 t =m, Tt2 = T2; ,m2 Z

where Z is the set of integers [Hamming, 1973].

2. Signal sample weighting diminishes as the distance from the point being

interpolated increases. This property constrains the kernel's envelope to be

a decreasing function as we move away from its center. If the sample

weighting does not diminish, then it is finite within a "small" support


Property 1 is a consequence of the definition of interpolation. That is, samples are only

constructed between the given sample locations since, according to the Shannon sampling

theorem, the observed samples must be free of quantization error and noise for perfect

signal reconstruction. Property 2 is usually invoked by drawing a parallel with the

Shannon sampling theory. If samples of a function are drawn at small enough intervals,

then it is known that any function value between these samples can be reconstructed with

the use of an asymptotically decaying function the sinc function. Finite extent kernels do

not necessarily need to adhere to this. Specifying finite sample weighting within a small

region of support is sufficient for computational numerical stability.

3.1.1 Common Kernels

We will now present and briefly describe four common kernels particular to image

interpolation. These are the zero-order-hold, bilinear, bicubic and sinc kernels.


Zero-Order-Hold. In general, this is the simplest of interpolation kernels.

Interpolation with this kernel is commonly referred to as nearest neighbor interpolation or

replication. Using this kernel, the interpolated point .(tl ,t2) is chosen as x[n ,n2] at the

signal sample closest to (t1, t2). The interpolation kernel is a "rectangular box" given by

fl \t,<1t2\ < (3-2)
k(tp,t2) = 2 2 (3-2)
0 otherwise

The region of support here is only one sample in extent. The interpolated image usually

has a very "blocky" appearance due to the repeating of image samples. This kernel

satisfies property 1 but is not in strict compliance with the decay concept of property 2.

The signal sample weighting does not diminish with distance, rather it is constant only

within a small rectangular region. This interpolation kernel is illustrated in fig. (3-1).

Bilinear. Interpolation with this kernel is the 2D counterpart of ID linear

interpolation. The bilinear interpolation kernel is separable and is composed of two 1 D

linear interpolation kernels as shown in the equation below
k (1-it, i)(1- t2 I) It I< 1, t2 I<1
k(tv ,t2) = (3-3)
[0 otherwise

The region of support for this method is local. Its support region is comprised of the four

samples nearest the point of interpolation. It satisfies both interpolation properties listed

above. This interpolation kernel is illustrated in fig. (3-1).

Bicubic. Unlike the interpolation kernels of eqns. (3-2) and (3-3), this one is

smooth. Interpolation with this kernel locally fits a separable polynomial through our

sample points. The kernel is constrained so that the polynomial's first derivative must


agree at the endpoints of adjacent polynomials. The ID prototype kernel developed to

meet such a criterion for the bicubic kernel is

(t +2)2(t+1) -2 -(t+l)(t2+ -) -1 <0
k(t)= (t- 1)(t2 2 t -) 0 t < 1 (3-4)
-l(t-1)(t-2)2 lt <2
0 otherwise

The separable 2D kernel can be easily generated by considering the sampling rates T1 and

T2 of each axis in the above equation, scaling appropriately, and multiplying the resulting

ID kernels. Properties 1 and 2 both hold for the above interpolation kernel. The

separable 2D kernel is shown in fig. (3-1).

Sinc. This kernel was established in the Shannon sampling theory. It is well

known that any bandlimited signal can be reconstructed from its samples provided that the

signal was sampled at a rate higher than the Nyquist rate. Here the interpolation kernel is

given by

k(tt2) =sinc(t)sinc( t2)= sin(t) sintt2) (3-5)

and the region of support is global. This is given by the extent of the kernel which is seen

to be infinite due to the sinusoids in the kernel. Note that the weighting function k(t, t2)

in this case is separable and corresponds to an ideal low pass filter. This ideal kernel is

illustrated in fig. (3-1).

Notice that each of these interpolation kernels can be used in the perfect

reconstruction of a continuous signal from its samples. Convolving a kernel with the

signal samples results in perfect reconstruction when the continuous signal is known (or


assumed) to be locally constant for the zero-order-hold kernel, locally linear between

samples for the bilinear kernel, locally smooth up to the first derivative for the bicubic

kernel and globally smooth up to a given frequency of I- for the sinc kernel.

Zero Order Hold Kernel Bilinear Kernel

0.2, 0.2
0 0
o1 1
0 0 .5 0 0.5
-0.5 -0.5
t2 -1 -1 t t2 t

Bicubic Kernel
Sin Kernare of finite extent except the sinel

0.5 0.51

0" 01

-0.5 -0.52
22 2 4

12 -2 -2 ti t2 -4 -4 tj

Figure 3-1. Common interpolation kernels. The kernels depicted here are specified in
eqns. (3-2)-(3-5). All are separable and eightfold symmetric (TI = T2). Note that all
kernels are of finite extent except the sinc kernel.

A typical property, though not essential, of common interpolation kernels is the

separability of the kernel itself. This property usually simplifies the task of kernel

composition and analysis and could also be useful for efficient implementation of the

interpolation process. Symmetry is also an attribute usually associated with interpolation

kernels. In the case of 2D interpolation, the symmetry is usually fourfold, that is,


k(tj,t2) = k(-tI,t2)= k(t1,-t2). If the sampling rate is the same in each of the two

independent axes, this fourfold symmetry becomes eightfold. In this case,

k(tj,t2) = k(-t1 ,t2) = k(tl,-t2)= k(t2,t1). These additional properties are shared by the

kernels presented in this section. Kernels of finite extent necessarily use local information

in the construction of a point whereas kernels of infinite extent are required to use the

global content of the given signal samples.

3.1.2 Increasing the Sample Density

The continuous image obtained via the linear filtering of eqn. (3-1) must be

discretized to view or store it on digital computer. This discrete representation allows

interpolation to be viewed as the process of increasing the sampling density of a signal. In

this case, the interpolated signal can be obtained by expanding the given signal samples

x[n, n2] and convolving with a sampled interpolation kernel [Schafer and Rabiner, 1973].

This form of interpolation is a linear filtering process used in interpolating discrete images.

For an expansion rate of G, x G2, our expanded signal to convolve is given by

x e[ n, 2 ] n = 0 -+ + GG1, + -2G "- ]
x[n,,n2 2 n2= 0,= G,+2G2,...- (3-6)
[0 otherwise J

and the corresponding sampled interpolation kernel, obtained by sampling a continuous

kernel, would be k[ni,n2]= k(tI,t2)jt=^ ,2=2_. Gi, G2 are both integers greater than 1.
G, G2
The interpolated signal x[n ,n2] at our new sampling density is

i[n, n2 = Xe[ni, n2]**k[ni, n2 ]. (3-7)

Eqn. (3-7) is the discrete counterpart of eqn. (3-1). It will be shown that eqn. (3-7) is a

special case of the superresolution methodology for optical images.


3.1.3 Interpolation as Projections

The interpolation paradigm can be cast in several different forms. The form most

common in the signal processing literature is that given in eqns. (3-1) and (3-7). It is cast

as the linear filtering (convolution) of the signal samples with an interpolation kernel (FIR

filter). An equivalent and convenient form of expressing the interpolation operation is via

matrix multiplication. This form is compact and directly accessible by the optical image

superresolution methodology.

We define a collection of 2D image samples to be a matrix: X = [x,, ] where

x,,,,, = x[nn,n2 1x( 2 ,t=n ,t2=2T2 Similarly, X is a matrix of interpolated samples, that

is, samples at the new sampling density. Notice that X = [x,,2 ] where x "2 = 2[n,, n2]

and these are estimates of x(t1, t2 )=,T,',2= 2T for T'= T, / G, and T2 = T2 /G2. Note that

G1, G2 > 1 when increasing the sampling density, that is, interpolating. For separable

interpolation kernels we define two interpolation kernel matrices, K, and K2, as

... k(0,0) k(-1,0) k(-2,0) ... k(0,0) k(0, ) k(0,--) .
K,= k(,0) k(-1,0) k(--2,0) ; K2 = ..- k(0,-1) k(0,--1) k(0,--1) ...
... k(-,0) k(--1,0) k(-2,0) ... ... k(0,-2) k(0, -2) k(0,--2) -

It is readily seen that eqn. (3-1), with t, = pT, t2 =p2T2' and pI,P2 e Z, is

equivalently expressed as

X= KXK2. (3-8)

Notice that if G, = G2, then K, = K2. If the interpolation kernel is of finite extent then

K, and K2 are also finite extent matrices.

Another useful form of expressing eqns. (3-7) and (3-8) results when representing

the image samples in X as a vector rather than as a matrix. In this case, we draw from the


results of Kronecker products in linear algebra to establish our expression [Graham,

1981]. The Kronecker product AB of any two matrices A and B is the matrix

obtained from the matrix A by replacing each entry ay of A with the matrix ajB. If A is

any M x N matrix and aj is its jth column then vec(A) is the MN x 1 vector

vec(A) = [aT,aa ,...,a] Thus the vec operation transforms a matrix into a column

vector by placing the columns of the matrix one underneath the other. Let us also define

an operator uvec which undoes the vec operation. That is, the uvec operation transforms a

column vector into a matrix by partitioning the vector into adjacent sections of equal

length and placing them side by side to form a matrix. A result from linear algebra which

is useful to us here is that vec(ABCT) =(CA)vec(B). With this, eqn. (3-8) is

equivalently expressed in vector form as

vec(X) = (KT K,)vec(X). (3-9)

Notice that in this expression the equivalent of one interpolation kernel is being used on a

vector of data.

3.2 Review of Reconstruction for Optical Images

Work related to the notion of reconstruction has been described in the literature as

zooming, magnification, enlargement, resampling, sample expansion, interpolation and

superresolution. The foundation of this work, leading up to the present, lies in

interpolation theory. Below is a brief exposition of the core of classical interpolation

methodologies. This is followed by a comprehensive account of the recent work into the

reconstruction and superresolution of optical images of the last decade or so.


Interpolation has an extensive history and has been applied in a variety of fields.

An early use of such procedures was to determine function values between mathematically

tabulated data [Hamming, 1973; Kreyszig, 1993]. Interpolation has also found use in the

design of car and ship bodies. More recently, interpolation has found applications in

image coding and video compression [Lim, 1990; Cramer et al., 1996]. The object of

classical interpolation is to find polynomials that fit "well" through sample points in the

data set. These polynomials generally fall into one of two categories: unconstrained and

constrained polynomials. Procedures such as Lagrange interpolation and Newton's

divided difference are classic examples of the former type of interpolating polynomials

[Kreyszig, 1993]. These two approaches only make use of the given data samples. The

polynomials can be fit through the data locally or globally. In either case, a polynomial of

degree N -1 must be used in order to fit through N data samples (this applies to the case

of one-dimensional interpolation). Spline interpolation is an example of the latter. The

name is derived from thin elastic rods, called splines, which engineers used to fit smooth

curves through given sample points. Today it finds use in the design of car and ship

bodies as mentioned earlier. It is a procedure whereby contiguous chunks of data

samples are each fit with a polynomial. Further, the derivative(s) at the endpoints of

adjacent polynomials are required to agree [Hamming, 1973]. This constraint results in a

smooth polynomial fit through the data set. The resulting interpolation is generally not as

oscillatory between data samples as compared with an unconstrained polynomial fit

[Kreyszig, 1993].

It has been observed that the quality of interpolation is not generally dependent on

the polynomial order of the interpolating functions [Kreyszig, 1993]. In many cases, it is


sufficient to impose that the first and/or second derivatives of the interpolation kernel

exist. The smoothness of interpolating with splines over polynomials that lack derivative

constraints is most noticeable in regions of abrupt change. Splines tend to limit the

amount of oscillation following an abrupt change.

An important result in interpolation theory resulted from the works of Borel,

Whittaker, Nyquist, Kotel'nikov and Shannon [Marks, 1991]. These men introduced what

is known today as the Shannon-Whittaker-Kotel'nikov Sampling Theorem. The

fundamental result of the Shannon sampling theory, as it is referred to in this work, is that

a bandlimited signal is uniquely specified by its sufficiently close equally spaced samples.

In practice, interpolating through a set of samples using the sinc function of the Shannon

sampling theory is not possible. Instead, it is necessary to integrate other functions and/or

approaches into the interpolation process. This case has been extensively and effectively

treated for the case of bandlimited signals sampled at a rate exceeding the Nyquist rate

[Oppenheim and Schafer, 1989; Roberts and Mullis, 1987]. It is known that oversampling

can be utilized in easing the constraints on the interpolation function and for the recovery

of the original signal to highly acceptable levels of accuracy [Hauser, 1991]. This is

evidenced by the compact digital audio technology used today.

The interpolation work that is particular to images used the aforementioned

polynomials a great deal up into the early 1980s. The 1D interpolation techniques were

extended to the two dimensions corresponding to images by simply considering a

separable interpolation polynomial. The most used interpolation polynomials, or kernels,

consisted of either zero, first or third order polynomials. These were the zero-order-hold

(nearest neighbor), bilinear and bicubic kernels [Netravali and Haskell, 1995]. The zero-


order-hold and bilinear kernels do not produce a smooth transition between image

samples. The bicubic is better in that its polynomial does have a continuous first

derivative. This leads to a smoother transition between interpolated samples. The work

of Hou and Andrews [1978] considered the use of a cubic B-spline as the interpolating

kernel and Keys [1981] used a cubic spline. An analysis and comparison of these image

interpolation methods is given by Parker et al. [1983].

There have been recent studies into the development of the theory of interpolating

kernels [Lucke, 1993; Lanzavecchia and Bellon, 1994, 1995; Schaum, 1993; Appledom,

1996; Unser et al., 1992; Knoll, 1991]. These works focus on the developing of an

"optimal" single interpolation kernel as a substitute for the sinc kernel. The kernels to

develop are local. The assumption made in these works is that the signal has been

sampled, at least, at the Nyquist rate. As mentioned earlier, the observed sample sequence

is of finite extent, hence, the sinc kernel cannot be practically implemented. In particular,

kernels have been composed of trigonometric functions [Lucke, 1993; Lanzavecchia and

Bellon, 1994 and 1995], linear sums of Gaussian functions [Appledorn, 1996], polynomial

B-splines [Unser et al., 1992] and trigonometric as well as low order polynomial functions

[Schaum, 1993]. It is interesting to note that some of the developed local kernels

approach the sinc function in the limit of an infinite data sequence [Unser et al., 1992;

Lucke, 1993]. Interpolation filters for restoring highly subsampled images have also been

studied [Knoll, 1991]. The method is based on the iterative minimization of an error

weighting function in a constrained m-dimensional coefficient space.

The use of a single linear interpolation kernel has proven to be very popular for

image interpolation for two main reasons: (1) the design of linear kernels (kernels that


linearly combine sample points) is a mathematically tractable problem and (2) their

implementation is nothing more than a convolution for which there exist fast algorithms

for their implementation via the fast Fourier transform [Oppenheim and Schafer, 1989].

However, there are important considerations as to why these methods might not be best

for reconstructing images. Images are typically characterized by the objects contained

within the imaged scene. The figural continuity of objects which leads to an

"instantaneous" transition of measured light intensity at objects' boundaries is probably the

most prevalent characteristic particular to optical images. It is this abrupt change in

reflected light which conveys the most information about a viewed scene. Other striking

characteristics are the textures and light reflections from the objects which comprise the

scene. All of this manifests itself as high spatial frequency information which is to be

imaged. The observed image samples from a scene typically cannot represent well the

scene in the sense of the Shannon sampling theory, that is, the aforementioned high spatial

frequencies are aliased due to an insufficient sampling rate which comes from practical

restrictions to the physical sensor.

Recent trends in the literature show an effort in developing algorithms to better

cope with this problem. There has been a shift from single kernel approaches, which

cannot recover aliased and/or filtered high spatial frequencies to research effort which

attempt to account for them. The methodologies developed thus far for addressing this

problem can be roughly categorized as follows: (1) directional/edge based, (2) multiple

kernel based, (3) orthogonal transform based and (4) variational/statistically based. The

theoretical and practical issues characteristic of these methodologies are by no means

mutually exclusive. As such, numerous algorithms cannot strictly be categorized as listed


above. However, for the sake of organization, these algorithms will be presented as

categorically listed.

Directional interpolation techniques recognize that high detail areas in images most

often have a definite geometric structure or pattern, such as in edge regions [Lee and Paik,

1993; Algazi et al., 1991; Bayrakeri and Mersereau, 1995; Marsi et al., 1996; Jensen and

Anastassiou, 1995]. In such cases, interpolation in the low frequency direction (along the

edge) is much better than interpolation in the high frequency direction (across the edge).

Thus, a directional interpolation scheme has to perform a local analysis of the image

structure first and then base the interpolation on the local structure if a low frequency

direction does exist. The algorithm by Lee and Paik uses an adaptive combination of zero-

order-hold and moving average filters. By combining these two kernels the authors show

that an adaptive version of the fast B-spline algorithm is obtained. The work by Algazi et

al. focuses on high contrast directional structures such as edges, thin lines and corners.

The authors use rank order statistics to detect the presence of these local image structures.

They do this because they state that each of these local image structures is characterized

by a bimodal distribution. Jensen and Anastassiou have a method that first detects the

presence of a high contrast edge then estimates its location and orientation. If an edge is

detected, it fits an ideal step edge in the region. If no edge was detected the algorithm

reverts to bilinear interpolation. The algorithm by Bayrakeri and Mersereau considers

weighted directional interpolation. The interpolated values along various directions are

combined using directional weights which depend on the variation in that direction. The

interpolation value for each direction is assigned based on the magnitude of its directional

derivative. Marsi et al. create a simple edge-sensitive interpolation filter. It is a separable


filter with one adjustable parameter. The parameter is selected adaptively according to the

sample values neighboring the point to be interpolated. The filter is shown to default to a

linear interpolator when this adjustable parameter equals zero. It should be noted that

these algorithms tend to be computationally cheap.

Multiple kernels for interpolation analyze local image statistics in order to

determine an appropriate interpolation kernel for the region under consideration [Wang

and Mitra, 1988; Winans and Walowit, 1992; Darwish and Bedair, 1996; Mahmoodi

1993]. The choice of kernels are typically ad hoc and typically two to three classic

interpolation kernels are employed by these algorithms. The actual interpolation kernel

used in each of these techniques can be interpreted as a shift-varying kernel whose form is

dictated by an image's local structure. The work of Wang and Mitra bases its

interpolation on three local models: constant, oriented and irregular. The constant model

is used to describe a block whose intensity variation is negligible to the human eye. The

oriented model represents a block where the intensity variation is along a particular

direction and the irregular model is used on blocks with more than one pronounced

direction present (such as corners). Winans and Walowit base their work on a cubic spline

kernel with an adjustable parameter. The authors preset this parameter to one of three

values, hence resulting in three interpolation kernels. The parameter is chosen according

to the edge information in the neighborhood of the interpolated pixels. Darwish and

Bedair also use three interpolation kernels: the zero-order-hold, a cubic spline and an

additional kernel established by the authors. All three are separable. The zero-order-hold

kernel is used in edge regions, the cubic spline is used in homogeneous (smooth) regions

and their third kernel is used in regions that are not classified as smooth or edgy.


Mahmoodi describes a method for using two interpolation kernels. A fifth order kernel

with a 2x2 region of support is used in high contrast regions while a cubic spline with a

4x4 region of support is used otherwise. In order to select a kernel, a threshold is

checked between the ratio of the sample deviation of the 2x2 and the 4x4 block around

the point to be interpolated.

The use of orthogonal transforms for interpolation [Koltracht et al., 1996] has

focused on using the discrete cosine transform (DCT) [Martucci, 1995; Shinbori and

Takagi, 1994; Hong et al., 1996] and the wavelet transform [Keshavmurthy et al., 1996;

Chang et al., 1995]. Work involving the wavelet transform utilizes information across

image scales to predict information at finer scales. This exploits the fact that evolution

across resolution scales characterizes the local regularity of a signal [Mallat and Zhong,

1992]. The work of Martucci performs the interpolation completely in the DCT domain.

Previous work by the author has shown that it is possible to use the DCT to implement

digital filters [Martucci, 1993]. As such, all spatial domain filtering is equivalently

performed in the DCT domain. Shinbori and Takagi use an iterative application of the

Gerchberg-Papoulis (GP) algorithm with the DCT to obtain a magnified image. This

technique uses the GP algorithm as the basic principle for extending the frequency band.

The spatial high frequency components are restored during the forward and inverse

iterative transform process for the image by DCT using two constraints that the spatial

extent of the image is finite and correct information is already known for the low

frequency components. Hong et al. use the coefficients of a locally performed DCT to

extract edge information. Five different edge types are identified and each type has a

corresponding interpolation technique. Once the edge type is determined, the zero-order-


hold and bilinear interpolations are performed and combined. Then five different Gaussian

low pass filters adaptively reduce the discontinuities which resulted from this

aforementioned interpolation combination. The wavelet based work by Keshavmurthy et

al. decomposes the image into an overcomplete (or redundant) representation via an

autocorrelation shell derived from a Daubechies wavelet. The wavelet detail information

across known scales is used to extrapolate the details at finer scales. Chang et al.

implement a method that extrapolates the wavelet transform of the higher resolution image

based on the evolution of the wavelet transform extrema across scales. By identifying

three constraints which the higher resolution information needs to obey, they enhance the

reconstructed image through alternating projections onto the sets defined by these

constraints. Koltracht outlines a method based on parametric families of orthogonal

transforms which arise from the wavelet multiresolution paradigm. The idea's

development considers the original image as a low frequency component of a family of

potential enlargements.

Interpolation based on variational and/or statistical principles presents a different

view of interpolation than from those previously discussed [Karyiannis and

Venetsanopoulos, 1991; Saul and Jordan, 1996; Schultz and Stevenson, 1994; Ruderman

and Bialek, 1992]. Despite this, some interesting connections between these recent

methodologies and some previously mentioned have been made. In fact, certain

interpolation techniques previously discussed have been found to be special cases of these

new methodologies. The variational methods formulate the interpolation problem as the

constrained minimization of a certain functional. The choice of the functional to be

minimized varies and is based on a combination of the restrictions imposed by the


mathematical problem and certain considerations related to the physical problem. The

work of Karyiannis and Venetsanopoulos concerns itself with the application of variational

principles to the image interpolation problem. They formulated the problem as the

constrained minimization of a quadratic functional, with particular emphasis on the class of

quadratic functionals whose minimization amounts to 2D generalized L-splines. The

specific functionals used were based on certain image models. It is stated in this paper

that the previously discussed method of cubic splines results from the minimization of a

certain quadratic functional. The work of Saul and Jordan incorporates variational and

statistical principles for the interpolation problem. Their approach considers a

multidimensional input and a model of its density. They don't specifically concern

themselves with images but focus on how to optimally interpolate between two points in

space. They do this by assigning a cost to each path through space, based on two

competing goals one to interpolate through regions of high density, the other to

minimize arc length. They present an efficient solution (in high dimensions) to this

problem for Gaussian, Dirichlet and mixture models. Schultz and Stevenson describe an

approach which considers variational and statistical principles for image interpolation.

Their methodology results in the optimization of convex functionals (not necessarily

quadratic). A statistical model for the image was assumed which incorporates the convex

Huber function. This function is quadratic up to a threshold and linear beyond this

threshold. Use of this cost function helps maintain discontinuities from an image into its

expanded version. The work of Ruderman and Bialek shows that if the probability density

of a signal is known a priori, this knowledge can be used to dealias information contained

in samples of that signal. In fact, this knowledge can be used to estimate frequencies


beyond the Nyquist limit. They posed the superresolution problem as one of finding the

"best" estimate of the original signal given the original signal's PDF and the observed

signal samples. The most useful estimator will depend on the costs for making different

kinds of errors. The two natural choices for an estimator are maximum likelihood (ML)

and the conditional average. Their observed signal samples are assumed to be obtained by

a general point spread function (PSF), not just sampled by a train of impulses, and

corrupted by Gaussian noise. The noise for each sample is assumed to be statistically

independent. The example presented in their work assumes the original signal is drawn

from a Gaussian ensemble. Under these assumptions both estimators yield the same result.

They show that the optimal superresolution procedure for a Gaussian signal results in a

convolution of the signal samples with an interpolation kernel which is a function of the

noise variance, the point spread function, the original signal power spectrum and the

sampling period. They obtain a very satisfying result using this interpolation methodology.

For the case of zero noise variance, a delta PSF and a bandlimited original signal that was

sampled above the Nyquist rate, the interpolation kernel defaults to the sinc kernel.

There is other work that differs in approach from the above stated categories but is

nonetheless noteworthy. The work of Cheung and Zeng only uses a single interpolation

kernel. However, this kernel is derived using a least squares minimization process. The

kernel derived has a four sample diamond like region of support [Cheung and Zeng,

1995]. Zeng and Venetsanopoulos use the median filter with local image features for their

interpolation [Zeng and Venetsanopoulos, 1992]. The median filter is known to remove

outliers in the local distribution of pixels and to preserve certain types of image structure

[Huang and Tang, 1979]. They attempt to exploit this property of the median filter to


produce more accurate interpolation results in edge regions. Fathima and Yegnanarayana

have used a maximum entropy approach to interpolation. They specifically address the

problem of recovery of the missing samples of a signal from a few of its randomly

distributed samples. They also discuss the appropriateness of their method for different

distributions of the known samples [Fathima and Yegnanarayana, 1990]. Herodotou et al.

have used a simple Gibbs random field model for interpolation. In their paper, a binary

based Gibbs random field model is used for image interpolation. Images are interpolated

from their downsampled versions along with a number of texture parameters that are

estimated within smaller image blocks [Herodotou et al., 1995]. The use of wavelet bases

for the interpolation of sparse data has been addressed by Pentland. This work introduces

good approximate solutions for this problem in only a few iterative steps where each

step requires O(n) operations and storage locations. The author illustrates the use of this

technique for interpolating sparse range data [Pentland, 1994]. Work by DeNatale et al.

uses a least squares bilinear interpolation scheme based on the least squares optimization

of a spline-like scheme. This work incorporates adaptive image sampling and

interpolation for data compression [DeNatale et al., 1994]. Tom and Katsaggelos address

the problem of obtaining a high resolution image from multiple low resolution images that

have been subsampled and displaced by different amounts of subpixel shifts. The paper

poses the low to high resolution problem as an ML problem which is solved by the

expectation maximization (EM) algorithm. By exploiting the structure of the matrices

involved, the problem can be worked on in the discrete frequency domain [Tom and

Katsaggelos, 1995]. Typically, multiple displaced versions of a single image are not

available for interpolation. For a video sequence, however, adjacent temporal frames tend


to satisfy this assumption particularly well. A recent book addresses this problem [Tekalp,


3.3 SAR Imaging via Nonparametric Spectral Estimation

The remainder of this chapter is devoted to the reconstruction of SAR images from

the collected samples of the radar backscattering. This set of collected samples is typically

termed the phase history corresponding to a SAR image. An introduction regarding the

information provided by the phase history data will be presented in this section.

SAR images are commonly formed via spectral estimation techniques. The reason

for this is elucidated herein. The "ideal" SAR image is typically assumed to discriminate

between radar scattering elements in the illuminated scene. As such, the ideal image is

considered a juxtaposition of delta functions in the frequency domain one for each

scattering element. The SAR imaging process is generally performed with this in mind.

Owing to this, spectral estimation serves as the basis for SAR imaging.

Spectral estimation methods can be broadly classified as nonparametric or

parametric. The nonparametric approaches generally perform a narrowband filtering over

a band of frequencies. The parametric approaches postulate a model for the data and the

parameters of this model must be estimated. The minimum variance (MV) superresolution

approaches, to which our superresolution methodology belongs, postulate a local model

for data to produce an optimal filter matched to that model. In a sense, this can be

considered a semiparametric approach. Nonetheless, since it does produce a filter and

model parameters are not estimated, it is considered in this work as a nonparametric



3.3.1 Phase History Data

The phase history data samples collected from a radar which electromagnetically

illuminates a scene are used in the formation of SAR images. This section presents an

introduction to the information provided by the phase history samples. It is provided so

that the unfamiliar reader might better understand the premises underlying the imaging of

SAR data. Interested readers are referred to the books by Hovanessian [1980], Mensa

[1991] and Skolnik [1990] for detailed explanations of the SAR data collection process.

The SAR phase history represents the reflected radar information in the downrange

and crossrange (Doppler) directions. The downrange data is obtained by transmitting a

stepped-frequency waveform in the downrange direction of known frequency step size and

bandwidth. The range extent is a function of the frequency step size and the range

resolution is a function of the total bandwidth of the stepped-frequency waveform. By

varying the frequency step size and total bandwidth, arbitrary range depth and resolution

can be achieved.

The reflected radar information in the downrange direction is contained in the

composite wave received by the radar. It contains the shifted incident waves which

resulted from the objects present in the radar illuminated scene. To obtain data in the

crossrange direction, the complete stepped-frequency waveform is transmitted for

equidistant intervals along the radar's flight path. The crossrange resolution is a function

of the transmit wavelength and the relative velocity between the radar and the scene.

Along the cross range direction, objects also reflect the incident waves with different

phase shifts.


The formation of a SAR image from this information involves determining the

reflectivity function of objects in the radar illuminated scene in both the downrange and

crossrange directions. The reflectivity at a given range is obtained by narrowband filtering

at each of the frequencies transmitted by the radar. The filtered signal's magnitude about

the center frequency in the narrowband is an estimate of the reflectivity value we seek.

The reflectivity function is typically estimated using the fast Fourier transform (FFT). This

is done because, first, it is known that each of the basis functions of the FFT is a

narrowband filter [Stoica and Moses, 1997] and, second, the reflectivity function at all

frequencies of interest can be computed efficiently via the FFT. The FFT is typically used

in the downrange and crossrange directions.

The reflectivity function for points in space, that is, for distances from the radar,

are typically computed by taking the magnitude of the aforementioned FFT computation.

It is known that this operation is one way to estimate the power spectral density (PSD) of

a signal [Stoica and Moses, 1997]. In this manner, the formation of SAR images is a

direct result of PSD estimation from phase history samples. The illuminated radar scene

can also be assumed to consist of a finite number of prominent scattering elements. In this

manner, the phase history is modeled as the superposition of 2D complex exponentials -

one for each scattering element. The amplitudes of each complex exponential is

proportional to the amount of scattering produced by each element. Spectral estimation

techniques can also be used in detecting these scattering elements. Common approaches

to spectral estimation (and hence SAR imaging) are now discussed.


3.3.2 Standard Approaches

Spectral estimators are used to detect complex sinusoids embedded in noise from a

finite number of observations [Stoica and Moses, 1997]. In 1D, we denote the N

observed signal samples by x[n], n=0,...,N-1; where x[n] = m[n] + e[n] and

m[n] = ae'" is the complex sinusoid of frequency co and complex amplitude a which is

additively embedded in noise e[n]. In 2D, the complex sinusoid's counterpart is the

complex plane wave. These are denoted m.[nl,n2]= aej(ln"''+2" 2) where w = [C1,0)2].

Here too, the coefficient a is complex and accounts for the amplitude and phase shift of

each wave. The standard methods for detecting these waves (and consequent scattering

elements in a SAR image) are based on the FT. This is due to the basis functions of the

FT being the waves we seek to detect. The discrete space FT (DSFT) of a finite set of

data is defined as

N -IN,-1
X(O), = ) x[nn2]eJ(anl+22) (3-10)
n1 =0 i2 =0

where o, and o2 are in radians. Thus the DSFT of the noisy signal serves to project this

signal onto the complex plane waves we seek. The presence of complex plane waves in

the noisy signal would yield large amplitudes at those frequencies of the DSFT

corresponding to the waves present in the noisy signal relative to all other frequencies.

The ability to distinguish between several sinusoids spaced closely in frequency is

limited by the number of data samples available. This was alluded to in the previous

chapter. The resolution afforded is given by Aol = and Ao2 = -, that is, the

approximate half-power (-3dB) width of the main lobe of the 2D rectangular window. In

other words, two waves of 2D frequency (o(),'1)) and (2 ), identified by the
(2, ) 2 dniidb h


superscripts (1) and (2) respectively, can't be distinguished using a direct application of

the DSFT if Ao, < Io(1 I(2) and Ao, < o1)2 -02 .

Side lobe artifacts which result from Fourier transforming a finite number of data

can affect the ability to accurately detect the waves present in the noisy signal also [Stoica

and Moses, 1997]. Side lobes resulting from the presence of several waves can also sum

to create the appearance of a wave that is truly not present in the signal. These side lobes

can be reduced using special window functions w[nl, n2] where n, = 0,...,Ni -1, (i = 1,2)

[Harris, 1978]. The reduction of these sidelobes comes at the expense of wider main lobe

width of the window function thus reducing the ability to distinguish between closely

spaced waves. Because spectral estimation methods of this type, such as the Welch, Burg

and Daniell method [Stoica and Moses, 1997], are not able to resolve these waves, they

will not be explicity discussed here. Instead, the interested reader is referred to the book

by Stoica and Moses or Kay for more details [Stoica and Moses, 1997; Kay, 1988].

3.3.3 Minimum Variance Approaches

It has been discussed that the ability to distinguish (or resolve) between two

closely spaced frequencies is limited by the number of observed samples. The MV

approaches have the ability to resolve beyond this inherent resolution of our data. Hence,

they are superresolution approaches. The MV approaches are so called because they seek

to minimize the power from additive noise (typically zero-mean Gaussian noise) present in

the observed signal while preserving the sinusoids sought. The power of this type of noise

is known to be given by its variance [Stoica and Moses, 1997]. These methods are

considered adaptive filtering methods which seek to minimize the output power of the

filtered signal with respect to a given model. Specifically, let X = [x,,,, ] = x[n n2 ] and


W =[w,,,] = w[n,n2] for ni = 0,...,N, -1, (i = 1,2), denote the matrix of our 2D phase

history samples and filter coefficients, respectively. Let us also define the vectors

associated with these samples as x = vec(X) and w = vec(W). Each output image

sample sq is defined by

Sq =minl E[IXHWq 12]
= minE[ wXXwq ] (3-11)
Wq q
= min w Rwq

where q = [q, q2], q = 0,...,Q -1, (i = 1,2), defines the location on the frequency grid

from which we form our SAR image and R is the autocorrelation matrix of vector x.

From this formulation, the filter Wq is dependent on the point to image, hence the

subscript q. It must obviously be constrained or the solution to eqn. (3-11) is trivial, that

is, w q = 0. This would not be useful to us. Several constraints can be placed in order to

arrive at a meaningful solution. The constraints and corresponding solutions of a few of

the most used MV superresolution techniques are now discussed.

Capon's Method. This is also referred to as the MV method. A single equality

constraint is imposed here and it is known as the unit boresight constraint. Specifically

wqmq = 1 where Wq is the vector of complex filter coefficients which weights the phase

history samples. The point scatterer model is mq = vec(M) where the matrix

M = [m ]= m, [n,, n2]= e j(q n+ ) for ni = 0,...,Ni -1 and Oq = [O, ,O)q2] where

c q= ,, (i = 1,2). It is the specific wave model we seek at frequency grid coordinates

q = [q, q2]. This constraint allows for the model to pass unattenuated if it is indeed

present in the noisy observed signal. For each output sample, the optimization problem is


min wqRwq subject to wqmq = 1. This optimization problem is readily solved with the

use of Lagrange multipliers. The optimal filter in Capon's method is given by [Stoica and

Moses, 1997]

w = R-1m (3-12)
mq R-mq

and the spectral estimate S which is our superresolved SAR image for phase history X, is

composed of the spectral estimates sq at each frequency grid location. These are obtained

by plugging eqn. (3-12) into eqn. (3-11) to yield

q (3-13)
q m'R-l'mq

and the resultant SAR image is described by S=[sq]=s[q,, q2] for qi =0,...,Q-1,

(i= 1,2).

MUSIC and Eigenvector. These two approaches follow the same approach

taken by Capon's method. Where they differ is in their use of the autocorrelation matrix.

In order to more clearly differentiate sinusoids from the noise they are embedded in, these

methods seek to drive the spectral estimate to infinity when there is an exact match

between the model vector and its presence in the observed signal. This is accomplished by

considering the signal and noise subspaces to be orthogonal. Specifically, if we consider

the eigendecomposition of the autocorrelation matrix R, then we can write

R= UAUH ZmUmU= EZmmuH Z mUmU (3-14)
m mESignal menoise

where U=[UI,...,uM] is the matrix composed of eigenvectors um of matrix R. The

eigenvector (EV) method directly uses the noise subspace in eqn. (3-14) for its spectral


estimate. Then by substituting R-' in eqn. (3-13) with the "inverse" of the noise

subspace, the EV spectral estimate is written

sq =


The MUSIC method exploits the white noise model by replacing the measured noise

eigenvalues with the average noise power a2. The spectral estimate for MUSIC is

mIH (-2 mU H mq

3.3.4 Estimating a Full Rank Autocorrelation Matrix

Many MV related methods require knowledge of the inverse of the autocorrelation

matrix R of our phase history samples. This matrix must be estimated from the available

samples. In practice, we usually have one phase history observation. Given a matrix of

phase history samples we can compute a full rank autocorrelation matrix by averaging

subwindows of the given phase history samples.

Specifically, let X=x[n,,n2] for n,,n2 = 0,1,...,N-1 be an NxN matrix of our

phase history samples. Also, let X, = x[i: i + P -1, j: j + P-1] be the subwindows of size

Px P in matrix X. Note that there will be (N P + 1)2 subwindows of size P x P in

matrix X where i,j = 0,1,...,N-P. The autocorrelation matrix R of the phase history

samples in X is defined as

R = E[xxH]

where x = vec(X). This matrix is typically estimated by


1 N-I

However, in the case of one phase history observation (N = 1), R is not full rank. With

the subwindows we've described, the autocorrelation matrix estimate would be

1 N-PN-P
R= (N P+12 X.X (3-15)
(N P+ 1) i=0 j=0

where xiy = vec(X,).

The size of the subwindows given by P should be chosen so that they are as large

as possible while also ensuring that R is full rank. It can be shown that the maximum

possible P for which R can be full rank satisfies P < (N +1)/2, where P must be an

integer. The resolution of the estimated matrix R could be increased if we had more

subwindows to average. This is done by also averaging 'backward' versions of the

subwindows x0 The backward subwindows corresponding to subwindow xy (in vector

form) is given by Jx* where denotes conjugation and matrix J is a permutation matrix

with ones on the antidiagonal. The resultant matrix is the forward-backward averaged

autocorrelation matrix. Here we have twice the number of subwindows to average

compared with the 'forward only' autocorrelation matrix of eqn. (3-15) so the size of P

can increase. The 'forward-backward' estimate of the autocorrelation matrix is then

R=N-PN-P X +Jx .xTJ (3-16)
1 ~\T N -P+I)2 --':-- Y U Y
2(N-P+) ,2i=O j=O

Now the maximum possible P for which R can be full rank satisfies P < (2 -2-)(N +1)

where P must be an integer. The use of subwindows necessarily decreases the resolution

of our spectral estimate. But, in order to compute an MV spectral estimate, this must be



3.4 Review of Reconstruction for Synthetic Aperture Radar Images

SAR images are reconstructed from phase history data. This reconstruction can, in

general, be described as model based. These models characterize the back scatter of radar

energy reflected from known objects. By far, the trihedral reflector is the most commonly

considered model in SAR image formation. This is because the reflected energy of the

trihedral reflector manifests itself as a complex plane wave (a 2D complex sinusoid) in the

phase history of the radar return and as such, techniques based on the ubiquitous FFT can

be used in the imaging process of signals in the SAR domain. SAR imaging based on

other models is, to the author's knowledge, very scarce. Imaging based on trihedral as

well as dihedral corner reflectors has been studied [Liu and Li, 1996]. These authors also

state that dihedral feature extraction for imaging has, to their knowledge, never been

considered. This being the case, the remainder of this section introduces methods for the

imaging of SAR data based on the trihedral reflector model. Some of these methods were

discussed in the previous section but are included here for a more coherent review.

Ideally, when Fourier transforming the phase history, the trihedral reflector would

appear as a point in the image corresponding to the true reflector's location in space.

However, the finite number of collected samples (yielding the truncation of the complex

plane wave), tends to smear or distort the actual location of the object [Stoica and Moses,

1997]. Another undesirable effect in SAR image formation via Fourier transform is the

presence of side lobes corresponding to each reflector in the image. This is a well-known

consequence of Fourier transforming a truncated sinusoidal wave. It is known that these

side lobes can be reduced or suppressed through the use of appropriately designed

windows at the expense of increased main lobe width [Harris, 1978].


The problem in the superresolution of SAR imagery is one of correctly detecting

the presence of the models which the objects in the imaged scene are assumed to be

composed of. Typically, the superresolved image has increased in sample density,

decreased in main lobe width (relative to the resolution afforded by the number of phase

history samples given) and suppressed side lobe artifacts. SAR superresolution techniques

can be grouped into parametric and nonparametric approaches. Many spectral estimation

books exist which describe well the theory and techniques used in SAR imaging [Kay,

1988; Marple, 1987; Stoica and Moses, 1997]. The parametric approaches essentially

perform signal detection and model parameter estimation. The nonparametric techniques

are generally based on adaptive filtering schemes. It was mentioned that our

superresolution methodology is nonparametric in nature. As such, a review of the most

common nonparametric SAR superresolution approaches is presented herein. Before this,

a quick review on the use of windows and related schemes will be presented. Also, a brief

discussion on parametric SAR superresolution is presented.

The windowing of a signal for side lobe suppression is well established and studied

[Harris, 1978; Skolnik, 1990]. The subsequent Fourier transforming of the windowed

signal can produce a marked suppression of the side lobes which result from a limited

number of collected data. As stated before, this is accomplished at the expense of an

increase in main lobe width. This results in a decrease in spectral resolvability between

closely spaced sinusoids. Stankwitz et al. have developed "smart" windowing schemes

which eliminate the side lobes that would result from the FFT of the collected phase

history of a trihedral comer reflector [Stankwitz et al., 1995]. These techniques are

referred to as spatially variant apodization (SVA); apodization is a term borrowed from


optics which refers to the suppression of diffraction side lobes. SVA is based on deciding

which window from the family of cosine-on-pedestal window functions would be most

appropriate to use on the point to image. An adjustable parameter a ranging from 0 to

0.5 is used to describe which of the cosine-on-pedestal functions to select at the point

being imaged. At each of the extremes of this family of windows is the rectangular

window for a = 0 (all pedestal, no cosine) and the Hanning window for a = 0.5 (all

cosine, no pedestal). Hamming weighting is a special case of cosine-on-pedestal (a =

0.43) which nulls the first side lobe.

The advantage of this technique is that it is very computationally efficient. It can

be implemented easily with a few FFTs. The drawbacks of this method are that it has only

been developed to model trihedral reflectors and the best resolution it can afford is that of

the main lobe width of the rectangular window hence its resolution is dependent on the

number of collected samples. As such, it is not a superresolution technique per se. A

variant of this technique which involves an iterative procedure of "SVA windowing" and

phase history extrapolation has been reported which can potentially increase resolution

beyond that afforded by the number of observed samples due to an extrapolation phase

which increases the number of overall image samples. [Stankwitz and Kosek, 1996]. This

approach has been termed super SVA.

Superresolution can be performed parametrically by estimating the parameters of

the models assumed to comprise the image [Bi et al., 1997]. In this case, the number of

reflectors for each assumed model must be known a priori or must be estimated using

some model order selection strategy such as Akaike information criteria [Soderstrom and

Stoica, 1989]. This method tends to be slow in practice because the number of comer


reflectors is generally not known and solutions for several model orders must be obtained

and compared before an appropriate model order, that is, number of scatterers, can be

decided upon. Parameter estimates for each assumed model are typically obtained using

least squares approaches. For the case of a complex plane wave, its complex amplitude

and 2D frequency must be estimated. This must be done for each plane wave assumed to

comprise the signal.

The nonparametric adaptive filtering schemes are based on determining the optimal

linear projection of the observed phase history samples for the point being imaged.

Essentially, the image is formed on a grid of equispaced frequency coordinates, thus each

imaged point corresponds to the 2D frequency of its associated grid location. The

frequency samples lie in the rectangular interval [0,2t)x [0,27t). The optimal filter is the

one that minimizes the output power at the point (frequency) being imaged subject to

certain constraints. These techniques, generally referred to as minimum variance

techniques, will be discussed here. What differentiates each of them are the constraints

imposed and/or assumptions made concerning the autocorrelation matrix of the observed


The first of these SAR imaging methods can be attributed to Capon [Capon,

1969]. This approach is typically referred to as Capon's method or the minimum variance

(MV) method. This method imposes what is known as the unit boresight constraint.

Here, the filter should pass a complex plane wave of a specified frequency unimpeded

while maximally suppressing signal power at all other frequencies. The disadvantage of

this method is that the inverse of the autocorrelation matrix must be computed though

there is typically only one phase history observation to work with for the scene being


imaged. What is typically done is that subwindows within the observed phase history are

averaged in order to obtain a full rank correlation matrix which can be inverted. This

approach sacrifices resolution (when considering subwindows) in order to obtain the

minimum variance solution.

A reduced rank minimum variance (RRMVM) method has been proposed which

does not sacrifice as much resolution as its full rank counterpart [Benitz, 1993]. Here, the

subwindows are larger than with the MV method but the autocorrelation matrix is not full

rank. The unit boresight constraint is imposed as well as an inequality constraint on the

norm of the filter. It turns out that the optimal filter is based on an optimal diagonal

loading of the singular autocorrelation matrix leading to an autocorrelation matrix which

can now be inverted to yield the optimal filter.

Methods which assume orthogonality between the signal and noise subspaces are

also popular [Odendaal et al., 1994]. These are the MUSIC and EV methods. In both of

these methods, a full rank correlation matrix is needed and the unit boresight constraint is

enforced. The spectral estimate though is determined by the noise subspace. The MUSIC

algorithm explicitly whitens, or equalizes, the noise subspace eigenvalues while the EV

method does not. This whitening tends to destroy the spatial inhomogeneities associated

with terrain clutter, hence this method is not generally suitable for SAR imaging. In

contrast, the EV method does tend to preserve clutter inhomogeneities.

Autoregressive linear prediction (ARLP) can also be used in determining the

optimal filter [Jackson and Chien, 1979]. Here, the optimal filter which linearly combines

phase history samples to predict a "future" sample is the one that minimizes average

prediction error. Based on the assumption that the predicted error signal is an innovations


process (white noise), the spectral estimate is the square root of the minimized prediction

error energy divided by the magnitude squared of the transfer function. The filter need not

be causal but the white noise assumption of the predicted error signal is generally invalid -

leading to spiky and spurious behavior in the final estimate. Instead, it has been found

useful to average images based on several prediction steps. The RMS average of ARLP

images based on all possible prediction steps reduces to one of Pisarenko's spectral

estimates [Pisarenko, 1972].

Other recent and noteworthy adaptive filtering approaches is APES and HDVI.

The APES is a recent technique which generalizes the MV method [Li and Stoica, 1996].

The optimal filter in this method has the exact functional form as Capon's filter. The

difference is that the autocorrelation matrix in the Capon filter is replaced with a matrix

composed of the sum of this same autocorrelation matrix along with an additional matrix

which results from the optimization carried out. The approach is shown to yield more

accurate spectral estimates overall than Capon's method. High definition vector imaging

(HDVI) is another recent technique developed at MIT Lincoln Labs for superresolution

imaging [Benitz, 1997]. A typical effect of superresolving with the adaptive filtering

techniques based on the trihedral model is that any radar illuminated object which does not

satisfy the model is either greatly suppressed or distorted hence most background

information can be modified in an undesired manner depending on what this information

will be used for. The HDVI technique imposes constraints that less distorts the

background relative to most common adaptive filtering schemes. Constraints require the

optimal filter length (norm) to equal the length of the assumed model's projection onto the

space spanned by the autocorrelation matrix. Also, the optimal filter is constrained to be a


given distance from the assumed model's projection onto the space spanned by the rows

or columns of the autocorrelation matrix.



The superresolution of optical images requires the establishing of an appropriate

set of bases by which the collected image samples can be weighted for the reconstruction

of new image points (samples). The bases established are learned from available data and

the basis function used in the creation of an image point is dependent on the local

character of our image. The need for such local consideration will be discussed

throughout this chapter. We will examine the interdependence of local information in

optical images and make observations which will serve to validate the approach taken to

the superresolution of optical images. The architecture which embodies our approach will

be shown to be equivalent to a convolution with a family of kernels. The image

reconstruction is formulated as

.i[nf,n2 = x,[nI,n 2 *kc,,[nl, n2] (4-1)

Let us note that the subscripts c and 1 are used in distinguishing which basis function to

use in the formation of an image point. These subscripts index a kernel based on the local

image characteristics about the point to reconstruct. The family of kernels is given by

{kc, [n,,n2] I c = 1,2,---,C;l = 1,2,. -.,L} C represents the number of established local

image characteristics (or features) from which to compare local neighborhood information

and L is the number of kernels created per feature. In summary, eqn. (4-1) is a

convolution with a shift-varying kernel whose form is dictated by local image information.



It is a generalization of eqn. (3-7) and defaults to the standard convolution of eqn. (3-7)

when C,L = 1. Hence, the common approaches to reconstruction which utilize eqn. (3-7)

are a special case of the reconstruction implemented herein which is described by eqn. (4-


There are two main issues to address regarding the superresolution of optical


how to design members of the kernel family

how to choose them once they have been established

These issues will be addressed in this chapter. In approaching the superresolution

problem, we first model the image acquisition process. The model which results is used to

simulate low resolution representations of available images. In this manner, data at

different resolutions can be used in developing the basis functions for reconstruction that

we seek. We will examine and make observations concerning the existence of

interdependent information in optical images. This is followed by a detailed explanation of

a superresolution procedure which exploits the observations made.

4.1 Optical Image Acquisition Model

Let the function x(t, ti, t2) represent a continuous, time-varying image impinging

on a sensor plane. The spatial plane is referenced by the t,, t2 coordinate axes and time is

referenced by the variable t. The imaging sensor plane is assumed to be a grid of

contiguous N, x N2 rectangular sensor elements. These elements serve to sample the

spatial plane within the camera's field of view. Each of these elements is said to have a

physical size of p, x P2 units of area. The output of each element is proportional to the


amount of light which impinges on each sensor during a given time interval. The output of

each sensor, given by x,[n,,n2] where n, = 0,l,...,NA -1 and n2 = 0,1,...,N2 -1, can then

be expressed as

x, [n, n2]= JP(n+ JP2(n2+1)x(tt t)dt2dtdt

where the integration over time is one time-unit in duration. The subscript 1 is used to

denote a "low" resolution image.

To obtain a higher resolution image, a finer sensor grid encompassing the same

field of view used in obtaining x, [n, n2] would have to be employed. Let the resolution in

each spatial dimension be increased by a factor of G, and G2 in their respective spatial

dimensions. The physical size of the sensor elements now becomes -lL x P- units of area.

The high resolution image is then given by xjh[m,m2], where m, = 0,1,...,M1 -1 and

m2 = 0,1,..., M2 1 and M, = GiN (i = 1,2). The output for each of the MI x M2 sensor

elements for the high resolution image can be described by

SG|G2 pi(m+I1P)/G (Pm2+1)/G2
Xh[" m= pm Jm/G2 x(t,t =,t2)dt2dtAdt
X m 0 plm/G m2/G2

Notice that the integration limits over time have been extended from one time-unit to

G,G2 time-units in order to maintain the average intensity value for each pixel in the


The superresolution process this work addresses is to estimate the high resolution

image xh [m~, m ] from the low resolution image x, [nj, n2]. One can notice that the process

of acquiring x,[n, n2] from xh[ n, m ] is given by

1 G,(n +l)-I G,(n,+l)-
xi[n-,n2= xh 1 I 2] (4-2)
SGIG2 m,=G,n, n2=G2n2


The decimation model in the above equation produces a low resolution image by

averaging the pixels of G, x G2 nonoverlapping neighborhoods in the high resolution


4.2 Using Local Information

Support for the local characterization of images comes from several sources.

Image interpolation kernels typically have very local support regions. This was discussed

in Chapter 3. In the Shannon sampling theory, the weighting of samples was seen to be

inversely related to distance from the point to interpolate. As such, a high resolution

image sample is, for all practical purposes, generated from local low resolution samples.

Also, the use of finite support region kernels is attractive due to their numerical stability

and computational efficiency.

If we assume an image can be described locally by different statistical models, then

we must somehow characterize information concerning each of the models which will be

useful. In this section we experimentally study the interdependencies of local information

in images and discuss their application to the superresolution problem. Note that we are

referring to relations that exist among image neighborhoods when we speak of the local

interdependencies of images.

4.2.1 Relation Between Correlation and Euclidean Distance

In the sections to follow we will exploit the locally correlated neighborhoods in

images for our superresolution paradigm. This section serves to detail the correlation

under study and how it relates to Euclidean distance. Here, we introduce a

correspondence between statistical correlations of random variables and inner products of

vectors and also between statistical mean-square differences and squared distances


between vectors. A good development of this topic can be found in the text by Helstrom

[1991]. This correspondence will serve to identify the analogous relations between

samples of our observed data and the true statistical character from which they were

drawn. The relations exploit the principle of orthogonality (as defined for both Cartesian

space as well as for random variables) as we shall see.

Let u,v denote two random variables and u,v two vectors in Cartesian space. The

inner product of u,v is u'v and the length of a vector u is given by its 2-norm

lull2 = (uTu)7. The Cartesian space distance D between vectors u,v based at the origin is

defined by

D2 = (u-v)T(u-v) = uTu -2uTV+vTv

and in the "space" of our random variables the distance D is defined correspondingly as

D2 = E[(u-)2]= E[u2]-2E[uv] + E[v2]

where E[-.] denotes the statistical expectation operator. We will make use of the Schwartz

inequality [Helstrom, 1991] shortly. For vectors, this inequality is given by

(u T)2 T< uTuvT

and for expected values of random variables it is

(E[uv]) 2< E[u2]E[v2].

In an ordinary vector space, the triangle inequality Ilu + v2 I 2 + V1112 is a consequence

of the Schwartz inequality. The triangle inequality can be expressed in the following

manner by noting that cos 0 1:

I u+v1i2 = UTu + vVv + 211u112 +V2cos0 S u vu + vTV + 211u112 lvl12 1 11112 + 211u112 Iv112 + 11V112


In the space of random variables, the corresponding triangle inequality yields the

analogous relation

E[(u + v)2 ] E[u2]+2 (E[u2 ])(E[v2])+ E[v2]

We notice above the analogous relations between E[u2] and I|u|2 = u Tu. These relations

justify considering the mean square value E[u2] as the square of the "length" of a random

variable and the correlation E[uv] as a scalar product. Also note that the definitions of

orthogonality apply for these relations too. Two vectors are said to be orthogonal if

uTv = 0. Similarly, one says that random variables are orthogonal if E[uv] = 0.

We now discuss the distance measure previously defined for vectors and random

variables. We will show an important relationship between inner products and distance in

Cartesian space as well as correlations and distance in the analogous "probability space."

Let us now consider the relationship between minimization of Euclidean distance

with the maximization of inner products in a vector space. Consider a set of vectors

x,fi,...,fu where f; is closest in Euclidean distance to x, that is,

fi-x <1 f, -x 22, iJ. (4-3)

We can rewrite this as fiTf; 2ff"x < fjff 2f x Notice that if all the fk vectors are of

equal length, we can equivalently express eqn. (4-3) as

fiTx T ffx, i # j. (4-4)

If the fk vectors are not of equal length, then eqn. (4-3) is equivalent to

fifx-fiTfi >ffx-fff,.

In either case, minimizing the Euclidean distance between the x and fk vectors is

equivalent to maximizing the inner product of those vectors if the length of the fk vectors


are all equal. If the fk vectors are not of equal length, then the square of the vectors'

length must be considered.

A similar result arises when minimizing the mean squared distance of random

variables. Consider the random variables x, f,..., fN where f, results in the minimum mean

squared difference between the random variables f,,..., and x, that is,

E[(fi -x)2 ] E[(fj -x)2], i j. (4-5)

We can rewrite this as E[f7]-2 E[fx] 5 E[f2]- 2E[f x]. Notice that if all the random

variables fk have equivalent second order moments or squared "lengths", then we can

equivalently express eqn. (4-5) as

E[fx] E[fx]

If the random variables fk do not have equivalent second order moments, then eqn. (4-5)

is equivalent to

E[fx]--E[f2] > E[fjx] E[fj]. (4-6)

We see that minimizing the mean squared difference between x and the random variables

f,,...,f is equivalent to maximizing their correlation if the second order moments of

f,..., fN are equivalent. If the second order moments of the random variables f1,..., fj are

not all equivalent, then their squared "length" must also be considered.

Let us now note that the minimization of Euclidean distance between vectors is the

template matching problem [Duda and Hart, 1973]. Template matching is a scheme used

for classification purposes. Let vector x be a portion of an image with support size equal

to that of the template. The fk vectors are the set of N templates from which we attempt

to classify image regions. The best match is found to be with the template maximally


correlated with the image region of interest. This correlation measure is nonstatistical and

is described in Chapter 7 of the text by Duda and Hart [1973].

A statistical interpretation of the template matching problem can also be

developed. Assume a random vector x = [x,,...,xM ]T can be described by a mixture

model as follows

pX (x) = P(k) p(x k) (4-7)
where ZP(k)=l and 0OP(k)< 1 for k=l,...,N. Also, let the mean of each
conditional density function p, (xlk) be fk= E[Xk] where Xk = [Xk ,...,XkM ] is the

random vector with conditional density px (xlk). The template matching problem can be

posed as finding the mean fk closest in Euclidean distance with an observation x from the

mixture density function. Specifically, let us define the template matching problem as

J= argmin(tr{ E(xk -f)(Xk -fj)T]}) (4-8)

where tr{-} is the trace of a matrix and the best matched template is given by fj.

Expanding eqn. (4-8) and evaluating terms, we can equivalently determine the best

matched template from

J = argmin(fff 2fffk) (4-9)

from which it is easy to show that J= k is the solution. Specifically, if we letj = k in eqn.

(4-9), then fT"fk -2f fk
yields 0 <(f -fk )T (f- fk) where j = k achieves the equality. So the best matched

template corresponds to the mean of the conditional density function which generated the

observation x from p, (x).


If the templates are now interpreted statistically, then the template matching

problem of eqn. (4-8) can be expressed as

J = arg min(tr{E[(x f)(x f)]})

given a random vector x. This can be shown to be equivalent to

J=argmax(E[ffx]--E[fff,]) (4-10)

which is seen to be analogous to that of eqn. (4-6). The solution to the template matching

problem here then involves the template most correlated with the input.

The two issues to stress here are the analogous relationships between random

variables and vectors involving distance metrics in each of their respective spaces. The

minimization of a squared distance measure is directly related to the maximization of a

correlation measure, that is, the distance measure is inversely related to the correlation

measure. In this way, observations from highly correlated random variables or random

vectors are close in Euclidean space. In particular, if the random variables and/or random

vectors are statistically independent and their covariances are assumed equal, then

maximal correlation implies minimal Euclidean distance in the mean sense of those random

variables. This can be seen by comparing eqn. (4-10) with eqn. (4-4) under these


4.2.2 Existence of Locally Correlated Information in Images

We can provide experimental evidence for the existence of different types of local

interdependencies in images which can be exploited for the superresolution of images.

The optical images for which results will be reported in this chapter are given in fig. (4-1).

Interdependent information in real-world images are known to appear locally and have


been exploited in compression schemes [Dony and Haykin, 1995a; Kambhatla and Leen,

1997]. These methods attempt to exploit the correlation amongst nonoverlapping image

blocks. Specifically, the blocks of the image are grouped into disjoint sets and

compression is performed on each set individually.

The next two sections will serve to illustrate interblock and scale interdependencies

that exist within the images of fig (4-1). These interdependencies form the basis from

which our superresolution methodology arises. It is the general existence of these

interdependencies in real-world optical imagery which substantiates the use of our

superresolution methodology. The methodology is described in detail later in this chapter. Interblock correlation

The interblock dependencies of images will be illustrated here. The benefits of the

existence of such dependencies is that the samples from highly correlated blocks tend to be

very similar as discussed in section 4.2.1. The similarity between these blocks, which

can be located throughout an image, leads us to believe that the local statistics governing

their "generation" are also similar. The similar blocks are assumed here to have been

generated by the same statistical process. The marriage of interblock correlation and their

interdependencies across scales makes the superresolution problem approachable from

such local information. In order to substantiate the notion that an image can be described

by different local statistical models, a simple experiment was performed.




Figure 4-1. Images used in this chapter. (a) 256x256 images (top left) Lena, (top right)
Peppers, (bottom left) left Pentagon image of a stereo pair, (bottom right) right Pentagon
image of a stereo pair; (b) 128x128 images obtained by the image acquisition model (left
to right) Lena, Peppers, left Pentagon, right Pentagon.


All of the overlapping neighborhoods of size H, x H2 for the Lena 128x 128 image

of fig. (4-1b) were extracted. Each neighborhood then had its sample mean subtracted

from it yielding what we refer to as a "structural" neighborhood. It is the local structure

of images that we seek to statistically describe. By considering structural neighborhoods,

an image x[n ,n2] is then composed of two sets of ordered neighborhoods: one set

contains the neighborhoods' mean m[n ,n2] and the other set the neighborhoods'

structure s[n,, n2 ] such that x[n1,n2 ] =m[n ,n2 ] + s[nl, n2 ]. The extracted structural

neighborhoods which comprise s[n1,n2] were then grouped into K disjoint clusters

following a vector quantization (VQ) of these neighborhoods. The clustering was done a

la eqn. (4-3) where x now denotes a structural neighborhood and f,,...,fK are the feature

(codebook) vectors or Voronoi centers resulting from the VQ. The feature vectors can

each be interpreted as corresponding to the mean of various statistical models which

describe the structural neighborhoods. This was discussed in eqn. (4-7). After the

structural neighborhoods were clustered, the neighborhoods within each cluster were

randomly shuffled thus yielding a new set of ordered neighborhoods' structure 9[n1,n ].

If the clustered neighborhoods are truly similar in structure then their random shuffling

should still produce a pleasing image. Finally, the composition of a new image results

from i[n, ,n2] = m[nI, n2] + s[n, n2 ]. This type of operation could consider overlapping

and nonoverlapping neighborhoods. When overlapping neighborhoods are considered, the

overlapping samples from each neighborhood are averaged to yield the new image

samples. Note that the "reconstruction" performance obtained using this approach is

analogous to a test set performance for validating the existence of interblock correlations.


Fig. (4-2) illustrates the results of performing such a test with overlapping 3x3

(Hi=H2=3) neighborhoods on the Lena (Ni=128)x(N2=128) image.

(a) (b) (c)

Figure 4-2. Illustration of the effect of characterizing the Lena image based on the
interblock correlation of 3x3 neighborhoods. The image used was 128x128. Increasing
the number of clusters leads to increased correlation within the neighborhoods of a cluster
- thus yielding a more accurate local representation of the image. (a) K=1 cluster, (b)
K=15 clusters, (c) K=30 clusters.

With K=l in fig. (4-2a), no clustering is being performed since all neighborhoods

from the image are assumed to belong to the same cluster. The result of creating an image

in this manner yields a poor (and noisy looking) image when compared to the original. In

fig. (4-2b), K=15 clusters were formed from the structural neighborhoods of the image.

The resulting image produces a better image representation. Fig. (4-2c) illustrates the

result with K=30 clusters. This resulted in the most faithfully recomposed image upon

comparison of their peak signal-to-noise ratios (PSNRs). The PSNR is defined as

PSNR = -10log1o(eLs) where

N -IN,-1
e (x[n, n2 n,,n])2 (4-11)
n, =0 2=o

and x and i take values in [0,1] and are NIxN2 in size. Fig. (4-3) illustrates the

recomposition performance for the Lena 128x 128 image using regions of support (ROS)


of size 3x3 and 5x5 for K=l, ..., 30 clusters. We can see that generally the PSNR of the

new images increases with increasing number of clusters. The smaller the neighborhood,

the more accurate we can consider our image model when based on interblock

correlations. Though we only report these results on one image, it is generally appropriate

to characterize optical images in this manner. Scale interdependencies

We have just seen how an image can be segmented locally into pieces that are most

correlated with each other. Naturally, we can inquire about the existence of similar

information across image scales. If a strong similarity exists between homologous and

highly correlated regions of the low and high resolution images, then it seems feasible that

a simple transformation of the low resolution image neighborhoods within each cluster can

yield a corresponding high resolution neighborhood very similar to the low resolution

neighborhood's homologous counterpart.

This transformation is the link in associating low and high resolution

neighborhoods. In this section, we wish to analyze experimentally whether such highly

correlated information across image scales exists. That is, we wish to examine the

correspondence of highly correlated neighborhood information between a low resolution

image and its high resolution counterpart. We will show examples using the images

shown in fig. (4-1). The experiment will report on the percentage of homologous

neighborhoods from the low and high resolution counterparts of these images that were

similarly clustered. If a high percentage of these neighborhoods are found, then a strong

scale interdependence among neighborhoods is said to exist. We now explain the details

for establishing this percentage.


Recomposing Lena with Correlated Image Neighborhoods



27 -


---24- ----
24 ROS: 3x3
-- --- ROS: 5x5
23 -

22 ''
0 5 10 15 20 25 30
# of Clusters

Figure 4-3. Quantitative illustration of the effect of interblock correlation on the
recomposition of the Lena 128x128 image. The structural neighborhoods in each of the
clusters from this image were randomly shuffled and the image was "pieced" back
together. A very high interblock correlation results in an image very similar to the


Given a low and high resolution version of an image, x,[n, n2] and xh[n, n2]

respectively, the homologous structural neighborhoods in each of these images were

clustered to form K independent groups. The H1 x H2 neighborhoods in the N, x N2

image xi [nI, n2] forms the set of neighborhoods

X = {x, [m, :m, +H1 -1,m2 :2+H2 -l]}m =0,...,N,-Hm=0...,N-H

The homologous neighborhoods in the high resolution image are defined as the

GAH1 xG2 H2 neighborhoods in the (M1 = GIN) x(M2= G2N2) image xh[nI,n2 ] which

forms the set

D = {xh[Glml: Gm + G1,H -1, G2m2 : G2m2 +G2H2 -1]}m0,...,,-H,,=0,...,N-H,

where recall that x, [n1, n2] was obtained from xh [n1, n2 ] through decimation by a factor

of G, x G2 according to our image acquisition model.

Now the neighborhoods in X are clustered to form K disjoint groups X ,...,XK

and the neighborhoods in D are separately clustered to form K disjoint groups DI,..., DK.

If the homologous neighborhoods in X and D form similar clusters in their respective

images, then the information content of the low and high resolution images must be similar

in some sense. As alluded to earlier, the image is said to exhibit scale interdependence.

To determine how well clustered information from the same image relates across scales,

we form a confusion matrix as shown in fig. (4-4).

The entry in location (j,k) of the matrix is the number of homologous

neighborhoods common to cluster X. and Dk, j,k=l,...,K. The interdependence

across scales is determined as the maximum number of homologous neighborhoods

common to the clusters formed. Since the ordering of the true clusters or "classes"


between Xi and Dk is not known, we can't just examine the contents of the diagonal of

the confusion matrix.

Di D2 DK




Figure 4-4. Confusion matrix for clustered homologous neighborhoods within their low
and high resolution images. The X, are the disjoint sets of clustered neighborhoods in
the low resolution image and the Dk are the disjoint sets of clustered neighborhoods in
the high resolution image; j,k = 1,...,K.

Instead, we must search for the most likely ordering. This in turn yields the

number which reveals a measure of how similarly homologous information in the low and

high resolution images was clustered. This number is easily found with the following

simple algorithm:

Step 1: Initialize N=0;

Step 2: Find the largest number L in the confusion matrix and save its row
and column coordinates (r,c)

Step 3: Perform N<-N+L;

Step 4: Remove row r and col c from the confusion matrix to form a new
confusion matrix with one less row and column.


Step5: If confusion matrix has no more rows and cols: STOP else Go to
step 2

The variable N represents the number of homologous neighborhoods common to

similar clusters from the low and high resolution images. The percentage of such

clustered neighborhoods is P = N ;,-,) since there are a total of

(N, H + 1)(N2 H2 +1) homologous neighborhoods to cluster in each image.

In the plots below, this percentage is plotted as a function of the number of

clusters. This is done for three images: Lena, Peppers and the Left Pentagon. The high

resolution images clustered are 256x256 and their low resolution counterparts which were

obtained through (G, = 2) x (G2 = 2) decimation were 128x128. The plots also report on

two different neighborhood sizes tested: H, x H2 = 3 x 3 and H1 x H2 = 5 x 5. Figs. (4-

5)-(4-7) illustrate that there is a very strong interdependency of homologous

neighborhoods across image scales for these three images even as the number of clusters

increases towards K=30. The case of K=1 always yields an interdependency of 1. This is

because for K=I, no clustering is actually being performed, that is, all neighborhoods are

assumed to belong to the same cluster. As such, the "disjoint" sets (or single set in this

case) have all the homologous neighborhoods in common.

The interdependency in general decreases as the number of clusters increases. This

is intuitively expected as an increase in the number of clusters results in the clustering of

information increasingly specific to a particular image and scale. Because the frequency

content between the low and high resolution counterpart images differ, the greater

specialization of information within an image is expected to result in less interdependency

among them.


Scale Interdependencies for the Lena Image

----__ --_-, ,---- ----- .---



o 0.6



ROS: 3x3
.-- ROS: 5x5


5 10 15 20 25 30
# of Clusters

Figure 4-5. Scale interdependency plot for the Lena image. The interdependency of
homologous neighborhoods across image scales is reported. The high resolution image
was 256x256 and the corresponding low resolution image was 128x 128. Two ROSs are
reported for computing the interdependency: 3x3 and 5x5. A very high interdependency
among homologous neighborhoods is seen for this image even when considering K=30


Scale Interdependencies for the Peppers Image

0.7 -
L--- ----------


6 0.5-


-- ROS: 3x3
..... ROS: 5x5


5 10 15 20 25 30
# of Clusters

Figure 4-6. Scale interdependency plot for the Peppers image. The interdependency of
homologous neighborhoods across image scales is reported. The high resolution image
was 256x256 and the corresponding low resolution image was 128x 128. Two ROSs are
reported for computing the interdependency: 3x3 and 5x5. A very high interdependency
among homologous neighborhoods is seen for this image even when considering K=30


Scale Interdependencies for the Left Pentagon Image

0.9 -


0 .6 .... --------

1 0.5-

2 0.4-

0 ROS: 3x3
--- ROS: 5x5


5 10 15 20 25 30
# of Clusters

Figure 4-7. Scale interdependency plot for the Left Pentagon image. The interdependency
of homologous neighborhoods across image scales is reported. The high resolution image
was 256x256 and the corresponding low resolution image was 128x128. Two ROSs are
reported for computing the interdependency: 3x3 and 5x5. The interdependencies among
homologous neighborhoods are not as strong in this image as compared with the Lena and
Peppers images but still remain above 50% even when considering K=30 clusters.


Fig. (4-5) illustrates the scale interdependencies for the Lena image. These remain

very high even when considering K=30 clusters. In this case, the interdependencies

achieved were all above 75%. Fig. (4-6) reports on the Peppers image. This image yields

interdependency percentages very much like those of the Lena image. Fig. (4-7) illustrates

the interdependencies of the Left Pentagon image. Here, the interdependencies are seen to

drop more rapidly as the number of clusters formed increased than with the previous two

reported images. Further, there is roughly half as much interdependency with the Left

Pentagon image as with the Lena and Peppers images going down to about 55% for the

case of K=30 as compared to 75% with the Lena and Peppers images.

4.3 Architecture Specifics

The modular design of the superresolution architecture to be presented is

supported by the observations of the local interdependencies in images that have been

discussed. Portions of images exhibiting different statistical characteristics are treated

differently in order to fully utilize the local image information and provide for a

reconstruction not afforded by the sampling theorem. The procedure for learning the a

priori information needed for the superresolution of optical images is shown in fig. (4-8).

This procedure, which will be discussed in detail in the next section, establishes the

basis functions for reconstruction from the available image data. The information

extracted is used in the superresolution of images that are similar in class to those from

which our basis functions were derived.

High Resolution

L 1

F--------------.---------------- Extract Neighborhoods L

High Resolution L 2
iSelf Organize the
GI x G2 Extract Neighborhoods Neighborhoods to
V Form C Clusters o

--- --------------- ------ ------ -...-- -- ------------------ -----..................-- --- L C

Low Resolution Low Resolution C Neighborhood /
Image Neighborhoods Clusters

Association of

Figure 4-8. Training process for the superresolution of optical images.


Fig. (4-9) illustrates the architecture used for superresolving images once the

training is complete, that is, once the necessary information has been established. It is

equivalent to a convolution with a family of kernels as presented in eqn. (4-1). Here, we

wish to initially present this paradigm in a fashion which facilitates the explanation of our

family of kernels later on. As we proceed, we will explain any information not cast in a

manner directly conforming to eqn. (4-1) and relate it to that equation's form.

There are two fundamental steps specific to the superresolution of optical images:

clustering of local low resolution image data

construction of local high resolution data.

The first step can be viewed as the process of determining the neighborhoods most

correlated with the mean (or "template" as discussed in section 4.2.1) of the underlying

distribution from which those neighborhoods were drawn. It encompasses the operations

of extracting all of the image neighborhoods of a predetermined size from the low

resolution image and then clustering them. This clustering addresses the main issue of

choosing a kernel. The predetermined neighborhood size describes the region of support

of our family of kernels, the location of each neighborhood relates to the location of the

convolution output and the cluster associated with each neighborhood will be used to

select the kernel with which to compute superresolved neighborhoods.

The second step exploits the interdependent information across scales which has

been seen to exist in images. It involves the depicted linear associative memories (LAMs)

and proper arranging of superresolved neighborhoods. Here we have addressed the main

issue of designing the members of the kernel family.


AM w Resolution -- Arrange Superresolved
Low e R Extract Neighborhoods Form C Clusters Neighborhoods
into Image

0 ------------------------

LAMC-- Image

Figure 4-9. The superresolution architecture for optical images. This paradigm performs the equivalent operation of a convolution
with a family of kernels.



Each LAM is encoded with the mapping that locally creates high resolution

neighborhoods from low resolution neighborhoods via L transformation kernels. There

are C LAMs and each is tuned to specific local image characteristics. Note that each

LAM superresolves a neighborhood rather than just a single pixel.

4.4 Training Phase

The basis functions from which to project our image samples are specified by the

LAMs in fig. (4-9). They are determined via a training process from available data. The

training procedure for the superresolution paradigm was pictured in fig. (4-8). It will now

be discussed in detail. This procedure can be broken into two main sections: a data

preprocessing section and a high resolution construction section. The preprocessing takes

all of the local neighborhoods of a fixed size in the low resolution image and clusters them.

Each low resolution neighborhood has a corresponding homologous high resolution

neighborhood it is associated with. Each cluster consists of a training and a corresponding

desired data set which is used to train a LAM. The high resolution construction is

responsible for associating the training and desired sets in each cluster independent of the

others. This information is now detailed.

4.4.1 The Preprocessing

Two preprocessing steps are performed prior to the construction of high resolution

images from low resolution information. The preprocessing consists of simulating low

resolution images followed by neighborhood extraction for pairing homologous


84 Decimation

Ideally, the low and high resolution data sets used to train the aforementioned

LAMs would each encompass the same scene and have been physically obtained by

hardware with different, but known, resolution settings. Such data collection is not very

common. Instead, the low resolution counterparts of the given signals are simulated.

Decimation is only needed to simulate the low resolution counterparts to our high

resolution images. If we had, a priori, the necessary two sets of images, this block would

be removed from the training process depicted in fig. (4-8). Here, the simulated low

resolution images will be obtained following the image acquisition model discussed earlier.

In other words, decimation will be referred to as the process of averaging nonoverlapping

image neighborhoods in order to obtain a lower resolution image, A la eqn. (4-2). Neighborhood extraction

Each Hi x H2 neighborhood of our low resolution image x, [nI ,n2] will be used in

training our superresolution system. A low resolution image of size N, x N2 will then

have (N, H + 1)(N2 -H2 +1) neighborhoods. Note that N, = Mi / Gi and Mi,

(i = 1,2), describes the size of our original image. Each low resolution neighborhood will

be associated with its (2G, -1) x (2G2 -1) homologous high resolution neighborhood.

We have chosen to associate the low and high resolution neighborhood information in a

symmetrical fashion. The homologous neighborhoods we are associating are defined as

follows. The set of (N1 H1 + 1)(N2 H2 +1) neighborhoods in the low resolution image

of size H, x H2 are given by

X = x,[m ,: m, + -1,m2 :m2 +H2 -1]} Lm,...,N-H,,m=O,...,NH (4-12)

The homologous neighborhoods to associate with these are described by


fxh[Glm +m, +1: G,(ml+2)+-1 -1,
S G22 2+1:G(m22)+2 -1]J ,=0,..., N,-H,,m2=0,...,N2-H, (4-13)

where bi =- (H-3) and (i = 1,2). Hi and H2 are restricted to being odd numbers so that we

can construct about each low resolution neighborhood's center. We also only construct

high resolution information that is contained within the boundary of a low resolution

neighborhood. The association of homologous neighborhood information described by

eqns. (4-12) and (4-13) is depicted in fig. (4-10).

0 0 0 0*oo0ooe

0 X4 5 6

7 8 9 0 0 z 0 0 0 0
0 0 i 0 oo*oo*
(a) (b)

Figure 4-10. Local image neighborhoods and the pixels they reconstruct. Each circle
represents a 2D high resolution image pixel. The shaded circles are the low resolution
image pixels obtained via decimation of the high resolution image. The gray pixel is the
center of the low resolution neighborhood. Each H1 x H2 low resolution neighborhood
constructs a (2G1 -1)x (2G2 -1) high resolution neighborhood about the low resolution
neighborhood's center these are depicted by the crossed circles. The numbers are a
convention used to distinguish between constructed pixels in this neighborhood. (a)
Decimation factor G, = G2 = 2. (b) Decimation factor G, = G2 = 3.

Consider the center pixel of an H1 x H2 low resolution neighborhood. For the

case of G, = G2 = 2 in fig. (4-10a), we are constructing the crossed circles about our

center (gray) pixel. By not constructing the center pixel, we are strictly interpolating

about our observed image samples. If we allow for construction of the center pixel, we


can change a 'noisy' observed sample. Fig. (4-10b) similarly illustrates this for the case of

G, = G = 3.

The selection of the low resolution neighborhood size is a trade-off between the

amount of local support to consider and the amount of information needed to construct.

As a rule of thumb, one should set H, > 2G, 1 but should keep H,, (i = 1,2), reasonably

small to preserve sufficient low resolution support associated with the high resolution

neighborhood. As G, increases, there is more missing information to construct, hence

more low resolution sample support is needed. H, should not be set too high as we want

to keep the support as local as possible. We will see the effects of this later when the

superresolution results are presented. Also, the smaller the H,, the smaller the number of

free parameters needed in the LAMs.

An important note to consider here is that overlapping neighborhoods construct

some of the same high resolution samples. We must remember that our high resolution

sample construction is a procedure whereby we estimate the samples of a high resolution

image from a low resolution representation. In essence, we would like to obtain more

than one estimate of each high resolution sample point. By averaging several high

resolution sample estimates we believe the final high resolution sample estimated is more


4.4.2 Hard Partitioning the Input Space

The input space consists of the neighborhoods in our low resolution image to be

superresolved. Because we assume no prior information about our image, we cluster (or

hard partition) our input space based solely on these neighborhoods. The segmentation is

based on the interblock correlation among neighborhoods of our image. In order to group


(cluster) the image neighborhoods that are most correlated with one another, we interpret

each H1xH2 sized neighborhood as a point in H1H2 dimensional Euclidean space, that is,

as a vector from the origin of our space to the coordinates given by the neighborhood

values. As previously discussed, the most correlated neighborhoods will be closest to

each other in our HH2 dimensional vector space.

The hard segmentation we seek is equivalent to a vector quantization (VQ) of our

input space based on minimizing an L2 norm criteria. Techniques which have been

developed to perform such VQ include the LGB [Linde, et al., 1980] and Neural Gas

[Martinez, et al., 1993] algorithms as well as the topologically ordered Kohonen self-

organizing feature map (SOFM) approach [Kohonen, 1990]. These approaches are

considered to be unsupervised learning techniques which produce a set of feature vectors

also referred to in the literature as quantization nodes or codebook vectors. The results

reported in this work utilize Kohonen's approach. This is now described. Kohonen's self-organizing feature map

The principal goal of Kohonen's self-organizing feature map (SOFM) is to

transform a set of input exemplars to a discrete set of representative vectors. The

algorithm is a topologically ordered vector quantization (VQ) on the set { x,. }, of

exemplars. The representative vectors are typically termed codebook vectors, feature

vectors and/or quantization nodes. This ordering serves two purposes: during training it

encourages a full utilization of the codebook vectors while they're evolving and once the

vectors have been established, it allows for "close" input exemplars to also be "close" in

the representative set.


The feature vectors in the SOFM can be obtained using a simple iterative scheme.

After randomly initializing a set of feature vectors fe, c = 1,...,C, the input exemplars are

compared with the feature vectors one at a time. The best matching feature vector, in the

Euclidean sense, is sought. This is given by eqn. (4-14) as

Y(x,.)=argmin( x,-fc 2); c = 1,2,...,C (4-14)

This "winning" vector and its neighbors are then updated by

fc (n+1)==f(n)+ t(n)[x,'-fj(n)]; cE A(,,)(n)
Cfc f(n) otherwise

where pt is the learning rate parameter, Ayx,,) is the neighborhood function centered

around the winning feature vector y(x,) and n describes the discrete time step (iteration).

The learning rate and neighborhood function are gradually decreased during training for

best results. Cluster formation

The feature vectors completely define the hard partitioning of our input space. All

image neighborhoods closest in Euclidean distance to one of the feature vectors are said to

belong to a cluster. Mathematically, given a set of N vectors { x, }x from which C

feature vectors { f }c have been extracted, the C clusters Kc, c = 1,2,...,C formed by the

vectors are given by

Kc ={x, :y(x,)= c;r=1,2,...,N}

and the hard partitioning function spoken of earlier is y(xr) of eqn. (4-14).

Generally, the feature vectors established from the available data reflect the

statistics of the distribution of the data vectors. Regions in the input space of the input


data vectors with high probability will be represented by a subset of the feature vectors.

This leads to a natural yet compressed representation of our data with which to compare

test data sets against. In this work, our feature vectors are obtained by performing a VQ

on the local structure of neighborhoods x, of our low resolution training images using the

SOFM approach.

4.4.3 Associative Memories

The associative memories considered here are essentially parametric mappings that

optimally associate between a set of input exemplars X = {x, }$, and a set of desired

exemplars D = { d, }N, [Haykin, 1994]. These sets of exemplars can be represented as the

columns of a matrix, X and D respectively. The optimal association is typically defined in

the least squares or mean squared sense. The process of arriving at the optimal

parameters is accomplished iteratively when no closed form solution is available. This

iterative process for optimal association between data sets (matrices) X and D is known as

supervised learning in neural networks. The two supervised learning topologies discussed

in this section are the linear network and the multilayer perceptron. Results using these

topologies will be presented later. Least mean squares

Consider the minimization of the following cost function

J(W) = ||D- WX|| (4-15)

where D e 9PxN, W e 9RPxQ, X e 9QxN and 11-11 is the Frobenius matrix norm [Golub and

Van Loan, 1989]. We are trying to find the matrix W such that matrices X and D are

associated optimally in the least squares sense. Let us denote the rth column of matrices D


and X by dr and xr, respectively, where r = 1,...,N. Also, let Y=WX. The rth column of Y

is then y, = Wx,. We will now solve for W. Rewriting eqn. (4-15), we have

J=Z Idr Yr,2
= (d, -yr)T(d,. -yr)
r=1 (4-16)
=Z(dT dr 2dTYr YTYr)
(d -2d, y -y,"y,)
Z(dld, -2drWxr, -x Wx,)

where the dependence of J on W is understood. To solve for W, we must take the

derivative ofJ with respect to W and set the resulting equation to 0, that is,

S. : =0 (4-17)

Notice that Wpq, where p = 1,...,P and q = 1,...,Q, is an element of matrix W, that is,

W =[Wpq]. Then, from eqns. (4-16) and (4-17)

w-= JL (drd, -2drWx, + xTW'Wx,)
w aw Z dr Wx r+TWTWXr)j

= (-2- (dWx,.) + -(xWTWx,)) (4-18)

= (-2drXT +2Wx,xT)

where it can be shown that _(a'Wb) = aTb and -(aTW Wa) = 2WaaT. Setting eqn.

(4-18) to 0 and multiplying both sides by -L we obtain

W Nj r NXT T d X (4-19)
-=1 =1-


The summations of the outer products in eqn. (4-19) are equivalent to matrix

multiplications. Eqn. (4-19) is rewritten as

W(iXXr)= (UDXT). (4-20)

Solving for W we obtain our solution as

W=DXT(XXT)-l. (4-21)

A more commonly used approach to obtain the closed form solution of eqn. (4-21)

makes use of the singular value decomposition (SVD) of XT [Golub and Van Loan,

1989]. This approach is more stable since it does not require finding the numerical inverse

of a matrix. In this case,

W = U +VT (4-22)

where XT= U2VT is the SVD of XT, y = diag(,- -,-. ,0,...,0) and the nonzero

singular values of X are denoted by ao, i = 1,...,s.

The solution to the least squares problem can be arrived at iteratively via the least

mean squares (LMS) algorithm [Haykin, 1994]. The weights are iteratively updated as


W(n +1) = W(n) + p(D Y)X (4-23)

where p. is the learning rate which is a small positive constant and n is the discrete time

step. It is easy to see that this iterative scheme yields the solution of eqn. (4-21). When

eqn. (4-23) reaches its global (and only) minimum, then at "steady state" W(n+1)=W(n)

which implies that (D- Y)XT = 0. Now by substituting Y = WX as defined earlier, we

can solve for W which yields eqn. (4-21).