- Permanent Link:
- http://ufdc.ufl.edu/AA00040853/00001
## Material Information
- Title:
- Focus of attention based on gamma kernels for automatic target recognition
- Creator:
- Kim, Munchurl, 1966-
- Publication Date:
- 1996
- Language:
- English
- Physical Description:
- ix, 184 leaves : ill. ; 29 cm.
## Subjects
- Subjects / Keywords:
- Datasets ( jstor )
- False alarms ( jstor )
- Image processing ( jstor )
- Learning ( jstor )
- Neural networks ( jstor )
- Outliers ( jstor )
- Pixels ( jstor )
- Radar ( jstor )
- Signals ( jstor )
- Stencils ( jstor )
- Automatic tracking ( lcsh )
- Dissertations, Academic -- Electrical and Computer Engineering -- UF
- Electrical and Computer Engineering thesis, Ph. D
- Image processing -- Digital techniques ( lcsh )
- Tracking radar ( lcsh )
- Genre:
- bibliography ( marcgt )
- non-fiction ( marcgt )
## Notes
- Thesis:
- Thesis (Ph. D.)--University of Florida, 1996.
- Bibliography:
- Includes bibliographical references (leaves 177-183).
- General Note:
- Typescript.
- General Note:
- Vita.
- Statement of Responsibility:
- by Munchurl Kim.
## Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. §107) for non-profit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
- Resource Identifier:
- 023813159 ( ALEPH )
- 35754063 ( OCLC )

FOCUS OF ATTENTION BASED ON GAMMA KERNELS FOR AUTOMATIC TARGET RECOGNITION

By MUNCHURL KIM

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1996

To my mother, who loved and supported her son with a great deal of endurance in spite of many difficulties during the period of her son's study.

ACKNOWLEDGEMENTS

I would first like to thank my advisor, Dr. Jose C. Principe, for his patient guidance and inspiration during my study. His availability to his students and the abundance of the ideas he offered are irreplaceable. Special thanks are due to Dr. John G. Harris for constructive comments on this study. I also thank my committee members Dr. William W. Edmonson, Dr. Jian Li, and Dr. Joseph N. Wilson for serving on my supervisory committee. I would like to thank John Fisher II for sharing his broad knowledge and large experience of radar with me and for helping me conduct ATD/R research in the Computational NeuroEngineering laboratory. Special thanks are due to Frank Candocia for his careful proofreading of this dissertation. I am especially grateful to my parents who supported and prayed for their son with a great deal of love. Finally, I would like to thank my wife, SoYoung Lee, and my son, Mujoon, for their love during the course of my studies.

TABLE OF CONTENTS

- ACKNOWLEDGEMENTS
- ABSTRACT
- CHAPTERS
- 1 INTRODUCTION
- 1.1 Automatic Target Detection/Recognition Technology
- 1.2 A Multi-stage Automatic Target Detection/Recognition System
- 1.3 Overview of the Dissertation
- 2 SYNTHETIC APERTURE RADAR (SAR) DATA DESCRIPTION
- 2.1 Introduction
- 2.2 SAR Image Formation
- 2.2.1 Range Processing
- 2.2.2 Azimuth Processing
- 2.3 ISAR Image Formation
- 2.4 Preprocessing
- 2.4.1 Polarimetric Clutter Model
- 2.4.2 Polarimetric Whitening Filter (PWF) Processing
- 2.4.3 Preprocessing SAR Imagery
- 2.5 SAR Image Visualization
- 2.6 Examples of SAR Images
- 2.7 Target Embedding Strategy
- 2.7.1 Development of a Target Embedding Method
- 2.7.2 Embedding the TABLIS 24 ISAR Target Data into the Mission 90 Pass 5 SAR Data Set
- 3 PRESCREENERS
- 3.1 Introduction
- 3.2 A One-parameter Constant False Alarm Rate (CFAR) Detector
- 3.3 A Two-parameter CFAR Detector
- 3.4 Extension to the Two-Parameter CFAR Detector
- 3.4.1 Introduction
- 3.4.2 Gamma Filter and Gamma Kernels
- 3.4.3 2-D Extension of 1-D Gamma Kernels
- 3.4.4 Gamma CFAR (γCFAR) Detector
- 3.4.5 Implementation of the γCFAR Detector
- 3.5 Receiver Operating Characteristics (ROC)
- 4 QUADRATIC GAMMA DETECTOR
- 4.1 Introduction
- 4.2 Discriminant Functions
- 4.2.1 Linear Discriminant Functions
- 4.2.2 Generalized Linear Discriminant Functions
- 4.3 Extension to the γCFAR detector
- 4.3.1 Training the QGD
- 4.3.1.1 Closed Form Pseudo-Inverse Solution
- 4.3.1.2 Iterative Solution Based on Gradient Descent
- 4.3.2 1-D Implementation
- 4.4 Comparison of the QGD with the CFAR and the γCFAR Detectors
- 4.5 Artificial Neural Networks
- 4.5.1 Introduction
- 4.5.2 Multi-layer Perceptrons (MLPs)
- 4.5.3 Training MLPs
- 4.5.3.1 On-Line Learning
- 4.5.3.2 Learning Rate and Momentum
- 4.5.3.3 Batch Learning
- 4.5.4 Validation of A Neural Model
- 4.5.4.1 Bias/Variance Dilemma
- 4.5.4.2 Network Complexity and Early Stopping
- 4.6 Nonlinear Extension to the QGD (NL-QGD)
- 4.6.1 Introduction
- 4.6.2 Training the NL-QGD
- 5 TRAINING STRATEGIES FOR NL-QGD
- 5.1 Introduction
- 5.2 Optimality Index
- 5.2.1 L2 Norm
- 5.2.2 Training with Excluding Outliers from Non-Target Class with L2 Norm
- 5.2.3 Lp Norm
- 5.2.4 Mixed Lp Norm
- 5.2.5 Cross Entropy
- 6 EXPERIMENTS AND RESULTS
- 6.1 Introduction
- 6.2 Prescreening SAR Imagery
- 6.2.1 Two-Parameter CFAR Processing
- 6.2.2 γCFAR Processing
- 6.2.2.1 Optimal Parameter Search
- 6.2.2.2 Impact of Stencil Size in False Alarm
- 6.2.2.3 Batch-running the γCFAR Detector
- 6.2.3 Performance Comparison of the Two-Parameter CFAR Detector and the γCFAR Detector
- 6.2.4 Conclusion
- 6.3 False Alarm Reduction
- 6.3.1 CFAR/QGD
- 6.3.1.1 Training QGD
- 6.3.1.1.1 Training Data Preparation
- 6.3.1.1.2 Optimal Weights by the Closed Form Solution
- 6.3.1.1.3 Testing the QGD
- 6.3.1.2 Training the QGD in an Iterative Manner
- 6.3.1.3 Independent Testing Results for the QGD
- 6.3.1.4 Conclusion
- 6.3.2 CFAR/NL-QGDs
- 6.3.2.1 L2 Based Training and Testing of the NL-QGDs
- 6.3.2.2 Training and Testing of the NL-QGDs without Non-Target Outliers
- 6.3.2.3 Lp Based Training and Testing of the NL-QGDs
- 6.3.2.4 Training and Testing of the NL-QGDs with Mixed Lp Norms
- 6.3.2.5 Training and Testing of the NL-QGDs with Cross Entropy
- 6.3.3 Summary of Detection Performance in the CFAR/QGD and CFAR/NL-QGDs
- 6.3.4 γCFAR/QGD and γCFAR/NL-QGDs
- 6.3.5 Fast Implementation of γCFAR/QGD and γCFAR/NL-QGDs
- 7 CONCLUSIONS AND FUTURE WORKS
- 7.1 Summary
- 7.2 Future Works
- APPENDIX
- POLARIZATION BASIS TRANSFORMATION
- A.1 Representation of a Plane Wave Polarization
- A.2 Circular-to-Linear and Linear-to-Circular Polarization Basis Transformations
- REFERENCES
- BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

FOCUS OF ATTENTION BASED ON GAMMA KERNELS FOR AUTOMATIC TARGET RECOGNITION

By MUNCHURL KIM

August 1996

Chairperson: Dr. Jose C. Principe

Major Department: Electrical and Computer Engineering

A multi-stage approach has been attractive to avoid prohibitive processing in automatic target detection/recognition (ATD/R) of sensor imagery. Multi-stage ATD/R systems usually implement a focus of attention stage to localize the regions of interest where possible target candidates are found, and apply a recognition stage only at the regions of interest selected by the focus of attention stage. The focus of attention stage rejects most clutter (uninteresting background) in sensor imagery and detects targets of interest with a high probability of detection (usually 100% detection). This dissertation addresses a novel approach to the design of a focus of attention stage in a multi-stage ATD/R system. The focus of attention stage consists of two subsystems: (1) a front-end detection stage in which a conventional two-parameter constant false alarm rate (CFAR) detector is extended to the gamma CFAR (γCFAR) detector.
The γCFAR detector relaxes the constraint of a fixed stencil size in the two-parameter CFAR stencil by using gamma kernels; (2) a false alarm reduction stage implementing a quadratic detector of the intensity features estimated with the gamma kernels, which is called the quadratic gamma detector (QGD). The QGD extends the two-parameter CFAR test with respect to: i) the stencil shape; ii) the features used in the decision function; iii) the selection of weights, which are not chosen a priori but are found through optimization. The QGD is further extended to a nonlinear adaptive structure (multi-layer perceptron) which is denoted by the NL-QGD. The training strategies of the NL-QGD are discussed in terms of detection theory. Several norms such as L8, L1 l/L8, cross entropy function, L2 with removal of non-target outliers during the training are implemented to train the NL-QGD. The effect of different norms is measured in terms of receiver operating characteristic (ROC) in a large data set of synthetic aperture radar (SAR) clutter (about 7 km²) with targets embedded. With these new criteria, the NL-QGD was able to surpass the performance of the QGD.

CHAPTER 1 INTRODUCTION

Automatic Target Detection/Recognition (ATD/R) is a challenging problem. The goal is to detect and recognize objects of interest in clutter-dominated imagery (e.g., from a forward-looking infrared radar, synthetic aperture radar, or laser radar). Early radar systems displayed all incoming information on a screen. Clutter, noise, and target amplitude variations were displayed simultaneously. Target detection was performed by human operators, visually monitoring image intensity variations in order to discriminate targets against background clutter and noise. These raw data displays are still incorporated, in some sense, into most major systems. The objective of automatic detection processing is to automatically detect targets and to provide target reports without human intervention.
Background clutter, which usually dominates sensor imagery, may be divided into two clutter types: natural clutter, which describes natural scenery (trees, bushes, grass, forest, etc.), and cultural clutter, which covers man-made objects (cars, bridges, power lines, buildings, etc.). Background clutter is typically neither stationary, ergodic, nor Gaussian, especially in high resolution imagery [55] [87]. Target signatures can vary depending upon viewing angle and posture. The difficulty of the ATD/R problem is ascribed to such complicated variations of target signatures and background clutter in sensor imagery.

1.1 Automatic Target Detection/Recognition Technology

Sensor technology and computing power have made great progress in forming and acquiring sensor data. However, relatively less progress has been made in ATD/R algorithms. Many ATD/R approaches have been proposed, which include detection theory [42], pattern recognition techniques [9] [11] [47], neural networks [3] [11] [76] [82] [87] [88], and model-based algorithms [21] [84] [85]. Detection theory is attractive for the ATD/R problem. When target signatures and background are described by statistical models, an optimal detector can be theoretically derived; that is, a required detection probability can be determined given a false alarm rate. The advantage of the detection theory approach is that target signatures and background clutter can be expressed in an efficient way by statistical parameters and optimal solutions can be derived. However, this approach requires that the statistical model be valid and analytically tractable for target signatures and background clutter. When the statistical model does not adequately describe real-life raw data, detection performance degrades. Pattern recognition representations typically involve feature extraction from targets and background. The features are essential to the target recognition process.
Distinction between different target types and background clutter should clearly be based on the extracted features. While many such efforts have been made to solve the ATD/R problem, none so far has succeeded because variations in both target signature and background clutter contribute to the difficulty of the ATD/R problem [47].

1.2 A Multi-stage Automatic Target Detection/Recognition System

Besides the significant variation of target signatures and background clutter, which adds to the difficulty of the ATD/R problem, ATD/R systems usually have to deal with prohibitive amounts of image data. Furthermore, it is attractive to seek the construction of a single algorithm which exploits all of the information of high resolution imagery and solves the ATR problem. However, the single-algorithm approach is computationally too expensive, and high resolution imagery is difficult to model accurately and hence is poorly understood. Due to the prohibitive amount of sensor imagery to be processed, real-time processing requirements mandate efficient algorithms and powerful processing architectures. The multistage approach becomes an attractive alternative because it progressively reduces the number of interesting areas of the image and narrows down their consideration in the further stages, allowing a recognition algorithm to avoid the processing of entire images [3] [11] [57] [58] [60] [76] [86]. Figure 1 shows a conceptual flow of image data processed in multistage ATR systems.

[Figure 1: Data processing flow and algorithm complexity of a multi-stage ATD/R system. Footnote: the algorithm complexity does not necessarily increase linearly with the processing steps.]

The detection stage can be thought of as a data reduction stage.
A simple prescreening algorithm in the detection stage operates over the entire imagery and selects regions of interest (ROIs) where all target-like objects are found, and passes the locations of the ROIs to the recognition stage for further consideration. The recognition stage deploys a recognition algorithm only over the ROIs, rejects non-targets and recognizes the remaining targets. The Lincoln Laboratory baseline ATR system [57] consists of three stages: the prescreening stage (or front-end detection stage), the discrimination stage, and the classification stage. A two-parameter constant false alarm rate (CFAR) detector serves as a prescreener in the front-end detection stage and locates all possible target-like objects based on pixel intensity. The discrimination stage receives the locations in which target candidates are found from the prescreener and rejects natural clutter [9] [57]. In the classification stage, the classifier rejects cultural clutter and assigns the remaining objects (targets) to one of a finite number of categories. Figure 2 depicts our multi-stage approach to the ATD/R problem. The focus of attention block in Figure 2a can be thought of as a data reduction stage because only regions of interest in the input imagery are passed to the classification stage. The focus of attention is very important in multi-stage ATD/R problems in the sense that the performance of the focus of attention stage impacts the global performance of ATD/R systems in terms of detection rates and processing power. This dissertation addresses the focus of attention for a multi-stage ATD/R system depicted in Figure 2b. We will consider the well known two-parameter CFAR detector [57] from a signal processing perspective.
The two-parameter CFAR detector estimates the mean and variance from a locally defined region at a distance from a test pixel, and thresholds the difference between an estimated target mean under test and the local mean, normalized by the local variance. Here, the two-parameter CFAR test statistic can be thought of as a two-moment decomposition by the local operators (two windows) of the CFAR stencil, using a priori knowledge from detection theory. A new two-parameter CFAR structure is proposed which decomposes the image based on a gamma kernel basis, which is called the γCFAR detector. The γCFAR statistic is a two-moment decomposition which is a projection of the image onto a set of basis functions which are the 2D extensions of the integrands of the gamma functions. These integrands are called the gamma kernels [19] [64]. There is a free parameter in this kernel set that controls the region of support of the kernels. Then, the two-parameter CFAR detector will be further analyzed and generalized to the Quadratic Gamma Detector (QGD) [63], which is designated to be used in the false alarm reduction stage.

[Figure 2: A multistage approach to ATD/R problems. a) A multi-stage ATD/R system: input sensor imagery, focus of attention stage, classification stage, target categories. b) An implementation of the focus of attention block: a front-end detection stage (CFAR or γCFAR detector) followed by a false alarm reduction stage (QGD or NL-QGD), producing regions of interest.]

The construction of the QGD, inspired by the two-parameter CFAR detector, is viewed in a signal processing and pattern recognition context. The QGD effectively constructs a set of features by using a feature extractor. The feature extractor projects image intensity in local regions (and the intensity square) onto the gamma kernel basis.
These features (quadratic in the image intensity) are then classified by a linear classifier (the quadratic gamma detector, QGD) or a neural network (NL-QGD). Preliminary tests conducted at Lincoln Laboratory showed marked improvements of the CFAR/QGD with respect to the conventional CFAR detector, both with 1-foot fully polarimetric SAR data and with 1-meter single polarization SAR [63]. Presently this combination is used in the benchmark ATR algorithm suite at MIT/LL and also as the focus of attention in the ARPA Monitor program. This dissertation explains the structure of the focus of attention in detail and its extensions to neural networks. We also discuss tests conducted in our laboratory with ISAR targets embedded in SAR imagery (the MIT/LL ATDS Mission 90 Pass 5 data set).

1.3 Overview of the Dissertation

Chapter 2 gives a brief introduction to Synthetic Aperture Radar (SAR) and Inverse SAR (ISAR) image formation used for this study. It discusses the Polarimetric Whitening Filter (PWF) as a preprocessor for SAR data as well as a target embedding strategy. Chapter 3 discusses a two-parameter Constant False Alarm Rate (CFAR) detector as a prescreener used for the front-end detection stage of the focus of attention stage. A gamma CFAR (γCFAR) detector is introduced as an alternative to the two-parameter CFAR detector, using a set of gamma kernel functions. In Chapter 4, the QGD is introduced for the false alarm reduction stage and is extended to the NL-QGD. The training strategies for the NL-QGD are discussed in Chapter 5. In Chapter 6, the results of experiments measuring the performance of the focus of attention in the ATD/R system on real-life imagery (Mission 90 Pass 5 SAR data set) are discussed. Chapter 7 concludes the study and presents a summary with recommendations for future work.
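
The two-parameter CFAR test described in Section 1.2 can be illustrated with a minimal sketch. It is not the detector used in this dissertation: the one-pixel-wide square ring stencil, the normalization by the local standard deviation (a commonly cited form of the test), the function name `two_parameter_cfar`, and the values of `half_size` and `threshold` are all assumptions chosen for illustration, and the test scene is synthetic.

```python
import random
import statistics


def two_parameter_cfar(image, half_size=4, threshold=5.0):
    """Return a boolean map of pixels that pass a two-parameter CFAR test.

    For each test pixel, the clutter mean and standard deviation are
    estimated from the one-pixel-wide square ring lying `half_size` pixels
    away (the stencil). Pixels too close to the border stay False.
    """
    rows, cols = len(image), len(image[0])
    detections = [[False] * cols for _ in range(rows)]
    h = half_size
    for r in range(h, rows - h):
        for c in range(h, cols - h):
            ring = []
            for dc in range(-h, h + 1):        # top and bottom edges of the ring
                ring.append(image[r - h][c + dc])
                ring.append(image[r + h][c + dc])
            for dr in range(-h + 1, h):        # left and right edges, corners excluded
                ring.append(image[r + dr][c - h])
                ring.append(image[r + dr][c + h])
            mu = statistics.mean(ring)
            sigma = statistics.pstdev(ring, mu)
            # Normalized intensity difference compared against the threshold.
            if sigma > 0 and (image[r][c] - mu) / sigma > threshold:
                detections[r][c] = True
    return detections


# Synthetic clutter (made-up statistics) with one bright, target-like pixel.
random.seed(0)
scene = [[random.gauss(1.0, 0.1) for _ in range(32)] for _ in range(32)]
scene[16][16] = 5.0
hits = two_parameter_cfar(scene)
print(hits[16][16])  # -> True
```

The ring keeps the clutter estimate away from the test pixel, so energy from the target itself does not inflate the local mean and variance. The fixed `half_size` is exactly the constraint that the gamma kernels are meant to relax.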

CHAPTER 2 SYNTHETIC APERTURE RADAR (SAR) DATA DESCRIPTION

2.1 Introduction

SAR is a coherent system in that it retains both phase and magnitude of the backscattered signals (echoes). SAR refers to a technique used to synthesize a very long antenna by combining echoed signals received by the radar as it moves along its flight track. The high resolution is achieved by synthesizing an extremely long antenna aperture [16]. Aperture refers to the opening used to collect the reflected energy that is used to form an image. In the case of a camera, this would be the shutter opening; for radar it is the antenna. A synthetic aperture can therefore be constructed by moving a real aperture or antenna through a series of positions along the flight track. The net effect is that a SAR system is capable of achieving a resolution independent of sensor altitude [16] [24]. This characteristic makes SAR an extremely valuable instrument for space observation. As an active system, SAR provides its own illumination and is not dependent on light from the sun, thus permitting continuous day/night operation. It has the additional advantage of operating successfully in all weather conditions, since neither fog nor precipitation has a significant effect on microwaves, depending on the wavelengths. There are three common SAR imaging modes: spotlight, stripmap, and scan. During a spotlight mode data collection, the sensor steers its antenna beam to continuously illuminate the terrain patch being imaged. In the stripmap mode, antenna pointing is fixed relative to the flight line, resulting in a moving antenna footprint that sweeps along a strip of terrain parallel to the path of platform motion. In the scan mode, the sensor steers the antenna beam to illuminate a strip of terrain at any angle to the path of motion. The scan mode is a versatile operating mode that encompasses both the spotlight and stripmap modes as special cases.
Because the scan mode involves additional operational and processing complexity, spotlight and stripmap modes are the most common SAR imaging modes. The spotlight mode allows fine-resolution data to be collected from localized areas. The stripmap mode is appropriate for imaging large regions with coarse resolution.

2.2 SAR Image Formation

High resolution in radar systems can be achieved by a technique called aperture synthesis [24]. This technique enables much finer resolution to be achieved than would be possible with a conventional side-looking radar. In a side-looking radar, an antenna which is fixed parallel to the track directs a radar beam broadside and downward from the platform track as shown in Figure 3. The ground area that one pulse illuminates is called the radar's footprint. The beam is scanned by the motion of the platform so that the beam footprint is swept along a swath on the terrain surface. The dimension of the footprint is determined by the antenna size, the range, and the transmitted wavelength. With an antenna length $L$ and a transmitted wavelength $\lambda$, the azimuth width of a footprint is approximately $\lambda R/L$ at a range $R$ [24]. The illumination on the ground does not disappear outside this region but fades quickly. The width $\lambda R/L$ specifies the 3 dB level where the power of the footprint is half the maximum power. The radar receives and records the backscattered energy from the swath surface and generates an image of the surface reflectivity. The spatial (range and azimuth) resolution of the image is determined by the pulse width (in the range direction) and the radar beam width (in the azimuth direction) [24]. While the pulse width can be narrowed, and a finer range resolution achieved, the length $L$ of the radar antenna determines the resolution in the azimuth direction of the image; the longer the antenna is, the finer the resolution in this direction will be.
As an example, in order to achieve a 25 meter azimuth resolution from the Seasat satellite with $\lambda = 23.5$ cm and $R = 850$ km, the antenna length requirement is (23.5 cm)(850 km)/(25 m) ≈ 8 km. This is obviously a prohibitively large antenna and not practical for achieving the specified azimuth resolution. In conclusion, the goal of SAR is to synthesize an image with the resolution of a focused large-aperture antenna system using the data returned from a physically small antenna.

[Figure 3: Imaging geometry of a side-looking aperture radar. Labels: flight path, azimuth direction, footprint, imaged swath, azimuth beamwidth, terrain surface.]

2.2.1 Range Processing

A real aperture can achieve the range resolution by emitting a brief intense rectangular pulse, then sampling the returned signal and averaging over time intervals no shorter than the emitted pulses. That is, the effective duration and energy of the transmitted pulse determine the range resolution and maximum range of a radar system. Shorter duration pulses allow closely spaced targets to be discriminated, while high energy pulses provide measurable reflections from targets at large ranges. In order to avoid the difficult and expensive development of hardware to generate short duration pulses with the required energy characteristics, pulses of increased duration are coded for transmission and then compressed at echo reception. A linear frequency modulated (FM) waveform of finite duration, called the chirp waveform, is often used for pulse coding, and a correlation (matched filter) receiver is used for compression [24]. The frequency modulation enables high range resolution to be achieved at low transmitter peak power. Functions of the form $\cos\left[2\pi\left(ft + \tfrac{1}{2}at^2\right)\right]$, or more generally $e^{j2\pi\left(ft + \frac{1}{2}at^2\right)}$, compress into very sharp autocorrelations. For the complex exponential with phase $2\pi\left(ft + \tfrac{1}{2}at^2\right)$, the first time derivative of the phase of the waveform is $2\pi(f + at)$.
The frequency of the waveform changes linearly with a slope of $a$ as time $t$ increases. The larger the value of $a$, the faster the frequency changes. Figure 4 depicts a chirp wave and its autocorrelation function.

[Figure 4: A chirp waveform and its autocorrelation function. a) A chirp wave. b) The autocorrelation function of the chirp wave in a).]

The autocorrelation $A(\tau)$ of a chirp occupying the interval $[T_0, T_0 + T]$ can be calculated as

$$
\begin{aligned}
A(\tau) &= \int_{T_0}^{T_0+T-\tau} e^{-j2\pi\left(ft + \frac{1}{2}at^2\right)}\, e^{j2\pi\left(f(t+\tau) + \frac{1}{2}a(t+\tau)^2\right)}\, dt \\
&= e^{j2\pi\left(f\tau + \frac{1}{2}a\tau^2\right)} \int_{T_0}^{T_0+T-\tau} e^{j2\pi a\tau t}\, dt \\
&= e^{j2\pi\tau\left(f + a\left(T_0 + \frac{T}{2}\right)\right)}\, (T - |\tau|)\, \frac{\sin\left[\pi a\tau\,(T - |\tau|)\right]}{\pi a\tau\,(T - |\tau|)} \qquad \text{for } -T < \tau < T
\end{aligned}
\tag{1}
$$

where the integration limits are written for $\tau \ge 0$; for $\tau < 0$ the overlap interval is $[T_0 - \tau,\, T_0 + T]$ and the final expression is unchanged. The term $T - |\tau|$ is a triangle function weighting the $\sin(x)/x$ or sinc function. The width of the main lobe of the autocorrelation function is approximately $2/aT$ and the half-power width is about $1/aT$. Note that the product $aT$ is the bandwidth of the chirp over the pulse duration $T$. The gain in resolution, or pulse compression ratio, is $T$ divided by $1/aT$, or $aT^2$. This is also the time-bandwidth product of the chirp signal. Thus a high time-bandwidth product is required for high resolution. For a pulse shape function $u(t)$ and a received signal of the form $r(t) = \sigma u(t - \tau)$, the receiver that implements a correlation for complex signals is given by

$$
y(t) = \int u^*(s)\, r(s + t)\, ds = \sigma \int u^*(s)\, u(s + t - \tau)\, ds = \sigma A(t - \tau)
\tag{2}
$$

where $\sigma$ is the target reflectivity from the range corresponding to time $\tau$. When the pulse shape $u(t)$ is selected such that its autocorrelation fades quickly as the time lag increases, the output $y(t)$ of the receiver will be maximum when $t$ equals $\tau$ and be small otherwise. This means that the output of the receiver will have spikes associated with time delays which correspond to reflecting objects. In general, if there are $N$ reflectors in a target reflecting energy, then there will be $N$ output spikes from the correlation receiver, with each one scaled based on the reflectivity of the associated target.
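
The pulse compression behaviour described above can be checked numerically. The following is a minimal sketch under stated assumptions, not the processing chain used for the SAR data in this study: the chirp is sampled at a rate equal to its bandwidth, and the pulse length, the two delays, and the two reflectivities are made-up values. The helper names `chirp` and `correlate` are hypothetical.

```python
import cmath


def chirp(num_samples, bandwidth, duration):
    """Sampled baseband chirp u(t) = exp(j*2*pi*(f0*t + 0.5*a*t**2))."""
    rate = bandwidth / duration       # chirp rate a, so that a*T equals the bandwidth
    f0 = -bandwidth / 2.0             # start frequency; the sweep is centred on 0 Hz
    dt = duration / num_samples
    return [cmath.exp(2j * cmath.pi * (f0 * n * dt + 0.5 * rate * (n * dt) ** 2))
            for n in range(num_samples)]


def correlate(pulse, received):
    """Correlation receiver: y[k] = sum_n conj(pulse[n]) * received[n + k]."""
    n_lags = len(received) - len(pulse) + 1
    return [sum(p.conjugate() * received[n + k] for n, p in enumerate(pulse))
            for k in range(n_lags)]


pulse = chirp(num_samples=256, bandwidth=256.0, duration=1.0)

# Two hypothetical point reflectors: delay in samples -> reflectivity.
reflectors = {40: 1.0, 100: 0.5}
received = [0j] * (len(pulse) + 128)
for delay, sigma in reflectors.items():
    for n, sample in enumerate(pulse):
        received[delay + n] += sigma * sample

output = [abs(v) for v in correlate(pulse, received)]
# Indices of the two largest output magnitudes, in increasing order of delay.
peaks = sorted(sorted(range(len(output)), key=output.__getitem__)[-2:])
print(peaks)  # -> [40, 100]
```

Although each echo lasts 256 samples and the two echoes overlap heavily in time, the receiver output shows two isolated spikes at the reflector delays, and the ratio of their heights follows the ratio of the reflectivities.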
If $r(t)$ is given by $\sum_{i=1}^{N} \sigma_i\, u(t - \tau_i)$, then the output of the correlation receiver is

$$
y(t) = \int u^*(s)\, r(s + t)\, ds = \sum_{i=1}^{N} \sigma_i \int u^*(s)\, u(s + t - \tau_i)\, ds = \sum_{i=1}^{N} \sigma_i\, A(t - \tau_i)
\tag{3}
$$

The output of the receiver can be expressed as the convolution of the received signal with an impulse response $h(t)$ as follows:

$$
y(t) = \int u^*(s)\, r(s + t)\, ds = \int r(s)\, u^*\big(-(t - s)\big)\, ds = \int r(s)\, h(t - s)\, ds
\tag{4}
$$

The impulse response of the linear filter which implements the correlation receiver is therefore $h(t) = u^*(-t)$. This convolution implementation is descriptively called the time reversed filter, which can be easily implemented in an existing linear filter architecture by modifying the impulse response. The output is then

$$
y(t) = r(t) * h(t)
\tag{5}
$$

This filter can be implemented in the frequency domain as

$$
Y(f) = R(f)\, H(f) = R(f)\, U^*(f)
\tag{6}
$$

where $Y(f)$, $R(f)$, and $H(f)$ are the Fourier transforms of $y(t)$, $r(t)$, and $h(t)$ respectively. $U^*(f)$ is the conjugate of $U(f)$, which is the Fourier transform of $u(t)$. The complex conjugate operator in the frequency domain corresponds to a complex conjugate together with time reversal in the time domain [59]. The filter described by the correlation, time reversed, and conjugated receivers is referred to as the matched filter. This is because the filter is essentially a reference replica of the transmitted pulse which is compared to the received signal. When characteristics of the propagation medium are known, the reference waveform is given the shape of the anticipated received signal. The filter output is a measure of how precisely the received signal and reference match.

2.2.2 Azimuth Processing

The range resolution in a radar system is determined by the type of pulse coding and the way in which the return from each pulse is processed. All radar systems, conventional radars or SARs, resolve targets in the range dimension in the same way. It is the resolution of targets in the azimuth dimension that distinguishes a SAR from other radar systems.
The principle of SAR is to store successive echoes received by a moving radar from targets on the ground, and to process them to synthesize a long aperture, thereby achieving high azimuth resolution [17]. A radar with an antenna of length $L_a$ in the azimuth direction generates a radar beam that has an angular spread of $\theta = \lambda/L_a$ (Figure 5). Two point targets on the ground, separated by $\delta x$ in the azimuth direction at a slant range $R$, can be resolved only if they are not both in the radar beam, which requires

$$
\delta x = R\theta = \frac{R\lambda}{L_a} \tag{7}
$$

This is the resolution limit of a conventional side-looking real-aperture radar. It is clear from (7) that the azimuth resolution of the conventional radar varies inversely with the physical antenna size, becoming finer for increasing antenna length $L_a$ and degrading with increased slant range $R$.

Figure 5 Aperture synthesis. The point target is in the beam for a time $T_s = L_s/v$, where $L_s$ is the synthetic aperture length. After phase correcting the signals, a synthetic antenna pattern is obtained which is equivalent to that of a conventional antenna of length $2L_s$.

In SAR operation, consider a radar beam that sweeps over an arbitrary target as the platform flies over the scene in Figure 5. The point target remains in the beam for a certain time interval $T_s$. During this time interval the radar transmits pulses at a certain rate (the pulse repetition frequency, or PRF) and receives backscatter from the point target during the intervals between successive pulse transmissions. Therefore, after the time interval $T_s$, a collection of backscatter returns has been built up which spans a spatial interval $L_s$ equal to the width of the beam on the ground,

$$
L_s = \delta x = \frac{\lambda R}{L_a} \tag{8}
$$

The backscatter from the point target is distributed over a large number of aperture positions along the track's spatial extent.
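As a rough numerical illustration of (7) and (8), the sketch below evaluates the real-aperture azimuth resolution and the time a point target spends in the beam. The 35 GHz carrier and 7 km slant range are of the order of the values quoted for the data in this work; the antenna length and the platform speed are assumptions made only for this example, and the function name is hypothetical.

```python
# Worked example of Eq. (7)/(8): delta_x = L_s = R * lambda / L_a, and T_s = L_s / v.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def real_aperture_azimuth_resolution(slant_range_m, wavelength_m, antenna_length_m):
    """Azimuth resolution of a real-aperture radar, Eq. (7)."""
    return slant_range_m * wavelength_m / antenna_length_m

wavelength = SPEED_OF_LIGHT / 35e9   # about 8.6 mm at 35 GHz
slant_range = 7000.0                 # m
antenna_length = 1.0                 # m, assumed for illustration
platform_speed = 100.0               # m/s, assumed for illustration

delta_x = real_aperture_azimuth_resolution(slant_range, wavelength, antenna_length)
synthetic_aperture_length = delta_x          # Eq. (8): L_s equals the beam footprint
dwell_time = synthetic_aperture_length / platform_speed

print(f"real-aperture resolution: {delta_x:.2f} m")
print(f"time in beam: {dwell_time:.3f} s")
```

Under these assumed values the real aperture resolves only about 60 m in azimuth, which is the gap that aperture synthesis is meant to close.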
A large antenna aperture is difficult to implement physically, but it can be synthesized by sequentially gathering the backscatter with a small antenna at different positions, which collectively define the antenna array. The slant range $R$ from the radar to a point target can be written

$$
R = \sqrt{R_0^2 + x^2} \approx R_0\left(1 + \frac{(vt)^2}{2R_0^2}\right) \qquad \text{if } R_0 \gg x \tag{9}
$$

where $x = vt$, $R_0$ is the slant range when the target is perpendicular to the flight line, and $t$ is the elapsed time from when the platform passed its closest point of approach to the point target. Hence the phase shift between the transmitted and received signals is

$$
\phi(t) = -\frac{2\pi}{\lambda}\times 2R = \phi_0 - \frac{2\pi x^2}{\lambda R_0} \tag{10}
$$

where $\lambda$ is the radar wavelength and $\phi_0 = -4\pi R_0/\lambda$. The Doppler frequency shift between the transmitted and received signals is given by

$$
f_D = \frac{1}{2\pi}\frac{d\phi(t)}{dt} = -\frac{2v^2}{\lambda R_0}\,t \tag{11}
$$

From (11), the azimuth spread of the point target response approximates a linear frequency modulated waveform. The Doppler frequency shift is highest when the point target enters the radar beam. It decreases with time until it becomes negative and reaches a minimum just before the point target moves out of the area of illumination. The azimuth resolution needed to resolve two consecutive point targets in the azimuth direction is determined by

$$
\Delta f = \frac{2v\,\Delta x}{\lambda R_0} \tag{12}
$$

and

$$
\Delta x = \frac{\lambda R_0}{2v}\,\Delta f \tag{13}
$$

The Doppler resolution of the processing is the reciprocal of the time $T_s$ taken to synthesize the aperture, which is

$$
T_s = \frac{L_s}{v} = \frac{\lambda R_0}{v L_a} \tag{14}
$$

The finest azimuth resolution is therefore the value of $\Delta x$ which corresponds to a Doppler resolution of $1/T_s$:

$$
\Delta x = \frac{\lambda R_0}{2v}\cdot\frac{v L_a}{\lambda R_0} = \frac{L_a}{2} \tag{15}
$$

(15) implies that the azimuth resolution improves as the antenna length $L_a$ decreases. However, shorter antennas require more power for signal transmission and a longer synthetic aperture.

2.3 ISAR image formation

Besides the three SAR imaging modes mentioned earlier, there is a fourth operating mode called inverse SAR (ISAR). SAR in this mode produces radar signal data similar to that of spotlight mode SAR.
However, the ISAR mode is different in that data collection is accomplished with the radar stationary and the target moving. The signals are similar because what matters is the relative position and motion between the sensor and the scene being imaged. Since the signals are similar, the processing required to produce an image is also similar. ISAR imaging therefore refers to the use of target motion alone to generate a synthetic aperture for azimuth resolution. ISAR imaging can be accomplished by rotating the platform (turntable) on which the target is placed. The ISAR operation is very useful for collecting the RCS (radar cross section) information of a target over many different aspect angles at different depression angles. ISAR uses the same carrier term as in the SAR case, but the motion of the target relative to the radar platform is rotational as opposed to linear. The resulting Doppler phase term due to rotation can be linearized as a function of azimuth (cross-range) position.

Figure 6 displays an ISAR imaging scenario. The radar is stationary as the target rotates on a turntable at a constant angular rotation rate $\omega$ in rad/s. As a scatterer at a point $x$ on the x-axis is rotated through an angle $\Delta\theta$, the change in Doppler frequency associated with a cross-range displacement $\delta r_c$ is

$$
\delta f_D = \frac{2\omega}{\lambda}\,\delta r_c \tag{16}
$$

The azimuth resolution needed to resolve two scatterers on the x-axis for a small viewing-angle rotation $\Delta\theta$ is obtained for $\Delta f_D = 1/T$ as

$$
\Delta r_c = \frac{\lambda}{2\omega}\,\Delta f_D = \frac{\lambda}{2\omega T} = \frac{\lambda}{2\Delta\theta} \tag{17}
$$

where $T$ is the integration time and $\Delta\theta = \omega T$. The range resolution for ISAR is obtained by using wideband waveforms, as with SAR.

Figure 6 An ISAR imaging mode. a) top view, b) front view.

Figure 7 illustrates ISAR imaged vehicles from the TABILS 24 ISAR data set. The radar used for the data set is a fully polarimetric, Ka band radar.
In Figure 7a and Figure 7b, the MV0015 and MV0095 vehicles are shown at depression angles of 20° and 15° respectively, each with aspect angles of 0°, 40°, 80°, and 120°. The target scattering looks very different depending on the azimuth angle.

Figure 7 Examples of ISAR imagery. Panels a) and b) each show 0°, 40°, 80°, and 120° aspect angles. Down range is increasing from left to right.

2.4 Preprocessing

Image enhancement is often necessary before detection and recognition are applied to an image, in order to improve target detection/recognition performance. Image enhancement is therefore viewed as a preprocessing step that precedes the ATD/R tasks. With the availability of fully polarimetric high-resolution SAR imagery, several image enhancement techniques have been developed which exploit the polarization scattering characteristics of targets and background clutter. SAR processing allows for high-resolution images, but introduces a considerable amount of speckle in the image due to the coherent nature of the imaging process. The primary goal of preprocessing in polarimetric SAR imagery is to reduce image speckle and to improve target-to-clutter contrast. Although many image enhancement techniques have been developed, polarimetric enhancement techniques are particularly desirable because they can provide significant speckle reduction and target-to-clutter ratio improvement while preserving the resolution of the original SAR imagery. Novak et al. developed the polarimetric whitening filter (PWF), which combines polarimetric measurements to produce an intensity image having minimum speckle [53] [54] [55] [56].
This improvement led to enhanced target detection performance [14] [53], clutter segmentation ability [10], and texture discrimination ability [9]. The detection algorithms discussed and developed in this dissertation are tested and evaluated on PWF-processed SAR imagery. The PWF technique developed by Novak et al. is introduced in the following sections.

2.4.1 Polarimetric Clutter Model

A mathematical model is used to characterize fully polarimetric radar returns from ground clutter. When operating in a linear polarization basis, a synthetic aperture radar uses four polarizations (HH, HV, VH, and VV) to measure the full polarization scattering matrix for any such clutter region. Since HV has a reciprocity relationship with VH, the set of three polarizations (HH, HV, and VV) contains all the information in the polarization scattering matrix. Ground clutter and sea clutter are spatially nonhomogeneous, so a realistic model for such clutter is non-Gaussian. The polarimetric measurement $Y$ for each SAR pixel in ground clutter is expressed by three complex elements, HH, HV, and VV:

$$
Y = \begin{bmatrix} HH \\ HV \\ VV \end{bmatrix} = \begin{bmatrix} HH_I + jHH_Q \\ HV_I + jHV_Q \\ VV_I + jVV_Q \end{bmatrix} \tag{18}
$$

where, for example, $HH_I$ and $HH_Q$ are the in-phase and quadrature components of the complex HH measurement. $Y$ is assumed to be the product of a complex Gaussian vector $X$ and a spatially varying texture variable $\sqrt{g}$:

$$
Y = \sqrt{g}\,X \tag{19}
$$

The vector $X$ is assumed to be zero-mean (due to the random absolute phase of its components) and complex Gaussian. Hence, the probability density function of the vector $X$ is given by

$$
f(X) = \frac{1}{\pi^3|\Sigma|}\exp\left(-X^\dagger\Sigma^{-1}X\right) \tag{20}
$$

where the symbol $\dagger$ represents Hermitian transposition and $\Sigma = E(XX^\dagger)$ is the polarization covariance matrix of $X$. In general, the polarization covariance matrix in a homogeneous region of clutter takes the form

$$
\Sigma = \sigma_{HH}\begin{bmatrix} 1 & 0 & \rho\sqrt{\gamma} \\ 0 & \varepsilon & 0 \\ \rho^*\sqrt{\gamma} & 0 & \gamma \end{bmatrix} \tag{21}
$$

where

$$
\sigma_{HH} = E(|HH|^2) \tag{22}
$$

$$
\varepsilon = \frac{E(|HV|^2)}{E(|HH|^2)} \tag{23}
$$

$$
\gamma = \frac{E(|VV|^2)}{E(|HH|^2)} \tag{24}
$$

$$
\rho = \frac{E(HH\cdot VV^*)}{\left[E(|HH|^2)\,E(|VV|^2)\right]^{1/2}} \tag{25}
$$

It is assumed that the product multiplier $g$ is a gamma-distributed random variable. This assumption is not universal: the log-normal and Weibull models are also widely used. The gamma-distributed random variable $g$ has the density

$$
f_G(g) = \frac{1}{\bar{g}}\left(\frac{g}{\bar{g}}\right)^{\nu-1}\frac{1}{\Gamma(\nu)}\exp\left(-\frac{g}{\bar{g}}\right) \tag{26}
$$

where the parameters $\bar{g}$ and $\nu$ are related to the mean and the variance of the random variable $g$ by

$$
E(g) = \bar{g}\,\nu \tag{27}
$$

$$
E(g^2) = \bar{g}^2\,\nu(\nu+1) \tag{28}
$$

where $E$ is the statistical mean operator. The resulting probability density function of $Y$ therefore involves the modified Bessel function and is known as the generalized K-distribution [52], given by

$$
f(Y) = \frac{2}{\pi^3\,\bar{g}^{\nu}\,\Gamma(\nu)\,|\Sigma|}\cdot\frac{K_{3-\nu}\left(2\sqrt{Y^\dagger\Sigma^{-1}Y/\bar{g}}\right)}{\left(\bar{g}\,Y^\dagger\Sigma^{-1}Y\right)^{(3-\nu)/2}} \tag{29}
$$

Note that when $\bar{g} = 1/\nu$, so that the mean of the texture variable is unity, the non-Gaussian model reduces to the Gaussian model in the limit as $\nu \to \infty$. The assumption that HV is uncorrelated with HH and VV is not always true (especially for man-made targets or for a polarimetric SAR with cross-talk between channels) but is valid for ground clutter [53].

2.4.2 Polarimetric Whitening Filter (PWF)

The mathematical model established in the previous section is now used for processing the polarimetric measurements HH, HV, and VV to form an enhanced SAR intensity image. An optimal processor known as the PWF is derived, which combines the polarimetric measurements to produce an intensity image having a minimum amount of speckle. A quadratic processing of the polarimetric measurement into an intensity image is constructed as follows:

$$
y = Y^\dagger A Y = g\,X^\dagger A X \tag{30}
$$

where the weighting matrix $A$ is assumed to be Hermitian symmetric and nonnegative definite, and $g$ is the spatially varying texture variable. The objective is to find the optimal weighting matrix, that is, the one for which the quadratic processing of the polarimetric measurement produces an intensity image with a minimum amount of speckle.
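The ratio of the standard deviation of the image pixel intensities to the mean of the image pixel intensities is used as a measure of speckle and is given by

$$
\frac{s}{m} = \frac{\text{standard deviation of } y}{\text{mean of } y} = \left[\frac{\mathrm{VAR}(Y^\dagger AY)}{E^2(Y^\dagger AY)}\right]^{1/2} \tag{31}
$$

where VAR is the variance. Instead of minimizing the speckle amount $s/m$, we minimize its square $(s/m)^2$:

$$
\left(\frac{s}{m}\right)^2 = \frac{\mathrm{VAR}(Y^\dagger AY)}{E^2(Y^\dagger AY)} = \frac{\mathrm{VAR}(gX^\dagger AX)}{E^2(gX^\dagger AX)} \tag{32}
$$

We use the following results:

$$
E(X^\dagger AX) = \mathrm{tr}(\Sigma A) = \sum_{i=1}^{3}\lambda_i \tag{33}
$$

$$
\mathrm{VAR}(X^\dagger AX) = \mathrm{tr}\left[(\Sigma A)^2\right] = \sum_{i=1}^{3}\lambda_i^2 \tag{34}
$$

where tr is the trace, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the eigenvalues of the matrix $\Sigma A$. With these results, and because $g$ is independent of $X$, the square of the $s/m$ ratio can be written as

$$
\left(\frac{s}{m}\right)^2 = \frac{E(g^2)}{E^2(g)}\cdot\frac{\mathrm{VAR}(X^\dagger AX)}{E^2(X^\dagger AX)} + \frac{\mathrm{VAR}(g)}{E^2(g)} = \frac{\nu+1}{\nu}\cdot\frac{\sum_{i=1}^{3}\lambda_i^2}{\left(\sum_{i=1}^{3}\lambda_i\right)^2} + \frac{1}{\nu} \tag{35}
$$

Note that $\nu$ is a constant in (35), so minimizing $(s/m)^2$ is equivalent to minimizing

$$
\frac{\sum_{i=1}^{3}\lambda_i^2}{\left(\sum_{i=1}^{3}\lambda_i\right)^2} \tag{36}
$$

Note from (36) that if the set $\{\lambda_1, \lambda_2, \lambda_3\}$ yields a minimum for $(s/m)^2$, then so does the set $\{\alpha\lambda_1, \alpha\lambda_2, \alpha\lambda_3\}$ for any scalar $\alpha$. Therefore, we can minimize (36) by minimizing its numerator $\sum_{i=1}^{3}\lambda_i^2$ subject to the arbitrary constraint $\sum_{i=1}^{3}\lambda_i = 3$ on its denominator. This modified optimization problem can be solved with the method of Lagrange multipliers. Using a Lagrange multiplier $\beta$, we minimize the unconstrained functional

$$
f(\lambda_1, \lambda_2, \lambda_3) = \sum_{i=1}^{3}\lambda_i^2 + \beta\left[9 - \left(\sum_{i=1}^{3}\lambda_i\right)^2\right] \tag{37}
$$

Taking partial derivatives with respect to $\lambda_i$ and setting the results equal to zero yields

$$
2\lambda_i - 2\beta\sum_{j=1}^{3}\lambda_j = 0 \qquad \text{for } i = 1, 2, 3 \tag{38}
$$

Thus we find that

$$
\lambda_1 = \lambda_2 = \lambda_3 = \beta\sum_{i=1}^{3}\lambda_i \tag{39}
$$

The above result (together with the condition $\sum_{i=1}^{3}\lambda_i = 3$) implies that a minimizing solution is

$$
\lambda_1 = \lambda_2 = \lambda_3 = 1 \tag{40}
$$

(40) leads to the following result:

$$
\Sigma A = I \tag{41}
$$

$$
A^* = \Sigma^{-1} \tag{42}
$$

(40) and (41) imply that the optimal weighting matrix $A^*$ is the one that makes all of the eigenvalues of $\Sigma A$ equal to one. The minimum speckle intensity image is therefore constructed as

$$
y = Y^\dagger\Sigma^{-1}Y = g\,X^\dagger\Sigma^{-1}X \tag{43}
$$

This solution can be interpreted as a polarimetric whitening filter.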
That is, the polarimetric measurement $Y$ is transformed to a new coordinate system by the filter $\Sigma^{-1/2}$ to obtain

$$
W = \Sigma^{-1/2}\,Y = \sqrt{g}\,\Sigma^{-1/2}X \tag{44}
$$

This transform whitens the polarimetric measurements, so that

$$
\Sigma_W = E(WW^\dagger) = g\,\Sigma^{-1/2}E(XX^\dagger)\Sigma^{-1/2} = g\,\Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2} = gI \tag{45}
$$

The minimum speckle image is then obtained by noncoherently averaging the power in the elements of the whitened vector $W$, as shown by

$$
y = \sum_{i=1}^{3}|W_i|^2 = W^\dagger W = g\,X^\dagger\Sigma^{-1}X \tag{46}
$$

In conclusion, the PWF changes the polarimetric basis from a linear polarimetric basis to a new basis given by

$$
\left[\,HH,\quad \frac{HV}{\sqrt{\varepsilon}},\quad \frac{VV - \rho^*\sqrt{\gamma}\,HH}{\sqrt{\gamma\left(1-|\rho|^2\right)}}\,\right] \tag{47}
$$

In this new basis, the polarimetric channels are uncorrelated and have equal expected power. Thus the optimal way to reduce speckle polarimetrically is to sum the powers noncoherently in these polarimetric channels. Another advantage of the PWF-processed image is that it merges into a single image the scatterers that show up in only one of the polarization channels. Therefore, the PWF image is very useful for locating the features that might ultimately be used in a recognition task. Figure 8 summarizes the PWF processing procedure.

2.4.3 Preprocessing SAR Image

The focus of attention of the ATD/R system (Figure 2) is deployed on the PWF SAR image, whose primary characteristic is that the speckle in the SAR imagery is optimally reduced [54]. The input to preprocessing is a sequence of fully polarimetric images, where each polarimetric image in the sequence is a set of three complex-valued images denoted by HH, HV, and VV when the images are expressed in a linear polarization basis. Each set of three complex-valued (HH, HV, VV) pixels is optimally combined and transformed to the real-valued pixel intensity $y$ by the PWF.

Figure 8 Minimum-speckle image processing [55]. Step 1, whitening the original image: the whitening filter $\Sigma^{-1/2}$ maps (HH, HV, VV) to $w = \left[HH,\ HV/\sqrt{\varepsilon},\ (VV-\rho^*\sqrt{\gamma}\,HH)/\sqrt{\gamma(1-|\rho|^2)}\right]$. Step 2, computing a minimum speckle image: $y = |HH|^2 + |HV|^2/\varepsilon + |VV-\rho^*\sqrt{\gamma}\,HH|^2/\left(\gamma(1-|\rho|^2)\right)$.
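The speckle reduction obtained by the PWF can be illustrated with simulated clutter. The sketch below is not the implementation of [55]: it assumes homogeneous Gaussian clutter (texture $g = 1$), uses hypothetical covariance parameters in the structure of (21), and introduces the helper names `clutter_covariance`, `pwf`, and `speckle_ratio` purely for illustration. It compares the standard-deviation-to-mean ratio of the PWF image with that of the single HH channel and of the unweighted sum of channel powers.

```python
import numpy as np

def clutter_covariance(sigma_hh, eps, gamma, rho):
    """Polarization covariance matrix with the structure of Eq. (21)."""
    c = rho * np.sqrt(gamma)
    return sigma_hh * np.array([[1.0, 0.0, c],
                                [0.0, eps, 0.0],
                                [np.conj(c), 0.0, gamma]], dtype=complex)

def pwf(Y, cov):
    """PWF intensity y = Y^H cov^{-1} Y for every row of Y (shape (n, 3))."""
    A = np.linalg.inv(cov)
    return np.real(np.einsum("ni,ij,nj->n", Y.conj(), A, Y))

def speckle_ratio(y):
    """Standard-deviation-to-mean ratio s/m used as the speckle measure."""
    return np.std(y) / np.mean(y)

# Hypothetical clutter parameters (not the values measured in this work).
cov = clutter_covariance(sigma_hh=1.0, eps=0.2, gamma=1.0, rho=0.6)

# Draw zero-mean complex Gaussian vectors with covariance `cov` (texture g = 1).
rng = np.random.default_rng(0)
n = 50_000
L = np.linalg.cholesky(cov)                      # cov = L @ L^H
Z = (rng.standard_normal((n, 3)) + 1j * rng.standard_normal((n, 3))) / np.sqrt(2.0)
Y = Z @ L.T                                      # each row is L @ z

sm_hh = speckle_ratio(np.abs(Y[:, 0]) ** 2)          # single channel
sm_span = speckle_ratio(np.sum(np.abs(Y) ** 2, 1))   # unweighted power sum (A = I)
sm_pwf = speckle_ratio(pwf(Y, cov))                  # whitened power sum (A = cov^-1)
```

For Gaussian clutter the single-channel ratio is close to 1, while the PWF ratio approaches $1/\sqrt{3}$, the value obtained when all three eigenvalues of $\Sigma A$ are equal; the unweighted sum lies between the two.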
Each pixel intensity $y$ is related to the vector $Y = (HH, HV, VV)$ of corresponding complex polarization values HH, HV, and VV by the quadratic relation $y = Y^\dagger A Y$, where $\dagger$ denotes the Hermitian (conjugate) transposition and the matrix $A$ is determined such that the amount of speckle is minimum. Finding $A$ requires that the polarization covariance matrix of the surrounding clutter be found. The mission 90 pass 5 SAR imagery used in this study is a strip mode, fully polarimetric image with a linear-polarization basis. The scrub region located in the vicinity of the powerline towers in frame 105 of the mission 90 pass 5 SAR data set was used for computing the covariance matrix of a typical clutter background and for the PWF processing. The estimated polarimetric covariance matrix is reported to be [55]

$$
\Sigma_c = 0.098\cdot\begin{bmatrix}
1.00 + j0.00 & -0.01 + j0.02 & 0.60 - j0.05 \\
-0.01 - j0.02 & 0.19 + j0.00 & -0.00 - j0.00 \\
0.60 + j0.05 & -0.00 + j0.00 & 1.08 + j0.00
\end{bmatrix} \tag{48}
$$

2.5 SAR Image Visualization

The image data used as input often need to be visualized on a display screen in order to observe features of the data such as trees, shadows, and buildings. Visualization of the data is also required in target embedding, in the preparation of training and testing data for the detectors, and in the selection of regions of interest. Khoros [43], an integrated software development environment for information processing and visualization, was used for these purposes. A Khoros worksheet brings up interactive display glyphs with pan and zoom capabilities (Figure 9). This worksheet was created for display purposes. The worksheet loads frames and concatenates them in order. Through the interactive display, it allows a user to scroll through the entire plane of the loaded images and to zoom in and out on the regions of interest.
Reading the current cursor position provides the corresponding coordinates (x, y) of the image. Figure 9 displays an example of visualizing two frames of the SAR images in the Khoros working environment. One frame in the Mission 90 pass 5 SAR data set has a size of 2048 by 512 pixels in azimuth and cross range. Each pixel is an 8-bit integer value which ranges from 0 to 255. The pixel values were linearly converted from the PWF-transformed data, most of which lies between -50 dB and +30 dB.

Figure 9 Khoros worksheet for a display purpose.

2.6 Examples of SAR Images

Radar images are composed of many picture elements (pixels). Each pixel in the radar image represents the radar backscatter; brighter areas represent high backscatter. Bright features mean that a large fraction of the transmitted radar signal was reflected back to the radar receiver, while dark features imply very little reflection back from the targets. Backscatter for a target area at a particular wavelength depends on a variety of conditions: the size of the scatterers in the target area, the moisture content of the target area, the polarization of the transmitted pulses, the wavelength used in the radar transmitter, and the observation angles. Since polarimetric SARs measure the phases of incoming pulses, the phase differences (in degrees) between the returns of the HH and VV signals are frequently the result of structural characteristics of the scatterers. Figure 10 displays some SAR imagery preprocessed by the PWF. In general, the higher or brighter the backscatter in the image, the rougher the surface being imaged. Flat surfaces that reflect little or no transmitted radar signal back towards the radar receiver always appear dark in radar images. Figure 10a shows a highway and a bridge. The surfaces of the highway on the bridge appear dark since they are flat.
Surfaces inclined towards the radar usually have a stronger backscatter than surfaces which slope away from the radar, and tend to appear brighter in a radar image. Figure 10b displays some houses and parked cars. The roof surfaces inclined toward the SAR appear much brighter than the surfaces inclined away from the radar. Areas not illuminated by the radar, such as the back slopes of mountains, appear dark and are called radar shadow. As an example, the houses in Figure 10b display radar shadows at their back sides. Received radar pulses that have bounced off several objects appear very bright (white) in radar images. Vegetation is typically rough on the scale of most radar wavelengths and appears light grey in a radar image. Figure 10c displays scenes of powerline towers, trees, and scrub. Two pairs of powerline towers are visible in the upper-left and lower-right sides of the picture. A narrow scrub region running diagonally through the picture appears moderately coarse, and the trees divided by the narrow scrub region look rough, so that the trees are almost individually discernible. Buildings which are not aligned so that the radar pulses are reflected straight back appear light grey, like very rough surfaces. Backscatter is also sensitive to the target's electrical properties, including water content. Wetter objects appear bright, and drier targets appear dark. The exception is a smooth body of water, which reflects incoming pulses away from the radar; such bodies appear dark.

2.7 Target Embedding Strategy

2.7.1 Development of a Target Embedding Method

ATR systems employ detection/recognition algorithms over the sensor imagery. A target in SAR imagery can exhibit an essentially unlimited number of different shapes depending on the depression and aspect angles.
To test the performance of an ATR system reliably, it would be necessary to place targets at many different aspect angles and at many different locations over a terrain and to image the terrain, which is very difficult and impractical. ISAR operation provides rich shape information about a target at different view angles. By employing an appropriate method of target embedding, targets can be placed at many different locations with many different view angles. Figure 11 depicts a target embedding methodology. The methodology developed for target embedding proceeds as follows:

(1) appropriate locations for target embedding are selected such that targets are placed in the clear and in between large scattering centers in the regions of cultural clutter,
(2) in order to use the circular-polarization-basis ISAR target images with the linear-polarization-basis SAR imagery, a polarization basis transformation is applied to the ISAR target images, resulting in the corresponding linear-polarization-basis target images,
(3) the two images, clutter image and target image, which have the same polarization basis and the same resolution and were generated at the same radar wavelength, are coherently added at the locations selected for target embedding,
(4) the PWF transformation is finally applied to the new image to which the targets have been coherently added.

Figure 10 Some examples of SAR images from the Mission 90 pass 5 SAR data set, panels a) to c). The images were acquired at an altitude of 2 km with a depression angle of 22.5° and a slant range of 7 km. The HH, HV, and VV returns of the images were combined to produce a minimum speckle image via PWF processing. The radar sensor is located at the top of each image, looking down, so that the radar shadows go downward.

Figure 11 Target Embedding Procedure. Inputs: 35 GHz Mission 90 Pass 5 SAR data (HH, HV, VV) without targets, and 35 GHz TABILS 24 ISAR target data. Output: SAR data with targets embedded.
Coherent addition means that, for example, the in-phase component ($HH_I$) and the quadrature component ($HH_Q$) of a target pixel are added to the in-phase component ($HH_I$) and the quadrature component ($HH_Q$) of the clutter pixel, respectively, at a location selected for target embedding. This procedure is also applied to the other two components (HV and VV) of the pixel. The PWF transformation maps the three complex values of each pixel to a single image pixel value. The PWF-transformed image is therefore useful for locating the features which might ultimately be useful for recognition. After target embedding, further processing can be applied to the image with embedded targets before the detection/recognition algorithms are employed. For example, PWF processing can be applied for speckle reduction in the image.

2.7.2 Embedding the TABILS 24 ISAR Target Data into the Mission 90 Pass 5 SAR Data Set

In order to migrate the many targets of the TABILS 24 ISAR target data set into the strip mode SAR image, an appropriate transformation of polarization bases must be applied to the ISAR target images, which are circular-polarization-basis images. Therefore, the circular-polarization-basis images (i.e., LL, LR, and RR) of the TABILS 24 ISAR targets are transformed to the corresponding linear-polarization-basis images (HH, HV, and VV). These transformed target images are coherently added at the locations of the mission 90 pass 5 SAR imagery selected for target embedding, which is discussed in detail later. The embedding method developed for testing performance with targets in clutter (Figure 11) combines the turntable data (TABILS 24) with the SAR data (mission 90 pass 5). No embedding method will be perfect, but it is our belief that the method we are using preserves the gross statistical characteristics of non-occluded targets in clutter, upon which the pre-detection schemes in the focus of attention stage depend.
Independent testing of the algorithms by the MIT Lincoln Laboratory on real targets in clutter (SAR data in which targets were in the field of view during data collection) corroborated the superiority of the CFAR/QGD combination [63]. This supports the view that the embedding method used is sufficient for development and for the generation of preliminary ROCs. We chose ISAR target data that was taken at approximately the same depression angle as the mission 90 pass 5 data (23 degrees). Our assumption is that ISAR data with the same resolution as the mission 90 data, and with depression angles that differ from the mission 90 pass 5 depression angle by less than 3 degrees, are suitable for embedding. ISAR target images were collected from 22 target data sources of the TABILS 24 data set. Since some of the 22 target data sources have the same target types but were measured under different weather conditions, the 22 target data sources consist of 10 different target types. For each target data source, target images were extracted at 7.5° azimuth increments over a complete rotation in azimuth. This resulted in 1000 target images (345 training, 345 cross validation, and 345 testing). The ISAR target images within the training, cross validation, and testing sets were separated by the 7.5° azimuth increment; that is, the target images were picked at each increment step of 7.5° and assigned to the training, cross validation, and testing sets in sequence. The targets then need to be placed in the clear and in between large scattering centers in regions of cultural clutter. In order to embed the targets into the clutter data, three polarizations of complex ISAR data are extracted (HH, HV, VV). These are added coherently to the different regions in the three polarizations of complex SAR data. The PWF transformation is then applied to the new data. The PWF-transformed data is scaled logarithmically to the range [0, 255].
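The embedding steps just described (coherent addition of the complex channels, PWF transformation, logarithmic scaling to [0, 255]) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used to produce the data set: the array layout (rows, columns, three channels), the function names, the synthetic clutter and target values, the clutter covariance, and the mapping of exactly -50 dB to +30 dB onto 0 to 255 are all assumptions made for the example.

```python
import numpy as np

def embed_target(clutter, target, row, col):
    """Coherently add a complex target chip (h, w, 3) into a copy of the clutter image.

    Complex addition adds the in-phase and quadrature parts of each channel separately.
    """
    out = clutter.copy()
    h, w = target.shape[:2]
    out[row:row + h, col:col + w] += target
    return out

def pwf_image(img, cov):
    """Per-pixel PWF intensity y = Y^H cov^{-1} Y for an image of shape (rows, cols, 3)."""
    A = np.linalg.inv(cov)
    return np.real(np.einsum("rci,ij,rcj->rc", img.conj(), A, img))

def to_uint8_db(power, db_min=-50.0, db_max=30.0):
    """Logarithmic scaling: clip 10*log10(power) to [db_min, db_max], map to 0..255."""
    db = 10.0 * np.log10(np.maximum(power, 1e-12))   # floor avoids log of zero
    scaled = (np.clip(db, db_min, db_max) - db_min) / (db_max - db_min) * 255.0
    return np.round(scaled).astype(np.uint8)

# Synthetic stand-ins for the clutter image and an ISAR target chip (HH, HV, VV).
rng = np.random.default_rng(2)
clutter = 0.01 * (rng.standard_normal((16, 16, 3)) + 1j * rng.standard_normal((16, 16, 3)))
target = np.zeros((2, 2, 3), dtype=complex)
target[..., 0] = 5.0 + 5.0j                      # strong HH scatterer, assumed value

scene = embed_target(clutter, target, row=6, col=6)
cov = 2e-4 * np.eye(3)                           # assumed clutter covariance
image8 = to_uint8_db(pwf_image(scene, cov))
```

Embedding is done on the complex data before the PWF, so the target and clutter returns interfere as they would in a coherent measurement; only the final intensity image is quantized.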
Since the points at which the ISAR data is embedded are range pixels with low RCS (although surrounding regions may contain range cells with very large RCS), the large scattering centers on the ISAR targets are not changed significantly. These are the points upon which the prescreening algorithms depend. An example of the embedding process is shown in Figure 12. At the top of the figure is an example of an ISAR image (PWF). At the bottom left of the figure is a section of cultural clutter. At the bottom right is the same section with the ISAR target embedded. The images are shown after PWF processing, but the target is of course embedded prior to such processing. As can be seen, the target signature is not changed dramatically after embedding. Figure 13 displays some embedded targets at a variety of aspect angles.

Figure 12 Target embedding. Top: original ISAR image chip, no background. Bottom left: cultural clutter region, no embedded target. Bottom right: cultural clutter region with target embedded.

Figure 13 Some examples of the TABILS 24 ISAR targets at a variety of aspect angles after embedding.

CHAPTER 3 PRESCREENERS

3.1 Introduction

As mentioned in Chapter 1, the front-end detection stage aims to locate potential targets in the sensor imagery and allows for a significant reduction of data processing in subsequent stages. Because it processes the entire imagery directly, the front-end detection stage requires computationally simple and efficient algorithms. A two-parameter CFAR detector has been used as a prescreener for the front-end detection stage [57]. It computes a Mahalanobis distance between a pixel under test and its neighboring pixels, defined by a window of predetermined size (the CFAR stencil). The two-parameter CFAR detector meets the requirements of algorithm simplicity and efficiency. There is, however, room for improvement of the CFAR detector, and we discuss this possibility. We introduce a novel detector which is called the γCFAR detector.
The γCFAR detector extends the conventional CFAR structure by using a set of gamma kernels as an alternative to the fixed, predetermined size of the CFAR stencil. Before we discuss the two-parameter CFAR detector, we start with a one-parameter CFAR detector to pave the way.

3.2 A One-Parameter Constant False Alarm Rate (CFAR) Detector

A cell-averaging (CA) CFAR detector controls the detection threshold for a specific resolution cell based on the estimate of a sufficient statistic of the clutter. The detector estimates the clutter power in the cells surrounding a cell under test. When the clutter is statistically homogeneous over the resolution cells, this detector works well; otherwise, the performance degrades. In order to implement a CA operation, a stencil can be used locally at a particular location in the image (Figure 14). A test pixel is defined at the center area ($R_t$) of the stencil, and the clutter intensity mean is estimated in a local region ($R_c$) at a distance from the pixel under test. We refer to the stencil as the CFAR stencil. The output of the CA-CFAR operation can be expressed as

$$
y = \frac{1}{N_t}\sum_{(i,j)\in R_t} x(i,j) - \frac{1}{N_c}\sum_{(i,j)\in R_c} x(i,j) \tag{49}
$$

where $x$ represents the pixel intensities, and $N_t$ and $N_c$ are the numbers of pixels in $R_t$ and $R_c$ in the stencil. This CA-CFAR detector can be called a one-parameter CFAR detector because only the mean information is utilized in the operation in (49). The hypothesis test is then performed on the outputs of the CA-CFAR processing such that

$$
y \;\underset{\text{non-target}}{\overset{\text{target}}{\gtrless}}\; T_{CFAR} \tag{50}
$$

From a signal processing perspective, the one-parameter CFAR detector in (49) is a band pass filter running over the image, in which the output of the low pass filter defined by the clutter mask of the template is subtracted from the output of the low pass filter defined by the target mask. Abrupt changes in image intensities produce large outputs from the one-parameter CFAR detector.
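The cell-averaging operation in (49) and the threshold test in (50) can be sketched as follows. This is a minimal illustration, assuming a square stencil with a single-pixel test region, a square guard band, and an outer clutter ring; the stencil dimensions, the threshold, and the function names are hypothetical and are not the values used in this work.

```python
import numpy as np

def stencil_masks(half_width, guard):
    """Boolean masks of a square stencil: test region R_t (centre pixel) and clutter ring R_c."""
    size = 2 * half_width + 1
    ii, jj = np.indices((size, size)) - half_width
    dist = np.maximum(np.abs(ii), np.abs(jj))     # Chebyshev distance from the centre
    return dist == 0, dist > guard

def ca_cfar(image, half_width=4, guard=2):
    """One-parameter CFAR output, Eq. (49): mean over R_t minus mean over R_c.

    Border pixels where the stencil does not fit are left at zero.
    """
    target, clutter = stencil_masks(half_width, guard)
    out = np.zeros(image.shape, dtype=float)
    h = half_width
    for r in range(h, image.shape[0] - h):
        for c in range(h, image.shape[1] - h):
            win = image[r - h:r + h + 1, c - h:c + h + 1]
            out[r, c] = win[target].mean() - win[clutter].mean()
    return out

# Flat background of intensity 1 with one bright pixel.
image = np.ones((21, 21))
image[10, 10] = 11.0

out = ca_cfar(image)
detections = out > 5.0            # Eq. (50) with an assumed threshold T_CFAR = 5
```

Because the output is a difference of means, multiplying the image by a constant multiplies the output by the same constant, so a fixed threshold is tied to the image scale.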
Since man-made objects usually scatter strongly back to millimeter-wave SARs, the large intensity differences between pixels under test and their clutter background are easily detected. However, an output of the one-parameter CFAR processing in (49) is the result of convolving the image with two rectangular-shaped kernels, which produce large ripples outside their main lobes in the frequency domain. Contrast enhancement between targets and clutter can be achieved by incorporating smooth-shaped kernels in the stencil [26]. The threshold of the one-parameter CFAR detector in (49) is sensitive to the scale of the image: the threshold is linearly proportional to the scale factor of the image. In the next section, a two-parameter CFAR detector is introduced which incorporates a normalization factor. This factor is the standard deviation of the clutter in $R_c$.

3.3 A Two-Parameter CFAR Detector

A two-parameter CFAR detector was first developed for use on 1-D range profiles by Goldstein [29]. Later, Novak et al. extended it for use in SAR imagery [56] [57]. The two-parameter CFAR detector has been commonly used as a prescreener in multi-stage SAR ATD/R systems. Its popularity is due to an excellent figure of merit in terms of performance versus simplicity. The name indicates that a constant false alarm probability is achieved, but in fact this is only true for Gaussian distributed targets and clutter [57]. Normally in SAR imagery this is not the case for targets in clutter. Nevertheless, experience has shown that the two-parameter CFAR is a robust detector for man-made clutter and targets [57]. The reasons for this success can be found in the simplicity and the discriminating power of the test. Basically, the CFAR compares the intensity of a pixel under test with the normalized intensity of a surrounding area.
Since man-made objects are normally bright in SAR imagery, this is a very effective test, which can be efficiently implemented in digital hardware. In terms of statistical detection theory, the CFAR estimates the parameters of the local intensity probability density function (pdf) and makes a decision when there is a brightness deviation of the pixel under test with respect to the normalized mean background intensity, i.e.,

\[ \frac{X_t - \hat{X}_c}{\hat{\sigma}_c} \; \underset{\text{clutter}}{\overset{\text{target}}{\gtrless}} \; T_{CFAR} \qquad (51) \]

where $X_t$ is the pixel under test, $T_{CFAR}$ is the threshold of the two-parameter CFAR detector, and $\hat{X}_c$ and $\hat{\sigma}_c$ are the estimates of the local mean and standard deviation measured in a local area defined by the CFAR stencil (see Figure 14).

Figure 14. CFAR stencil: a) CFAR stencil (local area, guard band, test pixel, width); b) top view of the CFAR stencil. The amplitude of the test pixel is compared with the mean and the standard deviation of the local area. The guard area ensures that no target pixels are included in the measurement of the local statistics [57].

The estimates $\hat{X}_c$ and $\hat{\sigma}_c$ are computed by

\[ \hat{X}_c = \frac{1}{N_c}\sum_{(i,j)\in\Omega_c} x(i,j) \qquad (52) \]

where $x(i,j)$ is the pixel intensity at location $(i,j)$, and

\[ \hat{\sigma}_c = \sqrt{\frac{1}{N_c}\sum_{(i,j)\in\Omega_c} \left(x(i,j) - \hat{X}_c\right)^2} \qquad (53) \]

where $\Omega_c$ defines the local area in which $\hat{X}_c$ and $\hat{\sigma}_c$ are computed, and $N_c$ is the number of pixels in $\Omega_c$. The shape of the stencil ensures that when the center pixel is on target, the neighborhood falls in the background, so that the local statistics can be estimated reasonably well. The shape of the stencil (in particular the guard band) is governed by the target size [54]. In SAR imagery, the reflectivity of an object is only weakly coupled to its geometric shape, so a priori stencil dimensions based solely on target size lead to suboptimal performance. In terms of statistical pattern recognition, we can interpret the CFAR in a slightly different way. One can think of the CFAR stencil as extracting intensity features in the neighborhood of the pixel under test.
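The two-parameter test in (51), with the local statistics of (52) and (53), can be sketched as below. The square ring geometry, the parameter names, and the checkerboard clutter are assumptions chosen so that the result is easy to verify by hand (the ring has mean 2 and standard deviation 1); they are not taken from this work.

```python
import math

def two_parameter_cfar(image, row, col, guard_half=3, clutter_half=5):
    """Return (X_t - mean_c) / std_c, with statistics from a ring outside the guard band."""
    ring = [image[i][j]
            for i in range(row - clutter_half, row + clutter_half + 1)
            for j in range(col - clutter_half, col + clutter_half + 1)
            if max(abs(i - row), abs(j - col)) > guard_half]
    mean_c = sum(ring) / len(ring)                                        # eq. (52)
    std_c = math.sqrt(sum((x - mean_c) ** 2 for x in ring) / len(ring))   # eq. (53)
    return (image[row][col] - mean_c) / std_c                             # eq. (51)


# Hypothetical scene: checkerboard clutter (values 1 and 3) with one bright pixel.
size = 21
image = [[1.0 if (i + j) % 2 == 0 else 3.0 for j in range(size)] for i in range(size)]
image[10][10] = 12.0
scaled = [[5.0 * v for v in row] for row in image]

stat = two_parameter_cfar(image, 10, 10)
stat_scaled = two_parameter_cfar(scaled, 10, 10)
print(stat, stat_scaled)
```

Because the statistic is normalized by the local standard deviation, rescaling the image intensities leaves it unchanged, which is the property the one-parameter detector lacks.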
In fact, the CFAR equation can be rewritten as

\[ X_t^2 - 2X_t\hat{X}_c + \hat{X}_c^2 - T_{CFAR}^2\,\overline{X_c^2} + T_{CFAR}^2\,\hat{X}_c^2 \; \gtrless \; 0 \qquad (54) \]

where $\hat{X}_c^2$ and $\overline{X_c^2}$ $\left(= \frac{1}{N_c}\sum_{(i,j)\in\Omega_c} x^2(i,j)\right)$ are the square of the estimated mean and the estimated mean of the intensity power, respectively, both measured in the neighboring area selected by the CFAR stencil. Notice that this expression is a linear combination (with fixed weighting) of the image intensity and its square at the pixel, and of the mean, power, and squared mean of the intensity in the neighborhood. Hence we can interpret the CFAR as implementing a "restricted" linear discriminant function of quadratic terms of the image intensity. From this perspective, the two-parameter CFAR can be improved: first, it uses only some of the quadratic terms of the intensity at the pixel and its surroundings; second, it implements a fixed parametric combination of these features; and third, there is little flexibility in the feature extraction because the kernel is ad hoc. These three aspects can be greatly improved if more mathematically grounded features are computed and if trainable classifiers are built. In Chapter 4, this is exactly what the quadratic gamma detector (QGD), and even more so the NL-QGD (nonlinear extension to the QGD), will provide.

3.4 Extension to the Two-Parameter CFAR Detector

3.4.1 Introduction

In this section, the conventional CFAR detector discussed in the previous section is improved by incorporating a new stencil, called the γCFAR (gamma CFAR) stencil. In Section 3.4.2, the gamma filter and gamma kernel [19] [64], which constitute the basis of the new stencil, are reviewed. The 2-D extension of the 1-D gamma kernels is introduced in Section 3.4.3. As extensions of the conventional CFAR detector, γCFAR detectors are introduced and their characteristics are discussed in Section 3.4.4.
3.4.2 Gamma Filter and Gamma Kernel

The gamma kernel was originally developed for time series analysis as a short-term memory structure for sequence-processing neural networks [19]. Because the gamma kernels are complete in Hilbert space ($L_2$ space), any finite energy signal can be approximated arbitrarily closely using a finite number of gamma kernels. The gamma kernels are defined as

\[ g_k(t) = \frac{\mu^k}{(k-1)!}\, t^{k-1} e^{-\mu t}, \qquad t \ge 0 \qquad (55) \]

The gamma kernels are implemented by introducing a local feedback loop around the delay operator, as shown in Figure 15. The impulse response from the input to the $k$th tap generates the $k$th order gamma kernel in (55). This structure is called the gamma filter. The Laplace transform of the gamma kernels is given as

\[ G_k(s) = \left(\frac{\mu}{s+\mu}\right)^k = G^k(s) \qquad (56) \]

where

\[ G(s) = \frac{\mu}{s+\mu} \qquad (57) \]

The $k$th order gamma filter is characterized by a pole at $s = -\mu$ with multiplicity $k$. The locations of the zeros are determined by the weights $w_1, w_2, \ldots, w_k$ and the parameter $\mu$. The discrete time gamma filter is depicted in Figure 15b. The discrete time gamma kernels and their corresponding z-transform are given by

\[ g_k(n) = \binom{n-1}{k-1}\mu^k (1-\mu)^{n-k} \quad \text{for } n \ge k \qquad (58) \]

\[ G_k(z) = \left(\frac{\mu}{z-(1-\mu)}\right)^k = G^k(z) \qquad (59) \]

The gamma filters in both continuous and discrete time are locally recurrent, globally feedforward structures.

Figure 15. The gamma filter structures a) in continuous time and b) in discrete time.

Figure 16. Shape of gamma kernels as affected by the parameter $\mu$ and the kernel order $k$: a) effect of changing the kernel order $k$; b) effect of changing $\mu$.

The discrete gamma filter generalizes the tapped delay line structure. For $\mu = 1$, the gamma filter reduces to a transversal filter (a finite impulse response (FIR) filter). The stability of the continuous time gamma filter is guaranteed because the system poles always lie in the left half of the Laplace plane as long as $\mu > 0$.
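As an illustration of the locally recurrent structure, the sketch below drives a discrete gamma filter with a unit impulse and compares each tap with the closed form in (58). The tap recursion x_k(n) = (1 - mu) x_k(n-1) + mu x_{k-1}(n-1) is the time-domain form of the stage transfer function in (59); the function names are hypothetical.

```python
from math import comb

def gamma_kernel(k, mu, n):
    """Closed-form discrete gamma kernel g_k(n) of eq. (58); zero for n < k."""
    if n < k:
        return 0.0
    return comb(n - 1, k - 1) * mu ** k * (1.0 - mu) ** (n - k)

def gamma_filter_impulse_responses(order, mu, length):
    """Impulse response at every tap of a gamma filter with `order` stages."""
    taps_prev = [0.0] * (order + 1)          # tap values at time n - 1
    responses = [[] for _ in range(order + 1)]
    for n in range(length):
        taps = [1.0 if n == 0 else 0.0]      # tap 0 carries the input impulse
        for k in range(1, order + 1):
            # Local feedback (1 - mu) plus the delayed output of the previous tap.
            taps.append((1.0 - mu) * taps_prev[k] + mu * taps_prev[k - 1])
        for k in range(order + 1):
            responses[k].append(taps[k])
        taps_prev = taps
    return responses


responses = gamma_filter_impulse_responses(order=3, mu=0.4, length=30)
fir_case = gamma_filter_impulse_responses(order=3, mu=1.0, length=8)
print(responses[2][:5])
print(fir_case[3])
```

With mu = 1 the feedback term vanishes and tap k simply delays the impulse by k samples, which is the tapped delay line special case.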
For the discrete case, the poles are located inside the unit circle in the z-domain as long as $0 < \mu < 2$, so the stability problem in discrete time is also trivial. The interesting point is that the functions constitute a non-orthogonal basis that is complete for signals of finite energy (in $L_2$) [19]. Hence we can interpret the gamma filter output as a projection of the input signal onto a linear space defined by the convolution of the input with the basis functions [20]. Figure 16 illustrates, in continuous time, an example of gamma kernel shapes as functions of the kernel order $k$ and the parameter $\mu$. Note that the shape of the kernels is very similar whether one varies the order for a fixed $\mu$ or varies $\mu$ for a given kernel order. The main characteristic of the gamma filter is that time in the filter taps is scaled by the feedback parameter $\mu$ (linear time warping). In other words, the region of support of the impulse response is controlled by the single parameter $\mu$, so that by changing $\mu$ the impulse response can be stretched or shrunk like a rubber band (Figure 16b). In a signal processing framework, the parameter $\mu$ can be adapted using the output mean square error and a gradient descent approach, so that the best local features are captured by the filter [64]. Linear filters are often characterized in the frequency domain in terms of their magnitude and phase responses. For the temporal processing of signals, linear filters such as the FIR, IIR and gamma filters can be understood as memory filters, which are characterized by their memory depth and memory resolution. The memory depth $D$ is defined as the temporal mean value of the impulse response $g(t)$ of a filter used for storing temporal information [19]:

\[ D = \int_0^\infty t\,g(t)\,dt = -\left.\frac{dG(s)}{ds}\right|_{s=0} \quad \text{for continuous time} \qquad (60) \]

\[ D = \sum_{n=0}^{\infty} n\,g(n) = -\left. z\,\frac{dG(z)}{dz}\right|_{z=1} \quad \text{for discrete time} \qquad (61) \]

The memory resolution $R$ is defined as the number of degrees of freedom (i.e.
memory state variables) per unit time. As a memory filter, the impulse response of a $k$th order FIR filter at each tap is $g_k(n) = \delta(n-k)$. The transfer function in the z-domain is given by $G(z) = z^{-k}$. The memory depth of the filter is $D = k$; that is, the memory depth of an FIR filter is equal to the filter order. Because the memory depth is limited by the filter order, a low order FIR filter has poor modeling capability for low-pass frequency bounded signals [64]. On the other hand, the structure of an IIR filter contains feedback connections, and consequently the memory depth is not limited by the filter order. IIR filters, however, suffer from stability problems. The memory depth of a $k$th order gamma filter, for both the discrete and continuous time domains, is obtained from (60) and (61) as

\[ D = \frac{k}{\mu} \qquad (62) \]

Contrary to an FIR filter, a gamma filter has a memory depth that is uncoupled from the filter order. The gamma filter shares this property with IIR filters because of the feedback introduced locally between taps. The stability problem is also relaxed, as long as the local feedback parameter is kept in the range $0 < \mu < 2$ in the discrete case. The memory resolution of a $k$th order discrete time gamma filter is the number of taps divided by the memory depth:

\[ R = \frac{k}{D} = \mu \qquad (63) \]

(63) can be written as

\[ k = D \times R \qquad (64) \]

For a given order $k$, memory depth can be traded against memory resolution, so that a very deep memory structure can be obtained at the expense of a low memory resolution. In the FIR case, the memory depth is the filter order, which is the special case $\mu = 1$ of the gamma filter.

3.4.3 2-D Extension of 1-D Gamma Kernels

Theoretically, the gamma filter can be extended to N dimensions without problems. This is done by simply substituting $t$ in (55) with an N-dimensional basis vector. In this extension, circularly symmetric gamma kernels are obtained, but more general cases can be considered.
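The memory depth in (62) can be checked numerically by evaluating the sum in (61) with the discrete kernel of (58). This is only a sanity check under an assumed truncation length `n_max`; the function name is hypothetical.

```python
from math import comb

def memory_depth(k, mu, n_max=2000):
    """Truncated evaluation of D = sum_n n * g_k(n) for the discrete gamma kernel."""
    return sum(n * comb(n - 1, k - 1) * mu ** k * (1.0 - mu) ** (n - k)
               for n in range(k, n_max))


order = 4
depth_deep = memory_depth(order, 0.25)    # small mu: deep memory, low resolution
depth_fir = memory_depth(order, 1.0)      # mu = 1: FIR case, depth equals the order
resolution_deep = order / depth_deep      # eq. (63)
print(depth_deep, depth_fir, resolution_deep)
```

For a fixed order of 4, lowering mu from 1 to 0.25 quadruples the depth while the resolution drops to 0.25, which is the trade-off expressed by (64).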
Specifically for our application to 2-D (image) data, the gamma kernels can be extended by

\[ g_{n,\mu}(k,l) = C\, g_{n,\mu}(t)\Big|_{t=\sqrt{k^2+l^2}} \qquad (65) \]

where the constant $C$ is a normalization factor. The resulting 2-D kernel has a circularly symmetric shape given by

\[ g_{n,\mu}(k,l) = \frac{\mu^{n+1}}{2\pi\, n!}\left(\sqrt{k^2+l^2}\right)^{n-1} e^{-\mu\sqrt{k^2+l^2}}, \qquad \Omega = \{(k,l);\; -N \le k,l \le N\} \qquad (66) \]

where $\Omega$ is the region of support of the kernel, $n$ the kernel order, and $\mu$ the parameter that controls the shape and scale of the kernel. Since the 2-D circularly symmetric gamma kernels are created from the corresponding 1-D gamma kernels in the spatial domain, they preserve the spatial characteristics of the 1-D gamma kernels. That is, the concept of a time warping parameter carries over to the spatial domain as a scale parameter that controls the region of support of the 2-D gamma kernels. If the 2-D gamma kernel in (66), defined in the Cartesian coordinate system, is converted into polar coordinates, the memory depth of the 2-D gamma kernel can be easily obtained. Letting $k = r\cos\theta$, $l = r\sin\theta$, so that $dk\,dl = r\,dr\,d\theta$, (66) in polar coordinates is given by

\[ g_{n,\mu}(r,\theta) = \frac{\mu^{n+1}}{2\pi\, n!}\, r^{n} e^{-\mu r} \qquad (67) \]

The memory depth in the 2-D spatial domain is expressed as

\[ D = \int_0^{2\pi}\!\!\int_0^{\infty} r\, g_{n,\mu}(r,\theta)\, dr\, d\theta \qquad (68) \]

(68) is equivalent to the 1-D case. This means that the spatial characteristics of the 1-D gamma kernels are preserved in the 2-D domain by the circularly symmetric rotation. Figure 17 shows the characteristics of 2-D gamma kernels in the spatial domain. The first order ($n = 1$) kernel has its peak at the pivot point (0,0) with an exponentially decaying amplitude. All the other kernels have a peak at the radius $n/\mu$, creating smooth concentric rings around the pivot point (Figure 17). For a fixed order ($n = 15$), the radial distance at which the kernel peaks still depends on the parameter $\mu$, as in the 1-D case.
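The 2-D kernel in (66) can be sampled on a grid as in the sketch below, which also locates the radius at which each kernel is largest. The function names and the grid half-width are assumptions for illustration. Note that the sampled surface of (66) is proportional to r^(n-1) e^(-mu r) and so is largest near r = (n-1)/mu, whereas the radially integrated form (67), which includes the extra factor r, is largest at r = n/mu.

```python
import math

def gamma_kernel_2d(n, mu, half_width):
    """Sample eq. (66) for k, l = -half_width .. half_width."""
    coeff = mu ** (n + 1) / (2.0 * math.pi * math.factorial(n))
    kernel = []
    for k in range(-half_width, half_width + 1):
        row = []
        for l in range(-half_width, half_width + 1):
            r = math.hypot(k, l)
            row.append(coeff * r ** (n - 1) * math.exp(-mu * r))
        kernel.append(row)
    return kernel

def peak_radius(kernel, half_width):
    """Distance from the pivot point to the largest kernel sample."""
    best = max((kernel[k + half_width][l + half_width], math.hypot(k, l))
               for k in range(-half_width, half_width + 1)
               for l in range(-half_width, half_width + 1))
    return best[1]


half_width = 40
center_kernel = gamma_kernel_2d(1, 0.598, half_width)    # peaky kernel at the pivot
ring_kernel = gamma_kernel_2d(15, 0.598, half_width)     # smooth ring around it
r_center = peak_radius(center_kernel, half_width)
r_ring = peak_radius(ring_kernel, half_width)
print(r_center, r_ring)
```

Decreasing mu for a fixed order moves the ring outward, which is the scale control described above.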
3.4.4 Gamma CFAR (γCFAR) Detector

By analogy with the CFAR stencil, any combination of the first order kernel with one of the higher order kernels produces a similar stencil, although the shapes obtained with the 2-D gamma kernels are smoother. We call this stencil the γCFAR stencil. It is interesting to contrast the γCFAR detector with the CFAR detector. Target masking in the γCFAR stencil is implemented by the first order 2-D gamma kernel, so that the target intensity mean is estimated closely around the center pixel under test, while a higher order kernel creates the clutter mask, so that the statistics of the clutter are measured in a round ring. With the γCFAR, we have a better handle on the shape of the kernel because of its analytic formulation. Figure 18 shows the γCFAR stencil for the CFAR test.

Figure 17. 2-D gamma kernels. Panels: $n = 1$, $\mu = 0.598$; $n = 15$, $\mu = 0.916$; $n = 1$, $\mu = 0.357$; $n = 15$, $\mu = 0.776$; $n = 1$, $\mu = 0.236$; $n = 15$, $\mu = 0.598$.

Since the parameter $\mu$ in 2-D has exactly the same function as its 1-D counterpart, i.e., it shrinks or stretches the region of support of the impulse response, we can adaptively select $\mu$ to better perform the CFAR test. In fact, after fixing the order of the kernel, we have a single parameter that controls its spatial extent, and we can derive equations that change the parameter $\mu$ to minimize an output error. Figure 19 shows the block diagram of a one-parameter γCFAR detector. Two gamma kernels are linearly combined to form the output $y$ of the detector. We call this the one-parameter γCFAR detector, the counterpart of the one-parameter CFAR detector in (49). The output can be written as

\[ y = w_1\,(g_{m,\mu_m} \circ X) + w_2\,(g_{n,\mu_n} \circ X) \qquad (69) \]

where $\circ$ represents the convolution operator, $w_1$ and $w_2$ are weights, the kernel orders are $m = 1$ and $n > 1$, and $\mu_m$ and $\mu_n$ are the parameters that control the extent of the kernels.
It is expected that $w_2$ needs to be negative so that the output will be high over areas of largest contrast, as is the case in the one-parameter CFAR detector. In addition to the degrees of freedom associated with controlling the kernel extent through $\mu$, the smooth shape of the gamma kernels yields smaller sidelobes than those associated with the CFAR stencil. From a signal processing perspective, the one-parameter γCFAR detector performs bandpass filtering, as does the one-parameter CFAR detector. It can, however, choose the frequency bands of the bright pixel intensities in the image by adapting the region of support of the kernels, and the energy of the pixel intensities is better preserved in the selected frequency band, with less loss, because of the smoothness of the gamma kernels. Interpreted from the projection point of view, the input signal (an image in the 2-D case) is still being projected onto a local basis obtained by convolving the input with the first kernel ($m = 1$) and a higher order kernel ($n > 1$).

Figure 18. The γCFAR detector: a) γCFAR stencil, in which the center kernel has order 1 and the surrounding kernel has order 15; b) intensity plot of the γCFAR stencil. The surrounding kernel defines a local area where the local statistics of mean and standard deviation are measured. The peaky kernel averages the pixel under test and its closest neighbors.

Figure 19. One-parameter γCFAR detector.

Because the amplitude decays exponentially away from the pivot point, the spatial extent of the kernel can be truncated to a small value. This projection onto gamma kernels of order $m$ and $n$ will not be complete, meaning that the input image cannot be recovered from the projection. But as the CFAR experience shows, it will still preserve the features that are important for distinguishing man-made objects from clutter. The decision rule of the γCFAR detector is defined as

\[ \frac{g_{m,\mu_m} \bullet X - g_{n,\mu_n} \bullet X}{\hat{\sigma}} \; \underset{\text{clutter}}{\overset{\text{target}}{\gtrless}} \; T_{\gamma CFAR} \qquad (70) \]

where $\bullet$ represents the inner product operator, $T_{\gamma CFAR}$ is a threshold for deciding between target and clutter, and

\[ \hat{\sigma} = \sqrt{g_{n,\mu_n} \bullet X^2 - \left(g_{n,\mu_n} \bullet X\right)^2} \qquad (71) \]

As shown in (70) and (71), the local mean and standard deviation are computed with a higher order kernel ($n > 1$), and the pixel under test is averaged with its adjacent pixels, which are weighted less with increasing distance from the pivot point. The γCFAR decision can also be recast as a discriminant function

\[ (g_m \bullet X)^2 - 2\,(g_m \bullet X)(g_n \bullet X) + (g_n \bullet X)^2 - T_{\gamma CFAR}^2\,(g_n \bullet X^2) + T_{\gamma CFAR}^2\,(g_n \bullet X)^2 \; \gtrless \; 0 \qquad (72) \]

Now that we have a parametric form for the stencil, it is instructive to analyze experimentally how the accuracy of the CFAR test depends on the size of the stencil. The local image features computed by the convolution can be specified as projections of intensity and power onto gamma kernels,

\[ g_{n,\mu} \bullet X = \sum_{(i,j)\in\Omega} g_{n,\mu}(i,j)\, x(i,j) \qquad (73) \]

\[ g_{n,\mu} \bullet X^2 = \sum_{(i,j)\in\Omega} g_{n,\mu}(i,j)\, x^2(i,j) \qquad (74) \]

Note that these operations can be interpreted as FIR filtering with the gamma stencil $g_{n,\mu}(i,j)$. The output values can be viewed as estimates of the local first and second moments, respectively, which are sufficient to compute the local variance. The local standard deviation in (53) is needed to compute the two-parameter CFAR statistic, and has been shown to be an important measure in deciding whether a target is present or not [57]. The γCFAR detector separates targets from clutter based on the values of four features, $(g_m \bullet X)^2$, $(g_m \bullet X)(g_n \bullet X)$, $(g_n \bullet X)^2$, and $g_n \bullet X^2$, which are needed to compute the local mean and standard deviation. These terms enable a one-to-one comparison of the γCFAR detector and the two-parameter CFAR detector, and are also used in Chapter 4 to study the benefits of adaptive feature extraction and adaptive weights. In conclusion, the potential advantage of the γCFAR detector implementation is that the extent of the localized mean and of the surrounding area can be set adaptively.
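The γCFAR statistic in (70) and (71) can be sketched for a single pixel as follows. Several choices here are assumptions made for the example and are not taken from this work: the kernels are rescaled to unit sum so that the projections act as weighted local moments, the orders and parameter values are borrowed from the panels of Figure 17, and the checkerboard scene is synthetic.

```python
import math

def gamma_kernel_2d(n, mu, half_width):
    """2-D gamma kernel sampled on a square grid, rescaled to unit sum (assumption)."""
    kernel = [[math.hypot(k, l) ** (n - 1) * math.exp(-mu * math.hypot(k, l))
               for l in range(-half_width, half_width + 1)]
              for k in range(-half_width, half_width + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def project(kernel, image, row, col, power=1):
    """Inner product of the kernel, centered at (row, col), with image**power."""
    half = len(kernel) // 2
    return sum(kernel[k + half][l + half] * image[row + k][col + l] ** power
               for k in range(-half, half + 1)
               for l in range(-half, half + 1))

def gamma_cfar_statistic(image, row, col, g_m, g_n):
    mean_t = project(g_m, image, row, col)              # g_m . X
    mean_c = project(g_n, image, row, col)              # g_n . X
    power_c = project(g_n, image, row, col, power=2)    # g_n . X^2
    return (mean_t - mean_c) / math.sqrt(power_c - mean_c ** 2)


half = 30
g_1 = gamma_kernel_2d(1, 0.598, half)
g_15 = gamma_kernel_2d(15, 0.916, half)

size = 2 * half + 1
clutter = [[1.0 if (i + j) % 2 == 0 else 3.0 for j in range(size)] for i in range(size)]
scene = [row[:] for row in clutter]
for i in range(half - 1, half + 2):
    for j in range(half - 1, half + 2):
        scene[i][j] = 12.0                               # bright 3x3 target
scaled = [[5.0 * v for v in row] for row in scene]

stat_clutter = gamma_cfar_statistic(clutter, half, half, g_1, g_15)
stat_target = gamma_cfar_statistic(scene, half, half, g_1, g_15)
stat_scaled = gamma_cfar_statistic(scaled, half, half, g_1, g_15)
print(stat_clutter, stat_target, stat_scaled)
```

In practice the projections would be computed for every pixel at once by correlation rather than pixel by pixel as in this sketch.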
This could lead to fine tuning of the area where the local statistics are measured. The γCFAR detector, based on 2-D gamma filters, is therefore a promising extension of the conventional CFAR detector.

3.4.5 Implementation of the γCFAR Detector

Computing the feature set at each pixel of the image amounts to correlating each of the two kernels $g_{1,\mu_1}$ and $g_{n,\mu_n}$ with both the original image and the squared image. Each kernel assumes the role of an FIR filter with rectangular support (of size $(N+1) \times (N+1)$). The three base features ($g_{1,\mu_1} \bullet X$, $g_{n,\mu_n} \bullet X$, and $g_{n,\mu_n} \bullet X^2$) at a point $(i,j)$ in the image are then obtained using translated gamma kernels as

\[ (g_{n,\mu} \bullet X^p)(i,j) = \sum_{(k,l)\in\Omega} g_{n,\mu}(k,l)\, x^p(i+k,\, j+l), \quad p = 1 \text{ or } 2, \qquad \Omega = \left\{(k,l);\; -\tfrac{N}{2} \le k,l \le \tfrac{N}{2}\right\} \qquad (75) \]

where $n$ stands for the kernel order and $p$ indicates either the first or the second moment. Correlations are computed in the frequency domain using FFTs to obtain better computational efficiency. To process large images and to avoid memory problems, we divide the image into overlapping radix-2 windows, which are individually processed and combined using an overlap-and-save method [59].

3.5 Receiver Operating Characteristic (ROC)

A plot of the detection probability $P_d$ as a function of the false alarm probability $P_f$ is referred to as a receiver operating characteristic (ROC) curve. The ROC is a very important assessment tool for detection systems and depends on the probability density function of the observed signal under each hypothesis $H_j$, that is, $p_{Y|H_j}(y \mid H_j)$, $j = 0, 1$. Figure 20 depicts the probability density functions for two hypotheses together with the resulting ROC curves. In Figure 20a, varying a threshold $K$ controls the areas representing $P_d$ and $P_f$. The corresponding ROC curve is shown in Figure 20b.

Figure 20. ROC: a) probability density functions, $p(x \mid H_0) = e^{-x}u(x)$ and $p(x \mid H_1) = \alpha e^{-\alpha x}u(x)$; b) ROC curves ($P_d$ versus $P_f$).
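For the exponential densities labeled in Figure 20, the ROC can be written in closed form, which the sketch below evaluates. It assumes that the density without a visible label in the figure belongs to H0, and that H1 is declared when the observation exceeds the threshold K; the function names and the value of alpha are hypothetical.

```python
import math

def roc_point(threshold, alpha):
    """(P_f, P_d) for p(x|H0) = exp(-x) u(x) and p(x|H1) = alpha exp(-alpha x) u(x)."""
    p_f = math.exp(-threshold)            # area of p(x|H0) above the threshold
    p_d = math.exp(-alpha * threshold)    # area of p(x|H1) above the threshold
    return p_f, p_d

def roc_curve(alpha, thresholds):
    return [roc_point(k, alpha) for k in thresholds]


alpha = 0.25                               # alpha < 1 puts more H1 mass at large x
thresholds = [0.5 * i for i in range(21)]  # K from 0 to 10
curve = roc_curve(alpha, thresholds)
for p_f, p_d in curve[:4]:
    print(round(p_f, 4), round(p_d, 4))
```

Eliminating K gives P_d = P_f raised to the power alpha, so smaller values of alpha bend the curve further toward the upper left corner.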
CHAPTER 4 QUADRATIC GAMMA DETECTOR

4.1 Introduction

The goal of a prescreener in the front-end detection stage is to locate potential targets and to eliminate a large amount of clutter in the sensor imagery. The outputs of the prescreener are regions of interest which need to be considered further in the following stage, the false alarm reduction stage. The regions of interest are indicated by their coordinates $(x, y)$ in the image, and further processing is applied to the locations reported by the prescreener. Section 4.2 briefly reviews discriminant functions. Section 4.3 develops a novel detector (QGD) [63], which extends the two-parameter CFAR and γCFAR detectors, and discusses its discrimination power. In Section 4.5, the multilayer perceptron (MLP) structure is reviewed, and its training and generalization ability are discussed. Finally, in Section 4.6, the QGD is extended into a nonlinear structure based on MLPs, which is called the NL-QGD (nonlinear extension to the QGD).

4.2 Discriminant Functions

4.2.1 Linear Discriminant Functions

In general, a discriminant function is an operator that, when applied to a pattern, yields a decision concerning the class membership of the pattern. The action of a discriminant function is to produce a mapping from pattern space to attribute space. A general linear discriminant function in n-dimensional pattern space is of the form

\[ g(X) = w_1x_1 + w_2x_2 + \cdots + w_nx_n + w_{n+1} = W^TX + w_{n+1} \qquad (76) \]

where $W$ is called the weight vector and $w_{n+1}$ the threshold weight. The decision rule of a two-category classifier is implemented by the following property:

\[ g(X) = W^TX + w_{n+1} \; \begin{cases} > 0 & \text{if } X \in \omega_1 \\ < 0 & \text{if } X \in \omega_2 \end{cases} \qquad (77) \]

The input pattern vector $X$ is assigned to the class $\omega_1$ if $g(X) > 0$ and to $\omega_2$ if $g(X) < 0$. For the multicategory case, linear discriminant functions implement the decision rule: assign $X$ to $\omega_i$ if $g_i(X) > g_j(X)$ for all $j \ne i$, where $g_i(X) = W_i^TX + w_{i,n+1}$. In case of ties, the decision is left undefined.
4.2.2 Generalized Linear Discriminant Functions

The linear discriminant function discussed in the previous section can be extended to the generalized linear discriminant function of the form

\[ g(X) = w_1f_1(X) + w_2f_2(X) + \cdots + w_nf_n(X) + w_{n+1} = W^TF \qquad (78) \]

where $F = [f_1(X)\; f_2(X)\; \cdots\; f_n(X)\; 1]^T$ is a real-valued vector whose elements are functions of the pattern $X$, and $W$ is an $(n+1)$-dimensional weight vector. Any desired discriminant function can be approximated by a series expansion $\{f_i(X)\}$ by selecting these functions judiciously and letting $n$ be sufficiently large. The discriminant function in (78) is not linear in the original input pattern space, but it is linear in the transformed pattern ($F$) space. The patterns $F$ are therefore separated by a hyperplane in the transformed space, whereas the patterns $X$ are separated by a hypersurface in the original pattern space. The mapping from $X$ to $F$ thus reduces the problem to one of finding a linear discriminant function.

4.3 Extension to the γCFAR Detector

Quadratic discriminant functions implement the optimal classifier for Gaussian distributed classes [8] [27]. A quadratic discriminant function in $d$-dimensional space is

\[ g(X) = \sum_{j=1}^{d} w_{jj}x_j^2 + \sum_{j=1}^{d-1}\sum_{k=j+1}^{d} w_{jk}x_jx_k + \sum_{j=1}^{d} w_jx_j + w_{d+1} \qquad (79) \]

where the $w$ are a set of adjustable parameters. Probably the most common way to construct a quadratic classifier is to build a quadratic processor that creates all the terms of $g(X)$, followed by a linear machine which simply weighs each of these terms. Figure 21 depicts the implementation of the quadratic gamma detector (QGD), which follows this methodology. The QGD generalizes the γCFAR detector by exploiting all quadratic and linear terms of the two intensity features $g_m \bullet X$ and $g_n \bullet X$ in (70). Notice that in addition to the 4 terms of the quadratic form, we include 3 more terms, $g_m \bullet X^2$, $g_m \bullet X$, and $g_n \bullet X$, for a direct analogy with the CFAR detector. In fact, our two input features are the intensity at the pixel under test, $g_m \bullet X$, and the intensity in the ring neighborhood, $g_n \bullet X$. From these quantities the traditional quadratic detector creates $(g_m \bullet X)^2$, $(g_n \bullet X)^2$, $g_m \bullet X$, $g_n \bullet X$, $(g_m \bullet X)(g_n \bullet X)$, and the bias. For a direct comparison with the CFAR we add $g_m \bullet X^2$ and $g_n \bullet X^2$, for a total of 8 features. The feature vector reads

\[ F_{\mu_m,\mu_n} = \left[\, g_{m,\mu_m} \bullet X \;\; g_{n,\mu_n} \bullet X \;\; g_{m,\mu_m} \bullet X^2 \;\; g_{n,\mu_n} \bullet X^2 \;\; (g_{m,\mu_m} \bullet X)^2 \;\; (g_{n,\mu_n} \bullet X)^2 \;\; (g_{m,\mu_m} \bullet X)(g_{n,\mu_n} \bullet X) \;\; 1 \,\right]^T \qquad (80) \]

and the quadratic detector becomes

\[ y = W^TF_{\mu_m,\mu_n} = w_1(g_{m,\mu_m} \bullet X) + w_2(g_{n,\mu_n} \bullet X) + w_3(g_{m,\mu_m} \bullet X^2) + w_4(g_{n,\mu_n} \bullet X^2) + w_5(g_{m,\mu_m} \bullet X)^2 + w_6(g_{n,\mu_n} \bullet X)^2 + w_7(g_{m,\mu_m} \bullet X)(g_{n,\mu_n} \bullet X) + w_8 \; \underset{\text{clutter}}{\overset{\text{target}}{\gtrless}} \; T \qquad (81) \]

where

\[ W = [\,w_1\; w_2\; w_3\; w_4\; w_5\; w_6\; w_7\; w_8\,]^T \qquad (82) \]

Figure 21. Quadratic Gamma Detector (QGD): the regions of interest declared by the prescreener pass through the gamma kernels $g_{m,\mu_m}$ and $g_{n,\mu_n}$ and a feature expansion, the features are weighted by $W$, and the output $y$ is thresholded into target or non-target.

With this formulation, we can understand a little better what was said previously regarding the restricted nature of the two-parameter CFAR detector when seen from a pattern recognition standpoint. From pattern recognition theory we know that any quadratic or higher order polynomial decision function can be implemented with a linear decision function if the feature vector is appropriately expanded [81]. The added features therefore help increase the separability between target and clutter in pattern space. The weight vector $W$ is obtained during a training procedure, which is discussed in Section 4.3.1. The discrimination between the two input classes (target and clutter) is done using a single threshold. The discriminant function is a quadratic function of the image intensity features extracted by the gamma kernels, which is why the detector is named the QGD (quadratic gamma detector). The parameter vector for the γCFAR (or, equivalently, for the two-parameter CFAR) is

\[ W = [\,0\;\; 0\;\; 0\;\; -T_{\gamma CFAR}^2\;\; 1\;\; (1+T_{\gamma CFAR}^2)\;\; -2\;\; 0\,]^T, \qquad T = 0 \qquad (83) \]

where some of the parameters are set to zero and others are fixed.
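The relation between the QGD output in (81) and the γCFAR rule can be illustrated with a few lines of code. The feature ordering follows (80); the γCFAR weight values used below are obtained here by expanding the squared γCFAR rule, so they should be read as an assumption about the intended content of (83). The numeric projections are made up for the example.

```python
import math

def qgd_features(gm_x, gn_x, gm_x2, gn_x2):
    """Feature vector F of eq. (80) built from the four kernel projections."""
    return [gm_x, gn_x, gm_x2, gn_x2, gm_x ** 2, gn_x ** 2, gm_x * gn_x, 1.0]

def gamma_cfar_weights(threshold):
    """Fixed weights that make the QGD reproduce the squared gamma-CFAR rule."""
    t2 = threshold ** 2
    return [0.0, 0.0, 0.0, -t2, 1.0, 1.0 + t2, -2.0, 0.0]

def qgd_output(weights, features):
    """Linear machine y = W^T F of eq. (81)."""
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical projections at one pixel: local means and mean powers.
gm_x, gn_x, gm_x2, gn_x2 = 5.0, 2.0, 30.0, 5.0
t_cfar = 2.0

y = qgd_output(gamma_cfar_weights(t_cfar), qgd_features(gm_x, gn_x, gm_x2, gn_x2))
ratio = (gm_x - gn_x) / math.sqrt(gn_x2 - gn_x ** 2)    # left side of eq. (70)
print(y, ratio)
```

With these weights, y is positive exactly when the magnitude of the ratio statistic exceeds the threshold, so the fixed-weight detector is recovered as one point in the QGD weight space.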
Since increasing the number of free parameters of a system gives it more flexibility, we can improve the performance of the γCFAR detector by creating more parameters and adapting them with representative data.

4.3.1 Training the QGD

In an adaptive pattern recognition framework, the free parameters of the classifier, which define the position of the discriminant function for maximum performance, are learned from a set of representative data. This is called the training of the classifier. Given a set of training image chips $\{X_1, X_2, \ldots, X_N\}$ centered around points of a known class, we compute the corresponding feature vectors $F$. The desired values for the chips are 1 for the target class and 0 for clutter, which form a desired vector $d = [d_1, d_2, \ldots, d_N]^T$.

4.3.1.1 Closed Form Pseudo-Inverse Solutions

If the mean square error between the system output and the desired response is the cost function, there is an analytical solution to the problem [30]. The method solves, in the least squares sense, an overdetermined (assuming $N > 8$) system of linear equations for the unknown coefficient vector $W$ by using the Moore-Penrose pseudo-inverse,

\[ \min_W \| d - FW \|^2 \qquad (84) \]

yielding

\[ W = (F^TF)^{-1}F^Td \qquad (85) \]

When there are more weights than equations, the system is underdetermined and an optimal weight vector can be calculated by

\[ W = F^T(FF^T)^{-1}d \qquad (86) \]

In either case, when the matrix to be inverted has zero eigenvalues, singular value decomposition techniques can be used to select the optimal weight vector with the smallest Euclidean norm [32]. Since the feature vector $F$ is a function of the parameter $\mu$, the problem we are facing is in fact a parametric least squares problem, which does not have a closed form solution because of the nonlinear dependence on $\mu$. There are two possibilities for solving this problem.
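For fixed kernel parameters, the overdetermined solution in (85) amounts to solving the normal equations. The sketch below does this with plain Gaussian elimination on a toy problem with three features instead of the eight QGD features; the data, the helper names, and the choice of solver are assumptions for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def least_squares_weights(features, desired):
    """W = (F^T F)^{-1} F^T d, where each row of `features` is one feature vector."""
    n = len(features[0])
    ftf = [[sum(row[i] * row[j] for row in features) for j in range(n)]
           for i in range(n)]
    ftd = [sum(row[i] * d for row, d in zip(features, desired)) for i in range(n)]
    return solve_linear_system(ftf, ftd)


# Toy data: six examples, three features (a value, its square, and a bias term).
features = [[float(x), float(x * x), 1.0] for x in range(6)]
true_weights = [2.0, -1.0, 0.5]
desired = [sum(w * f for w, f in zip(true_weights, row)) for row in features]

weights = least_squares_weights(features, desired)
print(weights)
```

When F^T F is close to singular, forming the normal equations explicitly is numerically fragile, which is where the singular value decomposition mentioned above becomes preferable.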
Either we first determine the best values of $\mu_m$ and $\mu_n$ through an exhaustive search of the parameter space, or we use an iterative approach to find the weight vector together with $\mu_m$ and $\mu_n$. One remark is necessary at this point. Training of an adaptive system needs to follow certain steps to give good results [1]. In particular, the training samples should be representative of the conditions found during testing, and they should outnumber the free parameters of the network by at least 10:1. For the QGD, the minimum number of training exemplars is easily met. The training and testing procedures are depicted in Figure 22. The methods of finding a weight vector are discussed further in the following subsection.

4.3.1.2 Iterative Solution Based on Gradient Descent

An optimal set of $\mu_m$ and $\mu_n$ can be determined through a search of the parameter space that yields the minimum number of false alarms for 100% detection in the training set. However, this search is two-dimensional, so finding $\mu$ in this way becomes computationally expensive. Computing both the weight vector $W$ and the values of $\mu_m$ and $\mu_n$ can instead be accomplished with a gradient descent method (the LMS algorithm) borrowed from adaptive signal processing theory. It is known that the LMS method converges to the Moore-Penrose solution of the overdetermined least squares problem [35]. Using the method of gradient descent, the weights and the parameters are adjusted iteratively along the error surface, with the aim of moving them progressively toward the optimum solution. Figure 23 depicts an adaptation scheme for the QGD parameters and weights.

Figure 22. Training and testing of the Quadratic Gamma Detector (QGD): a) training phase of the QGD (training subimages); b) testing phase of the QGD (testing imagery).

Figure 23. Adaptation scheme for the parameters and weights of the QGD (training subimages, desired response, error).
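In the batch mode of adaptation, weight updating is performed after the presentation of all the training examples (this constitutes an epoch). For a given set of $N$ training examples, a cost function can be defined as

\[ E(k) = \frac{1}{p}\,|d(k) - y(k)|^p, \qquad E_{av} = \frac{1}{N}\sum_{k=1}^{N} E(k) = \frac{1}{Np}\sum_{k=1}^{N} |d(k) - y(k)|^p \qquad (87) \]

where $E(k)$ is the $p$-normed absolute value of the instantaneous error. The adjustments applied to the weights ($W$) and to the parameter ($\mu$) are derived as

\[ \Delta W = -\eta\,\frac{\partial E_{av}}{\partial W} = \frac{\eta}{N}\sum_{k=1}^{N} \mathrm{sign}\big(d(k) - y(k)\big)\,|d(k) - y(k)|^{p-1}\,F(k) \qquad (88) \]

\[ \Delta\mu = -\eta\,\frac{\partial E_{av}}{\partial \mu} = \frac{\eta}{N}\sum_{k=1}^{N} \mathrm{sign}\big(d(k) - y(k)\big)\,|d(k) - y(k)|^{p-1}\,\frac{\partial y(k)}{\partial \mu} \qquad (89) \]

where $\eta$ is the step size,

\[ \frac{\partial y}{\partial \mu} = W^T\frac{\partial F}{\partial \mu} \qquad (90) \]

and

\[ \mathrm{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \end{cases} \qquad (91) \]

We can compute the derivatives of the feature vector as

\[ \frac{\partial F}{\partial \mu_m} = \begin{bmatrix} \frac{\partial g_{m,\mu_m}}{\partial \mu_m} \bullet X \\ 0 \\ \frac{\partial g_{m,\mu_m}}{\partial \mu_m} \bullet X^2 \\ 0 \\ 2\,(g_{m,\mu_m} \bullet X)\left(\frac{\partial g_{m,\mu_m}}{\partial \mu_m} \bullet X\right) \\ 0 \\ \left(\frac{\partial g_{m,\mu_m}}{\partial \mu_m} \bullet X\right)(g_{n,\mu_n} \bullet X) \\ 0 \end{bmatrix}, \qquad \frac{\partial F}{\partial \mu_n} = \begin{bmatrix} 0 \\ \frac{\partial g_{n,\mu_n}}{\partial \mu_n} \bullet X \\ 0 \\ \frac{\partial g_{n,\mu_n}}{\partial \mu_n} \bullet X^2 \\ 0 \\ 2\,(g_{n,\mu_n} \bullet X)\left(\frac{\partial g_{n,\mu_n}}{\partial \mu_n} \bullet X\right) \\ (g_{m,\mu_m} \bullet X)\left(\frac{\partial g_{n,\mu_n}}{\partial \mu_n} \bullet X\right) \\ 0 \end{bmatrix} \qquad (92) \]

where

\[ \frac{\partial g_{n,\mu}}{\partial \mu}(l_1,l_2) = \frac{(n+1)\,\mu^{n}}{2\pi\, n!}\left(\sqrt{l_1^2+l_2^2}\right)^{n-1} e^{-\mu\sqrt{l_1^2+l_2^2}} - \frac{\mu^{n+1}}{2\pi\, n!}\left(\sqrt{l_1^2+l_2^2}\right)^{n} e^{-\mu\sqrt{l_1^2+l_2^2}} = \frac{n+1}{\mu}\big(g_{n,\mu}(l_1,l_2) - g_{n+1,\mu}(l_1,l_2)\big) \qquad (93) \]

The adaptations of $W$ and $\mu_n$ can be summarized as

\[ W(k+1) = W(k) + \Delta W(k), \qquad \mu_n(k+1) = \mu_n(k) + \Delta\mu_n(k) \qquad (94) \]

One problem in adapting $\mu$ is that the 2-D image domain must be revisited, and a convolution operation run to extract the feature vector $F$, at each iteration. This is computationally intensive. Since the gamma kernels are circularly symmetric, we first project the convolved image onto the radial direction, and then use the already developed 1-D algorithm to adapt $\mu$. This is discussed in detail in the following section.

4.3.2 1-D Implementation

The feature vector $F$ is constructed from the convolution of gamma kernels with the training subimages. The adaptation of $\mu$ requires the extraction of a new feature vector $F$ based on a different kernel shape; that is, we would have to go back to the image plane and compute a new feature vector $F$ after every epoch.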
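The kernel derivative in (93) has the convenient property that it can be computed from two neighboring kernel orders. The sketch below compares that identity with a central finite difference in the kernel parameter, assuming the 2-D kernel form of (66) written as a function of the radius r; the function names, step size, and test points are hypothetical.

```python
import math

def gamma_2d(n, mu, r):
    """2-D gamma kernel of eq. (66) as a function of the radius r."""
    coeff = mu ** (n + 1) / (2.0 * math.pi * math.factorial(n))
    return coeff * r ** (n - 1) * math.exp(-mu * r)

def gamma_2d_dmu(n, mu, r):
    """Derivative with respect to mu, written with kernels of order n and n + 1."""
    return (n + 1) / mu * (gamma_2d(n, mu, r) - gamma_2d(n + 1, mu, r))

def central_difference(n, mu, r, step=1e-6):
    """Numerical derivative with respect to mu, used only as a cross-check."""
    return (gamma_2d(n, mu + step, r) - gamma_2d(n, mu - step, r)) / (2.0 * step)


checks = [(1, 0.598, 2.0), (15, 0.598, 10.0), (15, 0.916, 20.0)]
max_error = max(abs(gamma_2d_dmu(n, mu, r) - central_difference(n, mu, r))
                for n, mu, r in checks)
print(max_error)
```

Because the derivative reuses kernels that are already available, a gradient step on the kernel parameter needs no additional function family beyond the gamma kernels themselves.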
The 2-D convolution takes $O(N^4)$ multiplications, so the adaptation of $\mu$ at each training epoch would be extremely expensive computationally, even for small training subimages. Fortunately, the 2-D gamma kernels are circularly symmetric, so the convolution can be projected radially onto 1-D without loss of information (see Figure 24), as follows: the pixel intensities in concentric annuli around the pivot point are converted to a 1-D signal, and this signal is then convolved with 1-D gamma kernels.

Figure 24. Image converted into 1-D radial energy.

Therefore, we convert the training subimages to 1-D radial profiles, generate 1-D gamma kernels, and use the recursive implementation of the gamma kernel to perform the convolution. This scheme removes much of the computational burden of the 2-D implementation. The 2-D gamma kernels are reduced to the form of 1-D gamma kernels by

\[ g_{n,\mu}\left(\sqrt{l_1^2+l_2^2}\right) = \frac{2\pi n}{\mu}\, g_{n,\mu}(l_1,l_2) \qquad (95) \]

The adaptation of the weights $W$ and of $\mu$ can then be performed using the LMS (least mean square) algorithm in 1-D.

4.4 Comparison of the QGD with the CFAR and the γCFAR Detectors

To draw a parallel between the QGD decision function in (81) and the two-parameter CFAR detector in (51), we first observe that $\hat{X}_c$ and $\hat{\sigma}_c^2$ are estimates of the local mean and variance, respectively, around the test cell $X_t$, and that the standard deviation can be computed from the first and second moments by (52) and (53). (54) can be interpreted as a linear decision function with a fixed set of weights for a given $T_{CFAR}$. The measurements that appear in this equation correspond closely to some of those used in the QGD in (81). Specifically, $g_{m,\mu_m} \bullet X$ has the same role as $X_t$, and $g_{n,\mu_n} \bullet X$ corresponds to the local mean, $\hat{X}_c$.
Similarly, for the second moment we have a correspondence between $g_{m,\mu_m} \bullet X^2$ and the local second-moment estimate around the test cell. With this formulation for the QGD we preserved the similarity to the two-parameter CFAR detector (the types of features used) but have generalized it with respect to: (1) the shape of the kernels used for the estimation of the mean and variance and (2) the selection of the weights of the decision function, which are not chosen a priori but are found through optimization. Note also that the QGD has two additional linear terms and a bias term. The two-parameter CFAR detector is therefore a special case of the QGD. In this formulation, however, we can no longer guarantee the constant false alarm rate property of the two-parameter CFAR, which is achieved for Gaussian clutter statistics.

4.5 Artificial Neural Networks (ANNs)

4.5.1 Introduction

Human beings perform many complex tasks, such as speech and image recognition, with relative ease, yet these tasks are very difficult to solve using traditional algorithmic computing techniques. Much effort has been devoted to the development of machines which have human-like information processing capabilities. A neural network model, a simplified model of the brain, is motivated by the neuroanatomy of living animals, with cells corresponding to neurons, activations corresponding to neuronal firing rates, connections corresponding to synapses, and connection weights corresponding to synaptic strengths. The neural network model is also called a connectionist network and consists of a set of computational units and connection weights between the units, with the processing tasks being distributed across many units. Most of the current neural network models are far from realistic biological neural models, but they serve as good models for the essential information processing that organisms perform. In this chapter, one of the most popular neural models, the multi-layer perceptron (MLP), is discussed along with its learning algorithms and its validation and generalization problems.
The idea of the MLP is used here for extending the QGD into a non-linear structure, which is discussed later in Section 4.6.

4.5.2 Multi-layer Perceptrons (MLPs)

The multi-layer perceptron (MLP) has played a central role in neural network modeling and is probably the most widely used ANN. An MLP is a feedforward network in which the network input is propagated forward through several processing layers before the network output is obtained. Each layer is composed of a number of nodes, and each node forms a weighted sum of inputs from the nodes in the previous layer and nonlinearly transforms the sum through a bounded, continuously increasing nonlinearity. A multi-layer feed-forward network is shown in Figure 25. Learning is accomplished by minimizing the error between outputs and target values [68], or by maximizing an entropy measure on the outputs.

Figure 25 Multi-layer perceptrons (MLPs). (Labels: network input, input layer, hidden layers, output layer, network outputs.)

Network learning based on gradient descent requires knowledge of the derivative of the activation functions associated with the neurons, so the activation functions need to be differentiable. The sigmoidal and hyperbolic tangent functions are commonly used as differentiable activation functions in MLPs. Thus the network output is a continuous (continuously differentiable) function of every weight in the network, enabling it to be trained using gradient descent rules. The availability of such learning algorithms popularized the MLP. The model structure does not depend on the learning rule. It is known that in classification tasks a three-layer MLP network with threshold activation functions can represent an arbitrary decision boundary to arbitrary accuracy [38]. For functional approximation, a three-layer MLP with sufficient nodes in the hidden layer can approximate, to an arbitrary degree, any continuous nonlinear function [37].
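As a minimal sketch of the forward propagation described above, the following code passes an input through successive layers, each node forming a weighted sum and applying a bounded, increasing nonlinearity. The hyperbolic tangent, the bias terms, the 2-3-1 topology and all weight values are assumptions chosen only for illustration; they are not taken from the dissertation.

```python
import math

def forward(x, layers):
    """Propagate input x through a list of (weights, biases) layers.

    Each row of `weights` holds the incoming weights of one node; tanh is
    the assumed bounded, continuously increasing nonlinearity.
    """
    y = list(x)
    for weights, biases in layers:
        y = [math.tanh(sum(w * yi for w, yi in zip(row, y)) + b)
             for row, b in zip(weights, biases)]
    return y

# Hypothetical 2-3-1 network (two inputs, three hidden nodes, one output).
layers = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.2, 0.5]], [0.05]),
]
out = forward([1.0, -1.0], layers)
```

Because every operation in `forward` is differentiable, the output is a differentiable function of each weight, which is the property gradient descent training relies on.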
It is clear that an MLP can be used as a tool that provides a general framework for representing a non-linear functional mapping between the input space and the output space.

4.5.3 Training MLPs

A neural network has the ability to learn from its environment. A neural network adjusts its free parameters towards minimizing or maximizing a criterion function in an iterative manner during its learning period. There are two general learning paradigms, supervised and unsupervised learning. In supervised learning the output of a network is compared with a desired response, and the error between the outputs and the desired responses is used to correct the network parameters. In unsupervised learning no desired responses are provided to the network. The network discovers for itself interesting categories or features in the input data. In the mid-eighties, a gradient descent learning algorithm called Back Propagation (BP), which enables MLPs to learn arbitrary functional mappings, stimulated considerable interest in learning systems. BP is a supervised learning mechanism for feedforward networks that uses a cost function and gradient descent, and it is the most widely used training algorithm. This section discusses BP algorithms (on-line learning and batch learning) for a network having a feed-forward topology and differentiable non-linear activation functions, for the case of a differentiable cost function.

4.5.3.1 On-line Learning

In the on-line learning process, a network uses only the information provided by a single training example {x(n), d(n)} when the network parameters are adapted. Consider an epoch of N training examples in the following order: {x(1), d(1)}, ..., {x(N), d(N)}. For a training pattern, each node computes a weighted sum of inputs of the form

$$v_j(n) = \sum_i w_{ji}(n)\,y_i(n) \tag{96}$$

where $y_i(n)$ is the activation of a node or input which is connected to node j, and $w_{ji}(n)$ is the weight associated with that connection.
The sum in (96) is transformed by a non-linear activation function $\varphi(\cdot)$ to give the activation $y_j(n)$ of node j in the form

$$y_j(n) = \varphi(v_j(n)) \tag{97}$$

Since the activation outputs are successively computed layer by layer, this process is often called forward propagation. Now, the instantaneous error signal at the output of node k at iteration n is defined by

$$e_k(n) = d_k(n) - y_k(n) \tag{98}$$

On-line back propagation learning generally attempts to minimize the sum of squared instantaneous errors (SSE) at the nodes in the output layer. The SSE is defined by

$$E(n) = \frac{1}{2}\sum_{k=1}^{K} e_k^2(n) \tag{99}$$

where K is the number of nodes in the output layer. For a given training set, E(n) represents the cost function as a measure of training set learning performance. The objective of the learning process is to adjust the free parameters of the network so as to minimize the cost function. The adjustments to the weights are made in accordance with the respective errors computed for each training example presented to the network. Gradient descent learning algorithms adapt the weights in the direction of the negative gradient of the cost function. The correction $\Delta w_{ji}(n)$ applied to $w_{ji}(n)$ is defined by the delta rule

$$\Delta w_{ji}(n) = -\eta\,\frac{\partial E(n)}{\partial w_{ji}(n)} \tag{100}$$

where $\eta$ is the learning rate of the BP algorithm. The learning rate controls the speed of network training and affects the network stability. For a node j in the output layer, the partial derivative $\partial E(n)/\partial w_{ji}(n)$ can be factorized by the chain rule as follows

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial v_j(n)}\,\frac{\partial v_j(n)}{\partial w_{ji}(n)} = -\delta_j(n)\,\frac{\partial v_j(n)}{\partial w_{ji}(n)} \tag{101}$$

where $\delta_j(n) = -\partial E(n)/\partial v_j(n)$ is the local gradient at a node in the output layer, which is a sensitivity factor of the cost function with respect to the net input of node j before the activation function. Differentiating $v_j(n)$ with respect to $w_{ji}(n)$ yields

$$\frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n) \tag{102}$$

The local gradient $\delta_j(n)$ at a node j in the output layer is obtained by

$$\delta_j(n) = e_j(n)\,\varphi'(v_j(n)) \tag{103}$$

Accordingly, the correction $\Delta w_{ji}(n)$, using (102) and (103), can be written as

$$\Delta w_{ji}(n) = \eta\,\delta_j(n)\,y_i(n) \quad \text{at a node } j \text{ in the output layer} \tag{104}$$

For a node in a hidden layer, the local gradient can be computed recursively in terms of the local gradients of all nodes in the next layer in the forward direction:

$$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)} = -\sum_k \frac{\partial E(n)}{\partial v_k(n)}\,\frac{\partial v_k(n)}{\partial v_j(n)} = \sum_k \delta_k(n)\,\frac{\partial v_k(n)}{\partial v_j(n)} \quad \text{at a node } j \text{ in a hidden layer} \tag{105}$$

where the sum runs over all units k to which node j sends connections. The net activation level at node k is

$$v_k(n) = \sum_j w_{kj}(n)\,y_j(n) \tag{106}$$

The partial derivative in (105) can be rewritten as

$$\frac{\partial v_k(n)}{\partial v_j(n)} = \frac{\partial v_k(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)} = w_{kj}(n)\,\varphi'(v_j(n)) \tag{107}$$

Thus, using (105) and (107), we obtain the following back-propagation formula

$$\delta_j(n) = \varphi'(v_j(n))\sum_k \delta_k(n)\,w_{kj}(n) \quad \text{at a node } j \text{ in a hidden layer} \tag{108}$$

The local gradient at a node in the jth hidden layer can be computed by propagating the local gradients backwards from the nodes in the (j+1)th layer.

4.5.3.2 Learning Rate and Momentum

Network training based on the gradient descent method requires the choice of a suitable value for the learning rate $\eta$. The effectiveness and convergence of the BP learning algorithm depend significantly on the value of $\eta$. When the curvature of the error surface varies significantly with the direction in the weight space, a large value of $\eta$ will result in oscillation on the error surface. For a fairly flat error surface, a small value of $\eta$ will result in very slow convergence of the error minimization process, because the adaptation applied to a weight is proportional to the derivative of E with respect to that weight. Only small learning rates guarantee a true gradient descent, at the cost of increasing the total number of learning steps needed to reach a satisfactory solution.
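The on-line updates in (103), (104) and (108) can be sketched for a network with one hidden layer as follows. This is an illustration under stated assumptions rather than the implementation used in this work: tanh is assumed for $\varphi$, no bias terms are used (matching (96)), and the weights, input and desired response are hypothetical values.

```python
import math

def phi(v):
    return math.tanh(v)

def dphi(v):
    return 1.0 - math.tanh(v) ** 2

def forward(x, W1, W2):
    """Forward propagation (96)-(97): returns net inputs and activations."""
    v1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    y1 = [phi(v) for v in v1]
    v2 = [sum(w * yj for w, yj in zip(row, y1)) for row in W2]
    y2 = [phi(v) for v in v2]
    return v1, y1, v2, y2

def sse(x, d, W1, W2):
    """Instantaneous cost (99): half the sum of squared output errors."""
    y2 = forward(x, W1, W2)[3]
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y2))

def local_gradients(x, d, W1, W2):
    v1, y1, v2, y2 = forward(x, W1, W2)
    # Output layer, (103): delta_k = e_k * phi'(v_k)
    delta2 = [(dk - yk) * dphi(vk) for dk, yk, vk in zip(d, y2, v2)]
    # Hidden layer, (108): delta_j = phi'(v_j) * sum_k delta_k * w_kj
    delta1 = [dphi(v1[j]) * sum(delta2[k] * W2[k][j] for k in range(len(W2)))
              for j in range(len(W1))]
    return y1, delta1, delta2

def online_update(x, d, W1, W2, eta):
    """One on-line step, (104): w_ji <- w_ji + eta * delta_j * y_i."""
    y1, delta1, delta2 = local_gradients(x, d, W1, W2)
    new_W2 = [[W2[k][j] + eta * delta2[k] * y1[j] for j in range(len(y1))]
              for k in range(len(W2))]
    new_W1 = [[W1[j][i] + eta * delta1[j] * x[i] for i in range(len(x))]
              for j in range(len(W1))]
    return new_W1, new_W2

# Hypothetical example: 2 inputs, 2 hidden nodes, 1 output.
x, d = [0.5, -1.0], [0.8]
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [[0.5, -0.6]]
W1_new, W2_new = online_update(x, d, W1, W2, eta=0.1)
```

Comparing `-delta_j * y_i` with a finite-difference estimate of $\partial E/\partial w_{ji}$ is a convenient way to confirm that the back-propagated local gradients are correct before training on real data.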
One of the simplest methods of accelerating the convergence speed is the addition of a momentum term to the weight adaptation (in (104)):

$$\Delta w_{ji}(n) = \eta\,\delta_j(n)\,y_i(n) + \alpha\,\Delta w_{ji}(n-1) \tag{109}$$

where $\alpha$ is the momentum rate and $\Delta w_{ji}(n-1)$ is the weight correction made when the (n-1)th training example was presented at the network input.

Summary of the Back Propagation Algorithm

1. Determine a network topology, initialize all weights randomly, and set a learning rate and a momentum rate.
2. Repeat until the network performance is satisfactory.
2.1 Forward propagation step: feed the training examples forward through the network and compute the activations of all nodes.
2.2 Calculate the errors in the output layer and compute the local gradients at all nodes

$$\delta_j(n) = e_j(n)\,\varphi'(v_j(n)) \quad \text{at node } j \text{ in the output layer} \tag{110}$$

$$\delta_j(n) = \varphi'(v_j(n))\sum_k \delta_k(n)\,w_{kj}(n) \quad \text{at node } j \text{ in a hidden layer} \tag{111}$$

2.3 Update the weights

$$w_{ji}(n+1) = w_{ji}(n) + \eta\,\delta_j(n)\,y_i(n) + \alpha\,\Delta w_{ji}(n-1) \tag{112}$$

4.5.3.3 Batch Learning

In on-line BP learning, the weight adaptation is performed after the presentation of each training example. The local gradient estimates use only a single piece of training information when the weight adaptation is performed. In batch BP learning, the weights are adjusted after presenting the entire set of training examples. Batch learning provides a more accurate estimate of the gradient vector. The instantaneous squared errors are summed over all the training examples, and the average of the total squared errors can be defined as the cost function

$$E_{av} = \frac{1}{2N}\sum_{n=1}^{N}\sum_{k=1}^{K} e_k^2(n) \tag{113}$$

where N is the number of training examples.
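The effect of the momentum term in (109) and (112) can be illustrated on a toy error surface. The sketch below is not a neural network: it applies the same update rule, with the negative gradient standing in for $\delta_j y_i$, to an assumed quadratic cost $E(w) = \frac{1}{2}(w_1^2 + 25 w_2^2)$ whose curvature differs strongly between the two weight directions. The cost, the starting point and the rates are all hypothetical choices.

```python
def descend(eta, alpha, steps=100):
    """Minimise E(w) = 0.5 * (w1^2 + 25 * w2^2) with a momentum update.

    dw(n) = -eta * grad E + alpha * dw(n - 1), the same form as (109)
    with the negative gradient in place of delta_j * y_i.
    Returns the cost after `steps` updates.
    """
    curvature = [1.0, 25.0]          # flat direction and steep direction
    w = [1.0, 1.0]                   # hypothetical starting point
    dw_prev = [0.0, 0.0]
    for _ in range(steps):
        grad = [c * wi for c, wi in zip(curvature, w)]
        dw = [-eta * g + alpha * p for g, p in zip(grad, dw_prev)]
        w = [wi + d for wi, d in zip(w, dw)]
        dw_prev = dw
    return 0.5 * sum(c * wi * wi for c, wi in zip(curvature, w))

cost_plain = descend(eta=0.02, alpha=0.0)      # plain gradient descent
cost_momentum = descend(eta=0.02, alpha=0.9)   # with momentum
```

With the same small learning rate, the run with momentum ends at a lower cost after 100 steps, because the accumulated corrections speed up progress along the flat direction where plain gradient descent moves slowly.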
A correction $\Delta w_{ji}$ applied to a weight $w_{ji}$ is proportional to the negative gradient of the cost function

$$\Delta w_{ji} = -\eta\,\frac{\partial E_{av}}{\partial w_{ji}} = -\frac{\eta}{N}\sum_{n=1}^{N} e_j(n)\,\frac{\partial e_j(n)}{\partial w_{ji}} \tag{114}$$

4.5.4 Validation of A Neural Model

In any learning system, the validation of system learning is one of the most important parts, because the objective of network training is not to learn an exact representation of the training data itself, but rather to learn the underlying statistics of the process which generates the data. Therefore, during the learning phase, the network should be assessed on how well it has learned the training data and how well it generalizes to unseen data. In order to develop quantitative techniques to evaluate a network's performance with real-world data, rigorous mathematical foundations must be developed to determine the characteristics of the training set and the network's ability to generalize from the training set. Although the question of whether a network possesses the ability to generalize correctly (or sufficiently accurately) is still unsolved, most learning algorithms can successfully learn a set of training examples given a sufficiently flexible model structure or an appropriate learning algorithm. A popular method for assessing the validity of trained networks is to split the available data into two sets, a training set and a test set. The training set is further split into two subsets: one is used for training the network and the other for evaluating the performance of the network during the training phase [80].

4.5.4.1 Bias/Variance Dilemma

A network model that is too inflexible will have a large bias, while one that is too flexible will have a large variance. This can be explained by the bias/variance dilemma [28]. The MSE performance measure for a given training set can be decomposed into the sum of two components, which reflect the squared bias of the network error and the variance of the estimates of the trained network.
Let D denote a training set, let y(x, D) be the output of a network trained using the data contained in D, and let $E_D(\cdot)$ denote the expectation operator taken over all possible training sets. Then the output error is given by

$$e_y(x) = \hat{y}(x) - y(x, D) \tag{115}$$

The effectiveness of the network as a predictor of $\hat{y}(x)$, measured by the MSE over all possible training sets D, is given by

$$E_D\!\left(e_y^2(x)\right) = E_D\!\left((\hat{y}(x) - y(x, D))^2\right) = \left(\hat{y}(x) - E_D(y(x, D))\right)^2 + E_D\!\left((y(x, D) - E_D(y(x, D)))^2\right) \tag{116}$$

The first term on the right hand side is called the square of the bias, and the second term is the variance of the network approximations. The bias measures the average modelling error, and the variance measures how sensitive the network modelling is to a particular choice of data set. When the network is too flexible, a large variance occurs, so the performance of the network is very sensitive to the particular training set. This results in a poor MSE performance. The network also produces a poor MSE performance when it possesses too little flexibility, which causes a large bias. Bias and variance are complementary quantities. In general, a network should be flexible enough to ensure that the modelling error (bias) is small, but should not be over-parameterized, because its performance then becomes highly sensitive to the particular training set (high variance).

4.5.4.2 Network Complexity and Early Stopping

The problem of choosing a network complexity (in an MLP) is to choose the size of the free parameter set used to model the data set (in terms of the number of nodes and hidden layers). A good estimate of the true performance of a network is required to determine whether the complexity of a network model is effective for a particular data set. If the model complexity is increased and this results in a lower modelling error, it indicates that the original network was not flexible enough.
Similarly, if a more flexible model produces a higher modelling error, this indicates that the network has overfitted the training data set and a simpler network would be preferred for better generalization. An alternative way of obtaining the effective complexity of a network is the procedure of early stopping. Typically, iterative training of learning systems reduces the cost function as more iterations are made during the training phase. However, the generalization error measured with respect to a validation set often decreases at first as the number of iterations increases, and then starts increasing after a certain iteration point, where the training causes the network to over-fit the training set. For good generalization, training can be stopped at the point where the generalization error starts increasing. This is referred to as cross-validation, and the procedure provides an appealing guide to better generalization performance of the network.

4.6 Nonlinear Extension to the QGD (NL-QGD)

4.6.1 Introduction

One of the possible extensions to the QGD is to augment the output adder with a set of nonlinear processing elements which nonlinearly combine the feature elements, i.e. to implement a neural classifier such as an MLP. The structure is called the NL-QGD (NonLinear extension to the QGD). Since the MLP is capable of creating arbitrary discriminant functions, this extension has the potential to improve performance. Note that the QGD creates a quadratic discriminant function of the image intensity, which is only optimal for unimodal probability density functions [8] [27]. Moreover, a nonlinear system normally generalizes better, so the performance in the test set can also improve. In order to fully use the neural network approach we have to develop an iterative algorithm to adapt both the weights and $\mu$ (using the backpropagation procedure).
The availability of on-line methods of adapting $\mu$ effectively means that the detector output error can be used for refining $\mu$, or equivalently, to search on-line for the optimal guard band. The relationship between the minimum number of false alarms and the minimum mean square error becomes crucial now, since the system will be adapting $\mu$ for the smallest MSE. The cost function used should correspond to the minimum false alarm rate, since this is the basis of scoring. Alternate error norms may have to be used to adapt the weights to guarantee that the minimum cost corresponds to the minimum false alarm rate. Figure 26 displays the block diagram of the NL-QGD. As indicated in this figure, the feature expansion is still the same as that of the QGD. In Figure 21, we can think of the QGD as the linear part of one of the hidden layer processing elements. The two basic issues that need to be addressed when using a neural network are the training and the net topology.

4.6.2 Training the NL-QGD

In order to fully develop the neural network approach, an iterative learning scheme is required to adapt the weights and the parameter $\mu$. The availability of on-line methods of adapting $\mu$ effectively means that the detector output error can be used to search on-line for the optimal local area and guard band. The NL-QGD is trained with a desired signal d (1s for the target class and 0s for the non-target class) using a back-propagation algorithm [68]:

$$d(n) = \begin{cases} 1 & \text{feature vector belongs to the target class} \\ 0 & \text{feature vector belongs to the non-target class} \end{cases} \tag{117}$$

The sum of squared errors is utilized in this work, i.e.

$$\min E, \qquad E = \sum_{n=1}^{N} \|d(n) - z(n)\|^2 \tag{118}$$

Figure 26 Adaptation scheme of the NL-QGD. (Labels: training subimages, PWF and PWF² subimages, feature expansion, desired signal d(n), error e(n).)

Moreover, in order to adapt the parameter $\mu$, which controls the scale of the CFAR stencil, the error generated at the detector output is back-propagated up to the input layer.
The decision boundary of the NL-QGD is therefore formed in conjunction with the parameter $\mu$. The correction $\Delta\mu(n)$ at each iteration is proportional to the instantaneous gradient $\partial E(n)/\partial \mu(n)$. According to the chain rule, this gradient is expressed as follows,

$$\frac{\partial E(n)}{\partial \mu(n)} = \sum_i \frac{\partial E(n)}{\partial v_i(n)}\,\frac{\partial v_i(n)}{\partial \mu(n)} \tag{119}$$

where the index i runs over the nodes in the first hidden layer. $v_i(n)$ is the output before the nonlinear transformation is applied at the ith node in the first hidden layer and is expressed by

$$v_i(n) = \sum_{p=1}^{P} w_{ip}(n)\,f_p(n) \tag{120}$$

where the feature element $f_p(n)$ of F is given by (80) at each iteration n and $F(n) = [f_1(n), f_2(n), \ldots, f_P(n)]^T$ with P = 8. The local gradients $\delta_i(n) = -\partial E(n)/\partial v_i(n)$ can be computed by (105). Therefore, (119) is written with the partial derivative of $v_i(n)$ with respect to $\mu(n)$ as

$$\frac{\partial E(n)}{\partial \mu(n)} = -\sum_i \delta_i(n)\sum_{p=1}^{P} w_{ip}(n)\,\frac{\partial f_p(n)}{\partial \mu(n)} = -\sum_{p=1}^{P}\left(\sum_i \delta_i(n)\,w_{ip}(n)\right)\frac{\partial f_p(n)}{\partial \mu(n)} = -\sum_{p=1}^{P} \delta_p(n)\,\frac{\partial f_p(n)}{\partial \mu(n)} = -\Delta_P^T(n)\,\frac{\partial F(n)}{\partial \mu(n)} \tag{121}$$

where $\delta_i(n)$ and $\delta_p(n)$ are the local gradients at the ith node in the first hidden layer and the pth node in the input layer, respectively. The local gradient vector in the input layer is defined as $\Delta_P(n) = [\delta_1(n)\ \delta_2(n)\ \ldots\ \delta_P(n)]^T$. $w_{ip}(n)$ is the weight between the pth node in the input layer and the ith node in the first hidden layer. The gradient $\partial F(n)/\partial \mu(n)$ is given by (92). The adaptation of the parameter $\mu$ is given by

$$\mu_j(n+1) = \mu_j(n) - \beta\,\frac{\partial E(n)}{\partial \mu_j(n)} = \mu_j(n) + \beta\,\Delta_P^T(n)\,\frac{\partial F(n)}{\partial \mu_j(n)} \tag{122}$$

where $\beta$ is the step size, and j = n, m (one equation for each kernel).

CHAPTER 5 TRAINING STRATEGIES FOR NL-QGD

5.1 Introduction

Neural network models are usually defined by three major parts: architecture, cost function and training algorithm. Here the cost function typically measures the errors between the network estimates and the desired outputs from the training data, and plays an important role as a guide for network optimization. The sum of squares error (SSE) function (L2 norm) has been widely used as an optimality index for network training.
There are many other possible choices of cost functions which can also be considered, depending on the particular application. Minimizing the SSE is equivalent to the maximum likelihood principle for Gaussian distributed errors in a regression problem, where the goal is to model the conditional distribution of the output variables given the input samples. When used as a classification device, a feed-forward neural network is normally trained with binary desired responses (0 or 1) for the training data during the training period. In order to guarantee that the outputs represent probabilities, the sum of the output values should be equal to one and each output value should lie in the range (0, 1). For classification problems, the error distribution is far from the Gaussian distribution because the target variables are binary. For two-class classification problems the error has the form of a binomial distribution, and for multi-class classification problems, in which the network has as many output nodes as there are classes, it has the form of a multinomial distribution. Cross-entropy can also be used as a cost function for network optimization [2] [36] [79]. The cross-entropy cost function maximizes the likelihood of observing the training data set during training. In this sense, the cross-entropy is theoretically the most appropriate cost function for network optimization in the case of binary classification [69]. However, when a neural network is used as a detector, with a single node in the output layer for two classes, and is supposed to detect samples from one particular class and reject samples from the other classes as much as possible, the network needs to be trained on an optimality condition which is different from that of classification problems. At present this optimality condition has not been formulated, so the problem has to be solved experimentally.
Through the search for cost functions for network optimization, the performance of the two detectors (QGD and NL-QGD) is discussed based on the cost functions used for training.

5.2 Optimality Index

5.2.1 L2 Norm

One of the most popular norms in training networks is the L2 criterion. For a network with linear outputs, training the network leads to an optimal solution, the least squares solution. The L2 norm is an appropriate choice for normally distributed inputs in the sense of both minimum cost and minimum probability of prediction error (maximum likelihood). Consider a set of training pairs {x(n), d(n)}, n = 1, 2, ..., N. Assume that the input vectors x(n) are drawn randomly and independently from a normal distribution. The objective of network training is not to memorize the training data, but rather to model the underlying generator of the data, so that the best possible prediction of the desired responses d can be made when new data is presented to the trained network. A network training scheme is shown in Figure 27. The most general and complete description of the generator of the data is in terms of the probability density p(x, d) in the joint input-target space [5].

Figure 27 A network training scheme.

The joint probability of the input and desired responses can be expressed in terms of the conditional probability density of the target data and the unconditional probability density of the input data,

$$p(x, d) = p(d \mid x)\,p(x) \tag{123}$$

where $p(d \mid x)$ is the probability density of the target variable given an input x, and p(x) is the probability density of the input, given by

$$p(x) = \int p(x, s)\,ds \tag{124}$$

By the principle of maximum likelihood, we want to maximize the joint probability density in the input-target space given a set of input and target data. It is often convenient to minimize the negative logarithm of the likelihood instead; since the logarithm is a monotonic function, the two are equivalent.
We therefore minimize

$$E = -\sum_n \ln p(x(n), d(n)) = -\sum_n \ln p(d(n) \mid x(n)) - \sum_n \ln p(x(n)) \tag{125}$$

Since the second term in (125) is independent of the network parameters, minimizing E is equivalent to minimizing the first term, so the error function can be written as

$$E = -\sum_n \ln p(d(n) \mid x(n)) \tag{126}$$

We assume that the input data has a Gaussian distribution and that the target value has the form of a deterministic function with added Gaussian noise $e_k$, so that

$$d_k = h_k(x) + e_k \tag{127}$$

We now want to model the deterministic function $h_k(x)$ by a neural network with output $y_k(x; w)$, where w is the set of weight parameters determining the network mapping. The distribution of $e_k$ is given by

$$p(e_k) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{e_k^2}{2\sigma^2}\right) \tag{128}$$

Using (127) and (128), the probability density of the target variables is given by

$$p(d_k \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\{y_k(x; w) - d_k\}^2}{2\sigma^2}\right) \tag{129}$$

where the mapping function between input and target, $h_k(x)$, has been replaced by the network model $y_k(x; w)$. Substituting (129) into (126), we obtain the error function [5],

$$E = \frac{1}{2\sigma^2}\sum_{n=1}^{N}\sum_{k=1}^{C}\{y_k(x(n); w) - d_k(n)\}^2 + NC\ln\sigma + \frac{NC}{2}\ln(2\pi) \tag{130}$$

where C is the number of classes and N the number of exemplars. In (130), the second and third terms are independent of the network parameters w and can be omitted. The premultiplication factor in the first term can also be omitted. The error function then has the form of the SSE function

$$E = \sum_{n=1}^{N}\sum_{k=1}^{C}\{y_k(x(n); w) - d_k(n)\}^2 \tag{131}$$

Here, the SSE cost function was derived from the principle of maximum likelihood under the assumption of Gaussian distributed target data.
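The step from (126) and (129) to (130) can be checked numerically. The sketch below evaluates the negative log-likelihood directly from the Gaussian density and compares it with the SSE-based expression. The network outputs, desired responses and noise level are made-up numbers used only for the check, and the function names are hypothetical.

```python
import math

def gaussian_nll(y, d, sigma):
    """Negative log-likelihood (126), with the Gaussian density of (129)."""
    total = 0.0
    for y_n, d_n in zip(y, d):
        for y_k, d_k in zip(y_n, d_n):
            density = (math.exp(-(y_k - d_k) ** 2 / (2.0 * sigma ** 2))
                       / (math.sqrt(2.0 * math.pi) * sigma))
            total -= math.log(density)
    return total

def sse(y, d):
    """Sum of squared errors over exemplars n and outputs k, as in (131)."""
    return sum((y_k - d_k) ** 2
               for y_n, d_n in zip(y, d) for y_k, d_k in zip(y_n, d_n))

def nll_from_sse(y, d, sigma):
    """Right hand side of (130)."""
    n, c = len(y), len(y[0])
    return (sse(y, d) / (2.0 * sigma ** 2)
            + n * c * math.log(sigma)
            + 0.5 * n * c * math.log(2.0 * math.pi))

# Hypothetical outputs and binary desired responses (N = 3, C = 2).
y = [[0.2, 0.7], [0.9, 0.1], [0.4, 0.5]]
d = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
direct = gaussian_nll(y, d, sigma=0.5)
via_sse = nll_from_sse(y, d, sigma=0.5)
```

Because the two extra terms in (130) do not depend on the weights, ranking candidate weight sets by the SSE gives the same ordering as ranking them by the Gaussian likelihood, for any fixed $\sigma$.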
5.2.2 Training with Excluding Outliers from Non-Target Class with L2 Norm

In the training data for the QGD and NL-QGD, it is observed that some of the target and non-target class data overlap. The two-parameter CFAR or γCFAR detector declares a pixel as detected when its intensity significantly exceeds the mean intensity in its surrounding region, and both targets and man-made objects (non-targets) are usually brighter than the background in their surrounding areas. The objective of training the NL-QGD (or any detector, for that matter) is not to obtain the minimum classification error but to achieve a minimum false alarm rate while maintaining a high probability of target detection. The training of neural networks is mostly intended to produce the minimum classification error on the given training data and to generalize well to data which has never been seen by the network. This leads to a partitioning of the input space into subregions according to the number of classes, and usually places a decision boundary among the highly populated regions of the classes. In two-class detection problems, the input space is partitioned into two subspaces, one for the target class and the other for the non-target class. The NL-QGD produces very low output values for target data which are far away from the decision boundary and reside deep inside the non-target subregion. These low output values severely degrade the probability of target detection, because the threshold for the NL-QGD is set based on the minimum target output value for 100% target detection. It is therefore desired that the decision boundary be placed so as to encompass all target samples in the target subregion and exclude as many non-target samples as possible from the target subregion. This, of course, may not yield the minimum classification error, but leads to a high probability of detection in favor of one class. Figure 28 illustrates the effect of outliers on the placement of a decision boundary in the input space.
In Figure 28a, the data set contains 15 samples (marked by "*") for the target class and 15 samples (marked by "o") for the non-target class in the input space. The desired responses are 1's and -1's for the target class and the non-target class, respectively. In Figure 28a, the decision boundary #1 was obtained from the LS solution using all samples, while the decision boundary #2 was formed by the LS solution computed after removing four non-target outliers ((-3, 1), (-2, 2), (-1, 2) and (-1, 1)) from the data set. By removing the four non-target outliers when computing the LS solution, the decision boundary moves in the direction where the target outliers are located, so that more target-class samples are included above the decision boundary #2. Figure 28b plots the outputs of the data set based on the two LS solutions. With the decision boundary #1, 10 false alarms occur in order to detect all targets, whereas the decision boundary #2 yields 8 false alarms for the detection of all targets. There is a lack of theoretical foundation for designing cost functions for network training in such a case. A simple way of implementing the idea is to train the NL-QGD while excluding the outliers from the non-target samples during the training period. Since outliers in the training data cause low output values for the target class and high output values for the non-target class, they yield large error values, which are used to correct the network parameters during training. Error values serve as forces that move the decision boundaries in the direction of the corresponding samples. Therefore, removing the non-target outliers helps the decision boundary to shift towards the target outliers. This can yield larger NL-QGD output values for the target outliers and correspondingly reduce false alarms at the operating point of high probability of target detection.
Figure 28 An example of the effect of outliers on decision boundary formation and false alarm rate. (a) Input space (x1, x2), with the non-target outliers (-3, 1), (-2, 2), (-1, 2) and (-1, 1). (b) Least squares outputs from decision boundaries #1 and #2, showing the target outputs, the non-target outputs, the minimum target output and the resulting false alarms.
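The scoring rule used in this example (set the threshold at the minimum target output so that every target is detected, then count the non-target outputs that reach it) can be sketched as follows. The output values below are hypothetical and are not the data of Figure 28. Treating an output equal to the threshold as a detection is an assumption.

```python
def false_alarms_at_full_detection(target_outputs, nontarget_outputs):
    """Threshold at the minimum target output (100% detection) and count
    the non-target outputs at or above that threshold."""
    threshold = min(target_outputs)
    false_alarms = sum(1 for z in nontarget_outputs if z >= threshold)
    return threshold, false_alarms

# Hypothetical detector outputs. One target (0.4) sits deep in the
# non-target region and drags the threshold down.
nontargets = [-0.8, 0.5, 0.3, 0.45]
before = false_alarms_at_full_detection([0.9, 0.4, 0.7], nontargets)

# If retraining raises that target's output to 0.6 (non-target outputs
# held fixed here for simplicity), the threshold rises with it.
after = false_alarms_at_full_detection([0.9, 0.6, 0.7], nontargets)
# before == (0.4, 2), after == (0.6, 0)
```

This makes explicit why the lowest target output, rather than the average classification error, governs the false alarm count at the 100% detection operating point.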

Full Text |

PAGE 1

FOCUS OF ATTENTION BASED ON GAMMA KERNELS FOR AUTOMATIC TARGET RECOGNITION

By MUNCHURL KIM

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1996

PAGE 2

To my mother, who loved and supported her son with a great deal of endurance in spite of many difficulties during the period of her son's study.

PAGE 3

ACKNOWLEDGEMENTS

I would first like to thank my advisor, Dr. Jose C. Principe, for his patient guidance and inspiration during my study. His availability to his students and the abundance of the ideas he offered are irreplaceable. Special thanks are due to Dr. John G. Harris for constructive comments on this study. I also thank my committee members Dr. William W. Edmonson, Dr. Jian Li, and Dr. Joseph N. Wilson for serving on my supervisory committee. I would like to thank John Fisher III for sharing his broad knowledge and large experience of radar with me and for helping me conduct ATD/R research in the Computational NeuroEngineering laboratory. Special thanks are due to Frank Candocia for his careful proofreading of this dissertation. I am especially grateful to my parents, who supported and prayed for their son with a great deal of love. Finally, I would like to thank my wife, SoYoung Lee, and my son, Mujoon, for their love during the course of my studies.

PAGE 4

TABLE OF CONTENTS

page
ACKNOWLEDGEMENTS .... iii
ABSTRACT .... viii
CHAPTERS
1 INTRODUCTION .... 1
1.1 Automatic Target Detection/Recognition Technology .... 1
1.2 A Multi-stage Automatic Target Detection/Recognition System .... 2
1.3 Overview of the Dissertation .... 6
2 SYNTHETIC APERTURE RADAR (SAR) DATA DESCRIPTION .... 7
2.1 Introduction .... 7
2.2 SAR Image Formation .... 8
2.2.1 Range Processing .... 9
2.2.2 Azimuth Processing .... 13
2.3 ISAR Image Formation .... 16
2.4 Preprocessing .... 20
2.4.1 Polarimetric Clutter Model .... 20
2.4.2 Polarimetric Whitening Filter (PWF) Processing .... 23
2.4.3 Preprocessing SAR Imagery .... 26
2.5 SAR Image Visualization .... 28
2.6 Examples of SAR Images .... 30
2.7 Target Embedding Strategy .... 31
2.7.1 Development of a Target Embedding Method .... 31
2.7.2 Embedding the TABLIS 24 ISAR Target Data into the Mission 90 Pass 5 SAR Data Set .... 34
3 PRESCREENERS .... 38

PAGE 5

3.1 Introduction .... 38
3.2 A One-parameter Constant False Alarm Rate (CFAR) Detector .... 38
3.3 A Two-parameter CFAR Detector .... 40
3.4 Extension to the Two-Parameter CFAR Detector .... 43
3.4.1 Introduction .... 43
3.4.2 Gamma Filter and Gamma Kernels .... 43
3.4.3 2-D Extension of 1-D Gamma Kernels .... 49
3.4.4 Gamma CFAR (γCFAR) Detector .... 50
3.4.5 Implementation of the γCFAR Detector .... 56
3.5 Receiver Operating Characteristics (ROC) .... 57
4 QUADRATIC GAMMA DETECTOR .... 58
4.1 Introduction .... 58
4.2 Discriminant Functions .... 58
4.2.1 Linear Discriminant Functions .... 58
4.2.2 Generalized Linear Discriminant Functions .... 59
4.3 Extension to the γCFAR Detector .... 60
4.3.1 Training the QGD .... 63
4.3.1.1 Closed-Form Pseudo-Inverse Solution .... 63
4.3.1.2 Iterative Solution Based on Gradient Descent .... 64
4.3.2 1-D Implementation .... 69
4.4 Comparison of the QGD with the CFAR and the γCFAR Detectors .... 70
4.5 Artificial Neural Networks .... 70
4.5.1 Introduction .... 70
4.5.2 Multi-layer Perceptrons (MLPs) .... 71
4.5.3 Training MLPs .... 73
4.5.3.1 On-Line Learning .... 73
4.5.3.2 Learning Rate and Momentum .... 76
4.5.3.3 Batch Learning .... 77
4.5.4 Validation of A Neural Model .... 78
4.5.4.1 Bias/Variance Dilemma .... 78
4.5.4.2 Network Complexity and Early Stopping .... 79
4.6 Nonlinear Extension to the QGD (NL-QGD) .... 80

PAGE 6

4.6.1 Introduction .... 80
4.6.2 Training the NL-QGD .... 81
5 TRAINING STRATEGIES FOR NL-QGD .... 85
5.1 Introduction .... 85
5.2 Optimality Index .... 86
5.2.1 L2 Norm .... 86
5.2.2 Training with Excluding Outliers from Non-Target Class with L2 Norm .... 89
5.2.3 Lp Norm .... 92
5.2.4 Mixed Lp Norm .... 95
5.2.5 Cross Entropy .... 96
6 EXPERIMENTS AND RESULTS .... 100
6.1 Introduction .... 100
6.2 Prescreening SAR Imagery .... 100
6.2.1 Two-Parameter CFAR Processing .... 100
6.2.2 γCFAR Processing .... 103
6.2.2.1 Optimal Parameter Search .... 103
6.2.2.2 Impact of Stencil Size in False Alarm .... 104
6.2.2.3 Batch-running the γCFAR Detector .... 107
6.2.3 Performance Comparison of the Two-Parameter CFAR Detector and the γCFAR Detector .... 107
6.2.4 Conclusion .... 109
6.3 False Alarm Reduction .... 114
6.3.1 CFAR/QGD .... 114
6.3.1.1 Training QGD .... 114
6.3.1.1.1 Training Data Preparation .... 114
6.3.1.1.2 Optimal Weights by the Closed Form Solution .... 114
6.3.1.1.3 Testing the QGD .... 118
6.3.1.2 Training the QGD in an Iterative Manner .... 118
6.3.1.3 Independent Testing Results for the QGD .... 125
6.3.1.4 Conclusion .... 127

PAGE 7

6.3.2 CFAR/NL-QGDs .... 127
6.3.2.1 L2 Based Training and Testing of the NL-QGDs .... 127
6.3.2.2 Training and Testing of the NL-QGDs without Non-Target Outliers .... 135
6.3.2.3 Lp Based Training and Testing of the NL-QGDs .... 140
6.3.2.4 Training and Testing of the NL-QGDs with Mixed Lp Norms .... 147
6.3.2.5 Training and Testing of the NL-QGDs with Cross Entropy .... 153
6.3.3 Summary of Detection Performance in the CFAR/QGD and CFAR/NL-QGDs .... 158
6.3.4 γCFAR/QGD and γCFAR/NL-QGDs .... 162
6.3.5 Fast Implementation of γCFAR/QGD and γCFAR/NL-QGDs .... 163
7 CONCLUSIONS AND FUTURE WORKS .... 167
7.1 Summary .... 167
7.2 Future Works .... 170
APPENDIX .... 171
POLARIZATION BASIS TRANSFORMATION .... 171
A.1 Representation of a Plane Wave Polarization .... 171
A.2 Circular-to-Linear and Linear-to-Circular Polarization Basis Transformations .... 173
REFERENCES .... 177
BIOGRAPHICAL SKETCH .... 184

PAGE 8

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

FOCUS OF ATTENTION BASED ON GAMMA KERNELS FOR AUTOMATIC TARGET RECOGNITION

By MUNCHURL KIM

August 1996

Chairperson: Dr. Jose C. Principe
Major Department: Electrical and Computer Engineering

A multi-stage approach has been attractive for avoiding prohibitive processing in automatic target detection/recognition (ATD/R) of sensor imagery. Multi-stage ATD/R systems usually implement a focus of attention stage to localize the regions of interest where possible target candidates are found, and apply a recognition stage only at the regions of interest selected by the focus of attention stage. The focus of attention stage rejects most clutter (uninteresting background) in sensor imagery and detects targets of interest at a high probability of detection (usually 100% detection). This dissertation addresses a novel approach to the design of a focus of attention stage in a multi-stage ATD/R system. The focus of attention stage consists of two subsystems: (1) a front-end detection stage in which a conventional two-parameter constant false alarm rate (CFAR) detector is extended to the gamma CFAR (γCFAR) detector.
The γCFAR detector relaxes the constraint of a fixed stencil size in the two-parameter CFAR stencil by using gamma kernels; (2) a false alarm reduction stage implementing a quadratic detector of the intensity features estimated with the gamma kernels, which is called the quadratic gamma detector (QGD). The QGD extends the two-parameter CFAR test with respect to: i) the stencil shape; ii) the features used in the decision function; iii) the selection of weights, which are not chosen a priori but are found through optimization. The QGD is further extended to a nonlinear adaptive structure (a multi-layer perceptron), which is denoted the NL-QGD. The training strategies of the NL-QGD are discussed in terms of detection theory. Several criteria, namely the $L_p$ norm, a mixed $L_p$ norm, the cross entropy function, and the $L_2$ norm with removal of non-target outliers during training, are implemented to train the NL-QGD. The effect of the different norms is measured in terms of the receiver operating characteristic (ROC) on a large data set of synthetic aperture radar (SAR) clutter (about 7 km²) with targets embedded. With these new criteria, the NL-QGD was able to surpass the performance of the QGD.

CHAPTER 1
INTRODUCTION

Automatic Target Detection/Recognition (ATD/R) is a challenging problem. The goal is to detect and recognize objects of interest in clutter-dominated imagery (e.g., from forward-looking infrared, synthetic aperture radar, or laser radar sensors). Early radar systems displayed all incoming information on a screen. Clutter, noise, and target amplitude variations were displayed simultaneously. Target detection was performed by human operators, who visually monitored image intensity variations in order to discriminate targets from background clutter and noise. These raw data displays are still incorporated, in some sense, into most major systems.
The objective of automatic detection processing is to automatically detect targets and to provide target reports without human intervention.

Background clutter, which usually dominates sensor imagery, may be divided into two types: natural clutter, which describes natural scenery (trees, bushes, grass, forest, etc.), and cultural clutter, which covers man-made objects (cars, bridges, power lines, buildings, etc.). Background clutter is typically neither stationary, ergodic, nor Gaussian, especially in high resolution imagery [55] [87]. Target signatures can vary depending upon viewing angle and posture. The difficulty of the ATD/R problem is ascribed to such complicated variations of target signatures and background clutter in sensor imagery.

1.1 Automatic Target Detection/Recognition Technology

Sensor technology and computing power have made great progress in forming and acquiring sensor data. However, relatively less progress has been made in ATD/R algorithms. Many ATD/R approaches have been proposed, including detection theory [42], pattern recognition techniques [9] [11] [47], neural networks [3] [11] [76] [82] [87] [88], and model-based algorithms [21] [84] [85].

Detection theory is attractive for the ATD/R problem. When target signatures and background are described by statistical models, an optimal detector can be derived theoretically; that is, the achievable detection probability can be determined for a given false alarm rate. The advantage of the detection theory approach is that target signatures and background clutter can be expressed efficiently by statistical parameters and optimal solutions can be derived. However, this approach requires that the statistical model be valid and analytically tractable for target signatures and background clutter. When the statistical model does not adequately describe real-life raw data, detection performance degrades.
Pattern recognition representations typically involve feature extraction from targets and background. The features are essential to the target recognition process: the distinction between different target types and background clutter should be clearly supported by the extracted features. While many such efforts have been made to solve the ATD/R problem, none so far has succeeded, because variations in both target signature and background clutter contribute to the difficulty of the ATD/R problem [47].

1.2 A Multi-stage Automatic Target Detection/Recognition System

Besides the significant variation of target signatures and background clutter, which adds to the difficulty of the ATD/R problem, ATD/R systems usually have to deal with prohibitive amounts of image data. It is tempting to seek a single algorithm which exploits all of the information in high resolution imagery and solves the ATR problem. However, the single-algorithm approach is computationally too expensive, and high resolution imagery is difficult to model accurately and hence is poorly understood. Due to the prohibitive amount of sensor imagery to be processed, real-time processing requirements mandate efficient algorithms and powerful processing architectures. The multistage approach becomes an attractive alternative because it progressively reduces the number of interesting areas of the image and narrows down their consideration in the further stages, allowing a recognition algorithm to avoid processing entire images [3] [11] [57] [58] [60] [76] [86]. Figure 1 shows a conceptual flow of image data processed in multistage ATR systems.

Figure 1 Data processing flow and algorithm complexity of a multi-stage ATD/R system. (Note: the algorithm complexity does not necessarily increase linearly with the processing steps.)

The detection stage can be thought of as a data reduction stage.
A simple prescreening algorithm in the detection stage operates over the entire imagery, selects regions of interest (ROIs) where all target-like objects are found, and passes the locations of the ROIs to the recognition stage for further consideration. The recognition stage deploys a recognition algorithm only over the ROIs, rejects non-targets, and recognizes the remaining targets.

The Lincoln Laboratory baseline ATR system [57] consists of three stages: the prescreening stage (or front-end detection stage), the discrimination stage, and the classification stage. A two-parameter constant false alarm rate (CFAR) detector serves as a prescreener in the front-end detection stage and locates all possible target-like objects based on pixel intensity. The discrimination stage receives from the prescreener the locations in which target candidates are found and rejects natural clutter [9] [57]. In the classification stage, the classifier rejects cultural clutter and assigns the remaining objects (targets) to one of a finite number of categories.

Figure 2 depicts our multi-stage approach to the ATD/R problem. The focus of attention block in Figure 2a can be thought of as a data reduction stage because only regions of interest in the input imagery are passed to the classification stage. The focus of attention is very important in multi-stage ATD/R problems in the sense that its performance impacts the global performance of ATD/R systems in terms of detection rates and required processing power. This dissertation addresses the focus of attention for the multi-stage ATD/R system depicted in Figure 2b. We will consider the well known two-parameter CFAR detector [57] from a signal processing perspective.
The two-parameter CFAR detector estimates the mean and variance from a locally defined region at some distance from a test pixel, and thresholds the difference between the intensity under test (an estimate of the target mean) and the local mean, normalized by the local standard deviation. Here, the two-parameter CFAR test statistic can be thought of as a two-moment decomposition computed by local operators (two windows) defined by the CFAR stencil, using a priori knowledge from detection theory. A new two-parameter CFAR structure is proposed which decomposes the image on a gamma kernel basis; it is called the γCFAR detector. The γCFAR statistic is a two-moment decomposition obtained by projecting the image onto a set of basis functions which are the 2-D extensions of the integrands of the gamma function. These integrands are called the gamma kernels [19] [64]. A free parameter in this kernel set controls the region of support of the kernels. The two-parameter CFAR detector is then further analyzed and generalized to the Quadratic Gamma Detector (QGD) [63], which is designed to be used in the false alarm reduction stage.

Figure 2 A multistage approach to ATD/R problems.

The construction of the QGD, inspired by the two-parameter CFAR detector, is viewed in a signal processing and pattern recognition context. The QGD effectively constructs a set of features by using a feature extractor. The feature extractor projects the image intensity in local regions (and the squared intensity) onto the gamma kernel basis. These features (quadratic in the image intensity) are then classified by a linear classifier (the quadratic gamma detector, QGD) or a neural network (NL-QGD).
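The two-parameter CFAR test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the stencil used in this dissertation: it works on a 1-D intensity profile rather than a 2-D image, and the function name `two_parameter_cfar`, the `guard` and `window` parameters, and the threshold value are hypothetical choices made for the example.

```python
import math


def two_parameter_cfar(intensity, guard, window, threshold):
    """Return indices whose normalized intensity exceeds `threshold`.

    Assumed 1-D stencil: `window` clutter cells on each side of the test
    cell, separated from it by `guard` cells that are excluded from the
    local statistics.
    """
    detections = []
    for i in range(len(intensity)):
        left = intensity[max(0, i - guard - window):max(0, i - guard)]
        right = intensity[i + guard + 1:i + guard + 1 + window]
        clutter = left + right
        if len(clutter) < 2:
            continue  # not enough samples to estimate a variance
        mean = sum(clutter) / len(clutter)
        var = sum((c - mean) ** 2 for c in clutter) / (len(clutter) - 1)
        std = math.sqrt(var)
        # Test statistic: (test cell - local mean) / local standard deviation.
        if std > 0 and (intensity[i] - mean) / std > threshold:
            detections.append(i)
    return detections


if __name__ == "__main__":
    # Synthetic profile (hypothetical values): mild clutter plus one bright cell.
    profile = [1.0 + 0.1 * ((k * 7) % 5 - 2) for k in range(40)]
    profile[20] = 10.0
    print(two_parameter_cfar(profile, guard=2, window=6, threshold=5.0))
```

The guard cells keep a bright target from contaminating its own clutter estimate, which is the role played by the gap between the test pixel and the outer window in the stencil described above.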
Preliminary tests conducted at Lincoln Laboratory showed marked improvements of the CFAR/QGD with respect to the conventional CFAR detector, both with 1 foot fully polarimetric SAR data and with 1 meter single polarization SAR data [63]. Presently this combination is used in the benchmark ATR algorithm suite at MIT/LL and also as the focus of attention in the ARPA Monitor program. This dissertation explains the structure of the focus of attention in detail, along with its extensions to neural networks. We also discuss tests conducted in our laboratory with ISAR targets embedded in SAR imagery (the MIT/LL ATDS Mission 90 Pass 5 data set).

1.3 Overview of the Dissertation

Chapter 2 gives a brief introduction to the Synthetic Aperture Radar (SAR) and Inverse SAR (ISAR) image formation used for this study. It discusses the Polarimetric Whitening Filter (PWF) as a preprocessor for SAR data, as well as a target embedding strategy. Chapter 3 discusses a two-parameter Constant False Alarm Rate (CFAR) detector as a prescreener used for the front-end detection stage of the focus of attention stage. A gamma CFAR (γCFAR) detector is proposed as an alternative to the two-parameter CFAR detector, using a set of gamma kernel functions. In Chapter 4, the QGD is introduced for the false alarm reduction stage and is extended to the NL-QGD. The training strategies for the NL-QGD are discussed in Chapter 5. In Chapter 6, the results of experiments measuring the performance of the focus of attention in the ATD/R system on real-life imagery (the Mission 90 Pass 5 SAR data set) are discussed. Chapter 7 concludes the study and presents a summary with recommendations for future work.

CHAPTER 2
SYNTHETIC APERTURE RADAR (SAR) DATA DESCRIPTION

2.1 Introduction

SAR is a coherent system in that it retains both the phase and the magnitude of the backscattered signals (echoes).
SAR refers to a technique used to synthesize a very long antenna by combining echoed signals received by the radar as it moves along its flight track. The high resolution is achieved by synthesizing an extremely long antenna aperture [16]. Aperture refers to the opening used to collect the reflected energy that is used to form an image. In the case of a camera, this would be the shutter opening; for radar it is the antenna. A synthetic aperture can therefore be constructed by moving a real aperture or antenna through a series of positions along the flight track. The net effect is that a SAR system is capable of achieving a resolution independent of the sensor altitude [16] [24]. This characteristic makes SAR an extremely valuable instrument for space observation. As an active system, SAR provides its own illumination and is not dependent on light from the sun, thus permitting continuous day/night operation. It has the additional advantage of operating successfully in all weather conditions since, depending on the wavelength, neither fog nor precipitation has a significant effect on microwaves.

There are three common SAR imaging modes: spotlight, stripmap, and scan. During a spotlight mode data collection, the sensor steers its antenna beam to continuously illuminate the terrain patch being imaged. In the stripmap mode, antenna pointing is fixed relative to the flight line, resulting in a moving antenna footprint that sweeps along a strip of terrain parallel to the path of platform motion. In the scan mode, the sensor steers the antenna beam to illuminate a strip of terrain at any angle to the path of motion. The scan mode is a versatile operating mode that encompasses both the spotlight and stripmap modes as special cases. Because the scan mode involves additional operation and processing complexity, spotlight and stripmap modes are the most common SAR imaging modes.
The spotlight mode allows fine-resolution data to be collected from localized areas. The stripmap mode is appropriate for imaging large regions with coarse resolution.

2.2 SAR Image Formation

High resolution in radar systems can be achieved by a technique called aperture synthesis [24]. This technique enables much finer resolution to be achieved than would be possible with a conventional side-looking radar. In a side-looking radar, an antenna which is fixed parallel to the track directs a radar beam broadside and downward from the platform track, as shown in Figure 3. The ground area that one pulse illuminates is called the radar's footprint. The beam is scanned by the motion of the platform so that the beam footprint is swept along a swath on the terrain surface. The dimension of the footprint is determined by the antenna size, the range, and the transmitted wavelength. With an antenna length $L$ and a transmitted wavelength $\lambda$, the azimuth width of a footprint is approximately $\lambda R/L$ at a range $R$ [24]. The illumination on the ground does not disappear outside this region but fades quickly. The width $\lambda R/L$ specifies the 3 dB level, where the power of the footprint is half the maximum power. The radar receives and records the backscattered energy from the swath surface and generates an image of the surface reflectivity. The spatial (range and azimuth) resolution of the image is determined by the pulse width in the range direction and by the radar beam width in the azimuth direction [24]. While the pulse width can be narrowed, and a finer range resolution achieved, the length $L$ of the radar antenna determines the resolution in the azimuth direction of the image; the longer the antenna, the finer the resolution in this direction. As an example, in order to achieve a 25 meter azimuth resolution from the Seasat satellite with $\lambda = 23.5$ cm and $R = 850$ km, the required antenna length is $L = \lambda R / (25\ \text{m}) = (23.5\ \text{cm})(850\ \text{km})/(25\ \text{m}) \approx 8$ km.
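The real-aperture arithmetic above can be checked with a few lines of code. This is a minimal sketch; the function names `footprint_azimuth_width` and `required_antenna_length` are hypothetical and simply encode the approximation $\lambda R/L$ quoted from [24].

```python
def footprint_azimuth_width(wavelength_m, slant_range_m, antenna_length_m):
    """Approximate 3 dB azimuth width of a real-aperture footprint: lambda * R / L."""
    return wavelength_m * slant_range_m / antenna_length_m


def required_antenna_length(wavelength_m, slant_range_m, azimuth_resolution_m):
    """Real antenna length whose footprint width equals the desired azimuth resolution."""
    return wavelength_m * slant_range_m / azimuth_resolution_m


if __name__ == "__main__":
    # Seasat-like numbers from the text: 23.5 cm wavelength, 850 km range, 25 m resolution.
    length_m = required_antenna_length(0.235, 850e3, 25.0)
    print(f"required real antenna length: {length_m / 1000:.2f} km")
```

Inverting the same relation shows why the real-aperture approach does not scale: the required length grows linearly with range for a fixed resolution.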
Such an antenna is obviously prohibitively large and is not a practical way of achieving the specified azimuth resolution. In conclusion, the goal of SAR is to synthesize an image with the resolution of a focused large-aperture antenna system, using the data returned from a physically small antenna.

Figure 3 Imaging geometry of a side-looking aperture radar.

2.2.1 Range Processing

A real aperture can achieve range resolution by emitting a brief, intense rectangular pulse, then sampling the returned signal and averaging over time intervals no shorter than the emitted pulses. That is, the effective duration and energy of the transmitted pulse determine the range resolution and maximum range of a radar system. Shorter duration pulses allow closely spaced targets to be discriminated, while high energy pulses provide measurable reflections from targets at large ranges. In order to avoid the difficult and expensive development of hardware to generate short-duration, high-energy pulses, pulses of increased duration are coded for transmission and then compressed at echo reception. A linear frequency modulated (FM) waveform of finite duration, called the chirp waveform, is often used for pulse coding, and a correlation (matched filter) receiver is used for compression [24]. The frequency modulation enables high range resolution to be achieved at low transmitter peak power. Functions of the form $\cos 2\pi\left(ft + \frac{1}{2}at^2\right)$, or more generally $e^{j2\pi\left(ft + \frac{1}{2}at^2\right)}$, compress into very sharp autocorrelations. For the complex exponential with phase $2\pi\left(ft + \frac{1}{2}at^2\right)$, the first time derivative of the phase of the waveform is $2\pi(f + at)$. The frequency of the waveform therefore changes linearly with slope $a$ as time $t$ increases. The larger the value of $a$, the faster the frequency changes. Figure 4 depicts a chirp wave and its autocorrelation function.

Figure 4 A chirp waveform and its autocorrelation function.
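The compression property of the chirp can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the processing used in this work: it samples the complex chirp $e^{j2\pi(ft + \frac{1}{2}at^2)}$, approximates its autocorrelation by a discrete sum, and measures how long the squared magnitude stays above half of its peak. The function names, the sampling rate, and the numerical values of $f$, $a$, and the pulse duration are hypothetical.

```python
import cmath


def chirp(f0, rate, duration, fs):
    """Samples of exp(j*2*pi*(f0*t + 0.5*rate*t**2)) for 0 <= t < duration."""
    n = round(duration * fs)
    return [cmath.exp(2j * cmath.pi * (f0 * k / fs + 0.5 * rate * (k / fs) ** 2))
            for k in range(n)]


def autocorrelation_magnitude(u, max_lag, fs):
    """|A(tau)| for lags -max_lag..max_lag (in samples); the integral becomes a sum."""
    n = len(u)
    mags = []
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(n, n - lag)  # overlap of the pulse with its shifted copy
        acc = sum(u[k].conjugate() * u[k + lag] for k in range(lo, hi))
        mags.append(abs(acc) / fs)
    return mags


def half_power_width(mags, fs):
    """Total time during which |A|^2 is at least half of its peak value."""
    peak = max(mags)
    return sum(1 for m in mags if m * m >= 0.5 * peak * peak) / fs


if __name__ == "__main__":
    duration, rate, fs = 1.0, 100.0, 1000.0  # hypothetical units: s, Hz/s, Hz
    u = chirp(20.0, rate, duration, fs)
    width = half_power_width(autocorrelation_magnitude(u, 50, fs), fs)
    print(f"pulse duration {duration} s, autocorrelation half-power width {width} s")
```

With these values the swept bandwidth is $aT = 100$ Hz, and the measured width comes out close to its reciprocal, roughly one hundredth of the pulse duration.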
The autocorrelation $A(\tau)$ of a chirp occupying $T_0 \le t \le T_0 + T$ can be calculated as

$$A(\tau) = \int_{T_0}^{T_0+T} e^{-j2\pi\left(ft + \frac{1}{2}at^2\right)}\, e^{j2\pi\left(f(t+\tau) + \frac{1}{2}a(t+\tau)^2\right)}\,dt = e^{j2\pi\tau\left(f + a\left(T_0 + \frac{1}{2}T\right)\right)}\,(T-|\tau|)\,\frac{\sin\left[\pi a\tau(T-|\tau|)\right]}{\pi a\tau(T-|\tau|)} \quad \text{for } -T < \tau < T \tag{1}$$

where the integrand is taken to be zero wherever the shifted pulse falls outside the pulse interval. The factor $T-|\tau|$ is a triangle function weighting the $\sin(x)/x$, or sinc, function. The first nulls of the autocorrelation function lie at approximately $\tau = \pm 1/(aT)$, and its half-power width is about $1/(aT)$. Note that the product $aT$ is the bandwidth of the chirp over the pulse duration $T$. The gain in resolution, or pulse compression ratio, is $T$ divided by $1/(aT)$, or $aT^2$. This is also the time-bandwidth product of the chirp signal. Thus a high time-bandwidth product is required for high resolution.

For a pulse shape function $u(t)$ and a received signal of the form $r(t) = \sigma u(t-\tau)$, the receiver that implements a correlation for complex signals is given by

$$y(t) = \int u^*(s)\, r(s+t)\, ds = \sigma A(t-\tau) \tag{2}$$

where $\sigma$ is the target reflectivity from the range corresponding to time $\tau$. When the pulse shape $u(t)$ is selected such that its autocorrelation fades quickly as the time lag increases, the output $y(t)$ of the receiver will be maximum when $t$ equals $\tau$ and small otherwise. This means that the output of the receiver will have spikes at the time delays which correspond to reflecting objects.

In general, if there are $N$ reflectors reflecting energy, then there will be $N$ output spikes from the correlation receiver, each one scaled by the reflectivity of the associated reflector. If $r(t)$ is given by $\sum_{i=1}^{N} \sigma_i u(t-\tau_i)$, then the output of the correlation receiver is

$$y(t) = \int u^*(s)\, r(s+t)\, ds = \sum_{i=1}^{N} \sigma_i \int u^*(s)\, u(s+t-\tau_i)\, ds = \sum_{i=1}^{N} \sigma_i A(t-\tau_i) \tag{3}$$

The output of the receiver can be expressed as the convolution of the received signal with an impulse response $h(t)$ as follows:

$$y(t) = \int u^*(s)\, r(s+t)\, ds = \int u^*(-(t-s))\, r(s)\, ds = \int r(s)\, h(t-s)\, ds \tag{4}$$

The impulse response of the linear filter which implements the correlation receiver is therefore $h(t) = u^*(-t)$.
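The behaviour of the correlation receiver for several reflectors can be sketched as follows. This is an illustrative simulation under stated assumptions, not the radar processing chain: the function names, the baseband chirp ($f = 0$), the sampling rate, and the two reflector delays and reflectivities are hypothetical values chosen so that the echoes overlap in time.

```python
import cmath


def chirp_pulse(rate, duration, fs):
    """Baseband chirp exp(j*pi*rate*t**2), 0 <= t < duration (carrier f assumed 0)."""
    n = round(duration * fs)
    return [cmath.exp(1j * cmath.pi * rate * (k / fs) ** 2) for k in range(n)]


def received_signal(pulse, reflectors, fs, length):
    """r(t) = sum_i sigma_i * u(t - tau_i), with reflectors given as (sigma, tau) pairs."""
    r = [0j] * length
    for sigma, delay in reflectors:
        start = round(delay * fs)
        for k, sample in enumerate(pulse):
            if start + k < length:
                r[start + k] += sigma * sample
    return r


def correlation_receiver(pulse, r, fs):
    """|y(t)| with y(t) = integral of conj(u(s)) * r(s + t) ds, evaluated as a sum."""
    out = []
    for lag in range(len(r) - len(pulse) + 1):
        acc = sum(u.conjugate() * r[lag + k] for k, u in enumerate(pulse))
        out.append(abs(acc) / fs)
    return out


if __name__ == "__main__":
    fs = 1000.0
    pulse = chirp_pulse(rate=500.0, duration=0.2, fs=fs)
    # Two overlapping echoes: the pulse lasts 0.2 s but the delays differ by only 0.1 s.
    r = received_signal(pulse, [(1.0, 0.30), (0.5, 0.40)], fs, length=800)
    y = correlation_receiver(pulse, r, fs)
    strongest = max(range(len(y)), key=y.__getitem__)
    print(f"strongest response at t = {strongest / fs} s")
```

Although the two echoes overlap for half of the pulse duration, the receiver output shows two separate spikes at the reflector delays, with heights in proportion to the reflectivities, as equation (3) predicts.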
The convolution implementation of the correlation receiver is descriptively called the time reversed filter. It can be easily implemented in an existing linear filter architecture by modifying the impulse response, and the output is then

$$y(t) = r(t) \ast h(t) = r(t) \ast u^*(-t) \tag{5}$$

where $\ast$ denotes convolution. This filter can be implemented in the frequency domain as

$$Y(f) = R(f)H(f) = R(f)U^*(f) \tag{6}$$

where $Y(f)$, $R(f)$, and $H(f)$ are the Fourier transforms of $y(t)$, $r(t)$, and $h(t)$ respectively, and $U^*(f)$ is the conjugate of $U(f)$, the Fourier transform of $u(t)$. The complex conjugate operator in the frequency domain corresponds to a complex conjugate together with time reversal in the time domain [59]. The filter described by the correlation, time reversed, and conjugated receivers is referred to as the matched filter. This is because the filter is essentially a reference replica of the transmitted pulse which is compared to the received signal. When the characteristics of the propagation medium are known, the reference waveform is given the shape of the anticipated received signal. The filter output is a measure of how precisely the received signal and the reference match.

2.2.2 Azimuth Processing

The range resolution in a radar system is determined by the type of pulse coding and the way in which the return from each pulse is processed. All radar systems, whether conventional radars or SARs, resolve targets in the range dimension in the same way. It is the resolution of targets in the azimuth dimension that distinguishes a SAR from other radar systems. The principle of SAR is to store successive echoes received by a moving radar from targets on the ground, and to process them to synthesize a long aperture, thereby achieving high azimuth resolution [17].

A radar with an antenna length $L_a$ in the azimuth direction generates a radar beam that has an angular spread of $\theta_a = \lambda/L_a$, as shown in Figure 5.
Two point targets on the ground at a slant range $R$, separated by an amount $\delta x$ in the azimuth direction, can be resolved only if they are not both in the radar beam at the same time, which requires

$$\delta x = R\theta_a \tag{7}$$

This is the resolution limit of a conventional side-looking real aperture radar. It is clear from this that the azimuth resolution of the conventional radar varies inversely with the physical antenna size, becoming finer with increasing antenna length and degrading with increased slant range $R$.

Figure 5 Aperture synthesis. The point target is in the beam for a time $T_s = L_s/v$. After phase-correcting the signals, a synthetic antenna pattern is obtained which is equivalent to that of a conventional antenna of length $2L_s$.

In SAR operation, consider a radar beam sweeping over an arbitrary point target as the platform flies over the scene in Figure 5. The point target remains in the beam for a certain time interval $T_s$. During this time interval the radar transmits pulses at a certain rate (the pulse repetition frequency, or PRF) and receives backscatter from the point target during the intervals between successive pulse transmissions. Therefore, after the time interval $T_s$, a collection of backscattered signals has been built up which spans a spatial interval $L_s$, equal to the beam width on the ground:

$$L_s = \delta x = R\theta_a = \frac{\lambda R}{L_a} \tag{8}$$

The backscatter from the point target is distributed over a large number of aperture positions along the track. A large antenna aperture is difficult to implement physically, but it can be synthesized by sequentially gathering the backscatter with a small antenna at different positions, which collectively define an antenna array. The slant range $R$ from the radar to a point target can be written

$$R = \sqrt{R_0^2 + x^2} \approx R_0 + \frac{x^2}{2R_0} \quad \text{if } R_0 \gg x \tag{9}$$

where $R_0$ is the slant range when the target is perpendicular to the flight line, $x = vt$ is the along-track displacement of a platform moving at velocity $v$, and $t$ is the elapsed time from when the platform passed its closest point of approach to the point target.
Hence the phase shift between the transmitted and received signals is

$$\phi = -\frac{2\pi}{\lambda}\,2R = \phi_0 - \frac{2\pi x^2}{\lambda R_0} \tag{10}$$

where $\lambda$ is the radar wavelength, $x = vt$ is the along-track displacement of the platform at velocity $v$, and $\phi_0 = -4\pi R_0/\lambda$. The Doppler frequency shift between the transmitted and received signals is given by

$$f_D = \frac{1}{2\pi}\frac{d\phi(t)}{dt} = -\frac{2v^2 t}{\lambda R_0} \tag{11}$$

From (11), the azimuth spread of the point target response approximates a linear frequency modulated waveform. The Doppler frequency shift is highest when the point target enters the radar beam. It decreases with time until it becomes negative, and reaches a minimum just before the point target moves out of the area of illumination. The azimuth resolution needed to resolve two consecutive point targets in the azimuth direction is determined by

$$\Delta f_D = \frac{2v\,\Delta x}{\lambda R_0} \tag{12}$$

and

$$\Delta x = \frac{\lambda R_0}{2v}\,\Delta f_D \tag{13}$$

The Doppler resolution of the processing is the reciprocal of the time $T_s$ taken to synthesize the aperture of length $L_s = \lambda R_0/L_a$, which is

$$T_s = \frac{L_s}{v} \tag{14}$$

The maximum azimuth resolution is therefore the value of $\Delta x$ which corresponds to a Doppler bandwidth of $1/T_s$:

$$\Delta x = \frac{\lambda R_0}{2v}\cdot\frac{1}{T_s} = \frac{\lambda R_0}{2v}\cdot\frac{v L_a}{R_0 \lambda} = \frac{L_a}{2} \tag{15}$$

Equation (15) implies that the azimuth resolution improves as the antenna length decreases. However, shorter antennas require more power for signal transmission and a longer synthetic aperture.

2.3 ISAR Image Formation

Besides the three SAR imaging modes mentioned earlier, there is a fourth operating mode called inverse SAR (ISAR). SAR in this mode produces radar signal data similar to that of spotlight mode SAR. However, the ISAR mode is different in that data collection is accomplished with the radar stationary and the target moving. The signals are similar because what matters is the relative position and motion between the sensor and the scene being imaged. Since the signals are similar, the processing required to produce an image is also similar. ISAR imaging therefore refers to the use of target motion alone to generate a synthetic aperture for azimuth resolution. ISAR imaging can be accomplished by rotating the platform (turntable) on which a target is imaged.
The ISAR operation is very useful for collecting the RCS (radar cross section) information of a target at many different aspect angles for given depression angles. ISAR uses the same carrier term as in the SAR case, but the difference is that the motion of a target relative to the radar platform is rotational as opposed to linear. The resulting Doppler phase term due to rotation can be linearized as a function of azimuth range position. Figure 6 displays an ISAR imaging scenario. The radar is stationary as the target rotates at a constant angular rotation rate $\omega$ in rad/s on a turntable. As a scatterer at a point $x$ on the x-axis is rotated through a small angle $\Delta\theta$, the change in the Doppler phase is

$$\Delta\phi = \frac{4\pi}{\lambda}\,x\,\Delta\theta \tag{16}$$

which, for a rotation $\Delta\theta = \omega T$ over an observation time $T$, corresponds to a Doppler frequency shift $f_D = 2\omega x/\lambda$. The azimuth (cross-range) resolution needed to resolve two scatterers on the x-axis for a small viewing-angle rotation $\Delta\theta$ is obtained for $\Delta f_D = 1/T$ as

$$\Delta r_c = \frac{\lambda}{2\omega}\,\Delta f_D = \frac{\lambda}{2\omega T} = \frac{\lambda}{2\Delta\theta} \tag{17}$$

The range resolution for ISAR is obtained by using wideband waveforms, as with SAR.

Figure 7 illustrates ISAR imaged vehicles from the TABILS 24 ISAR data set. The radar used for the data set is a fully polarimetric, [...] band radar. In Figure 7a and Figure 7b, the MV0015 and MV0095 vehicles are shown at depression angles of 20° and 15° respectively, each with aspect angles of 0°, 40°, 80°, and 120°. Target scattering looks very different depending on the azimuth angle.

Figure 7 Examples of ISAR imagery. Down range is increasing from left to right.

2.4 Preprocessing

Image enhancement is often necessary before detection and recognition tasks are applied to an image, in order to improve target detection/recognition performance. Image enhancement is therefore viewed as a preprocessing step before proceeding to ATD/R tasks. With the availability of fully polarimetric high-resolution SAR imagery, several image enhancement techniques have been developed which exploit the polarization scattering characteristics of targets and background clutter.
SAR processing allows for high-resolution images, but introduces a considerable amount of speckle in the image due to the coherent nature of the imaging process. The primary goal of preprocessing in polarimetric SAR imagery is to reduce image speckle and to improve target-to-clutter contrast. Although many image enhancement techniques have been developed, polarimetric enhancement techniques are particularly desirable because they can provide significant speckle reduction and target-to-clutter ratio improvement while preserving the resolution of the original SAR imagery. Novak et al. developed the polarimetric whitening filter (PWF), which combines polarimetric measurements to produce an intensity image having minimum speckle [53] [54] [55] [56]. This improvement led to enhanced target detection performance [14] [53], clutter segmentation ability [10], and texture discrimination ability [9]. The detection algorithms discussed and developed in this dissertation are tested and evaluated on PWF-processed SAR imagery. The PWF technique developed by Novak et al. is introduced in the following sections.

2.4.1 Polarimetric Clutter Model

A mathematical model is used to characterize fully polarimetric radar returns from ground clutter. When operating in a linear polarization basis, a synthetic aperture radar uses four polarizations (HH, HV, VH, and VV) to measure the full polarization scattering matrix for any such clutter region. Since HV has a reciprocity relationship with VH, the set of three polarizations (HH, HV, and VV) contains all the information in the polarization scattering matrix. Ground clutter and sea clutter are spatially nonhomogeneous, so a realistic description of such clutter requires non-Gaussian models. The polarimetric measurement $Y$ for each SAR pixel in ground clutter is expressed by three complex elements: HH, HV, and VV.
$$Y = \begin{pmatrix} HH \\ HV \\ VV \end{pmatrix} = \begin{pmatrix} HH_I + jHH_Q \\ HV_I + jHV_Q \\ VV_I + jVV_Q \end{pmatrix} \tag{18}$$

where $HH_I$ and $HH_Q$ are the in-phase and quadrature components of the complex HH measurement, and similarly for HV and VV. $Y$ is assumed to be the product of a complex Gaussian vector $X$ and a spatially varying texture variable $\sqrt{g}$:

$$Y = \sqrt{g}\,X \tag{19}$$

The vector $X$ is assumed to be zero-mean (due to the random absolute phase of its components) and complex Gaussian. Hence, the probability density function of the vector $X$ is given by

$$f(X) = \frac{1}{\pi^3 |\Sigma|}\exp\left(-X^\dagger \Sigma^{-1} X\right) \tag{20}$$

where the symbol $\dagger$ represents Hermitian transposition, and $\Sigma = E(XX^\dagger)$ is the polarization covariance matrix of $X$. In general, the polarization covariance matrix in a homogeneous region of clutter takes the form

$$\Sigma = \sigma_{HH}\begin{pmatrix} 1 & 0 & \rho\sqrt{\gamma} \\ 0 & \varepsilon & 0 \\ \rho^*\sqrt{\gamma} & 0 & \gamma \end{pmatrix} \tag{21}$$

where

$$\sigma_{HH} = E(|HH|^2) \tag{22}$$

$$\varepsilon = \frac{E(|HV|^2)}{E(|HH|^2)} \tag{23}$$

$$\gamma = \frac{E(|VV|^2)}{E(|HH|^2)} \tag{24}$$

$$\rho = \frac{E(HH \cdot VV^*)}{\left[E(|HH|^2)\,E(|VV|^2)\right]^{1/2}} \tag{25}$$

It is assumed that the product multiplier $g$ is a gamma-distributed random variable. This assumption is not universal: the log-normal and Weibull models are also widely used. The gamma-distributed random variable $g$ has a density of the form

$$f_G(g) = \frac{1}{\bar{g}}\left(\frac{g}{\bar{g}}\right)^{\nu-1}\frac{1}{\Gamma(\nu)}\exp\left(-\frac{g}{\bar{g}}\right) \tag{26}$$

where the parameters $\bar{g}$ and $\nu$ are related to the mean and the variance of the random variable $g$:

$$E(g) = \bar{g}\,\nu \tag{27}$$

$$E(g^2) = \bar{g}^2\,\nu(\nu+1) \tag{28}$$

where $E$ is the statistical mean operator. The resulting probability density function of $Y$ is therefore the modified Bessel function, or generalized K-distribution [52], given by

$$f(Y) = \frac{2}{\pi^3\,\bar{g}^{\nu}\,\Gamma(\nu)\,|\Sigma|}\cdot\frac{K_{3-\nu}\left(2\sqrt{Y^\dagger\Sigma^{-1}Y/\bar{g}}\right)}{\left(\bar{g}\,Y^\dagger\Sigma^{-1}Y\right)^{(3-\nu)/2}} \tag{29}$$

Note that when $\bar{g} = 1/\nu$, so that the mean of the texture variable is unity, the non-Gaussian model reduces to the Gaussian model in the limit as $\nu \to \infty$.
The assumption that HV is uncorrelated with HH and VV is not always true (especially for man-made targets or for a polarimetric SAR with cross-talk between channels), but it is valid for ground clutter [53].

2.4.2 Polarimetric Whitening Filter (PWF)

The mathematical model established in the previous section is now used for processing the polarimetric measurements HH, HV, and VV to form an enhanced SAR intensity image. An optimal processor known as the PWF is derived, which combines the polarimetric measurements to produce an intensity image having a minimum amount of speckle. A quadratic processing of the polarimetric measurement into an intensity image is constructed as

$$ y = Y^\dagger A Y = g\,X^\dagger A X \tag{30} $$

where the weighting matrix A is assumed to be Hermitian symmetric and nonnegative definite, and g is the spatially varying texture variable. The objective is to find the optimal weighting matrix A for which the quadratic processing produces an intensity image with a minimum amount of speckle. The ratio of the standard deviation of the image pixel intensities to their mean is used as a measure of speckle and is given by

$$ \frac{s}{m} = \frac{\text{standard deviation of } y}{\text{mean of } y} = \frac{\sqrt{VAR(Y^\dagger A Y)}}{E(Y^\dagger A Y)} \tag{31} $$

where VAR is the variance. Instead of minimizing the speckle amount s/m, we minimize its square:

$$ \left(\frac{s}{m}\right)^2 = \frac{VAR(Y^\dagger A Y)}{E^2(Y^\dagger A Y)} = \frac{VAR(g\,X^\dagger A X)}{E^2(g\,X^\dagger A X)} \tag{32} $$

We use the following results:

$$ E(X^\dagger A X) = \operatorname{tr}(\Sigma A) = \sum_{i=1}^{3} \lambda_i \tag{33} $$

$$ VAR(X^\dagger A X) = \operatorname{tr}\left[(\Sigma A)^2\right] = \sum_{i=1}^{3} \lambda_i^2 \tag{34} $$

where tr is the trace, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the eigenvalues of the matrix $\Sigma A$. With these results, and because g is independent of X, the square of the s/m ratio can be written as

$$ \left(\frac{s}{m}\right)^2 = \frac{E(g^2)}{E^2(g)} \cdot \frac{VAR(X^\dagger A X)}{E^2(X^\dagger A X)} + \frac{VAR(g)}{E^2(g)} \tag{35} $$

Note that the texture terms in (35) depend only on the constant $\nu$, so minimizing $(s/m)^2$ is equivalent to minimizing

$$ \frac{\sum_{i=1}^{3} \lambda_i^2}{\left(\sum_{i=1}^{3} \lambda_i\right)^2} \tag{36} $$

Note from (35) that if the set $\{\lambda_1, \lambda_2, \lambda_3\}$ yields a minimum for $(s/m)^2$, then so does the set $\{\alpha\lambda_1, \alpha\lambda_2, \alpha\lambda_3\}$ for any scalar $\alpha$. Therefore, we can minimize the ratio in (36) by minimizing its numerator $\sum_{i=1}^{3} \lambda_i^2$ subject to the arbitrary constraint $\sum_{i=1}^{3} \lambda_i = 3$ on its denominator. This modified optimization problem can be solved with the method of Lagrange multipliers. Using a Lagrange multiplier $\beta$, we minimize the unconstrained functional

$$ f(\lambda_1, \lambda_2, \lambda_3) = \sum_{i=1}^{3} \lambda_i^2 - \beta \left(\sum_{i=1}^{3} \lambda_i\right)^2 \tag{37} $$

Taking partial derivatives with respect to $\lambda_i$ and setting the results equal to zero yields

$$ \frac{\partial f}{\partial \lambda_i} = 2\lambda_i - 2\beta \sum_{j=1}^{3} \lambda_j = 0 \quad \text{for } i = 1, 2, 3 \tag{38} $$

Thus we find that

$$ \beta = \frac{\lambda_1}{\sum_{i=1}^{3} \lambda_i} = \frac{\lambda_2}{\sum_{i=1}^{3} \lambda_i} = \frac{\lambda_3}{\sum_{i=1}^{3} \lambda_i} \tag{39} $$

The above result (together with the condition $\sum_{i=1}^{3} \lambda_i = 3$) implies that a minimizing solution is

$$ \lambda_1 = \lambda_2 = \lambda_3 = 1 \tag{40} $$

(40) leads to the following result:

$$ \Sigma A = I \tag{41} $$

$$ A = \Sigma^{-1} \tag{42} $$

(40) and (41) imply that the optimal weighting matrix A is the one that makes all of the eigenvalues of $\Sigma A$ equal to one. The minimum-speckle intensity image is therefore constructed as

$$ y = Y^\dagger \Sigma^{-1} Y = g\,X^\dagger \Sigma^{-1} X \tag{43} $$

This solution can be interpreted as a polarimetric whitening filter. That is, the polarimetric measurement Y is transformed to a new coordinate system by the filter $\Sigma^{-1/2}$ to obtain $W = \Sigma^{-1/2} Y$. This transform whitens the polarimetric measurements, so that

$$ \Sigma_W = E(WW^\dagger) = g\,\Sigma^{-1/2} E(XX^\dagger)\, \Sigma^{-1/2} \tag{44} $$

$$ \Sigma_W = g\,\Sigma^{-1/2}\, \Sigma\, \Sigma^{-1/2} = gI \tag{45} $$

The minimum-speckle image is then obtained by noncoherently averaging the power in the elements $w_i$ of the whitened vector W, as shown by

$$ y = \sum_{i=1}^{3} |w_i|^2 = W^\dagger W \tag{46} $$

In conclusion, the PWF changes the polarimetric basis from a linear polarization basis to a new basis given by

$$ \left( HH,\ \frac{HV}{\sqrt{\varepsilon}},\ \frac{VV - \rho^*\sqrt{\gamma}\,HH}{\sqrt{\gamma\,(1 - |\rho|^2)}} \right) \tag{47} $$

In this new basis, the polarimetric channels are uncorrelated and have equal expected power.
Thus the optimal way to reduce speckle polarimetrically is to sum the powers noncoherently in these polarimetric channels. Another advantage of the PWF-processed image is that it merges into a single image the scatterers that show up in only one of the polarization channels. Therefore, the PWF image is very useful for locating the features that might ultimately be used in a recognition task. Figure 8 summarizes the PWF processing procedure.

2.4.3 Preprocessing SAR Image

The focus of attention of the ATD/R system (Figure 2) is deployed on the PWF SAR image, whose primary characteristic is that the speckle in the SAR imagery is optimally reduced [54]. Therefore, the input to preprocessing is a sequence of fully polarimetric images, where each polarimetric image in the sequence is a set of three complex-valued images denoted by HH, HV, and VV when the images are expressed in a linear polarization basis. Each set of three complex-valued (HH, HV, VV) pixels is optimally combined and transformed to the real-valued pixel intensity y by the PWF.

[Figure 8: 1. Whitening the original image: $W = \left( HH,\ HV/\sqrt{\varepsilon},\ (VV - \rho^*\sqrt{\gamma}\,HH)/\sqrt{\gamma(1-|\rho|^2)} \right)$. 2. Compute a minimum-speckle image: $y = |HH|^2 + \left|HV/\sqrt{\varepsilon}\right|^2 + \left|(VV - \rho^*\sqrt{\gamma}\,HH)/\sqrt{\gamma(1-|\rho|^2)}\right|^2$.]

Figure 8 Minimum-speckle image processing [55].

Each pixel intensity y is related to the vector Y = (HH, HV, VV) of corresponding complex polarization values HH, HV, and VV by the quadratic relation $y = Y^\dagger A Y$, where $\dagger$ denotes the complex conjugate (Hermitian) transpose and the matrix A is determined such that the speckle amount is minimum. Finding A requires that the polarization covariance matrix of the surrounding clutter be found. The mission 90 pass 5 SAR imagery used in this study is a strip-mode, fully polarimetric image with a linear polarization basis.
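The PWF result of Section 2.4.2 can be checked numerically. The sketch below assumes NumPy; all function names are hypothetical, and the clutter parameters in the usage note are illustrative values rather than measurements. It builds the covariance of Eq. (21), evaluates the eigenvalue ratio of Eq. (36) for a candidate weighting matrix, and compares the quadratic form of Eq. (43) with the sum of channel powers in the whitened basis. The explicit basis here includes the common factor $1/\sqrt{\sigma_{HH}}$, which only rescales the output image.

```python
import numpy as np

def clutter_covariance(sigma_hh, eps, gamma, rho):
    """Polarization covariance of Eq. (21) for the vector (HH, HV, VV)."""
    c = rho * np.sqrt(gamma)
    return sigma_hh * np.array([[1.0, 0.0, c],
                                [0.0, eps, 0.0],
                                [np.conj(c), 0.0, gamma]], dtype=complex)

def speckle_ratio_sq(cov, a):
    """sum(lambda_i^2) / (sum lambda_i)^2 for the eigenvalues of cov @ a (Eq. 36)."""
    lam = np.linalg.eigvals(cov @ a)
    return (np.sum(lam ** 2) / np.sum(lam) ** 2).real

def pwf_intensity(y, cov):
    """Quadratic form Y^H cov^{-1} Y for each row of y (Eq. 43)."""
    cov_inv = np.linalg.inv(cov)
    return np.einsum("ni,ij,nj->n", y.conj(), cov_inv, y).real

def whitened_channels(y, sigma_hh, eps, gamma, rho):
    """Explicit whitened basis, scaled by 1/sqrt(sigma_hh) to match cov^{-1}."""
    hh, hv, vv = y[:, 0], y[:, 1], y[:, 2]
    third = (vv - np.conj(rho) * np.sqrt(gamma) * hh) / np.sqrt(
        gamma * (1.0 - abs(rho) ** 2))
    return np.stack([hh, hv / np.sqrt(eps), third], axis=1) / np.sqrt(sigma_hh)
```

With, say, `cov = clutter_covariance(0.098, 0.19, 1.08, 0.58 + 0.05j)`, the ratio returned by `speckle_ratio_sq(cov, np.linalg.inv(cov))` is 1/3, the smallest value three nonnegative eigenvalues can give, whereas using a single channel (`np.diag([1.0, 0.0, 0.0])`) gives 1. Summing `abs(whitened_channels(...)) ** 2` over the three channels reproduces `pwf_intensity`.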
The scrub region located in the vicinity of the powerline towers in frame 105 of the mission 90 pass 5 SAR data set was used for computing the covariance matrix of a typical clutter background and for the PWF processing. The estimated polarimetric covariance matrix is reported to be [55]

$$ \Sigma = 0.098 \cdot \begin{bmatrix} 1.00 + j0.00 & 0.01 - j0.02 & 0.60 + j0.05 \\ 0.01 + j0.02 & 0.19 + j0.00 & 0.00 + j0.00 \\ 0.60 - j0.05 & 0.00 - j0.00 & 1.08 + j0.00 \end{bmatrix} \tag{48} $$

2.5 SAR Image Visualization

The image data used as input often need to be visualized on a display screen to observe characteristics of the scene such as trees, shadows, and buildings. Visualization is also required for target embedding, for the preparation of training and testing data for the detectors, and for the selection of regions of interest. Khoros [43], an integrated software development environment for information processing and visualization, was used for these purposes. The Khoros worksheet shown in Figure 9 brings up interactive display glyphs with pan and zoom capabilities. This worksheet was created for display purposes. It loads frames and concatenates them in order. Through the interactive display, a user can scroll through the entire plane of the loaded images and zoom in and out on regions of interest. Reading the current cursor position provides the corresponding coordinates (x, y) of the image. Figure 9 displays an example of visualizing two frames of the SAR images in the Khoros working environment. One frame in the Mission 90 pass 5 SAR data set has a size of 2048 by 512 pixels in azimuth and cross range. Each pixel is an 8-bit integer value ranging from 0 to 255. The pixel values were linearly converted from the PWF-transformed data, which mostly range from -50 dB to +30 dB.
Figure 9 Khoros worksheet for a display purpose. [Screenshot of Cantata, the visual programming language for the Khoros system.]

2.6 Examples of SAR Images

Radar images are composed of many picture elements (pixels). Each pixel in the radar image represents the radar backscatter; brighter areas represent high backscatter. Bright features mean that a large fraction of the transmitted radar signal was reflected back to the radar receiver, while dark features imply very little reflection back from targets. Backscatter for a target area at a particular wavelength depends on a variety of conditions: the size of the scatterers in the target area, the moisture content of the target area, the polarization of the transmitted pulses, the wavelength used in the radar transmitter, and the observation angles. Backscatter also depends on the use of different polarizations. Since polarimetric SARs measure the phases of incoming pulses, the phase differences (in degrees) between the returns of the HH and VV signals are frequently the result of structural characteristics of the scatterers. Figure 10 displays some SAR imagery preprocessed by the PWF. In general, the higher or brighter the backscatter on the image, the rougher the surface being imaged. Flat surfaces that reflect little or no transmitted radar signal back towards the radar receiver will always appear dark in radar images. Figure 10a shows a highway and a bridge. The surfaces of the highway on the bridge appear dark since they are flat. Surfaces inclined towards the radar usually have a stronger backscatter than surfaces that slope away from the radar, and they tend to appear brighter in a radar image. Figure 10b displays some houses and parked cars. The roof surfaces inclined toward the SAR appear much brighter than the surfaces inclined away from the radar.
Some areas not illuminated by the radar, such as the back slope of mountains, appear dark and are called radar shadows. As an example of this, the houses in Figure 10b display radar shadows on their back sides. Received radar pulses that bounced off several objects appear very bright (white) in radar images. Vegetation is typically rough on the scale of most radar wavelengths and appears light grey in a radar image. Figure 10c displays scenes of powerline towers, trees, and scrub. Two pairs of powerline towers are visible in the upper-left and lower-right sides of the picture. A narrow scrub region running diagonally through the picture appears moderately coarse, and the trees divided by the narrow scrub region look rough, so that the trees are almost individually discernible. Buildings that are not aligned so that the radar pulses are reflected straight back will appear light grey, like very rough surfaces. Backscatter is also sensitive to the target's electrical properties, including water content. Wetter objects will appear bright, and drier targets will appear dark. The exception to this is a smooth body of water, which reflects incoming pulses away from the radar; such bodies will appear dark.

2.7 Target Embedding Strategy

2.7.1 Development of a Target Embedding Method

ATR systems employ detection/recognition algorithms over the sensor imagery. A target in the imagery can exhibit an infinite number of different shapes depending on the depression and aspect angles of the SAR. To test the performance of an ATR system reliably, targets would have to be imaged at many different aspect angles and at many different locations over a terrain, but actually placing targets in this way and imaging the terrain is very difficult and impractical. ISAR operation provides rich shape information about a target at different view angles. By employing an appropriate method of target embedding, targets can be placed at many different locations with many different view angles.
Figure 11 depicts a target embedding methodology. The methodology developed for target embedding proceeds as follows: (1) appropriate locations for target embedding are selected such that targets are placed in the clear and in between large scattering centers in the regions of cultural clutter; (2) because the ISAR target images are in a circular polarization basis while the clutter imagery (Mission 90 Pass 5 data set) is in a linear polarization basis, a polarization basis transformation is applied to the ISAR target images, resulting in the corresponding linear-polarization-basis target images; (3) the two images, clutter image and target image, which have the same polarization basis and the same resolution and were generated at the same radar wavelength, are coherently added at the locations selected for target embedding; (4) the PWF transformation is finally applied to the new image with the targets coherently added.

Figure 10 Some examples of SAR images from the Mission 90 pass 5 SAR data set. The images were acquired at an altitude of 2 km, a depression angle of 22.5°, and a slant range of 7 km. The HH, HV, and VV returns of the images were combined to produce a minimum-speckle image via PWF processing. The radar sensor is located at the top of each image, looking down, so that the radar shadows go downward.

Figure 11 Target embedding procedure. [Block labels: 35 GHz Mission 90 Pass 5; 35 GHz TABILS 24; SAR data with targets embedded.]

Coherent addition means that, for example, the in-phase component ($HH_I$) and the quadrature component ($HH_Q$) of a target pixel are added to the in-phase component ($HH_I$) and the quadrature component ($HH_Q$) of the clutter pixel, respectively, at a location selected for target embedding. This procedure is also applied to the other two components (HV and VV) of the pixel. The PWF transformation maps the three complex values of each pixel to a single image pixel value.
Therefore, the PWF-transformed image is useful for locating the features that might ultimately be useful for recognition. After target embedding, further processing can be applied to the image with targets embedded before the detection/recognition algorithms are employed. For example, PWF processing can be applied for speckle reduction in the image.

2.7.2 Embedding the TABILS 24 ISAR Target Data into the Mission 90 Pass 5 SAR Data Set

In order to migrate a large number of targets from the TABILS 24 ISAR target data set into the strip-mode SAR image, an appropriate transformation of polarization bases must be applied to the ISAR target images, which are circular-polarization-basis images. Therefore, the circular-polarization-basis images (i.e., LL, LR, and RR) of the TABILS 24 ISAR targets are transformed to the corresponding linear-polarization-basis images (HH, HV, and VV). These transformed target images are coherently added at the locations of the mission 90 pass 5 SAR imagery selected for target embedding, as discussed in detail later. The embedding method developed for testing performance with targets in clutter (Figure 11) combines the turntable data (TABILS 24) with the SAR data (mission 90 pass 5). No embedding method will be perfect, but it is our belief that the method we are using preserves the gross statistical characteristics of non-occluded targets in clutter upon which the pre-detection schemes in the focus of attention stage depend. Independent testing of the algorithms by the MIT Lincoln Laboratory on real targets in clutter (SAR data in which targets were in the field of view during data collection) corroborated the superiority of the CFAR/QGD combination [63]. This suggests that the embedding method used is sufficient for development and for the generation of preliminary ROCs.
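The coherent addition and subsequent PWF step described for the embedding method can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: it assumes NumPy, a (3, H, W) complex array layout for the HH, HV, and VV channels, and target chips already transformed to the linear polarization basis; the function names and the exact dB limits of the 8-bit mapping are hypothetical.

```python
import numpy as np

def embed_target(clutter, target, row, col):
    """Coherently add a complex target chip into a complex clutter image.

    clutter: (3, H, W) complex array holding the HH, HV, VV channels.
    target:  (3, h, w) complex array in the same basis and resolution.
    """
    out = clutter.copy()
    h, w = target.shape[1:]
    # Complex addition adds the in-phase and quadrature parts separately.
    out[:, row:row + h, col:col + w] += target
    return out

def pwf_image(img, cov):
    """Per-pixel quadratic form Y^H cov^{-1} Y over a (3, H, W) image."""
    cov_inv = np.linalg.inv(cov)
    return np.einsum("ihw,ij,jhw->hw", img.conj(), cov_inv, img).real

def to_uint8_db(power, lo_db=-50.0, hi_db=30.0):
    """Map power to dB and scale linearly to 0..255 (assumed display limits)."""
    db = 10.0 * np.log10(np.maximum(power, 1e-12))
    scaled = (np.clip(db, lo_db, hi_db) - lo_db) / (hi_db - lo_db) * 255.0
    return np.round(scaled).astype(np.uint8)
```

A typical sequence would be `to_uint8_db(pwf_image(embed_target(clutter, chip, r, c), cov))`, which keeps the order stated in the text: embedding happens on the complex data first, and the PWF and scaling are applied afterwards.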
We chose ISAR target data that were taken at approximately the same depression angle as the mission 90 pass 5 data (23 degrees). Our assumption is that ISAR data with the same resolution as the mission 90 data, and with depression angles that differ from the mission 90 pass 5 depression angle by less than 3 degrees, are suitable for embedding. ISAR target images were collected from 22 target data sources of the TABILS 24 data set. Since some of the 22 target data sources have the same target types but were measured under different weather conditions, the 22 target data sources consist of 10 different target types. For each target data source, target images were extracted at 7.5° azimuth increments over a complete 360° of azimuth. This resulted in 1035 target images (345 training, 345 cross validation, and 345 testing). The ISAR target images within the training, cross validation, and testing sets were separated by a 7.5° azimuth increment; that is, the target images were picked up at each increment step of 7.5° and assigned to the training, cross validation, and testing sets in sequence. The targets then need to be placed in the clear and in between large scattering centers in regions of cultural clutter. In order to embed the targets into the clutter data, three polarizations of complex ISAR data are extracted (HH, HV, VV). These are added coherently to the different regions in the three polarizations of complex SAR data. The PWF transformation is then applied to the new data. The PWF-transformed data are scaled logarithmically to the range [0, 255]. Since the points at which the ISAR data are embedded are range pixels with low radar cross section (RCS), although surrounding regions may contain range cells with very large RCS, the large scattering centers on the ISAR targets are not changed significantly. These are the points upon which the prescreening algorithms depend. An example of the embedding process is shown in Figure 12. At the top of the figure is an example of an ISAR image (PWF).
At the bottom left of the figure is a section of cultural clutter. At the bottom right is the same section with the ISAR target embedded. The images are shown after PWF processing, but of course the target is embedded prior to such processing. As can be seen, the target signature is not changed dramatically after embedding. Figure 13 displays some embedded targets at a variety of aspect angles.

Figure 12 Target embedding. [Panels: original ISAR image chip, no background; cultural clutter region, no embedded target; cultural clutter region with target embedded.]

Figure 13 Some examples of the TABILS 24 ISAR targets at a variety of aspect angles after embedding.

CHAPTER 3
PRESCREENERS

3.1 Introduction

As mentioned in Chapter 1, the front-end detection stage aims to locate potential targets in the sensor imagery and allows for a significant reduction of data processing in subsequent stages. Because it processes the entire imagery directly, the front-end detection stage requires computationally simple and efficient algorithms. A two-parameter CFAR detector has been used as a prescreener for the front-end detection stage [57]. It computes a Mahalanobis distance between a pixel under test and its neighboring pixels, which are defined by a window of predetermined size (the CFAR stencil). The two-parameter CFAR detector meets the requirements of algorithmic simplicity and efficiency. There is, however, room for improvement of the CFAR detector, and we discuss this possibility. We introduce a novel detector called the γCFAR detector. The γCFAR detector extends the conventional CFAR structure by using a set of gamma kernels as an alternative to the fixed, predetermined size of the CFAR stencil.
Before we discuss the two-parameter CFAR detector, we start with a one-parameter CFAR detector to pave the way for it.

3.2 A One-Parameter Constant False Alarm Rate (CFAR) Detector

A cell-averaging (CA) CFAR detector controls the detection threshold for a specific resolution cell based on the estimate of a sufficient statistic of the clutter. The detector estimates the clutter power in the cells surrounding a cell under test. When the clutter is statistically homogeneous over the resolution cells, this detector works well; otherwise, the performance degrades. In order to implement a CA operation, a stencil can be used locally at a particular location in the image (Figure 14). A test pixel is defined at the center area ($R_t$) of the stencil, and the clutter intensity mean is estimated in a local region ($R_c$) at a distance from the pixel under test. We refer to the stencil as the CFAR stencil. The output of the CA-CFAR operation can be expressed as

$$ y = \frac{1}{N_t} \sum_{(i,j) \in R_t} x(i,j) - \frac{1}{N_c} \sum_{(i,j) \in R_c} x(i,j) \tag{49} $$

where x represents the intensity pixels, and $N_t$ and $N_c$ are the numbers of pixels in $R_t$ and $R_c$ of the stencil. This CA-CFAR detector can be called a one-parameter CFAR detector because only the mean information is utilized in the operation in (49). The hypothesis test is then performed on the outputs of the CA-CFAR processing such that

$$ y \underset{\text{non-target}}{\overset{\text{target}}{\gtrless}} T_{CFAR} \tag{50} $$

From a signal processing perspective, the one-parameter CFAR detector in (49) is a band-pass filter running over the image, in which the output of a low-pass filter defined by the clutter mask is subtracted from the output of another low-pass filter defined by the target mask of the template. Abrupt changes in image intensities produce large outputs from the one-parameter CFAR detector. Since man-made objects usually scatter strongly back to millimeter-wave SARs, the large intensity differences between pixels under test and their clutter background are easily detected.
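The cell-averaging operation of Eq. (49) can be sketched with a square stencil. The following assumes NumPy; the stencil geometry (a square window, a single-pixel target region by default, and a clutter ring consisting of everything outside a guard distance) and the function names are illustrative assumptions, not the stencil dimensions used in this work.

```python
import numpy as np

def ring_masks(half_size, guard, target_half=0):
    """Boolean target and clutter masks for a square CFAR stencil (assumed layout)."""
    ax = np.abs(np.arange(-half_size, half_size + 1))
    dist = np.maximum(ax[:, None], ax[None, :])  # Chebyshev distance from the center
    target = dist <= target_half
    clutter = dist > guard                        # pixels outside the guard area
    return target, clutter

def ca_cfar(image, target_mask, clutter_mask):
    """Mean over R_t minus mean over R_c at every interior pixel (Eq. 49)."""
    half = target_mask.shape[0] // 2
    out = np.zeros(image.shape, dtype=float)
    for r in range(half, image.shape[0] - half):
        for c in range(half, image.shape[1] - half):
            win = image[r - half:r + half + 1, c - half:c + half + 1]
            out[r, c] = win[target_mask].mean() - win[clutter_mask].mean()
    return out
```

Thresholding the returned array as in Eq. (50) gives the detections. Because the output is a difference of means, multiplying the image by a constant multiplies the output by the same constant, which is why the threshold of this detector depends on the image scale.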
However, an output of the one-parameter CFAR processing in (49) is the result of convolving the image with two rectangular-shaped kernels, which cause large losses or ripples outside their main lobes in the frequency domain. Contrast enhancement between targets and clutter can be achieved by incorporating smooth-shaped kernels in the stencil [26]. The threshold of the one-parameter CFAR detector in (49) is sensitive to the scale of the image: the threshold is linearly proportional to the scale factor of the image. In the next section, a two-parameter CFAR detector is introduced, which incorporates a normalization factor. This factor is the standard deviation of the clutter in $R_c$.

3.3 A Two-Parameter CFAR Detector

A two-parameter CFAR detector was first developed for use on 1-D range profiles by Goldstein [29]. Later, Novak et al. extended it for use in SAR imagery [56] [57]. The two-parameter CFAR detector has been commonly used as a prescreener in multi-stage SAR ATD/R systems. Its popularity is due to an excellent figure of merit in terms of performance versus simplicity. The name indicates that a constant false alarm probability is achieved, but in fact this is only true for Gaussian-distributed targets and clutter [57]. Normally in SAR imagery this is not the case for targets in clutter. Nevertheless, experience has shown that the two-parameter CFAR is a robust detector for man-made clutter and targets [57]. The reasons for this success can be found in the simplicity and the discriminating power of the test. Basically, the CFAR compares the intensity of a pixel under test with the normalized intensity of a surrounding area. Since man-made objects are normally bright in SAR imagery, this is a very effective test, which can be efficiently implemented in digital hardware.
In terms of statistical detection theory, the CFAR estimates the parameters of the local intensity probability density function (pdf) and makes a decision when there is a brightness deviation of the pixel under test with respect to the normalized mean background intensity, i.e.,

$$ \frac{X_t - \hat{X}_c}{\hat{\sigma}_c} \underset{\text{clutter}}{\overset{\text{target}}{\gtrless}} T_{CFAR} \tag{51} $$

where $X_t$ is the pixel under test, $T_{CFAR}$ is the threshold of the two-parameter CFAR detector, and $\hat{X}_c$ and $\hat{\sigma}_c$ are the estimates of the local mean and standard deviation measured in a local area defined by the CFAR stencil (see Figure 14).

Figure 14 CFAR stencil. The amplitude of the test pixel is compared with the mean and the standard deviation of the local area. The guard area ensures that no target pixels are included in the measurement of the local statistics [57].

The estimates $\hat{X}_c$ and $\hat{\sigma}_c$ are computed by

$$ \hat{X}_c = \frac{1}{N_c} \sum_{(i,j) \in \Omega_c} x(i,j) \tag{52} $$

where x(i,j) is the pixel intensity at location (i,j), and

$$ \hat{\sigma}_c = \sqrt{\frac{1}{N_c} \sum_{(i,j) \in \Omega_c} \left(x(i,j) - \hat{X}_c\right)^2} \tag{53} $$

where $\Omega_c$ defines the local area in which $\hat{X}_c$ and $\hat{\sigma}_c$ are computed, and $N_c$ is the number of pixels in $\Omega_c$. The shape of the stencil ensures that when the center pixel is on target, the neighborhood falls in the background, so that its local statistics can be reasonably well estimated. The shape of the stencil (in particular the guard band) is governed by the target size [54]. In SAR imagery, the reflectivity of an object is only weakly coupled to its geometric shape, so a priori stencil dimensions based solely on target size can lead to suboptimal performance. In terms of statistical pattern recognition, we can interpret the CFAR in a slightly different way. One can think of the CFAR stencil as extracting intensity features in the neighborhood of the pixel under test.
In fact, squaring both sides of the CFAR test (for a test pixel brighter than the local mean) and using $\hat{\sigma}_c^2 = \overline{X_c^2} - \hat{X}_c^2$, the CFAR equation can be rewritten as

$$ X_t^2 - 2X_t\hat{X}_c + \hat{X}_c^2 + T_{CFAR}^2\,\hat{X}_c^2 - T_{CFAR}^2\,\overline{X_c^2} \underset{\text{clutter}}{\overset{\text{target}}{\gtrless}} 0 \tag{54} $$

where $\hat{X}_c^2$ and $\overline{X_c^2} = \frac{1}{N_c} \sum_{(i,j) \in \Omega_c} x^2(i,j)$ are the square of the estimated mean and the estimated mean of the intensity power, respectively, both of which are measured in the neighboring area selected by the CFAR stencil. Notice that this expression is a linear combination (with fixed weighting) of the image intensity and its square at the pixel under test and of the mean, the squared mean, and the mean square of the intensity in the neighborhood. Hence we can interpret the CFAR as implementing a "restricted" linear discriminant function of quadratic terms of the image intensity. From this perspective, the two-parameter CFAR can be improved in three respects: first, it uses only some of the quadratic terms of the intensity at the pixel and its surroundings; second, it implements a fixed parametric combination of these features; and third, there is little flexibility in the feature extraction because the kernel is ad hoc. These three aspects can be greatly improved if more mathematically principled features are computed and if trainable classifiers are built. In Chapter 4, this is exactly what the quadratic gamma detector (QGD) and, even more so, the NL-QGD (nonlinear extension to the QGD) will provide.

3.4 Extension to the Two-Parameter CFAR Detector

3.4.1 Introduction

In this section, the conventional CFAR detector discussed in the previous section is improved by incorporating a new stencil called the γCFAR (gamma CFAR) stencil. In Section 3.4.2, the gamma filter and gamma kernel [19] [64] are reviewed. They constitute the basis of the new stencil. The 2-D extension of the 1-D gamma kernels is introduced in Section 3.4.3. As extensions to the conventional CFAR detector, γCFAR detectors are introduced and their characteristics are discussed in Section 3.4.4.
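The two-parameter CFAR test of Section 3.3 and its quadratic rewriting can be compared numerically. The sketch below assumes NumPy, a square stencil whose clutter region is everything outside a guard distance, and the 1/N normalization for the variance; these choices and all function names are illustrative assumptions. The quadratic form is taken with the squared threshold, which is what follows from squaring the normalized test when the test pixel is brighter than the local mean.

```python
import numpy as np

def clutter_features(image, r, c, half_size, guard):
    """Mean and mean square of the clutter ring around pixel (r, c)."""
    win = image[r - half_size:r + half_size + 1,
                c - half_size:c + half_size + 1].astype(float)
    ax = np.abs(np.arange(-half_size, half_size + 1))
    ring = np.maximum(ax[:, None], ax[None, :]) > guard  # outside the guard area
    clutter = win[ring]
    return clutter.mean(), np.mean(clutter ** 2)

def cfar_statistic(x_t, mean_c, mean_sq_c):
    """(x_t - mean) / std, with std^2 = mean square - squared mean."""
    std_c = np.sqrt(mean_sq_c - mean_c ** 2)
    return (x_t - mean_c) / std_c

def cfar_quadratic(x_t, mean_c, mean_sq_c, threshold):
    """Quadratic form that is positive exactly when the statistic exceeds threshold."""
    t2 = threshold ** 2
    return x_t ** 2 - 2.0 * x_t * mean_c + (1.0 + t2) * mean_c ** 2 - t2 * mean_sq_c
```

For a bright test pixel, `cfar_statistic(...) > T` and `cfar_quadratic(..., T) > 0` give the same decision. Unlike the one-parameter detector, the statistic is unchanged when the whole image is multiplied by a constant, because the mean and the standard deviation scale together.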
3.4.2 Gamma Filter and Gamma Kernel

The gamma kernel was originally developed for time series analysis as a short-term memory structure for sequence-processing neural networks [19]. Owing to the property of completeness in Hilbert space ($L^2$ space), any finite-energy signal can be approximated arbitrarily closely using a finite number of gamma kernels. The gamma kernels are defined as

$$ g_k(t) = \frac{\mu^k}{(k-1)!}\, t^{k-1} e^{-\mu t}, \quad t \ge 0, \quad k = 1, 2, \ldots \tag{55} $$

The implementation of the gamma kernels is accomplished by introducing a local feedback loop around the delay operator, as shown in Figure 15. The impulse response from the input to the kth tap generates the kth order gamma kernel of (55). This structure is called the gamma filter. The Laplace transform of the gamma kernels is given as

$$ G_k(s) = \left(\frac{\mu}{s + \mu}\right)^k = G^k(s) \tag{56} $$

where

$$ G(s) = \frac{\mu}{s + \mu} \tag{57} $$

The kth order gamma filter is characterized by a pole at $s = -\mu$ with multiplicity k. The locations of the zeros are determined by the weights $w_1, w_2, \ldots, w_k$ and the parameter $\mu$. The discrete-time gamma filter is depicted in Figure 15b. The discrete-time gamma kernels and their corresponding z-transforms are given by

$$ g_k(n) = \binom{n-1}{k-1} \mu^k (1 - \mu)^{n-k} \quad \text{for } n \ge k \tag{58} $$

$$ G_k(z) = \left(\frac{\mu}{z - (1 - \mu)}\right)^k = G^k(z) \tag{59} $$

The gamma filters in both continuous and discrete time are locally recurrent, globally feedforward structures.

Figure 15 The gamma filter structures a) in continuous time and b) in discrete time.

Figure 16 Shape of gamma kernels as affected by the parameter and the kernel order ($\mu$ and k). [Panel a: effect of changing kernel order (k).]

The discrete gamma filter generalizes the tapped delay line structure. For the case of $\mu = 1$, the gamma filter reduces to a transversal filter (or finite impulse response (FIR) filter). The stability of the continuous-time gamma filter is guaranteed because the system poles always lie in the left half of the Laplace plane as long as $\mu > 0$.
For the discrete case, the poles are located inside the unit circle in the z-domain as long as $0 < \mu < 2$, so the stability problem in discrete time is also trivial. The interesting point is that the gamma kernels constitute a non-orthogonal basis that is complete for signals of finite energy (in $L^2$) [19]. Hence we can interpret the gamma filter output as a projection of the input signal onto a linear space defined by the convolution of the input with the basis functions [20]. Figure 16 illustrates, in continuous time, an example of gamma kernel shapes as functions of the kernel order k and the parameter $\mu$. Note that the shape of the kernels is very similar whether one varies the order for a fixed $\mu$ or varies $\mu$ for a given kernel order. The main characteristic of the gamma filter is that time in the filter taps is scaled by the feedback parameter $\mu$ (linear time warping). In other words, the region of support of the impulse response is controlled by the single parameter $\mu$, such that by changing $\mu$ the impulse response can be stretched or shrunk like a rubber band (Figure 16b). In a signal processing framework, the parameter $\mu$ can be adapted to minimize the output mean square error using a gradient descent approach, so that the best local features are captured by the filter [64]. Linear filters are often characterized in the frequency domain in terms of their magnitude and phase responses. For the temporal processing of signals, linear filters such as the FIR, IIR, and gamma filters can be understood as memory filters, which are characterized by their memory depth and memory resolution. The memory depth D is defined as the temporal mean value of the impulse response g(t) of a filter used for storing temporal information [19]:

$$ D = \int_0^\infty t\, g(t)\, dt \quad \text{for continuous time} \tag{60} $$

$$ D = \sum_{n=0}^{\infty} n\, g(n) \quad \text{for discrete time} \tag{61} $$

The memory resolution R is defined as the number of degrees of freedom (i.e., memory state variables) per unit time.
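The discrete-time gamma kernels and the memory depth defined above can be checked with a short sketch. It assumes NumPy and uses the standard gamma-filter tap recursion implied by the stage transfer function $\mu / (z - (1 - \mu))$; the function names and the truncation length are illustrative choices.

```python
import numpy as np
from math import comb

def gamma_kernel(k, mu, length):
    """Closed-form discrete gamma kernel g_k(n) for n = 0 .. length-1."""
    g = np.zeros(length)
    for n in range(k, length):
        g[n] = comb(n - 1, k - 1) * mu ** k * (1.0 - mu) ** (n - k)
    return g

def gamma_filter_taps(x, order, mu):
    """Recursive gamma filter: row k holds the signal at tap k (row 0 is the input)."""
    taps = np.zeros((order + 1, len(x)))
    taps[0] = x
    for n in range(1, len(x)):
        for k in range(1, order + 1):
            # Local feedback (1 - mu) around the delay, fed by the previous tap.
            taps[k, n] = (1.0 - mu) * taps[k, n - 1] + mu * taps[k - 1, n - 1]
    return taps

def memory_depth(g):
    """Temporal mean of the impulse response, sum over n of n * g(n)."""
    return float(np.sum(np.arange(len(g)) * g))
```

Feeding a unit impulse through `gamma_filter_taps` reproduces `gamma_kernel` at each tap. For order 3 and μ = 0.3, `memory_depth` evaluates to about 10, that is, the order divided by μ, while for μ = 1 the kernel collapses to a pure delay of k samples, as in a tapped delay line.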
As a memory filter, the impulse response of a kth order FIR filter at each tap is $g_k(n) = \delta(n - k)$. The transfer function in the z-domain is given by $G(z) = z^{-k}$. The memory depth D of the filter is D = k. That is, the memory depth of an FIR filter is equal to the filter order. Because the memory depth is limited by the filter order, a low-order FIR filter has a poor modeling capability for low-pass, frequency-bounded signals [64]. On the other hand, the structure of an IIR filter contains feedback connections, and consequently the memory depth is not limited by the filter order. IIR filters, however, suffer from stability problems. The memory depth of a kth order gamma filter, for both the discrete and continuous time domains, is obtained as

$$ D = \frac{k}{\mu} \tag{62} $$

Contrary to an FIR filter, a gamma filter has a memory depth that is uncoupled from the filter order. The gamma filter shares the memory depth property of IIR filters owing to the locally introduced feedback between taps. The stability problem is also relaxed, as long as the local feedback parameter is limited to the range $0 < \mu < 2$ in the discrete case. The memory resolution of a kth order discrete-time gamma filter is equivalent to the number of taps divided by the memory depth:

$$ R = \frac{k}{D} = \mu \tag{63} $$

(63) can be written as

$$ k = D \times R \tag{64} $$

For a given order k, a trade-off between memory depth and memory resolution is possible, so that a very deep memory structure can be obtained at the expense of a low memory resolution. In the FIR case, the memory depth is the filter order, which is the special case of $\mu = 1$ in the gamma filter.

3.4.3 2-D Extension of 1-D Gamma Kernels

Theoretically, the gamma filter can be extended to N dimensions without problems. This is done by simply substituting t in (55) with an N-dimensional basis vector. In this extension, circularly symmetric gamma kernels are obtained, but more general cases can be considered.
Specifically for our applications to 2-D (image) data, the gamma kernels can be extended as in (65), where the constant $C$ is a normalization factor. The resulting 2-D kernel has the circularly symmetric shape given by (66), where $\Omega$, a set of pixel coordinates $(k, l)$, is the region of support of the kernel, $n$ the kernel order, and $\mu$ the parameter that controls the shape and scale of the kernel. Since the 2-D circularly symmetric gamma kernels are created from the corresponding 1-D gamma kernels in the spatial domain, they preserve the spatial characteristics of the 1-D gamma kernels. That is, the concept of a time warping parameter carries over to the spatial domain as a scale parameter that controls the region of support of the 2-D gamma kernels.

The memory depth of the 2-D kernel can be easily obtained. By letting $k = r\cos\theta$, $l = r\sin\theta$ and considering $dk\,dl = r\,dr\,d\theta$, (66) is expressed in the circular coordinate system as (67). The memory depth in the 2-D spatial domain is then expressed as the double integral (68) over the radius $r$ and the angle $\theta$, and (68) is equivalent to the 1-D case. This means that the spatial characteristics of the 1-D gamma kernels are preserved in the 2-D domain by the circularly symmetric rotation.

Figure 17 shows the characteristics of 2-D gamma kernels in the spatial domain. The 1st order ($n = 1$) kernel has its peak at the pivot point $(0, 0)$ with an exponentially decaying amplitude. All the other kernels have a peak at the radius $n/\mu$, creating smooth concentric rings around the pivot point (Figure 17). For a fixed order ($n = 15$) the radial distance at which the kernel peaks still depends on the parameter $\mu$, as in the 1-D case.

3.4.4 Gamma CFAR (γCFAR) Detector

By analogy to the CFAR stencil, any combination of the first kernel with one of the higher order kernels produces a similar stencil, although the shapes obtained with the 2-D gamma kernels are smoother. We call this stencil the γCFAR. It is interesting to contrast the γCFAR detector with the CFAR detector.
Target masking in the CFAR stencil is implemented by the first order 2-D gamma kernel, so that the target intensity mean is estimated closely around the center pixel under test, while a higher order kernel implements the clutter mask, so that the clutter statistics are measured in a round ring. With the γCFAR we have a better handle on the shape of the kernel thanks to its analytic formulation. Figure 18 shows the γCFAR stencil for the CFAR test.

Figure 17 2-D gamma kernels (panels: $n = 1$, $\mu = 0.357$; $n = 15$, $\mu = 0.776$).

Since the parameter $\mu$ in 2-D has exactly the same function as its 1-D counterpart, i.e., it shrinks or stretches the region of support of the impulse response, we can adaptively select $\mu$ to better perform the CFAR test. In fact, after fixing the order of the kernel, we have a single parameter that controls its spatial extent, and we can derive equations that change $\mu$ so as to minimize an output error.

Figure 19 shows the block diagram of a one-parameter γCFAR detector. Two gamma kernels are linearly combined to form the output $y$ of the detector. We call this the one-parameter γCFAR detector, the counterpart of the one-parameter CFAR detector in (49). The output can be written as

$$y = w_1 \left( g_{m,\mu_m} \bullet X \right) + w_2 \left( g_{n,\mu_n} \bullet X \right) \qquad (69)$$

where $\bullet$ represents a convolution operator, $w_1$ and $w_2$ are weights, the kernel order $m = 1$, and $\mu_m$ and $\mu_n$ are the parameters that control the extent of the kernels. It is expected that $w_2$ needs to be negative so that the output will be high over areas of largest contrast, as is the case in the one-parameter CFAR detector. In addition to the degree of freedom offered by the control of $\mu$ over the kernel extent, the smooth shape of the gamma kernels yields smaller sidelobes than those associated with the CFAR stencil.
From a signal processing perspective, the one-parameter γCFAR detector performs bandpass filtering, as does the one-parameter CFAR detector. By adapting the region of support of the kernels, however, it can choose the frequency bands of the bright pixel intensities in the image, and thanks to the smoothness of the gamma kernels the energy of the pixel intensities is better preserved in the selected frequency band. Interpreted from the projection point of view, the input signal (an image in the 2-D case) is still being projected onto a local basis obtained by convolving the input with the first kernel ($m = 1$) and a higher order kernel ($n > 1$).

Figure 18 The γCFAR detector: a) the center kernel has an order of 1 and the surrounding kernel an order of 15; b) the surrounding kernel defines a local area where the local statistics of mean and standard deviation are measured. The peaky kernel averages the pixel under test with its closest neighbors.

Due to the exponential amplitude decay from the pivot point, the spatial extent of the kernel can be truncated to a small value. The projection onto gamma kernels of order $m$ and $n$ is not complete, meaning that the input image cannot be recovered from the projection. But as the CFAR experience shows, it still preserves the features that are important for distinguishing man-made objects from clutter. The decision rule of the γCFAR detector is defined as

$$\frac{g_{m,\mu_m} \bullet X - g_{n,\mu_n} \bullet X}{\hat{\sigma}} \;\underset{\text{clutter}}{\overset{\text{target}}{\gtrless}}\; T_{\gamma CFAR} \qquad (70)$$

where $\bullet$ represents the inner product operator, $T_{\gamma CFAR}$ is a threshold for deciding between target and clutter, and

$$\hat{\sigma} = \sqrt{g_{n,\mu_n} \bullet X^2 - \left( g_{n,\mu_n} \bullet X \right)^2} \qquad (71)$$

As shown in (70) and (71), the local mean and standard deviation are computed with a higher order kernel ($n > 1$), and the pixel under test is averaged with its adjacent pixels, which are given less emphasis the farther they lie from the pivot point.
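The decision statistic described above can be sketched numerically as follows. This is only an illustration under stated assumptions: the radial profile $r^{n-1} e^{-\mu r}$ normalized to unit sum is an assumed kernel form (the exact normalization is not recoverable from this section), the kernel parameters are borrowed from the panel titles of Figure 17, and the Rayleigh clutter chip with a bright 5x5 blob is synthetic data, not data from this work.

```python
import numpy as np

def gamma_kernel_2d(order, mu, half_width):
    """Circularly symmetric kernel with ASSUMED radial profile r^(order-1) * exp(-mu * r)."""
    coords = np.arange(-half_width, half_width + 1)
    kk, ll = np.meshgrid(coords, coords, indexing="ij")
    r = np.hypot(kk, ll)
    kernel = r ** (order - 1) * np.exp(-mu * r)
    return kernel / kernel.sum()  # unit sum, so inner products act as weighted means

def gamma_cfar_statistic(chip, g_center, g_ring):
    """(g_m . X - g_n . X) / sigma, with sigma taken from the ring kernel moments."""
    target_mean = np.sum(g_center * chip)        # g_m . X
    clutter_mean = np.sum(g_ring * chip)         # g_n . X
    clutter_power = np.sum(g_ring * chip ** 2)   # g_n . X^2
    clutter_std = np.sqrt(max(clutter_power - clutter_mean ** 2, 1e-12))
    return (target_mean - clutter_mean) / clutter_std

rng = np.random.default_rng(0)
half_width = 24
g_center = gamma_kernel_2d(1, 0.357, half_width)
g_ring = gamma_kernel_2d(15, 0.776, half_width)

clutter_chip = rng.rayleigh(1.0, size=(2 * half_width + 1, 2 * half_width + 1))
target_chip = clutter_chip.copy()
target_chip[22:27, 22:27] += 10.0  # synthetic bright blob centered on the pivot point

stat_clutter = gamma_cfar_statistic(clutter_chip, g_center, g_ring)
stat_target = gamma_cfar_statistic(target_chip, g_center, g_ring)
```

Comparing `stat_target` and `stat_clutter` against a threshold then plays the role of the target/clutter decision; the threshold value itself would have to be chosen from data.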
The γCFAR decision can also be recast as a discriminant function

$$\left( g_{m,\mu_m} \bullet X - g_{n,\mu_n} \bullet X \right)^2 + T_{\gamma CFAR}^2 \left( g_{n,\mu_n} \bullet X \right)^2 - T_{\gamma CFAR}^2 \left( g_{n,\mu_n} \bullet X^2 \right) \;\underset{\text{clutter}}{\overset{\text{target}}{\gtrless}}\; 0 \qquad (72)$$

It is instructive to analyze experimentally how the accuracy of the CFAR test depends on the size of the stencil, now that we have a parametric form for the stencil. The local image features computed by the convolution can be specified as projections of intensity and power onto the gamma kernels:

$$g_{n,\mu} \bullet X = \sum_{(i,j) \in \Omega} g_{n,\mu}(i,j)\, X(i,j) \qquad (73)$$

$$g_{n,\mu} \bullet X^2 = \sum_{(i,j) \in \Omega} g_{n,\mu}(i,j)\, X^2(i,j) \qquad (74)$$

Note that these operations can be interpreted as FIR filtering with the gamma stencil $g_{n,\mu}(i,j)$. The output values can be viewed as estimates of the local 1st and 2nd moments respectively, which are sufficient to compute the local variance. The local standard deviation in (53) is needed to compute the two-parameter CFAR statistic, and has been shown to be an important measure in deciding whether a target is present or not [57]. The γCFAR detector separates targets from clutter based on the values of 4 features, the projections of the image and of the squared image onto the kernels $g_m$ and $g_n$, which are needed to compute the local mean and standard deviation. These terms enable a one-to-one comparison of the γCFAR detector with the two-parameter CFAR detector, and are also used to study the benefits of adaptive feature extraction and adaptive weights in Chapter 4.

In conclusion, the potential advantage of the γCFAR detector is that the extent of the localized mean and of the surrounding area can be set adaptively. This could allow fine tuning of the area where the local statistics are measured. The γCFAR detector, based on 2-D gamma filters, is therefore a promising extension of the conventional CFAR detector.

3.4.5 Implementation of the γCFAR Detector

Computing the feature set at each pixel of the image amounts to correlating each of the two kernels $g_m$ and $g_n$ with both the original image and the image squared. Each kernel assumes the role of an FIR filter with rectangular support (size $(N + 1) \times (N + 1)$).
The three base features ($g_m \bullet X$, $g_n \bullet X$ and $g_n \bullet X^2$) at a point $(i, j)$ in the image are then obtained using translated gamma kernels as

$$\left( g_{n,\mu} \bullet X^p \right)(i,j) = \sum_{(k,l) \in \Omega} g_{n,\mu}(k,l)\, X^p(i + k, j + l), \quad p = 1 \text{ or } 2 \qquad (75)$$

where $n$ stands for the kernel order and $p$ indicates either the first or the second moment. Correlations are computed in the frequency domain using FFTs for better computational efficiency. To process large images and to avoid memory problems, we divide the image into overlapping radix-2 windows, which are processed individually and combined using an overlap and save method [59].

3.5 Receiver Operating Characteristic (ROC)

A plot of detection probability as a function of false alarm probability is referred to as a receiver operating characteristic (ROC) curve. The ROC is a very important assessment tool for detection systems and depends on the probability density function of the observed signal under each hypothesis $H_j$, that is, $p_{y|H_j}(y|H_j)$, $j = 0, 1$. Figure 20 depicts the probability density functions under the two hypotheses together with the resulting ROC curve. In Figure 20a, varying the threshold controls the areas representing $P_d$ and $P_f$. The corresponding ROC curve is shown in Figure 20b.

Figure 20 ROC: a) probability density functions, b) ROC curves.

CHAPTER 4 QUADRATIC GAMMA DETECTOR

4.1 Introduction

The goal of a prescreener in the front-end detection stage is to locate potential targets and to eliminate a large amount of clutter in the sensor imagery. The outputs of the prescreener are regions of interest, which need to be considered further in the following stage, the false alarm reduction stage. The regions of interest are indicated by their coordinates $(x, y)$ in the image, and the further processing is applied to the locations reported by the prescreener. Section 4.2 briefly reviews discriminant functions. Section 4.3 develops a novel detector, the QGD [63], which extends the two-parameter CFAR and γCFAR detectors, and discusses its discrimination power.
In Section 4.5 a multilayer perceptron (MLP) structure is reviewed, and its training and generalization ability are discussed. Finally, in Section 4.6 the QGD is extended into a nonlinear structure based on MLPs, which is called the NL-QGD (nonlinear extension of the QGD).

4.2 Discriminant Functions

4.2.1 Linear Discriminant Functions

In general, a discriminant function is an operator that, when applied to a pattern, yields a decision concerning the class membership of the pattern. The action of a discriminant function is to produce a mapping from pattern space to attribute space. A general linear discriminant function in n-dimensional pattern space is of the form

$$g(X) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + w_{n+1} = W^T X + w_{n+1} \qquad (76)$$

where $W$ is called the weight vector and $w_{n+1}$ the threshold weight. The decision rule of a two-category classifier is implemented by the following property:

$$g(X) = W^T X + w_{n+1} \begin{cases} > 0 & \text{if } X \in \omega_1 \\ < 0 & \text{if } X \in \omega_2 \end{cases} \qquad (77)$$

The input pattern vector $X$ is assigned to the class $\omega_1$ if $g(X) > 0$ and to $\omega_2$ if $g(X) < 0$. For the multicategory case, linear discriminant functions implement the decision rule: assign $X$ to $\omega_i$ if $g_i(X) > g_j(X)$ for all $j \neq i$, where $g_i(X) = W_i^T X + w_{i,n+1}$. In case of ties, the decision is left undefined.

4.2.2 Generalized Linear Discriminant Functions

The linear discriminant function discussed in the previous section can be extended to the generalized linear discriminant function of the form

$$g(X) = w_1 f_1(X) + w_2 f_2(X) + \dots + w_n f_n(X) + w_{n+1} = W^T F \qquad (78)$$

where $F = [f_1(X)\ f_2(X)\ \dots\ f_n(X)\ 1]^T$ is a real valued vector whose elements are functions of the pattern $X$, and $W$ is an $(n+1)$-dimensional weight vector. Any desired discriminant function can be approximated by a series expansion $\{f_i(X)\}$ by selecting these functions judiciously and letting $n$ be sufficiently large. The discriminant function in (78) is not linear in the original input pattern space, but it is linear in the transformed pattern ($F$) space.
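A toy illustration of (78), not taken from this work: the XOR patterns cannot be separated by a linear discriminant in the original 2-D space, but become linearly separable once a product term is added to the expansion. The choice $F = [x_1, x_2, x_1 x_2, 1]^T$ and the hand-picked weights below are assumptions made only for this sketch.

```python
import numpy as np

def expand(x):
    """Map a 2-D pattern to F = [x1, x2, x1*x2, 1] (illustrative choice of the f_i)."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x2, 1.0])

# Hand-picked weight vector for this toy problem (last entry is the threshold weight)
weights = np.array([1.0, 1.0, -2.0, -0.5])

def g(x):
    """Generalized linear discriminant g(X) = W^T F."""
    return float(weights @ expand(x))

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
decisions = [g(x) > 0 for x in patterns]  # True where exactly one input is 1
```

The decision boundary $g(X) = 0$ is a hyperplane in the 4-dimensional $F$ space but a curved surface in the original $(x_1, x_2)$ plane.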
The pattern $F$ is therefore separated by a hyperplane in the transformed space, whereas the pattern $X$ is separated by a hypersurface in the original pattern space. The mapping from $X$ to $F$ thus reduces the problem to one of finding a linear discriminant function.

4.3 Extension to the γCFAR Detector

Quadratic discriminant functions implement the optimal classifier for Gaussian distributed classes [8] [27]. A quadratic discriminant function in $d$-dimensional space is

$$g(X) = \sum_{j=1}^{d} w_{jj} x_j^2 + \sum_{j=1}^{d-1} \sum_{k=j+1}^{d} w_{jk} x_j x_k + \sum_{j=1}^{d} w_j x_j + w_{d+1} \qquad (79)$$

where $w$ is a set of adjustable parameters. Probably the most common way to construct a quadratic classifier is to build a quadratic processor that creates all the terms of $g(X)$, followed by a linear machine that simply weighs each of these terms. Figure 21 depicts the implementation of the quadratic gamma detector (QGD), which follows this methodology. The QGD extends the γCFAR detector to a generalized form by exploiting all quadratic and linear terms of the two intensity features $g_m \bullet X$ and $g_n \bullet X$ in (70). Notice that in addition to the 4 terms of the quadratic form, we included 3 more terms, $g_m \bullet X^2$, $g_m \bullet X$ and $g_n \bullet X$, for a direct analogy with the CFAR detector. In fact, our two input features are the intensity at the pixel under test, $g_m \bullet X$, and the intensity in the ring neighborhood, $g_n \bullet X$. From these quantities the traditional quadratic detector creates $(g_m \bullet X)(g_n \bullet X)$, $g_m \bullet X$, $g_n \bullet X$, $(g_m \bullet X)^2$ and $(g_n \bullet X)^2$. For a direct comparison with the CFAR we add $g_m \bullet X^2$ and $g_n \bullet X^2$, for a total of 8 features (including the constant bias input).

Figure 21 Quadratic Gamma Detector (QGD).

The feature vector $F$ of (80) is formed by stacking these features, and the quadratic detector becomes
$$y = W^T F \;\underset{\text{clutter}}{\overset{\text{target}}{\gtrless}}\; T \qquad (81)$$

where

$$W = [w_1\ w_2\ w_3\ w_4\ w_5\ w_6\ w_7\ w_8]^T \qquad (82)$$

With this formulation, we can better understand what was said previously about the restricted nature of the two parameter CFAR detector when seen from a pattern recognition standpoint. From pattern recognition theory we know that any quadratic or higher order polynomial decision function can be implemented with a linear decision function if the feature vector is appropriately expanded [81]. The added features therefore help increase the separability between target and clutter in pattern space. The weight vector $W$ is obtained during a training procedure, which is discussed in Section 4.3.1. The discrimination between the two input classes (target and clutter) is done using a single threshold. The discriminant function is a quadratic function of the image intensity features extracted by the gamma kernels, which leads to naming the detector the QGD (quadratic gamma detector).

The parameter vector for the γCFAR (or equivalently for the two parameter CFAR), given in (83), is a special case of $W$ in which some of the parameters are set to zero and the others are fixed. Expanding the squared form of the γCFAR decision rule shows which ones: the weights on $(g_m \bullet X)^2$, $(g_m \bullet X)(g_n \bullet X)$, $(g_n \bullet X)^2$ and $g_n \bullet X^2$ are $1$, $-2$, $1 + T^2$ and $-T^2$ respectively, and all remaining weights are zero. Since an increase in the number of free parameters of a system brings more flexibility, we can improve the performance of the γCFAR detector by creating more parameters and adapting them with representative data.

4.3.1 Training the QGD

In an adaptive pattern recognition framework, the free parameters of the classifier, which define the positioning of the discriminant function for maximum performance, are learned from a set of representative data. This is called the training of the classifier. Given a set of training image chips $\{X_1, X_2, \dots, X_N\}$ centered around points of a known class, we compute the corresponding feature vectors $F$.
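The feature expansion and the special case of the γCFAR weights can be checked numerically with the sketch below. The ordering of the eight features is an arbitrary choice made for this illustration (the ordering used in (80) is not assumed), the kernels are stand-ins with an assumed radial form, and the chip is synthetic Rayleigh noise.

```python
import numpy as np

def radial_kernel(order, mu, half_width):
    """Stand-in circularly symmetric kernel (ASSUMED radial form), normalized to unit sum."""
    c = np.arange(-half_width, half_width + 1)
    r = np.hypot(*np.meshgrid(c, c, indexing="ij"))
    k = r ** (order - 1) * np.exp(-mu * r)
    return k / k.sum()

def qgd_features(chip, g_m, g_n):
    """Eight QGD features; the ordering is an illustrative assumption."""
    a, b = np.sum(g_m * chip), np.sum(g_n * chip)               # g_m . X, g_n . X
    a2, b2 = np.sum(g_m * chip ** 2), np.sum(g_n * chip ** 2)   # g_m . X^2, g_n . X^2
    return np.array([a, b, a2, b2, a * a, a * b, b * b, 1.0])

def gamma_cfar_weights(threshold):
    """Weights that reduce W^T F to (a - b)^2 - T^2 (b2 - b^2) for the ordering above."""
    t2 = threshold ** 2
    return np.array([0.0, 0.0, 0.0, -t2, 1.0, -2.0, 1.0 + t2, 0.0])

rng = np.random.default_rng(1)
chip = rng.rayleigh(1.0, size=(33, 33))
g_m, g_n = radial_kernel(1, 0.357, 16), radial_kernel(15, 0.776, 16)

features = qgd_features(chip, g_m, g_n)
y = gamma_cfar_weights(2.0) @ features
```

With these fixed weights, `y > 0` reproduces the squared γCFAR test for a threshold of 2; training the QGD amounts to letting all eight weights vary instead.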
The corresponding desired values of the chips are 1's for the target class and 0's for clutter, which form the desired vector $d = [d_1, d_2, \dots, d_N]^T$.

4.3.1.1 Closed Form Pseudo-Inverse Solutions

If the mean square error between the system output and the desired response is the cost function, there is an analytical solution to the problem [30]. The method solves, in the least squares sense, an overdetermined (assuming $N > 8$) system of linear equations for the unknown coefficient vector $W$ by using the Moore-Penrose pseudo-inverse:

$$\min_W \| d - F W \|^2 \qquad (84)$$

yielding

$$W = (F^T F)^{-1} F^T d \qquad (85)$$

When there are more weights than equations, the system is under-determined and an optimal weight vector can be calculated by

$$W = F^T (F F^T)^{-1} d \qquad (86)$$

In either case, when the matrix to be inverted has zero eigenvalues, singular value decomposition techniques can be used to select the optimal weight vector with the smallest Euclidean norm [32].

Since the feature vector $F$ is a function of the parameter $\mu$, the problem we are facing is in fact a parametric least squares problem, which does not have a closed form solution because of the nonlinear dependence on $\mu$. There are two possibilities for solving this problem. Either we first determine the best values of $\mu_m$ and $\mu_n$ through an exhaustive search in the parameter space, or we use an iterative approach to find the weight vector, $\mu_m$ and $\mu_n$ together.

One remark is necessary at this point. Training an adaptive system needs to follow certain steps for good results [1]. In particular, the training samples should be representative of the conditions found during testing, and they should outnumber the free parameters in the network by at least 10:1. For the QGD the minimum number of training exemplars is easily met. The training and testing procedures are depicted in Figure 22. The methods of finding a weight vector are further discussed in the following two subsections.

4.3.1.2 Iterative Solution Based on Gradient Descent

An optimal set of $\mu_m$ and $\mu_n$ can be determined through a parameter space search that provides the minimum false alarm rate for 100% detection in the training set. However, the search over $\mu$ is two-dimensional, so finding $\mu$ in this way becomes computationally expensive. Computing both the weight vector $W$ and the values of $\mu_m$ and $\mu_n$ can instead be accomplished with a gradient descent method (LMS algorithm) borrowed from adaptive signal processing theory. It is known that the LMS method converges to the Moore-Penrose solution of the overdetermined least squares problem [35]. Using the method of gradient descent, the weights and the parameters are adjusted iteratively along the error surface with the aim of moving them progressively toward the optimum solution. Figure 23 depicts an adaptation scheme for the QGD parameters and weights.

Figure 22 Training and testing of the Quadratic Gamma Detector (QGD): a) training phase of QGD, b) testing phase of QGD.

Figure 23 Adaptation scheme for parameters and weights of the QGD.

In the batch mode of adaptation, weight updating is performed after the presentation of all the training examples (this constitutes an epoch). For a given set of $N$ training examples, a cost function can be defined as

$$E = \frac{1}{N} \sum_{k=1}^{N} E(k) = \frac{1}{Np} \sum_{k=1}^{N} |d(k) - y(k)|^p \qquad (87)$$

where $E(k) = \frac{1}{p} |d(k) - y(k)|^p$ is the $p$-normed absolute value of an instantaneous error.
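For fixed kernel parameters, the closed-form solution (85) and batch gradient descent on the mean square error (the $p = 2$ case of (87)) can be compared with the following sketch. The feature matrix here is random synthetic data standing in for real QGD features, and the step size and number of epochs are assumptions chosen so that the iteration converges on this toy problem.

```python
import numpy as np

def fit_pinv(features, desired):
    """Least-squares weights via the Moore-Penrose pseudo-inverse."""
    return np.linalg.pinv(features) @ desired

def fit_gradient_descent(features, desired, step=0.1, epochs=2000):
    """Batch gradient descent on the mean square error (p = 2)."""
    weights = np.zeros(features.shape[1])
    for _ in range(epochs):
        error = desired - features @ weights
        # one update per epoch, averaged over all training examples
        weights += step * features.T @ error / len(desired)
    return weights

rng = np.random.default_rng(0)
num_chips, num_features = 200, 8
features = rng.normal(size=(num_chips, num_features))  # synthetic stand-in for F (one row per chip)
features[:, -1] = 1.0                                   # constant bias feature
desired = (rng.random(num_chips) < 0.5).astype(float)  # 1 = target, 0 = clutter

w_pinv = fit_pinv(features, desired)
w_gd = fit_gradient_descent(features, desired)
w_normal = np.linalg.solve(features.T @ features, features.T @ desired)  # (F^T F)^{-1} F^T d
```

On this well-conditioned toy problem the three weight vectors agree to numerical precision; with real features the step size would need to be tuned, and the kernel parameters would still have to be found by search or by the adaptation described in this subsection.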
The adjustments applied to the weights ($W$) and to the parameter ($\mu$) are derived from the batch cost $E$ of (87) as

$$\Delta W = -\eta \frac{\partial E}{\partial W} = \frac{\eta}{N} \sum_{k=1}^{N} \mathrm{sign}(d(k) - y(k))\, |d(k) - y(k)|^{p-1}\, \frac{\partial y(k)}{\partial W} = \frac{\eta}{N} \sum_{k=1}^{N} \mathrm{sign}(d(k) - y(k))\, |d(k) - y(k)|^{p-1}\, F(k) \qquad (88)$$

and

$$\Delta \mu = -\alpha \frac{\partial E}{\partial \mu} = \frac{\alpha}{N} \sum_{k=1}^{N} \mathrm{sign}(d(k) - y(k))\, |d(k) - y(k)|^{p-1}\, \frac{\partial y(k)}{\partial \mu} \qquad (89)$$

where

$$\frac{\partial y(k)}{\partial \mu} = W^T \frac{\partial F(k)}{\partial \mu} \qquad (90)$$

and

$$\mathrm{sign}(\psi) = \begin{cases} 1 & \text{if } \psi > 0 \\ -1 & \text{if } \psi < 0 \end{cases} \qquad (91)$$

The derivatives of the feature vector with respect to $\mu_m$ and $\mu_n$ are computed as in (92): each entry is obtained by replacing the kernel in the corresponding feature by its derivative with respect to the parameter (applying the chain rule to the squared and product terms), and entries that do not depend on the given parameter are zero.

4.3.2 1-D Implementation

The feature vector $F$ is constructed from the convolution of gamma kernels with the training subimages. The adaptation of $\mu$ requires the extraction of a new feature vector $F$ based on a different kernel shape; that is, we would have to go back to the image plane and compute a new feature vector $F$ after every epoch. The 2-D convolution takes $O(N^2)$ multiplications, so the adaptation of $\mu$ would be extremely expensive computationally at each training epoch, even for small training subimages. Fortunately, the 2-D gamma kernels are circularly symmetric, so the result of the convolution can be projected radially to 1-D without loss of information (see Figure 24) as follows: the pixel intensities in concentric annuli around the pivot point are converted to a 1-D signal, and this signal is then convolved with 1-D gamma kernels.

Figure 24 Image converted into 1-D radial energy.

Therefore, we convert the training subimages to 1-D radial profiles, generate 1-D gamma kernels, and use the recursive implementation of the gamma kernel to perform the convolution. This scheme removes much of the computational burden of the 2-D implementation. The 2-D gamma kernels are reduced to the form of 1-D gamma kernels, as given in (95). The adaptation of the weights $W$ and of $\mu$ can then be performed using the LMS (least mean square) algorithm in 1-D.
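The radial projection can be sketched as follows. The binning into integer-radius annuli and the use of sums (rather than means) per annulus are assumptions made for this illustration, and `radial_sums` is a hypothetical helper name. The check at the end shows that, for a kernel that is constant on each annulus, the 2-D inner product equals a 1-D inner product with the radial sums.

```python
import numpy as np

def radial_sums(chip):
    """Sum pixel intensities over integer-radius annuli around the central pixel."""
    rows, cols = np.indices(chip.shape)
    center_r, center_c = chip.shape[0] // 2, chip.shape[1] // 2
    radius = np.rint(np.hypot(rows - center_r, cols - center_c)).astype(int)
    sums = np.bincount(radius.ravel(), weights=chip.ravel())
    return sums, radius

rng = np.random.default_rng(0)
chip = rng.rayleigh(1.0, size=(33, 33))  # synthetic training subimage
sums, radius = radial_sums(chip)

# Any 1-D radial kernel sampled at integer radii (illustrative shape only)
r = np.arange(sums.size)
profile = r ** 2 * np.exp(-0.5 * r)

feature_1d = np.sum(profile * sums)           # 1-D inner product with the radial signal
feature_2d = np.sum(profile[radius] * chip)   # same kernel painted onto the 2-D grid
```

Because the 1-D signal is computed once per subimage, each new value of the kernel parameter only requires a 1-D inner product instead of a new pass over the image plane.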
4.4 Comparison of the QGD with the CFAR and the γCFAR Detectors

To draw a parallel between the QGD decision function in (81) and the two parameter CFAR detector in (51), we first observe that the two parameter CFAR detector uses estimates of the local mean and variance around the test cell, and that the standard deviation can be computed from the first and second moments by (52) and (53). (54) can be interpreted as a linear decision function with a fixed set of weights for a given $T_{CFAR}$. The measurements that appear in this equation correspond closely to some of those used in the QGD in (81). Specifically, $g_m \bullet X$ has the same role as the intensity of the test cell, and $g_n \bullet X$ corresponds to the local mean. Similarly, for the second moment there is a correspondence between $g_n \bullet X^2$ and the local mean of the squared intensities.

With this formulation of the QGD we have preserved the similarity to the two parameter CFAR detector (the types of features used) but have generalized it with respect to: (1) the shape of the kernels used for the estimation of the mean and variance, and (2) the selection of the weights of the decision function, which are not chosen a priori but are found through optimization. Note also that the QGD has two additional linear terms and a bias term. The two parameter CFAR detector is therefore a special case of the QGD. In this formulation, however, we can no longer guarantee the constant false alarm rate property that the two parameter CFAR achieves for Gaussian clutter statistics.

4.5 Artificial Neural Networks (ANNs)

4.5.1 Introduction

Human beings perform many complex tasks, such as speech and image recognition, with relative ease, although these tasks are very difficult to solve using traditional algorithmic computing techniques. Much effort has been devoted to the development of machines with human-like information processing capabilities.
A neural network model, a simplified model of the brain, is motivated by the neuroanatomy of living animals, with cells corresponding to neurons, activations to neuronal firing rates, connections to synapses, and connection weights to synaptic strengths. The neural network model is also called a connectionist network and consists of a set of computational units and connection weights between the units, with the processing tasks distributed across many units. Most current neural network models are far from realistic biological neural models, but they serve as good models of the essential information processing that organisms perform. In this chapter, one of the most popular neural models, the multi-layer perceptron (MLP), is discussed along with its learning algorithms and its validation and generalization problems. The MLP is used here to extend the QGD into a non-linear structure, which is discussed in Section 4.6.

4.5.2 Multi-layer Perceptrons (MLPs)

The multi-layer perceptron (MLP) has played a central role in neural network modeling and is probably the most widely used ANN. An MLP is a feedforward network in which the network input is propagated forward through several processing layers before the network output is obtained. Each layer is composed of a number of nodes, and each node forms a weighted sum of the inputs from the nodes in the previous layer and transforms the sum through a bounded, continuously increasing nonlinearity. A multi-layer feed-forward network is shown in Figure 25. Learning is accomplished by minimizing the error between outputs and target values [68], or by maximizing an entropy measure on the outputs.

Network learning based on gradient descent requires knowledge of the derivatives of the activation functions associated with the neurons, so the activation functions need to be differentiable.
The sigmoidal and hyperbolic tangent functions are commonly used as differentiable activation functions in MLPs. The network output is thus a continuously differentiable function of every weight in the network, which enables it to be trained using gradient descent rules. The availability of such learning algorithms popularized the MLP. The model structure does not depend on the learning rule. It is known that in classification tasks a three-layer MLP with threshold activation functions can represent an arbitrary decision boundary to arbitrary accuracy [38]. For functional approximation, a three-layer MLP with sufficient nodes in the hidden layer can approximate any continuous nonlinear function to an arbitrary degree [37]. An MLP can therefore be used as a general framework for representing non-linear functional mappings between the input space and the output space.

4.5.3 Training MLPs

A neural network has the ability to learn from its environment. It adjusts its free parameters iteratively during its learning period so as to minimize or maximize a criterion function. There are two general learning paradigms, supervised and unsupervised learning. In supervised learning the output of the network is compared with a desired response, and the error between the outputs and the desired responses is used to correct the network parameters. In unsupervised learning no desired responses are provided to the network; the network discovers for itself interesting categories or features in the input data. In the mid-eighties, a gradient descent learning algorithm called Back Propagation (BP), which enables MLPs to learn arbitrary functional mappings, stimulated considerable interest in learning systems. BP is a supervised learning mechanism for feedforward networks based on a cost function and gradient descent, and it is the most widely used training algorithm.
This section discusses BP algorithms (on-line learning and batch learning) for a network with a feed-forward topology and differentiable non-linear activation functions, for the case of a differentiable cost function.

4.5.3.1 On-line Learning

In the on-line learning process, a network uses only the information provided by a single training example $\{x(n), d(n)\}$ when the network parameters are adapted. Consider an epoch of $N$ training examples in the following order: $\{x(1), d(1)\}, \dots, \{x(N), d(N)\}$. For a training pattern, each node computes a weighted sum of its inputs of the form

$$v_j(n) = \sum_i w_{ji}(n)\, y_i(n) \qquad (96)$$

where $y_i(n)$ is the activation of a node or input that is connected to node $j$, and $w_{ji}(n)$ is the weight associated with that connection. The sum in (96) is transformed by a non-linear activation function $\varphi(\cdot)$ to give the activation $y_j(n)$ of node $j$ in the form

$$y_j(n) = \varphi(v_j(n)) \qquad (97)$$

Since the activation outputs are computed successively layer by layer, this process is often called forward propagation. Now, the instantaneous error signal at the output of node $k$ at iteration $n$ is defined by

$$e_k(n) = d_k(n) - y_k(n) \qquad (98)$$

On-line back propagation learning generally attempts to minimize the sum of squared instantaneous errors (SSE) at the nodes in the output layer. The SSE is defined by

$$E(n) = \frac{1}{2} \sum_{k=1}^{K} e_k^2(n) \qquad (99)$$

where $K$ is the number of nodes in the output layer. $E(n)$ represents the cost function as a measure of learning performance on the training set. The objective of the learning process is to adjust the free parameters of the network so as to minimize the cost function. The adjustments to the weights are made in accordance with the respective errors computed for each training example presented to the network. Gradient descent learning algorithms adapt the weights in the direction of the negative gradient of the cost function. The correction $\Delta w_{ji}(n)$ applied to $w_{ji}(n)$ is defined by the delta rule

$$\Delta w_{ji}(n) = -\eta \frac{\partial E(n)}{\partial w_{ji}(n)} \qquad (100)$$

where $\eta$ is the learning rate of the BP algorithm.
The learning rate controls the speed of network training and affects the network stability. For a node $j$ in the output layer, the partial derivative $\partial E(n) / \partial w_{ji}(n)$ can be factorized by the chain rule as follows:

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial v_j(n)} \frac{\partial v_j(n)}{\partial w_{ji}(n)} = -\delta_j(n) \frac{\partial v_j(n)}{\partial w_{ji}(n)} \qquad (101)$$

where $\delta_j(n) = -\partial E(n) / \partial v_j(n)$ is the local gradient at a node in the output layer, i.e., the sensitivity of the cost function to the net input of node $j$ before the activation function. Differentiating $v_j(n)$ with respect to $w_{ji}(n)$ yields

$$\frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n) \qquad (102)$$

The local gradient $\delta_j(n)$ at a node $j$ in the output layer is obtained by

$$\delta_j(n) = e_j(n)\, \varphi'(v_j(n)) \qquad (103)$$

Accordingly, with (102) and (103) the correction $\Delta w_{ji}(n)$ at a node $j$ in the output layer can be written as

$$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n) \qquad (104)$$

For a node in a hidden layer, the local gradient can be computed recursively in terms of the local gradients of all nodes in the next layer in the forward direction:

$$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)} = -\sum_k \frac{\partial E(n)}{\partial v_k(n)} \frac{\partial v_k(n)}{\partial v_j(n)} = \sum_k \delta_k(n) \frac{\partial v_k(n)}{\partial v_j(n)} \qquad (105)$$

at a node $j$ in a hidden layer, where the sum runs over all units $k$ to which the node $j$ sends connections. The net activation level at node $k$ is

$$v_k(n) = \sum_j w_{kj}(n)\, y_j(n) \qquad (106)$$

The partial derivative in (105) can be rewritten as

$$\frac{\partial v_k(n)}{\partial v_j(n)} = \frac{\partial v_k(n)}{\partial y_j(n)} \frac{\partial y_j(n)}{\partial v_j(n)} = \varphi'(v_j(n))\, w_{kj}(n) \qquad (107)$$

Thus, using (105) and (107), we obtain the following back-propagation formula at a node $j$ in a hidden layer:

$$\delta_j(n) = \varphi'(v_j(n)) \sum_k \delta_k(n)\, w_{kj}(n) \qquad (108)$$

The local gradients at the nodes of a hidden layer can therefore be computed by propagating the local gradients backwards from the nodes in the following layer.

4.5.3.2 Learning Rate and Momentum

Network training based on the gradient descent method requires the choice of a suitable value for the learning rate $\eta$. The effectiveness and convergence of the BP learning algorithm depend significantly on this value.
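The local gradients (103) and (108) and the delta rule (104) can be sketched for a small network as follows. The network below (one hidden layer, hyperbolic tangent activations at both layers, no bias weights, random initial weights) is a simplified toy set-up assumed only for this illustration; the helper names are hypothetical.

```python
import numpy as np

def forward(x, w_hidden, w_out):
    """Forward propagation through one hidden layer and one output layer."""
    y_hidden = np.tanh(w_hidden @ x)     # hidden activations
    y_out = np.tanh(w_out @ y_hidden)    # output activations
    return y_hidden, y_out

def sse(x, d, w_hidden, w_out):
    """Sum of squared instantaneous errors, E(n) = 1/2 * sum_k e_k(n)^2."""
    return 0.5 * np.sum((d - forward(x, w_hidden, w_out)[1]) ** 2)

def delta_rule_updates(x, d, w_hidden, w_out, eta):
    """Weight corrections eta * delta_j * y_i for both layers."""
    y_hidden, y_out = forward(x, w_hidden, w_out)
    delta_out = (d - y_out) * (1.0 - y_out ** 2)                   # e_j * phi'(v_j)
    delta_hidden = (1.0 - y_hidden ** 2) * (w_out.T @ delta_out)   # phi'(v_j) * sum_k delta_k w_kj
    return eta * np.outer(delta_hidden, x), eta * np.outer(delta_out, y_hidden)

rng = np.random.default_rng(0)
x, d = rng.normal(size=3), np.array([0.5, -0.5])
w_hidden = rng.normal(scale=0.5, size=(4, 3))
w_out = rng.normal(scale=0.5, size=(2, 4))

# With eta = 1 the corrections equal the negative gradient of E(n)
dw_hidden, dw_out = delta_rule_updates(x, d, w_hidden, w_out, eta=1.0)
```

A finite-difference estimate of the gradient of `sse` with respect to any single weight gives a quick way to verify that the back-propagated corrections have the right sign and magnitude.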
When the curvature of the error surface varies significantly with the direction in weight space, a large value of $\eta$ results in oscillation on the error surface; for a fairly flat error surface, a small value of $\eta$ results in very slow convergence of the error minimization process, because the adaptation applied to a weight is proportional to the derivative of $E$ with respect to that weight. Only small learning rates guarantee a true gradient descent, and they increase the total number of learning steps needed to reach a satisfactory solution. One of the simplest methods of accelerating convergence is the addition of a momentum term to the weight adaptation in (104):

$$\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n) + \alpha\, \Delta w_{ji}(n-1) \qquad (109)$$

where $\alpha$ is the momentum rate and $\Delta w_{ji}(n-1)$ is the weight correction made when the $(n-1)$th training example was presented at the network input.

Summary of Back Propagation Algorithm

1. Determine a network topology, initialize all weights randomly, and set a learning rate and a momentum rate.
2. Repeat until the network performance is satisfactory.
2.1 Forward propagation step: feed the training examples forward into the network and compute the activations of all nodes.
2.2 Calculate the errors in the output layer and compute the local gradients at all nodes:

$$\delta_j(n) = e_j(n)\, \varphi'(v_j(n)) \quad \text{at node } j \text{ in the output layer} \qquad (110)$$

$$\delta_j(n) = \varphi'(v_j(n)) \sum_k \delta_k(n)\, w_{kj}(n) \quad \text{at node } j \text{ in a hidden layer} \qquad (111)$$

2.3 Update the weights:

$$w_{ji}(n+1) = w_{ji}(n) + \eta\, \delta_j(n)\, y_i(n) + \alpha\, \Delta w_{ji}(n-1) \qquad (112)$$

4.5.3.3 Batch Learning

In on-line BP learning, the weight adaptation is performed after the presentation of each training example. The local gradient estimates use only a single piece of training information when the weight adaptation is performed. In batch BP learning, the weights are adjusted after presenting the entire set of training examples. Batch learning provides a more accurate estimate of the gradient vector. The instantaneous squared errors are summed over all the training examples, and the average of the total squared errors can be defined as a cost function

$$E_{av} = \frac{1}{N} \sum_{n=1}^{N} E(n) = \frac{1}{2N} \sum_{n=1}^{N} \sum_j e_j^2(n) \qquad (113)$$

where $N$ is the number of training examples. The correction $\Delta w_{ji}$ applied to a weight is proportional to the negative gradient of the cost function:

$$\Delta w_{ji} = -\eta \frac{\partial E_{av}}{\partial w_{ji}} = -\frac{\eta}{N} \sum_{n=1}^{N} e_j(n) \frac{\partial e_j(n)}{\partial w_{ji}} \qquad (114)$$

4.5.4 Validation of a Neural Model

In any learning system, the validation of learning is one of the most important parts, because the objective of network training is not to learn an exact representation of the training data itself, but rather to learn the underlying statistics of the process that generates the data. Therefore, during the learning phase, one should assess how well the network has learned the training data and how well it is able to generalize to unseen data. In order to develop quantitative techniques for evaluating a network's performance with real-world data, rigorous mathematical foundations must be developed to determine the characteristics of the training set and the network's ability to generalize from it. The question of whether a network can generalize correctly (or sufficiently accurately) is still unsolved, but most learning algorithms can successfully learn a set of training examples given a sufficiently flexible model structure or an appropriate learning algorithm. A popular method of assessing the validity of trained networks is to split the available data into two sets, a training set and a test set. The training set is further split into two subsets, one used for training the network and the other for evaluating the performance of the network during the training phase [80].

4.5.4.1 Bias/Variance Dilemma

A network model that is too inflexible will have a large bias, while one that is too flexible will have a large variance.
This can be explained by the bias/variance dilemma [28]. The MSE performance measure for a given training set can be decomposed into the sum of two components which reflect the squared bias of the network error and the variance of the estimates of the trained network. Let $D$ denote a training set, $\hat{y}(x,D)$ be the output of a network trained using the data contained in $D$, and $E_D(\cdot)$ denote the expectation operator taken over all possible training sets. Then the output error is given by

$$e_y(x) = y(x) - \hat{y}(x,D) \tag{115}$$

The effectiveness of the network as a predictor of $y(x)$, obtained by calculating the MSE over all possible training sets $D$, is given by

$$E_D\!\left(e_y^2(x)\right) = E_D\!\left((y(x) - \hat{y}(x,D))^2\right) = \left(y(x) - E_D(\hat{y}(x,D))\right)^2 + E_D\!\left(\left(\hat{y}(x,D) - E_D(\hat{y}(x,D))\right)^2\right) \tag{116}$$

The first term on the right hand side is called the square of the bias, and the second term is the variance of the network approximations. The bias measures the average modelling error, and the variance measures how sensitive the network modelling is to a particular choice of data set. When the network is too flexible, a large variance occurs, so that the performance of the network is very sensitive to a particular training set. This results in a poor MSE performance. The network also produces a poor MSE performance when it possesses too little flexibility, which causes a large bias. Bias and variance are complementary quantities. In general, a network should be flexible enough to ensure that the modelling error (bias) is small, but should not be over-parameterized, because its performance then becomes highly sensitive to a particular training set (high variance).

4.5.4.2 Network Complexity and Early Stopping

The problem of choosing a network complexity (in an MLP) is to choose the size of the free parameter set used to model the data set (in terms of the number of nodes and hidden layers). A good estimate of the true performance of a network is required to determine whether the complexity of a network model is effective for a particular data set.
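The decomposition in (116) can be checked numerically. The sketch below is an illustration only, under assumptions that are not part of the original work: the target function is $y(x) = x^2$, the training sets are noisy samples on a fixed grid, and two toy predictors stand in for an inflexible model (a constant equal to the mean target) and a very flexible one (the nearest training sample). All names are hypothetical.

```python
import random


def true_fn(x):
    """Assumed target function y(x) for this illustration."""
    return x * x


def make_training_set(rng, xs, noise_std):
    """Draw one training set D: noisy targets at fixed input locations."""
    return [(x, true_fn(x) + rng.gauss(0.0, noise_std)) for x in xs]


def rigid_model(train, x0):
    """Inflexible predictor: ignores x0 and returns the mean training target."""
    return sum(t for _, t in train) / len(train)


def flexible_model(train, x0):
    """Very flexible predictor: returns the target of the nearest training input."""
    return min(train, key=lambda pair: abs(pair[0] - x0))[1]


def bias_variance(model, x0, n_sets=2000, noise_std=0.3, seed=0):
    """Estimate squared bias, variance and MSE at x0 over many training sets."""
    rng = random.Random(seed)
    xs = [i / 10.0 for i in range(11)]
    preds = [model(make_training_set(rng, xs, noise_std), x0) for _ in range(n_sets)]
    mean_pred = sum(preds) / n_sets
    bias_sq = (true_fn(x0) - mean_pred) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_sets
    mse = sum((true_fn(x0) - p) ** 2 for p in preds) / n_sets
    return bias_sq, variance, mse
```

With these assumptions, the constant predictor shows a large squared bias and a small variance at `x0 = 0.9`, the nearest-sample predictor shows the opposite, and in both cases the estimated MSE equals the sum of the two terms up to rounding.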
If the model complexity is increased and this results in a lower modelling error, it indicates that the original network was not flexible enough. Similarly, if a more flexible model produces a higher modelling error, this indicates that the network has overfitted the training data set, and a simpler network would be preferred for better generalization. An alternative way of obtaining the effective complexity of a network is the procedure of early stopping. Typically, iterative training of learning systems reduces the cost function as more iterations are made during the training phase. However, the generalization error measured with respect to a validation set often decreases at first as the number of iterations increases and then starts increasing after a certain iteration point, where further training causes the network to over-fit the training set. For good generalization, training can be stopped at the point where the generalization error starts increasing. This is referred to as cross-validation, and the procedure provides an appealing guide to better generalization performance of the network.

4.6 Nonlinear Extension To QGD (NL-QGD)

4.6.1 Introduction

One of the possible extensions to the QGD is to augment the output adder with a set of nonlinear processing elements which nonlinearly combine the feature elements, i.e. to implement a neural classifier such as an MLP. The structure is called the NL-QGD (NonLinear extension to QGD). Since the MLP is capable of creating arbitrary discriminant functions, this extension has the potential to improve performance. Note that the QGD creates a quadratic discriminant function of the image intensity, which is only optimal for unimodal probability density functions [8] [27]. Moreover, a nonlinear system normally generalizes better, so the performance in the test set can also improve.
In order to fully use the neural network approach we have to develop an iterative algorithm to adapt both the weights and $\mu$ (using the backpropagation procedure). Availability of on-line methods of adapting $\mu$ effectively means that the detector output error can be used for refining $\mu$, or equivalently, to search on-line for the optimal guard band. The relationship between the minimum number of false alarms and the minimum mean square error becomes crucial now, since the system will be adapting $\mu$ for the smallest MSE. The cost function used should correspond to the minimum false alarm rate, since this is the basis of scoring. Alternate error norms may have to be used to adapt the weights to guarantee that the minimum cost corresponds to the minimum false alarm rate. Figure 26 displays the block diagram of the NL-QGD. As indicated in this figure, the feature expansion is still the same as that of the QGD. In Figure 21, we can think of the QGD as the linear part of one of the hidden layer processing elements. The two basic issues that need to be addressed when using a neural network are the training and the net topology.

4.6.2 Training the NL-QGD

In order to fully develop the neural network approach, an iterative learning scheme is required to adapt the weights and the parameter $\mu$. Availability of on-line methods of adapting $\mu$ effectively means that the detector output error can be used to search on-line for the optimal local area and guard band. The NL-QGD is trained with a desired signal (1s for the target class and 0s for the non-target class) using a back-propagation algorithm [68].

$$d(n) = \begin{cases} 1 & \text{feature vector belongs to the target class} \\ 0 & \text{feature vector belongs to the non-target class} \end{cases} \tag{117}$$

The sum of squared errors is utilized in this work, i.e.
$$\min_{W} E = \frac{1}{2}\sum_{n=1}^{N} \left(d(n) - z(n)\right)^2 \tag{118}$$

Figure 26 Adaptation scheme of the NL-QGD

Moreover, in order to adapt the parameter $\mu$, which controls the scale of the γCFAR stencil, the error generated at the detector output is back-propagated up to the input layer. The decision boundary of the NL-QGD is therefore formed in conjunction with the parameter $\mu$. The correction $\Delta\mu(n)$ at each iteration is proportional to the instantaneous gradient $\partial E(n)/\partial\mu(n)$. According to the chain rule, this gradient is expressed as follows.

$$\frac{\partial E(n)}{\partial \mu(n)} = \sum_i \frac{\partial E(n)}{\partial v_i(n)}\,\frac{\partial v_i(n)}{\partial \mu(n)} \tag{119}$$

where the index $i$ runs over the nodes in the first hidden layer. $v_i(n)$ is the output before the nonlinear transformation is applied at the $i$th node in the first hidden layer and is expressed by

$$v_i(n) = \sum_{p=1}^{P} w_{ip}(n)\,f_p(n) \tag{120}$$

where the feature element $f_p(n)$ of $F$ is given by (80) at each iteration $n$ and $F(n) = [f_1(n), f_2(n), \ldots, f_P(n)]^T$ with $P = 8$. The local gradients $\partial E(n)/\partial v_i(n)$ can be computed by (105). Therefore, (119) is written with the partial derivative of $v_i(n)$ with respect to $\mu(n)$ as

$$\frac{\partial E(n)}{\partial \mu(n)} = -\sum_i \delta_i(n) \sum_{p=1}^{P} w_{ip}(n)\,\frac{\partial f_p(n)}{\partial \mu(n)} = -\sum_{p=1}^{P} \Big(\sum_i \delta_i(n)\,w_{ip}(n)\Big) \frac{\partial f_p(n)}{\partial \mu(n)} = -\sum_{p=1}^{P} \delta_p(n)\,\frac{\partial f_p(n)}{\partial \mu(n)} = -\boldsymbol{\Delta}^T(n)\,\frac{\partial F(n)}{\partial \mu(n)} \tag{121}$$

where $\delta_i(n)$ and $\delta_p(n)$ are the local gradients at the $i$th node in the first hidden layer and the $p$th node in the input layer respectively. The local gradient vector in the input layer is defined as $\boldsymbol{\Delta}(n) = [\delta_1(n)\ \delta_2(n)\ \ldots\ \delta_P(n)]^T$. $w_{ip}(n)$ is the weight between the $p$th node in the input layer and the $i$th node in the first hidden layer. The gradient $\partial F(n)/\partial\mu(n)$ is given by (92). The adaptation of the parameter $\mu$ is given by

$$\mu_j(n+1) = \mu_j(n) - \beta\,\frac{\partial E(n)}{\partial \mu_j(n)} = \mu_j(n) + \beta\,\boldsymbol{\Delta}^T(n)\,\frac{\partial F(n)}{\partial \mu_j(n)} \tag{122}$$

where $\beta$ is the step size, and $j = n, m$ (one equation for each kernel).

CHAPTER 5
TRAINING STRATEGIES FOR NL-QGD

5.1 Introduction

Neural network models are usually defined by three major parts: architecture, cost function and training algorithm.
Here the cost function typically measures errors between the network estimates and the actual outputs from the training data, and plays an important role as guidance for network optimization. The sum of squares error (SSE) function (L2 norm) has been widely used as an optimality index for network training. There are many other possible choices of cost functions which can also be considered, depending on the particular application. Minimizing the SSE is equivalent to the maximum likelihood principle for Gaussian distributed errors in regression problems, where the goal is to model the conditional distribution of the output variables given the input samples. When used as a classification device, a feed-forward neural network is normally trained with binary desired responses (0 or 1) during the training period. In order to guarantee that the outputs represent probabilities, the sum of the output values should be equal to one and each output value should lie in the range (0, 1). For classification problems, the error distribution is far from Gaussian because the target variables are binary. For two-class classification problems the error has the form of a binomial distribution, and for multi-class classification problems, in which the network has as many output nodes as there are classes, it has the form of a multinomial distribution.

Cross-entropy can also be used as a cost function for network optimization [2] [36] [79]. The cross-entropy cost function maximizes the likelihood of observing the training data set during training. In this sense, the cross-entropy is theoretically the most appropriate cost function for network optimization in the case of binary classification [69].
However, when a neural network is used as a detector with a single output node for two classes, and is supposed to detect samples from one particular class while rejecting as many samples from the other classes as possible, the network needs to be trained on an optimality condition which is different from that of classification problems. Presently this optimality condition has not been formulated, so the problem has to be solved experimentally. In searching for cost functions for network optimization, the performance of the two detectors (QGD and NL-QGD) is discussed in terms of the cost functions used for training.

5.2 Optimality Index

5.2.1 L2 Norm

One of the most popular norms in training networks is the L2 criterion. For a network with linear outputs, training the network leads to an optimal solution, the least squares solution. The L2 norm is an appropriate choice for normally distributed inputs in the sense of both minimum cost and minimum probability of prediction error (maximum likelihood). Consider a set of training pairs $\{x(n), d(n)\}$, $n = 1, 2, \ldots, N$. Assume that the input vectors $x(n)$ are drawn randomly and independently from a normal distribution. The objective of network training is not to memorize the training data, but rather to model the underlying generator of the data, so that the best possible prediction of the desired responses $d$ can be made when new data is presented to the trained network. A network training scheme is shown in Figure 27. The most general and complete description of the generator of the data is in terms of the probability density $p(x,d)$ in the joint input-target space [5].
[Figure 27: network training scheme; surviving labels: x(n), Teacher]

Since the second term in (125) is independent of the network parameters, minimizing $E$ is equivalent to minimizing the first term, so the error function can be written as

$$E = -\sum_n \ln p(d(n)\,|\,x(n)) \tag{126}$$

We assume that the input data has a Gaussian distribution and that the target value has the form of a deterministic function with added Gaussian noise $\varepsilon_k$, so that

$$d_k = h_k(x) + \varepsilon_k \tag{127}$$

We now want to model the deterministic function $h_k(x)$ by a neural network with output $y_k(x;w)$, where $w$ is the set of weight parameters determining the network mapping. The distribution of $\varepsilon_k$ is given by

$$p(\varepsilon_k) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\!\left(-\frac{\varepsilon_k^2}{2\sigma^2}\right) \tag{128}$$

Using (127) and (128), the probability density of the target variables is given by

$$p(d_k\,|\,x) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\!\left(-\frac{(y_k(x;w) - d_k)^2}{2\sigma^2}\right) \tag{129}$$

where the mapping function between input and target, $h_k(x)$, has been replaced by the network model $y_k(x;w)$. Substituting (129) into (126), we obtain the error function [5]

$$E = \frac{1}{2\sigma^2}\sum_{n=1}^{N}\sum_{k=1}^{C} \{y_k(x(n);w) - d_k(n)\}^2 + NC\ln\sigma + \frac{NC}{2}\ln(2\pi) \tag{130}$$

where $C$ is the number of classes and $N$ the number of exemplars. In (130), the second and third terms are independent of the network parameters $w$ and can be omitted. The factor $1/\sigma^2$ premultiplying the first term can also be omitted. The error function then has the form of the SSE function

$$E = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{C} \{y_k(x(n);w) - d_k(n)\}^2 \tag{131}$$

Here, the SSE cost function was derived from the principle of maximum likelihood on the assumption of Gaussian distributed target data.

5.2.2 Training with Excluding Outliers from Non-Target Class with L2 Norm

In the training data for the QGD and NL-QGD, it is observed that some of the target and non-target class data overlap. The two-parameter CFAR or γCFAR detector declares a pixel as detected whenever its intensity significantly exceeds the mean intensity in its surrounding region, and both targets and man-made objects (non-targets) are usually brighter than the background in their surrounding areas.
The objective of training the NL-QGD (or any detector for that matter) is not to obtain minimum classification error but to achieve a minimum false alarm rate while maintaining a high probability of target detection. The training of neural networks is mostly intended to produce minimum classification error on the given training data and to generalize well to data which has never been seen by the network. This leads to partitioning of the input space into subregions according to the number of classes and usually places a decision boundary among the highly populated regions of the classes. In two-class detection problems, the input space is partitioned into two subspaces, one for the target class and the other for the non-target class. The NL-QGD produces very low output values for target data which are far away from the decision boundary and reside deep inside the non-target subregion. These low output values degrade performance at the high-detection operating point, because the threshold for the NL-QGD is set by the minimum target output value for 100% target detection. It is therefore desired that the decision boundary be placed so as to encompass all target samples within the target subregion and to exclude as many non-target samples as possible from the target subregion. This, of course, may not yield minimum classification error but leads to a high probability of detection in favor of one class.

Figure 28 illustrates the effect of outliers in placing a decision boundary in the input space. In Figure 28a, the data set contains 15 samples (marked by "*") for the target class and 15 samples (marked by "o") for the non-target class in the input space. The desired responses are 1's and -1's respectively for the target class and the non-target class.
In Figure 28a, the decision boundary #1 was configured based on the LS solution using all samples, while the decision boundary #2 was formed by the LS solution computed after removing four non-target outliers ((-3,1), (-2,2), (-1,2) and (-1,1)) from the data set. By removing the four non-target outliers when computing the LS solution, the decision boundary moves in the direction where the target outliers are located, so that more target-class samples are included above the decision boundary #2. Figure 28b plots the outputs of the data set based on the two LS solutions. With the decision boundary #1, 10 false alarms occur in order to detect all target outputs, whereas the decision boundary #2 yields 8 false alarms for detection of all targets. There is a lack of theoretical foundation for designing cost functions for network training in such a case. A simple way of implementing the idea is to train the NL-QGD while excluding outliers from the non-target samples during the training period. Since outliers in the training data cause low output values for the target class and high output values for the non-target class, they yield large error values which are used to correct the network parameters during training. Error values serve as forces that position the decision boundaries in the direction of the corresponding samples. Therefore removing non-target outliers helps the decision boundary to shift towards the target outliers. This can yield larger NL-QGD output values for target outliers and correspondingly reduces false alarms at the operating point of high probability of target detection.

Figure 28 An example of outlier effect to decision boundary formation and false alarm rate.

Training procedure of NL-QGD with removal of non-target outliers
1. Train the NL-QGD with a complete set of training data and cross validation data for a certain number (N1) of iterations.
2. Compute false alarm for 100% target detection from the cross validation set.
   2.1 if (minFA > FA(iter)) {
         • minFA = FA(iter)
         • stopiter = stopiter + 1
         • remove the non-target exemplar which caused the largest output value among the non-target exemplars of the training data
       } else if (minFA = FA(iter)) {
         if (stopiter < N2) stopiter = stopiter + 1
         else stop training
       }
   2.2 train the NL-QGD for a certain number of iterations (N3).
   2.3 go to step 2

5.2.3 Lp Norm

The L2 norm (SSE function) was derived from Gaussian distributed target variables. The L2 norm weighs all errors equally. When the distributions have long tails, the solution can be significantly affected by a very small number of points, called outliers, which cause particularly large errors. We can obtain more general error functions by extending the Gaussian distribution function to a more general form [5] [7]

$$p(\varepsilon) = \frac{p^{\,1-1/p}}{2\sigma_p\,\Gamma(1/p)} \exp\!\left(-\frac{|\varepsilon|^p}{p\,\sigma_p^p}\right) \tag{132}$$

where $\Gamma(\cdot)$ denotes the Gamma function and $\sigma_p$ is the dispersion parameter in the Lp sense. For the case of $p = 2$ this reduces to a Gaussian. Substituting (132) into (125) and omitting constant terms, we obtain a generalized form of the SSE error function

$$E = \frac{1}{p}\sum_{n=1}^{N}\sum_{k=1}^{C} |y_k(x(n);w) - d_k(n)|^p \tag{133}$$

This shows the link between the error distribution and the Lp norm criterion function. Figure 29 shows the error weighting effect of Lp norms according to the norm power $p$. When $p$ increases, large errors are weighted more heavily than small errors. On the other hand, small errors are weighted more heavily than large errors as $p$ decreases below $p = 1$. This implies that the decision boundary forms differently in the input space, depending upon the norm.

Figure 29 $|e|^p$ versus $e$ for different $p$.
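The weighting effect summarized in Figure 29 can be illustrated with a few lines of code. The sketch below simply evaluates $|e|^p$ and compares the penalty of a large error with that of a small one; the function names and the two example error magnitudes (0.1 and 0.9) are assumptions made for illustration only.

```python
def lp_penalty(error, p):
    """Penalty |e|^p assigned to a single error value by an Lp norm."""
    return abs(error) ** p


def large_to_small_ratio(p, small=0.1, large=0.9):
    """How much more a large error is penalized than a small one for norm power p."""
    return lp_penalty(large, p) / lp_penalty(small, p)


if __name__ == "__main__":
    for p in (0.5, 1.0, 2.0, 4.0):
        print(p, lp_penalty(0.1, p), lp_penalty(0.9, p), large_to_small_ratio(p))
```

The ratio grows with $p$, which is the sense in which larger norms let the samples with large errors dominate the optimization.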
The derivatives of the Lp norm error function with respect to the network parameters are given by [5]

$$\frac{\partial E}{\partial w_{ji}} = -\sum_n \sum_k |d_k(n) - y_k(x(n);w)|^{p-1}\,\mathrm{sign}\big(d_k(n) - y_k(x(n);w)\big)\,\frac{\partial y_k(x(n);w)}{\partial w_{ji}} \tag{134}$$

If we view the Lp norm error function in an asymptotic sense, the error function is, up to a constant factor,

$$E = \sum_{k=1}^{C} \int |d_k - y_k(x;w)|^p\,p(x\,|\,\omega_k)\,P(\omega_k)\,dx = \sum_{k=1}^{C} P(\omega_k) \int f(x)\,dx \tag{135}$$

where

$$f(x) = |d_k - y_k(x;w)|^p\,p(x\,|\,\omega_k) \tag{136}$$

For $p = 2$, the cost function (L2) places emphasis on the errors from samples in densely populated regions, that is, in the regions where $p(x)$ is large, rather than in the regions near the decision boundary. When $p$ is large, the optimization process of the network is dominated by the samples which cause large errors. If $p$ is taken to be small, more emphasis is placed on the region of the decision boundary. A BP algorithm for MLPs can be used with only a minor modification of the local gradient in the output layer. The local gradient in the output layer is modified as

$$\delta_l(n) = \varphi'(v_l(n))\,\mathrm{sign}\big(d_l(n) - y_l(n)\big)\,|d_l(n) - y_l(n)|^{p-1} \tag{137}$$

where $l$ and $L$ are a node index and the number of nodes in the output layer respectively. $E(n)$ is an instantaneous error defined as

$$E(n) = \frac{1}{p}\sum_{l=1}^{L} |d_l(n) - y_l(n)|^p \tag{138}$$

Large norms usually slow down the training speed. This effect can be seen by rewriting (137), with $e_l(n) = d_l(n) - y_l(n)$, as

$$\delta_l(n) = |e_l(n)|^{p-2}\,\varphi'(v_l(n))\,\mathrm{sign}\big(d_l(n) - y_l(n)\big)\,|e_l(n)| \tag{139}$$

The weight adaptation is

$$\Delta w_{li}(n) = \eta\,\delta_l(n)\,y_i(n) + \alpha\,\Delta w_{li}(n-1) = \eta_p\,\varphi'(v_l(n))\,y_i(n)\,\mathrm{sign}\big(d_l(n) - y_l(n)\big)\,|e_l(n)| + \alpha\,\Delta w_{li}(n-1) \tag{140}$$

where $\eta_p$ is the effective learning rate for a norm $p$ and $\eta_p = \eta\,|e_l(n)|^{p-2}$. When $|e_l(n)| < 1$, $\eta_p$ becomes very small for large norms ($p > 2$), so that the convergence speed becomes very slow. Note that $\eta$ is the learning rate for $p = 2$.

5.2.4 Mixed Lp Norm

The errors from the training data are equally penalized with the L2 norm. For $p < 2$, more emphasis is placed on smaller errors than on large errors, while the Lp norms with $p > 2$ penalize larger errors more than smaller errors.
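The modified output-layer gradient and the effective learning rate discussed above can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the per-sample cost is taken as $\frac{1}{p}|d - y|^p$, so that $p = 2$ recovers the usual L2 delta, and the function names are hypothetical.

```python
import math


def lp_error(d, y, p):
    """Per-sample Lp cost (1/p) * |d - y|^p (normalization is an assumption)."""
    return abs(d - y) ** p / p


def lp_output_delta(d, y, phi_prime, p):
    """Local gradient at an output node: phi'(v) * sign(e) * |e|^(p-1), with e = d - y."""
    e = d - y
    if e == 0.0:
        return 0.0  # no correction when the output already matches the desired response
    return phi_prime * math.copysign(abs(e) ** (p - 1), e)


def effective_learning_rate(eta, e, p):
    """eta_p = eta * |e|^(p-2); assumes e != 0 (the expression is undefined at e = 0 for p < 2)."""
    return eta * abs(e) ** (p - 2)
```

For an error of magnitude 0.1 and $p = 4$, the effective learning rate is one hundredth of $\eta$, which illustrates why large norms slow training once the errors fall below 1.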
So, by using mixed norms, a different emphasis can be placed on the target class samples and the non-target class samples in training the NL-QGD. It is desirable to use a larger norm ($p > 2$) for the target class, in order to prevent the NL-QGD from producing very low output values for target outliers, and a smaller norm ($p < 2$) for the non-target class. Since the smaller norm weighs the smaller errors more, large errors from non-target outliers cannot gain as much force as in the L2 case to pull the decision boundary in their direction. A cost function can thus be defined as a mixed form of two different norms.

$$E = \frac{1}{N_{nt}}\left[\sum_{X \in \omega_{nt}} |d - y(X,w)|^{p_l}\right]^{1/p_l} + \frac{1}{N_t}\left[\sum_{X \in \omega_t} |d - y(X,w)|^{p_h}\right]^{1/p_h} \tag{141}$$

where $N_{nt}$ is the number of non-target samples and $N_t$ is the number of target samples, $p_l$ is a small norm ($p_l < 2$) and $p_h$ is a large norm. In order to prevent the sum of $p_l$-power errors from dominating the sum of $p_h$-power errors ($p_h > 2$) when the errors are less than 1 (the case for the NL-QGD with a sigmoidal activation function at the output), the sums are normalized by the numbers of samples in their classes and raised to the inverse powers of $p_l$ and $p_h$ respectively. The weight adaptation is performed after presenting all the samples each time (batch training). The local gradient at the output for a sample $X(n)$ can be written as

$$\delta(n) = \begin{cases} \dfrac{1}{N_{nt}}\left[\displaystyle\sum_{X \in \omega_{nt}} |d - y(X,w)|^{p_l}\right]^{\frac{1}{p_l}-1} |d(n) - y(n)|^{p_l - 1}\,\mathrm{sign}\big(d(n) - y(n)\big)\,\varphi'(v(n)) & \text{if } X(n) \in \omega_{nt} \\[3ex] \dfrac{1}{N_t}\left[\displaystyle\sum_{X \in \omega_t} |d - y(X,w)|^{p_h}\right]^{\frac{1}{p_h}-1} |d(n) - y(n)|^{p_h - 1}\,\mathrm{sign}\big(d(n) - y(n)\big)\,\varphi'(v(n)) & \text{if } X(n) \in \omega_t \end{cases} \tag{142}$$

5.2.5 Cross Entropy

The binomial distribution, one of the most useful discrete distributions, is based on the idea of a Bernoulli trial [69]. A Bernoulli trial is an experiment with only two possible outcomes. Observations of this nature arise, for instance, in medical trials where, at the end of the trial period, a patient has either recovered ($d = 1$) or has not ($d = 0$).
When a network uses a single output which can meet the probability conditions for two-class classification problems, we want the value of the output, $y$, to represent the posterior probability $p(\omega_1|x)$ for class $\omega_1$, that is, $y = p(\omega_1|x)$, and for class $\omega_2$, $p(\omega_2|x) = 1 - y$. In our problem, $y = P(\text{target class}\,|\,\text{input image chip})$ for the target class ($d = 1$) and $1 - y = P(\text{non-target}\,|\,\text{input image chip})$ for the non-target class ($d = 0$). We combine the two cases into a single expression. The probability of observing either target or non-target is

$$p(d\,|\,x) = y^{d}(1 - y)^{1-d} \tag{143}$$

This is the Bernoulli distribution, which is a special case of the binomial distribution. Assuming that the target or non-target data are drawn independently from this distribution, the likelihood function is expressed by

$$\prod_n y(n)^{d(n)}\,(1 - y(n))^{1 - d(n)} \tag{144}$$

The goal is to maximize this likelihood function given the input data. For convenience, we again minimize the negative logarithm of the likelihood function, which can be thought of as a cost function ($E$) for the network optimization [5] [69].

$$E = -\sum_n \{ d(n)\ln y(n) + (1 - d(n))\ln(1 - y(n)) \} \tag{145}$$

This error function is called the cross-entropy error function between the desired responses ($d$) and the posterior probabilities ($y$). This shows that the cross entropy is a natural cost function for two-class classification. Now, the derivative of this function is

$$\frac{\partial E}{\partial y(n)} = \frac{y(n) - d(n)}{y(n)\,(1 - y(n))} \tag{146}$$

Note that when $y(n) = d(n)$ for all $n$, $E = 0$. With the interpretation of output activations as probabilities, we want them to range between 0 and 1. Therefore a sigmoidal function is natural, and the posterior probability can be written in the form of the logistic function

$$y = \frac{1}{1 + e^{-v}} \tag{147}$$

We see that the derivative of $E$ with respect to $v$ is

$$\frac{\partial E}{\partial v(n)} = y(n) - d(n) \tag{148}$$

The error has the same form as for the sum of squared errors with linear output units.
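The simplification from (146) to (148) can be verified numerically. The following sketch assumes a single logistic output and uses hypothetical function names; it compares the closed-form derivative $y - d$ with a central finite-difference estimate of the per-sample cross entropy.

```python
import math


def sigmoid(v):
    """Logistic function of (147)."""
    return 1.0 / (1.0 + math.exp(-v))


def cross_entropy(d, y):
    """Per-sample cross entropy -(d ln y + (1 - d) ln(1 - y)); requires 0 < y < 1."""
    return -(d * math.log(y) + (1.0 - d) * math.log(1.0 - y))


def grad_wrt_activation(d, v):
    """Closed-form derivative dE/dv = y - d for a logistic output, as in (148)."""
    return sigmoid(v) - d


def numeric_grad(d, v, h=1e-5):
    """Central finite-difference estimate of dE/dv, used only as a check."""
    return (cross_entropy(d, sigmoid(v + h)) - cross_entropy(d, sigmoid(v - h))) / (2.0 * h)
```

The factor $y(1-y)$ in the denominator of (146) cancels against the derivative of the logistic function, which is why the gradient stays large even when the output saturates on the wrong side.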
But the outputs here represent the posterior probabilities, and $v$ can be a linear discriminant function or a nonlinear discriminant function. For desired responses of 1 for the target class and 0 for the non-target class, the cross entropy cost function can be written in the following form [5]

$$E = -\sum_{n \in \text{target class}} \ln(1 - e(n)) \; - \sum_{n \in \text{non-target class}} \ln(1 + e(n)) \tag{149}$$

where $e(n) = d(n) - y(n)$. For small $e(n)$, the cross entropy function becomes

$$E \approx \sum_n |e(n)| \tag{150}$$

This has the same form as the L1 norm. Small errors are weighted more heavily by the cross entropy function than by Lp norms with $p > 2$. The cross entropy cost function versus error is plotted in Figure 30. Both small and large errors are penalized more heavily by the cross entropy cost function than by Lp norms.

Figure 30 Cross entropy cost function versus error.

A BP algorithm can be used for training MLPs with the cross entropy cost function. The local gradients are modified in the output layer of MLPs and are given by

$$\delta_l(n) = \begin{cases} \dfrac{\varphi'(v_l(n))}{1 - e_l(n)} & \text{if } n \in \text{target class} \\[2ex] -\dfrac{\varphi'(v_l(n))}{1 + e_l(n)} & \text{if } n \in \text{non-target class} \end{cases} \tag{151}$$

where $E(n)$ is an instantaneous error defined as

$$E(n) = \begin{cases} -\displaystyle\sum_{l=1}^{L} \ln(1 - e_l(n)) & \text{if } n \in \text{target class} \\[2ex] -\displaystyle\sum_{l=1}^{L} \ln(1 + e_l(n)) & \text{if } n \in \text{non-target class} \end{cases} \tag{152}$$

CHAPTER 6
EXPERIMENTS AND RESULTS

6.1 Introduction

This chapter tests the focus of attention proposed for ATR and evaluates the system performance on millimeter wave SAR imagery (Mission 90 Pass 5 SAR data set). In Section 6.2, the two prescreeners, the two-parameter CFAR detector and the γCFAR detector, are evaluated based on their detection performance over the entire imagery of the Mission 90 Pass 5 SAR data set. Section 6.3 assesses the detection performance of the QGD and the NL-QGDs applied to the regions of interest (ROIs) passed by the two-parameter CFAR detector.
The discriminating powers of the QGD and NL-QGDs are compared based on ROCs over the false alarms and targets detected by the two-parameter CFAR detector. The detection performance of the NL-QGDs trained with different optimality indices is presented for different sizes of networks. The detection performance of the QGD and NL-QGDs in conjunction with the γCFAR detector is presented in Section 6.3.4.

6.2 Prescreening SAR Imagery

6.2.1 Two-Parameter CFAR Processing

The two-parameter CFAR detector was run over the 127 frames (about 7 km²) of the Mission 90 Pass 5 data. 345 targets from the TABILS 24 ISAR data base were embedded based on the method mentioned in Section 2.7.2. The size of the CFAR stencil was 85 by 85 pixels in order to compare with the CFAR stencil (84 by 84 pixels for 1 ft resolution PWF SAR data) by Novak et al. [57]. The local mean and standard deviation are computed in the outermost 4-pixel wide boundary of the stencil. The intensity of the test pixel is computed by averaging 3 by 3 pixels in the center of the stencil. After the CFAR processing, multiple detection points occur on targets and in other regions, because targets and man-made clutter (and tree tops) normally consist of many high reflectivity pixels that trigger the prescreener repeatedly (raw detections). A clustering process over the multiple detections is therefore required to obtain a more representative count of detections and false alarms. The false alarm reduction stage and the classification stage in Figure 2 only operate on the clustered locations. The size of the clustering region was determined by the size of the targets (in this case, the clustering region is 22 pixels long).
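The stencil geometry described in Section 6.2.1 can be sketched in code. The example below assumes the standard two-parameter CFAR statistic, (test-cell mean minus local clutter mean) divided by local clutter standard deviation; the exact decision rule of this work is its equation (51), which is not reproduced here, so the form of the statistic is an assumption. The function name `cfar_statistic` and the list-of-lists image representation are likewise hypothetical.

```python
import math


def cfar_statistic(image, row, col, stencil=85, border=4, test=3):
    """Two-parameter CFAR statistic at (row, col) for a 2-D list-of-lists image.

    Clutter statistics come from the outermost `border` pixels of a
    `stencil` x `stencil` window; the test value is the mean of the central
    `test` x `test` block. The caller must keep the window inside the image,
    and the clutter ring must not be constant (its standard deviation is the divisor).
    """
    half = stencil // 2
    t_half = test // 2
    # Test cell: average of the central block.
    test_vals = [
        image[r][c]
        for r in range(row - t_half, row + t_half + 1)
        for c in range(col - t_half, col + t_half + 1)
    ]
    x_t = sum(test_vals) / len(test_vals)
    # Clutter ring: pixels whose Chebyshev distance from the center is in the outer band.
    ring = [
        image[row + dr][col + dc]
        for dr in range(-half, half + 1)
        for dc in range(-half, half + 1)
        if max(abs(dr), abs(dc)) > half - border
    ]
    mean_c = sum(ring) / len(ring)
    var_c = sum((v - mean_c) ** 2 for v in ring) / len(ring)
    return (x_t - mean_c) / math.sqrt(var_c)
```

A pixel would be declared a raw detection when this statistic exceeds the chosen threshold. A direct loop like this is only meant to make the geometry explicit; it is far too slow for full frames.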
The clustering used is as follows: each frame (512 by 2048 pixels) of the SAR data is processed by the front-end detection stage, and the outputs which exceed the threshold in (51) are stored by magnitude; the clustering starts at the location with the maximum output and groups all the raw detections within a representative range of target sizes; next, all the grouped locations are merged into a single representative location based upon the weighted sum of the outputs in the grouping. The number of false alarms is computed from the clustered detections. The two-parameter CFAR detector yielded 4,455 false alarms over the 127 frames of the Mission 90 Pass 5 SAR data set when the detection threshold was set at 100% target detection (all 345 targets). Figure 31 displays some of the detection and clustering results of the two-parameter CFAR detector. The detection points (false positives) in the centers of the subimages (image chips of 85 by 85 pixels) mostly exhibit high contrast relative to their surroundings, because the two-parameter CFAR detector depends on relative intensity information.

6.2.2 γCFAR Processing

6.2.2.1 Optimal Parameter Search

From the false alarms triggered by the two-parameter CFAR detector for 100% target detection, 550 false alarms were randomly selected as non-target image chips and combined with the set of 345 target image chips to create the data base for finding the optimal parameters ($\mu_1$ and $\mu_{15}$) of the γCFAR detector. After the clustering process, each of the targets and non-target objects from the detections was centered in its image chip by using its highest intensity pixel. Some of the non-target image chips are shown in Figure 31. The parameter space of $\mu_1$ and $\mu_{15}$ was incrementally scanned between 0.0304 and 4.6052.
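The clustering procedure described at the start of Section 6.2.1's results can be sketched as a greedy loop. The example is an interpretation rather than the original code: it assumes raw detections are given as `(row, col, output)` tuples with positive outputs, treats the 22-pixel clustering size as a Euclidean radius, and merges each group into its output-weighted centroid. The function name `cluster_detections` is hypothetical.

```python
def cluster_detections(detections, radius=22.0):
    """Greedy clustering of raw detections given as (row, col, output) tuples.

    Starting from the strongest remaining detection, all raw detections within
    `radius` pixels are grouped and replaced by one output-weighted centroid.
    Returns a list of (row, col, peak_output) tuples. Outputs are assumed positive.
    """
    remaining = sorted(detections, key=lambda det: det[2], reverse=True)
    clusters = []
    while remaining:
        seed = remaining[0]
        group = [
            det for det in remaining
            if (det[0] - seed[0]) ** 2 + (det[1] - seed[1]) ** 2 <= radius ** 2
        ]
        remaining = [det for det in remaining if det not in group]
        total = sum(det[2] for det in group)
        row = sum(det[0] * det[2] for det in group) / total
        col = sum(det[1] * det[2] for det in group) / total
        clusters.append((row, col, seed[2]))
    return clusters
```

Counting the returned clusters, rather than the raw detections, gives the detection and false alarm totals in the sense used above.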
This range was converted from the range (0 to 1) in which gamma kernels in discrete time are stable: first, the range (0 to 1) was scanned in 33 equal steps for stable gamma kernels in discrete time; next, it was converted to the range (0.0304 to 4.6052) by the relationship $\mu_c = -\ln(1 - \mu_d)$, where $\mu_d$ are the values of the scanned range (0 to 1) in 33 steps and $\mu_c$ the converted values for continuous time. Kernels with 33 different $\mu$ values were computed, which correspond to memory depths of about 0.5 to 33 pixels for the $g_1$ kernel, and 8 to 495 pixels for the $g_{15}$ kernel. The false alarms were computed in the following way: first, for each pair of $\mu_1$ and $\mu_{15}$, the minimum target output of the γCFAR detector (so that all the targets are detected) is found and set as the threshold; then, with this threshold, the corresponding false alarm rate is computed at that point in the parameter space. The false alarm surface in the $\mu_1$ and $\mu_{15}$ space is shown in Figure 32a. After searching the 2-D false alarm surface, the minimum false alarm rate was found with $\mu_1 = 1.0788$ (index 22, memory depth of 1 pixel) and $\mu_{15} = 0.5978$ (index 15, memory depth of 25 pixels). This combination constitutes a γCFAR stencil which can be thought of as a local feature extractor for the best discrimination between the target and non-target classes in the training image chips. The minimum number of false alarms produced by the γCFAR detector was 105 in the training image chips. With the optimal values of $\mu_1$ and $\mu_{15}$, the guard band was approximately 15 pixels wide, and the local area was approximately a 10-pixel wide ring band (Figure 32b). Note that the γCFAR detector was able to improve upon the CFAR performance, which produced 550 false alarms, by changing the memory depths of the 2-D gamma kernels ($g_1$ and $g_{15}$) in the stencil.

6.2.2.2 Impact of Stencil Size in False Alarm

The false alarm surface in Figure 32a illustrates the importance of the stencil for detection performance.
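The parameter grid of Section 6.2.2.1 can be reproduced with a short sketch. It assumes that the 33 discrete-time values are the uniform grid 0.03, 0.06, ..., 0.99; this spacing is inferred from the reported end points and optimal values rather than stated in the text, and the names `continuous_mu` and `mu_grid` are hypothetical.

```python
import math


def continuous_mu(mu_discrete):
    """Convert a discrete-time gamma parameter in (0, 1) to continuous time: -ln(1 - mu)."""
    return -math.log(1.0 - mu_discrete)


def mu_grid(n_steps=33):
    """Continuous-time values for the assumed uniform discrete grid 0.03 * k, k = 1..n_steps."""
    return [continuous_mu(3 * k / 100) for k in range(1, n_steps + 1)]


if __name__ == "__main__":
    grid = mu_grid()
    # 1-based indices 22 and 15 are the optimal values reported for mu_1 and mu_15.
    print(round(grid[21], 4), round(grid[14], 4), round(grid[-1], 4))
```

Under this assumption, indices 22 and 15 of the grid give the reported optimal values, and the last entry gives the upper end of the scanned range.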
What is important to note is the dramatic dependence of the false alarm rate on the shapes of the $g_1$ and $g_{15}$ kernels. A small difference in the kernel shapes makes a big difference in false alarm performance. The gamma kernels are continuous functions of the μ's, so they can be differentiated with respect to these parameters. Hence a productive way of setting the scale parameters, $\mu_1$ and $\mu_{15}$, is to use adaptation algorithms from adaptive filter theory. Deciding on the shape of the stencil from the geometric characteristics of the targets, as is done for the two-parameter CFAR stencil, will probably give suboptimal performance. Since the 2-D gamma kernels are circularly symmetric, a single parameter controls each kernel's shape once the kernel order is fixed. More versatile shapes may perform better, but we have to be prepared for the explosion in the degrees of freedom and the inherent difficulty of setting more parameters.

It is also interesting to see the effect of changing the guard area size and the size of the target masking kernel in the CFAR stencil. To compare the CFAR stencil fairly with the γCFAR stencil, circular kernels with abrupt changes in magnitude were used (Figure 33a). The clutter masking kernel was set to be 10 pixels wide, to approximate the width (about 10 pixels) of the optimal $g_{15}$ kernel (Figure 32b). The numbers of false alarms were computed while varying $r_1$ and $r_2$. The minimum number of false alarms was 360, obtained at $r_1 = 2$ and $r_2 = 27$ (Figure 33b). This is about 3.5 times the minimum number of false alarms (105) of the γCFAR detector, which implies that the abrupt shape of the CFAR stencil may degrade detection performance.

Figure 32 The false alarm surface of the γCFAR detector and the corresponding stencil at the optimal parameters: a) false alarm surface, b) γCFAR stencil at the optimal $\mu_1$ and $\mu_{15}$.
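The hard-edged stencil used for the comparison in Figure 33 can be sketched as a pair of binary masks. This is only an illustration under stated assumptions: the function name `hard_edged_stencil` is hypothetical, the target mask is taken to be a disk of radius $r_1$ around the pivot, and $r_2$ is assumed to be the inner radius of a clutter ring 10 pixels wide. The text gives the radii and the ring width but not these exact geometric conventions.

```python
def hard_edged_stencil(size=85, r1=2, r2=27, ring_width=10):
    """Build binary target and clutter masks for a hard-edged circular stencil.

    Assumptions (not confirmed by the text): the target mask is a disk of
    radius r1 around the pivot, and the clutter mask is a ring whose inner
    radius is r2 and whose width is ring_width.
    """
    c = size // 2  # pivot (center pixel) of the stencil
    target = [[False] * size for _ in range(size)]
    clutter = [[False] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            d2 = (x - c) ** 2 + (y - c) ** 2  # squared distance from pivot
            target[y][x] = d2 <= r1 * r1
            clutter[y][x] = r2 * r2 <= d2 < (r2 + ring_width) ** 2
    return target, clutter

if __name__ == "__main__":
    target, clutter = hard_edged_stencil()
    print("target pixels:", sum(map(sum, target)))
    print("clutter pixels:", sum(map(sum, clutter)))
```

The region between the two masks plays the role of the guard area. Scanning `r1` and `r2` over a grid and counting false alarms at each setting would correspond to the kind of surface shown in Figure 33b.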
Figure 33 Impact of CFAR stencil size on false alarms: a) round CFAR stencil with an abrupt change in magnitude, b) false alarms versus the radius $r_1$ of the target masking kernel and the radius $r_2$ of the clutter masking kernel, measured from the pivot point of the CFAR stencil.

6.2.2.3 Batch-running the γCFAR Detector

With the optimal set of $\mu_1$ and $\mu_{15}$, the γCFAR detector was run over the first 127 frames of the Mission 90 Pass 5 SAR data set with the same embedded targets as for the CFAR detector. The size of the γCFAR stencil was 85 by 85 pixels, as for the CFAR stencil. To compute the γCFAR output at each pixel in the image, three features (the 1st moment at the center pixel and the 1st and 2nd moments in the local region) must be computed by convolving the image with two gamma kernels ($g_1$ and $g_{15}$). The convolution was computed in the frequency domain using FFTs for better computational efficiency. The image sequence was divided into overlapping radix-2 windows (2048 by 128 pixels) and processed by the overlap-save method. After the entire imagery of the Mission 90 Pass 5 SAR data set had been processed by the γCFAR detector, the minimum output value over the targets was selected as the threshold for 100% target detection. All detection points above the threshold were clustered as in the CFAR case.

6.2.3 Performance Comparison of the Two-Parameter CFAR Detector and the γCFAR Detector

The performances of the two-parameter CFAR detector and the γCFAR detector are compared by ROC curves (Figure 34). For 100% detection, the γCFAR detector yielded 760 false alarms while the CFAR detector produced 4,455 false alarms (a ratio of about 1:6). Fewer false alarms from the prescreener mean that the computational bandwidth of the subsequent processing modules can be decreased. With a discount of 2% target outliers, the γCFAR and the CFAR detectors yielded 239 and 510 false alarms respectively for 98% target detection.
Overall, the ROC of the γCFAR in Figure 34 shows more robust performance than the standard two-parameter CFAR test. Figure 35 shows the number of raw detections (false positives) and clustered detections per frame. The γCFAR detector outperformed the two-parameter CFAR detector in all frames, i.e. for both natural and cultural clutter. We can observe that the number of false positives changes from frame to frame. It is also interesting to note that clustering merges the raw detections into smaller numbers of ROIs in each frame. The overall computation required for clustering is larger for the two-parameter CFAR detector because of its excessive false positives. Figure 37 and Figure 38 display the performance of both detectors in areas having different statistical characteristics, i.e. a natural clutter area from frames 7 and 8 and a cultural clutter area from frames 121 and 122 (Figure 36). In the cultural clutter region, the γCFAR detector produced 59 false alarms while the CFAR detector yielded 113 false alarms. In the natural clutter region, the γCFAR detector produced 4 false alarms and the CFAR detector 28. Note that the objects in the boxes in Figure 36b are the embedded targets.

In Figure 38, multiple detections occur on top of the targets and the brighter objects of man-made clutter because these typically have high contrast relative to their background clutter. The two-parameter CFAR detector produced a considerable number of detections along the roof edges of the houses facing the SAR (that is, toward the top of the figure). Many detections also occur on the tops of the trees in natural clutter because the relative contrast between the trees and their surrounding radar shadows is large, which triggers the CFAR detections.
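The false alarm counts quoted in this comparison all follow from the same thresholding rule: the threshold is lowered until the required fraction of targets is detected, and every clutter output at or above it counts as a false alarm. A minimal sketch of that rule follows; the function name `false_alarms_at_rate`, the tie convention (an output equal to the threshold counts as a detection), and the toy scores are assumptions made for illustration only.

```python
import math

def false_alarms_at_rate(target_outputs, clutter_outputs, detection_rate=1.0):
    """Return (threshold, false_alarm_count) for a required detection rate.

    Assumption: an output equal to the threshold counts as a detection.
    """
    targets = sorted(target_outputs)
    # Number of lowest-scoring targets ("outliers") that may be missed.
    allowed_misses = math.floor((1.0 - detection_rate) * len(targets) + 1e-9)
    allowed_misses = min(allowed_misses, len(targets) - 1)
    threshold = targets[allowed_misses]
    false_alarms = sum(1 for c in clutter_outputs if c >= threshold)
    return threshold, false_alarms

if __name__ == "__main__":
    # Toy detector outputs (hypothetical, not data from this work).
    targets = [0.9, 0.8, 0.4, 0.95]
    clutter = [0.1, 0.5, 0.85, 0.3, 0.45]
    print(false_alarms_at_rate(targets, clutter, 1.0))   # (0.4, 3)
    print(false_alarms_at_rate(targets, clutter, 0.75))  # (0.8, 1)
```

Sweeping `detection_rate` and recording the resulting false alarm counts traces out an ROC curve of the kind compared in Figure 34. In the toy example, discounting a single low-scoring target raises the threshold from 0.4 to 0.8 and removes two of the three false alarms.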
6.2.4 Conclusion

Unlike the two-parameter CFAR detector, which uses an a priori defined stencil, the γCFAR detector utilizes the family of 2-D gamma kernels, where the free parameter μ changes the shape and scale of the γCFAR stencil. This work shows that the performance of the standard two-parameter CFAR detector can be improved if the size of the stencil is optimized. It is true that the shape of the γCFAR stencil is graded and differs from that of the CFAR stencil, but one of the most important factors in the performance improvement is the stencil size. The optimal value of μ was found through exhaustive search simply to quantify the best possible performance. In an adaptive signal processing framework, we can find the best value of μ through training, either by minimizing false alarms or by minimizing an output error measure. This has been done already for the 1-D case [64], and the methodology can be extended to the 2-D case. The big challenge is to find an on-line performance criterion that can decide what the best scale is. Further research should also look into the use of nonsymmetric kernels. The γCFAR can be interpreted as an estimator of the local intensity statistics. From this perspective, the use of the full kernel set, instead of only the $g_1$ and $g_{15}$ kernels motivated by the CFAR stencil, should improve the performance of the detector.

Figure 35 False alarms vs. frame. a) The number of raw detections (false detections) was counted before clustering, and b) the number of false alarms was counted to detect 345 targets (testing target set) after clustering through 127 frames of the Mission 90 Pass 5 SAR data set.

Figure 36 panel labels: a) natural clutter (trees and bushes); b) cultural clutter (man-made objects), annotated with bushes and grass, farm house, farm supplies, embedded target, cars, and trees.

Figure 36 A natural clutter region (a) from frames 121 and 122 and cultural clutter region (b) from frames 7 and 8 in the Mission 90 Pass 5 data set.
In the cultural clutter, 13 targets were embedded and marked by rectangular boxes.

Figure 37 The performance comparison of the CFAR and γCFAR detectors: a) CFAR detector, b) γCFAR detector. The CFAR detector yielded 28 false alarms while the γCFAR detector triggered only 4 false alarms.

6.3 False Alarm Reduction

6.3.1 CFAR/QGD

6.3.1.1 Training QGD

6.3.1.1.1 Training Data Preparation

In order to train the QGD, 275 embedded target image chips of a training set were used for the target class, and 550 clutter image chips were randomly selected from the 4455 false alarms caused by the two-parameter CFAR detector. Before the target and clutter image chips were collected, clustering was performed, and multiple detections within a target size (23 pixels long) were clustered into a centroid.

6.3.1.1.2 Optimal Weights by the Closed Form Solution

The least squares method is used to compute the optimal weights of the QGD on the training set. The weights are found such that the power of the differences between the system outputs and the corresponding desired responses is minimized ($L_2$ norm). Since the size of the training set (550 clutter and 275 target image chips) is greater than the number of weights (8 weights), the method solves an overdetermined system of linear equations by (85). It is a parametric least squares problem because the least squares solution also depends on the parameters $\mu_1$ and $\mu_{15}$. The parameters and weight vector of the QGD can be obtained through an exhaustive search of the parameter space: for each combination of $\mu_1$ and $\mu_{15}$, the corresponding least squares solution is computed, and the optimal parameters and weight vector are those for which the number of false alarms in the training set is minimum. Figure 39a shows the false alarm surface in the μ parameter space.
The optimal indices of the parameters $\mu_1$ and $\mu_{15}$ are 8 and 16, which correspond to 0.274 and 0.654 respectively. Note that the optimal value of $\mu_1$ is much smaller than that of $\mu_1$ in the γCFAR detector, which means that the $g_1$ kernel of the QGD is much broader. Since the clustering points in the QGD training set are the centers of mass of the detections, the brightest pixels are not always placed at the centers of the image chips. The $g_1$ kernel of the QGD stretches out to capture the bright pixel intensities, which usually help discriminate targets from clutter. So $g_1$ ends up being a broader kernel than the one in the γCFAR stencil. Another distinguishing feature of the QGD false alarm surface is that it shows small numbers of false alarms at several different locations in the parameter space (local minima in the false alarms). This is because, in addition to the parameters (the μ's), the QGD has 8 weights, which allow for more discriminating power than the γCFAR detector. That is, the optimal weights counterbalance improper choices of the parameters. The SSEs of the QGD were computed over the parameter space in the same way as the false alarms, and the SSE surface is shown in Figure 39b. Compared to the false alarm surface, the SSE surface is much smoother, and the optimal indices of $\mu_1$ and $\mu_{15}$ yielding the least SSE are 4 and 16, which correspond to 0.128 and 0.654. The optimal $\mu_{15}$ in the false alarm surface is the same as the optimal $\mu_{15}$ in the SSE surface, but the least SSE is found at a smaller value of $\mu_1$ than the minimum false alarm. However, the optimal points of the SSE and false alarm surfaces are close to each other in the parameter space.
This implies that, as in the case of the LS method for the weights ($w_1, \ldots, w_8$) with an exhaustive search of the parameter space, the QGD may be able to obtain comparable false alarm performance from a gradient descent method that adaptively computes the free parameters ($\mu_1$, $\mu_{15}$, $w_1, \ldots, w_8$), provided the SSE approaches the global minimum in the 10-dimensional parameter space ($\mu_1$, $\mu_{15}$, $w_1, \ldots, w_8$) during training.

Table 1 Detection performance of QGD in the training set.

| Quadratic Gamma Detector (QGD) | 100% | 99% | 98% | 95% | 92% | SSE |
|---|---|---|---|---|---|---|
| No. of false alarms | 14 | 14 | 8 | 4 | 3 | 0.0273 |

Figure 39 Performance surfaces of the QGD in parameter space. a) False alarm surface, b) SSE surface with optimal weights.

Figure 40 The discriminating performance of the QGD in the training set. a) ROC curve, b) QGD outputs for the inputs of clutter and target image chips after training (horizontal axis: image chip index).

In order to detect all 275 targets in the training set, the QGD yielded 14 false alarms at the 100% and 99% detection rates; for 98%, 95% and 92% target detection, 8, 4, and 3 false alarms occurred respectively (Table 1). As shown in Figure 40b, the clutter and target outputs were well separated.

6.3.1.1.3 Testing the QGD

Given a set of parameters, the QGD training seeks optimal weights by minimizing the sum of squared errors between the outputs and their corresponding desired responses. The optimal set of parameters ($\mu_1$ and $\mu_{15}$) is the one that leads to the minimum number of false alarms in the training set. With the optimal set of parameters and optimal weights, the QGD was tested on different sets of clutter and target image chip embeddings. Table 2 shows the performance of the QGD in the testing phase in terms of SSE and false alarms at given detection rates.
The QGD exhibits a discrimination power that reduces 3905 false alarms to 385, 118, 97, 53 and 42 false alarms for 100%, 99%, 98%, 95%, and 92% detection rates respectively. For all clutter image chips (4455 false alarms caused by the two-parameter CFAR detector), the false alarms were reduced by the QGD to 422, 132, 109, 57, and 44 at 100%, 99%, 98%, 95%, and 92% detection rates respectively. A discrimination power of about 1:10 (422/4455) was thus obtained by the QGD at 100% target detection in the false alarm reduction stage of the ATD/R system (Figure 2).

Figure 41 Testing results of the QGD for testing set #1.1. a) ROC curve, b) outputs of the QGD for the clutter/target image chips.

Figure 42 Testing results of the QGD for testing set #2. a) ROC curve, b) outputs of the QGD for the clutter/target image chips.

Table 2 Detection performance of QGD in testing.

| Quadratic Gamma Detector (QGD) | 100% | 99% | 98% | 95% | 92% | SSE |
|---|---|---|---|---|---|---|
| No. of false alarms in testing set #1.1 | 385 | 118 | 97 | 53 | 42 | 0.0202 |
| No. of false alarms in testing set #2 | 422 | 132 | 109 | 57 | 44 | 0.0192 |

Note that testing set #1.1 contains 3905 clutter image chips (the 4455 false alarms from the two-parameter CFAR detector minus the 550 training clutter image chips) and 345 target image chips embedded over 127 frames (about 7 km²) of the Mission 90 Pass 5 data set for testing purposes. Testing set #2 includes all the clutter image chips (4455) and the 345 target image chips embedded for testing.

6.3.1.2 Training the QGD in an Iterative Manner

In the previous section, an optimal weight set of the QGD was computed by LS for given $\mu_1$ and $\mu_{15}$, and the optimal parameter set ($\mu_1$ and $\mu_{15}$) was selected as the one giving the minimum false alarms for 100% detection after an exhaustive search of the 2-D parameter space. Here the weights and the parameters ($\mu_1$ and $\mu_{15}$) were adaptively computed in an iterative manner by a gradient descent method during training. At each iteration, the learning and momentum rates were changed to accelerate the convergence of the weight and parameter variables: when the current error was smaller than the error at the previous iteration, the learning and momentum rates were increased by small amounts; otherwise they were decreased to 70% of their values at the previous iteration. In the QGD feature set, the intensity mean values are much smaller than the intensity squared means, which causes a wide spread in the optimal weight values. So the weight and μ adaptation were performed on a whitened feature set at each iteration. The training was stopped when the change in the number of false alarms for 100% detection was less than two false alarms for 3500 iterations. Figure 43a and Figure 43b show the learning curve and false alarm curve of the QGD. The number of false alarms for 100% detection decreases as the error decreases during training.

Figure 43 ... after training the QGD, d) outputs of the QGD from the training set after training the QGD, e) ROC curve from the testing set, and f) adaptation of the parameters ($\mu_1$ and $\mu_{15}$).

Figure 44 Adaptation of weights (horizontal axes: iterations).

In the LS solution, the optimal set of parameters was chosen based on the minimum number of false alarms. In this iterative method, the parameters were adaptively computed in the sense of minimizing the SSE.
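The rate schedule used during the iterative training can be sketched as follows. This is an illustration, not the thesis code: the function name `adapt_rates` is hypothetical, the 5% growth factor stands in for the unspecified "small amounts" (which may equally have been additive), and the cap on the momentum rate is an added safeguard. Only the reduction to 70% is taken from the text.

```python
def adapt_rates(lr, momentum, error, prev_error,
                grow=1.05, shrink=0.7, max_momentum=0.99):
    """Adjust learning and momentum rates after one training iteration.

    shrink=0.7 follows the text (rates drop to 70% when the error does not
    decrease). grow=1.05 and max_momentum=0.99 are assumptions: the text
    only says the rates were increased "by small amounts".
    """
    if error < prev_error:
        # Error decreased: speed up slightly.
        return lr * grow, min(momentum * grow, max_momentum)
    # Error did not decrease: back off to 70% of the previous rates.
    return lr * shrink, momentum * shrink

if __name__ == "__main__":
    lr, mom = 0.1, 0.5
    errors = [1.00, 0.90, 0.85, 0.95, 0.80]  # hypothetical SSE sequence
    for prev, cur in zip(errors, errors[1:]):
        lr, mom = adapt_rates(lr, mom, cur, prev)
        print(f"error {cur:.2f}: lr = {lr:.4f}, momentum = {mom:.4f}")
```

A schedule of this kind lets the step size grow while the error keeps falling and cuts it sharply after an overshoot, which is one common way to speed up gradient descent on a poorly scaled error surface.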
In Figure 43f and Figure 44, the parameter $\mu_{15}$ and the weights $w_2$ and $w_8$ did not seem to converge to fixed values. Since the training was stopped based on the change in the false alarms for 100% target detection, the outputs of the QGD after the training was stopped were well separated (Figure 43d). Table 3 shows the detection performance of the QGD trained in the iterative manner.

Table 3 Detection performance of the adaptively trained QGD in the training set.

| Quadratic Gamma Detector (QGD) | 100% | 99% | 98% | 95% | 92% | SSE |
|---|---|---|---|---|---|---|
| No. of false alarms | 35 | 14 | 8 | 6 | 4 | 0.0196 |

In the training set, the iteratively trained QGD is no better than the QGD trained by LS at any of the detection probabilities (100% to 92%): it gives the same number of false alarms at 99% and 98% and more at 100%, 95% and 92%.

Table 4 Detection performance of the adaptively trained QGD in the testing set.

| Quadratic Gamma Detector (QGD) | 100% | 99% | 98% | 95% | 92% | SSE |
|---|---|---|---|---|---|---|
| No. of false alarms | 598/695 | 142/162 | 106/122 | 48/55 | 36/42 | 0.0281/0.0273 |

In the testing set, as in the training set, the iteratively trained QGD yielded more false alarms at high probabilities of detection (100% to 98%) (Table 4). At about 95% and below, it yielded fewer false alarms than the QGD trained by LS.

6.3.1.3 Independent Testing Results for the QGD

A more complete study of the QGD was independently conducted at MIT Lincoln Laboratory using a large SAR database of real military targets and ground clutter. The objective of this study was to evaluate the performance of the QGD in aiding the detection stage of the SAR ATD/R system. The data for the first test case consisted of high resolution (1 ft × 1 ft) fully polarimetric SAR imagery preprocessed using the PWF. The QGD was trained on two target types using spotlight target data and on man-made discretes from stripmap clutter data. A total of 135 target image chips were chosen for training; these were 5 degrees apart in aspect angle (i.e., 5, 10, 15 degrees, etc.).
The clutter data used for training consisted of 100 typical man-made discretes. Evaluation of this test case was performed using spotlight target and stripmap clutter data. As in the training stage, spotlight data of two targets that were 5 degrees apart in aspect angle (i.e., 3, 8, 13, 18 degrees, etc.) were used for testing. The test clutter data consisted of 4727 stripmap clutter image chips extracted from a total area of 56 km². Thus, the test data set for this experiment was composed of 139 target image chips and 4727 clutter image chips. The QGD was evaluated by first running the data through the two-parameter CFAR detector (i.e. the QGD was run only over the image chips that triggered the two-parameter CFAR detector). Then, the ROC curves were obtained by computing the cumulative number of false alarms from each detector. At a probability of detection of 1.0 ($P_d = 1.0$), the CFAR algorithm detected 139 targets and had 2499 false alarms, whereas the QGD, while also detecting 139 targets, reduced this number of false alarms to 715 (Figure 45). The second test case used single-channel (HH) stripmap imagery with a resolution of 1 m × 1 m. The training set for the QGD consisted of 52 target image chips and 150 clutter image chips that represented two types of targets and man-made clutter. The evaluation of the QGD was performed using 75 target image chips and 44599 clutter image chips, which triggered the two-parameter CFAR detector when analyzing an area of 231 km².

Figure 45 Discriminant performance of the QGD versus CFAR (horizontal axis: false alarms per km²).

Plot annotations: "HH 1M Data Processed via Neural Network (U of Florida)": 44599 clutter chips, 75 target chips; at Pd of 1.0 there are 75 targets detected and 20910 false alarms. "HH 1M Data CFAR Analysis": 44595 clutter chips, 75 target chips.

Figure 46 shows the results.
At $P_d$ = 100%, the two-parameter CFAR detector had 39,709 false alarms, while the QGD also detected all 75 targets and had only 19,037 false alarms. These two experiments constitute a comprehensive test of the performance of the QGD. As we expected, the generalization capability of the QGD is very good, which should not be surprising since the QGD has only 8 free parameters. Consequently, in practice the QGD can be trained for robust performance. The better discriminant characteristics of the QGD may be due to better estimates of the mean and standard deviation for targets and clutter, or to the larger feature space.

6.3.1.4 Conclusion

The appeal of the QGD is that the scale of the regions where the statistics are estimated can be adapted during training using the error at the output of the detector. The implementation that we chose uses only a subset of the gamma kernels ($g_1$ and $g_{15}$). It was derived by analogy with the two-parameter CFAR detector to enable a straightforward comparison with this widely used algorithm. In the QGD, the estimates of the local statistics are obtained by convolution with the $g_1$ kernel (cell under test) and the $g_{15}$ kernel (local neighborhood). The QGD is superior to the conventional two-parameter CFAR detector in terms of detection performance. The Lincoln Laboratory testing also shows that the QGD can improve the false alarm rate of the two-parameter CFAR detector without affecting the probability of detection.

6.3.2 CFAR/NL-QGDs

6.3.2.1 $L_2$ Based Training and Testing of the NL-QGDs

In order to compare the performance of the QGD and the NL-QGD on the same feature values, the optimal parameters of the QGD, $\mu_1$ and $\mu_{15}$, were also used for the NL-QGD. Since the QGD features are highly correlated with one another, it is desirable to whiten the training data set for the adaptation of the weights in the NL-QGD training to achieve fast convergence. The following matrix indicates the correlation coefficient matrix of the QGD training set.
As seen in the matrix, the first column shows high correlation among the values of the 1st, 3rd, 5th and 7th rows ($g_1 \bullet X$, $g_1 \bullet X^2$, $(g_1 \bullet X)^2$ and $(g_1 \bullet X)(g_{15} \bullet X)$), and the second column exhibits high correlation among the values of the 2nd, 4th, 6th and 7th rows ($g_{15} \bullet X$, $g_{15} \bullet X^2$, $(g_{15} \bullet X)^2$ and $(g_1 \bullet X)(g_{15} \bullet X)$). [...]

[...] between about the 200th and 1000th epoch and show saturation after about the 1000th epoch. This implies that the outliers of the target class produce very low output values after the training and that the $L_2$ norm may not be a good criterion for training a detector. So the target outliers should be prevented from producing very low output values. Figure 48 shows the ROC plots and outputs of the QGD and NL-QGD after the training. Figure 49 and Figure 50 plot the ROC curves and the outputs of the NL-QGD751 in the testing sets. The performance of the QGD and NL-QGD is compared for different network sizes, and the SSE and the number of false alarms are presented for both systems in Table 5 and Table 6. In the training set, the ROCs of the QGD and the NL-QGD are comparable. In the testing sets, the NL-QGD outperforms the QGD at most detection rates. The NL-QGD is able to provide a smaller final SSE than the QGD. However, the performance of a detector is measured not in terms of SSE but in terms of the number of false alarms for a given detection accuracy. Hence, the ROC curves of the two systems must be compared. The comparisons were restricted to networks with one hidden layer, and the hidden layer size was increased from 3 to 7 nodes. The NL-QGD did not perform as well as the QGD at the 100% detection rate. The detector output of the NL-QGD tended to produce large misclassifications, which affected the selection of the threshold. However, at 99%, 98% and 95% detection probability the NL-QGD outperformed the QGD. Note that changing the number of hidden nodes from 3 to 7 did not seem to affect the performance of the NL-QGD much.
Figure 47 Learning and false alarm curves of the NL-QGD751 trained based on the $L_2$ norm (panel title: learning and generalization curves).

Figure 48 Training results of the $L_2$ normed NL-QGD751. a) ROC curves, b) outputs of the NL-QGD751 for the training set.

Figure 49 Testing results of the $L_2$ normed NL-QGD751 for testing set #1.1. a) ROC curves, b) outputs of the NL-QGD751.

Figure 50 Testing results of the $L_2$ normed NL-QGD751 for testing set #2. a) ROC curves, b) outputs of the NL-QGD751.

Table 5 Detection performance of NL-QGD in training set (number of false alarms at each detection rate).

| Network Topologies | 100% | 99% | 98% | 95% | 92% | SSE | Stop iterations |
|---|---|---|---|---|---|---|---|
| NL-QGD731 | 80 | 6 | 3 | 3 | 2 | 0.005039 | 4156 |
| NL-QGD751 | 80 | 6 | 3 | 3 | 2 | 0.005038 | 4139 |
| NL-QGD771 | 80 | 6 | 3 | 3 | 2 | 0.005048 | 4117 |

NL-QGD731 indicates that the network has 7 input nodes, 3 nodes in the hidden layer, and 1 output node. For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.1 and 0.02 respectively for all the networks.

Table 6 Detection performance of NL-QGD in testing (No. of false alarms in testing set #1.2 / No. of false alarms in testing set #2).

| Network Topologies | 100% | 99% | 98% | 95% | 92% | SSE |
|---|---|---|---|---|---|---|
| QGDls | 340/422 | 105/132 | 85/109 | 45/57 | 35/44 | 0.0204/0.0192 |
| QGDadapt | 514/695 | 115/162 | 86/122 | 41/55 | 32/42 | 0.0292/0.0273 |
| NL-QGD731 | 1456/1892 | 96/120 | 67/83 | 35/46 | 30/37 | 0.006263/0.005848 |
| NL-QGD751 | 1453/1889 | 96/120 | 67/83 | 35/46 | 30/37 | 0.006261/0.005846 |
| NL-QGD771 | 1445/1880 | 97/120 | 67/83 | 35/46 | 30/37 | 0.006262/0.005847 |

Note that testing set #1.2 contains 3355 clutter image chips (the 4455 false alarms from the two-parameter CFAR detector minus the 550 training and 550 cross validation clutter image chips) and 345 target image chips embedded over 127 frames (about 7 km²) of the Mission 90 Pass 5 SAR data set for testing purposes.
The NL-QGD performed better than the QGD at most detection probabilities. However, at 100% detection its performance is inferior to that of the QGD. By analyzing the detection outputs (Figure 48b), we can observe that the NL-QGD did a much better job in terms of SSE, but when it gave the wrong output, the error was much larger than the QGD's. We can expect this from the nonlinear nature of the detector. In terms of the number of false alarms, this behavior affects the performance because the threshold for 100% detection has to be set at the smallest value obtained from the target image chips.

6.3.2.2 Training and Testing of the NL-QGDs Without Non-Target Outliers

In the $L_2$ normed training, it was observed that the NL-QGDs produced very low output values for the target outliers and high values for the non-target outliers. By removing the non-target outliers during training, the decision boundaries of the NL-QGDs move toward the target outliers, so that the output values of the NL-QGDs for the non-target outliers increase. This causes the threshold for a high probability of target detection to increase. The training procedure for the NL-QGDs is as follows: first, the NL-QGDs are trained based on the $L_2$ norm, and the number of false alarms is computed at each iteration during training; second, when the number of false alarms in the cross validation set starts increasing, the non-target sample that produced the maximum output is removed from the training set; third, this procedure continues until the number of false alarms does not change for 250 iterations. Figure 51 shows the ROC curve and the outputs of the NL-QGDs after the training was stopped. As expected, the output values from the target outliers were significantly increased, along with high output values from the non-target outliers.
This is because the decision boundary moved toward the target outliers so that the distances between the non-target outliers and the decision boundary became larger.

Figure 51 Training results of the NL-QGD751 trained without non-target outliers. a) ROC curves, b) outputs of the NL-QGD751 for the training set.

Figure 52 Testing results of the NL-QGD751 trained without non-target outliers for testing set #1.2. a) ROC curves, b) outputs of the NL-QGD751.

Figure 53 Testing results of the NL-QGD751 trained without non-target outliers for testing set #2. a) ROC curves, b) outputs of the NL-QGD751.

Table 7 summarizes the training results of the NL-QGDs trained on the $L_2$ norm with removal of non-target outliers during training, for different network sizes (3, 5 and 7 nodes in the hidden layer). During training, the numbers of non-target outliers removed from the training set were 8, 9, and 6 respectively. Note that the stop iterations for all network sizes are extremely small compared to the case without removal of non-target outliers. Figure 52 and Figure 53 show the ROC curves and the outputs of the NL-QGD751 for the two testing sets. The minimum target output value is much higher than that of the NL-QGD751 trained on $L_2$ without removal of non-target outliers.

Table 7 Detection performance of the NL-QGDs trained without non-target outliers (training set; number of false alarms at each detection rate).

| Network Topologies | 100% | 99% | 98% | 95% | 92% | SSE | No. of non-target outliers removed | Stop iterations |
|---|---|---|---|---|---|---|---|---|
| NL-QGD731 | 13 | 13 | 8 | 4 | 3 | 0.00648 | 8 | 567 |
| NL-QGD751 | 14 | 13 | 9 | 6 | 4 | 0.00618 | 9 | 342 |
| NL-QGD771 | 14 | 12 | 9 | 5 | 5 | 0.00630 | 6 | 426 |

For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.196 and 0.02 respectively for all the networks.

In the testing sets, some of the non-target outputs are very large due to the boundary shift toward the target outliers. The detection performance of the NL-QGDs in the testing sets is summarized in Table 8.
The number of false alarms at the 100% detection rate is significantly reduced compared to the $L_2$ case without removal of non-target outliers for all network sizes (3, 5, 7 nodes). The detection performance at the 99% detection rate is slightly better than in the $L_2$ case without removal of non-target outliers. Below the 98% detection rate, the performance became worse because training with removal of non-target outliers led the NL-QGDs to lose their generalization capability.

Table 8 Detection performance of the NL-QGD trained without non-target outliers (testing sets; No. of false alarms in testing set #1.2 / No. of false alarms in testing set #2).

| Network Topologies | 100% | 99% | 98% | 95% | 92% | SSE |
|---|---|---|---|---|---|---|
| NL-QGD731 | 293/365 | 103/130 | 74/96 | 45/58 | 35/45 | 0.01047/0.01030 |
| NL-QGD751 | 250/315 | 97/125 | 77/101 | 49/63 | 37/49 | 0.01768/0.01741 |
| NL-QGD771 | 246/311 | 98/126 | 74/97 | 48/61 | 37/48 | 0.01624/0.01596 |

6.3.2.3 $L_p$ Based Training and Testing of the NL-QGD

As observed in the NL-QGD training based on $L_2$, some of the target outputs of the NL-QGDs are very low. Since the smallest target output of the NL-QGD is used as the threshold for 100% detection, low target outputs cause a considerable number of false detections at high detection probability. One way of alleviating large deviations of the NL-QGD outputs from their desired values is to use a larger norm (p > 2) than the $L_2$ norm. Choosing p = 8 as a large norm, a BP algorithm with momentum was used to train the NL-QGDs. The $L_8$ norm is large enough to penalize large errors appropriately. The same sets were used for training and cross validation of the NL-QGDs. The training was stopped when the cost in the cross validation set started increasing. Figure 54 shows the learning and generalization curves, and the false alarm curves, of the NL-QGD751 in the training and cross validation sets.
As opposed to the L2 norm based training, the false alarm curves decreased as the training epochs increased, with a period of false alarm increase between about the 200th and 1000th epochs. The decrease in false alarms is broadly in agreement with the minimization of the cost during training. Table 9 illustrates the detection performance of the NL-QGD with 3, 5, and 7 nodes at 100%, 99%, 98%, 95%, and 92% detection rates. The L8 normed NL-QGDs yielded higher SSEs than the QGD and the L2 normed NL-QGDs after training. However, fewer false alarms occurred for all NL-QGDs with the L8 norm than for the QGD trained by least squares. This implies that large errors were weighted more heavily than in the L2 case, and the false alarms were therefore greatly reduced. It also indicates that the SSE criterion is not the best choice from a detection standpoint. The threshold for 100% detection was increased, thus avoiding an excessive number of false alarms due to a low threshold. The NL-QGDs with a large norm also show more robust performance than the NL-QGD with L2 and the QGD in the ROC curves in Figure 55a, Figure 56a, and Figure 57a.

During the training, the effective learning rate at the output layer and hidden layer [74] is modified to $\eta(n) = \bar{\eta}\,|e(n)|^{p-2}$, where $\bar{\eta}$ is the learning rate at p = 2 and $e(n)$ is the error at iteration n at the output layer. As the errors become smaller with increasing iteration number, the convergence of the NL-QGDs based on a large norm slows down because of the (p - 2)-powered pre-multiplication factor. Training iterations of the NL-QGDs based on the L8 norm are about 10 times larger compared to the QGD for 3, 5, and 7 nodes (Table 9).

Table 10 shows the L8 normed NL-QGD testing over the same testing sets as for the QGD and L2 normed NL-QGD cases. The ROC curves and the outputs of the L8 normed NL-QGDs are plotted in Figure 56 and Figure 57. The large errors for the two classes were reduced due to the larger norm (L8).
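The slowdown caused by the p - 2 powered pre-multiplication factor can be illustrated numerically. The Python sketch below assumes the effective rate has the form base_rate · |e|^(p-2), which matches the description above but is not taken verbatim from the thesis; the function name and the sample error values are hypothetical.

```python
def effective_learning_rate(base_rate: float, error: float, p: float) -> float:
    """Learning rate scaled by |e|^(p-2); p = 2 leaves base_rate unchanged."""
    return base_rate * abs(error) ** (p - 2.0)


# Hypothetical error magnitudes shrinking as training proceeds.
for err in (1.0, 0.5, 0.1):
    rate_l2 = effective_learning_rate(0.9, err, 2.0)
    rate_l8 = effective_learning_rate(0.9, err, 8.0)
    print(f"|e| = {err}: L2 rate = {rate_l2}, L8 rate = {rate_l8}")
```

Under this assumption, an error of 0.1 scales the L8 step by 0.1^6, which suggests why many more iterations are needed than with the L2 norm.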
The NL-QGDs with 3, 5, and 7 nodes yielded about 200 more false alarms than the QGD at 100% detection, but fewer false alarms than the QGD below 100% detection.

Figure 54 Learning curve and false alarm curve for 100% target detection of the L8 normed NL-QGD751.

Figure 55 Training results of the L8 normed NL-QGD751. a) ROC (NL-QGD751: dark line; QGD: light line), b) outputs of the L8 NL-QGD751 for the training set (275 target outputs, 550 non-target outputs).

Figure 56 Testing results of the L8 normed NL-QGDs with 3, 5, 7 nodes for the testing set #1.2. a) ROC curves, b) outputs of the L8 normed NL-QGD751.

Figure 57 Testing results of the NL-QGDs with 3, 5, 7 nodes for the testing set #2. a) ROC curves, b) outputs of the NL-QGD751 (345 target outputs, 4455 non-target outputs).

Table 9 Detection performance of L8 normed NL-QGD in training set. The detection rate columns give the number of false alarms.

| Network topology | 100% | 99% | 98% | 95% | 92% | SSE | Costs** | Stop iterations |
|---|---|---|---|---|---|---|---|---|
| NL-QGD731 | 10 | 10 | 3 | 2 | 2 | 0.02168 | 2.550e-5 | 43144 |
| NL-QGD751 | 12 | 10 | 5 | 3 | 1 | 0.02171 | 2.559e-5 | 48634 |
| NL-QGD771 | 11 | 10 | 3 | 3 | 2 | 0.02221 | 2.590e-5 | 41819 |

For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.9 and 0.7, respectively, for all the networks.

Table 10 Detection performance of L8 normed NL-QGD in testing. Each entry is given as testing set #1.2 / testing set #2; the detection rate columns give the number of false alarms.

| Network topology | 100% | 99% | 98% | 95% | 92% | SSE | Costs |
|---|---|---|---|---|---|---|---|
| NL-QGD731 | 522/652 | 95/119 | 57/73 | 31/42 | 25/31 | 0.019355/0.018434 | 6.877e-5/6.144e-5 |
| NL-QGD751 | 506/632 | 91/115 | 54/70 | 32/43 | 22/29 | 0.018682/0.017697 | 7.041e-5/6.114e-5 |
| NL-QGD771 | 512/640 | 91/114 | 57/73 | 31/42 | 24/30 | 0.019235/0.018255 | 6.720e-5/5.862e-5 |

The L8 normed NL-QGDs outperformed the L2 normed NL-QGDs in ROC for all of the 3, 5, and 7 node cases. The training performances of the L8 normed NL-QGDs are shown to be more robust than the QGD and the L2 normed NL-QGD cases. The NL-QGD with 3 nodes shows better detection performance than the cases with 5 and 7 nodes between about 200 and 300 target detections in Figure 56a and Figure 57a, but in the other operating ranges the detection performances are comparable for the 3, 5, and 7 node cases.

6.3.2.4 Training and Testing the NL-QGD with Mixed Lp Norms

In order to reduce the effect of non-target outliers on the formation of the decision boundary in the input space, a small norm (p = 1.1) was imposed on the non-target class and a large norm (p = 8) on the target class. Table 11 shows the detection performance of the L1.1/L8 normed NL-QGDs with 3, 5, and 7 nodes. A BP algorithm with momentum was used to train the L1.1/L8 normed NL-QGDs. The algorithm was modified only at the output layer, for its local gradient, according to (142). The weight adaptation was performed after presenting all training exemplars, and the training was stopped when the cost in the cross validation set started increasing. In Figure 58b, the false alarm curves for both the training and cross validation sets decrease as the training iterations increase. The smaller norm (L1.1) imposed on the non-target class emphasizes small errors and deemphasizes large errors. In Table 11, the L1.1/L8 normed NL-QGD outperformed the QGD at high detection rates (100%, 99%, 98%, 95%, 92%) after the training.

Table 11 Detection performance of mixed norm NL-QGD in training set.
| Network topology | 100% | 99% | 98% | 95% | 92% | SSE | Costs** | Stop iterations |
|---|---|---|---|---|---|---|---|---|
| NL-QGD731 | 12 | 12 | 10 | 5 | 4 | 0.00682 | 0.4393 | 76775 |
| NL-QGD751 | 12 | 12 | 10 | 5 | 4 | 0.00662 | 0.4345 | 82400 |
| NL-QGD771 | 12 | 12 | 10 | 5 | 4 | 0.00664 | 0.4358 | 79747 |

For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.0001 and 0.00001, respectively, for all the networks.

Figure 58 Learning curve and false alarm curve for 100% target detection of the L1.1/L8 normed NL-QGD751.

Figure 59 Training results of the L1.1/L8 normed NL-QGD751. a) ROC curve, b) outputs of the L1.1/L8 normed NL-QGD751 for the training set.

Figure 60 Testing results of the L1.1/L8 normed NL-QGD751 for the testing set #1.2. a) ROC curves, b) outputs of the L1.1/L8 normed NL-QGD751.

Figure 61 Testing results of the L1.1/L8 normed NL-QGD751 for the testing set #2. a) ROC curves (NL-QGD751: dark line; QGD: light line), b) outputs of the L1.1/L8 normed NL-QGD751 (4455 non-target outputs).

Table 12 Detection performance of mixed norm NL-QGD in testing. Each entry is given as testing set #1.2 / testing set #2; the detection rate columns give the number of false alarms.

| Network topology | 100% | 99% | 98% | 95% | 92% | SSE |
|---|---|---|---|---|---|---|
| NL-QGD731 | 232/292 | 97/124 | 76/101 | 40/63 | 36/45 | 0.00830/0.00795 |
| NL-QGD751 | 252/316 | 96/123 | 76/101 | 48/63 | 36/45 | 0.00833/0.00800 |
| NL-QGD771 | 251/315 | 96/123 | 76/101 | 48/63 | 36/45 | 0.00833/0.00799 |

The performance of the L1.1/L8 normed NL-QGDs in the testing sets is presented for different network sizes in Table 12. In the 98% to 100% detection range of the ROC shown in Figure 60a and Figure 61a, the L1.1/L8 normed NL-QGDs improved the false alarm rates compared to the QGD and yielded the lowest false alarm rate at 100% detection among the QGD and the L2 normed and L8 normed NL-QGDs for the three different network sizes.
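The class-dependent weighting of errors under a mixed norm can be sketched as follows. This Python example assumes a per-sample cost of (1/p)|e|^p with p chosen by class, so that the error term is sign(e)·|e|^(p-1); it does not reproduce equation (142) of the thesis, and the function names and the error values 0.9 and 0.1 are hypothetical.

```python
import math


def mixed_norm_error_term(desired: float, output: float, is_target: bool,
                          p_target: float = 8.0, p_nontarget: float = 1.1) -> float:
    """Derivative of (1/p)|e|^p with respect to e, with p chosen by class."""
    e = desired - output
    p = p_target if is_target else p_nontarget
    return math.copysign(abs(e) ** (p - 1.0), e)


def emphasis_ratio(is_target: bool) -> float:
    """How much more a large error (0.9) is weighted than a small one (0.1)."""
    large = mixed_norm_error_term(1.0, 0.1, is_target)
    small = mixed_norm_error_term(1.0, 0.9, is_target)
    return large / small


print("target class (p = 8):      ", emphasis_ratio(True))
print("non-target class (p = 1.1):", emphasis_ratio(False))
```

For comparison, the same ratio under the L2 norm is 9. The small norm brings it close to 1, so large non-target errors such as outliers have little extra pull, while the large norm makes large target errors dominate the update.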
Consequently, imposing a large norm on the target class moved the decision boundary towards the outliers of the target class, so that the L1.1/L8 normed NL-QGDs did not produce large errors for the target outliers. The detection thresholds were therefore set higher at high probabilities of detection, so that more non-target inputs could be discriminated. The small errors from the target inputs were deemphasized more, so the detection performance became worse below a detection rate of about 92% (Figure 60a and Figure 61a).

6.3.2.5 Training and Testing the NL-QGD with Cross Entropy

The cross entropy cost function was adopted as a network optimality index to maximize the likelihood of observing the training data set. The NL-QGDs trained on the cross entropy put more emphasis on both small errors and large errors than when trained with the Lp (p > 2) norms. The training of the NL-QGDs was stopped at the epoch where the cross entropy function started increasing. In Figure 62b, the false alarm curves for both the training set and the cross validation set decrease as the training epochs increase. As in the L8 case, more emphasis was put on large errors than on small errors. However, the cross entropy function also puts more emphasis on small errors, which are deemphasized in the large norm case. This improves the detection performance in the low detection probability ranges. The NL-QGDs outperformed the QGD and the NL-QGDs trained on Lp norms (Table 13 and Table 14). The ROCs of the NL-QGDs with cross entropy exhibited much more robust detection capability, as compared in Figure 64a and Figure 65a.

Table 13 Detection performance of NL-QGD trained on cross entropy in training set.
| Network topology | 100% | 99% | 98% | 95% | 92% | SSE | Costs** | Stop iterations |
|---|---|---|---|---|---|---|---|---|
| NL-QGD731 | 11 | 9 | 3 | 1 | 1 | 0.00409 | 25.12 | 11272 |
| NL-QGD751 | 10 | 10 | 3 | 1 | 1 | 0.00404 | 24.95 | 11231 |
| NL-QGD771 | 10 | 10 | 3 | 1 | 1 | 0.00405 | 24.88 | 11689 |

For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.9 and 0.7, respectively, for all the networks.

Figure 62 Learning curve and false alarm curve for 100% target detection of the NL-QGD751s trained on the cross entropy function.

Figure 63 Training results of the NL-QGDs with a cross entropy function. a) ROC curves, b) outputs of the NL-QGD751 for the training set.

Figure 64 Testing results of the NL-QGDs with the cross entropy function for the testing set #1.2. a) ROC curves, b) outputs of the NL-QGD751 trained on the cross entropy.

Figure 65 Testing results of the NL-QGDs with the cross entropy function for the testing set #2. a) ROC curves (NL-QGD751: dark line; QGD: light line), b) outputs of the NL-QGD751 trained on the cross entropy (345 target outputs, 4455 non-target outputs).

Table 14 Detection performance of NL-QGD trained on cross entropy in testing set. Each entry is given as testing set #1.2 / testing set #2; the detection rate columns give the number of false alarms.

| Network topology | 100% | 99% | 98% | 95% | 92% | SSE |
|---|---|---|---|---|---|---|
| NL-QGD731 | 701/910 | 103/130 | 77/97 | 29/39 | 17/20 | 0.005356/0.004950 |
| NL-QGD751 | 820/1064 | 102/129 | 70/90 | 27/36 | 17/20 | 0.005312/0.004959 |
| NL-QGD771 | 885/1152 | 99/125 | 71/91 | 27/36 | 17/20 | 0.005309/0.004941 |

6.3.3 Summary of Detection Performance in CFAR/QGD and CFAR/NL-QGDs

The QGD reduced the number (4455) of false alarms caused by the two-parameter CFAR detector to 422 false alarms, thus achieving a discrimination power of about 1:10 (422/4455).
The independent test results from Lincoln Laboratory also showed very promising discrimination using the QGD over a large database of real-life data. The NL-QGDs were trained and tested based on different optimality indices (Lp norm, mixed Lp norm, and cross entropy). The optimality indices influenced the detection performance of the NL-QGDs. The Lp (p = 2) normed NL-QGDs produced smaller SSE than the QGD but yielded large errors for outliers. This lowered the threshold and therefore caused excessive false alarms at high detection probabilities.

The mixed norm approach was an effort to reduce false alarm rates at high probabilities of detection by imposing a larger norm (p > 2) on the target class and a smaller norm (p < 2) on the non-target class. Penalizing errors from the target class more heavily than those from the non-target class reduced the false alarms at the 100% detection rate. In the training, the non-target outliers produced large errors, but these large errors were penalized less by the smaller norm. Training the NL-QGDs based on L2 with removal of non-target outliers also effectively reduced the false alarms at 100% detection. Hence these two training methods effectively handled the non-target outliers, and their detection performances were superior to the QGD and to the NL-QGDs trained on the L8, L2, and cross entropy criteria. However, below the 100% detection rate these two training methods (mixed norm (L1.1/L8) and L2 norm with removal of non-target outliers during the training) deteriorated the detection performances of the NL-QGDs.

A large norm (p > 2) effectively reduced large errors and reduced the false alarms compared to the L2 normed NL-QGDs. At 100% detection, the L8 norm still led the NL-QGDs to produce more false alarms than the QGD, but reduced the false alarms by about a factor of three relative to the NL-QGDs trained on the L2 norm. The L8 normed NL-QGDs showed more robust detection performance below the 100% detection rate.
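The different emphasis that each optimality index places on small and large errors can be compared numerically. The Python sketch below looks only at the magnitude of the cost derivative with respect to the network output, assuming per-sample costs of (1/p)|e|^p for the Lp norms and -ln(y) for the cross entropy with a desired value of 1 and an output y = 1 - e in (0, 1). These cost forms, the output range, and the function names are assumptions made for illustration, not the thesis's definitions.

```python
def lp_output_gradient(error: float, p: float) -> float:
    """Magnitude of the derivative of (1/p)|e|^p with respect to the output."""
    return abs(error) ** (p - 1.0)


def cross_entropy_output_gradient(error: float) -> float:
    """Magnitude of the derivative of -ln(y) at y = 1 - |e|.

    Assumes outputs in (0, 1), so it is valid for 0 <= |e| < 1.
    """
    return 1.0 / (1.0 - abs(error))


for err in (0.05, 0.5, 0.95):
    print(err,
          lp_output_gradient(err, 2.0),
          lp_output_gradient(err, 8.0),
          cross_entropy_output_gradient(err))
```

Under these assumptions the cross entropy gradient stays near 1 for small errors, where the L8 gradient is nearly zero, and grows without bound as the error approaches 1, which is in line with the statement that cross entropy emphasizes both small and large errors.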
Contrary to the Lp norms, the cross entropy function was designed to maximize the likelihood of observing the training data set regardless of the pdf assumption on the training data set. For an infinitely large data set, this leads the network output to produce the a posteriori probability of a class given the input [5]. Since the outliers of each class have low probabilities (where the NL-QGDs trained on the cross entropy yielded large errors), many false alarms were produced in the range between 100% and 98% detection. However, below the 98% detection rate the detection performance is the most robust among the optimality indices used.

As another figure of merit for assessing detection performance, detection robustness is defined as the area under an ROC curve. The detection performances of the QGDs and of the NL-QGDs trained on the L2 norm with removal of non-target outliers and on the mixed norm (L1.1/L8) dropped rapidly below about the 90% detection rate. Table 15 summarizes the detection performance of the QGDs and the NL-QGDs based on the number of false alarms at high probabilities of detection (100%, 99%, 98%, 95%, and 92%).

Table 15 Rank of detection performance. [Table contents are not recoverable from the scanned source.]

Table 16 compares the detection robustness among the detectors. Note that in Table 15 the detectors marked by the same number of asterisks ('*') in each column have the same detection performance (the same number of false alarms).

Table 16 Robustness rank of detection performances of the detectors (QGD and NL-QGDs).
| Robustness rank | Detector |
|---|---|
| 1 | NL-QGD_CE |
| 2 | NL-QGD_L8 |
| 3 | NL-QGD_L2 |
| 4 | QGD_adapt |
| 5 | QGD_LS |
| 6 | NL-QGD_OL |
| 7 | NL-QGD_L1.1/L8 |

6.3.4 γCFAR/QGD and γCFAR/NL-QGDs

In Section 6.2, the two prescreeners, the two-parameter CFAR and γCFAR detectors, were run over the Mission 90 Pass 5 SAR data set and their detection performances were compared. In Section 6.3.1, the QGD and the NL-QGDs were trained and tested as false alarm reducers in the false alarm reduction stage after the two-parameter CFAR detector. In this section, the QGD and NL-QGDs trained in Section 6.3.1 and Section 6.3.2 are applied to the ROIs selected by the γCFAR detector. Retraining the QGD and the NL-QGD is not essential in conjunction with the γCFAR detector because many detection points (after clustering) produced by the γCFAR detector overlap with those produced by the two-parameter CFAR detector (Figure 38).

The detection performances of the QGD and NL-QGDs at high probabilities of detection (100% to 92%) are tabulated in Table 17. In general, the NL-QGDs trained on the L8 and L1.1/L8 norms showed the best detection performance for the different network sizes (3, 5, and 7 nodes) at the 100% detection rate. This implies that imposing a large norm on the target class raised the 100% detection threshold, so that the excessive false alarms caused by a low threshold were avoided. The NL-QGDs trained with removal of non-target outliers showed the second best detection performance, followed by the QGD trained with the LS method and with a gradient descent method, which produced 334 and 399 false alarms, respectively, at the 100% detection rate. The cross entropy criterion and the L2 norm led the NL-QGDs to yield relatively many false alarms at the 100% detection rate. The NL-QGDs trained with removal of non-target outliers still showed good discrimination ability at the 99% detection rate, whereas their detection performances were degraded in conjunction with the two-parameter CFAR detector.
The QGD_LS showed the second best detection performance at the 99% detection rate. The L1.1/L8 norm, the L8 norm, and the cross entropy function led the NL-QGDs to produce detection performance comparable with the QGD_adapt at the 99% detection rate. The NL-QGDs trained on the L2 norm produced the most false alarms at the 99% detection rate. At the 98% detection rate, the NL-QGDs_L8 produced the best detection performance, followed by the NL-QGDs_CE, NL-QGDs_L2, NL-QGDs_OL, QGD_LS, NL-QGDs_L1.1/L8, and QGD_adapt. Below the 95% detection rate, the NL-QGDs_CE and NL-QGDs_L8 produced few false alarms compared to the other norms. This agrees with the case of the QGD and NL-QGDs in conjunction with the two-parameter CFAR detector.

Table 17 Detection performance of the QGD and NL-QGDs in conjunction with the γCFAR detector (γCFAR/QGD and γCFAR/NL-QGDs). The detection rate columns give the number of false alarms.

| Network topology | 100% | 99% | 98% | 95% | 92% | SSE |
|---|---|---|---|---|---|---|
| QGD_LS | 334 | 100 | 94 | 74 | 52 | 0.06070 |
| QGD_adapt | 399 | 154 | 108 | 87 | 73 | 0.04580 |
| NL-QGD731_L2 | 677 | 239 | 90 | 47 | 38 | 0.02912 |
| NL-QGD751_L2 | 678 | 239 | 90 | 47 | 38 | 0.02911 |
| NL-QGD771_L2 | 677 | 239 | 90 | 47 | 38 | 0.02910 |
| NL-QGD731_OL | 237 | 98 | 93 | 76 | 57 | 0.05171 |
| NL-QGD751_OL | 207 | 102 | 92 | 79 | 68 | 0.07480 |
| NL-QGD771_OL | 204 | 98 | 93 | 78 | 63 | 0.07212 |
| NL-QGD731_L8 | 177 | 147 | 72 | 45 | 24 | 0.04623 |
| NL-QGD751_L8 | 160 | 148 | 71 | 41 | 23 | 0.04586 |
| NL-QGD771_L8 | 167 | 148 | 73 | 43 | 23 | 0.04639 |
| NL-QGD731_L1.1/L8 | 176 | 146 | 96 | 75 | 63 | 0.03937 |
| NL-QGD751_L1.1/L8 | 176 | 146 | 95 | 75 | 62 | 0.03987 |
| NL-QGD771_L1.1/L8 | 173 | 145 | 95 | 75 | 62 | 0.03977 |
| NL-QGD731_CE | 451 | 145 | 88 | 28 | 23 | 0.02384 |
| NL-QGD751_CE | 431 | 152 | 86 | 27 | 22 | 0.02401 |
| NL-QGD771_CE | 434 | 182 | 86 | 27 | 22 | 0.02404 |

6.3.5 Fast Implementation of γCFAR/QGD and γCFAR/NL-QGDs

Currently, the implementation of the γCFAR and QGD requires two different optimal sets of μ1 and μn. The γCFAR and the QGD find their optimal μ1 and μn values at different locations in the parameter space for their best target discrimination against clutter. This is because the γCFAR finds its optimal μ1 and μn with a restricted discriminant function, while the QGD seeks its optimal μ1 and μn with 8 degrees of freedom in the weights of the discriminant function. Another reason for the dissimilarity in the two optimal sets of μ1 and μn is that, for the γCFAR detector, the highest pixel intensities of the training image chips were placed at the centers of the chips, resulting in a peaky kernel (m = 1), while for the QGD the center points of the training image chips are centroids of detection points in the image, so that the highest pixel intensities are no longer placed at the chip centers. This led the kernel (m = 1) of the QGD to stretch out to capture high pixel values under the kernel, making it broader than that of the γCFAR detector.

Focus of attention for ATD/R requires fast processing of the input data. If the QGD uses the same μ1 and μn values as the γCFAR detector, the computational bandwidth of the QGD can be reduced because 5 features [feature list garbled in the scanned source] are required to be computed. This speeds up the focus of attention stage in the multi-stage ATD/R system. However, using the γCFAR's μ1 and μn will degrade the detection performance of the QGD. This is shown in Figure 39a. With the γCFAR's optimal μ1 and μ15 (μ1 index = 22 and μ15 index = 15), the QGD produced 280 false alarms (5 times the number of false alarms with the QGD's optimal μ1 and μ15 values). A way of obtaining reasonable detection performance from the QGD at the γCFAR's optimal μ1 and μ15 is to train the QGD with training image chips in which the highest intensities are placed at the chip centers.
This is because the QGD may capture its discrimination ability with a peakier kernel in the parameter space. Figure 66 shows the QGD false alarm surface obtained with the same training image chips used in Figure 39a, but with the highest pixel intensities at the chip centers. The minimum number of false alarms was 38 and occurred at μ1 index 17 and μ15 index 14 in the parameter space. In this false alarm surface, the number of false alarms was 76 at the γCFAR's optimal μ1 and μ15. With this training set, the QGD produced about 3.5 times fewer false alarms than the QGD with the training set used in Figure 39a. In conclusion, using the same μ1 and μ15 for both the γCFAR detector and the QGD speeds up the computation in the focus of attention stage and does not require an exhaustive search of the parameter space, so an optimal weight set of the QGD can be obtained by the LS solution given a predetermined set of μ1 and μ15 values. This is a big advantage in QGD training time and in real-time operation of the multi-stage ATD/R system.

Figure 66 False alarm surface versus μ1 and μ15. Note that in this training set the image chips have the highest pixel intensities at the chip centers.

CHAPTER 7 CONCLUSIONS AND FUTURE RESEARCH

7.1 Summary

In this study, a novel approach to the design of a focus of attention stage based on gamma kernels was proposed for a multi-stage ATD/R system. The focus of attention stage is composed of two substages: a front-end detection stage and a false alarm reduction stage. The front-end detection stage employs a prescreener to nominate potential target locations in the image. Here we discussed a common prescreener, the two-parameter CFAR detector, and extended it to the γCFAR detector. The two-parameter CFAR detector measures a target mean with a target masking kernel, and the clutter statistics of mean and variance with a clutter masking kernel in the CFAR stencil.
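The two-parameter CFAR measurement summarized above can be sketched in Python. The example assumes the commonly used form of the statistic, (target mean - clutter mean) / clutter standard deviation, which is then compared against a threshold. The stencil geometry (a single-pixel target mask and a one-pixel-wide square clutter ring), the function name, and the toy image are simplifications chosen for illustration, not the stencil used in this study.

```python
import math
from typing import Sequence


def two_parameter_cfar(image: Sequence[Sequence[float]], row: int, col: int,
                       target_half: int = 0, ring_half: int = 3) -> float:
    """(target mean - clutter mean) / clutter standard deviation at (row, col).

    The target mask is a square of half-width target_half. The clutter mask is
    the one-pixel-wide square ring at Chebyshev distance ring_half. The caller
    must keep the whole stencil inside the image.
    """
    target = [image[r][c]
              for r in range(row - target_half, row + target_half + 1)
              for c in range(col - target_half, col + target_half + 1)]
    clutter = [image[r][c]
               for r in range(row - ring_half, row + ring_half + 1)
               for c in range(col - ring_half, col + ring_half + 1)
               if max(abs(r - row), abs(c - col)) == ring_half]
    t_mean = sum(target) / len(target)
    c_mean = sum(clutter) / len(clutter)
    c_var = sum((v - c_mean) ** 2 for v in clutter) / len(clutter)
    # The small constant avoids division by zero in perfectly flat clutter.
    return (t_mean - c_mean) / math.sqrt(c_var + 1e-12)


# Toy 9x9 image: checkerboard clutter of 1s and 3s with a bright centre pixel.
image = [[1.0 + 2.0 * ((r + c) % 2) for c in range(9)] for r in range(9)]
image[4][4] = 12.0
print(two_parameter_cfar(image, 4, 4))  # bright pixel under test
print(two_parameter_cfar(image, 3, 3))  # ordinary clutter pixel under test
```

A pixel would be nominated as a potential target when the statistic exceeds a chosen threshold; the fixed ring size here is the kind of constraint that the γCFAR stencil relaxes.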
The γCFAR detector incorporates a set of 2-D gamma kernels into the γCFAR stencil, which relaxes the constraint of the fixed size of the CFAR stencil. A first order gamma kernel and a higher order gamma kernel (n > 1) constitute the γCFAR stencil, in which the target mean is computed by the first order gamma kernel and the clutter statistics of mean and variance are estimated by the higher order gamma kernel. The 2-D gamma kernels were obtained from their 1-D counterparts by a circularly symmetric rotation. The 2-D gamma kernels preserve the 1-D characteristic of memory depth in the 2-D domain. The memory depth is controlled by the kernel order n and the parameter μ, which determine the shape and the scale of the γCFAR stencil. Thus the γCFAR stencil can be adaptively set to better optimize a figure of merit (false alarm rates, detector output errors, etc.).

The two-parameter CFAR detector and the γCFAR detector were interpreted from a signal processing perspective. In the CFAR stencil, a local region of the image is decomposed by two local projection operators (a target masking kernel and a clutter masking kernel), which may be suboptimal due to the fixed size of the window. The γCFAR stencil, however, locally projects the region of the image under analysis onto a gamma basis formed by two gamma kernels (one having n = 1 and the other having order n > 1). The parameter μ provides a degree of freedom to rotate the gamma basis so that a figure of merit can be better optimized. In the detection performance comparison on the Mission 90 Pass 5 SAR data set, the γCFAR detector outperformed the two-parameter CFAR detector in each frame and yielded 760 false alarms, while the two-parameter CFAR detector produced 4455 false alarms (1:6 ratio). Overall, the ROC of the γCFAR detector exhibited more robust detection performance than that of the two-parameter CFAR detector.
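How the order n and the parameter μ set the shape and scale of a circularly symmetric gamma kernel can be illustrated with a short Python sketch. The radial form used here, r^(n-1)·exp(-μr) with r the distance from the stencil centre, is an assumption based on the 1-D gamma kernel shape rotated about the centre; it is not quoted from this passage, and the samples are normalized numerically rather than with an analytic constant. The function names are hypothetical.

```python
import math
from typing import List


def gamma_kernel_2d(order: int, mu: float, half_size: int) -> List[List[float]]:
    """Sample a circularly symmetric gamma kernel on a square grid.

    Assumed form: g(r) proportional to r^(order-1) * exp(-mu * r), where r is
    the distance from the stencil centre. Samples are rescaled to sum to 1.
    """
    size = 2 * half_size + 1
    kernel = [[0.0] * size for _ in range(size)]
    total = 0.0
    for i in range(size):
        for j in range(size):
            r = math.hypot(i - half_size, j - half_size)
            kernel[i][j] = r ** (order - 1) * math.exp(-mu * r)
            total += kernel[i][j]
    return [[v / total for v in row] for row in kernel]


def peak_radius(kernel: List[List[float]]) -> float:
    """Distance from the centre to the largest kernel sample."""
    half = len(kernel) // 2
    _, i, j = max((v, i, j) for i, row in enumerate(kernel) for j, v in enumerate(row))
    return math.hypot(i - half, j - half)


first_order = gamma_kernel_2d(1, 1.0, 12)   # peaks at the centre (target mask)
fifth_order = gamma_kernel_2d(5, 1.0, 12)   # ring-shaped (clutter mask)
print(peak_radius(first_order), peak_radius(fifth_order))
```

Under this assumed form the first order kernel is concentrated at the centre, while a higher order kernel forms a ring whose radius is about (n - 1)/μ, so changing n or μ moves and rescales the region used for the clutter estimates.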
After the entire image has been prescreened in the front-end detection stage, a more sophisticated algorithm is applied in the false alarm reduction stage to the locations nominated by the prescreener, to further discriminate objects of interest from clutter (false positives). The two-parameter CFAR detector was interpreted in terms of pattern recognition. The two-parameter CFAR detector implements a restricted linear discriminant function of quadratic terms of image intensities. From this perspective, the two-parameter CFAR detector could be improved: (1) it uses only some of the quadratic terms of image intensity at a pixel and its surroundings; (2) it implements a fixed parametric combination of these features; (3) there is little flexibility in the feature extraction because the target masking and the clutter masking in the stencil are ad hoc.

The QGD, an extended form of the two-parameter CFAR detector, improved these three aspects by exploiting all quadratic and linear terms of two image intensities (target mean intensity and clutter mean intensity) extracted in the γCFAR stencil and then constructing a linear discriminant function based on these features. The optimal weights are computed in closed form (LS solution) through an exhaustive search of the parameter space, but can also be found adaptively in the parameter and weight space in an iterative manner by a gradient descent method. Compared to the two-parameter CFAR detector, the QGD generalizes it with respect to: (1) the shape of the kernels used for estimating the mean and variance; (2) the selection of the weights of the decision function, which are not set a priori but are found through optimization; (3) the use of all quadratic and linear terms of the two intensity features (target mean and clutter mean).
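The idea of a linear discriminant over quadratic and linear terms, with weights found in closed form by least squares, can be sketched in Python. The feature list below (squares, cross product, linear terms and a bias for two scalar intensities) is illustrative only and may differ from the QGD's actual feature set. The solver is a generic normal-equations approach with Gaussian elimination, not the author's implementation, and all function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def quadratic_features(t_mean: float, c_mean: float) -> List[float]:
    """Quadratic and linear terms of two intensity features, plus a bias."""
    return [t_mean * t_mean, c_mean * c_mean, t_mean * c_mean, t_mean, c_mean, 1.0]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares_weights(samples: Sequence[Tuple[float, float]],
                          desired: Sequence[float]) -> List[float]:
    """Solve the normal equations (F^T F) w = F^T d for the weights w."""
    feats = [quadratic_features(t, c) for t, c in samples]
    dim = len(feats[0])
    ftf = [[sum(f[i] * f[j] for f in feats) for j in range(dim)] for i in range(dim)]
    ftd = [sum(f[i] * d for f, d in zip(feats, desired)) for i in range(dim)]
    return solve_linear_system(ftf, ftd)


# Hypothetical data generated from known weights, then recovered by LS.
samples = [(float(t), float(c)) for t in range(4) for c in range(4)]
true_w = [0.5, -0.25, 0.1, 1.0, -2.0, 0.3]
desired = [sum(w * f for w, f in zip(true_w, quadratic_features(t, c)))
           for t, c in samples]
weights = least_squares_weights(samples, desired)
print([round(w, 6) for w in weights])
```

Because the discriminant is linear in its weights, the fit reduces to one linear solve once the kernel parameters are fixed, which is why a predetermined parameter set makes training fast.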
So our conclusion is that not only the γCFAR detector but also the QGD can be viable alternatives to the two-parameter CFAR detector. The γCFAR detector can replace the two-parameter CFAR detector as a prescreener. Notice that the raw features that the QGD requires are a superset of the ones used for the γCFAR detector. So with one preprocessor (QGD), we can implement both a detector and a false alarm reducer (or a discriminator).

The NL-QGD extended the linear structure of the QGD into a nonlinear structure as an MLP. The NL-QGD has the potential to improve the detection performance since the MLP is capable of creating arbitrary discriminant functions. Both the QGD and the NL-QGD were tested and compared based on detection performance in ROC. The QGD reduced the 4455 false alarms triggered by the two-parameter CFAR detector to 422 false alarms at the 100% detection rate, achieving a discrimination power of about 1:10. The NL-QGDs trained on the L2 norm outperformed the QGD over most of the ROC range for different network sizes (3, 5, and 7 nodes) but produced excessive false alarms at the 100% detection rate due to the nature of a nonlinear system. Since the NL-QGD is intended to operate at high probabilities of detection, effort was put into reducing the false alarms at high detection rates. By incorporating a higher norm (p > 2) into the NL-QGD cost function, the NL-QGDs trained on L8 produced 665, 632, and 640 false alarms at the 100% detection rate. These false alarms still outnumber those produced by the QGD, but the L8 normed NL-QGDs greatly improved the performance below the 100% detection rate. The detection performance of the NL-QGD with mixed norms was improved at the 100% detection rate (292, 316, and 315 false alarms for 3, 5, and 7 nodes) and became inferior but comparable to the QGD below the 98% detection rate. The cross entropy function was finally used as an optimality index for the NL-QGD.
The NL-QGDs trained on the cross entropy function showed more robust performance compared to the QGD, the L 2 normed NL-QGDs, the Lg normed NL-QGDs, and the NL-QGD with the mixed norms (Lj 1 for non-target class and Lg for target class). With about 30% loss of targets, they were able to completely reject false alarms. 7.2 Future works The yCFAR stencil relaxed the constraint of a fixed size of the CFAR stencil. Even the 7 CFAR stencil can be adaptively set to optimize a figure of merit the kernel orders were a priori determined to cover a full scattering range of known targets. However, multi-scale objection detection may face the range uncertainties of different targets and the differences between the size shapes at target types projected over all aspects and elevations. In order to solve the expected problems, it may be necessary to utilize a full set of gamma kernels to cover the ranges at possible situations. Potential targets may be declared by the highest scores measured over the entire set of gamma kernels. The use of the full gamma kernel size for image analysis is left for further research. Also, the determination of the optimal order to estimate the background mean and variance has not been addressed yet, since there is a quasi-symmetry between a change in order and a change in |i. The projection on the circularly symmetric 2-D gamma kernels are complete so that a input image can be recovered from the projection. Extending 1-D gamma kernels to 2-D gamma kernels on which the projection could be complete will allow for image reconstruction from a projection. The image analysis can be accomplished by the 2-D gamma kernels which can be pdfs. PAGE 180 APPENDIX POLARIZATION BASIS TRANSFORMATION A. 
1 Representation of a Plane Wave Polarization A plane wave E, propagating in the +z-direction can be express as E = {E^u^ + EyUy)e-'^^ (153) In (153), both Ex and Ey are complex and can be written for a timevarying plane wave as Â£, = Â£Â„e''Â“, (154) where e is the ralative phase difference between Ex and Ey, both of which are travelling in the -i-z-direction. The time-varying plane wave IE is expressed as IE = Re{E} = Eg^cos (wt-kz)Ux + E^yCos (wtkz-e)Uy (155) = W,u^ + Wyiiy where W-x = Eoj^cos{wt kz) (156) ^y = E^yCOs{wt-kz-e) . (157) Expanding the expression for Ey into W-y/Egy cos {wt kz) cos {z) s\n{wt kz) sin (e) and combining it with yields IE IE -^ 7 ^ cos (e) = -sin (wt A:z) sin (e) (158) 171 PAGE 181 172 It follows from ( 1 56) that so (158) leads to sin (wt-kz) = y^oxj 1/2 ^-^cos(e) \ oy ox 2 Â— 1 J sin (e) By rearranging the above expression, we have fW. fW. (W. \ ^ _2 _y_ K^oy) y^ox) \^oy) fW. \ K^oxJ cos (e) = (sin(e)) (159) This is the equation of an ellipse making an anglea with the {u^, Â«^)-coordinate system such that tan (2a) ^^ox^oy^^^ (e) (160) and the wave (153) is then said to be elliptically polarized. The ellipse of (159) is graphically described in Figure 67. Figure 67 Graphical description of an elliptically polarized wave. PAGE 182 173 A particular wave can be stated in terms of its specific state of polarization. The polarization state includes linear polarization, and rightor left-circular polarization. So the condition of elliptic polarization corresponds to a polarization state. The polarization state is defined by the following parameters Â• Linear polarization e = 2nm where m = 0, +1,Â±2, ... E = E^^cos{wt-kz)Ux + E^yCOs(wt-kz)Uy Right-cireular polarization n E = -+ 2nm where m = 0, +1, Â±2, ... (161) ^ox ~ ^oy ~ E = Eg{cos {wt kz) u^+ sin (wt kz) Uy) (162) Â• Left-cireular polarization 71 e = ~ + 2nm where m = 0, Â±1,Â±2, ... 
E = E = E OX oy o E E^ (cos (wt kz) u^sin (wt kz) Uy) (163) A linearly polarized wave can be synthesized from two oppositely polarized circular waves of equal amplitude. In particular, if we add the right-circular wave of (162) to the left-circular wave of (163), we get E = 2 E ^cos {wt Â— kz) Ujy (164) which has a constant amplitude vector of 2EJi^ and is therefore linearly polarized. A. 2 Circular-to-Linear and Linear-to-Circular Polarization Basis Transformations If we use the xand y-axis as a linear polarization basis for the horizontally (H) and vertically (V) time-varying components of a plane wave respectively and write the equation of a electric field {E) propagating in the -f-z-direction at the transimt antenna, we get PAGE 183 174 the following expression by [51] Uh hXr U y ( 165 ) where is the intrinsic imepdence of free space, I the current into the transmit antenna terminals, X the wave length of the transmitted electric field, and r is a distance from the transmit antenna. If it is assumed that the fully polarimetric scattering matrix (5 ) of a target for a linear polarized transmit/receive antenna configuration is known, then we can write the reflected wave at the receive antenna as F'' 1 ^HV eÂ‘ FÂ’' J^r SvH Syv FÂ‘ \^v\ (166) where = ^HH ^HV SvH ^VV (167) The reflective wave at the receive antenna can also be written with the fully polarimet(C) ric circular scattering matrix (S ) of the target in left and right circular component form which which leads to the following expression F'' 1 ^RR ^RL E' F'' J^r ^LR Sll eÂ‘ (168) where ^(C) ^ ^RR ^RL ^LR ^LL (169) The electric field {E) can be expressed in the circular polarization basis in right and left circular component form as PAGE 184 175 E = Ef^u fj + Eyii = (170) where Ej^ and indicate the complex right and left circular field components respectively. According to(161),(l 62) and ( 1 62), the circular basis ( d)^ and ) are related to the linear basis by 6 ^ = Uff+jUy or Â’ 1 -j 1 j . 
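Using the basis relations (171) and (172) in (170), the right- and left-circular components of the wave are expressed in terms of the horizontal and vertical components as

$$E_R = \frac{1}{2}(E_H + jE_V), \qquad E_L = \frac{1}{2}(E_H - jE_V) \tag{173}$$

We substitute (173) into (168) for both the transmitted and received waves. Because the direction of propagation reverses and the coordinate system must remain right-handed after reflection against the target, the reflected wave is described in a coordinate system with the propagation axis reversed ($-\mathbf{u}_z$) and signs changed accordingly. So we replace $E_V$ by $-E_V$ for the reflected wave in (168):

$$\begin{bmatrix} E^r_H - jE^r_V \\ E^r_H + jE^r_V \end{bmatrix} = \frac{1}{\sqrt{4\pi}\, r} \begin{bmatrix} S_{RR} & S_{RL} \\ S_{LR} & S_{LL} \end{bmatrix} \begin{bmatrix} E^t_H + jE^t_V \\ E^t_H - jE^t_V \end{bmatrix} \tag{174}$$

Rearranging (174) yields the following equation

$$\begin{bmatrix} E^r_H \\ E^r_V \end{bmatrix} = \frac{1}{\sqrt{4\pi}\, r} \begin{bmatrix} \frac{1}{2}(S_{RR} + S_{RL} + S_{LR} + S_{LL}) & \frac{j}{2}(S_{RR} - S_{RL} + S_{LR} - S_{LL}) \\ \frac{j}{2}(S_{RR} + S_{RL} - S_{LR} - S_{LL}) & \frac{1}{2}(-S_{RR} + S_{RL} + S_{LR} - S_{LL}) \end{bmatrix} \begin{bmatrix} E^t_H \\ E^t_V \end{bmatrix} \tag{175}$$

By equating (175) and (166), we obtain the following relationship between the linear and circular scattering matrix elements of the target.

$$\begin{aligned} S_{HH} &= \tfrac{1}{2}(S_{RR} + S_{RL} + S_{LR} + S_{LL}) \\ S_{HV} &= \tfrac{j}{2}(S_{RR} - S_{RL} + S_{LR} - S_{LL}) \\ S_{VH} &= \tfrac{j}{2}(S_{RR} + S_{RL} - S_{LR} - S_{LL}) \\ S_{VV} &= \tfrac{1}{2}(-S_{RR} + S_{RL} + S_{LR} - S_{LL}) \end{aligned} \tag{176}$$

$$\begin{aligned} S_{RR} &= \tfrac{1}{2}(S_{HH} - jS_{HV} - jS_{VH} - S_{VV}) \\ S_{RL} &= \tfrac{1}{2}(S_{HH} + jS_{HV} - jS_{VH} + S_{VV}) \\ S_{LR} &= \tfrac{1}{2}(S_{HH} - jS_{HV} + jS_{VH} + S_{VV}) \\ S_{LL} &= \tfrac{1}{2}(S_{HH} + jS_{HV} + jS_{VH} - S_{VV}) \end{aligned} \tag{177}$$

REFERENCES

[1] Baum, E. B., and Haussler, D., "What size net gives valid generalization?," Neural Computation 1 (1), pp. 151-160, 1989.
[2] Baum, E. B., and Wilczek, F., "Supervised learning of probability distributions by neural networks," In Anderson, D. Z. (Ed.), Neural Information Processing Systems, pp. 52-61, 1988.
[3] Bernardon, A. M., and Garrick, J. E., "A neural system for automatic target learning and recognition applied to bare and camouflaged SAR targets," Neural Networks, vol. 8, no. 7/8, pp. 1103-1108, 1995.
[4] Bhanu, B., "Automatic target recognition: state of the art survey," IEEE Transactions on Aerospace and Electronic Systems, AES-22, no. 4, pp. 364-379, 1986.
[5] Bishop, C. M., Neural Networks for Pattern Recognition. Oxford University Press Inc., New York, 1995.
[6] Born, M., and Wolf, E., Principles of Optics, Pergamon Press, New York, 1959.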
[7] Burrascano, P., "A norm selection criterion for the generalized delta rule," IEEE Transactions on Neural Networks, vol. 2, no. 1, 1991.
[8] Duda, R. O., and Hart, P. E., Pattern Classification and Scene Analysis, John Wiley & Sons, 1973.
[9] Burl, M. C., Owirka, G. J., and Novak, L. M., "Texture discrimination in synthetic aperture radar imagery," 23rd Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, 30 Oct.-1 Nov., pp. 399-404, 1989.
[10] Burl, M. C., and Novak, L. M., "Polarimetric segmentation of SAR imagery," Proceedings of the SPIE, Automatic Object Recognition, vol. 1417, pp. 92-115, 1991.
[11] Casasent, D. P., and Neiberg, L. M., "Classifier and shift-invariant automatic target recognition neural networks," Neural Networks, vol. 8, no. 7/8, pp. 1117-1129, 1995.
[12] Cathcart, J. M., Doll, T. J., and Schmieder, D. E., "Target detection in urban clutter," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1242-1250, 1989.
[13] Celebi, S., and Principe, J., "Parametric signal representation using gamma bases," IEEE Trans. on Signal Processing, vol. 43, no. 3, pp. 781-784, 1995.
[14] Chaney, R. D., Burl, M. C., and Novak, L. M., "On the performance of polarimetric target detection algorithms," Record of the 1990 IEEE International Radar Conference, pp. 520-525, Arlington, VA, 1990.
[15] Chang, C., Siu, S., and Wei, C., "On the convergence of the Lp norm algorithm for polynomial perceptron having different error signal distributions," Journal of the Chinese Institute of Engineers, vol. 18, no. 2, pp. 293-302, 1995.
[16] Curlander, J. C., and McDonough, R. N., Synthetic Aperture Radar Systems and Signal Processing. John Wiley & Sons Inc., New York, 1991.
[17] Currie, A., "Synthetic aperture radar," Electronics & Communication Engineering Journal, Aug., pp. 159-170, 1991.
[18] Daniell, C. D., Kemsley, D. H., Lincoln, W. P., Tackett, W. A., and Baraghimian, G.
A., "Artificial neural networks for automatic target recognition," Optical Engineering, vol. 31, no. 12, pp. 2521-2531, 1992.
[19] de Vries, B., and Principe, J., "The gamma model-A new neural model for temporal processing," Neural Networks, vol. 5, pp. 565-576, 1992.
[20] de Vries, B., "Temporal processing with neural networks: The development of the gamma model," Ph.D. dissertation, University of Florida, 1991.
[21] Delanoy, R. L., Verly, J. G., and Dudgeon, D. E., "Automatic building and supervised discrimination learning of appearance models of 3-D objects," Proceedings of the SPIE, vol. 1708, pp. 549-560, 1992.
[22] Duda, Richard O., Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.
[23] Dudgeon, D. E., and Lacoss, R. T., "An overview of automatic target recognition," Linc. Lab. J., vol. 6, no. 1, pp. 3-10, 1993.
[24] Fitch, J. P., Synthetic Aperture Radar. Springer-Verlag, New York, 1988.
[25] Fogler, R. J., Hostetler, L. D., and Hush, D. R., "SAR clutter suppression using probability density skewness," IEEE Transactions on Aerospace and Electronic Systems, vol. 30, no. 2, pp. 622-626, 1994.
[26] Forman, A. V., Jr., "Systolic Implementation of Viewpoint Invariant Automatic Target Recognition," Ph.D. dissertation, University of Florida, 1992.
[27] Fukunaga, K., Introduction to Statistical Pattern Recognition, Boston, Academic Press, New York, 1990.
[28] Geman, S., Bienenstock, E., and Doursat, R., "Neural Networks and the bias/variance dilemma," Neural Computation, 4 (1), pp. 1-58, 1992.
[29] Goldstein, G. B., "False-alarm regulation in log-normal and Weibull clutter," IEEE Trans. Aerospace Electron. Syst., vol. AES-9, no. 1, pp. 84-92, Jan. 1973.
[30] Golub, G. H., and Van Loan, C. F., Matrix Computations. Johns Hopkins, Baltimore, 1983.
[31] Hanson, S. J., and Burr, D. J., "Minkowski-r back-propagation: learning in connectionist models with non-Euclidean error signals," In D. Anderson (Ed.)
Neural Information Processing Systems, pp. 348-357, 1988.
[32] Haykin, S., Adaptive Filter Theory, 2nd ed. Prentice-Hall, Englewood Cliffs, NJ, 1991.
[33] Haykin, S., Neural Networks, Macmillan College Publishing Co., New York, 1994.
[34] Hecht, E., Optics, Addison-Wesley Publishing Co., Reading, MA, 1987.
[35] Hecht-Nielsen, R., Neurocomputing, Addison-Wesley Publishing Co., 1990.
[36] Hinton, G. E., "Connectionist learning procedures," Artificial Intelligence, vol. 40, pp. 185-234, 1989.
[37] Hornik, K., Stinchcombe, M., and White, H., "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
[38] Hornik, K., "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251-257, 1991.
[39] Hopfield, J. J., "Learning algorithms and probability distributions in feed-forward and feed-back networks," Proceedings of the National Academy of Sciences, vol. 84, pp. 8429-8433, 1987.
[40] Kim, M., Fisher, J., and Principe, J. C., "A new CFAR stencil for target detections in Synthetic Aperture Radar (SAR) imagery," Proceedings of the SPIE - The International Society for Optical Engineering, (Algorithms for Synthetic Aperture Radar Imagery III), vol. 2757, pp. 432-442, April, 1996.
[41] Kim, M., and Principe, J., "Artificial neural networks with gamma kernels for automatic target detection," Proceedings of the International Conference on Neural Networks, Washington, DC, vol. 3, pp. 1594-1599, 1996.
[42] Koch, M. W., Moya, M. M., Hostetler, L. D., and Fogler, R. J., "Cueing, Feature Discovery, and One-class learning for synthetic aperture radar automatic target recognition," Neural Networks, vol. 8, no. 7/8, pp. 1081-1102, 1995.
[43] Konstantinides, K., and Rasure, J. R., "The Khoros software development environment for image and signal processing," IEEE Transactions on Image Processing, vol. 3, no. 3, pp. 243-252, 1994.
[44] Kraus, J. D., Antennas.
McGraw-Hill Book Co., New York, 1950.
[45] Kreithen, D. E., Halversen, S. D., and Owirka, G. J., "Discriminating targets from clutter," Linc. Lab. J., vol. 6, no. 1, pp. 25-52, 1993.
[46] Levanon, N., Radar Principles. John Wiley & Sons Inc., New York, 1988.
[47] Liu, J., and Chang, K., "Feature-based target recognition with a Bayesian network," Optical Engineering, vol. 30, no. 3, pp. 701-707, 1996.
[48] Lim, J. S., Two-dimensional Signal and Image Processing. Prentice-Hall, Englewood Cliffs, NJ, 1990.
[49] Miller, K. S., Complex Stochastic Processes, Addison-Wesley Publishing Co., Reading, MA, 1974.
[50] Money, A. H., Affleck-Graves, J. F., Hart, M. L., and Barr, G. D. I., "The Linear Regression Model: Lp norm estimation and the choice of p," Communications in Statistics: Simulation and Computation, vol. 11, no. 1, pp. 89-109, 1982.
[51] Mott, H., Polarization in Antennas and Radar. John Wiley & Sons Inc., New York, 1986.
[52] Novak, L. M., Sechtin, M. B., and Cardullo, M. J., "Studies of target detection algorithms that use polarimetric radar data," IEEE Trans. Aerospace Electron. Syst., vol. AES-25, no. 2, pp. 150-165, Mar. 1989.
[53] Novak, L. M., and Burl, M. C., "Optimal speckle reduction in polarimetric SAR imagery," IEEE Trans. Aerospace Electron. Syst., vol. AES-26, no. 2, pp. 293-305, Mar. 1990.
[54] Novak, L. M., Burl, M. C., Chaney, R., and Owirka, G. J., "Optimal processing of polarimetric synthetic aperture radar imagery," Linc. Lab. J., vol. 3, no. 2, pp. 273-290, 1990.
[55] Novak, L. M., and Netishen, C. M., "Polarimetric synthetic aperture radar imaging," International Journal of Imaging Systems and Technology, vol. 4, pp. 306-318, 1992.
[56] Novak, L. M., Burl, M. C., and Irving, W. W., "Optimal polarimetric processing for enhanced target detection," IEEE Trans. Aerospace Electron. Syst., vol. AES-29, no. 1, pp. 234-244, Jan. 1993.
[57] Novak, L. M., Owirka, G. J., and Netishen, C.
M., "Performance of a high-resolution polarimetric SAR automatic target recognition system," Linc. Lab. J., vol. 6, no. 1, pp. 11-24, 1993.
[58] Novak, L. M., Halversen, S. D., Owirka, G. J., and Hiett, M., "Effects of polarization and resolution on the performance of a SAR automatic target recognition," Linc. Lab. J., vol. 8, no. 1, pp. 49-65, 1995.
[59] Oppenheim, A., and Schafer, R., Digital Signal Processing. Prentice-Hall, Englewood Cliffs, 1975.
[60] Owirka, G. J., and Novak, L. M., "A new SAR ATR algorithm suite," Proceedings of the SPIE - The International Society for Optical Engineering, (Algorithms for Synthetic Aperture Radar), vol. 2230, pp. 336-343, 1994.
[61] Pao, Yoh-Han, Adaptive Pattern Recognition and Neural Networks. Addison-Wesley Publishing Co., Reading, MA, 1989.
[62] Pei, S., and Tseng, C., "Least Mean P-Power Error Criterion for Adaptive FIR Filter," IEEE Transactions on Selected Areas in Communications, vol. 12, no. 9, pp. 1540-1547, 1994.
[63] Principe, J. C., A. Radisavljevic, M. Kim, J. Fisher III, M. Hiett, and L. M. Novak, "Target Prescreening Based on 2D Gamma Kernels," Proceedings of the SPIE - The International Society for Optical Engineering, (Algorithms for Synthetic Aperture Radar Imagery II), vol. 2487, pp. 251-258, FL, USA, April, 1995.
[64] Principe, J. C., de Vries, B., and de Oliveira, P. G., "The gamma filter-A new class of adaptive IIR filters with restrictive feedback," IEEE Transactions on Signal Processing, vol. 41, no. 2, pp. 649-656, 1993.
[65] Principe, J. C., Kuo, J. M., and Celebi, S., "An analysis of the gamma memory in dynamic neural networks," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 331-337, 1994.
[66] Rosenfeld, A., and Thurston, M., "Edge and curve detection for visual scene analysis," IEEE Transactions on Computers, vol. C-20, pp. 562-569, 1971.
[67] Roth, M. W., "Survey of neural network technology for automatic target recognition," IEEE Transactions on Neural Networks, vol.
1, no. 1, pp. 28-43, Mar. 1990.
[68] Rumelhart, D. E., Hinton, G. E., and Williams, R. J., "Learning internal representations by error propagation," In Parallel Distributed Processing, vol. 1, Chapter 8, MIT Press, 1986.
[69] Rumelhart, D. E., Durbin, R., Golden, R., and Chauvin, Y., "Backpropagation: the basic theory," In Chauvin, Y., and Rumelhart, D. E. (Eds.), Backpropagation: Theory, Architecture, and Applications, pp. 1-34, Lawrence Erlbaum, Hillsdale, NJ, 1995.
[70] Ruzinsky, S. A., and Olsen, E., "$L_1$ and $L_\infty$ minimization via a variant of Karmarkar's algorithm," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 2, pp. 245-253, 1989.
[71] Schachter, "A survey and evaluation of FLIR target detection/segmentation algorithms," Proceedings of Image Understanding Workshop, pp. 49-57, 1982.
[72] Schmieder, D. E., and Weathersby, M. R., "Detection performance in clutter with variable resolution," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-19, no. 4, pp. 622-630, 1983.
[73] Schmieder, D. E., and Weathersby, M. R., "Developing texture-based image clutter measures for object detection," Optical Engineering, vol. 31, no. 12, pp. 2628-2639, 1992.
[74] Schroeder, J., Yarlagadda, R., and Hershey, J., "Lp normed minimization with applications to linear predictive modeling for sinusoidal frequency estimation," Signal Processing, vol. 24, pp. 193-216, 1991.
[75] Schroeder, J., and Endsley, J., "Lp normed spectral estimation residual analysis," 1988 22nd Asilomar Conference on Signal, Systems and Computers, pp. 848-852, 1988.
[76] Shirvaikar, M. V., and Trivedi, M. M., "A neural network filter to detect small targets in high clutter backgrounds," IEEE Transactions on Neural Networks, vol. 6, no. 1, pp. 252-257, 1995.
[77] Siu, S., Chang, C. H., and Wei, C. H., "$L_p$ norm back propagation algorithm for adaptive equalization," IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol.
42, no. 9, pp. 604-607, 1995.
[78] Siu, S., and Cowan, C. F. N., "Performance analysis of the $l_p$ norm back propagation algorithm for adaptive equalisation," IEE Proceedings-F, vol. 140, no. 1, pp. 43-47, 1993.
[79] Solla, S. A., Levin, E., and Fleisher, M., "Accelerated learning in layered neural networks," Complex Systems, vol. 2, pp. 625-640, 1988.
[80] Stone, M., "Cross-validatory choice and assessment of statistical predictions," Journal of the Royal Statistical Society, vol. B 36, no. 1, pp. 111-147, 1974.
[81] Tou, J. T., and Gonzalez, R. C., Pattern Recognition Principles. Addison-Wesley Publishing Co., Reading, MA, 1974.
[82] Ukrainec, A. M., and Haykin, S., "A modular neural network for enhancement of cross-polar radar targets," Neural Networks, vol. 9, no. 1, pp. 143-168, 1996.
[83] Verbout, S. M., Irving, W. W., and Hanes, A. S., "Improving a template-based classifier in a SAR automatic target recognition system by using 3-D target information," Linc. Lab. J., vol. 6, no. 1, pp. 53-76, 1993.
[84] Verly, J. G., Delanoy, R. L., and Dudgeon, D. E., "Model-based system for automatic target recognition from forward-looking laser-radar imagery," Optical Engineering, vol. 31, no. 12, pp. 2540-2552, 1992.
[85] Verly, J. G., Delanoy, R. L., Lazott, C. H., and Dudgeon, D. E., "A Model-based Automatic target recognition system for Synthetic Aperture Radar Imagery," 2nd Automatic Target Recognizer Systems and Technology Conference, pp. 51-80, 1992.
[86] Verly, J. G., and Dudgeon, D. E., "Model-based automatic target recognition system for the UGV/RSTA radar," Image Understanding Workshop, pp. 559-583, Nov., 1994.
[87] Waxman, A. M., Seibert, M. C., Gove, A., Fay, D. A., Bernardon, A. M., Lazott, C., Steele, W. R., and Cunningham, R. K., "Neural Processing of Targets in Visible, Multispectral IR and SAR Imagery," Neural Networks, vol. 8, no. 7/8, pp. 1029-1051, 1995.
[88] Waxman, A. M., Lazott, C., Fay, D. A., Gove, A., and Steele, W.
R., "Neural Processing of SAR Imagery for enhanced target detection," Proceedings of the SPIE, Algorithms for Synthetic Aperture Radar Imagery II, vol. 2487, pp. 201-210, 1995.
[89] Widrow, B., and Stearns, S. D., Adaptive Signal Processing. Prentice-Hall Inc., Englewood Cliffs, NJ, 1985.
[90] Yarlagadda, R., Bednar, J. B., and Watt, T. L., "Fast algorithms for $l_p$ deconvolution," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-33, no. 1, pp. 174-182, 1985.

BIOGRAPHICAL SKETCH

Munchurl Kim was born in Korea on January 25th, 1966. He received his Bachelor of Engineering degree in electronics from the KyungPook National University, Korea, in 1989. After graduation he joined the graduate program in electrical and computer engineering at the University of Florida, where he received a Master of Engineering degree in 1992 and a Ph.D. in 1996. He plans to continue conducting research in the areas of automatic target detection/recognition, digital signal and image processing, pattern recognition, and artificial neural networks.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Principe, Chairman
Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Assistant Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Assistant Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Jian Li
Assistant Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Joseph N. Wilson
Assistant Professor of Computer and Information Science and Engineering

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 1996

Winfred M. Phillips
Dean, College of Engineering

Karen A. Holbrook
Dean, Graduate School