## Citation

- Permanent Link: http://ufdc.ufl.edu/AA00024501/00001
## Material Information

- Title: Energy, entropy and information potential for neural computation
- Creator: Xu, Dongxin, 1963-
- Publication Date: 1999
- Language: English
- Physical Description: ix, 197 leaves : ill. ; 29 cm.
## Subjects

- Subjects / Keywords: Datasets, Eigenvalues, Energy, Entropy, Estimation methods, Learning modalities, Neural networks, Scalars, Shannon entropy, Signals ( jstor )
- Genre: bibliography ( marcgt ); non-fiction ( marcgt )
## Notes

- Thesis: Thesis (Ph. D.)--University of Florida, 1999.
- Bibliography: Includes bibliographical references (leaves 188-196).
- General Note: Typescript.
- General Note: Vita.
- Statement of Responsibility: by Dongxin Xu.
## Record Information

- Source Institution: University of Florida
- Rights Management: The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. §107) for non-profit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
- Resource Identifier: 41901125 ( OCLC ); ocm41901125
## Full Text
ENERGY, ENTROPY AND INFORMATION POTENTIAL FOR NEURAL COMPUTATION

By DONGXIN XU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA, 1999

To My Parents

ACKNOWLEDGEMENTS

This Chinese poem exactly expresses my feelings and experience during four years of Ph.D. study. During this period, difficulties were encountered both in the course of my research and in my daily life. Just as the poem says, there is always hope in spite of difficulties. Looking back on the past, I would like to express my gratitude to the individuals who brought me hope and light and guided me through the darkness.

First, I would like to thank my advisor, Dr. José Principe, for providing me with the wonderful opportunity to be a Ph.D. student in CNEL. Its excellent environment helped me a great deal when I first arrived. I was impressed by Dr. Principe's active thinking and appreciated very much his style of supervision, which gives students a lot of space to explore on their own. I am grateful for his introducing me to the area of information-theoretic learning and for his guidance throughout the development of this dissertation.

I would also like to thank my committee members, Dr. John Harris, Dr. Donald Childers, Dr. Jacob Hammer, Dr. Mark Yang and Dr. Tan Wong, for the guidance and discussion they provided. Their comments were critical and constructive.

Special thanks go to John Fisher for introducing his work to me, which actually inspired this work. Special thanks also go to Chuan Wang for introducing me to CNEL and for his friendship. The discussions with Hsiao-Chun Wu were fruitful; special thanks are due to him as well. I would also like to thank the other CNEL fellows, including, but not limited to, Likang Yen, Craig Fancourt, Frank Candocia and Qun Zhao, for their help and friendship.
I would like to thank my brother, my sister and my friend Yuan Yao for their constant love, support and encouragement.

Finally, I would like to thank my wife, Shu, for her love, support, patience and sacrifice, which made this dissertation possible.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ... iii
ABSTRACT ... viii

CHAPTERS

1 INTRODUCTION ... 1
  1.1 Information and Energy: A Brief Review ... 1
  1.2 Motivation ... 6
  1.3 Outline ... 15

2 ENERGY, ENTROPY AND INFORMATION POTENTIAL ... 17
  2.1 Energy, Entropy and Information of Signals ... 17
    2.1.1 Energy of Signals ... 17
    2.1.2 Information Entropy ... 20
    2.1.3 Geometrical Interpretation of Entropy ... 24
    2.1.4 Mutual Information ... 27
    2.1.5 Quadratic Mutual Information ... 31
    2.1.6 Geometrical Interpretation of Mutual Information ... 38
    2.1.7 Energy and Entropy for Gaussian Signal ... 39
    2.1.8 Cross-Correlation and Mutual Information for Gaussian Signal ... 42
  2.2 Empirical Energy, Entropy and MI: Problem and Literature Review ... 44
    2.2.1 Empirical Energy ... 44
    2.2.2 Empirical Entropy and Mutual Information: The Problem ... 44
    2.2.3 Nonparametric Density Estimation ... 46
    2.2.4 Empirical Entropy and Mutual Information: The Literature Review ... 51
  2.3 Quadratic Entropy and Information Potential ... 57
    2.3.1 The Development of Information Potential ... 57
    2.3.2 Information Force (IF) ... 59
    2.3.3 The Calculation of Information Potential and Force ... 60
  2.4 Quadratic Mutual Information and Cross Information Potential ... 62
    2.4.1 QMI and Cross Information Potential (CIP) ... 62
    2.4.2 Cross Information Forces (CIF) ... 65
    2.4.3 An Explanation to QMI ... 66

3 LEARNING FROM EXAMPLES ... 68
  3.1 Learning System ... 68
    3.1.1 Static Models ... 69
    3.1.2 Dynamic Models ... 74
  3.2 Learning Mechanisms ... 78
    3.2.1 Learning Criteria ... 79
    3.2.2 Optimization Techniques ... 83
  3.3 General Point of View ... 90
    3.3.1 InfoMax Principle ... 90
    3.3.2 Other Similar Information-Theoretic Schemes ... 91
    3.3.3 A General Scheme ... 95
    3.3.4 Learning as Information Transmission Layer-by-Layer ... 96
    3.3.5 Information Filtering: Filtering beyond Spectrum ... 97
  3.4 Learning by Information Force ... 97
  3.5 Discussion of Generalization by Learning ... 99

4 LEARNING WITH ON-LINE LOCAL RULE: A CASE STUDY ON GENERALIZED EIGENDECOMPOSITION ... 101
  4.1 Energy, Correlation and Decorrelation for Linear Model ... 101
    4.1.1 Signal Power, Quadratic Form, Correlation, Hebbian and Anti-Hebbian Learning ... 102
    4.1.2 Lateral Inhibition Connections, Anti-Hebbian Learning and Decorrelation ... 103
  4.2 Eigendecomposition and Generalized Eigendecomposition ... 105
    4.2.1 The Information-Theoretic Formulation for Eigendecomposition and Generalized Eigendecomposition ... 106
    4.2.2 The Formulation of Eigendecomposition and Generalized Eigendecomposition Based on the Energy Measures ... 109
  4.3 The On-line Local Rule for Eigendecomposition ... 111
    4.3.1 Oja's Rule and the First Projection ... 111
    4.3.2 Geometrical Explanation to Oja's Rule ... 112
    4.3.3 Sanger's Rule and the Other Projections ... 113
    4.3.4 APEX Model: The Local Implementation of Sanger's Rule ... 114
  4.4 An Iterative Method for Generalized Eigendecomposition ... 118
  4.5 An On-line Local Rule for Generalized Eigendecomposition ... 120
    4.5.1 The Proposed Learning Rule for the First Projection ... 121
    4.5.2 The Proposed Learning Rules for the Other Connections ... 127
  4.6 Simulations ... 133
  4.7 Conclusion and Discussion ... 134

5 APPLICATIONS ... 138
  5.1 Aspect Angle Estimation for SAR Imagery ... 138
    5.1.1 Problem Description ... 138
    5.1.2 Problem Formulation ... 139
    5.1.3 Experiments of Aspect Angle Estimation ... 142
    5.1.4 Occlusion Test on Aspect Angle Estimation ... 149
  5.2 Automatic Target Recognition (ATR) ... 152
    5.2.1 Problem Description and Formulation ... 152
    5.2.2 Experiment and Result ... 155
  5.3 Training MLP Layer-by-Layer with CIP ... 160
  5.4 Blind Source Separation and Independent Component Analysis ... 164
    5.4.1 Problem Description and Formulation ... 164
    5.4.2 Blind Source Separation with CS-QMI (CS-CIP) ... 165
    5.4.3 Blind Source Separation by Maximizing Quadratic Entropy ... 167
    5.4.4 Blind Source Separation with ED-QMI (ED-CIP) and MiniMax Method ... 171

6 CONCLUSIONS AND FUTURE WORK ... 179

APPENDICES

A THE INTEGRATION OF THE PRODUCT OF GAUSSIAN KERNELS ... 182
B SHANNON ENTROPY OF MULTI-DIMENSIONAL GAUSSIAN VARIABLE ... 185
C RENYI ENTROPY OF MULTI-DIMENSIONAL GAUSSIAN VARIABLE ... 186
D H-C ENTROPY OF MULTI-DIMENSIONAL GAUSSIAN VARIABLE ... 187

REFERENCES ... 188
BIOGRAPHICAL SKETCH ... 197

ABSTRACT

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ENERGY, ENTROPY AND INFORMATION POTENTIAL FOR NEURAL COMPUTATION

By Dongxin Xu
May 1999
Chairman: Dr. Jose C. Principe
Major Department: Electrical and Computer Engineering

The major goal of this research is to develop general nonparametric methods for the estimation of entropy and mutual information, giving a unifying point of view for their use in signal processing and neural computation. In many real-world problems, the information is carried solely by data samples, without any other a priori knowledge. The central issue of "learning from examples" is to estimate the energy, entropy or mutual information of a variable only from its samples and to adapt the system parameters by optimizing a criterion based on that estimate.

By using alternative entropy measures such as Renyi's quadratic entropy, coupled with the Parzen window estimation of the probability density function from data samples, we developed an "information potential" method for entropy estimation. In this method, data samples are treated as physical particles, and the entropy turns out to be related to the potential energy of these "information particles." Entropy maximization or minimization is then equivalent to the minimization or maximization of the "information potential."
Based on the Cauchy-Schwartz inequality and the Euclidean distance metric, we further proposed the quadratic mutual information as an alternative to Shannon's mutual information. There is also a "cross information potential" implementation for the quadratic mutual information that measures the correlation between the "marginal information potentials" at several levels. "Learning from examples" at the output of a mapper by the "information potential" or the "cross information potential" is implemented by propagating the "information force" or the "cross information force" back to the system parameters. Since the criteria are decoupled from the structure of the learning machine, they are general learning schemes. The "information potential" and the "cross information potential" provide a microscopic expression, at the data sample level, for the macroscopic measures of entropy and mutual information. The algorithms examine the relative position of each data pair and thus have a computational complexity of O(N^2).

An on-line local algorithm for learning is also discussed, in which the energy field is related to the well-known biological Hebbian and anti-Hebbian learning rules. Based on this understanding, an on-line local algorithm for generalized eigendecomposition is proposed.

The information potential methods have been successfully applied to various problems such as aspect angle estimation in synthetic aperture radar (SAR) imagery, target recognition in SAR imagery, layer-by-layer training of multilayer neural networks and blind source separation. The good performance of the methods on these problems confirms the validity and efficiency of the information potential methods.

CHAPTER 1
INTRODUCTION

1.1 Information and Energy: A Brief Review

Information plays an important role both in the life of a person and in that of a society, especially in today's information age.
The basic purpose of all kinds of scientific research is to obtain information in a particular area. One of the most important tasks of space programs is to get information about cosmic space and celestial bodies, such as evidence of whether there is life on Mars. A central problem of the Internet is how to transmit, process and store information in computer networks. "Like it or not, we are information dependent. It is a commodity as vital as the air we breathe, as any of our metabolic energy requirements. For better or worse, we're all inescapably embedded in a universe of flows, not only of matter and energy but also of whatever it is we call information" [You87: page 1].

The notion of information is so fundamental and universal that only the notion of energy can be compared with it. The parallels and analogies between these two fundamental notions are well known. Most of the greatest inventions and discoveries in scientific and human history can be related either to the conversion, transfer and storage of energy or to the transmission and storage of information. For instance, the use of fire and water, the invention of simple machines such as the lever and the wheel, the invention of the steam engine, and the discoveries of electricity and atomic energy are all connected to energy, while the appearance of speech in prehistoric times and the invention of writing at the dawn of human history, followed by the inventions of paper, printing, the telegraph, photography, the telephone, radio, television and finally the computer and the computer network, are examples of information. Many inventions and discoveries can be used for both purposes. Fire, as an example, can be used for cooking, heating and transmitting signals. Electricity, as another example, can be used for transmitting both energy and information [Ren60].

There is a great variety of energy and information. If we disregard the actual form of energy (mechanical, thermal, chemical, electrical, atomic, etc.)
and the real content of information, what is left is the pure quantity [Ren60]. The principle of energy conservation was formulated and developed in the middle of the nineteenth century, while the essence of information was studied later, in the 1940s. With the quantity of energy, we can come to the conclusion that a small amount of U235 contains a large amount of atomic energy, and our world entered the atomic age. With the pure quantity of information, we can tell that an optical cable can transmit much more information than an ordinary electrical telephone line, and in general, the capacity of a communication channel can be specified in terms of the rate of information quantity. Although the quantitative measure of information originated in the study of communication, it is such a fundamental concept and method that it has been widely applied to many areas such as statistics, physics, chemistry, biology, life science, psychology, psychobiology, cognitive science, neuroscience, cybernetics, computer science, economics, operations research, linguistics and philosophy [You87, Kub75, Kap92, Jum86].

The study of the quantitative measure of information in communication systems started in the 1920s. In 1924, Nyquist showed that the speed $W$ of transmission of intelligence over a telegraph circuit with a fixed line speed is proportional to the logarithm of the number $m$ of current values used to encode the message: $W = k \log m$, where $k$ is a constant [Nyq24, Chr81]. In 1928, Hartley generalized this to all forms of communication, letting $m$ represent the number of symbols available at each selection of a symbol to be transmitted. Hartley explicitly addressed the issue of a quantitative measure for information and pointed out that it should be independent of psychological factors (i.e., objective) [Har28, Chr81].
Later, in 1948, Shannon published his celebrated paper "A Mathematical Theory of Communication," which explored the statistical structure of a message and extended Nyquist and Hartley's logarithmic measure for information to a probabilistic one:

$$I = -\sum_{k=1}^{N} p_k \log p_k$$

for a probability distribution $p_k \geq 0$ $(k = 1, \ldots, N)$ with $\sum_{k=1}^{N} p_k = 1$. When $p_k = 1/m$ in the equiprobable case, Shannon's measure degenerates to Hartley's measure [Sha48, Sha62]. Shannon's measure can also be regarded as a measure of uncertainty. It laid the foundation for information theory.

There is a striking formal similarity between Shannon's measure and the entropy in statistical mechanics. This was one of the reasons that led von Neumann to suggest to Shannon that he call his uncertainty measure the entropy [Tri71]. "Entropie" was a German word coined in 1865 by Clausius to represent the capacity for change of matter [Chr81]. The second law of thermodynamics, formulated by Clausius, is also known as the entropy law. Its best-known statement has been in the form, "Heat cannot by itself pass from a colder to a hotter system." More formally, the entropy of a closed system will never decrease, but can only increase until it reaches its maximum [You87]. The entropy maximum principle of a closed system has a corollary that is an energy minimum principle [Cha87]; i.e., the energy of the closed system will reach its minimum when the entropy of the system reaches its maximum.

Clausius' entropy was initially an abstract and macroscopic idea. It was Boltzmann who first gave the entropy a microscopic and probabilistic interpretation. Boltzmann's work showed that entropy could be understood as a statistical law measuring the probable states of the particles in a closed system.
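Shannon's measure is easy to check numerically. The short sketch below (plain Python; the function name `shannon_entropy` and the choice of the natural logarithm are ours, not the dissertation's) evaluates the sum and confirms that the equiprobable case $p_k = 1/m$ recovers Hartley's measure $\log m$:

```python
import math

def shannon_entropy(p):
    """Shannon's measure I = -sum_k p_k log p_k (natural log, in nats)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

# Equiprobable case: p_k = 1/m degenerates to Hartley's measure log m.
m = 8
uniform = [1.0 / m] * m
assert abs(shannon_entropy(uniform) - math.log(m)) < 1e-12

# A peaked distribution carries less uncertainty than the uniform one.
peaked = [0.9, 0.05, 0.03, 0.02]
assert shannon_entropy(peaked) < shannon_entropy([0.25] * 4)
```

Any logarithm base would do; changing the base only rescales the unit (nats for the natural log, bits for base 2).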
In statistical mechanics, each particle in a system occupies a point in a "phase space," and so the entropy of a system came to constitute a measure of the probability of the microscopic state (the distribution of particles) of any such system. According to this interpretation, a closed system will approach a state of thermodynamic equilibrium because equilibrium is overwhelmingly the most probable state of the system. The probabilistic interpretation of entropy became one of the cornerstones of the modern relationship between measures of entropy and the amount of information in a message: both the information entropy and the statistical mechanical entropy measure the uncertainty or disorder of a system [You87].

One interesting problem about entropy, which puzzled physicists for almost 80 years, is Maxwell's Demon, a hypothetical entity which could theoretically sort the molecules of a gas into either of two compartments, say, the faster molecules going into A and the slower into B, resulting in the lowering of the temperature in B while raising it in A without expenditure of work. But according to the second law of thermodynamics, i.e., the entropy law, the temperature of a closed system will eventually even out and thus the entropy will be maximized. In 1929, Szilard pointed out that the sorting of the molecules depends on information about the speed of the molecules, which is obtained by measurement or observation of the molecules, and any such measurement or observation will invariably involve dissipation of energy and an increase in entropy. While Szilard did not produce a working model, he showed mathematically that entropy and information are fundamentally interconnected, and his formula was analogous to the measures of information developed by Nyquist and Hartley and eventually by Shannon [You87].
In contrast to closed systems, open systems with energy flowing in and out tend to self-organize and to develop and maintain a structural identity, resisting the entropy drift of closed systems and their irreversible thermodynamic fate [You87, Hak88]. In this area, the work of Prigogine and his colleagues on nonlinear, nonequilibrium processes made a distinctive contribution, providing a powerful explanation of how order in the form of stable structures can be built up and maintained in a universe whose ingredients seem otherwise subject to a law of increasing entropy [You87].

The work of Boltzmann and others gave the relationship between entropy maximization and state probabilities; that is, the most probable microscopic state of an ensemble is a state of uniformity, described by maximizing its entropy subject to constraints specifying its observed macroscopic condition [Chr81]. The maximization of Shannon's entropy, by comparison, can be used as the basis for equiprobability assumptions (an equiprobable distribution should be assumed under total ignorance of the probability distribution). Information-theoretic entropy maximization subject to known constraints was explored by Jaynes in 1957 as a basis for statistical mechanics, which in turn makes it a basis for thermostatics and thermodynamics [Chr81]. Jaynes also pointed out: "in making inferences on the bases of partial information we must use that probability distribution which has maximum entropy subject to whatever is known. This is the only unbiased assignment we can make; to use any other would amount to arbitrary assumption of information which by hypothesis we do not have" [Jay57: I, page 623]. More general than Jaynes' maximum entropy principle is Kullback's minimum cross-entropy principle, which introduces the concept of cross-entropy, or "directed divergence," of a probability distribution P from another probability distribution Q.
The maximum entropy principle can be viewed as a special case of the minimum cross-entropy principle when Q is a uniform distribution [Kap92]. In addition, Shannon's mutual information is nothing but the directed divergence between the joint probability distribution and the product of the marginal distributions.

1.2 Motivation

The above gives a brief review of various aspects of energy, entropy and information, from which we can see how fundamental and general the concepts of energy and entropy are, and how these two fundamental concepts are related to each other. In this dissertation, the major interests and the issues addressed concern the energy and entropy of signals, especially the empirical energy and entropy measures of signals, which are crucial in signal processing practice. First, let us take a look at the empirical energy measures for signals.

There are many kinds of signals in the world. No matter what kind, a signal can be abstracted as $X(n) \in R^m$, where $n$ is the time index (only discrete-time signals are considered in this dissertation) and $R^m$ represents an $m$-dimensional real space (only real signals are considered in this dissertation; a complex signal can be thought of as a two-dimensional real signal). The empirical energy and power of a finite signal $x(n) \in R$, $n = 1, \ldots, N$, are

$$E(x) = \sum_{n=1}^{N} x(n)^2, \qquad P(x) = \frac{1}{N} \sum_{n=1}^{N} x(n)^2 \qquad (1.1)$$

The difference between two signals $x_1(n)$ and $x_2(n)$, $n = 1, \ldots, N$, can be measured by the empirical energy or power of the difference signal $d(n) = x_1(n) - x_2(n)$:

$$E_d(x_1, x_2) = \sum_{n=1}^{N} d(n)^2, \qquad P_d(x_1, x_2) = \frac{1}{N} \sum_{n=1}^{N} d(n)^2 \qquad (1.2)$$

The difference between $x_1$ and $x_2$ can also be measured by the cross-correlation (inner product)

$$C(x_1, x_2) = \sum_{n=1}^{N} x_1(n) x_2(n) \qquad (1.3)$$

or its normalized version

$$\bar{C}(x_1, x_2) = \frac{\sum_{n=1}^{N} x_1(n) x_2(n)}{\sqrt{\sum_{n=1}^{N} x_1(n)^2 \sum_{n=1}^{N} x_2(n)^2}} \qquad (1.4)$$

The geometrical illustration of these quantities is shown in Figure 1-1: $x_1$ and $x_2$ are drawn as vectors from a common origin O, $d$ is the difference vector between them, and the angle $\theta$ between $x_1$ and $x_2$ satisfies $\cos(\theta) = \bar{C}(x_1, x_2)$.

Figure 1-1. Geometrical Illustration of Energy Quantities
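Equations (1.1)-(1.4) translate directly into a few lines of NumPy. This is a minimal illustrative sketch for 1-D real signals; the function names are ours, not the dissertation's:

```python
import numpy as np

def energy(x):                 # E(x) = sum_n x(n)^2, Eq. (1.1)
    return np.sum(x**2)

def power(x):                  # P(x) = (1/N) sum_n x(n)^2, Eq. (1.1)
    return np.mean(x**2)

def cross_corr(x1, x2):        # C(x1, x2) = sum_n x1(n) x2(n), Eq. (1.3)
    return np.dot(x1, x2)

def norm_cross_corr(x1, x2):   # Eq. (1.4): the cosine of the angle in Figure 1-1
    return np.dot(x1, x2) / np.sqrt(energy(x1) * energy(x2))

x1 = np.array([1.0, 2.0, 2.0])
x2 = np.array([2.0, 4.0, 4.0])             # same direction, twice the amplitude
assert energy(x1) == 9.0
assert power(x1) == 3.0
assert energy(x1 - x2) == energy(x1)       # difference-signal energy, Eq. (1.2)
assert abs(norm_cross_corr(x1, x2) - 1.0) < 1e-12   # collinear: cos(theta) = 1
```

Note that the normalized cross-correlation is scale-invariant, which is why it measures the angle rather than the magnitude of the difference.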
Since $E(x) = C(x, x)$, cross-correlation can be regarded as an energy-related quantity.

We know that for a random signal $x(n)$ with pdf (probability density function) $f_x(x)$, the Shannon information entropy is

$$H(x) = -\int f_x(x) \log f_x(x) \, dx \qquad (1.5)$$

Based on the information entropy concept, the difference or similarity between two random signals $x_1$ and $x_2$ with joint pdf $f_{x_1 x_2}(x_1, x_2)$ and marginal pdfs $f_{x_1}(x_1)$, $f_{x_2}(x_2)$ can be measured by the mutual information between the two signals:

$$I(x_1, x_2) = \iint f_{x_1 x_2}(x_1, x_2) \log \frac{f_{x_1 x_2}(x_1, x_2)}{f_{x_1}(x_1) f_{x_2}(x_2)} \, dx_1 \, dx_2 \qquad (1.6)$$

Since $H(x) = I(x, x)$, mutual information is an entropy-type quantity.

Comparatively, energy is a simple, straightforward idea that is easy to implement, while information entropy uses all the statistics of the signal and is much more profound and difficult to measure or implement. A very fundamental and important question arises naturally: if a discrete data set $\{x(n) \in R^m \mid n = 1, \ldots, N\}$ is given, what is the information entropy related to this data set, and how can we estimate it? This empirical entropy problem has been addressed before in the literature [Chr80, Chr81, Bat94, Vio95, Fis97]. Parametric methods can be used for pdf estimation and then entropy estimation, which is straightforward but less general. Nonparametric methods for pdf estimation can be used as the basis for general entropy estimation (no assumption about the data distribution is required). One example is the histogram method [Bat94], which is easy to implement in a one-dimensional space but difficult to apply in high-dimensional spaces, and also difficult to analyze mathematically. Another popular nonparametric pdf estimation method is the Parzen window method, the so-called kernel or potential function method [Par62, Dud73, Chr81]. Once the Parzen window method is used, the perplexing problem that remains is the calculation of the integral in the entropy or mutual information formula.
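To make the Parzen-window route concrete, here is a hypothetical 1-D sketch: a Gaussian-kernel Parzen estimate of the pdf plugged into a sample-mean approximation of the entropy integral (the [Vio95]-style shortcut). The kernel width `sigma` is a free assumption, and the sample-mean approximation is only reliable with plenty of data:

```python
import numpy as np

def parzen_pdf(x, samples, sigma):
    """Parzen window estimate f(x) = (1/N) sum_i G(x - x_i, sigma^2)
    with a 1-D Gaussian kernel of width sigma."""
    z = (x - samples) / sigma
    return np.mean(np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi)))

def entropy_sample_mean(samples, sigma):
    """Plug-in entropy estimate H ~ -(1/N) sum_j log f(x_j): the integral in
    Eq. (1.5) replaced by a sample mean over the data themselves."""
    return -np.mean([np.log(parzen_pdf(xj, samples, sigma)) for xj in samples])

rng = np.random.default_rng(0)
narrow = rng.normal(0.0, 1.0, 500)
wide = rng.normal(0.0, 3.0, 500)   # more spread -> higher differential entropy
assert entropy_sample_mean(wide, 0.5) > entropy_sample_mean(narrow, 0.5)
```

This sketch already shows the cost structure: evaluating the estimate at every sample touches every pair of samples, an $O(N^2)$ computation.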
Numerical methods are extremely complex in this case and thus suitable only for a one-dimensional variable [Pha96]. An approximation can also be made by using the sample mean [Vio95], which requires a large amount of data and may not be a good approximation for a small data set. The indirect method of Fisher [Fis97] cannot be used for entropy estimation, only for entropy maximization purposes.

For the blind source separation (BSS) or independent component analysis (ICA) problem [Com94, Cao96, Car98b, Bel95, Dec96, Car97, Yan97], one popular contrast function is the empirical mutual information between the outputs of a demixing system, which can be implemented as the difference between the sum of the marginal entropies and the joint entropy. The joint entropy is usually related to the input entropy and the determinant of the linear demixing matrix, and the marginal entropies are estimated from moment expansions of the pdf such as the Edgeworth expansion and the Gram-Charlier expansion [Yan97, Dec96]. The moment expansions have to be truncated in practice and are appropriate only for a one-dimensional (1-D) signal because, in multi-dimensional spaces, the expansions become extremely complicated.

So, from the above brief review, we can see that an effective and general entropy estimation method is lacking. One major point of this dissertation is to formulate and develop such an effective and general method for the empirical entropy problem, and to give a unifying point of view about signal energy and entropy, especially the empirical signal energy and entropy.

Surprisingly, if we regard each data sample mentioned above as a physical particle, then the whole discrete data set is just like a set of particles in a statistical mechanical system. It is interesting to ask what the information entropy of this data set is, and how this can be related to physics.

According to modern science, the universe is a mass-energy system.
In this mass-energy spirit, we may ask whether information entropy, especially empirical information entropy, somehow has mass-energy properties. In this dissertation, the empirical information entropy is related to the "potential energy" of "data particles" (data samples). Thus, a data sample is called an "information particle" (IPT). In fact, data samples are the basic units conveying information; they indeed are "particles" that transmit information. Accordingly, the empirical entropy can be related to a potential energy, called the "information potential" (IP), of the "information particles" (IPTs).

With the information potential, we can further study how it can be used in a learning system or an adaptive signal processing system, and how a learning system can self-organize with information flowing in and out (often in the form of a flux of data samples), just as an open physical system will exhibit order with energy flowing in and out.

Information theory originated in the study of communication and has been widely used for design and practice in that area and many others. However, its application to learning systems or adaptive systems such as perceptual systems, either artificial or natural, is just in its infancy. Some early researchers tried to use information theory to explain perceptual processes, e.g., Attneave, who pointed out in 1954 that "a major function of the perceptual machinery is to strip away some of the redundancy of stimulation, to describe or encode information in a form more economical than that in which it impinges on the receptors" [Hay94: page 444].
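As a preview of what the particle analogy buys (the details are developed in Chapter 2, so treat this as an assumption-laden sketch: the 1-D case, a Gaussian kernel, and a freely chosen width `sigma`), the "information potential" summarized in the abstract sums a kernel interaction over every pair of information particles, and Renyi's quadratic entropy is the negative logarithm of that potential:

```python
import numpy as np

def information_potential(x, sigma):
    """V(x) = (1/N^2) sum_i sum_j G(x_i - x_j, 2*sigma^2): the pairwise
    "interaction energy" of the information particles (1-D Gaussian kernel;
    the variance doubles because two width-sigma Parzen kernels convolve)."""
    diffs = x[:, None] - x[None, :]          # all N^2 pairwise differences
    s2 = 2.0 * sigma**2
    return np.mean(np.exp(-diffs**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2))

def renyi_quadratic_entropy(x, sigma):
    """H2(x) = -log V(x): Renyi's quadratic entropy via the IP."""
    return -np.log(information_potential(x, sigma))

rng = np.random.default_rng(1)
clustered = rng.normal(0.0, 0.2, 300)
spread = rng.normal(0.0, 2.0, 300)
# Tightly clustered particles -> large potential -> small entropy.
assert information_potential(clustered, 0.3) > information_potential(spread, 0.3)
assert renyi_quadratic_entropy(clustered, 0.3) < renyi_quadratic_entropy(spread, 0.3)
```

The double sum over pairs is exactly the $O(N^2)$ complexity mentioned in the abstract; minimizing or maximizing this potential is the entropy adaptation the dissertation develops.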
However, only in the late 1980s did Linsker propose the principle of maximum information preservation (InfoMax) [Lin88, Lin89] as the basic principle for the self-organization of neural networks, which requires the maximization of the mutual information between the output and the input of the network so that the information about the input is best preserved in the output. Linsker further applied the principle to linear networks under Gaussian assumptions on the input data distribution and the noise distribution, and derived the way to maximize the mutual information in this particular case [Lin88, Lin89]. In 1988, Plumbley and Fallside proposed the similar minimum information loss principle [Plu88]. In the same period, other researchers used information-theoretic principles but still with the limitation of a linear model or a Gaussian assumption, for instance, Becker and Hinton's spatially coherent features [Bec89, Bec92], Ukrainec and Haykin's spatially incoherent features [Ukr92], etc. In recent years, the information-theoretic approaches to BSS and ICA have drawn a lot of attention. Although they certainly broke the limitation of model linearity and the Gaussian assumption, the methods are still not general enough. There are two typical information-theoretic methods in this area: maximum entropy (ME) and minimum mutual information (MMI) [Bel95, Yan97, Yan98, Pha96]. Both methods use the entropy relation of a full rank linear mapping: H(Y) = H(X) + log|det(W)|, where Y = WX and W is a full rank square matrix. Thus the estimation of information quantities is coupled with the network structure. Moreover, ME requires that the nonlinearity in the outputs match the cdf (cumulative distribution function) of the source signals [Bel95], and MMI uses the above-mentioned expansion methods or numerical methods to estimate the marginal entropies [Yan97, Yan98, Pha96].
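For Gaussian signals, where H(X) = (1/2) log((2πe)^n |Σ|), the entropy relation of the full-rank linear mapping above can be checked in closed form. A minimal sketch (the covariance and mapping matrices are arbitrary illustrative choices, not taken from the text):

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a zero-mean Gaussian: 0.5*log((2*pi*e)^n |cov|)."""
    n = cov.shape[0]
    return 0.5 * np.log(((2.0 * np.pi * np.e) ** n) * np.linalg.det(cov))

sigma_x = np.array([[2.0, 0.5], [0.5, 1.0]])   # covariance of X (illustrative)
W = np.array([[1.0, 2.0], [0.0, 3.0]])         # full-rank square mapping Y = WX
sigma_y = W @ sigma_x @ W.T                    # covariance of Y

h_x = gaussian_entropy(sigma_x)
h_y = gaussian_entropy(sigma_y)
# h_y equals h_x + log|det(W)|, the relation used by the ME and MMI methods
```

Because the estimate depends on W only through log|det(W)|, any criterion built on this relation is necessarily tied to the network structure, as the text notes.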
On the whole, a general method and a unifying point of view about the estimation of information quantities are lacking. Human beings and animals in general are examples of systems that can learn from interactions with their environments. Such interactions are usually in the form of "examples" (also called "data samples"). For instance, children learn to speak by listening, learn to recognize objects by being presented with exemplars, learn to walk by trying, etc. In general, children learn by the stimulation from their environment. Adaptive systems for signal processing [Wid85, Hay94, Hay96] are also learning systems that evolve with the interaction with input, output and desired (or teacher) signals. To study the general principle of a learning system, we first need to set an abstract model for the system and its environment. As illustrated in Figure 1-2, an abstract learning system is a mapping from R^m to R^k: Y = q(X, W), where X ∈ R^m is the input signal, Y ∈ R^k is the output signal, and W is a set of parameters of the mapping. The environment is modeled by the doublet (X, D), where D ∈ R^k is a desired signal (teacher signal). The learning mechanism is a set of rules or procedures that adjust the parameters W so that the mapping achieves a desired goal.

Figure 1-2. Illustration of a Learning System (the learning mechanism adjusts Y = q(X, W) based on the input signal X, the output signal Y and the desired signal D)

There are a variety of learning systems: linear or nonlinear, feedforward or recurrent, full rank or dimension-reduced, the perceptron and multilayer perceptron (MLP) with global bases or radial-basis functions with local bases, etc. Different system structures may have different properties and usage [Hay98]. The environment doublet (X, D) also has a variety of forms. A learning process may or may not have a desired signal (very often the input signal is the implicit desired signal). Some statistical property of X or Y or D can be given or assumed.
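The abstract model above can be made concrete in a small sketch; here a linear mapping plays the role of Y = q(X, W) and a gradient (LMS-style) rule plays the role of the learning mechanism. Both choices, and the synthetic environment, are illustrative only, not prescribed by the model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Environment doublet (X, D): here D comes from an unknown linear map plus noise.
true_w = np.array([0.8, -0.3])
X = rng.normal(size=(500, 2))
D = X @ true_w + 0.01 * rng.normal(size=500)

# Learning system Y = q(X, W): a linear mapping with parameters W.
W = np.zeros(2)

# Learning mechanism: adjust W from each example (X_i, D_i) to reduce the error.
eta = 0.05
for x, d in zip(X, D):
    y = x @ W               # system output
    W += eta * (d - y) * x  # LMS-style update toward the desired signal

# After learning, W approximates the mapping implicit in the environment.
```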
Most often, only a discrete data set {(X_i, D_i) | i = 1, ..., N} is provided. Such a scheme is called "learning from examples" and is the general case [Hay94, Hay98]. This dissertation is more interested in "learning from examples" than in any scheme with assumptions about the data. Of course, if a priori knowledge about the data is available, a learning method should incorporate this knowledge. There are also many learning mechanisms. Some of them make assumptions about the data, and others do not. Some are coupled with the structure and topology of the learning system, while others are independent of the system. A general learning mechanism should not depend on the data and should be de-coupled from the learning system. There is no doubt that the area is rich in diversity but lacks unification. There are no concepts more abstract and fundamental than energy and information. To look for the essence of learning, one should start from these two basic ideas. Obviously, learning is about obtaining knowledge and information. Based on the above learning system model, we can say that learning is nothing but transferring onto the machine parameters the information contained in the environment, or, to be more specific, in a given data set. This dissertation will try to give a unifying point of view for learning systems and to implement it by using the proposed information potential. The basic purpose of learning is to generalize. The ability of animals to learn something general from their past experiences is the key to their survival in the future. Regarding the generalization ability of a learning machine, one very fundamental question is: what is the best we can do to generalize for a given learning system and a given set of environmental data? One thing is very clear: the information contained in the given data set is a quantity that cannot be changed by any learning method, and no learning method can go beyond it.
Thus, it is the best that any learning system can possibly obtain. Generalization, from this point of view, is not to create something new but to utilize fully the information contained in the observed data, neither less nor more. By "less," we mean that the information obtained by a learning system is less than the information contained in the given data. By "more," we mean that, implicitly or explicitly, a learning method assumes something that is not given. This is also the spirit of Jaynes [Jay57] mentioned above, and a similar point of view can be found in Christensen [Chr80, Chr81]. The environmental data for a learning system are usually not collected all at one time but are accumulated during a learning process. Whenever one datum appears, or after a small set of data is obtained, learning should take place and the parameters of the learning system should be updated. This is the problem of on-line learning methods, which is also an issue that this dissertation deals with. Another problem that this dissertation is interested in is "local" learning algorithms. In a biological nervous system, what can be changed is the strength of synaptic connections. The change of a synaptic connection can only depend on its local information, i.e. its input and output. For an engineering system, it will be much easier to implement in either hardware or software if the learning rule is "local"; i.e., the update of a connection in a learning network only relies on its input and output. Hebb's rule is a famous neuropsychological postulate of how a synaptic connection evolves with its input and output [Heb49, Hay98]. It will be shown in this dissertation how Hebbian-type algorithms can be related to the energy and entropy of signals.

1.3 Outline

In Chapter 2, the basic ideas of energy, information entropy and their relationship will be reviewed.
Since the information entropy directly relies on the pdf of the variable, the Parzen window nonparametric method will be reviewed for the development of the ideas of the information potential and the cross information potential. Finally, the derivation will be given, the idea of the information force in an information potential field will be introduced for its use in learning systems, and the calculation procedures for the information potential, the cross information potential and all the forces in the corresponding information potential fields will be described. In Chapter 3, a variety of learning systems and learning mechanisms will be reviewed. A unifying point of view about learning based on information theory will be given. The information potential implementation of the unifying idea will be described, and the generalization of learning will be discussed. In Chapter 4, the on-line local algorithms for a linear system with energy criteria will be reviewed. The relationship between Hebbian and anti-Hebbian rules and the energy criteria will be discussed. An on-line local algorithm for generalized eigendecomposition will be proposed, with a discussion of convergence properties such as the convergence speed and stability. Chapter 5 will give several application examples. First, the information potential method will be applied to aspect angle estimation for SAR images. Second, the same method will be applied to SAR automatic target recognition. Third, the training of a layered neural network by the information potential method will be described. Fourth, the method will be applied to independent component analysis and blind source separation. Chapter 6 will conclude the dissertation and provide a survey of future work in this area.

CHAPTER 2
ENERGY, ENTROPY AND INFORMATION POTENTIAL

2.1 Energy, Entropy and Information of Signals

2.1.1 Energy of Signals

From the statistical point of view, the energy of a 1-D stationary signal is related to its variance.
For a 1-D stationary signal x(n) with variance σ² and mean m, its energy (precisely, short-time energy or power) is

E_x = E[x²] = σ² + m²   (2.1)

where E[·] is the expectation operator. If m = 0, then the energy is equal to the variance: E_x = σ². So, basically, energy is a quantity related to second-order statistics. For two 1-D signals x₁(n) and x₂(n) with means m₁ and m₂ respectively, the covariance is r = E[(x₁ − m₁)(x₂ − m₂)] = E[x₁x₂] − m₁m₂, and we have the cross-correlation between the two signals:

c₁₂ = E[x₁x₂] = r + m₁m₂   (2.2)

If at least one signal is zero-mean, c₁₂ = r. For a 2-D signal X = (x₁, x₂)ᵀ, all the second-order statistics are given in the covariance matrix Σ, and we have

E[XXᵀ] = Σ + [m₁², m₁m₂; m₁m₂, m₂²],   Σ = [σ₁², r; r, σ₂²]   (2.3)

Usually, the first-order statistics have nothing to do with the information; we will just consider the zero-mean case; thus we have E[XXᵀ] = Σ. For a 2-D signal, there are three energy quantities in the covariance matrix: σ₁², σ₂² and r. One may ask what the overall energy quantity for a 2-D signal is. From linear algebra [Nob88], there are three choices: the first is the determinant of Σ, which is a volume measure in the 2-D signal space and is equal to the product of all the eigenvalues of Σ; the second is the trace of Σ, which is equal to the sum of all the eigenvalues of Σ; the third is the product of all the diagonal elements. Thus, we have

J₁ = log|Σ|
J₂ = tr(Σ) = σ₁² + σ₂²   (2.4)
J₃ = log(σ₁²σ₂²)

where tr(·) is the trace operator. The use of the log function in J₁ and J₃ is to reduce the dynamic range of the original quantities; this is also related to the information of the signal, which will become clear later in this chapter. The component signals x₁ and x₂ will be called marginal signals in this dissertation. If the two marginal signals x₁ and x₂ are uncorrelated, then J₁ = J₃. In general, we have

J₃ ≥ J₁   (2.5)

where the equality holds if and only if the two marginal signals are uncorrelated. This is the so-called Hadamard's inequality [Nob88, Dec96].
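The three overall energy measures in (2.4) and Hadamard's inequality (2.5) can be checked numerically; a small sketch (the covariance values are arbitrary illustrations):

```python
import numpy as np

def energy_measures(cov):
    """Overall energy measures of (2.4): J1 = log|Sigma|, J2 = tr(Sigma),
    J3 = log of the product of the diagonal (marginal variances)."""
    j1 = np.log(np.linalg.det(cov))
    j2 = np.trace(cov)
    j3 = np.log(np.prod(np.diag(cov)))
    return j1, j2, j3

# Correlated marginal signals (r != 0): Hadamard's inequality J3 > J1 is strict.
j1, j2, j3 = energy_measures(np.array([[2.0, 0.8], [0.8, 1.5]]))

# Uncorrelated marginal signals (diagonal Sigma): J3 = J1.
k1, k2, k3 = energy_measures(np.diag([2.0, 1.5]))
```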
In general, for a positive semi-definite matrix Σ, we have the same inequality, where J₁ is the determinant of the matrix (or its logarithm; note that the logarithm is a monotonic increasing function) and J₃ is the product of the diagonal components (or its logarithm). When the two marginal signals are uncorrelated and their variances are equal, J₁ and J₂ are equivalent in the sense that

J₁ = 2log J₂ − 2log 2 = 2log σ²   (2.6)

For an n-D signal X = (x₁, ..., xₙ)ᵀ with zero mean, we have the covariance matrix

Σ = E[XXᵀ] = [σ₁², ..., r₁ₙ; ...; rₙ₁, ..., σₙ²]   (2.7)

where σᵢ² (i = 1, ..., n) are the variances of the marginal signals xᵢ, and rᵢⱼ (i = 1, ..., n, j = 1, ..., n, i ≠ j) are the cross-correlations between the marginal signals xᵢ and xⱼ. The three possible overall energy measures are

J₁ = log|Σ|
J₂ = tr(Σ) = Σᵢ σᵢ²   (2.8)
J₃ = log(Πᵢ σᵢ²)

Hadamard's inequality is J₃ ≥ J₁, where the equality holds if and only if Σ is diagonal; i.e., the marginal signals are uncorrelated with each other. J₂ is equal to the sum of all the eigenvalues of Σ and is invariant under any orthonormal transformation (rotation transform). When the marginal signals are uncorrelated with each other and their variances are equal, J₂ and J₁ are equivalent in the sense that they are related by a monotonic increasing function:

J₁ = n log J₂ − n log n = n log σ²   (2.9)

2.1.2 Information Entropy

Compared with energy, the information entropy of a signal involves all the statistics of the signal, and is thus more profound and more difficult to implement. As mentioned in Chapter 1, the study of abstract quantitative measures of information started in the 1920s when Nyquist and Hartley proposed a logarithmic measure [Nyq24, Har28]. Later, in 1948, Shannon pointed out that the measure is valid only if all events are equiprobable [Sha48]. Further, he coined the term "information entropy," which is the mathematical expectation of Nyquist and Hartley's measures.
In 1960, Renyi generalized Shannon's idea by using an exponential function rather than a linear function to calculate the mean [Ren60, Ren61]. Later on, other forms of information entropy appeared (e.g. Havrda and Charvat's measure, Kapur's measure) [Kap94]. Although Shannon's entropy is the only one which possesses all the postulated properties (which will be given later) for an information measure, the other forms such as Renyi's and Havrda-Charvat's are equivalent with regard to entropy maximization [Kap94]. In a real problem, which form to use depends upon other requirements such as ease of implementation. For an event with probability p, according to Hartley's idea, the information given when this event happens is I(p) = log(1/p) = −log p [Har28]. Shannon further developed Hartley's idea, resulting in Shannon's information entropy for a variable with the probability distribution {p_k | k = 1, ..., n}:

H_S = Σₖ p_k I(p_k) = −Σₖ p_k log p_k,   Σₖ p_k = 1,   p_k ≥ 0   (2.10)

In the general theory of means, a mean of the real numbers x₁, ..., xₙ with weights p₁, ..., pₙ has the form

φ⁻¹( Σₖ p_k φ(x_k) )   (2.11)

where φ(x) is a Kolmogorov-Nagumo function, which is an arbitrary continuous and strictly monotonic function defined on the real numbers. So, in general, the entropy measure should be [Ren60, Ren61]

φ⁻¹( Σₖ p_k φ(I(p_k)) )   (2.12)

As an information measure, φ(·) cannot be arbitrary since information is "additive." To meet the additivity condition, φ(·) can be either φ(x) = x or φ(x) = 2^((1−α)x). If the former is used, (2.12) becomes Shannon's entropy (2.10). If the latter is used, Renyi's entropy of order α is obtained [Ren60, Ren61]:

H_Rα = (1/(1−α)) log( Σₖ p_kᵅ ),   α > 0, α ≠ 1   (2.13)

In 1967, Havrda and Charvat proposed another entropy measure which is similar to Renyi's measure but has a different scaling [Hav67, Kap94] (it will be called Havrda-Charvat's entropy, or H-C entropy for short):

H_hα = (1/(1−α))( Σₖ p_kᵅ − 1 ),   α > 0, α ≠ 1   (2.14)

There are also some other entropy measures, for instance, H_∞ = −log(max_k p_k) [Kap94].
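These definitions, and the fact that both generalized entropies approach Shannon's as α → 1, can be checked directly; a small sketch (natural logarithms, arbitrary distribution):

```python
import numpy as np

def shannon(p):
    """Shannon's entropy (2.10)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def renyi(p, alpha):
    """Renyi's entropy of order alpha (2.13)."""
    return np.log(np.sum(np.asarray(p, dtype=float) ** alpha)) / (1 - alpha)

def havrda_charvat(p, alpha):
    """Havrda-Charvat's entropy of order alpha (2.14)."""
    return (np.sum(np.asarray(p, dtype=float) ** alpha) - 1) / (1 - alpha)

p = [0.5, 0.3, 0.2]
h_s = shannon(p)
# As alpha -> 1, both generalized entropies approach Shannon's entropy.
h_r = renyi(p, 1.0001)
h_hc = havrda_charvat(p, 1.0001)
```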
Different entropy measures may have different properties. There are more than a dozen properties of Shannon's entropy. We will discuss five basic properties, since all the other properties can be derived from these [Sha48, Sha62, Kap92, Kap94, Acz75].

(1) The entropy measure H(p₁, ..., pₙ) is a continuous function of all the probabilities p_k, which means that a small change in the probability distribution will only result in a small change in the entropy.

(2) H(p₁, ..., pₙ) is permutationally symmetric; i.e., the position change of any two or more p_k in H(p₁, ..., pₙ) will not change the entropy value. Actually, the permutation of any p_k in the distribution will not change the uncertainty or disorder of the distribution and thus should not affect the entropy.

(3) H(1/n, ..., 1/n) is a monotonic increasing function of n. For an equiprobable distribution, when the number of choices n increases, the uncertainty or disorder increases, and so does the entropy measure.

(4) Recursivity: If an entropy measure satisfies (2.15) or (2.16), then it has the recursivity property. It means that the entropy of n outcomes can be expressed in terms of the entropy of n − 1 outcomes plus the weighted entropy of the two combined outcomes.

Hₙ(p₁, p₂, ..., pₙ) = Hₙ₋₁(p₁+p₂, p₃, ..., pₙ) + (p₁+p₂) H₂( p₁/(p₁+p₂), p₂/(p₁+p₂) )   (2.15)

Hₙ(p₁, p₂, ..., pₙ) = Hₙ₋₁(p₁+p₂, p₃, ..., pₙ) + (p₁+p₂)ᵅ H₂( p₁/(p₁+p₂), p₂/(p₁+p₂) )   (2.16)

where α is the parameter in Renyi's entropy or H-C entropy.

(5) Additivity: If p = (p₁, ..., pₙ) and q = (q₁, ..., qₘ) are two independent probability distributions, and the joint probability distribution is denoted by p·q, then the property H(p·q) = H(p) + H(q) is called additivity.

The following table gives a comparison of the three types of entropy with respect to the above five properties: Table 2-1.
The Comparison of Properties of the Three Entropies

              (1)   (2)   (3)   (4)   (5)
  Shannon's   yes   yes   yes   yes   yes
  Renyi's     yes   yes   yes   no    yes
  H-C's       yes   yes   yes   yes   no

From the table, we can see that the three types of entropy differ in recursivity and additivity. However, Kapur pointed out: "The maximum entropy probability distributions given by Havrda-Charvat and Renyi's measures are identical. This shows that neither additivity nor recursivity is essential for a measure to be used in maximum entropy principle" [Kap94: page 42]. So, the three entropies are equivalent for entropy maximization and any of them can be used. As we can see from the above, Shannon's entropy has no parameter, but both Renyi's entropy and Havrda-Charvat's entropy have a parameter α. So, both Renyi's and Havrda-Charvat's measures constitute families of entropy measures. There is a relation between Shannon's entropy and Renyi's entropy [Ren60, Kap94]:

H_Rα ≥ H_S ≥ H_Rβ,  if 1 > α > 0 and β > 1;   lim_{α→1} H_Rα = H_S   (2.17)

i.e., Renyi's entropy is a monotonic decreasing function of the parameter α, and it approaches Shannon's entropy as α approaches 1. Thus, Shannon's entropy can be regarded as one member of the Renyi entropy family. Similar results hold for Havrda-Charvat's entropy measure [Kap94]:

H_hα ≥ H_S ≥ H_hβ,  if 1 > α > 0 and β > 1;   lim_{α→1} H_hα = H_S   (2.18)

Thus, Shannon's entropy can also be regarded as one member of the Havrda-Charvat entropy family. So, both Renyi and Havrda-Charvat generalize Shannon's idea of information entropy. When α = 2, H_h2 = 1 − Σₖ p_k² is called the quadratic entropy [Jum90]. In this dissertation, H_R2 = −log Σₖ p_k² is also called quadratic entropy, for convenience and because the entropy quantity depends on the quadratic form of the probability distribution. The quadratic form will give us more convenience, as we will see later.
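The ordering in (2.17), with Shannon's entropy bracketed between Renyi entropies of order α < 1 and β > 1, is easy to verify numerically; a small sketch (arbitrary distribution):

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def renyi(p, alpha):
    return np.log(np.sum(np.asarray(p, dtype=float) ** alpha)) / (1 - alpha)

p = [0.5, 0.3, 0.2]
h_low = renyi(p, 0.5)   # alpha < 1: an upper bound of Shannon's entropy
h_s = shannon(p)
h_high = renyi(p, 2.0)  # beta > 1: a lower bound (the quadratic entropy H_R2)
# h_low >= h_s >= h_high, per relation (2.17)
```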
For a continuous random variable Y with pdf f_Y(y), similarly to the Boltzmann-Shannon differential entropy H_S(Y) = −∫ f_Y(y) log f_Y(y) dy, we can obtain the differential versions of these two types of entropy:

H_Rα(Y) = (1/(1−α)) log ∫ f_Y(y)ᵅ dy,   H_R2(Y) = −log ∫ f_Y(y)² dy
H_hα(Y) = (1/(1−α)) ( ∫ f_Y(y)ᵅ dy − 1 ),   H_h2(Y) = 1 − ∫ f_Y(y)² dy   (2.19)

The relationships among Shannon's, Renyi's and Havrda-Charvat's entropies in (2.17) and (2.18) also hold for the corresponding differential entropies.

2.1.3 Geometrical Interpretation of Entropy

From the above, we see that both Renyi's entropy and Havrda-Charvat's entropy contain the term Σₖ p_kᵅ for a discrete variable, and both of them approach Shannon's entropy when α approaches 1. This suggests that all these entropies are related to some kind of distance between the point of the probability distribution p = (p₁, ..., pₙ) and the origin of the space Rⁿ. As illustrated in Figure 2-1, the probability distribution point p = (p₁, ..., pₙ) is restricted to a segment of the hyperplane defined by Σₖ p_k = 1 and p_k ≥ 0 (for n = 2, the region is the line segment connecting the two points (1,0) and (0,1); for n = 3, the region is the triangular area bounded by the three lines connecting each pair of the points (1,0,0), (0,1,0) and (0,0,1)). The entropy of the probability distribution p = (p₁, ..., pₙ) is a function of V_α = Σₖ p_kᵅ, which is the α-norm of the point p raised to the power α [Nob88, Gol93] and will be called the "entropy α-norm." Renyi's entropy rescales the entropy α-norm V_α by a logarithm: H_Rα = (1/(1−α)) log V_α; while Havrda-Charvat's entropy rescales the entropy α-norm V_α linearly: H_hα = (1/(1−α))(V_α − 1).

Figure 2-1. Geometrical Interpretation of Entropy (the distribution p lies on the simplex Σₖ p_k = 1, p_k ≥ 0, shown for n = 2 and n = 3; V_α = Σₖ p_kᵅ is the α-norm of p raised to the power α)

So, both Renyi's entropy of order α (H_Rα) and Havrda-Charvat's entropy of order α (H_hα) are related to the α-norm of the probability distribution p. For the above-mentioned infinity entropy H_∞, there is the relation lim_{α→∞} H_Rα = H_∞, with H_∞ = −log(max_k p_k) [Kap94]. Therefore, H_∞ is related to the infinity-norm of the probability distribution p. For Shannon's entropy, we have lim_{α→1} H_Rα = H_S and lim_{α→1} H_hα = H_S. It might be interesting to consider Shannon's entropy as resulting from the 1-norm of the probability distribution p. Actually, the 1-norm of any probability distribution is always 1 (Σₖ p_k = 1). If we plug V₁ = 1 and α = 1 into H_Rα = (1/(1−α)) log V_α and H_hα = (1/(1−α))(V_α − 1), we get 0/0. The limit, however, is Shannon's entropy. So, in the limit sense, Shannon's entropy can be regarded as the function value of the 1-norm of the probability distribution. Thus, we can generally say that the entropy of order α (either Renyi's or H-C's) is a monotonic function of the α-norm of the probability distribution p, and the entropy (all entropies, or at least all the above-mentioned entropies) is essentially a monotonic function of the distance from the probability distribution point p to the origin. From linear algebra, all norms are equivalent in comparing distances [Gol93, Nob88]; thus, they are equivalent for distance maximization or distance minimization, in both unconstrained and constrained cases. Therefore, all entropies (at least the above-mentioned entropies) are equivalent for the purpose of entropy maximization or entropy minimization. When α > 1, both Renyi's entropy H_Rα and Havrda-Charvat's entropy H_hα are monotonic decreasing functions of the entropy α-norm V_α. So, in this case, entropy maximization is equivalent to the minimization of the entropy α-norm V_α, and entropy minimization is equivalent to the maximization of the entropy α-norm V_α. When α < 1, both Renyi's entropy H_Rα and Havrda-Charvat's entropy H_hα are monotonic increasing functions of the entropy α-norm V_α.
So, in this case, entropy maximization is equivalent to the maximization of the entropy α-norm V_α, and entropy minimization is equivalent to the minimization of the entropy α-norm V_α. Of particular interest in this dissertation are the quadratic entropies H_R2 and H_h2, which are both monotonic decreasing functions of the entropy 2-norm V₂ of the probability distribution p and are related to the Euclidean distance from the point p to the origin. Entropy maximization is equivalent to the minimization of V₂, and entropy minimization is equivalent to the maximization of V₂. Moreover, since both H_R2 and H_h2 are lower bounds of Shannon's entropy, they might be more efficient than Shannon's entropy for entropy maximization. For a continuous variable Y, the probability density function f_Y(y) is a point in a functional space. All the pdfs f_Y(y) constitute a similar region in a "hyperplane" defined by ∫ f_Y(y) dy = 1 and f_Y(y) ≥ 0. A similar geometrical interpretation can also be given to the differential entropies. In particular, we have the entropy α-norm

V_α = ∫ f_Y(y)ᵅ dy,   V₂ = ∫ f_Y(y)² dy   (2.20)

2.1.4 Mutual Information

Mutual information (MI) measures the relationship between two variables and is thus more desirable in many cases. Following Shannon [Sha48, Sha62], the mutual information between two random variables X₁ and X₂ is defined as

I_S(X₁, X₂) = ∫∫ f_{X₁X₂}(x₁, x₂) log[ f_{X₁X₂}(x₁, x₂) / ( f_{X₁}(x₁) f_{X₂}(x₂) ) ] dx₁ dx₂   (2.21)

where f_{X₁X₂}(x₁, x₂) is the joint pdf of the joint variable (x₁, x₂), and f_{X₁}(x₁) and f_{X₂}(x₂) are the marginal pdfs of X₁ and X₂ respectively. Obviously, mutual information is symmetric; i.e., I_S(X₁, X₂) = I_S(X₂, X₁).
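As an aside, for jointly Gaussian variables (an assumption not made in the text), the double integral in (2.21) has a well-known closed form depending only on the correlation coefficient ρ, which makes the symmetry and non-negativity easy to check; a small sketch:

```python
import numpy as np

def gaussian_mi(rho):
    """Mutual information (2.21) for jointly Gaussian X1, X2 with correlation rho.
    The integral evaluates in closed form to -0.5 * log(1 - rho^2)."""
    return -0.5 * np.log(1.0 - rho ** 2)

mi_zero = gaussian_mi(0.0)    # uncorrelated Gaussians are independent: MI = 0
mi_weak = gaussian_mi(0.1)
mi_strong = gaussian_mi(0.9)  # stronger dependence, larger MI
mi_neg = gaussian_mi(-0.9)    # symmetric in the sign of the correlation
```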
It is not difficult to show the relation between mutual information and Shannon's entropy in (2.22) [Dec96, Hay98]:

I_S(X₁, X₂) = H_S(X₁) − H_S(X₁|X₂) = H_S(X₂) − H_S(X₂|X₁)
            = H_S(X₁) + H_S(X₂) − H_S(X₁, X₂)   (2.22)

where H_S(X₁) and H_S(X₂) are the marginal entropies and H_S(X₁, X₂) is the joint entropy; H_S(X₁|X₂) = H_S(X₁, X₂) − H_S(X₂) is the conditional entropy of X₁ given X₂, which is the measure of the uncertainty of X₁ when X₂ is given, or the uncertainty left in (X₁, X₂) when the uncertainty of X₂ is removed; similarly, H_S(X₂|X₁) is the conditional entropy of X₂ given X₁ (all entropies involved are Shannon's entropies). From (2.22), it can be seen that the mutual information is the measure of the uncertainty removed from X₁ when X₂ is given; in other words, the mutual information is the measure of the information that X₂ conveys about X₁ (or vice versa, since the mutual information is symmetric). It provides a measure of the statistical relationship between X₁ and X₂ which involves all the statistics of the related distributions, and thus is a more general measure than a simple cross-correlation between X₁ and X₂, which only involves the second-order statistics of the variables. It can be shown that the mutual information is non-negative, or equivalently that Shannon's entropy is reduced by conditioning, or that the sum of the marginal entropies is an upper bound of the joint entropy; i.e.,

I_S(X₁, X₂) ≥ 0
H_S(X₁) ≥ H_S(X₁|X₂),   H_S(X₂) ≥ H_S(X₂|X₁)   (2.23)
H_S(X₁, X₂) ≤ H_S(X₁) + H_S(X₂)

The mutual information can also be regarded as the Kullback-Leibler divergence (K-L divergence, also called cross-entropy) [Kul68, Dec96, Hay98] between the joint pdf
The Kullback-Leibler diver- gence between two pdfs f(x) and g(x) is defined as Dk(f,g)= f(x)log()dx (2.24) Jensen's inequality [Dec96, Ace92] says for a random variable X and a convex func- tion h(x), the expectation of this convex function of X is no less than the convex function of the expectation of X; i.e., E[h(X)] h(E[X]) or fh(x)fx(x)dx > h(fxfx(x)dx) (2.25) where E[ ] is the operator of mathematical expectation, fx(x) is the pdf of X. From Jensen's inequality [Dec96, Kul68], or by using the derivation in Acero [Ace92], it can be shown that the Kullback-Leibler divergence is non-negative and is zero if and only if two distributions are the same; i.e., Dk( g) = f(x)log ) dx > 0 (2.26) where the equality holds if and only if f(x) = g(x). So, the Kullback-Leibler divergence can be regarded as a "distance" measure between pdfs f(x) and g(x). However, it is not symmetric; i.e., Dk(f, g) # Dk(g,f) in general, and thus is called "directed divergence." Obviously, the mutual information mentioned above is the Kullback-Leibler "distance" from the joint pdf fx, (x1, x2) to the factorized marginal pdf fx,(1x)fx2(x2) Dk(fX,X2 (Xl, 2),fx,(Xl)f(X2)) Based on Renyi's entropy, we can define Renyi's divergence measure with order a for two pdff(x) and g(x) [Ren60, Ren6, Kap94]: 30 DRa(f, g) = logj d (2.27) (a 1) g(x)c11 The relation between Renyi's divergence and Kullback-Leibler divergence is [Kap92, Kap94] lim DRa(f g) = Dk(fg) (2.28) a-+ 1 Based on Havrda-Charvat's entropy, there is also Havrda-Charvat's divergence mea- sure with order a for two pdfs f(x) and g(x) [Hav67, Kap92, Kap94]: D (1 f(x dx-11 (2.29) Dha(f'g) (a- )[lg(x)a-1 There is also a similar relation between this divergence measure and Kullback-Leibler divergence [Kap92, Kap94]: lim Dh(f, g) = Dk(f,g) (2.30) a-+ I Unfortunately, as Renyi pointed out DRa(fXX(Xl, X2),fX,(xl)fx(X2)) is not appro- priate as a measure of mutual information of the variables X1 and X2 [Ren60]. 
Further- more, all these divergence measures (Kullback-Leibler, Renyi and Havrda-Charvat) are complicated due to the calculation of the integrals involved in their formula. Therefore, they are difficult to implement in the "learning from examples" and general adaptive sig- nal processing applications where the maximization or minimization of the measures is desired. In practice, simplicity becomes a paramount consideration. Therefore, there is a need for alternative measures which may have the same maximum or minimum pdf solu- tions as Kullback-Leibler divergence but at the same time is easy to implement, just like the case of the quadratic entropy which meet these two requirements. 31 For discrete variables X1 and X2 with probability distribution P i = 1,..., n and = 1, ., m respectively, and the joint probability distribution {PJi = 1, ..., n ;j = 1, ..., m }, the Shannon's mutual information is defined as n m P Is(XI,X2) = Pflog (2.31) i' = 1 x, 2.1.5 Quadratic Mutual Information As pointed out by Kapur [Kap92], there is no reason to restrict ourselves to Shannon's measure for entropy and to confine ourselves to Kullback-Leibler's measure for cross- entropy (density discrepancy or density distance). Entropy or cross-entropy is too deep and too complex a concept to be measured by a single measure under all conditions. The alternative measures for entropy discussed in 2.1.2 break such restriction on entropy, espe- cially, there are entropies with simple quadratic form ofpdfs. In this section, the possibil- ity of "mutual information" measures with only simple quadratic form of pdfs will be discussed (the reason to use quadratic form of pdfs will be clear later in this chapter). These measures will be called quadratic mutual information although they may lack some properties of Shannon's mutual information. 
Independence is a fundamental statistical relationship between two random variables (the extension of the idea of independence to multiple variables is not difficult; for simplicity of exposition, only the case of two variables will be discussed at this stage). It is defined by the joint pdf being equal to the product of the marginal pdfs. For instance, two variables X₁ and X₂ are independent of each other when

f_{X₁X₂}(x₁, x₂) = f_{X₁}(x₁) f_{X₂}(x₂)   (2.32)

where f_{X₁X₂}(x₁, x₂) is the joint pdf and f_{X₁}(x₁) and f_{X₂}(x₂) are the marginal pdfs. As mentioned in the previous section, the mutual information can be regarded as a distance between the joint pdf and the factorized marginal pdf in the pdf functional space. When the distance is zero, the two variables are independent. When the distance is maximized, the two variables are far away from the independent state and, roughly speaking, the dependence between them is maximized. The Euclidean distance is a simple and straightforward distance measure between two pdfs. The squared distance between the joint pdf and the factorized marginal pdf will be called the Euclidean distance quadratic mutual information (ED-QMI). It is defined as

D_ED(f, g) = ∫ ( f(x) − g(x) )² dx
I_ED(X₁, X₂) = D_ED( f_{X₁X₂}(x₁, x₂), f_{X₁}(x₁) f_{X₂}(x₂) )   (2.33)

Obviously, the ED-QMI between X₁ and X₂, I_ED(X₁, X₂), is non-negative and is zero if and only if f_{X₁X₂}(x₁, x₂) = f_{X₁}(x₁) f_{X₂}(x₂); i.e., X₁ and X₂ are independent of each other. So, it is appropriate for measuring the independence between X₁ and X₂.
Although there is no strict theoretical justification yet that the ED-QMI is an appropriate measure for the dependence between two variables, the experimental results described later in this dissertation and the comparison between ED-QMI and Shannon's Mutual Information in some special cases described later in this chapter will all support that ED-QMI is appropri- ate to measure the degree of dependence between two variables, especially the maximiza- tion of this quantity will give reasonable results. For multiple variables, the extension of ED-QMI is straightforward: 33 k IED(XI, ...,Xk) = DED fX(x, ...,xk) f(xi) (2.34) i= where fx(xl, ..., xk) is the joint pdf, fx (xi) (i=l, ..., k) are marginal pdfs. Another possible pdf distance measure is based on Cauchy-Schwartz inequality [Har34]: (f(x)2dx)(g(xdx) (fx)g(x)dx)2 where equality holds if and only if f(x) = C g(x) for a constant scalar Q. Iff(x) and g(x) are pdfs; i.e., Jf(x)dx = 1 and Ig(x)dx = 1, then f(x) = C g(x) implies I = 1. So, for two pdfs f(x) and g(x), we have equality holding if and only if f(x) = g(x) Thus, we may define Cauchy- Schwartz distance for two pdfs as (Jf(x)2dx)(Ig(x)2dx) Dcs(f g) = log (f() 2 (2.35) (ff(x)g(x)dx) Obviously, Dcs(f, g) 2 0, with equality if and only if f(x) = g(x) almost everywhere and the integrals involved are all quadratic form of pdfs. Based on Dcs(f, g), we have Cauchy-Schwartz quadratic mutual information (CS-QMI) between two variables X, and X2 as Ics(X,,X2) = Dcs( fxX,(X, X2), fX(xlI)fx(x2) ) (2.36) where the notations are the same as above. Directly from the above, we have Ics(X1, X2) 2 0 with the equality if and only if X1 and X2 are independent with each other. So, Ics is an appropriate measure for independence. However, the experimental results shows that it might be not appropriate as a dependence measure. 
For multiple variables, the extension is also straightforward:

I_CS(X1, ..., Xk) = D_CS( f_X(x1, ..., xk), ∏_{i=1}^{k} f_{X_i}(x_i) )   (2.37)

For discrete variables X1 and X2 with probability distributions { P_{X1}^i, i = 1, ..., n } and { P_{X2}^j, j = 1, ..., m } respectively, and joint probability distribution { P_{X1X2}^{ij}, i = 1, ..., n; j = 1, ..., m }, the ED-QMI and CS-QMI are

I_ED(X1, X2) = Σ_{i=1}^{n} Σ_{j=1}^{m} ( P_{X1X2}^{ij} − P_{X1}^i P_{X2}^j )²   (2.38)
I_CS(X1, X2) = log [ ( Σ_{i=1}^{n} Σ_{j=1}^{m} (P_{X1X2}^{ij})² )( Σ_{i=1}^{n} Σ_{j=1}^{m} (P_{X1}^i P_{X2}^j)² ) / ( Σ_{i=1}^{n} Σ_{j=1}^{m} P_{X1X2}^{ij} P_{X1}^i P_{X2}^j )² ]

Figure 2-2. A Simple Example (two binary variables X1 and X2).

In the simple example of Figure 2-2, X1 is either 1 or 2 with probability distribution P_{X1} = (P_{X1}^1, P_{X1}^2), i.e., P(X1 = 1) = P_{X1}^1 and P(X1 = 2) = P_{X1}^2. Similarly, X2 is either 1 or 2 with probability distribution P_{X2} = (P_{X2}^1, P_{X2}^2). The joint probability distribution is P_{X1X2} = (P^{11}, P^{12}, P^{21}, P^{22}), i.e., P((X1, X2) = (1, 1)) = P^{11}, P((X1, X2) = (1, 2)) = P^{12}, P((X1, X2) = (2, 1)) = P^{21} and P((X1, X2) = (2, 2)) = P^{22}. Obviously, P_{X1}^1 = P^{11} + P^{12}, P_{X1}^2 = P^{21} + P^{22}, P_{X2}^1 = P^{11} + P^{21} and P_{X2}^2 = P^{12} + P^{22}.

First, let us look at the case with the distribution of X1 fixed at P_{X1} = (0.6, 0.4). The free parameters left are then P^{11}, from 0 to 0.6, and P^{21}, from 0 to 0.4. As P^{11} and P^{21} vary over these ranges, the values of I_s, I_ED and I_CS can be calculated. Figure 2-3 shows how these values change with P^{11} and P^{21}: the left graphs are the surfaces of I_s, I_ED and I_CS versus P^{11} and P^{21}; the right graphs are the contours of the corresponding surfaces (on a contour plot, each line has a constant value). These graphs show that, although the surfaces and contours of the three measures differ, they all reach the minimum value 0 along the same line P^{11} = 1.5 P^{21}, where the joint probabilities equal the corresponding factorized marginal probabilities. And the maximum values, although different, are also reached at the same points, (P^{11}, P^{21}) = (0.6, 0) and (0, 0.4), where the joint probabilities are

[ P^{11} P^{12} ; P^{21} P^{22} ] = [ 0.6 0 ; 0 0.4 ]  and  [ 0 0.6 ; 0.4 0 ]

respectively.
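The behavior just described can be reproduced numerically. A sketch assuming numpy, sweeping the free parameters of the binary example with P_{X1} = (0.6, 0.4) fixed and checking that both quadratic measures vanish on the independence line P^{11} = 1.5 P^{21}:

```python
import numpy as np

def measures(p11, p21):
    """I_ED and I_CS of (2.38) for the 2x2 example with P(X1=1)=0.6, P(X1=2)=0.4."""
    joint = np.array([[p11, 0.6 - p11], [p21, 0.4 - p21]])
    p1 = joint.sum(axis=1)
    p2 = joint.sum(axis=0)
    fac = np.outer(p1, p2)          # factorized marginal probabilities
    i_ed = np.sum((joint - fac) ** 2)
    i_cs = np.log(np.sum(joint**2) * np.sum(fac**2) / np.sum(joint * fac)**2)
    return i_ed, i_cs

# On the independence line P11 = 1.5 * P21 both measures vanish.
for p21 in (0.1, 0.2, 0.3):
    print(measures(1.5 * p21, p21))

# At the one-to-one corner (P11, P21) = (0.6, 0) both are maximal and positive.
print(measures(0.6, 0.0))
```

The same sweep, repeated on a grid, reproduces the surfaces and contours of Figure 2-3.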
These are precisely the cases where X1 and X2 have a one-to-one relation; i.e., X1 determines X2 without any uncertainty, and vice versa.

If the marginal probability of X2 is further fixed, e.g., P_{X2} = (0.3, 0.7), then the free parameter is P^{11}, from 0 to 0.3. In this case, both marginal probabilities of X1 and X2 are fixed, so the factorized marginal probability distribution is fixed and only the joint probability distribution changes. This case can also be regarded as the previous case with the further constraint P^{11} + P^{21} = 0.3. Figure 2-4 (I_s, I_ED and I_CS vs. P^{11}) shows how the three measures change with P^{11} in this case. All three minima are reached at the same point, P^{11} = 0.18, where the joint probabilities equal the factorized marginal probabilities, and all three maxima are also reached at the same point, P^{11} = 0, i.e., the joint probabilities

[ P^{11} P^{12} ; P^{21} P^{22} ] = [ 0 0.6 ; 0.3 0.1 ]

From this simple example, we can see that, although the three measures are different, they have the same minimum points, and in this particular case also the same maximum points. It is known that both Shannon's mutual information I_s and the ED-QMI I_ED are convex functions of the pdfs [Kap92]. The graphs above confirm this fact and also lead to the conclusion that the CS-QMI I_CS is not a convex function of the pdfs. On the whole, we can say that the similarity between Shannon's mutual information I_s and the ED-QMI I_ED is confirmed by their convexity, with the same minimum points guaranteed.

[Figure content: the joint pdf f_{X1X2}(x1, x2) and the factorized marginal pdf f_{X1}(x1) f_{X2}(x2) shown as two points in pdf space, with I_s the K-L divergence between them, I_ED the squared Euclidean distance, and I_CS = −log((cos θ)²) with V_C = cos θ √(V_J V_M).]
Figure 2-5.
Illustration of the Geometrical Interpretation of Mutual Information

2.1.6 Geometrical Interpretation of Mutual Information

From the previous section, we can see that both ED-QMI and CS-QMI are built from the following three terms:

V_J = ∫∫ f_{X1X2}(x1, x2)² dx1 dx2
V_M = ∫∫ ( f_{X1}(x1) f_{X2}(x2) )² dx1 dx2   (2.39)
V_C = ∫∫ f_{X1X2}(x1, x2) f_{X1}(x1) f_{X2}(x2) dx1 dx2

where V_J is obviously the "entropy 2-norm" (the squared 2-norm) of the joint pdf, V_M is the "entropy 2-norm" of the factorized marginal pdf, and V_C is the cross-correlation, or inner product, between the joint pdf and the factorized marginal pdf. With these three terms, the QMIs can be expressed as

I_ED = V_J − 2 V_C + V_M
I_CS = log V_J − 2 log V_C + log V_M

Figure 2-5 illustrates the geometrical interpretation of all these quantities. I_s, as previously mentioned, is the K-L divergence between the joint pdf and the factorized marginal pdf; I_ED is the squared Euclidean distance between these two pdfs; and I_CS is related to the angle θ between them, I_CS = −log((cos θ)²) with V_C = cos θ √(V_J V_M). Note that V_M can be factorized into two marginal information potentials V1 and V2:

V_M = ∫∫ ( f_{X1}(x1) f_{X2}(x2) )² dx1 dx2 = V1 V2
V1 = ∫ f_{X1}(x1)² dx1   (2.41)
V2 = ∫ f_{X2}(x2)² dx2

2.1.7 Energy and Entropy for a Gaussian Signal

It is well known that for a Gaussian random variable X = (x1, ..., xk)^T ∈ R^k with pdf

f_X(x) = 1/( (2π)^{k/2} |Σ|^{1/2} ) exp( −(1/2)(x − μ)^T Σ^{-1} (x − μ) )

where μ is the mean and Σ is the covariance matrix, the Shannon information entropy is

H_s(X) = (1/2) log|Σ| + (k/2) log 2π + k/2   (2.42)

(see Appendix B for the derivation). Similarly, Renyi's information entropy for X is

H_Rα(X) = (1/2) log|Σ| + (k/2) log 2π + (k/2) (log α)/(α − 1)   (2.43)

(the derivation is given in Appendix C). For Havrda-Charvat's entropy, we have

H_hα(X) = (1/(1 − α)) [ α^{−k/2} (2π)^{(1−α)k/2} |Σ|^{(1−α)/2} − 1 ]   (2.44)

(the derivation is given in Appendix D). Obviously, lim_{α→1} H_Rα(X) = H_s(X) and lim_{α→1} H_hα(X) = H_s(X) in this case, which are consistent with the corresponding general limiting relations and with (2.18), respectively.
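The closed forms (2.42) and (2.43) are easy to check numerically, including the α → 1 limit. A sketch assuming numpy (function names are ours):

```python
import numpy as np

def shannon_gaussian(cov):
    """H_s of a Gaussian, (2.42)."""
    k = cov.shape[0]
    return 0.5 * np.log(np.linalg.det(cov)) + 0.5 * k * np.log(2 * np.pi) + 0.5 * k

def renyi_gaussian(cov, alpha):
    """H_R_alpha of a Gaussian, (2.43)."""
    k = cov.shape[0]
    return (0.5 * np.log(np.linalg.det(cov)) + 0.5 * k * np.log(2 * np.pi)
            + 0.5 * k * np.log(alpha) / (alpha - 1))

cov = np.array([[2.0, 0.5], [0.5, 1.0]])
# As alpha -> 1, log(alpha)/(alpha - 1) -> 1, so Renyi's entropy
# approaches Shannon's entropy, as stated in the text.
print(shannon_gaussian(cov), renyi_gaussian(cov, 1.0001))
```

Taking α closer to 1 drives the two printed values arbitrarily close together.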
Since k and α in (2.42), (2.43) and (2.44) have nothing to do with the data, the data-dependent quantity is log|Σ| or |Σ|. From the information-theoretic point of view, a measure of information using energy quantities (the elements of the covariance matrix Σ) is J1 = log|Σ| in (2.4) and (2.8), or just |Σ|.

If the diagonal elements of Σ are σ_i² (i = 1, ..., k), i.e., the variance of the marginal signal x_i is σ_i², then the Shannon and Renyi marginal entropies are

H_s(x_i) = log σ_i + (1/2) log 2π + 1/2
H_Rα(x_i) = log σ_i + (1/2) log 2π + (1/2)(log α)/(α − 1)

and thus we have

Σ_{i=1}^{k} H_s(x_i) = log( ∏_{i=1}^{k} σ_i ) + (k/2) log 2π + k/2   (2.45)
Σ_{i=1}^{k} H_Rα(x_i) = log( ∏_{i=1}^{k} σ_i ) + (k/2) log 2π + (k/2)(log α)/(α − 1)

So J3 = log ∏_{i=1}^{k} σ_i in (2.8) is related to the sum of the marginal Shannon or Renyi entropies. For Shannon's entropy, we generally have (2.23) and its generalization (2.46) [Dec96, Hay98]:

Σ_{i=1}^{k} H_s(x_i) ≥ H_s(X)   (2.46)

Applying (2.42) and (2.45) to (2.46), we obtain Hadamard's inequality (2.5). So Hadamard's inequality can be regarded as a special case of (2.46) when the variable X is Gaussian distributed.

The most popular energy quantity used in practice is J2 in (2.8):

J2 = tr(Σ) = (1/N) Σ_{n=1}^{N} Σ_{i=1}^{k} ( x_i(n) − μ_i )²   (2.47)

where μ = (μ1, ..., μk)^T and μ_i is the mean of the marginal signal x_i. The geometrical meaning of J2 is the average squared Euclidean distance from the data points to the "mean point." If the signal is an error signal, this is the so-called MSE (mean squared error) criterion, which is widely applied in learning and adaptive systems. This criterion is not directly related to an information measure of the signal: only when the signal is white Gaussian with zero mean do J2 and J1 become equivalent, as (2.9) shows. So, from the information-theoretic point of view, the use of an MSE criterion implicitly assumes that the error signal is white, zero-mean Gaussian.
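The identity in (2.47), that the trace of the covariance equals the average squared distance to the mean, can be verified on sample data. A sketch assuming numpy, with randomly generated data purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))                 # N samples of a 3-D signal

mu = x.mean(axis=0)
cov = (x - mu).T @ (x - mu) / len(x)           # ML covariance estimate

j2_trace = np.trace(cov)                       # J2 = tr(Cov)
j2_dist = np.mean(np.sum((x - mu)**2, axis=1)) # mean squared distance to the mean
print(j2_trace, j2_dist)                       # identical, as (2.47) states
```

The equality holds exactly because each diagonal element of the covariance is itself an average of squared deviations of one coordinate.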
As mentioned in 2.1.1, J1 is basically the determinant of Σ, which is the product of all the eigenvalues of Σ and can be regarded as a geometric average of the eigenvalues, while J2 is the trace of Σ, which is the sum of all the eigenvalues and can be regarded as an arithmetic average of the eigenvalues. Note that |Σ| = 0 cannot guarantee zero energy of all the marginal signals, but the maximization of |Σ| makes the joint entropy of X maximal; conversely, the maximization of tr(Σ) cannot guarantee the maximum of the joint entropy of X, but the minimization of tr(Σ) drives all the marginal signals to zero. This is possibly the reason why the minimization of MSE is so popular in practice.

2.1.8 Cross-Correlation and Mutual Information for a Gaussian Signal

Suppose X = (x1, x2)^T is a zero-mean (without loss of generality, because neither cross-correlation nor mutual information depends on the mean) Gaussian random variable with covariance matrix

Σ = [ σ1²  r12 ; r12  σ2² ]

The joint pdf is

f(x1, x2) = 1/( 2π |Σ|^{1/2} ) exp( −(1/2) x^T Σ^{-1} x )   (2.48)

and the two marginal pdfs are

f1(x1) = 1/( √(2π) σ1 ) exp( −x1²/(2σ1²) ),  f2(x2) = 1/( √(2π) σ2 ) exp( −x2²/(2σ2²) )   (2.49)

The Shannon mutual information is

I_s(x1, x2) = H_s(x1) + H_s(x2) − H_s(x1, x2) = (1/2) log( 1/(1 − ρ²) ),  ρ = r12/(σ1 σ2)   (2.50)

where ρ is the correlation coefficient between x1 and x2. Using (A.1) in Appendix A and letting β = σ1 σ2, we have

V_J = ∫∫ f(x1, x2)² dx1 dx2 = 1/( 4πβ √(1 − ρ²) )
V_M = ∫∫ f1(x1)² f2(x2)² dx1 dx2 = 1/(4πβ)   (2.51)
V_C = ∫∫ f(x1, x2) f1(x1) f2(x2) dx1 dx2 = 1/( 2πβ √(4 − ρ²) )

The ED-QMI and CS-QMI are then

I_ED(x1, x2) = (1/(4πβ)) [ 1/√(1 − ρ²) − 4/√(4 − ρ²) + 1 ]
I_CS(x1, x2) = log [ (4 − ρ²) / ( 4 √(1 − ρ²) ) ]   (2.52)

Figure 2-6. Mutual informations vs. the correlation coefficient for the Gaussian distribution (curves for I_s, I_CS and I_ED with β = 0.5).

Similar to I_s, I_CS is a function of the single parameter ρ; both are monotonically increasing functions of |ρ| with the same minimum value 0, the same minimum point ρ = 0 and the same maximum point |ρ| = 1, in spite of the difference in maximum values. I_ED is a function of the two parameters ρ and β. However, β serves only as a scale factor and cannot change the shape of the function. Once β is fixed, I_ED is a monotonically increasing function of |ρ| with the same minimum value 0, the same minimum point ρ = 0 and the same maximum point |ρ| = 1 as I_s and I_CS, in spite of the difference in maximum values. Figure 2-6 shows these curves, which tell us that the two proposed measures, ED-QMI and CS-QMI, are consistent with Shannon's MI in the Gaussian case regarding the minimum and maximum points.

2.2 Empirical Energy, Entropy and MI: Problem and Literature Review

In the previous section 2.1, the concepts of the various energy, entropy and mutual information quantities were introduced. In practice, we face the problem of estimating these quantities from given sample data. In this section, the empirical energy, entropy and MI problems will be discussed, and the related literature will be reviewed.

2.2.1 Empirical Energy

The problem of empirical energy is relatively simple and straightforward. For a given data set {a(i) = (a1(i), ..., an(i))^T | i = 1, ..., N} of an n-D signal X = (x1, ..., xn)^T, it is not difficult to estimate the means and variances of the marginal signals and the covariances between them. The sample mean and the sample covariance matrix are [Dud73, Dud98]:

m_i = (1/N) Σ_{j=1}^{N} a_i(j),  i = 1, ..., n
Σ̂ = (1/N) Σ_{j=1}^{N} ( a(j) − m )( a(j) − m )^T   (2.53)

These are the results of maximum likelihood estimation [Dud73, Dud98].

2.2.2 Empirical Entropy and Mutual Information: The Problem

As shown in the previous section 2.1, the entropy and mutual information all rely on the probability density function (pdf) of the variables; they therefore use all the statistics of the variables, but are more complicated and difficult to implement than the energy.
To estimate the entropy or mutual information, the first thing we need to do is estimate the pdf of the variables; the entropy and mutual information can then be calculated according to the formulas described in the previous section 2.1. For continuous variables, integrals are unavoidable in all the entropy and mutual information definitions of 2.1, and they are the major difficulty after pdf estimation. Thus, the pdf estimator and the measures for entropy and mutual information should be chosen together so that the corresponding integrals can be simplified. In the rest of this chapter, we will see the practical importance of this choice: the different empirical entropies and mutual informations are in fact the results of different choices.

If a priori knowledge about the data distribution is available, or a model is assumed, then parametric methods can be used to estimate the pdf model parameters, and the entropies and mutual informations can then be estimated from the model and the estimated parameters. In many real-world problems, however, the only available information about the domain is contained in the collected data, and there is no a priori knowledge about the data. It is therefore of practical significance to estimate the entropy of a variable, or the mutual information between variables, based merely on the given data samples, without further assumptions or any a priori model. Thus, we are actually seeking nonparametric ways to estimate entropies and mutual informations. Formally, the problems can be described as follows:

• The Nonparametric Entropy Estimation: given a data set {a(i) | i = 1, ..., N} for a signal X (X can be a scalar or an n-D signal), how to estimate the entropy of X without any other information or assumptions.
• The Nonparametric Mutual Information Estimation: given a data set {a(i) = (a1(i), a2(i))^T | i = 1, ..., N} for a signal X = (x1, x2)^T (x1 and x2 can be scalar or n-D signals, and their dimensions may differ), how to estimate the mutual information between x1 and x2 without any assumption. This scheme extends easily to the mutual information of multiple signals.

For nonparametric methods, there remain two major difficulties: the nonparametric pdf estimation and the calculation of the integrals involved in the entropy and mutual information measures. In the following, the literature on these two aspects is reviewed.

2.2.3 Nonparametric Density Estimation

The literature on nonparametric density estimation is fairly extensive; a complete discussion of the topic in such a small section is virtually impossible. Here, only a brief review of the relevant methods, such as the histogram, the Parzen window method, orthogonal series estimates and the mixture model, will be given.

• Histogram [Sil86, Weg72]: The histogram is the oldest and most widely used density estimator. For a 1-D variable x, given an origin x0 and a bin width h, the bins of the histogram are the intervals [x0 + mh, x0 + (m + 1)h). The histogram is then defined by

f̂(x) = (1/(Nh)) (number of samples in the same bin as x)   (2.54)

The histogram can be generalized by allowing the bin widths to vary. Formally, suppose the real line has been dissected into bins; then the histogram is

f̂(x) = (1/N) (number of samples in the same bin as x)/(width of the bin containing x)   (2.55)

For a multi-dimensional variable, the histogram presents several difficulties. First, contour diagrams to represent the data cannot easily be drawn. Second, the problems of choosing the origin and the bins (or cells) are exacerbated. Third, if rectangular bins are used for an n-D variable and the number of bins for each marginal variable is m, then the total number of bins is of order O(m^n).
Fourth, since the histogram discretizes each marginal variable, further mathematical analysis is difficult.

• Orthogonal Series Estimates [Hay98, Com94, Yan97, Weg72, Sil86, Wil62, Kol94]: This category includes the Fourier expansion, the Edgeworth expansion, the Gram-Charlier expansion, etc. We will discuss only the Edgeworth and Gram-Charlier expansions for a 1-D variable. Without loss of generality, we assume that the random variable x is zero-mean. The pdf of x can be expressed in terms of the Gaussian function G(x) = (1/√(2π)) e^{−x²/2} as

f(x) = G(x) [ 1 + Σ_k c_k H_k(x) ]   (2.56)

where the c_k are coefficients that depend on the cumulants of x, e.g., c1 = 0, c2 = 0, c3 = k3/6, c4 = k4/24, c5 = k5/120, c6 = (k6 + 10 k3²)/720, c7 = (k7 + 35 k4 k3)/5040, c8 = (k8 + 56 k5 k3 + 35 k4²)/40320, etc. (the k_i are the i-th order cumulants). The H_k(x) are the Hermite polynomials, which can be defined in terms of the k-th derivative of the Gaussian function G(x) by G^{(k)}(x) = (−1)^k G(x) H_k(x), or explicitly, H_0(x) = 1, H_1(x) = x, H_2(x) = x² − 1, etc., and there is a recursive
The Edgeworth expansion, on the other hand, can be defined as 2 ( k3 k4 10k3 k5 f(x) = G(x) 1 + H ) + H(x) + -_--H6(x) + -H5(x) (2.59) 35k3k4 280k3 k6 + 7! H(x) + 9! H9(x) + H6(x) + ... There is no essential difference between the Edgeworth expansion and the Gram- Charlier expansion. The key feature of the Edgeworth expansion is that its coefficients decrease uniformly, while the terms in the Gram-Charlier expansion do not tend uni- formly to zero from the viewpoint of numerical errors. This is why the terms in Gram- Charlier expansion should be grouped as mentioned above. 49 Both Edgeworth and Gram-Charlier expansions will be truncated in the real applica- tion, which make them a kind of approximation to pdfs. Furthermore, they usually can only be used for 1-D variable. For multi-dimensional variable, they become very com- plicated. Parzen Window Method [Par62, Dud73, Dud98, Chr81, Vap95, Dev85]: The Parzen Window Method is also called a kernel estimation method, or potential function method. Several nonparametric methods for density estimation appeared in the 60's. Among these methods the Parzen window method is the most popular. According to the method, one first has to determine the so-called kernel function. For simplicity and the later use in this dissertation, we consider a simple symmetric Gaus- sian kernel function: ( T x 2 1 xx G(x, a2) = tk/ kexp (2.60) (2ir) a 20a where a will control the kernel size and x can be a n-D variables. For a data set described in 2.2.2, the density function will be N f(x) = G(x-a(i), C2) (2.61) i=1 which means that each data point will be occupied by a kernel function and the whole density is the average of all kernel functions. The asymptotic theory for Parzen type nonparametric density estimation was developed in the 70s [Dev85]. 
It concludes that (i) Parzen's estimator is consistent (in various metrics) for estimating densities from a very wide class of densities; (ii) the asymptotic rate of convergence of Parzen's estimator is optimal for "smooth" densities. We will see later in this chapter how this density estimation method can be combined with the quadratic entropy and the quadratic mutual information to develop the ideas of the information potential and the cross information potential. The Parzen window method is selected not only for its simplicity but also for its good asymptotic properties. In addition, this kernel function is consistent with the mass-energy spirit mentioned in Chapter 1. In fact, one data point should represent not only itself but also its neighborhood; in this sense the kernel function acts much like a mass-density function, and from this point of view it naturally introduces the ideas of field and potential energy. We will see this more clearly later in this chapter.

• Mixture Model [McL88, McL96, Dem77, Rab93, Hua90]: The mixture model is a kind of "semi-parametric" method (or, one might say, semi-nonparametric). Mixture models, especially the Gaussian mixture model, have been applied extensively in various engineering areas, such as the hidden Markov model in speech recognition. Although the Gaussian mixture model assumes that the data samples come from several Gaussian sources, it can approximate quite diverse densities. Generally, the density of an n-D variable x is assumed to be

f(x) = Σ_{k=1}^{K} c_k G( x − μ_k, Σ_k )   (2.62)

where K is the number of mixture sources, the c_k are mixture coefficients, which are non-negative and sum to one, Σ_{k=1}^{K} c_k = 1, and the μ_k and Σ_k are the means and covariance matrices of the Gaussian sources, with the Gaussian function notated by

G( x − μ, Σ ) = 1/( (2π)^{n/2} |Σ|^{1/2} ) exp( −(1/2)(x − μ)^T Σ^{-1} (x − μ) )

with the mean μ and covariance matrix Σ as the parameters.
All the parameters c_k, μ_k and Σ_k can be estimated from the data samples by the EM algorithm in the maximum-likelihood sense. One may notice the similarity between the Gaussian mixture model and the Gaussian kernel estimation method. In fact, the Gaussian kernel estimation method is the extreme case of the Gaussian mixture model in which all the means are the data points themselves and all the mixture coefficients and all the covariance matrices are equal. In other words, each data point in the Gaussian kernel estimation method is treated as a Gaussian source with equal mixture coefficient and equal covariance.

There are also other nonparametric methods, such as the k-nearest neighbor method [Dud73, Dud98, Sil86] and the naive estimator [Sil86]. These estimated density functions are not "natural density functions"; i.e., their integrals do not equal 1. Their unsmoothness at the data points also makes them difficult to apply to entropy or mutual information estimation.

2.2.4 Empirical Entropy and Mutual Information: The Literature Review

With the probability density function in hand, we can then calculate the entropy or the mutual information, where the difficulty lies in the integrals involved. Shannon's entropy and Shannon's mutual information are the dominant measures in the literature, and the logarithm usually causes great difficulty in their estimation; some researchers have tried to avoid Shannon's measures in order to gain tractability. A summary of the various existing methods is given below, beginning with the simple histogram method.

• Histogram-Based Method
If the pdf of a variable is estimated by the histogram method, the variable has to be discretized into histogram bins. The integration in Shannon's entropy or mutual information then becomes a summation, and there is no difficulty at all in its calculation.
However, this is true only for a low-dimensional variable. As pointed out in the previous section, for a high-dimensional variable the computational complexity becomes too large for the method to be implementable. Furthermore, in spite of the simplicity it brings to the calculation, the discretization makes further mathematical analysis impossible and prevents applying the method to the optimization of entropy or mutual information, where differentiable continuous functions are needed for analysis. Nevertheless, such a simple method is still very useful in cases such as feature selection [Bat94], where only a static comparison of entropies or mutual informations is needed.

• The Case of a Full-Rank Linear Transform
From probability theory, we know that for a full-rank linear transform Y = WX, where X = (x1, ..., xn)^T and Y = (y1, ..., yn)^T are vectors in an n-dimensional real space and W is an n-by-n full-rank matrix, the density of Y is related to the density of X by f_Y(y) = f_X(x)/|det(W)| [Pap91], where f_Y and f_X are the densities of Y and X respectively, and det( ) is the determinant operator. Accordingly, we have the relation between the entropy of Y and the entropy of X:

H(Y) = E[ −log f_Y(y) ] = E[ −log f_X(x) + log|det(W)| ] = H(X) + log|det(W)|

So the output entropy H(Y) can be expressed in terms of the input entropy H(X). Although H(X) may not be known, it may be fixed, and the relation can then be used to manipulate the output entropy H(Y). This is the basis for a series of methods in the BSS and ICA areas. For instance, the mutual information among the output marginal variables is

I(y1, ..., yn) = Σ_{i=1}^{n} H(y_i) − H(Y) = Σ_{i=1}^{n} H(y_i) − log|det(W)| − H(X)

so that the minimization of the mutual information can be implemented by manipulating the marginal entropies and the determinant of the linear transform.
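For a Gaussian input the relation H(Y) = H(X) + log|det(W)| can be confirmed with the closed-form entropy (2.42), since the covariance of Y = WX is W Σ_X W^T. A sketch assuming numpy (matrix values are illustrative):

```python
import numpy as np

def gaussian_entropy(cov):
    """Shannon entropy of a Gaussian, (2.42): 0.5*log|Cov| + (k/2)(log(2*pi) + 1)."""
    k = cov.shape[0]
    return 0.5 * np.log(np.linalg.det(cov)) + 0.5 * k * (np.log(2 * np.pi) + 1)

cov_x = np.array([[1.0, 0.2], [0.2, 2.0]])
w = np.array([[2.0, 1.0], [0.5, 1.0]])               # full-rank linear transform
cov_y = w @ cov_x @ w.T                              # covariance of Y = W X

h_x = gaussian_entropy(cov_x)
h_y = gaussian_entropy(cov_y)
print(h_y, h_x + np.log(abs(np.linalg.det(w))))      # the two agree
```

The agreement follows from det(W Σ W^T) = det(W)² det(Σ), so the check exercises exactly the entropy relation used by the BSS and ICA methods above.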
In spite of its simplicity, this method is obviously coupled with the structure of the transform (full rank is required, etc.), and is thus less general.

• InfoMax Method
Consider a transformation Z = (z1, ..., zn)^T, z_i = f(y_i), (y1, ..., yn)^T = Y = WX, where f( ) is a monotonically increasing function (or decreasing, for cases other than BSS and ICA) and the linear transform is the same as in the previous method. Again from probability theory [Pap91], we have f_Z(z) = f_Y(y)/|J(z)|, where f_Z and f_Y are the densities of Z and Y respectively, and J(z) is the Jacobian of the nonlinear transform expressed as a function of z. Thus, there is the relation

H(Z) = H(Y) + E[ log|J(z)| ] = H(X) + log|det(W)| + E[ log|J(z)| ]

where E[log|J(z)|] is approximated by the sample mean [Bel95]. The maximization of the output entropy can then be manipulated through the two terms log|det(W)| and E[log|J(z)|]. In addition to the sample-mean approximation, this method requires a match between the nonlinear function and the cdf of the source signals when applied to BSS and ICA problems.

• Nonlinear Functions by the Mixture Model
The above method can be generalized by using the mixture model to model the pdf of the sources [XuL97], and hence the corresponding cdf's, i.e., the nonlinear functions. Although this avoids arbitrary assumptions on the cdf's of the sources, it still suffers from problems such as coupling with the structure of the learning machine.

• Numerical Method
The integration involved in the calculation of the entropy or mutual information is usually complicated. A numerical method can be used to calculate the integration; however, this is feasible only for low-dimensional variables. [Pha96] used the Parzen window method to estimate the marginal densities and applied numerical integration to calculate the marginal entropies needed in the mutual information of the outputs of a linear transform as described above.
As pointed out by [Vio95], the integration in Shannon's entropy or mutual information becomes extremely complicated when the Parzen window is used for density estimation. Applying a numerical method makes the calculation possible but restricts it to simple cases, and the method is again coupled with the structure of the learning machine.

• Edgeworth and Gram-Charlier Expansion Based Methods
As described above, both expansions can be expressed in the form f(x) = G(x)(1 + A(x)), where A(x) is a polynomial. Using the Taylor expansion, log(1 + A(x)) ≈ A(x) − A(x)²/2 = B(x) for relatively small A(x). Then

H(x) = −∫ f(x) log f(x) dx ≈ −∫ G(x)(1 + A(x))( log G(x) + B(x) ) dx

Since G(x) is the Gaussian function and A(x) and B(x) are polynomials, this integration has an analytical result. Thus, a relation can be established between the entropy and the coefficients of the polynomials A(x) and B(x) (i.e., the sample cumulants of the variable). Unfortunately, this method can only be used for 1-D variables, and it is therefore usually used in the calculation of the mutual information described above for BSS and ICA problems [Yan97, Yan98, Hay98].

• Parzen Window and Sample Mean
Similar to [Pha96], [Vio95] also uses the Parzen window method for pdf estimation. To avoid the complicated integration, [Vio95] used the sample mean to approximate the integration, rather than the numerical method of Pham [Pha96]. This is clear when the entropy is expressed as H(x) = E[−log f(x)]. The method can be used not only for 1-D variables but also for n-D variables. Although this method is flexible, the sample-mean approximation restricts its precision.

• An Indirect Method Based on Parzen Window Estimation
Fisher [Fis97] uses an indirect way for entropy optimization. If Y is the output of a mapping and is bounded in a rectangular region D = {y | a_i ≤ y_i ≤ b_i, i = 1, ..., k}, then the distribution over D with maximum entropy is the uniform distribution.
So, for the purpose of entropy maximization, one can set up the MSE criterion

J = ∫_D ( u(y) − f̂_Y(y) )² dy,  u(y) = { 1/∏_{i=1}^{k} (b_i − a_i),  y ∈ D;  0,  otherwise }   (2.63)

where u(y) is the uniform pdf on the region D and f̂_Y(y) is the pdf of the output y estimated by the Parzen window method described in the previous section. The gradient method can be used for the minimization of J. As an example, the partial derivative of J with respect to a weight w_j is

∂J/∂w_j = Σ_{p=1}^{k} Σ_{n=1}^{N} ( ∂J/∂y_p(n) )( ∂y_p(n)/∂w_j )   (2.64)

where the y(n) are samples of the output. The partial derivative of the mean squared difference with respect to the output samples can be broken down as

∂J/∂y(n) = (1/N) [ K_u(y(n)) − (1/(N − 1)) Σ_{i ≠ n} K_G( y(i) − y(n) ) ]
K_u(z) = u(z) * G_g(z) = ∫ u(y) G_g(z − y) dy   (2.65)
K_G(z) = G(z, σ²) * G_g(z) = ∫ G(y, σ²) G_g(z − y) dy

where G_g(z) is the gradient of the Gaussian kernel, K_u(z) is the convolution between the uniform pdf u(z) and the gradient of the Gaussian kernel G_g(z), and K_G(z) is the convolution between the Gaussian kernel G(z, σ²) and its gradient G_g(z). As shown in Fisher [Fis97], the convolution K_G(z) turns out to be

K_G(z) = −( 1/( 2^{(3k/4)+1} π^{k/4} σ^{(k/2)+2} ) ) G(z, σ²)^{1/2} z   (2.66)

If the domain D is symmetric, i.e., b_i = −a_i = a/2, i = 1, ..., k, then the p-th component of the convolution K_u(z) is

[K_u(z)]_p = (1/a^k) [ G( z_p − a/2, σ² ) − G( z_p + a/2, σ² ) ] ∏_{i ≠ p} (1/2) [ erf( (z_i + a/2)/(√2 σ) ) − erf( (z_i − a/2)/(√2 σ) ) ]   (2.67)

where z = (z1, ..., zk)^T, G(z, σ²) is the same as (2.60) (applied here to a scalar argument), and erf(x) = (2/√π) ∫_0^x exp(−t²) dt is the error function. This method is indirect and still depends on the topology of the network, but it also shows the flexibility of the Parzen window method; it has been used in practice with good results for the MACE [Fis97].

Summarizing the above, we see that there is no direct, efficient nonparametric method to estimate the entropy or mutual information of a given discrete data set that is decoupled from the structure of the learning machine and applicable to n-D variables.
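Of the surveyed methods, the Parzen-window-plus-sample-mean estimator of [Vio95] is the most compact to sketch. The following is an illustrative resubstitution variant for a 1-D variable (numpy assumed; [Vio95] evaluates the Parzen estimate on a separate sample set, so this is a simplification), estimating H(x) = E[−log f(x)]:

```python
import numpy as np

def parzen_shannon_entropy(samples, sigma):
    """Sample-mean estimate of H(x) = E[-log f(x)], with f estimated by a
    1-D Gaussian-kernel Parzen window over the same samples (resubstitution)."""
    diffs = samples[:, None] - samples[None, :]
    kernels = np.exp(-diffs**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    pdf_at_samples = kernels.mean(axis=1)     # Parzen estimate at each sample
    return -np.mean(np.log(pdf_at_samples))   # sample-mean of -log f

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
# True Shannon entropy of a unit Gaussian: 0.5*log(2*pi*e) ~ 1.419 nats.
print(parzen_shannon_entropy(x, sigma=0.3))
```

As the text notes, the sample-mean approximation is what limits the precision of this estimator: the kernel width biases the estimate, and the self-term of each sample enters the average.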
In the next sections, we will show how the quadratic entropy and the quadratic mutual information, rather than Shannon's entropy and mutual information, can be combined with the Gaussian kernel estimation of pdfs to develop the ideas of the "information potential" and the "cross information potential," resulting in an effective and general method for the calculation of the empirical entropy and mutual information.

2.3 Quadratic Entropy and Information Potential

2.3.1 The Development of the Information Potential

As mentioned in the previous section, the integration of Shannon's entropy with a Gaussian kernel estimate of the pdf becomes "inordinately difficult" [Vio95]. However, if we choose the quadratic entropy and notice the fact that the integral of the product of two Gaussian functions can still be evaluated as another Gaussian function, as (A.1) shows, then we arrive at a simple method. For a data set as described in 2.2.2, we can use the Gaussian kernel method of (2.61) to estimate the pdf of X and then calculate the "entropy 2-norm" as

V = ∫ f̂_X(x)² dx
  = ∫ [ (1/N) Σ_{i=1}^{N} G( x − a(i), σ² ) ][ (1/N) Σ_{j=1}^{N} G( x − a(j), σ² ) ] dx
  = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} ∫ G( x − a(i), σ² ) G( x − a(j), σ² ) dx   (2.68)
  = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} G( a(i) − a(j), 2σ² )

So, Renyi's quadratic entropy and Havrda-Charvat's quadratic entropy lead to a much simpler entropy estimator for a set of discrete data points {a(i) | i = 1, ..., N}:

H_R2(X|{a}) = −log V
H_h2(X|{a}) = 1 − V   (2.69)
V = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} G( a(i) − a(j), 2σ² )

The combination of the quadratic entropies with the Parzen window method leads to an entropy estimator that computes the interactions among pairs of samples. Notice that there is no approximation in these evaluations apart from the pdf estimation itself.

We wrote (2.69) in this way because there is a very interesting physical interpretation for this estimator of entropy. Let us assume that we place physical particles at the locations prescribed by a(i) and a(j). Actually, the Parzen window method is just in the spirit of mass-energy.
The integration of the product of two Gaussian kernels, each representing some kind of mass density, can be regarded as the interaction between particles a(i) and a(j), which results in the potential energy G(a(i) - a(j), 2\sigma^2). Notice that it is always positive and decays exponentially with the square of the distance between the particles. We can consider that a potential field exists for each particle in the space, with a field strength defined by the Gaussian kernel; i.e., an exponential decay with the distance square. In the real world, physical particles interact with a potential energy inversely proportional to the distance between them, but here the potential energy obeys a different law, which in fact is determined by the kernel used in the pdf estimation. V in (2.69) is the overall potential energy, including every pair of data particles. As pointed out previously, these potential energies are related to "information" and thus are called "information potentials" (IP). Accordingly, data samples will be called "information particles" (IPT). Now the entropy is expressed in terms of the potential energy, and entropy maximization becomes equivalent to the minimization of the information potential. This is again a surprising similarity to statistical mechanics, where the entropy maximization principle has as a corollary the energy minimization principle. It is a pleasant surprise to verify that the nonparametric estimation of entropy here ends up with a principle that resembles that of the physical particle world, which was one of the origins of the concept of entropy. We can also see from (2.68) and (2.69) that the Parzen window method implemented with the Gaussian kernel and coupled with Renyi's entropy or Havrda-Charvat's entropy of higher order (\alpha > 2) will compute interactions among \alpha-tuples of samples, providing even more information about the detailed structure and distribution of the data set.
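The pairwise-interaction estimator (2.69) can be sketched directly; this is a minimal illustration with illustrative function names, not code from the dissertation:

```python
import numpy as np

def gaussian_kernel(d, var):
    """Spherical Gaussian kernel G(d, var) evaluated at a difference vector d."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    k = d.size
    return float(np.exp(-(d @ d) / (2.0 * var)) / (2.0 * np.pi * var) ** (k / 2.0))

def information_potential(a, sigma2):
    """V = (1/N^2) sum_i sum_j G(a(i) - a(j), 2*sigma2), as in Eq. (2.69)."""
    N = len(a)
    return sum(gaussian_kernel(np.asarray(a[i]) - np.asarray(a[j]), 2.0 * sigma2)
               for i in range(N) for j in range(N)) / N ** 2

def renyi_quadratic_entropy(a, sigma2):
    """H_R2 = -log V; maximizing the entropy minimizes the potential V."""
    return -np.log(information_potential(a, sigma2))

# Tightly clustered particles interact strongly: large V, small entropy.
tight = [0.0, 0.01, 0.02]
spread = [0.0, 1.0, 2.0]
```

Comparing the two sample sets shows the physical picture: the clustered particles have a larger information potential and hence a smaller quadratic entropy than the spread-out ones.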
2.3.2 Information Force (IF)

Just as in mechanics, the derivative of the potential energy is a force, in this case an information-driven force that moves the data samples in the space of the interactions so as to change the distribution of the data and thus its entropy. Therefore,

\frac{\partial}{\partial a(i)} G(a(i) - a(j), 2\sigma^2) = G(a(i) - a(j), 2\sigma^2) (a(j) - a(i)) / (2\sigma^2)    (2.70)

can be regarded as the force that a particle at the position of sample a(j) exerts upon a(i), and will be called an information force. If all the data samples are free to move in a certain region of the space, then the information forces between each pair of samples will drive all the samples to a state with minimum information potential. If we add all the contributions of the information forces from the ensemble of samples on a(i), we have the overall effect of the information potential on sample a(i); i.e.,

\frac{\partial V}{\partial a(i)} = -\frac{1}{N^2 \sigma^2} \sum_{j=1}^{N} G(a(i) - a(j), 2\sigma^2) (a(i) - a(j))    (2.71)

The information force is the realization of the interaction among "information particles." The entropy will change along the direction (for each information particle) of the information force. Accordingly, entropy maximization or minimization can be implemented in a simple and effective way.

2.3.3 The Calculation of Information Potential and Force

The preceding subsections introduced the concepts of the information potential and the information force. Here, the procedure for their calculation is given according to the formulas above. The procedure itself, and the illustration, may further help in understanding the ideas. To calculate the information potential and the information force, two matrices can be defined as in (2.72); their structures are illustrated in Figure 2-7:

D = {d(ij)}, \quad d(ij) = a(i) - a(j)
v = {v(ij)}, \quad v(ij) = G(d(ij), 2\sigma^2)    (2.72)

Figure 2-7.
The structure of matrices D and v.

Notice that each element of D is a vector in R^n space, while each element of v is a scalar. It is easy to show from the above that

V = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} v(ij)
f(i) = -\frac{1}{N^2 \sigma^2} \sum_{j=1}^{N} v(ij) d(ij), \quad i = 1, ..., N    (2.73)

where V is the overall information potential and f(i) is the force that a(i) receives. We can also define the information potential for each particle a(i) as v(i) = \frac{1}{N} \sum_{j=1}^{N} v(ij). Obviously, V = \frac{1}{N} \sum_{i=1}^{N} v(i).

From this procedure, we can clearly see that the information potential relies on the difference between each pair of data points, and therefore makes full use of the information of their relative positions; i.e., the data distribution.

2.4 Quadratic Mutual Information and Cross Information Potential

2.4.1 QMI and Cross Information Potential (CIP)

For the given data set {a(i) = (a_1(i), a_2(i))^T | i = 1, ..., N} of a variable X = (x_1, x_2)^T described in 2.2.2, the joint and marginal pdfs can be estimated by the Gaussian kernel method as

f_{x_1 x_2}(x_1, x_2) = \frac{1}{N} \sum_{i=1}^{N} G(x_1 - a_1(i), \sigma^2) G(x_2 - a_2(i), \sigma^2)
f_{x_1}(x_1) = \frac{1}{N} \sum_{i=1}^{N} G(x_1 - a_1(i), \sigma^2)    (2.74)
f_{x_2}(x_2) = \frac{1}{N} \sum_{i=1}^{N} G(x_2 - a_2(i), \sigma^2)

Following the same procedure as in the development of the information potential, we can obtain the three terms in ED-QMI and CS-QMI based only on the given data set:

V_J = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} G(a_1(i) - a_1(j), 2\sigma^2) G(a_2(i) - a_2(j), 2\sigma^2)
V_M = V_1 V_2, \quad V_k = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} G(a_k(i) - a_k(j), 2\sigma^2), \quad k = 1, 2    (2.75)
V_c = \frac{1}{N} \sum_{i=1}^{N} [ \frac{1}{N} \sum_{j=1}^{N} G(a_1(i) - a_1(j), 2\sigma^2) ][ \frac{1}{N} \sum_{j=1}^{N} G(a_2(i) - a_2(j), 2\sigma^2) ]

If we define matrices similar to (2.72), then we have

D = {d(ij)}, \quad d(ij) = a(i) - a(j)
D_k = {d_k(ij)}, \quad d_k(ij) = a_k(i) - a_k(j), \quad k = 1, 2
v = {v(ij)}, \quad v(ij) = G(d(ij), 2\sigma^2)    (2.76)
v_k = {v_k(ij)}, \quad v_k(ij) = G(d_k(ij), 2\sigma^2), \quad k = 1, 2
v(i) = \frac{1}{N} \sum_{j=1}^{N} v(ij), \quad v_k(i) = \frac{1}{N} \sum_{j=1}^{N} v_k(ij), \quad k = 1, 2

where v(ij) is the information potential in the joint space, and thus is called the joint potential; v_k(ij) is the information potential in the marginal space, and thus is called the marginal potential; v(i) is the joint information potential energy for
IPT a(i); v_k(i) is the marginal information potential energy for the marginal IPT a_k(i) in the marginal space indexed by k. Based on these quantities, the above three terms can be expressed as

V_J = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} v_1(ij) v_2(ij)
V_M = V_1 V_2, \quad V_k = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} v_k(ij), \quad k = 1, 2    (2.77)
V_c = \frac{1}{N} \sum_{i=1}^{N} v_1(i) v_2(i)

So, ED-QMI and CS-QMI can be expressed as

I_{ED}(x_1, x_2) = V_{ED} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} v_1(ij) v_2(ij) - \frac{2}{N} \sum_{i=1}^{N} v_1(i) v_2(i) + V_1 V_2
I_{CS}(x_1, x_2) = V_{CS} = \log \frac{ [ \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} v_1(ij) v_2(ij) ] (V_1 V_2) }{ [ \frac{1}{N} \sum_{i=1}^{N} v_1(i) v_2(i) ]^2 }    (2.78)

From the above, we can see that both QMIs can be expressed through the cross-correlations between the marginal information potentials at different levels: v_1(ij)v_2(ij), v_1(i)v_2(i) and V_1 V_2. Thus, the measure V_{ED} is called the Euclidean distance cross information potential (ED-CIP), and the measure V_{CS} is called the Cauchy-Schwartz cross information potential (CS-CIP).

The quadratic mutual information and the corresponding cross information potential can easily be extended to the case with multiple variables, e.g. X = (x_1, ..., x_K)^T. In this case, we have similar matrices D and v and all the similar IPs and marginal IPs. Then the ED-QMI and CS-QMI, with their corresponding ED-CIP and CS-CIP, are as follows:

I_{ED}(x_1, ..., x_K) = V_{ED} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \prod_{k=1}^{K} v_k(ij) - \frac{2}{N} \sum_{i=1}^{N} \prod_{k=1}^{K} v_k(i) + \prod_{k=1}^{K} V_k
I_{CS}(x_1, ..., x_K) = V_{CS} = \log \frac{ [ \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \prod_{k=1}^{K} v_k(ij) ] [ \prod_{k=1}^{K} V_k ] }{ [ \frac{1}{N} \sum_{i=1}^{N} \prod_{k=1}^{K} v_k(i) ]^2 }    (2.79)

2.4.2 Cross Information Forces (CIF)

The cross information potential is more complex than the information potential. Three different terms (or potentials) contribute to the cross information potential, so the force that one data point a(i) receives comes from these three sources. A force in the joint space can be decomposed into marginal components, and the marginal force in each marginal space should be considered separately to simplify the analysis. The cases of ED-CIP and CS-CIP are different, so they should also be considered separately.
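The estimators above can be sketched for two scalar variables; this is a minimal illustration (function names are illustrative), building the marginal potentials v_1(ij), v_2(ij) and combining them into the three levels of cross-correlation:

```python
import numpy as np

def _g(d, var):  # element-wise 1-D Gaussian kernel
    return np.exp(-d ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def quadratic_mutual_information(a1, a2, sigma2):
    """ED-QMI and CS-QMI from the marginal potentials, following (2.75)-(2.78)."""
    a1, a2 = np.asarray(a1, float), np.asarray(a2, float)
    var = 2.0 * sigma2
    v1 = _g(a1[:, None] - a1[None, :], var)          # marginal potentials v1(ij)
    v2 = _g(a2[:, None] - a2[None, :], var)          # marginal potentials v2(ij)
    VJ = np.mean(v1 * v2)                            # joint term: (1/N^2) sum v1(ij)v2(ij)
    Vc = np.mean(v1.mean(axis=1) * v2.mean(axis=1))  # cross term: (1/N) sum v1(i)v2(i)
    VM = v1.mean() * v2.mean()                       # marginal term: V1 * V2
    return VJ - 2.0 * Vc + VM, np.log(VJ * VM / Vc ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
ed_dep, cs_dep = quadratic_mutual_information(x, x + 0.1 * rng.normal(size=200), 0.1)
ed_ind, cs_ind = quadratic_mutual_information(x, rng.normal(size=200), 0.1)
```

On strongly dependent data both measures come out well above their values on independent data, as the distance (or angle) between joint and factorized fields suggests.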
Only the cross information potential between two variables will be dealt with here; the case of multiple variables can readily be obtained in a similar way.

First, let us look at the CIF of ED-CIP, \partial V_{ED} / \partial a_k(i) (k = 1, 2). By a derivation procedure similar to that of the information force in the IP field, we obtain

C_k = {c_k(ij)}, \quad c_k(ij) = v_k(ij) [ v_l(ij) - v_l(i) - v_l(j) + V_l ], \quad k = 1, 2, \quad l \neq k
f_k(i) = \frac{\partial V_{ED}}{\partial a_k(i)} = -\frac{1}{N^2 \sigma^2} \sum_{j=1}^{N} c_k(ij) d_k(ij), \quad i = 1, ..., N    (2.80)

where all d_k(ij), v_k(ij), v_k(i), V_k are defined as before, and the C_k are cross matrices which serve as force modifiers. For the CIF of CS-CIP, we similarly have

f_k(i) = \frac{\partial V_{CS}}{\partial a_k(i)} = \frac{1}{V_J} \frac{\partial V_J}{\partial a_k(i)} - \frac{2}{V_c} \frac{\partial V_c}{\partial a_k(i)} + \frac{1}{V_k} \frac{\partial V_k}{\partial a_k(i)}
\frac{\partial V_J}{\partial a_k(i)} = -\frac{1}{N^2 \sigma^2} \sum_{j=1}^{N} v_l(ij) v_k(ij) d_k(ij)
\frac{\partial V_c}{\partial a_k(i)} = -\frac{1}{2 N^2 \sigma^2} \sum_{j=1}^{N} (v_l(i) + v_l(j)) v_k(ij) d_k(ij), \quad l \neq k    (2.81)
\frac{\partial V_k}{\partial a_k(i)} = -\frac{1}{N^2 \sigma^2} \sum_{j=1}^{N} v_k(ij) d_k(ij)

Figure 2-8. Illustration of "real IPT" and "virtual IPT"

2.4.3 An Explanation of QMI

Another way to look at the CIP comes from the expression of the factorized marginal pdfs. From the above, we have

f_{x_1}(x_1) f_{x_2}(x_2) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} G(x_1 - a_1(i), \sigma^2) G(x_2 - a_2(j), \sigma^2)    (2.82)

This suggests that in the joint space there are N^2 "virtual IPTs" {(a_1(i), a_2(j))^T | i, j = 1, ..., N} whose pdf, estimated by the Parzen window method, is exactly the factorized marginal pdf of the "real IPTs." The relation between all types of IPTs is illustrated in Figure 2-8. From this description, we can see that the ED-CIP is the squared Euclidean distance between the real IP field (formed by the real IPTs) and the virtual IP field (formed by the virtual IPTs), and the CS-CIP is related to the angle between the real IP field and the virtual IP field, as Figure 2-5 shows.
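The identity (2.82) can be checked numerically: a Parzen estimate built over the N^2 virtual particles coincides, point by point, with the product of the two marginal Parzen estimates (a small sketch with illustrative names):

```python
import numpy as np

def _g(d, var):
    """1-D Gaussian kernel, element-wise."""
    return np.exp(-d ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

rng = np.random.default_rng(1)
a1, a2 = rng.normal(size=6), rng.normal(size=6)
sigma2, x1, x2 = 0.2, 0.3, -0.5

# Parzen estimate over the N^2 virtual particles {(a1(i), a2(j))}:
virtual = np.mean([_g(x1 - a1[i], sigma2) * _g(x2 - a2[j], sigma2)
                   for i in range(6) for j in range(6)])
# Product of the two marginal Parzen estimates:
factorized = np.mean(_g(x1 - a1, sigma2)) * np.mean(_g(x2 - a2, sigma2))
```

The two quantities agree up to floating-point error, since the double mean over products factors exactly into the product of means.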
When the real IPTs are organized such that each virtual IPT has at least one real IPT in the same position, the CIP is zero and the two marginal variables x_1 and x_2 are statistically independent; when the real IPTs are distributed along a diagonal line, the difference between the distribution of real IPTs and virtual IPTs is maximized. The two extreme cases are illustrated in Figure 2-9 and Figure 2-10. It should be noticed that x_1 and x_2 are not necessarily scalars. Actually, they can be multidimensional variables, and their dimensions can even be different. CIPs are general measures of the statistical relation between two variables (based merely on the given data).

Figure 2-9. Illustration of Independent IPTs

Figure 2-10. Illustration of Highly Correlated Variables

CHAPTER 3
LEARNING FROM EXAMPLES

A learning machine is usually a network. Neural networks are of particular interest in this dissertation. Actually, almost all adaptive systems can be regarded as network models, no matter whether they are linear or nonlinear, feedforward or recurrent. In this sense, the learning machines studied here are neural networks. So learning, in this circumstance, is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded [Men70]. The environmental stimulation, as pointed out in Chapter 1, is usually in the form of "examples," and thus learning is about how to obtain information from "examples." "Learning from examples" is the topic of this chapter, which includes a review and discussion of learning systems, learning mechanisms, the information-theoretic viewpoint of learning, "learning from examples" by the information potential, and finally a discussion of generalization.
3.1 Learning System

According to the abstract model described in Chapter 1, a learning system is a mapping network. The flexibility of the mapping highly depends on the structure of the system. The structures of several typical network systems will be reviewed in this section.

Network models can basically be divided into two categories: static models and dynamic models. The static model can also be called a memory-less model. In a network, memory about the signal past is obtained by using delayed connections (connections through delay units). (In the continuous-time case, delay connections become feedback connections; in this dissertation, only discrete-time signals and systems are studied.) Generally speaking, if there are delay units in a network, then the network will have memory. For instance, the transversal filter [Hay96, Wid85, Hon84], the general IIR filter [Hay96, Wid85, Hon84], the time delay neural network (TDNN) [Lan88, Wai89], the gamma neural network [deV92, Pri93], the general recurrent neural networks [Hay98, Hay94], etc., are all dynamic network systems with memory or delay connections. If a network has delay connections, it has to be described by difference equations (in the continuous-time case, differential equations), while a static network can be expressed by algebraic equations (linear or nonlinear).

There is also another taxonomy for the structure of learning or adaptive systems: for instance, linear models and nonlinear models belong to another categorization. The following will start with the static linear model.

3.1.1 Static Models

E. Linear Model

Possibly the simplest mapping network structure is the linear model. Mathematically, it is a linear transformation. As shown in Figure 3-1, the input and output relation of the network is defined by (3.1).
y = w^T x, \quad y = (y_1, ..., y_k)^T \in R^k, \quad x \in R^m, \quad w = (w_1, ..., w_k) \in R^{m \times k}, \quad w_i \in R^m    (3.1)

where x is the input signal and y is the output signal; w is the linear transformation matrix, where each column w_i (i = 1, ..., k) is a vector. Each output, or group of outputs, spans a subspace of the input signal space.

Eigenanalysis (principal component analysis) [Oja82, Dia96, Kun94, Dud73, Dud98] and generalized eigenanalysis [XuD98, Cha97, Dud73, Dud98] seek the signal subspace with maximum signal-to-noise ratio (SNR) or signal-to-signal ratio. For pattern classification, subspace methods such as Fisher discriminant analysis are also very useful tools [Oja82, Dud73, Dud98]. Linear models can also be used for inverse problems such as BSS and ICA [Com94, Cao96, Car98b, Bel95, Dec96, Car97, Yan97]. The linear model is simple, and it is very effective for a wide range of problems. Understanding the learning behavior of a linear model may also help the understanding of nonlinear systems.

Figure 3-1. Linear Model

F. Multilayer Perceptron (MLP)

The multilayer perceptron is the extension of the perceptron model [Ros58, Ros62, Min69]. The perceptron is similar to the linear model in Figure 3-1 but with a nonlinear function in each output node, e.g. the hard limiter f(x) = 1 for x \geq 0 and f(x) = -1 for x < 0. The perceptron initiated the mathematical analysis of learning, and it is the first machine which learns directly from examples [Vap95]. Although the perceptron demonstrated an amazing learning ability, its performance is limited by its single-layer structure [Min69]. The MLP extends the perceptron by putting more layers in the network structure, as shown in Figure 3-2. For ease of mathematical analysis, the nonlinear function in each node is usually a continuous differentiable function, e.g. the sigmoid function f(x) = 1/(1 + e^{-x}).
(3.2) gives a typical input-output relation of the network in Figure 3-2:

z_i = f(w_i^T x + b_i), \quad i = 1, ..., l
y_j = f(v_j^T z + a_j), \quad z = (z_1, ..., z_l)^T, \quad j = 1, ..., k    (3.2)

where b_i and a_j are the biases for nodes z_i and y_j respectively, and v_j \in R^l and w_i \in R^m are the linear projections for nodes y_j and z_i respectively. The layer of nodes z is called the hidden layer, which is neither input nor output. MLPs may have more than one hidden layer, and the nonlinear function f(\cdot) may be different for different nodes. Each node in an MLP is a simple processing element abstracted functionally from a real neuron cell, called the McCulloch-Pitts model [Hay98, Ru86a]. Collective behavior emerges when these simple elements are connected with each other to form a network whose overall function can be very complex [Ru86a].

One of the most appealing properties of the MLP is its universal approximation ability. It has been shown that as long as there are enough hidden nodes, an MLP can approximate any functional mapping [Hec87, Gal88, Hay94, Hay98]. Since a learning system is nothing but a mapping from an abstract point of view, the universal approximation property of the MLP is a very desirable feature for a learning system. This is one reason why the MLP is so popular. The MLP is a kind of "global" model whose basic building block is a hyperplane, the projection represented by the sum of products at each node. The nonlinear function at each node distorts its hyperplane into a ridge function, which also serves as a selector. So the overall functional surface of an MLP is the combination of these ridge functions, and the number of hidden nodes determines the number of ridge functions. Therefore, as long as the number of nodes is large enough, the overall functional surface can approximate any mapping. This is an intuitive understanding of the universal approximation property of the MLP.

Figure 3-2.
Multilayer Perceptron

G. Radial-Basis Function (RBF)

As shown in Figure 3-3, the RBF network has two layers. The hidden layer is the nonlinear layer, whose input-output relation is a radial-basis function, e.g. the Gaussian function z_i = \exp(-\|x - \mu_i\|^2 / (2\sigma_i^2)), where \mu_i is the mean (center) of the Gaussian function and determines its location in the input space, and \sigma_i^2 is the variance of the Gaussian function and determines its shape or sharpness. The output layer is a linear layer, so the overall input-output relation of the network can be expressed as

z_i = \exp(-\|x - \mu_i\|^2 / (2\sigma_i^2)), \quad i = 1, ..., l
y_j = w_j^T z, \quad z = (z_1, ..., z_l)^T, \quad j = 1, ..., k    (3.3)

where the w_j are linear projections and \sigma_i^2 and \mu_i are the same as above.

Figure 3-3. Radial-Basis Function Network (RBF Network)

The RBF network is also a universal approximator if the number of hidden nodes is large enough [Pog90, Par91, Hay98]. However, unlike the MLP, the basic building block is not a "global" function but a "local" one, such as the Gaussian function. The overall mapping surface is approximated by the linear combination of such "local" surfaces. Intuitively, we can imagine that any shape of mapping surface can be approximated by a linear combination of small pieces of local surfaces if there are enough such basic building blocks. The RBF network is also an optimal regularization function [Pog90, Hay98]. It has been applied as extensively as the MLP in various areas.

3.1.2 Dynamic Models

H. Transversal Filter

The transversal filter, also referred to as a tapped-delay-line filter or FIR filter, consists of two parts (as depicted in Figure 3-4): (1) the tapped delay line, and (2) the linear projection. The input-output relation can be expressed as

y(n) = \sum_{i=0}^{q} w_i x(n - i) = w^T x, \quad w = (w_0, ..., w_q)^T, \quad x = (x(n), ..., x(n - q))^T    (3.4)

where the w_i are the parameters of the filter.
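Equation (3.4) can be sketched as a direct-form sum of delayed inputs (a minimal illustration with an illustrative function name, taking x(n) = 0 for n < 0):

```python
import numpy as np

def transversal_filter(w, x):
    """y(n) = sum_{i=0}^{q} w_i * x(n - i), as in Eq. (3.4)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for i in range(len(w)):
            if n - i >= 0:  # x(n) = 0 for n < 0
                y[n] += w[i] * x[n - i]
    return y

w = np.array([0.5, 0.3, 0.2])             # tap weights w_0 ... w_q
impulse = np.array([1.0, 0.0, 0.0, 0.0])  # unit impulse input
response = transversal_filter(w, impulse)
```

Driving the filter with a unit impulse returns the tap weights themselves, which is the defining property of an FIR structure: a finite impulse response fixed by the number of taps.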
Because of its versatility and ease of implementation, the transversal filter has become an essential signal processing structure in a wide variety of applications [Hay96, Hon84].

Figure 3-4. Transversal Filter

Figure 3-5. Gamma Filter

I. Gamma Model

As shown in Figure 3-5, the gamma filter is similar to the transversal filter except that the tapped delay line is replaced by the gamma memory line [deV92, Pri93]. The gamma memory is a delay tap with feedback. The transfer function of a one-tap gamma memory is

G(z) = \frac{\mu z^{-1}}{1 - (1 - \mu) z^{-1}}    (3.5)

The corresponding impulse response is the gamma function with parameter p = 1:

g(n) = \mu (1 - \mu)^{n-1}, \quad n \geq 1    (3.6)

For the p-th tap of the gamma memory line, the transfer function and its impulse response (the gamma function) are

G_p(z) = [ \frac{\mu z^{-1}}{1 - (1 - \mu) z^{-1}} ]^p, \quad g_p(n) = \binom{n-1}{p-1} \mu^p (1 - \mu)^{n-p}, \quad n \geq p    (3.7)

Compared with the tapped delay line, the gamma memory line is a recursive structure and has an impulse response of infinite length; therefore, the "memory depth" can be adjusted by the parameter \mu instead of being fixed by the number of taps in the tapped delay line. Compared with the general IIR filter, the stability analysis of the gamma memory is simple: when 0 < \mu < 2, the gamma memory line is stable (everywhere in the line). Moreover, when \mu = 1, the gamma memory line becomes the tapped delay line, so the gamma memory line is a generalization of the tapped delay line. The gamma filter is a good compromise between the FIR filter and the IIR filter, and it has been widely applied to a variety of signal processing and pattern recognition problems.

J. The All-Pole IIR Filter

Figure 3-6. The All Pole IIR Filter

As shown in Figure 3-6, the all-pole IIR filter is composed of only the delayed feedback; there are no feedforward connections in the network structure.
The transfer function of the filter is

H(z) = \frac{1}{1 - \sum_{i=1}^{n} w_i z^{-i}}    (3.8)

Obviously, this is the inverse system of the FIR filter H(z) = 1 - \sum_{i=1}^{n} w_i z^{-i}, which has been used in deconvolution problems [Hay94a]. There is also its counterpart for the two-input, two-output system, which has been used in blind source separation problems [Ngu95, Wan96]. In general, this type of filter may be very useful in inverse or system identification problems.

K. TDNN and Gamma Neural Network

In an MLP, each connection is instantaneous and there is no temporal structure in it. If the instantaneous connections are replaced by filters, then each node will have the ability to process time signals. The time delay neural network (TDNN) is formed by replacing the connections in the MLP with transversal filters [Lan88, Wai89]. The gamma neural network is the result of replacing the connections in the MLP with gamma filters [deV92, Pri93]. These types of neural networks extend the ability of the MLP.

Figure 3-7. Multilayer Perceptron with Delayed Connections

L. General Recurrent Neural Network

A general nonlinear dynamic system is the multilayer perceptron with some delayed connections. As Figure 3-7 shows, for instance, the output of node z_i may rely on the previous output of node y_k:

z_i(n) = f(w_i^T x(n) + b_i + d y_k(n - 1))    (3.9)

There may be other nodes with similar delayed connections. This type of neural network is powerful but complicated: although its flexibility and potential are high, its adaptation is difficult to analyze.

3.2 Learning Mechanisms

The central part of a learning mechanism is the criterion. The range of application of a learning system may be very broad.
For instance, a learning system or adaptive signal processing system can be used for data compression, encoding or decoding signals, noise or echo cancellation, source separation, signal enhancement, pattern classification, system identification and control, etc. However, the criteria used to achieve such diverse purposes can basically be divided into only two types: one is based on energy measures; the other is based on information measures. As pointed out in Chapter 2, energy measures can be regarded as special cases of information measures. In the following, various energy measures and information measures will be discussed.

Once the criterion of a system is determined, the task left is to adjust the parameters of the system so as to optimize the criterion. There are a variety of optimization techniques. The gradient method is perhaps the simplest, but it is a general method [Gil81, Hes80, Wid85] based on the first-order approximation of the performance surface. Its on-line version, the stochastic gradient method [Wid63], is widely used in adaptive and learning systems. Newton's method [Gil81, Hes80, Wid85] is a more sophisticated method based on the second-order approximation of the performance surface. Its variant, the conjugate gradient method [Hes80], avoids the calculation of the inverse of the Hessian matrix and is thus computationally more efficient [Hes80]. There are also other techniques which are efficient for specific applications, for instance, the Expectation-Maximization algorithm for maximum likelihood estimation or for a class of non-negative function maximization [Dem77, Mcl96, XuD95, XuD96]. The natural gradient method, by means of information geometry, is used in the case where the parameter space is constrained [Ama98]. In the following, these techniques will also be briefly reviewed.
3.2.1 Learning Criteria

* MSE Criterion

The mean squared error (MSE) criterion is one of the most widely used criteria. For the learning system described in Chapter 1, if the given environmental data is {(x(n), d(n)) | n = 1, ..., N}, where x(n) is the input signal and d(n) is the desired signal, then the output signal is y(n) = q(x(n), w) and the error signal is e(n) = d(n) - y(n). The MSE criterion can be defined as

J = \sum_{n=1}^{N} e(n)^2 = \sum_{n=1}^{N} (d(n) - y(n))^2    (3.10)

From the geometric point of view, it is basically the squared Euclidean distance between the desired signal d(n) and the output signal y(n); from the point of view of energy and entropy measures, it is the energy of the error signal. Minimization of the MSE criterion results in the output signal closest to the desired signal in the Euclidean distance sense. As mentioned in Chapter 2, if we assume the error signal is white Gaussian with zero mean, then the minimization of the MSE is equivalent to the minimization of the entropy of the error signal.

For a multiple-output system, i.e., when the output signal and the desired signal are multi-dimensional, the error signal is also multi-dimensional and the definition of the MSE criterion is the same as described in Chapter 2.

* Signal-to-Noise Ratio (SNR)

The signal-to-noise ratio is also a frequently used criterion in the signal processing area; the purpose of many signal processing systems is to enhance the SNR. A well-known example is principal component analysis (PCA), where a linear projection is desired such that the SNR in the output is maximized (when the noise is assumed to be white Gaussian). For the linear model described above, y = w^T x, y \in R, x \in R^m and w \in R^m, if the input x is zero-mean and its covariance matrix is R_x = E[x x^T], then the output power (short-time energy) is E[y^2] = w^T E[x x^T] w = w^T R_x w.
If the input is x_{noise}, a zero-mean white Gaussian noise whose covariance matrix is the identity matrix I, then the output power of the noise is w^T w. The SNR in the output of the linear projection will be

J = \frac{w^T R_x w}{w^T w}    (3.11)

From the information-theoretic point of view, the entropies of the outputs will be

H(w^T x_{noise}) = \frac{1}{2} \log(w^T w) + \frac{1}{2} \log 2\pi + \frac{1}{2}
H(w^T x) = \frac{1}{2} \log(w^T R_x w) + \frac{1}{2} \log 2\pi + \frac{1}{2}    (3.12)

where the input signal x is assumed to be a zero-mean Gaussian signal. Then the entropy difference is

J = H(w^T x) - H(w^T x_{noise}) = \frac{1}{2} \log \frac{w^T R_x w}{w^T w}    (3.13)

which is equivalent to the SNR criterion. The solution to this problem is the eigenvector corresponding to the largest eigenvalue of R_x.

The PCA problem can also be formulated as the minimum reconstruction MSE problem [Kun94]:

J = E[ \| w w^T x - x \|^2 ]    (3.14)

(3.14) can also be regarded as an auto-association problem in a two-layer network with the constraint that the two layers' weights be dual to each other (i.e. one is the transpose of the other). The minimization solution to (3.14) is equivalent to the maximization solution to (3.12) or (3.13).

* Signal-to-Signal Ratio

For the same linear network, if the input signal is switched between two zero-mean signals x_1 and x_2, then the signal-to-signal ratio in the output of the linear projection will be

J = \frac{w^T R_{x_1} w}{w^T R_{x_2} w}    (3.15)

where R_{x_1} is the covariance matrix of x_1 and R_{x_2} is the covariance matrix of x_2. Maximization of this criterion enhances the signal x_1 in the output and attenuates the signal x_2 at the same time. From the information-theoretic point of view, if both signals are Gaussian, then the entropy difference in the output will be
The maximization solution to (3.15) or (3.16) is the generalized eigenvector with the largest generalized eigenvalue: Rx ioptimal = kmaxRx2woptimal (3.17) [Cha97] also shows that when this criterion is applied to classification problems, it can be formulated as a heteroassociation problem with a MSE criterion and a constraint. The Maximum Likelihood The maximum likelihood estimation has been widely used in the parametric model estimation [Dud98, Dud73]. It has also been extensively applied to "learning from examples." For instance, the hidden markov model has been successfully applied in the speech recognition problem [Rab93, Hua90]. Training of most hidden markov models is based on maximum likelihood estimation. In general, suppose there is a sta- tistical model p(z, w) where z is a random variable and w are a set of parameters, and the true probability distribution is q(z) but unknown. The problem is to find w so that p(z, w) is the closest to q(z). We can simply apply the information cross-entropy cri- terion, i.e. the Kullback-Leibler criterion to the problem: J(w) = Jq(z)log q(z dz = -E[logp(z, w)] + Hs(z) (3.18) p(Z, w) where Hs(z) is the Shannon entropy of z which does not depend on the parameters w, and L(w) = E[logp(z, w)] is exactly the log likelihood function of p(z, w). So, the minimization of (3.18) is equivalent to the maximization of the log likelihood function L(w). In other words, the maximum likelihood estimation is exactly the same as the 83 minimum Kullback-Leibler cross-entropy between the true probability distribution and the model probability distribution [Ama98]. * The Information-Theoretic Measures for BBS and ICA As introduced in Chapter 2, the maximization of the output entropy and the minimiza- tion of the mutual information between the outputs can be used in BBS and ICA prob- lems. We will deal with this case in more details later. 
3.2.2 Optimization Techniques * The Back-Propagation Algorithm In general, for a function Rm -+ R: J = f(w), the gradient is the steepest ascent dw aJ direction for J, and is the steepest descent direction for J, and the whole first dw order approximation of the function at w = wn is J =f(w) + Aw-A (3.19) w =w, So, for the maximization of the function, the updating of w can be accomplished along the steepest ascent direction; i.e., wn + = n + where n is the st ize. For the minimization of the function the updating rule can be along the steepest aJ aJ descent direction; i.e., wn + = wn- [Wid85]. If the gradient a- can be W = W, expressed as the summation over data samples such as the case of the MSE as the cri- N 1 2 terion J = J(n), J(n) = -(d(n)-y(n)) then each datum can be used to n=l update the parameter w whenever it appears; i.e., wn + = w + p-C (n). This is called the stochastic gradient method [Wid63]. 84 N For a MLP network described above, the MSE criterion is still J = V J(n). Let's n=l T T look at a simple case with only one output node y = f(v z + a), v = (vl, ..., vt) T T z = (z, ..., z) zi = f(wx + bi), i = 1,..., 1. Then by the chain rule, we have NJ a N a a = n ~(n) -(n)y(n) (3.20) FV ay(n) 8v We can see from this equation that the key point here is how to calculate the sensitivity a a of the network output -y(n). The term -J(n) in the MSE case is the error signal FVY ay(n) --L-J(n) = e(n) = y(n) -d(n). The sensitivity can then be regarded as a mecha- 9y(n) nism which will propagate the error e(n) back to the parameters v or w,. To be more specific, we have (3.21) if we consider the relation = y( -y) for a sigmoid func- dx tion y = f(x) = 1/(1 + e x) and apply the chain to the problem 4(n) = 1 {y(n)(1-y(n))} -(n) = (n)z av -y(n) = (n)v zz (3.21) (n) y(n). {z(n)(l -z(n))} az(n) -y(n) = Wi(n)x(n) awi where is the operator for component-wise multiplication. 
The process of (3.21) is a linear process which back-propagates 1 through the "dual network" system back to each parameter, and thus is called "back-propagation." If we need to back-propagate an error $e(n)$, then the 1 in $\delta(n)$ of (3.21) will be replaced by $e(n)$, and (3.21) will be called the "error back-propagation." Actually, the "error back-propagation" is nothing but the gradient method implementation with the calculation of the gradient by the chain rule applied to the network structure. The effectiveness of the "back-propagation" is its locality in calculation by utilizing the topology of the network. It is significant for engineering implementations. For a detailed description, one can refer to Rumelhart et al. [Ru86b, Ru86c].

Figure 3-8. The Time Extension of the Recurrent Neural Network in Figure 3-7.

For a dynamic system with delay connections, the whole network can be extended along time with the delay connections linking the nodes between time slices. The recurrent neural network in Figure 3-7 is shown extended in Figure 3-8, in which the structure in each time slice will only contain the instantaneous connections, and the delay connections will connect the corresponding nodes between time slices. Once a dynamic network is extended in time, the whole structure can be regarded as a large static network and the back-propagation algorithm can be applied as usual. This is the so-called "back-propagation through time" (BPTT) [Wer90, Wil90, Hay98]. There is another algorithm for the training of dynamic networks, which is called "real time recurrent learning" (RTRL) [Wil89, Hay98]. Both BPTT and RTRL are gradient-based methods, and both of them use the chain rule to calculate the gradient.
The difference is that BPTT starts the chain rule from the end of a time block to the beginning of it, while RTRL starts the chain rule from the beginning of a time block to the end of it, resulting in differences in memory complexity and computational complexity [Hay98].

* Newton's Method

The gradient method is based on the first order approximation of the performance surface and is simple, but its convergence speed may be slow. Newton's method is based on the second order approximation of the performance surface and the closed form optimization solution to a quadratic function. First, let's look at the optimization solution to a quadratic function $F(x) = \frac{1}{2} x^T A x - h^T x + c$, where $A \in R^{m \times m}$ is a symmetric matrix, either positive definite or negative definite, $h \in R^m$ and $x \in R^m$ are vectors, and $c$ is a scalar constant. There is a maximum solution $x_0$ if $A$ is negative definite, or a minimum solution $x_0$ if $A$ is positive definite, where in both cases $x_0$ should satisfy the linear equation $\frac{\partial}{\partial x} F(x) = 0$; i.e., $A x_0 = h$, or $x_0 = A^{-1} h$. For a general cost function $J(w)$, its second order approximation at $w = w_n$ will be

$J(w) = J(w_n) + \frac{\partial J}{\partial w}(w_n)^T (w - w_n) + \frac{1}{2} (w - w_n)^T H(w_n) (w - w_n)$ (3.22)

where $H(w_n)$ is the Hessian matrix of $J(w)$ at $w = w_n$. So, the optimization point for (3.22) is $w - w_n = -H(w_n)^{-1} \frac{\partial J}{\partial w}(w_n)$. Thus we have Newton's method as follows [Hes80, Hay98, Wid85]:

$w_{n+1} = w_n - H(w_n)^{-1} \frac{\partial J}{\partial w}(w_n)$ (3.23)

As pointed out in Haykin [Hay98], there are several problems for Newton's method to be applied to MLP training. For instance, Newton's method involves the calculation of the inverse of the Hessian matrix. It is computationally complex, and there is no guarantee that the Hessian matrix is nonsingular and always positive or negative definite. For a nonquadratic performance surface, there is no guarantee of the convergence of Newton's method.
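A small sketch (our own, with an arbitrary $A$ and $h$) of why (3.23) is attractive: on a quadratic performance surface the Hessian is constant and equal to $A$, so a single Newton step from any starting point lands exactly on $x_0 = A^{-1} h$.

```python
import numpy as np

# Quadratic case: F(x) = 0.5 x^T A x - h^T x + c, with A positive definite,
# so the Newton step (3.23) uses the exact Hessian H = A.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric positive definite
h = np.array([1.0, 2.0])

x = np.zeros(2)                     # arbitrary starting point
grad = A @ x - h                    # dF/dx at the current point
x = x - np.linalg.solve(A, grad)    # one Newton step: x - H^{-1} grad
x0 = np.linalg.solve(A, h)          # closed-form minimizer A^{-1} h
print(np.allclose(x, x0))           # True: one step suffices
```

For a nonquadratic $J(w)$ the same step is only as good as the local quadratic approximation (3.22), which is exactly why the convergence guarantee is lost.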
To overcome these problems, the Quasi-Newton method [Hay98] and the conjugate gradient method [Hes80, Hay98], among others, have appeared.

* Quasi-Newton Method

This method uses an estimate of the inverse Hessian matrix without the calculation of the real inverse. This estimate is guaranteed to be positive definite for a minimization problem or negative definite for a maximization problem. However, the computational complexity is still of the order of $O(W^2)$, where $W$ is the number of parameters [Hay98].

* The Conjugate Gradient Method

The conjugate gradient method is based on the fact that the optimal point of a quadratic function can be obtained by sequential searches along the so-called conjugate directions rather than by the direct calculation of the inverse of the Hessian matrix. There is a guarantee that the optimal solution can be obtained within $W$ steps for a quadratic function ($W$ is the number of parameters). One method to obtain the conjugate directions is based on the gradient directions; i.e., the modification of the gradient directions may result in one set of conjugate directions, thus the name "conjugate gradient method" [Hes80, Hay98]. The conjugate gradient method can avoid the calculation of the inverse and even the evaluation of the Hessian matrix itself, and thus is computationally efficient. The conjugate gradient method is perhaps the only second-order optimization method which can be applied to large-scale problems [Hay98].

* The Natural Gradient Method

When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. The basic point of the natural gradient method is as follows [Ama98]: For a cost function $J(w)$, if the small incremental vector $dw$ is fixed in length, i.e., $|dw|^2 = \varepsilon^2$ where $\varepsilon$ is a small constant, then the steepest descent direction of $J(w)$ is $-\frac{\partial J}{\partial w}(w)$ and the steepest ascent direction is $\frac{\partial J}{\partial w}(w)$.
However, if the length of $dw$ is constrained in such a way that the quadratic form $(dw)^T G (dw) = \varepsilon^2$, where $G$ is the so-called Riemannian metric tensor, which is always positive definite, then the steepest descent direction will be $-G^{-1} \frac{\partial J}{\partial w}(w)$, and the steepest ascent direction will be $G^{-1} \frac{\partial J}{\partial w}(w)$.

* The Expectation and Maximization (EM) Algorithm

The EM algorithm can be generalized and summarized by the following inequality, called the generalized EM inequality [XuD95], which can be described as follows: For a non-negative function $f(D, \theta) = \sum_{i=1}^{l} f_i(D, \theta)$, $f_i(D, \theta) \geq 0$, $\forall (D, \theta)$, where $D = \{d_i \in R^m\}$ is the data set and $\theta$ is the parameter set, we have

$f(D, \theta_{n+1}) \geq f(D, \theta_n)$, if $\theta_{n+1} = \arg\max_{\theta} \sum_{i=1}^{l} f_i(D, \theta_n) \log f_i(D, \theta)$ (3.24)

This inequality suggests an iterative method for the maximization of the function $f(D, \theta)$ with respect to the parameters $\theta$, that is, the generalized EM algorithm (the functions $f_i(D, \theta)$ and $f(D, \theta)$ are not required to be pdf functions, as long as they are non-negative functions). First, use the known parameters $\theta_n$ to calculate $f_i(D, \theta_n)$ and thus $\sum_{i=1}^{l} f_i(D, \theta_n) \log f_i(D, \theta)$; this is the so-called expectation step ($\sum_{i=1}^{l} f_i(D, \theta_n) \log f_i(D, \theta)$ can be regarded as a generalized expectation). Second, find the maximum point $\theta_{n+1}$ of the expectation function $\sum_{i=1}^{l} f_i(D, \theta_n) \log f_i(D, \theta)$; this is the so-called maximization step. The process can go on iteratively. With this inequality, it is not difficult to prove the Baum-Eagon inequality, which is the basis for the training of the well-known hidden Markov model.
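As an illustration of the expectation and maximization steps, here is our own sketch using the classical two-component Gaussian mixture, a standard special case of EM (the data, fixed unit variances, equal weights, and initialization are arbitrary assumptions). EM's characteristic property, which the inequality above delivers in general form, is that the likelihood never decreases across iterations.

```python
import numpy as np

rng = np.random.default_rng(1)
# 1-D data from a two-component Gaussian mixture (unit variances, weights 0.5/0.5)
z = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

def log_lik(mu):
    comp = np.stack([np.exp(-0.5 * (z - m) ** 2) / np.sqrt(2 * np.pi) for m in mu])
    return np.sum(np.log(0.5 * comp.sum(axis=0)))

mu = np.array([0.0, 1.0])          # initial guesses for the two means
ll = [log_lik(mu)]
for _ in range(50):
    # E-step: responsibilities proportional to each component's density
    comp = np.stack([np.exp(-0.5 * (z - m) ** 2) for m in mu])
    r = comp / comp.sum(axis=0)
    # M-step: maximize the expected complete-data log likelihood over the means
    mu = (r * z).sum(axis=1) / r.sum(axis=1)
    ll.append(log_lik(mu))

print(sorted(mu))  # close to the true means -2 and 3; ll is non-decreasing
```

Each pass is one application of (3.24): the E-step builds the weighted-log expectation from the current parameters, and the M-step maximizes it.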
The Baum-Eagon inequality can be stated as $P(y) \geq P(x)$, where $P(x) = P(\{x_{ij}\})$ is a polynomial with nonnegative coefficients, homogeneous of degree $d$ in its variables $x_{ij}$; $x = \{x_{ij}\}$ is a point in the domain $PD$: $x_{ij} \geq 0$, $\sum_{j=1}^{q_i} x_{ij} = 1$, $i = 1, \ldots, p$, $j = 1, \ldots, q_i$, with $\sum_{j=1}^{q_i} x_{ij} \frac{\partial P}{\partial x_{ij}}(x) \neq 0$ for all $i$; and $y = \{y_{ij}\}$ is another point in $PD$ satisfying

$y_{ij} = x_{ij} \frac{\partial P}{\partial x_{ij}}(x) \Big/ \sum_{j=1}^{q_i} x_{ij} \frac{\partial P}{\partial x_{ij}}(x)$

If we regard $x$ as a parameter set, then this inequality also suggests an iterative way to maximize the polynomial $P(x)$. That is, the above $y$ is a better estimate of the parameters (better in the sense that it makes the polynomial larger), and the process can go on iteratively. The polynomial can also be non-homogeneous but with nonnegative coefficients. This is a general result which has been applied to train such general models as the multi-channel hidden Markov model [XuD96], where the calculation of the gradient $\frac{\partial P}{\partial x_{ij}}(x)$ is still needed and is accomplished by back-propagation through time. So, the forward and backward algorithms in the training of the hidden Markov model can be regarded as the forward process and back-propagation through time for the hidden Markov network [XuD96]. The details about the EM algorithm can be found in Dempster and McLachlan [Dep77, Mcl96].

3.3 General Point of View

It can be seen from the above that there is a variety of learning criteria. Some of them are based on energy quantities, some of them are based on information-theoretic measures. In this chapter, a unifying point of view will be given.

3.3.1 InfoMax Principle

In the late 1980s, Linsker gave a rather general point of view about learning or statistical signal processing [Lin88, Lin89]. He pointed out that the transformation of a random vector X observed at the input layer of a neural network to a random vector Y produced at the output layer of the network should be so chosen that the activities of the neurons in the output layer jointly maximize information about the activities in the input layer.
To achieve this, the mutual information $I(Y, X)$ between the input vector X and the output vector Y should be used as the cost function or criterion for the learning process of the neural network. This is called the InfoMax principle. The InfoMax principle provides a mathematical framework for the self-organization of the learning network that is independent of the rule used for its implementation. This principle can also be viewed as the neural network counterpart of the concept of channel capacity, which defines the Shannon limit on the rate of information transmission through a communication channel. The InfoMax principle is depicted in the following figure:

Figure 3-9. InfoMax Scheme: a neural network mapping the input X to the output Y so as to maximize I(Y, X).

When the neural network or mapping system is deterministic, the mutual information is determined by the output entropy, as shown by $I(Y, X) = H(Y) - H(Y|X)$, where $H(Y)$ is the output entropy and $H(Y|X) = 0$ is the conditional output entropy when the input is given (since the input-output relation is deterministic, the conditional entropy is zero). So, in this case, the maximization of mutual information is equivalent to the maximization of the output entropy.

3.3.2 Other Similar Information-Theoretic Schemes

Haykin summarized other information-theoretic learning schemes in [Hay98], which all use the mutual information as the learning criterion, but the schemes are formulated in different ways. There are three other different scenarios, which are described in the following. Although the formulations are different, the spirit is the same as the InfoMax principle [Hay98].
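The deterministic case can be checked directly on a small discrete example (our own sketch; the input distribution and the mapping g are arbitrary choices): the conditional entropy $H(Y|X)$ vanishes, so the mutual information reduces to the output entropy.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# a discrete input X with 4 states and a deterministic (here, invertible) map g
px = np.array([0.1, 0.2, 0.3, 0.4])
g = [2, 0, 3, 1]                     # g sends input state i to output state g[i]

# joint distribution p(x, y): all the mass sits on the pairs (i, g[i])
pxy = np.zeros((4, 4))
for i, j in enumerate(g):
    pxy[i, j] = px[i]
py = pxy.sum(axis=0)

H_y = entropy(py)
H_y_given_x = sum(px[i] * entropy(pxy[i] / px[i]) for i in range(4))
I_xy = H_y - H_y_given_x
print(np.isclose(H_y_given_x, 0.0))  # True: the map is deterministic
print(np.isclose(I_xy, H_y))         # True: maximizing I(Y, X) = maximizing H(Y)
```

With a stochastic map, $H(Y|X)$ would be strictly positive and the two criteria would no longer coincide.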

Full Text |

PAGE 1 (1(5*< (17523< $1' ,1)250$7,21 327(17,$/ )25 1(85$/ &20387$7,21 %\ '21*;,1 ;8 $ ',66(57$7,21 35(6(17(' 72 7+( *5$'8$7( 6&+22/ 2) 7+( 81,9(56,7< 2) )/25,'$ ,1 3$57,$/ )8/),//0(17 2) 7+( 5(48,5(0(176 )25 7+( '(*5(( 2) '2&725 2) 3+,/2623+< 81,9(56,7< 2) )/25,'$ PAGE 2 7R 0\ 3DUHQWV PAGE 3 $&.12:/('*(0(176 7KLV &KLQHVH SRHP H[DFWO\ H[SUHVVHV P\ IHHOLQJ DQG H[SHULHQFH LQ IRXU \HDUVf 3K' VWXG\ 'XULQJ WKLV SHULRG WKHUH KDYH EHHQ GLIILFXOWLHV HQFRXQWHUHG ERWK LQ WKH FRXUVH RI P\ UHVHDUFK DQG LQ P\ GDLO\ OLIH -XVW DV WKH SRHP VD\V WKHUH DUH DOZD\V KRSHV LQ VSLWH RI GLIILFXOWLHV 5HWURVSHFWLQJ WKH SDVW ZRXOG OLNH WR H[SUHVV P\ JUDWLWXGH WR LQGLYLGXDOV ZKR EURXJKW PH KRSH DQG OLJKW ZKLFK JXLGHG PH JR WKURXJK WKH GDUNQHVV )LUVW ZRXOG OLNH WR WKDQN P\ DGYLVRU 'U -RV 3ULQFLSH IRU SURYLGLQJ PH ZLWK WKH ZRQGHUIXO RSSRUWXQLW\ WR EH D 3K' VWXGHQW LQ &1(/ ,WV H[FHOOHQW HQYLURQPHQW KHOSHG PH D ORW ZKHQ MXVW FDPH KHUH ZDV LPSUHVVHG E\ 'U 3ULQFLSHfV DFWLYH WKRXJKW DQG DSSUHFLn DWHG YHU\ PXFK KLV VW\OH RI VXSHUYLVLRQ ZKLFK JLYH D ORW RI VSDFH WR VWXGHQWV WR H[SORUH RQ WKHLU RZQ DP JUDWHIXO IRU KLV LQWURGXFLQJ PH WR WKH DUHD RI WKH LQIRUPDWLRQWKHRUHWLF OHDUQLQJ DQG WKH JXLGDQFH WKURXJKRXW WKH GHYHORSPHQW RI WKLV GLVVHUWDWLRQ ZRXOG DOVR OLNH WR WKDQN P\ FRPPLWWHH PHPEHUV 'U -RKQ +DUULV 'U 'RQDOG &KLOGHUV 'U -DFRE +DPPHU 'U 0DUN PAGE 4 OLVW LQFOXGHV EXW QRW OLPLWHG WR /LNDQJ PAGE 5 7$%/( 2) &217(176 3DJH $&.12:/('*(0(176 LLL $%675$&7 YLLL &+$37(56 ,1752'8&7,21 ,QIRUPDWLRQ DQG (QHUJ\ $ %ULHI 5HYLHZ 0RWLYDWLRQ 2XWOLQH (1(5*< (17523< $1' ,1)250$7,21 327(17,$/ (QHUJ\ (QWURS\ DQG ,QIRUPDWLRQ RI 6LJQDOV (QHUJ\ RI 6LJQDOV ,QIRUPDWLRQ (QWURS\ *HRPHWULFDO ,QWHUSUHWDWLRQ RI (QWURS\ 0XWXDO ,QIRUPDWLRQ 4XDGUDWLF 0XWXDO ,QIRUPDWLRQ *HRPHWULFDO ,QWHUSUHWDWLRQ RI 0XWXDO ,QIRUPDWLRQ (QHUJ\ DQG (QWURS\ IRU *DXVVLDQ 6LJQDO &URVV&RUUHODWLRQ DQG 0XWXDO ,QIRUPDWLRQ IRU *DXVVLDQ 6LJQDO (PSLULFDO (QHUJ\ (QWURS\ DQG 0, 3UREOHP DQG /LWHUDWXUH 5HYLHZ (PSLULFDO (QHUJ\ (PSLULFDO (QWURS\ DQG 0XWXDO ,QIRUPDWLRQ 
7KH 3UREOHP 1RQSDUDPHWULF 'HQVLW\ (VWLPDWLRQ (PSLULFDO (QWURS\ DQG 0XWXDO ,QIRUPDWLRQ 7KH /LWHUDWXUH 5HYLHZ 4XDGUDWLF (QWURS\ DQG ,QIRUPDWLRQ 3RWHQWLDO 7KH 'HYHORSPHQW RI ,QIRUPDWLRQ 3RWHQWLDO ,QIRUPDWLRQ )RUFH ,)f 7KH &DOFXODWLRQ RI ,QIRUPDWLRQ 3RWHQWLDO DQG )RUFH 4XDGUDWLF 0XWXDO ,QIRUPDWLRQ DQG &URVV ,QIRUPDWLRQ 3RWHQWLDO 40, DQG &URVV ,QIRUPDWLRQ 3RWHQWLDO &,3f &URVV ,QIRUPDWLRQ )RUFHV &,)f $Q ([SODQDWLRQ WR 40, PAGE 6 3DJH /($51,1* )520 (;$03/(6 /HDUQLQJ 6\VWHP 6WDWLF 0RGHOV '\QDPLF 0RGHOV /HDUQLQJ 0HFKDQLVPV /HDUQLQJ &ULWHULD 2SWLPL]DWLRQ 7HFKQLTXHV *HQHUDO 3RLQW RI 9LHZ ,QIR0D[ 3ULQFLSOH 2WKHU 6LPLODU ,QIRUPDWLRQ7KHRUHWLF 6FKHPHV $ *HQHUDO 6FKHPH /HDUQLQJ DV ,QIRUPDWLRQ 7UDQVPLVVLRQ /D\HUE\/D\HU ,QIRUPDWLRQ )LOWHULQJ )LOWHULQJ EH\RQG 6SHFWUXP /HDUQLQJ E\ ,QIRUPDWLRQ )RUFH 'LVFXVVLRQ RI *HQHUDOL]DWLRQ E\ /HDUQLQJ /($51,1* :,7+ 21/,1( /2&$/ 58/( $ &$6( 678'< 21 *(1(5$/,=(' (,*(1'(&20326,7,21 (QHUJ\ &RUUHODWLRQ DQG 'HFRUUHODWLRQ IRU /LQHDU 0RGHO 6LJQDO 3RZHU 4XDGUDWLF )RUP &RUUHODWLRQ +HEELDQ DQG $QWL+HEELDQ /HDUQLQJ /DWHUDO ,QKLELWLRQ &RQQHFWLRQV $QWL+HEELDQ /HDUQLQJ DQG 'HFRUUHODWLRQ (LJHQGHFRPSRVLWLRQ DQG *HQHUDOL]HG (LJHQGHFRPSRVLWLRQ 7KH ,QIRUPDWLRQ7KHRUHWLF )RUPXODWLRQ IRU (LJHQGHFRPSRVLWLRQ DQG *HQHUDOL]HG (LJHQGHFRPSRVLWLRQ 7KH )RUPXODWLRQ RI (LJHQGHFRPSRVLWLRQ DQG *HQHUDOL]HG (LJHQGHFRPSRVLWLRQ %DVHG RQ WKH (QHUJ\ 0HDVXUHV 7KH 2QOLQH /RFDO 5XOH IRU (LJHQGHFRPSRVLWLRQ 2MDfV 5XOH DQG WKH )LUVW 3URMHFWLRQ *HRPHWULFDO ([SODQDWLRQ WR 2MDfV 5XOH 6DQJHUfV 5XOH DQG WKH 2WKHU 3URMHFWLRQV $3(; 0RGHO 7KH /RFDO ,PSOHPHQWDWLRQ RI 6DQJHUfV 5XOH $Q ,WHUDWLYH 0HWKRG IRU *HQHUDOL]HG (LJHQGHFRPSRVLWLRQ $Q 2QOLQH /RFDO 5XOH IRU *HQHUDOL]HG (LJHQGHFRPSRVLWLRQ 7KH 3URSRVHG /HDUQLQJ 5XOH IRU WKH )LUVW 3URMHFWLRQ 7KH 3URSRVHG /HDUQLQJ 5XOHV IRU WKH 2WKHU &RQQHFWLRQV 6LPXODWLRQV &RQFOXVLRQ DQG 'LVFXVVLRQ $33/,&$7,216 $VSHFW $QJOH (VWLPDWLRQ IRU 6$5 ,PDJHU\ 3UREOHP 'HVFULSWLRQ 3UREOHP )RUPXODWLRQ ([SHULPHQWV RI $VSHFW $QJOH (VWLPDWLRQ YL PAGE 7 3DJH 2FFOXVLRQ 7HVW RQ $VSHFW 
$QJOH (VWLPDWLRQ $XWRPDWLF 7DUJHW 5HFRJQLWLRQ $75f 3UREOHP 'HVFULSWLRQ DQG )RUPXODWLRQ ([SHULPHQW DQG 5HVXOW 7UDLQLQJ 0/3 /D\HUE\/D\HU ZLWK &,3 %OLQG 6RXUFH 6HSDUDWLRQ DQG ,QGHSHQGHQW &RPSRQHQW $QDO\VLV 3UREOHP 'HVFULSWLRQ DQG )RUPXODWLRQ %OLQG 6RXUFH 6HSDUDWLRQ ZLWK &640, &6&,3f %OLQG 6RXUFH 6HSDUDWLRQ E\ 0D[LPL]LQJ 4XDGUDWLF (QWURS\ %OLQG 6RXUFH 6HSDUDWLRQ ZLWK ('40, ('&,3f DQG 0LQL0D[ 0HWKRG &21&/86,216 $1' )8785( :25. $3(1',&(6 $ 7+( ,17(*5$7,21 2) 7+( 352'8&7 2) *$866,$1 .(51(/6 % 6+$1121 (17523< 2) 08/7,',0(16,21$/ *$866,$1 9$5,$%/( & 5(1<, (17523< 2) 08/7,',0(16,21$/ *$866,$1 9$5,$%/( +& (17523< 2) 08/7,',0(16,21$/ *$866,$1 9$5,$%/( 5()(5(1&(6 %,2*5$3+,&$/ 6.(7&+ YLL PAGE 8 $EVWUDFW RI 'LVVHUWDWLRQ 3UHVHQWHG WR WKH *UDGXDWH 6FKRRO RI WKH 8QLYHUVLW\ RI )ORULGD LQ 3DUWLDO )XOILOOPHQW RI WKH 5HTXLUHPHQWV IRU WKH 'HJUHH RI 'RFWRU RI 3KLORVRSK\ (1(5*< (17523< $1' ,1)250$7,21 327(17,$/ )25 1(85$/ &20387$7,21 %\ 'RQJ[LQ ;X 0D\ &KDLUPDQ 'U -RV & 3ULQFLSH 0DMRU 'HSDUWPHQW (OHFWULFDO DQG &RPSXWHU (QJLQHHULQJ 7KH PDMRU JRDO RI WKLV UHVHDUFK LV WR GHYHORS JHQHUDO QRQSDUDPHWULF PHWKRGV IRU WKH HVWLPDWLRQ RI HQWURS\ DQG PXWXDO LQIRUPDWLRQ JLYLQJ D XQLI\LQJ SRLQW RI YLHZ IRU WKHLU XVH LQ VLJQDO SURFHVVLQJ DQG QHXUDO FRPSXWDWLRQ ,Q PDQ\ UHDO ZRUOG SUREOHPV WKH LQIRUPDn WLRQ LV FDUULHG VROHO\ E\ GDWD VDPSOHV ZLWKRXW DQ\ RWKHU D SULRUL NQRZOHGJH 7KH FHQWUDO LVVXH RI fOHDUQLQJ IURP H[DPSOHVf LV WR HVWLPDWH HQHUJ\ HQWURS\ RU PXWXDO LQIRUPDWLRQ RI D YDULDEOH RQO\ IURP LWV VDPSOHV DQG DGDSW WKH V\VWHP SDUDPHWHUV E\ RSWLPL]LQJ D FULWHULRQ EDVHG RQ WKH HVWLPDWLRQ %\ XVLQJ DOWHUQDWLYH HQWURS\ PHDVXUHV VXFK DV 5HQ\LfV TXDGUDWLF HQWURS\ FRXSOHG ZLWK WKH 3DU]HQ ZLQGRZ HVWLPDWLRQ RI WKH SUREDELOLW\ GHQVLW\ IXQFWLRQ IRU GDWD VDPSOHV ZH GHYHORSHG DQ fLQIRUPDWLRQ SRWHQWLDOf PHWKRG IRU HQWURS\ HVWLPDWLRQ ,Q WKLV PHWKRG GDWD VDPSOHV DUH WUHDWHG DV SK\VLFDO SDUWLFOHV DQG WKH HQWURS\ WXUQV RXW WR EH UHODWHG WR WKH SRWHQWLDO HQHUJ\ RI WKHVH fLQIRUPDWLRQ SDUWLFOHVf 7KH HQWURS\ PD[LPL]DWLRQ RU 
PLQLPL]D YQL PAGE 9 WLRQ LV WKHQ HTXLYDOHQW WR WKH PLQLPL]DWLRQ RU WKH PD[LPL]DWLRQ RI WKH fLQIRUPDWLRQ SRWHQn WLDOf %DVHG RQ WKH &DXFK\6FKZDUW] LQHTXDOLW\ DQG WKH (XFOLGHDQ GLVWDQFH PHWULF ZH IXUWKHU SURSRVHG WKH TXDGUDWLF PXWXDO LQIRUPDWLRQ DV DQ DOWHUQDWLYH WR 6KDQQRQfV PXWXDO LQIRUPDWLRQ 7KHUH LV DOVR D fFURVV LQIRUPDWLRQ SRWHQWLDOf LPSOHPHQWDWLRQ IRU WKH TXDn GUDWLF PXWXDO LQIRUPDWLRQ WKDW PHDVXUHV WKH FRUUHODWLRQ EHWZHHQ WKH fPDUJLQDO LQIRUPDn WLRQ SRWHQWLDOVf DW VHYHUDO OHYHOV f/HDUQLQJ IURP H[DPSOHVf DW WKH RXWSXW RI D PDSSHU E\ WKH fLQIRUPDWLRQ SRWHQWLDOf RU WKH fFURVV LQIRUPDWLRQ SRWHQWLDOf LV LPSOHPHQWHG E\ SURSDn JDWLQJ WKH fLQIRUPDWLRQ IRUFHf RU WKH fFURVV LQIRUPDWLRQ IRUFHf EDFN WR WKH V\VWHP SDUDPn HWHUV 6LQFH WKH FULWHULD DUH GHFRXSOHG IURP WKH VWUXFWXUH RI OHDUQLQJ PDFKLQHV WKH\ DUH JHQHUDO OHDUQLQJ VFKHPHV 7KH fLQIRUPDWLRQ SRWHQWLDOf DQG WKH fFURVV LQIRUPDWLRQ SRWHQn WLDOf SURYLGH D PLFURVFRSLF H[SUHVVLRQ IRU WKH PDFURVFRSLF PHDVXUH RI WKH HQWURS\ DQG PXWXDO LQIRUPDWLRQ DW WKH GDWD VDPSOH OHYHO 7KH DOJRULWKPV H[DPLQH WKH UHODWLYH SRVLWLRQ RI HDFK GDWD SDLU DQG WKXV KDYH D FRPSXWDWLRQDO FRPSOH[LW\ RI 2$Af $Q RQOLQH ORFDO DOJRULWKP IRU OHDUQLQJ LV DOVR GLVFXVVHG ZKHUH WKH HQHUJ\ ILHOG LV UHODWHG WR WKH IDPRXV ELRORJLFDO +HEELDQ DQG DQWL+HEELDQ OHDUQLQJ UXOHV %DVHG RQ WKLV XQGHUVWDQGLQJ DQ RQOLQH ORFDO DOJRULWKP IRU WKH JHQHUDOL]HG HLJHQGHFRPSRVLWLRQ LV SURn SRVHG 7KH LQIRUPDWLRQ SRWHQWLDO PHWKRGV KDYH EHHQ VXFFHVVIXOO\ DSSOLHG WR YDULRXV SUREOHPV VXFK DV DVSHFW DQJOH HVWLPDWLRQ LQ V\QWKHWLF DSHUWXUH UDGDU 6$5f LPDJHU\ WDUJHW UHFRJQLn WLRQ LQ 6$5 LPDJHU\ OD\HUE\OD\HU WUDLQLQJ RI PXOWLOD\HU QHXUDO QHWZRUNV DQG EOLQG VRXUFH VHSDUDWLRQ 7KH JRRG SHUIRUPDQFH RI WKH PHWKRGV RQ YDULRXV SUREOHPV FRQILUPV WKH YDOLGLW\ DQG HIILFLHQF\ RI WKH LQIRUPDWLRQ SRWHQWLDO PHWKRGV ,; PAGE 10 &+$37(5 ,1752'8&7,21 ,QIRUPDWLRQ DQG (QHUJ\ $ %ULHI 5HYLHZ ,QIRUPDWLRQ SOD\V DQ LPSRUWDQW UROH ERWK LQ WKH OLIH RI D SHUVRQ DQG RI D VRFLHW\ HVSHn FLDOO\ LQ WRGD\fV LQIRUPDWLRQ DJH 
7KH EDVLF SXUSRVH RI DOO NLQGV RI VFLHQWLILF UHVHDUFK LV WR REWDLQ LQIRUPDWLRQ LQ D SDUWLFXODU DUHD 2QH RI WKH PRVW LPSRUWDQW WDVNV RI VSDFH SURJUDPV LV WR JHW LQIRUPDWLRQ DERXW FRVPLF VSDFH DQG FHOHVWLDO ERGLHV VXFK DV HYLGHQFH ZKHWKHU WKHUH LV OLIH RQ 0DUV $ FHQWUDO SUREOHP RI WKH ,QWHUQHW LV KRZ WR WUDQVPLW SURFHVV DQG VWRUH LQIRUPDWLRQ LQ FRPSXWHU QHWZRUNV f/LNH LW RU QRW ZH DUH LQIRUPDWLRQ GHSHQGHQW ,W LV D FRPPRGLW\ DV YLWDO DV WKH DLU ZH EUHDWKH DV DQ\ RI RXU PHWDEROLF HQHUJ\ UHTXLUHPHQWV )RU EHWWHU RU ZRUVH ZHfUH DOO LQHVFDSDEO\ HPEHGGHG LQ D XQLYHUVH RI IORZV QRW RQO\ RI PDWWHU DQG HQHUJ\ EXW DOVR RI ZKDWHYHU LW LV ZH FDOO LQIRUPDWLRQf > PAGE 11 GDZQ RI KXPDQ KLVWRU\ IROORZHG E\ WKH LQYHQWLRQ RI SDSHU SULQWLQJ WHOHJUDSK SKRWRJUDn SK\ WHOHSKRQH UDGLR WHOHYLVLRQ DQG ILQDOO\ WKH FRPSXWHU DQG WKH FRPSXWHU QHWZRUN DUH H[DPSOHV RI LQIRUPDWLRQ 0DQ\ LQYHQWLRQV DQG GLVFRYHULHV FDQ EH XVHG IRU ERWK SXUSRVHV )LUH DV DQ H[DPSOH FDQ EH XVHG IRU FRRNLQJ KHDWLQJ DQG WUDQVPLWWLQJ VLJQDOV (OHFWULFLW\ DV DQRWKHU H[DPSOH FDQ EH XVHG IRU WUDQVPLWWLQJ ERWK HQHUJ\ DQG LQIRUPDWLRQ >5HQ@ 7KHUH DUH D YDULHW\ RI HQHUJLHV DQG LQIRUPDWLRQ ,I ZH GLVUHJDUG WKH DFWXDO IRUP RI HQHUJ\ PHFKDQLFDO WKHUPDO FKHPLFDO HOHFWULFDO DQG DWRPLF HWFf DQG WKH UHDO FRQWHQW RI LQIRUPDWLRQ ZKDW ZLOO EH OHIW LV WKH SXUH TXDQWLW\ >5HQ@ 7KH SULQFLSOH RI HQHUJ\ FRQVHUn YDWLRQ ZDV IRUPXODWHG DQG GHYHORSHG LQ WKH PLGGOH RI WKH ODVW FHQWXU\ ZKLOH WKH HVVHQFH RI LQIRUPDWLRQ ZDV VWXGLHG ODWHU LQ WKH V :LWK WKH TXDQWLW\ RI HQHUJ\ ZH FDQ FRPH XS WR WKH FRQFOXVLRQ WKDW D VPDOO DPRXQW RI 8 FRQWDLQV D ODUJH DPRXQW RI DWRPLF HQHUJ\ DQG RXU ZRUOG FDPH LQWR WKH DWRPLF DJH :LWK WKH SXUH TXDQWLW\ RI LQIRUPDWLRQ ZH FDQ WHOO WKDW WKH RSWLFDO FDEOH FDQ WUDQVPLW PXFK PRUH LQIRUPDWLRQ WKDQ WKH RUGLQDU\ HOHFn WULFDO WHOHSKRQH OLQH DQG LQ JHQHUDO WKH FDSDFLW\ RI D FRPPXQLFDWLRQ FKDQQHO FDQ EH VSHFn LILHG LQ WHUPV RI WKH UDWH RI LQIRUPDWLRQ TXDQWLW\ $OWKRXJK WKH TXDQWLWDWLYH PHDVXUH RI LQIRUPDWLRQ ZDV RULJLQDWHG IURP WKH VWXG\ RI 
FRPPXQLFDWLRQ LW LV VXFK D IXQGDPHQWDO FRQFHSW DQG PHWKRG WKDW LW KDV EHHQ ZLGHO\ DSSOLHG WR PDQ\ DUHDV VXFK DV VWDWLVWLFV SK\Vn LFV FKHPLVWU\ ELRORJ\ OLIHVFLHQFH SV\FKRORJ\ SV\FKRELRORJ\ FRJQLWLYH VFLHQFH QHXURn VFLHQFH F\EHUQHWLFV FRPSXWHU VFLHQFHV HFRQRPLFV RSHUDWLRQ UHVHDUFK OLQJXLVWLFV SKLORVRSK\ > PAGE 12 FXUUHQW YDOXHV XVHG WR HQFRGH WKH PHVVDJH : N?RJP ZKHUH $ LV D FRQVWDQW >1\T &KU@ ,Q +DUWOH\ JHQHUDOL]HG WKLV WR DOO IRUPV RI FRPPXQLFDWLRQ OHWWLQJ P UHSUHn VHQW WKH QXPEHU RI V\PEROV DYDLODEOH DW HDFK VHOHFWLRQ RI D V\PERO WR EH WUDQVPLWWHG +DUWn OH\ H[SOLFLWO\ DGGUHVVHG WKH LVVXH RI WKH TXDQWLWDWLYH PHDVXUH IRU LQIRUPDWLRQ DQG SRLQWHG RXW WKDW LW VKRXOG EH LQGHSHQGHQW RI SV\FKRORJLFDO IDFWRUV RU REMHFWLYHf >+DU &KU@ /DWHU LQ 6KDQQRQ SXEOLVKHG KLV FHOHEUDWHG SDSHU f$ 0DWKHPDWLFDO 7KHRU\ RI &RPn PXQLFDWLRQf ZKLFK H[SORUHG WKH VWDWLVWLFDO VWUXFWXUH RI D PHVVDJH DQG H[WHQGHG 1\TXLVW DQG +DUWOH\fV ORJDULWKPLF PHDVXUH IRU LQIRUPDWLRQ WR D SUREDELOLVWLF ORJDULWKP 1 1 SN?RJSN IRU WKH SUREDELOLW\ VWUXFWXUH SN! 
N ASN N N :KHQ SN :P LQ WKH HTXLSUREDEOH FDVH 6KDQQRQfV PHDVXUH GHJHQHUDWHV WR +DUWOH\fV PHDVXUH >6KD 6KD@ 6KDQQRQfV PHDVXUH FDQ DOVR EH UHJDUGHG DV D PHDVXUH IRU XQFHUn WDLQW\ ,W ODLG WKH IRXQGDWLRQ IRU LQIRUPDWLRQ WKHRU\ 7KHUH LV D VWULNLQJ IRUPDO VLPLODULW\ EHWZHHQ 6KDQQRQfV PHDVXUH DQG WKH HQWURS\ LQ VWDWLVWLFDO PHFKDQLFV 7KLV ZDV RQH RI WKH UHDVRQV WKDW OHG YRQ 1HXPDQQ WR VXJJHVW WR 6KDQQRQ WR FDOO KLV XQFHUWDLQW\ PHDVXUH WKH HQWURS\ >7UL@ f(QWURSLHf ZDV D *HUPDQ ZRUG FRLQHG LQ E\ &ODXVLXV WR UHSUHVHQW WKH FDSDFLW\ IRU FKDQJH RI PDWWHU >&KU@ 7KH VHFRQG ODZ RI WKHUPRG\QDPLFV IRUPXODWHG E\ &ODXVLXV LV DOVR NQRZQ DV WKH HQWURS\ ODZ ,WV EHVWNQRZQ VWDWHPHQW KDV EHHQ LQ WKH IRUP f+HDW FDQQRW E\ LWVHOI SDVV IURP D FROGHU WR D KRWWHU V\VWHPf 2U PRUH IRUPDOO\ WKH HQWURS\ RI D FORVHG V\VWHP ZLOO QHYHU GHFUHDVH EXW FDQ RQO\ LQFUHDVH XQWLO LW UHDFKHV LWV PD[LPXP > PAGE 13 &ODXVLXVf HQWURS\ ZDV LQLWLDOO\ DQ DEVWUDFW DQG PDFURVFRSLF LGHD ,W ZDV %ROW]PDQQ ZKR ILUVW JDYH WKH HQWURS\ D PLFURVFRSLF DQG SUREDELOLVWLF LQWHUSUHWDWLRQ %ROW]PDQQfV ZRUN VKRZHG WKDW HQWURS\ FRXOG EH XQGHUVWRRG DV D VWDWLVWLFDO ODZ PHDVXULQJ WKH SUREDEOH VWDWHV RI WKH SDUWLFOHV LQ D FORVHG V\VWHP ,Q VWDWLVWLFDO PHFKDQLFV HDFK SDUWLFOH LQ D V\VWHP RFFXSLHV D SRLQW LQ D fSKDVH VSDFHf DQG VR WKH HQWURS\ RI D V\VWHP FDPH WR FRQVWLWXWH D PHDVXUH IRU WKH SUREDELOLW\ RI WKH PLFURVFRSLF VWDWH GLVWULEXWLRQ RI SDUWLFOHVf RI DQ\ VXFK V\VWHP $FFRUGLQJ WR WKLV LQWHUSUHWDWLRQ D FORVHG V\VWHP ZLOO DSSURDFK D VWDWH RI WKHUPRn G\QDPLF HTXLOLEULXP EHFDXVH HTXLOLEULXP LV RYHUZKHOPLQJO\ WKH PRVW SUREDEOH VWDWH RI WKH V\VWHP 7KH SUREDELOLVWLF LQWHUSUHWDWLRQ RI HQWURS\ UHVXOWHG LQ DQ LQWHUSUHWDWLRQ RI HQWURS\ WKDW LV RQH RI WKH FRUQHUVWRQHV RI WKH PRGHP UHODWLRQVKLS EHWZHHQ PHDVXUHV RI HQWURS\ DQG WKH DPRXQW RI LQIRUPDWLRQ LQ D PHVVDJH 7KDW LV ERWK WKH LQIRUPDWLRQ HQWURS\ DQG WKH VWDWLVWLFDO PHFKDQLFDO HQWURS\ DUH WKH PHDVXUH RI XQFHUWDLQW\ RU GLVRUGHU RI D V\Vn WHP > PAGE 14 VKRZHG PDWKHPDWLFDOO\ WKDW HQWURS\ DQG 
LQIRUPDWLRQ ZHUH IXQGDPHQWDOO\ LQWHUFRQQHFWHG DQG KLV IRUPXOD ZDV DQDORJRXV WR WKH PHDVXUHV RI LQIRUPDWLRQ GHYHORSHG E\ 1\TXLVW DQG +DUWOH\ DQG HYHQWXDOO\ E\ 6KDQQRQ > PAGE 15 LV .XOOEDFNfV PLQLPXP FURVVHQWURS\ SULQFLSOH ZKLFK LQWURGXFHV WKH FRQFHSW RI FURVVn HQWURS\ RU fGLUHFWHG GLYHUJHQFHf RI D SUREDELOLW\ GLVWULEXWLRQ 3 IURP DQRWKHU SUREDELOLW\ GLVWULEXWLRQ 4 7KH PD[LPXP HQWURS\ SULQFLSOH FDQ EH YLHZHG DV D VSHFLDO FDVH RI WKH PLQLPXP FURVVHQWURS\ SULQFLSOH ZKHQ 4 LV D XQLIRUP GLVWULEXWLRQ >.DS@ ,Q DGGLWLRQ 6KDQQRQfV PXWXDO LQIRUPDWLRQ LV QRWKLQJ EXW WKH GLUHFWHG GLYHUJHQFH EHWZHHQ WKH MRLQW SUREDELOLW\ GLVWULEXWLRQ DQG WKH IDFWRUL]HG PDUJLQDO GLVWULEXWLRQV 0RWLYDWLRQ 7KH DERYH JLYHV D EULHI UHYLHZ RI YDULRXV DVSHFWV RQ HQHUJ\ HQWURS\ DQG LQIRUPDWLRQ IURP ZKLFK ZH FDQ VHH KRZ IXQGDPHQWDO DQG JHQHUDO WKH FRQFHSWV RI HQHUJ\ DQG HQWURS\ DUH DQG KRZ WKHVH WZR IXQGDPHQWDO FRQFHSWV DUH UHODWHG WR HDFK RWKHU ,Q WKLV GLVVHUWDWLRQ WKH PDMRU LQWHUHVWV DQG WKH LVVXHV DGGUHVVHG DUH DERXW WKH HQHUJ\ DQG HQWURS\ RI VLJQDOV HVSHFLDOO\ WKH HPSLULFDO HQHUJ\ DQG HQWURS\ PHDVXUHV RI VLJQDOV ZKLFK DUH FUXFLDO LQ VLJn QDO SURFHVVLQJ SUDFWLFH )LUVW OHWfV WDNH D ORRN DW WKH HPSLULFDO HQHUJ\ PHDVXUHV IRU VLJn QDOV 7KHUH DUH PDQ\ NLQGV RI VLJQDOV LQ WKH ZRUOG 1R PDWWHU ZKDW NLQG D VLJQDO FDQ EH DEVWUDFWHG DV ;Qf H 5P ZKHUH Q LV WKH WLPH LQGH[ RQO\ GLVFUHWH WLPH VLJQDOV DUH FRQVLGn HUHG LQ WKLV GLVVHUWDWLRQf 5P UHSUHVHQWV DQ PGLPHQVLRQDO UHDO VSDFH RQO\ UHDO VLJQDOV DUH FRQVLGHUHG LQ WKLV GLVVHUWDWLRQ FRPSOH[ VLJQDOV FDQ EH WKRXJKW RI DV D WZR GLPHQVLRQDO UHDO VLJQDOf 7KH HPSLULFDO HQHUJ\ DQG SRZHU RI D ILQLWH VLJQDO [^Qf H 5 Q 1 LV 1 ? 1 ([f e [Qf 3[f e [Qf Q Q f PAGE 16 7KH GLIIHUHQFH EHWZHHQ WZR VLJQDOV [ Qf DQG [Qf Q 1 FDQ EH PHDVXUHG E\ WKH HPSLULFDO HQHUJ\ RU SRZHU RI WKH GLIIHUHQFH VLJQDO GQf [ QffÂ§[Qf 1 1 (G[O[f ; G^Qf 3G[[[f ; GQf f Q Q 7KH GLIIHUHQFH EHWZHHQ [ DQG [ FDQ DOVR EH PHDVXUHG E\ WKH FURVVFRUUHODWLRQ LQQHUSURGXFWf 1 &;@[f A[Qf[Qf f Q RU LWV QRUPDOL]HG YHUVLRQ ? 
&[O [f 1 ; [?Qf[Qf 1 [Xf 1 ; ;Qf Q A L Q 9 \ Q Y \ 7KH JHRPHWULFDO LOOXVWUDWLRQ RI WKHVH TXDQWLWLHV LV VKRZQ LQ )LJXUH )LJXUH *HRPHWULFDO ,OOXVWUDWLRQ RI (QHUJ\ 4XDQWLWLHV 6LQFH ([f &[ [f FURVVFRUUHODWLRQ FDQ EH UHJDUGHG DV DQ HQHUJ\ UHODWHG TXDQn WLW\ :H NQRZ WKDW IRU D UDQGRP VLJQDO [Qf ZLWK WKH SGI SUREDELOLW\ GHQVLW\ IXQFWLRQf I[[f WKH 6KDQQRQ LQIRUPDWLRQ HQWURS\ LV PAGE 17 +[f ?I[[f?RJI[[fG[ f %DVHG RQ WKH LQIRUPDWLRQ HQWURS\ FRQFHSW WKH GLIIHUHQFH RU VLPLODULW\ EHWZHHQ WZR UDQGRP VLJQDOV MF DQG [ ZLWK MRLQW SGI I;>;O^[?[f DQG PDUJLQDO SGIV I[ [f I;L[f FDQ EH PHDVXUHG E\ WKH PXWXDO LQIRUPDWLRQ EHWZHHQ WZR VLJQDOV f 6LQFH +[f ,[ [f PXWXDO LQIRUPDWLRQ LV DQ HQWURS\ W\SH TXDQWLW\ &RPSDUDWLYHO\ HQHUJ\ LV D VLPSOH VWUDLJKWIRUZDUG LGHD DQG HDV\ WR LPSOHPHQW ZKLOH LQIRUPDWLRQ HQWURS\ XVHV DOO WKH VWDWLVWLFV RI WKH VLJQDO DQG LV PXFK PRUH SURIRXQG DQG GLIn ILFXOW WR PHDVXUH RU LPSOHPHQW $ YHU\ IXQGDPHQWDO DQG LPSRUWDQW TXHVWLRQ DULVHV QDWXn UDOO\ ,I D GLVFUHWH GDWD VHW ^[Qf H 5P?Q 1` LV JLYHQ ZKDW LV WKH LQIRUPDWLRQ HQWURS\ UHODWHG WR WKLV GDWD VHW RU KRZ FDQ ZH HVWLPDWH WKH HQWURS\ IRU WKLV GDWD VHW 7KLV HPSLULFDO HQWURS\ SUREOHP ZDV DGGUHVVHG EHIRUH LQ WKH OLWHUDWXUHV >&KU &KU %DW 9LR )LV@ HWF 3DUDPHWULF PHWKRGV FDQ EH XVHG IRU SGI HVWLPDWLRQ DQG WKHQ HQWURS\ HVWLPDWLRQ ZKLFK LV VWUDLJKWIRUZDUG EXW OHVV JHQHUDO 1RQSDUDPHWULF PHWKRGV IRU SGI HVWLn PDWLRQ FDQ EH XVHG DV WKH EDVLV IRU WKH JHQHUDO HQWURS\ HVWLPDWLRQ QR DVVXPSWLRQ DERXW GDWD GLVWULEXWLRQ LV UHTXLUHGf 2QH H[DPSOH LV WKH KLVWRUJUDP PHWKRG >%DW@ ZKLFK LV HDV\ WR LPSOHPHQW LQ RQH GLPHQVLRQDO VSDFH EXW GLIILFXOW WR DSSO\ WR KLJK GLPHQVLRQDO VSDFH DQG DOVR GLIILFXOW WR DQDO\]H PDWKHPDWLFDOO\ $QRWKHU SRSXODU QRQSDUDPHWULF SGI HVWLPDn WLRQ PHWKRG LV WKH 3DU]HQ ZLQGRZ PHWKRG WKH VRFDOOHG NHUQHO RU SRWHQWLDO IXQFWLRQ PHWKRG >3DU 'XG &KU@ 2QFH WKH 3DU]HQ ZLQGRZ PHWKRG LV XVHG WKH SHUSOH[LQJ SUREOHP OHIW LV WKH FDOFXODWLRQ RI WKH LQWHJUDO LQ WKH HQWURS\ RU PXWXDO LQIRUPDWLRQ IRUPXOD 1XPHULFDO PHWKRGV DUH H[WUHPHO\ 
FRPSOH[ LQ WKLV FDVH DQG WKXV RQO\ VXLWDEOH IRU RQH PAGE 18 GLPHQVLRQDO YDULDEOH >3KD@ $SSUR[LPDWLRQ FDQ DOVR EH PDGH E\ XVLQJ VDPSOH PHDQ >9LR@ ZKLFK UHTXLUHV D ODUJH DPRXQW RI GDWD DQG PD\ QRW EH D JRRG DSSUR[LPDWLRQ IRU D VPDOO GDWD VHW 7KH LQGLUHFW PHWKRG RI )LVKHU >)LV@ FDQ QRW EH XVHG IRU HQWURS\ HVWLPDn WLRQ EXW RQO\ IRU HQWURS\ PD[LPL]DWLRQ SXUSRVHV )RU WKH EOLQG VRXUFH VHSDUDWLRQ %66f RU LQGHSHQGHQW FRPSRQHQW DQDO\VLV ,&$f SUREOHP >&RP &DR &DUE %HO 'HF &DU PAGE 19 $FFRUGLQJ WR WKH PRGHP VFLHQFH WKH XQLYHUVH LV D PDVVHQHUJ\ V\VWHP ,Q VXFK PDVV HQHUJ\ VSLULW ZH ZRXOG DVN ZKHWKHU WKH LQIRUPDWLRQ HQWURS\ HVSHFLDOO\ WKH HPSLULFDO LQIRUPDWLRQ HQWURS\ ZRXOG VRPHKRZ KDYH PDVVHQHUJ\ SURSHUWLHV ,Q WKLV GLVVHUWDWLRQ WKH HPSLULFDO LQIRUPDWLRQ HQWURS\ LV UHODWHG WR fSRWHQWLDO HQHUJ\f RI fGDWD SDUWLFOHVf GDWD VDPSOHVf 7KXV D GDWD VDPSOH LV FDOOHG fLQIRUPDWLRQ SDUWLFOHf ,37f ,Q IDFW GDWD VDPSOHV DUH EDVLF XQLWV FRQYH\LQJ LQIRUPDWLRQ WKH\ LQGHHG DUH fSDUWLFOHVf ZKLFK WUDQVPLW LQIRUPDn WLRQ $FFRUGLQJO\ WKH HPSLULFDO HQWURS\ FDQ EH UHODWHG WR WKH SRWHQWLDO HQHUJ\ FDOOHG fLQIRUPDWLRQ SRWHQWLDOf ,3f RI fLQIRUPDWLRQ SDUWLFOHVf ,37Vf :LWK WKH LQIRUPDWLRQ SRWHQWLDO ZH FDQ IXUWKHU VWXG\ KRZ LW FDQ EH XVHG LQ D OHDUQLQJ V\VWHP RU DQ DGDSWLYH V\VWHP RI VLJQDO SURFHVVLQJ DQG KRZ D OHDUQLQJ V\VWHP FDQ VHOI RUJDQL]H ZLWK WKH LQIRUPDWLRQ IOX[ LQ DQG RXW RIWHQ LQ WKH IRUP RI WKH IOX[ RI GDWD VDPn SOHVf MXVW OLNH DQ RSHQ SK\VLFDO V\VWHP ZKLFK ZLOO DSSHDU VRPH RUGHUV ZLWK WKH HQHUJ\ IOX[ LQ DQG RXW 7KH LQIRUPDWLRQ WKHRU\ RULJLQDWHG IURP FRPPXQLFDWLRQ VWXG\ DQG KDV EHHQ ZLGHO\ XVHG IRU WKH GHVLJQ DQG SUDFWLFH LQ WKLV DUHD DQG PDQ\ RWKHU DUHDV +RZHYHU LWV DSSOLFDn WLRQ WR OHDUQLQJ V\VWHPV RU DGDSWLYH V\VWHPV VXFK DV SHUFHSWXDO V\VWHPV HLWKHU DUWLILFLDO RU QDWXUDO LV MXVW LQ LWV LQIDQF\ 6RPH HDUO\ UHVHDUFKHUV WULHG WR XVH LQIRUPDWLRQ WKHRU\ IRU WKH H[SODQDWLRQ RI D SHUFHSWXDO SURFHVV HJ $WWQHDYH ZKR SRLQWHG RXW LQ WKDW fD PDMRU IXQFWLRQ RI WKH SHUFHSWXDO PDFKLQHU\ LV WR VWULS DZD\ VRPH RI 
WKH UHGXQGDQF\ RI VWLPXODn WLRQ WR GHVFULEH RU HQFRGH LQIRUPDWLRQ LQ D IRUP PRUH HFRQRPLFDO WKDQ WKDW LQ ZKLFK LW LPSLQJHV RQ WKH UHFHSWRUVf >+D\ SDJH @ +RZHYHU RQO\ LQ WKH ODWH V GLG /LQ VNHU SURSRVH WKH SULQFLSOH RI PD[LPXP LQIRUPDWLRQ SUHVHUYDWLRQ ,QIR0D[f >/LQ /LQ@ DV WKH EDVLF SULQFLSOH IRU WKH VHOIRUJDQL]DWLRQ RI QHXUDO QHWZRUNV ZKLFK UHTXLUHV PAGE 20 WKH PD[LPL]DWLRQ RI WKH PXWXDO LQIRUPDWLRQ EHWZHHQ WKH RXWSXW DQG WKH LQSXW RI WKH QHWn ZRUN VR WKDW WKH LQIRUPDWLRQ DERXW WKH LQSXW LV EHVW SUHVHUYHG LQ WKH RXWSXW /LQVNHU IXUWKHU DSSOLHG WKH SULQFLSOH WR OLQHDU QHWZRUNV ZLWK *DXVVLDQ DVVXPSWLRQ RQ LQSXW GDWD GLVWULEXn WLRQ DQG QRLVH GLVWULEXWLRQ DQG GHULYHG WKH ZD\ WR PD[LPL]H WKH PXWXDO LQIRUPDWLRQ LQ WKLV SDUWLFXODU FDVH >/LQ /LQ@ ,Q 3OXPEOH\ DQG )DOOVLGH SURSRVHG WKH VLPLODU PLQLPXP LQIRUPDWLRQ ORVV SULQFLSOH >3OX@ ,Q WKH VDPH SHULRG WKHUH DUH RWKHU UHVHDUFKn HUV ZKR XVH WKH LQIRUPDWLRQWKHRUHWLF SULQFLSOHV EXW VWLOO ZLWK WKH OLPLWDWLRQ RI OLQHDU PRGHO RU *DXVVLDQ DVVXPSWLRQ IRU LQVWDQFH %HFNHU DQG +LQWRQfV VSDWLDOO\ FRKHUHQW IHDn WXUHV >%HF %HF@ 8NUDLQHF DQG +D\NLQfV VSDWLDOO\ LQFRKHUHQW IHDWXUHV >8NU@ HWF ,Q UHFHQW \HDUV WKH LQIRUPDWLRQWKHRUHWLF DSSURDFKHV IRU %66 DQG ,&$ KDYH GUDZQ D ORW RI DWWHQWLRQ $OWKRXJK WKH\ FHUWDLQO\ EURNH WKH OLPLWDWLRQ RI WKH PRGHO OLQHDULW\ DQG WKH *DXVn VLDQ DVVXPSWLRQ WKH PHWKRGV DUH VWLOO QRW JHQHUDO HQRXJK 7KHUH DUH WZR W\SLFDO LQIRUPDn WLRQWKHRUHWLF PHWKRGV LQ WKLV DUHD PD[LPXP HQWURS\ 0(f DQG PLQLPXP PXWXDO LQIRUPDWLRQ 00,f >%HO PAGE 21 SLHVf RU FDOOHG fGDWD VDPSOHVff )RU LQVWDQFH FKLOGUHQ OHDUQ WR VSHDN E\ OLVWHQLQJ OHDUQ WR UHFRJQL]H REMHFWV E\ EHLQJ SUHVHQWHG ZLWK H[HPSODUV OHDUQ WR ZDON E\ WU\LQJ HWF ,Q JHQn HUDO FKLOGUHQ OHDUQ E\ WKH VWLPXODWLRQ IURP WKHLU HQYLURQPHQW $GDSWLYH V\VWHPV IRU VLJQDO SURFHVVLQJ >:LG +D\ +D\@ DUH DOVR OHDUQLQJ V\VWHPV WKDW HYROYH ZLWK WKH LQWHUDFn WLRQ ZLWK LQSXW RXWSXW DQG GHVLUHG RU WHDFKHUf VLJQDOV 7R VWXG\ WKH JHQHUDO SULQFLSOH RI D OHDUQLQJ V\VWHP ZH ILUVW QHHG WR VHW DQ DEVWUDFW PRGHO 
IRU WKH V\VWHP DQG LWV HQYLURQPHQW $V LOOXVWUDWHG LQ )LJXUH DQ DEVWUDFW OHDUQLQJ WQ N P N V\VWHP LV D PDSSLQJ 5 } 5 < T; :f ZKHUH OHL" LV WKH LQSXW VLJQDO < H 5 LV WKH RXWSXW VLJQDO : LV D VHW RI SDUDPHWHUV RI WKH PDSSLQJ 7KH HQYLURQPHQW LV PRGHOHG N f f f E\ WKH GRXEOHW ; 'f ZKHUH H 5 LV D GHVLUHG VLJQDO WHDFKHU VLJQDOf 7KH OHDUQLQJ PHFKDQLVP LV D VHW RI UXOHV RU SURFHGXUHV WKDW ZLOO DGMXVW WKH SDUDPHWHUV : VR WKDW WKH PDSSLQJ DFKLHYHV D GHVLUHG JRDO )LJXUH ,OOXVWUDWLRQ RI D /HDUQLQJ 6\VWHP 7KHUH DUH D YDULHW\ RI OHDUQLQJ V\VWHPV OLQHDU RU QRQOLQHDU IHHGIRUZDUG RU UHFXUUHQW IXOO UDQN RU GLPHQVLRQ UHGXFHG SHUFHSWURQ DQG PXOWLOD\HU SHUFHSWURQ 0/3f ZLWK JOREDO EDVLV RU UDGLDOEDVLV IXQFWLRQ ZLWK ORFDO EDVLV HWF 'LIIHUHQW V\VWHP VWUXFWXUHV PD\ KDYH GLIIHUHQW SURSHUW\ DQG XVDJH >+D\@ PAGE 22 7KH HQYLURQPHQW GRXEOHW ;f 'f DOVR KDV D YDULHW\ RI IRUPV $ OHDUQLQJ SURFHVV FDQ KDYH D GHVLUHG VLJQDO RU QRW YHU\ RIWHQ WKH LQSXW VLJQDO LV WKH LPSOLFLW GHVLUHG VLJQDOf 6RPH VWDWLVWLFDO SURSHUW\ RI ; RU < RU FDQ EH JLYHQ RU DVVXPHG 0RVW RIWHQ RQO\ D GLVn FUHWH GDWD VHW 4 ^ ;c 'cf _] 1` LV SURYLGHG 6XFK D VFKHPH LV FDOOHG fOHDUQLQJ IURP H[DPSOHVf DQG LV D JHQHUDO FDVH >+D\ +D\@ 7KLV GLVVHUWDWLRQ LV PRUH LQWHUHVWHG LQ fOHDUQLQJ IRUP H[DPSOHVf WKDQ DQ\ VFKHPH ZLWK VRPH DVVXPSWLRQV DERXW WKH GDWD 2I FRXUVH LI D SULRUL NQRZOHGJH DERXW GDWD LV NQRZQ D OHDUQLQJ PHWKRG VKRXOG LQFRUSRUDWH WKLV NQRZOHGJH 7KHUH DUH DOVR D ORW RI OHDUQLQJ PHFKDQLVPV 6RPH RI WKHP PDNH DVVXPSWLRQV DERXW GDWD DQG RWKHUV GR QRW 6RPH DUH FRXSOHG ZLWK WKH VWUXFWXUH DQG WRSRORJ\ RI WKH OHDUQLQJ V\VWHP ZKLOH WKH RWKHUV DUH LQGHSHQGHQW RI WKH V\VWHP $ JHQHUDO OHDUQLQJ PHFKDQLVP VKRXOG QRW GHSHQG RQ GDWD DQG VKRXOG EH GHFRXSOHG IURP WKH OHDUQLQJ V\VWHP 7KHUH LV QR GRXEW WKDW WKH DUHD LV ULFK LQ GLYHUVLW\ EXW ODFNV XQLILFDWLRQ 7KHUH DUH QR PRUH NQRZQ DEVWUDFW DQG IXQGDPHQWDO FRQFHSWV VXFK DV HQHUJ\ DQG LQIRUPDWLRQ 7R ORRN IRU WKH HVVHQFH RI OHDUQLQJ RQH VKRXOG VWDUW IURP WKHVH WZR EDVLF LGHDV 2EYLRXVO\ OHDUQn LQJ LV 
Obviously, learning is about obtaining knowledge and information. Based on the above learning system model, we can say that learning is nothing but the transfer of the information contained in the environment, or, to be more specific, in a given data set, onto the machine parameters. This dissertation will try to give a unifying point of view on learning systems and to implement it using the proposed information potential.

The basic purpose of learning is to generalize. The ability of animals to learn something general from their past experiences is the key to their survival in the future. Regarding the generalization ability of a learning machine, one very fundamental question is: what is the best we can do to generalize, for a given learning system and a given set of environmental data? One thing is very clear: the information contained in the given data set is a quantity that cannot be changed by any learning method, and no learning method can go beyond it. Thus, it is the best that a learning system can possibly obtain. Generalization, from this point of view, is not to create something new but to fully utilize the information contained in the observed data, neither less nor more. By "less" we mean that the information obtained by the learning system falls short of the information contained in the given data. By "more" we mean that, implicitly or explicitly, the learning method assumes something that is not given. This is also the spirit of Jaynes [Jay] mentioned above, and a similar point of view can be found in Christensen [Chr, Chr].

The environmental data for a learning system are usually not collected all at once but are accumulated during the learning process. Whenever one datum appears, or after a small set of data is obtained, learning should take place and the parameters of the learning system should be updated. This is the problem of on-line learning, which is also an issue this dissertation deals with. Another problem this dissertation is interested in is
"local" learning algorithms. In a biological nervous system, what can be changed is the strength of the synaptic connections, and the change of a synaptic connection can depend only on its local information, i.e., its input and output. An engineering system will also be much easier to implement, in either hardware or software, if the learning rule is "local," i.e., if the update of a connection in a learning network relies only on that connection's input and output. Hebb's rule is a famous neuropsychological postulate of how a synaptic connection evolves with its input and output [Heb, Hay]. It will be shown in this dissertation how Hebbian-type algorithms can be related to the energy and entropy of signals.

Outline

In Chapter 2, the basic ideas of energy, information entropy, and their relationship will be reviewed. Since information entropy relies directly on the pdf of the variable, the Parzen window nonparametric method will be reviewed as the basis for the ideas of the information potential and the cross information potential. Finally, the derivation will be given: the idea of the information force in an information potential field will be introduced for its use in learning systems, and the calculation procedures for the information potential, the cross information potential, and all the forces in the corresponding information potential fields will be described.

In Chapter 3, a variety of learning systems and learning mechanisms will be reviewed. A unifying point of view on learning based on information theory will be given, the information potential implementation of this unifying idea will be described, and the generalization ability of learning will be discussed.

In Chapter 4, on-line local algorithms for a linear system with energy criteria will be reviewed. The relationship between Hebbian and anti-Hebbian rules and the energy criteria will be discussed. An on-line local algorithm for generalized eigendecomposition will be proposed, with a discussion of convergence properties such as the convergence speed and
stability. Chapter 5 will give several application examples. First, the information potential method will be applied to aspect angle estimation for SAR images. Second, the same method will be applied to SAR automatic target recognition. Third, the training of a layered neural network by the information potential method will be described. Fourth, the method will be applied to independent component analysis and blind source separation. Chapter 6 will conclude the dissertation and provide a survey of future work in this area.

CHAPTER 2
ENERGY, ENTROPY AND INFORMATION POTENTIAL

Energy, Entropy and Information of Signals

Energy of Signals

From the statistical point of view, the energy of a stationary signal is related to its variance. For a stationary signal x(n) with variance sigma^2 and mean m, its energy (precisely, its short-time energy or power) is

E_x = E[x^2] = sigma^2 + m^2

where E[.] is the expectation operator. If m = 0, then the energy is equal to the variance, E_x = sigma^2. So, basically, energy is a quantity related to second-order statistics. For two signals x1(n) and x2(n) with means m1 and m2 respectively, the covariance is r12 = E[(x1 - m1)(x2 - m2)] = E[x1 x2] - m1 m2, and we also have the cross-correlation between the two signals, c12 = E[x1 x2].
The cross-correlation satisfies c12 = r12 + m1 m2; if at least one of the signals is zero-mean, then c12 = r12. For a signal X = (x1, x2)^T, all the second-order statistics are given by the covariance matrix Sigma, and we have

E[X X^T] = Sigma + m m^T,  Sigma = [ sigma1^2  r12 ; r12  sigma2^2 ],  m = (m1, m2)^T

Usually the first-order statistics have nothing to do with the information, so we will consider only the zero-mean case; thus E[X X^T] = Sigma. For a 2-D signal there are three energy quantities in the covariance matrix: sigma1^2, sigma2^2, and r12. One may ask what the overall energy quantity of the signal is. From linear algebra [Nob] there are three choices: the first is the determinant of Sigma, which is a volume measure in the signal space and equals the product of all the eigenvalues of Sigma; the second is the trace of Sigma, which equals the sum of all the eigenvalues of Sigma; the third is the product of the diagonal elements. Thus we have

J1 = log|Sigma| = log(sigma1^2 sigma2^2 - r12^2),  J2 = tr(Sigma) = sigma1^2 + sigma2^2,  J3 = log(sigma1^2 sigma2^2)

where tr(.) is the trace operator. The log function in J1 and J3 reduces the dynamic range of the original quantities; it is also related to the information of the signal, as will become clear later in this chapter. The component signals x1 and x2 will be called marginal signals in this dissertation. If the two marginal signals x1 and x2 are uncorrelated, then J1 = J3. In general we have J3 >= J1, where the equality holds if and only if the two marginal signals are uncorrelated. This is the so-called Hadamard's inequality [Nob, Dec]. In general, for a positive semidefinite matrix we have the same inequality, where J1 is the determinant of the matrix (or its logarithm; note that the logarithm is a monotonically increasing function) and J3 is the product of the diagonal elements (or its logarithm). When the two marginal signals are uncorrelated and their variances are both equal to sigma^2, then J1, J2, and J3 are equivalent in the sense that J1 = J3 = 2 log(J2/2) = 2 log sigma^2. For an n-D zero-mean signal X = (x1, ..., xn)^T, we have the covariance matrix Sigma = E[X X^T], where sigma_i^2 (i = 1, ..., n) are the variances of the marginal signals x_i, and r_ij (i, j = 1, ..., n, i != j) are the cross-correlations between the marginal signals x_i and x_j.
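As a quick numerical check of these definitions, the following sketch (plain Python, hypothetical variance and correlation values) computes J1, J2, and J3 for a 2-D covariance matrix, verifies Hadamard's inequality J3 >= J1, and confirms that the trace is unchanged by a rotation of the signal:

```python
import math

# Energy measures of a 2-D zero-mean signal from its covariance matrix
# Sigma = [[s1, r], [r, s2]]: J1 = log|Sigma|, J2 = tr(Sigma),
# J3 = log(s1*s2); Hadamard's inequality says J3 >= J1.
def energy_measures(s1, s2, r):
    j1 = math.log(s1 * s2 - r * r)
    j2 = s1 + s2
    j3 = math.log(s1 * s2)
    return j1, j2, j3

j1_c, j2_c, j3_c = energy_measures(2.0, 3.0, 1.0)  # correlated marginals
j1_u, j2_u, j3_u = energy_measures(2.0, 3.0, 0.0)  # uncorrelated marginals

# tr(Sigma) is invariant under a rotation Y = R X (Sigma_Y = R Sigma R^T):
def rotated_trace(s1, s2, r, theta):
    c, s = math.cos(theta), math.sin(theta)
    # diagonal entries of R Sigma R^T, written out for the 2x2 case
    d1 = c * c * s1 - 2.0 * c * s * r + s * s * s2
    d2 = s * s * s1 + 2.0 * c * s * r + c * c * s2
    return d1 + d2

tr_rot = rotated_trace(2.0, 3.0, 1.0, 0.7)
```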
The three possible overall energy measures are

J1 = log|Sigma|
J2 = tr(Sigma) = sum_{i=1..n} sigma_i^2
J3 = log( prod_{i=1..n} sigma_i^2 ) = sum_{i=1..n} log sigma_i^2

Hadamard's inequality is again J3 >= J1, where the equality holds if and only if Sigma is diagonal, i.e., the marginal signals are uncorrelated with each other. J2 equals the sum of all the eigenvalues of Sigma and is invariant under any orthonormal transformation (rotation transform). When the marginal signals are uncorrelated with each other and their variances are all equal to sigma^2, then J1, J2, and J3 are equivalent in the sense that they are related by a monotonically increasing function:

J1 = J3 = n log(J2/n) = n log sigma^2

Information Entropy

Compared with energy, the information entropy of a signal involves all the statistics of the signal, and thus is more profound and more difficult to implement. As mentioned in Chapter 1, the study of abstract quantitative measures of information started in the 1920s, when Nyquist and Hartley proposed a logarithmic measure [Nyq, Har]. Later, Shannon pointed out that this measure is valid only if all events are equiprobable [Sha]; further, he coined the term "information entropy," which is the mathematical expectation of Nyquist and Hartley's measures. Renyi generalized Shannon's idea by using an exponential rather than a linear function to calculate the mean [Ren, Ren]. Later on, other forms of information entropy appeared (e.g., Havrda and Charvat's measure and Kapur's measure) [Kap]. Although Shannon's entropy is the only one that possesses all the postulated properties (given later) of an information measure, the other forms, such as Renyi's and Havrda-Charvat's, are equivalent with regard to entropy maximization [Kap]. In a real problem, which form to use depends on other requirements, such as ease of implementation.

For an event with probability p, according to Hartley's idea, the information given when this event happens is I(p) = log(1/p) = -log p [Har]. Shannon further developed Hartley's idea, resulting in Shannon's information entropy for a variable with the probability distribution {p_k}:

H_S = sum_k p_k I(p_k) = -sum_k p_k log p_k,  p_k >= 0,  sum_k p_k = 1
In the general theory of means, a mean of the real numbers x1, ..., xn with weights p1, ..., pn has the form

x_bar = phi^-1( sum_k p_k phi(x_k) )

where phi(x), a Kolmogorov-Nagumo function, is an arbitrary continuous and strictly monotonic function defined on the real numbers [Ren, Ren]. So, in general, the entropy measure should be

H = phi^-1( sum_k p_k phi(I(p_k)) )

As an information measure, phi(.) cannot be arbitrary, since information is "additive." To meet the additivity condition, phi(.) can be either phi(x) = x or phi(x) = 2^((1-alpha)x). If the former is used, the measure becomes Shannon's entropy; if the latter is used, Renyi's entropy with order alpha is obtained [Ren, Ren]:

H_Ra = (1/(1-alpha)) log( sum_k p_k^alpha ),  alpha > 0, alpha != 1

Havrda and Charvat proposed another entropy measure, which is similar to Renyi's but has a different scaling [Hav, Kap]; it will be called Havrda-Charvat's entropy, or HC entropy for short:

H_ha = (1/(1-alpha)) ( sum_k p_k^alpha - 1 ),  alpha > 0, alpha != 1

There are also other entropy measures, for instance H_inf = -log( max_k p_k ) [Kap]. Different entropy measures may have different properties. There are more than a dozen properties of Shannon's entropy; we will discuss five basic ones, since all the other properties can be derived from them [Sha, Sha, Kap, Kap, Acz]:

(1) The entropy measure H(p1, ..., pn) is a continuous function of all the probabilities p_k, which means that a small change in the probability distribution will only result in a small change in the entropy.
(2) H(p1, ..., pn) is permutationally symmetric, i.e., exchanging the positions of any two or more p_k in H(p1, ..., pn) will not change the entropy value. Indeed, a permutation of the p_k in the distribution does not change the uncertainty or disorder of the distribution and thus should not affect the entropy.

(3) H(1/n, ..., 1/n) is a monotonically increasing function of n. For an equiprobable distribution, when the number of choices n increases, the uncertainty or disorder increases, and so does the entropy measure.

(4) Recursivity. If an entropy measure has this property, the entropy of n outcomes can be expressed as the entropy of n - 1 outcomes plus the weighted entropy of the two combined outcomes:

H_n(p1, p2, p3, ..., pn) = H_{n-1}(p1 + p2, p3, ..., pn) + (p1 + p2) H_2( p1/(p1 + p2), p2/(p1 + p2) )

with an alpha-generalized form that uses the weight (p1 + p2)^alpha instead of (p1 + p2), where alpha is the parameter in Renyi's entropy or the HC entropy.

(5) Additivity. If p = (p1, ..., pn) and q = (q1, ..., qm) are two independent probability distributions, and the joint probability distribution is denoted by p.q, then the property H(p.q) = H(p) + H(q) is called additivity.

The following table compares the three types of entropy with respect to the five properties above.

Table. Comparison of the properties of the three entropies

Property    (1)   (2)   (3)   (4)   (5)
Shannon's   yes   yes   yes   yes   yes
Renyi's     yes   yes   yes   no    yes
HC's        yes   yes   yes   yes   no

From the table, we can see that the three types of entropy differ in recursivity and additivity. However, Kapur pointed out: "The maximum entropy probability distributions given by Havrda-Charvat and Renyi's measures are identical. This shows that neither additivity nor recursivity is essential for a measure to be used in the maximum entropy principle" [Kap]. So the three entropies are equivalent for entropy maximization, and any of them can be used.
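The yes/no pattern of properties (4) and (5) in the table can be checked directly on small distributions; a minimal sketch (Python, arbitrary example distributions):

```python
import math

# Property checks behind the comparison table: Shannon and Renyi are
# additive over independent distributions, Havrda-Charvat is not;
# Shannon also satisfies the recursivity property.
def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi(p, a):
    return math.log(sum(x ** a for x in p)) / (1.0 - a)

def hc(p, a):
    return (sum(x ** a for x in p) - 1.0) / (1.0 - a)

p, q, a = [0.7, 0.3], [0.2, 0.5, 0.3], 2.0
pq = [x * y for x in p for y in q]          # joint of independent variables

shannon_gap = shannon(pq) - (shannon(p) + shannon(q))
renyi_gap = renyi(pq, a) - (renyi(p, a) + renyi(q, a))
hc_gap = hc(pq, a) - (hc(p, a) + hc(q, a))

# Recursivity of Shannon's entropy: combine the first two outcomes.
p3 = [0.2, 0.3, 0.5]
lhs = shannon(p3)
w = p3[0] + p3[1]
rhs = shannon([w, p3[2]]) + w * shannon([p3[0] / w, p3[1] / w])
```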
As we can see from the above, Shannon's entropy has no parameter, but both Renyi's entropy and Havrda-Charvat's entropy have a parameter alpha, so each constitutes a family of entropy measures. There is a relation between Shannon's entropy and Renyi's entropy [Ren, Kap]:

H_Ra >= H_S >= H_Rb  if alpha < 1 and beta > 1,  lim_{alpha->1} H_Ra = H_S

i.e., Renyi's entropy is a monotonically decreasing function of the parameter alpha, and it approaches Shannon's entropy as alpha approaches 1. Thus, Shannon's entropy can be regarded as a member of the Renyi entropy family. Similar results hold for Havrda-Charvat's entropy measure [Kap]:

H_ha >= H_S >= H_hb  if alpha < 1 and beta > 1,  lim_{alpha->1} H_ha = H_S

Thus, Shannon's entropy can also be regarded as a member of the Havrda-Charvat entropy family, and both Renyi and Havrda-Charvat generalize Shannon's idea of information entropy.

When alpha = 2, H_h2 = 1 - sum_k p_k^2 is called the quadratic entropy [Jum]. In this dissertation, H_R2 = -log sum_k p_k^2 will also be called quadratic entropy, for convenience and because of the dependence of the entropy quantity on the quadratic form of the probability distribution. The quadratic form will prove convenient, as we will see later.

For the continuous random variable Y with pdf f_Y(y), the Boltzmann-Shannon differential entropy H_S is defined similarly. [...] Entropy is related to the distance between the point of the probability distribution p = (p1, ..., pn) and the origin in the space R^n. As illustrated in the figure, the probability distribution point p = (p1, ..., pn) is restricted to a segment of the hyperplane defined by sum_k p_k = 1 and p_k >= 0: for n = 2 the region is the line connecting the two points (1, 0) and (0, 1); for n = 3 it is the triangular area bounded by the three lines connecting each pair of the points (1, 0, 0), (0, 1, 0), and (0, 0, 1). The entropy of the probability distribution p = (p1, ..., pn) is a function of V_a = sum_k p_k^alpha, which is the alpha-norm of the point p raised to the power alpha [Nov, Gol] and will be called the "entropy alpha-norm." Renyi's entropy rescales the entropy alpha-norm V_a by a logarithm, H_Ra = (1/(1-alpha)) log V_a, while Havrda-Charvat's entropy rescales it linearly, H_ha = (1/(1-alpha))(V_a - 1).

Figure. Geometrical interpretation of entropy
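The alpha-norm view can be illustrated numerically; the sketch below (Python, an arbitrary example distribution) evaluates V_a, both rescalings near alpha = 1, and the two quadratic entropies at alpha = 2:

```python
import math

# The entropy alpha-norm V_a = sum p_k^a and its two rescalings:
# Renyi H_Ra = log(V_a)/(1-a), Havrda-Charvat H_ha = (V_a - 1)/(1-a).
# Both approach Shannon's entropy as a -> 1; at a = 2 they give the
# quadratic entropies built on V_2.
def v_norm(p, a):
    return sum(x ** a for x in p)

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

p = [0.5, 0.25, 0.125, 0.125]
H_s = shannon(p)
a = 1.0001
H_r_near1 = math.log(v_norm(p, a)) / (1.0 - a)
H_h_near1 = (v_norm(p, a) - 1.0) / (1.0 - a)

V2 = v_norm(p, 2.0)
H_r2 = -math.log(V2)       # Renyi quadratic entropy, a lower bound of H_s
H_h2 = 1.0 - V2            # Havrda-Charvat quadratic entropy
```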
So both Renyi's entropy with order alpha (H_Ra) and Havrda-Charvat's entropy with order alpha (H_ha) are related to the alpha-norm of the probability distribution p. For the above-mentioned infinity entropy H_inf, there is the relation lim_{alpha->inf} H_Ra = H_inf = -log(max_k p_k) [Kap]; therefore H_inf is related to the infinity norm of the probability distribution p. For Shannon's entropy, we have lim_{alpha->1} H_Ra = H_S and lim_{alpha->1} H_ha = H_S. It might be interesting to consider Shannon's entropy as the result of the 1-norm of the probability distribution p. Actually, the 1-norm of any probability distribution is always V_1 = sum_k p_k = 1. If we plug V_1 and alpha = 1 into H_Ra = (1/(1-alpha)) log V_a and H_ha = (1/(1-alpha))(V_a - 1), we get the indeterminate form 0/0; its limit, however, is Shannon's entropy. So, in the limit sense, Shannon's entropy can be regarded as a function value of the 1-norm of the probability distribution. Thus, we can generally say that the entropy with order alpha (either Renyi's or HC's) is a monotonic function of the alpha-norm of the probability distribution p, and that entropy (at least, all the above-mentioned entropies) is essentially a monotonic function of the distance from the probability distribution point p to the origin. From linear algebra, all norms are equivalent in comparing distances [Gol, Nob]; thus they are equivalent for distance maximization or minimization, in both unconstrained and constrained cases. Therefore, all the entropies (at least those mentioned above) are equivalent for the purpose of entropy maximization or entropy minimization.

When alpha > 1, both Renyi's entropy H_Ra and Havrda-Charvat's entropy H_ha are monotonically decreasing functions of the entropy alpha-norm V_a. In this case, entropy maximization is equivalent to the minimization of V_a, and entropy minimization is equivalent to the maximization of V_a. When alpha < 1, both H_Ra and H_ha are monotonically increasing functions of V_a; entropy maximization is then equivalent to the maximization of V_a, and entropy minimization to the minimization of V_a.
Of particular interest in this dissertation are the quadratic entropies H_R2 and H_h2, which are both monotonically decreasing functions of the entropy 2-norm V_2 of the probability distribution p and are related to the Euclidean distance from the point p to the origin. Entropy maximization is then equivalent to the minimization of V_2, and entropy minimization is equivalent to the maximization of V_2. Moreover, since both H_R2 and H_h2 are lower bounds of Shannon's entropy, they might be more efficient than Shannon's entropy for entropy maximization.

For a continuous variable Y, the probability density function f_Y(y) is a point in a functional space. All the pdfs f_Y(y) constitute a similar region in a "hyperplane" defined by integral f_Y(y) dy = 1 and f_Y(y) >= 0. A similar geometrical interpretation can also be given to the differential entropies. In particular, we have the entropy alpha-norm

V_a = integral f_Y(y)^alpha dy,  V_2 = integral f_Y(y)^2 dy

Mutual Information

Mutual information (MI) measures the relationship between two variables and thus is more desirable in many cases. Following Shannon [Sha, Sha], the mutual information between two random variables X1 and X2 is defined as

I_S(X1, X2) = double-integral f_X1X2(x1, x2) log( f_X1X2(x1, x2) / (f_X1(x1) f_X2(x2)) ) dx1 dx2

where f_X1X2(x1, x2) is the joint pdf of the joint variable (X1, X2)^T, and f_X1(x1) and f_X2(x2) are the marginal pdfs of X1 and X2, respectively. Obviously, the mutual information is symmetric, i.e., I_S(X1, X2) = I_S(X2, X1). It is not difficult to show the relation between the mutual information and Shannon's entropy [Dec, Hay]:

I_S(X1, X2) = H_S(X1) - H_S(X1 | X2) = H_S(X2) - H_S(X2 | X1) = H_S(X1) + H_S(X2) - H_S(X1, X2)

where H_S(X1) and H_S(X2) are the marginal entropies; H_S(X1, X2) is the joint entropy; H_S(X1 | X2) = H_S(X1, X2) - H_S(X2) is the conditional entropy of X1 given X2, which measures the uncertainty of X1 when X2 is given, or the uncertainty left in (X1, X2) when the uncertainty of X2 is removed; similarly, H_S(X2 | X1) is the conditional entropy of X2 given X1 (all the entropies involved
are Shannon's entropies). From these relations it can be seen that the mutual information measures the uncertainty removed from X1 when X2 is given or, in other words, the information that X2 conveys about X1 (or vice versa, since the mutual information is symmetric). It provides a measure of the statistical relationship between X1 and X2 that involves all the statistics of the related distributions, and thus is a more general measure than a simple cross-correlation between X1 and X2, which involves only the second-order statistics of the variables.

It can be shown that the mutual information is nonnegative; equivalently, Shannon's entropy is reduced by conditioning, and the sum of the marginal entropies is an upper bound on the joint entropy, i.e.,

I_S(X1, X2) >= 0
H_S(X1) >= H_S(X1 | X2),  H_S(X2) >= H_S(X2 | X1)
H_S(X1, X2) <= H_S(X1) + H_S(X2)

The mutual information can also be regarded as the Kullback-Leibler divergence (KL divergence, also called cross-entropy) [Kul, Dec, Hay] between the joint pdf f_X1X2(x1, x2) and the factorized marginal pdf f_X1(x1) f_X2(x2). The Kullback-Leibler divergence between two pdfs f(x) and g(x) is defined as

D_K(f, g) = integral f(x) log( f(x) / g(x) ) dx

Jensen's inequality [Dec, Ace] says that for a random variable X and a convex function h(x), the expectation of the convex function of X is no less than the convex function of the expectation of X, i.e.,

E[h(X)] >= h(E[X]),  or  integral h(x) f_X(x) dx >= h( integral x f_X(x) dx )

where E[.] is the mathematical expectation operator and f_X(x) is the pdf of X. From Jensen's inequality [Dec, Kul], or by the derivation in Acero [Ace], it can be shown that the Kullback-Leibler divergence is nonnegative and is zero if and only if the two distributions are the same, i.e.,

D_K(f, g) >= 0

where equality holds if and only if f(x) = g(x). So the Kullback-Leibler divergence can be regarded as a "distance" measure between the pdfs f(x) and g(x). However, it is not symmetric, i.e., D_K(f, g) != D_K(g, f) in general, and thus is called a "directed divergence."
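For discrete distributions these relations are easy to verify; the following sketch (Python, a hypothetical 2x2 joint distribution) computes the mutual information both as the KL divergence from the joint to the product of the marginals and from the entropy identity:

```python
import math

# Discrete Shannon MI computed two ways: as the KL divergence between
# the joint and the product of marginals, and from the identity
# I(X1;X2) = H(X1) + H(X2) - H(X1,X2).
def kl(f, g):
    return sum(x * math.log(x / y) for x, y in zip(f, g) if x > 0)

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

# joint P(i,j) on {0,1}x{0,1}, flattened row-major
P = [0.30, 0.10,
     0.20, 0.40]
p1 = [P[0] + P[1], P[2] + P[3]]       # marginal of X1
p2 = [P[0] + P[2], P[1] + P[3]]       # marginal of X2
prod = [p1[i] * p2[j] for i in range(2) for j in range(2)]

I_kl = kl(P, prod)
I_ent = shannon(p1) + shannon(p2) - shannon(P)
```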
Obviously, the mutual information mentioned above is the Kullback-Leibler "distance" from the joint pdf f_X1X2(x1, x2) to the factorized marginal pdf f_X1(x1) f_X2(x2): I_S(X1, X2) = D_K(f_X1X2, f_X1 f_X2).

Based on Renyi's entropy, we can define Renyi's divergence measure with order alpha for two pdfs f(x) and g(x) [Ren, Ren, Kap]:

D_Ra(f, g) = (1/(alpha - 1)) log integral f(x)^alpha / g(x)^(alpha-1) dx

The relation between Renyi's divergence and the Kullback-Leibler divergence is [Kap, Kap]

lim_{alpha->1} D_Ra(f, g) = D_K(f, g)

Based on Havrda-Charvat's entropy, there is also Havrda-Charvat's divergence measure with order alpha for two pdfs f(x) and g(x) [Hav, Kap, Kap]:

D_ha(f, g) = (1/(alpha - 1)) ( integral f(x)^alpha / g(x)^(alpha-1) dx - 1 )

with the similar relation lim_{alpha->1} D_ha(f, g) = D_K(f, g) [Kap, Kap].

Unfortunately, as Renyi pointed out, D_Ra(f_X1X2(x1, x2), f_X1(x1) f_X2(x2)) is not appropriate as a measure of the mutual information of the variables X1 and X2 [Ren]. Furthermore, all these divergence measures (Kullback-Leibler, Renyi, and Havrda-Charvat) are complicated by the integrals involved in their formulas. They are therefore difficult to implement in "learning from examples" and in general adaptive signal processing applications, where maximization or minimization of the measures is desired. In practice, simplicity becomes a paramount consideration, so there is a need for alternative measures that have the same maximizing or minimizing pdf solutions as the Kullback-Leibler divergence but are at the same time easy to implement, just as the quadratic entropy meets these two requirements for entropy.

For discrete variables X1 and X2 with probability distributions {p_i | i = 1, ..., n} and {q_j | j = 1, ..., m} respectively, and with joint probability distribution {P_ij | i = 1, ..., n; j = 1, ..., m}, Shannon's mutual information is defined as

I_S(X1, X2) = sum_i sum_j P_ij log( P_ij / (p_i q_j) )
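A small numerical sketch (Python, arbitrary example distributions) of the two order-alpha divergences and their alpha -> 1 limit toward the KL divergence:

```python
import math

# Renyi's and Havrda-Charvat's order-alpha divergences between two
# discrete distributions; both approach the KL divergence as alpha -> 1.
def kl(f, g):
    return sum(x * math.log(x / y) for x, y in zip(f, g) if x > 0)

def renyi_div(f, g, a):
    return math.log(sum(x ** a / y ** (a - 1.0) for x, y in zip(f, g))) / (a - 1.0)

def hc_div(f, g, a):
    return (sum(x ** a / y ** (a - 1.0) for x, y in zip(f, g)) - 1.0) / (a - 1.0)

f = [0.2, 0.5, 0.3]
g = [0.4, 0.4, 0.2]
d_kl = kl(f, g)
d_r = renyi_div(f, g, 1.0001)
d_h = hc_div(f, g, 1.0001)
```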
Quadratic Mutual Information

As pointed out by Kapur [Kap], there is no reason to restrict ourselves to Shannon's measure for entropy, nor to confine ourselves to Kullback-Leibler's measure for cross-entropy (density discrepancy, or density distance). Entropy and cross-entropy are too deep and too complex as concepts to be measured by a single measure under all conditions. The alternative entropy measures discussed above break such restrictions; in particular, there are entropies with a simple quadratic form of the pdf. In this section, the possibility of "mutual information" measures involving only simple quadratic forms of pdfs will be discussed (the reason for using quadratic forms of pdfs will become clear later in this chapter). These measures will be called quadratic mutual information, although they may lack some properties of Shannon's mutual information.

Independence is a fundamental statistical relationship between two random variables (the extension of the idea of independence to multiple variables is not difficult; for simplicity of exposition, only the two-variable case will be discussed at this stage). It is defined by the joint pdf being equal to the factorized marginal pdfs. For instance, two variables X1 and X2 are independent of each other when

f_X1X2(x1, x2) = f_X1(x1) f_X2(x2)

where f_X1X2(x1, x2) is the joint pdf, and f_X1(x1) and f_X2(x2) are the marginal pdfs. As mentioned in the previous section, the mutual information can be regarded as a distance between the joint pdf and the factorized marginal pdf in the pdf functional space. When the distance is zero, the two variables are independent. When the distance is maximized, the two variables are as far as possible from the independent state and, roughly speaking, the dependence between them is maximized.

The Euclidean distance is a simple and straightforward distance measure for two pdfs. The squared distance between the joint pdf and the factorized marginal pdf will be called the Euclidean distance quadratic mutual information (ED-QMI). It is defined as

D_ED(f, g) = integral (f(x) - g(x))^2 dx
I_ED(X1, X2) = D_ED( f_X1X2(x1, x2), f_X1(x1) f_X2(x2) )

Obviously, the ED-QMI between X1 and X2, I_ED(X1, X2), is nonnegative, and is zero if and only if
f_X1X2(x1, x2) = f_X1(x1) f_X2(x2), i.e., X1 and X2 are independent of each other. So it is appropriate for measuring the independence between X1 and X2. Although there is not yet a strict theoretical justification that the ED-QMI is an appropriate measure of the dependence between two variables, the experimental results described later in this dissertation, and the comparison between ED-QMI and Shannon's mutual information in some special cases described later in this chapter, all support that ED-QMI is appropriate for measuring the degree of dependence between two variables; in particular, the maximization of this quantity gives reasonable results. For multiple variables, the extension of ED-QMI is straightforward:

I_ED(X1, ..., Xk) = D_ED( f_X(x1, ..., xk), prod_{i=1..k} f_Xi(xi) )

where f_X(x1, ..., xk) is the joint pdf and f_Xi(xi) (i = 1, ..., k) are the marginal pdfs.

Another possible pdf distance measure is based on the Cauchy-Schwartz inequality [Har]:

( integral f(x)^2 dx ) ( integral g(x)^2 dx ) >= ( integral f(x) g(x) dx )^2

where equality holds if and only if f(x) = C g(x) for a constant scalar C. If f(x) and g(x) are pdfs, i.e., integral f(x) dx = 1 and integral g(x) dx = 1, then f(x) = C g(x) implies C = 1. So for two pdfs f(x) and g(x), equality holds if and only if f(x) = g(x). Thus we may define the Cauchy-Schwartz distance for two pdfs as

D_CS(f, g) = log( ( integral f(x)^2 dx ) ( integral g(x)^2 dx ) / ( integral f(x) g(x) dx )^2 )

Obviously, D_CS(f, g) >= 0, with equality if and only if f(x) = g(x) almost everywhere, and the integrals involved are all quadratic forms of pdfs.
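Discrete analogues of the two pdf distances can be written in a few lines; a sketch (Python, arbitrary example distributions):

```python
import math

# Discrete analogues of the two pdf "distances" used to build the QMIs:
# Euclidean D_ED(f,g) = sum (f-g)^2 and Cauchy-Schwartz
# D_CS(f,g) = log( (sum f^2)(sum g^2) / (sum f g)^2 ).
def d_ed(f, g):
    return sum((x - y) ** 2 for x, y in zip(f, g))

def d_cs(f, g):
    ff = sum(x * x for x in f)
    gg = sum(y * y for y in g)
    fg = sum(x * y for x, y in zip(f, g))
    return math.log(ff * gg / (fg * fg))

f = [0.2, 0.5, 0.3]
g = [0.4, 0.4, 0.2]
```

Both vanish exactly when the two distributions coincide and are positive otherwise, which is the property the QMIs inherit.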
Based on D_CS(f, g), we define the Cauchy-Schwartz quadratic mutual information (CS-QMI) between two variables X1 and X2 as

I_CS(X1, X2) = D_CS( f_X1X2(x1, x2), f_X1(x1) f_X2(x2) )

where the notation is the same as above. Directly from the above, I_CS(X1, X2) >= 0, with equality if and only if X1 and X2 are independent of each other. So I_CS is an appropriate measure of independence; however, the experimental results show that it might not be appropriate as a dependence measure. For multiple variables, the extension is also straightforward:

I_CS(X1, ..., Xk) = D_CS( f_X(x1, ..., xk), prod_{i=1..k} f_Xi(xi) )

To get an idea of how similar and how different the measures I_S, I_ED, and I_CS are, let's look at a simple case with two discrete random variables X1 and X2. As shown in the figure, X1 is either 0 or 1, and its probability distribution is P_X1 = (P_X1(0), P_X1(1)), i.e., P(X1 = 0) = P_X1(0) and P(X1 = 1) = P_X1(1). Similarly, X2 can also be either 0 or 1, with

Figure. The surfaces and contours of I_S, I_ED, and I_CS versus the marginal and joint probabilities
PDUJLQDO SUREDELOLW\ RI ; LV IXUWKHU IL[HG HJ 3[A f WKHQ WKH IUHH SDUDPHWHU FDQ EH 3[ IURP WR ,Q WKLV FDVH ERWK PDUJLQDO SUREDELOLWLHV RI ;^ DQG ; DUH IL[HG DQG WKH IDFWRUL]HG PDUJLQDO SUREDELOLW\ GLVWULEXWLRQ LV WKXV IL[HG DQG RQO\ WKH S S U[ U[ SA S U[ U[ DQG SL S U; U[ Sn[ cA[ PAGE 46 MRLQW SUREDELOLW\ GLVWULEXWLRQ ZLOO FKDQJH 7KLV FDVH FDQ DOVR EH UHJDUGHG DV WKH SUHYLRXV O f FDVH ZLWK D IXUWKHU FRQVWUDLQW VSHFLILHG E\ 3[ 3[ )LJXUH VKRZV KRZ WKH WKUHH PHDVXUHV FKDQJH ZLWK 3[ LQ WKLV FDVH IURP ZKLFK ZH FDQ VHH WKDW WKH PLQLPD DUH UHDFKHG DW WKH VDPH SRLQW 3[ DQG WKH PD[LPD DUH DOVR UHDFKHG DW WKH VDPH SRLQW !LL 3[ LH S S U[ U[ S S I[ U;B B )LJXUH ,(' DQG ,FV YV 3[ )URP WKLV VLPSOH H[DPSOH ZH FDQ VHH WKDW DOWKRXJK WKH WKUHH PHDVXUHV DUH GLIIHUHQW WKH\ KDYH WKH VDPH PLQLPXP SRLQWV DQG DOVR KDYH WKH VDPH PD[LPXP SRLQWV LQ WKLV SDUn WLFXODU FDVH ,W LV NQRZQ WKDW ERWK 6KDQQRQfV PXWXDO LQIRUPDWLRQ ,V DQG ('40, ,(' DUH FRQYH[ IXQFWLRQV RI SGIV >.DS@ )URP WKH DERYH JUDSKV ZH FDQ FRQILUP WKLV IDFW DQG DOVR FRPH XS WR WKH FRQFOXVLRQ WKDW &640, ,FV LV QRW D FRQYH[ IXQFWLRQ RI SGIV 2Q WKH PAGE 47 ZKROH ZH FDQ VD\ WKDW WKH VLPLODULW\ EHWZHHQ 6KDQQRQfV PXWXDO LQIRUPDWLRQ ,V DQG (' 40,,(' LV FRQILUPHG E\ WKHLU FRQYH[LW\ ZLWK WKH JXDUDQWHHG VDPH PLQLPXP SRLQWV )LJXUH ,OOXVWUDWLRQ RI *HRPHWULFDO ,QWHUSUHWDWLRQ WR 0XWXDO ,QIRUPDWLRQ *HRPHWULFDO ,QWHUSUHWDWLRQ RI 0XWXDO ,QIRUPDWLRQ )URP WKH SUHYLRXV VHFWLRQ ZH FDQ VHH WKDW ERWK ('40, DQG &640, KDYH WKH IROn ORZLQJ WKUHH WHUPV LQ WKHLU IRUPXODV 9??I[[[Â[ L[fG[OG[ n 90 M M r fI[[ffOG[?G[ f 9F -?I[;; f ;OfI[[ fI[6;OfG[ G[ ZKHUH 9M LV REYLRXVO\ WKH fHQWURS\ QRUPf WKH VTXDUHG QRUPf RI WKH MRLQW SGI LV WKH fHQWURS\ QRUPf RI WKH IDFWRUL]HG PDUJLQDO SGI DQG 9F LV WKH FURVVFRUUHODWLRQ RU LQQHU SURGXFW EHWZHHQ WKH MRLQW SGI DQG WKH IDFWRUL]HG PDUJLQDO SGI :LWK WKHVH WKUHH WHUPV 40, FDQ EH H[SUHVVHG DV PAGE 48 G 9M9FYP -FV ORJ YMa ORJ ORJ f )LJXUH VKRZV WKH LOOXVWUDWLRQ RI WKH JHRPHWULFDO LQWHUSUHWDWLRQ WR DOO WKHVH TXDQWLn WLHV ,V DV 
SUHYLRXVO\ PHQWLRQHG LV WKH ./ GLYHUJHQFH EHWZHHQ WKH MRLQW SGI DQG WKH IDFn WRUL]HG PDUJLQDO SGI ,(' LV WKH VTXDUHG (XFOLGHDQ GLVWDQFH EHWZHHQ WKHVH WZR SGIV DQG ,FV LV UHODWHG WR WKH DQJOH EHWZHHQ WKHVH WZR SGIV 1RWH WKDW FDQ EH IDFWRUL]HG DV WZR PDUJLQDO LQIRUPDWLRQ SRWHQWLDOV 9c DQG 9 90 9?9 Y? ?I[0 ?IG[? 9 ?I[L[ "G[ f (QHUJ\ DQG (QWURS\ IRU *DXVVLDQ 6LJQDO 7 N ,W LV ZHOO NQRZQ WKDW IRU D *DXVVLDQ UDQGRP YDULDEOH ; [ [Nf H 5 ZLWK SGI mUL]L LV FRYDULDQFH PDWUL[ WKH 6KDQQRQfV LQIRUPDWLRQ HQWURS\ LV +6;f LORJO6O _ORJW f VHH $SSHQGL[ % IRU WKH GHULYDWLRQf 6LPLODUO\ ZH FDQ JHW WKH 5HQ\LfV LQIRUPDWLRQ HQWURS\ IRU ; f 7KH GHULYDWLRQ LV JLYHQ LQ $SSHQGL[ &f )RU +DYUGD&KDUYDWfV HQWURS\ ZH KDYH PAGE 49 +KD: 9QI D: IBDf ILODf f 9 7KH GHULYDWLRQ LV JLYHQ LQ $SSHQGL[ 'f 2EYLRXVO\ OLP +D;f +6;f DQG OLP +KD;f +6;f LQ WKLV FDVH D} O D! ZKLFK DUH FRQVLVWHQW ZLWK DQG f UHVSHFWLYHO\ 6LQFH N DQG D LQ f f DQG f KDYH QRWKLQJ WR GR ZLWK WKH GDWD WKH GDWD GHSHQGHQW TXDQWLW\ LV ORJ_,_ RU _6_ )URP WKH LQIRUPDWLRQWKHRUHWLF SRLQW RI YLHZ D PHDn VXUH RI LQIRUPDWLRQ XVLQJ HQHUJ\ TXDQWLWLHV WKH HOHPHQWV LQ FRYDULDQFH PDWUL[ (f LV ORJ_(_ LQ f DQG f RU MXVW _(_ ,I WKH GLDJRQDO HOHPHQWV RI e DUH FW L Nf LH WKH YDULDQFH RI WKH PDUJLQDO VLJQDO [L LV DL WKHQ WKH 6KDQQRQfV DQG 5HQ\LfV PDUJLQDO HQWURSLHV DUH f ? 
N 6R ORJ Dc LQ f LV UHODWHG WR WKH VXP RI WKH PDUJLQDO 6KDQQRQfV RU L O 5HQ\LfV HQWURSLHV )RU 6KDQQRQfV HQWURS\ ZH JHQHUDOO\ KDYH f DQG LWV JHQHUDOL]DWLRQ f>'HF +D\@ N L f PAGE 50 $SSO\LQJ f DQG f WR f ZH JHW +DGDPDUGfV LQHTXDOLW\ f 6R +DG DPDUGfV LQHTXDOLW\ FDQ EH UHJDUGHG DV D VSHFLDO FDVH RI f ZKHQ WKH YDULDEOH ; LV *DXVVLDQ GLVWULEXWHG 7KH PRVW SRSXODU HQHUJ\ TXDQWLW\ XVHG LQ SUDFWLFH LV LQ f 1 N OU;f L e e rrfIWf f + ,I 7 ZKHUH S S SÂf DQG S LV WKH PHDQ RI WKH PDUJLQDO VLJQDO [c 7KH JHRPHWULFDO PHDQLQJ RI LV WKH DYHUDJH RI WKH VTXDUHG (XFOLGHDQ GLVWDQFH IURP WKH GDWD SRLQWV WR WKH fPHDQ SRLQWf ,I WKH VLJQDO LV DQ HUURU VLJQDO WKLV LV VR FDOOHG 06( PHDQ VTXDUHG HUURUf FULWHULRQ DQG LW LV ZLOGO\ DSSOLHG LQ OHDUQLQJ RU DGDSWLYH V\VWHP HWF 7KLV FULWHULRQ LV QRW GLUHFWO\ UHODWHG WR WKH LQIRUPDWLRQ PHDVXUH RI WKH VLJQDO 2QO\ ZKHQ WKH VLJQDO LV ZKLWH *DXVVLDQ ZLWK ]HURPHDQ DQG -c EHFRPHV HTXLYDOHQW DV f VKRZV 6R IURP WKH LQIRUPDWLRQWKHRUHWLF SRLQW RI YLHZ ZKHQ D 06( FULWHULRQ LV XVHG LW LPSOLFLWO\ DVVXPHV WKDW WKH HUURU VLJQDO LV ZKLWH *DXVVLDQ ZLWK ]HURPHDQ $V PHQWLRQHG LQ -@ LV EDVLFDOO\ WKH GHWHUPLQDQW RI e ZKLFK LV WKH SURGXFW RI DOO WKH HLJHQYDOXHV RI e DQG FDQ EH UHJDUGHG DV D JHRPHWULFDO DYHUDJH RI DOO WKH HLJHQYDOXHV ZKLOH LV WKH WUDFH RI e ZKLFK LV WKH VXP RI DOO WKH HLJHQYDOXHV DQG FDQ EH UHJDUGHG DV DQ DULWKPHWLF DYHUDJH RI DOO WKH HLJHQYDOXHV 1RWH WKDW _(_ FDQ QRW JXDUDQWHH WKH ]HUR HQHUJ\ RI DOO WKH PDUJLQDO VLJQDOV EXW WKH PD[LPL]DWLRQ RI _(_ FDQ PDNH WKH MRLQW HQWURS\ RI ; PD[LPXP ZKLOH WKH PD[LPL]DWLRQ RI WU>e@ FDQ QRW JXDUDQWHH WKH PD[LPXP RI WKH MRLQW HQWURS\ RI ; EXW WKH PLQLPL]DWLRQ RI WU>e@ FDQ PDNH DOO WKH PDUJLQDO VLJQDOV ]HUR 7KLV LV SRVVLELOLW\ WKH UHDVRQ ZK\ WKH PLQLPL]DWLRQ RI 06( LV VR SRSXODU LQ SUDFWLFH PAGE 51 &URVV&RUUHODWLRQ DQG 0XWXDO ,QIRUPDWLRQ IRU *DXVVLDQ 6LJQDO 6XSSRVH ; MFM [f LV D ]HURPHDQ ZLWKRXW ORVH RI JHQHUDOLW\ EHFDXVH ERWK FURVVn FRUUHODWLRQ DQG PXWXDO LQIRUPDWLRQ KDYH QRWKLQJ WR GR ZLWK WKH PHDQf *DXVVLDQ UDQGRP 
variable with covariance matrix

$$\Sigma = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}$$

The joint pdf will be

$$f(x) = \frac{1}{2\pi|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}x^T\Sigma^{-1}x\right)$$

and the two marginal pdfs are

$$f_{x_i}(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left(-\frac{x_i^2}{2\sigma_i^2}\right), \quad i = 1, 2$$

The Shannon mutual information is

$$I_S(x_1, x_2) = H_S(x_1) + H_S(x_2) - H_S(x_1, x_2) = -\frac{1}{2}\log(1-\rho^2)$$

where ρ is the correlation coefficient between $x_1$ and $x_2$. By using (A.1) in Appendix A and letting $\beta = 4\pi\sigma_1\sigma_2$, we have

$$\iint f_{x_1x_2}(x_1,x_2)^2\,dx_1dx_2 = \frac{1}{\beta\sqrt{1-\rho^2}}$$
$$\iint \left(f_{x_1}(x_1)f_{x_2}(x_2)\right)^2 dx_1dx_2 = \frac{1}{\beta}$$
$$\iint f_{x_1x_2}(x_1,x_2)\,f_{x_1}(x_1)f_{x_2}(x_2)\,dx_1dx_2 = \frac{2}{\beta\sqrt{4-\rho^2}}$$

The ED-QMI and CS-QMI then will be

$$I_{ED}(x_1,x_2) = \frac{1}{\beta}\left(\frac{1}{\sqrt{1-\rho^2}} - \frac{4}{\sqrt{4-\rho^2}} + 1\right), \qquad I_{CS}(x_1,x_2) = \log\frac{4-\rho^2}{4\sqrt{1-\rho^2}}$$

[Figure: Mutual information vs. correlation coefficient for the Gaussian distribution.]

Similar to $I_S$, $I_{CS}$ is a function of only one parameter, ρ, and both are monotonically increasing functions of |ρ| with the same minimum value 0, the same minimum point ρ = 0 and the same maximum point |ρ| = 1, in spite of the difference in their maximum values. $I_{ED}$ is a function of two parameters, ρ and β. However, β only serves as a scale factor and cannot change the shape of the function. Once β is fixed, $I_{ED}$ is a monotonically increasing function of |ρ| with the same minimum value, the same minimum point ρ = 0 and the same maximum point |ρ| = 1 as $I_S$ and $I_{CS}$, in spite of the difference in the maximum values. The figure above shows these curves, which tell us that the two proposed ED-QMI and CS-QMI are consistent with Shannon's MI in the Gaussian case regarding the minimum and maximum points.

Empirical Energy, Entropy and MI: Problem and Literature Review

In the previous section, the concepts of various energy, entropy and mutual information quantities were introduced. In practice, we face the problem of estimating these quantities from given sample data. In this section, empirical energy, entropy and MI problems will be discussed and the related literature will be reviewed.

Empirical Energy

The problem of empirical energy is relatively simple and straightforward. For a given dataset $\{(a_1(i), \dots, a_n(i))^T,\ i = 1, \dots, N\}$ of an n-D signal $X = (x_1, \dots, x_n)^T$, it is not difficult to estimate the means, the variances of the marginal signals and the covariance
between the marginal signals. We have the sample mean and the sample covariance matrix as follows [Dud]:

$$m = \frac{1}{N}\sum_{i=1}^{N}a(i), \qquad \hat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N}\left(a(i)-m\right)\left(a(i)-m\right)^T$$

These are the results of maximum likelihood estimation [Dud].

Empirical Entropy and Mutual Information: The Problem

As shown in the previous section, the entropy and mutual information all rely on the probability density function (pdf) of the variables; they thus use all the statistics of the variables, but are more complicated and difficult to implement than the energy. To estimate the entropy or mutual information, the first thing we need to do is estimate the pdf of the variables; then the entropy and mutual information can be calculated according to the formulas described in the previous section. For continuous variables, there are inevitable integrals in all the entropy and mutual information definitions described earlier, which is the major difficulty after pdf estimation. Thus, the pdf estimation and the measures for entropy and mutual information should be appropriately chosen so that the corresponding integrals can be simplified. In the rest of this chapter, we will see the importance of this choice in practice; different empirical entropies or mutual informations are actually the results of different choices.

If a priori knowledge about the data distribution is available or a model is assumed, then parametric methods can be used to estimate the pdf model parameters, and the entropies and mutual informations can then be estimated based on the model and the estimated parameters. However, in many real-world problems, the only available information about the domain is contained in the data collected, and there is no a priori knowledge about the data. It is therefore practically significant to estimate the entropy of a variable, or the mutual information between variables, based merely on the given data samples, without further assumptions or any a priori model. Thus, we are actually seeking nonparametric ways to estimate entropies and mutual informations.
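The maximum-likelihood sample mean and covariance above can be sketched in a few lines. This is a minimal illustration of the formulas, not the dissertation's code; it assumes samples are given as plain Python lists of equal-length vectors.

```python
def sample_mean(data):
    # m = (1/N) * sum_i a(i), for data = [a(1), ..., a(N)], each a length-n list
    N, n = len(data), len(data[0])
    return [sum(a[d] for a in data) / N for d in range(n)]

def sample_covariance(data):
    # ML estimate: (1/N) * sum_i (a(i)-m)(a(i)-m)^T (note 1/N, not 1/(N-1))
    N, n = len(data), len(data[0])
    m = sample_mean(data)
    cov = [[0.0] * n for _ in range(n)]
    for a in data:
        d = [a[j] - m[j] for j in range(n)]
        for r in range(n):
            for c in range(n):
                cov[r][c] += d[r] * d[c] / N
    return cov
```

The 1/N normalization matches the maximum-likelihood form in the text; the common unbiased estimator would instead divide by N-1.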
Formally, the problems can be described as follows:

(1) Nonparametric entropy estimation: given a data set $\{a(i)\ |\ i = 1, \dots, N\}$ for a signal X (X can be a scalar or an n-D signal), how to estimate the entropy of X without any other information or assumptions.

(2) Nonparametric mutual information estimation: given a data set $\{(a_1(i)^T, a_2(i)^T)^T,\ i = 1, \dots, N\}$ for a signal $X = (x_1, x_2)^T$ ($x_1$ and $x_2$ can be scalar or n-D signals, and their dimensions can be different), how to estimate the mutual information between $x_1$ and $x_2$ without any assumption. This scheme can easily be extended to the mutual information of multiple signals.

For nonparametric methods, there are still two major difficulties: the nonparametric pdf estimation and the calculation of the integrals involved in the entropy and mutual information measures. In the following, the literature on these two aspects will be reviewed.

Nonparametric Density Estimation

The literature on nonparametric density estimation is fairly extensive; a complete discussion of this topic in such a small section is virtually impossible. Here, only a brief review of the relevant methods, such as the histogram, the Parzen window method, orthogonal series estimates, and the mixture model, will be given.

(1) Histogram [Sil, Weg]. The histogram is the oldest and most widely used density estimator. For a variable x, given an origin $x_0$ and a bin width h, the bins for the histogram can be defined as the intervals $[x_0 + mh,\ x_0 + (m+1)h)$. The histogram is then defined by

$$\hat{f}(x) = \frac{1}{Nh}\,(\text{number of samples in the same bin as } x)$$

The histogram can be generalized by allowing the bin widths to vary. Formally, suppose the real line has been dissected into bins; then the histogram can be

$$\hat{f}(x) = \frac{1}{N}\cdot\frac{\text{number of samples in the same bin as } x}{\text{width of the bin containing } x}$$

For a multidimensional variable, the histogram presents several difficulties. First, contour diagrams to represent the data cannot be easily drawn. Second, the problems of choosing the origin and the bins (or cells) are exacerbated. Third, if rectangular bins are
used for an n-D variable and the number of bins for each marginal variable is m, then the number of bins is on the order of $m^n$. Fourth, since the histogram discretizes each marginal variable, it is difficult to carry out further mathematical analysis.

(2) Orthogonal Series Estimates [Hay, Com]. One approach expands the pdf in a series of Hermite polynomials around a Gaussian function. The Hermite polynomials satisfy the recursion $H_{k+1}(x) = xH_k(x) - kH_{k-1}(x)$. Furthermore, a biorthogonal property exists between the Hermite polynomials and the derivatives of the Gaussian function:

$$\int H_k(x)\,G^{(m)}(x)\,dx = (-1)^m m!\,\delta_{km}$$

where $\delta_{km}$ is the Kronecker delta, equal to 1 if k = m and 0 otherwise. The resulting series is the so-called Gram-Charlier expansion. It is important to note that the natural order of the terms is not the best for the Gram-Charlier series; rather, a grouping of the terms is more appropriate. In practice, the expansion has to be truncated; for BSS or ICA applications, truncation after the first few cumulant terms is considered adequate. Thus we have

$$f(x) \approx G(x)\left(1 + \frac{\kappa_3}{3!}H_3(x) + \frac{\kappa_4}{4!}H_4(x)\right)$$

where the cumulants are $\kappa_3 = m_3$ and $\kappa_4 = m_4 - 3m_2^2$, with moments $m_l = E[x^l]$. The Edgeworth expansion, on the other hand, can be defined as

$$f(x) = G(x)\left(1 + \frac{\kappa_3}{3!}H_3(x) + \frac{\kappa_4}{4!}H_4(x) + \frac{10\kappa_3^2}{6!}H_6(x) + \cdots\right)$$

There is no essential difference between the Edgeworth expansion and the Gram-Charlier expansion. The key feature of the Edgeworth expansion is that its coefficients decrease uniformly, while the terms in the Gram-Charlier expansion do not tend uniformly to zero from the viewpoint of numerical errors; this is why the terms in the Gram-Charlier expansion should be grouped as mentioned above. Both the Edgeworth and Gram-Charlier expansions are truncated in real applications, which makes them approximations to pdfs. Furthermore, they can usually only be used for a 1-D variable; for a multidimensional variable, they become very complicated.

(3) Parzen Window Method [Par, Dud, Chr, Vap, Dev]. The Parzen window method is also called a kernel estimation method or potential function method. Several nonparametric methods for density estimation appeared in the 1960s; among them, the Parzen window method is the most popular. According to the method, one first has to determine the so-called kernel function. For simplicity, and for later use in this dissertation, we consider a simple symmetric Gaussian kernel function

$$G(x, \sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left(-\frac{x^T x}{2\sigma^2}\right)$$

where σ controls the kernel size and x can be an n-D variable. For a data set as described above, the density function will be

$$\hat{f}(x) = \frac{1}{N}\sum_{i=1}^{N}G(x - a(i), \sigma^2)$$

which means that each data point is occupied by a kernel function, and the whole density is the average of all the kernel functions. The asymptotic theory for Parzen-type nonparametric density estimation was developed in the 1970s [Dev]. It concludes that (i) Parzen's estimator is consistent (in various metrics) for estimating a density from a very wide class of densities; (ii) the asymptotic rate of convergence of Parzen's estimator is optimal for "smooth" densities. We will see later in this chapter how this density estimation method can be combined with the quadratic entropy and the quadratic mutual information to develop the ideas of the information potential and the cross information potential. Selecting the Parzen window method is thus not only for simplicity but also for its good asymptotic properties. In addition, this kernel function is actually consistent with the mass-energy spirit mentioned in Chapter 1: one data point should represent not only itself but also its neighborhood. The kernel function is more like a mass-density function in this sense, and from this point of view it naturally introduces the ideas of a field and of potential energy. We will see this more clearly later in this chapter.

(4) Mixture Model [McL, Dem, Rab, Hua]. The mixture model is a kind of "semi-parametric" method (or we may call it "semi-nonparametric"). The mixture model, especially the Gaussian mixture model, has been extensively applied in various engineering areas, such as the hidden Markov model in speech recognition and many others. Although the Gaussian mixture model assumes that the data samples come from several Gaussian sources, it can approximate quite diverse densities. Generally, the density for an n-D variable x is assumed to be

$$f(x) = \sum_{k=1}^{K}c_k\,G(x - \mu_k, \Sigma_k)$$

where K is the number of mixture sources; the $c_k$ are mixture coefficients, which are non-negative and sum to one ($\sum_k c_k = 1$); and $\mu_k$ and $\Sigma_k$ are the means and covariance matrices of each Gaussian source, where the Gaussian function is denoted by

$$G(x-\mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$

with the mean μ and the covariance matrix Σ as parameters.
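The Gaussian Parzen estimator above (the equal-weight, equal-covariance extreme of the mixture model) can be sketched directly. This is a minimal illustration under my own naming, with samples as plain Python lists of vectors:

```python
import math

def gaussian_kernel(x, a, sigma):
    # n-D symmetric Gaussian kernel G(x - a, sigma^2)
    n = len(x)
    sq = sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    norm = (2 * math.pi) ** (n / 2) * sigma ** n
    return math.exp(-sq / (2 * sigma ** 2)) / norm

def parzen_density(x, data, sigma):
    # f_hat(x) = (1/N) * sum_i G(x - a(i), sigma^2):
    # one kernel per data point, averaged over the whole data set
    return sum(gaussian_kernel(x, a, sigma) for a in data) / len(data)
```

Each sample contributes a unit-mass bump, so the estimate integrates to one, which is exactly the "natural density function" property the text contrasts with the k-nearest-neighbor and naive estimators.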
All the parameters $c_k$, $\mu_k$ and $\Sigma_k$ can be estimated from data samples by the EM algorithm in the maximum likelihood sense. One may notice the similarity between the Gaussian mixture model and the Gaussian kernel estimation method. Actually, the Gaussian kernel estimation method is the extreme case of the Gaussian mixture model in which the means are the data points themselves and all the mixture coefficients and all the covariance matrices are equal. In other words, each data point in the Gaussian kernel estimation method is treated as a Gaussian source with equal mixture coefficient and equal covariance.

There are also other nonparametric methods, such as the k-nearest-neighbor method [Dud, Sil] and the naive estimator [Sil]. These estimated density functions are not "natural density functions", i.e., their integrals are not equal to one, and their unsmoothness at the data points also makes them difficult to apply to entropy or mutual information estimation.

Empirical Entropy and Mutual Information: The Literature Review

With the probability density function, we can then calculate the entropy or the mutual information, where the difficulty lies in the integrals involved. Shannon's entropy and Shannon's mutual information are the dominant measures used in the literature, where the logarithm usually brings big difficulties to their estimation. Some researchers have tried to avoid the use of Shannon's measures in order to gain some tractability. A summary of the various existing methods will be given, organized in the following manner, starting with the simple histogram method.

(1) Histogram-Based Method. If the pdf of a variable is estimated by the histogram
method, the variable has to be discretized by the histogram bins. Thus, the integration in Shannon's entropy or mutual information becomes a summation, and there is no difficulty at all in its calculation. However, this is true only for a low-dimensional variable. As pointed out in the previous section, for a high-dimensional variable the computational complexity becomes too large for the method to be implementable. Furthermore, in spite of the simplicity it brings to the calculation, the discretization makes it impossible to carry out further mathematical analysis and to apply this method to the problem of optimizing entropy or mutual information, where differentiable continuous functions are needed for analysis. Nevertheless, such a simple method is still very useful in cases such as feature selection [Bat], where only a static comparison of the entropy or mutual information is needed.

(2) The Case of a Full-Rank Linear Transform. From probability theory, we know that for a full-rank linear transform Y = WX, where $X = (x_1, \dots, x_n)^T$ and $Y = (y_1, \dots, y_n)^T$ are vectors in an n-dimensional real space and W is an n-by-n full-rank matrix, there is a relation between the density function of X and the density function of Y:

$$f_Y(y) = \frac{f_X(x)}{|\det(W)|}$$

[Pap], where $f_Y$ and $f_X$ are the densities of Y and X respectively and det(·) is the determinant operator. Accordingly, we have the relation between the entropy of Y and the entropy of X:

$$H(Y) = H(X) + \log|\det(W)|$$

This relation has been used for the purpose of manipulating the output entropy H(Y). Although this method avoids an arbitrary assumption on the cdf of the sources, it still suffers from problems such as the coupling with the structure of the learning machine.

(3) Numerical Method. The integration involved in the calculation of the entropy or mutual information is usually complicated. A numerical method can be used to calculate the integration; however, this method can only be used for low-dimensional variables. [Pha] used the Parzen window method to estimate the marginal densities and applied this method to the calculation of the
marginal entropies needed in the calculation of the mutual information of the outputs of the linear transform described above. As pointed out by [Vio], the integration in Shannon's entropy or mutual information becomes extremely complicated when the Parzen window is used for the density estimation. Applying the numerical method makes the calculation possible, but restricts it to simple cases, and the method is also coupled with the structure of the learning machine.

(4) Edgeworth and Gram-Charlier Expansion Based Method. As described above, both expansions can be expressed in the form f(x) = G(x)(1 + A(x)), where A(x) is a polynomial. By using the Taylor expansion, we have

$$\log(1 + A(x)) \approx A(x) - \frac{A(x)^2}{2} \triangleq B(x)$$

for relatively small A(x). Then

$$H(x) = -\int f(x)\log f(x)\,dx \approx -\int G(x)\left(1 + A(x)\right)\left(\log G(x) + B(x)\right)dx$$

Notice that G(x) is the Gaussian function and A(x) and B(x) are polynomials, so this integration has an analytical result. Thus, a relation between the entropy and the coefficients of the polynomials A(x) and B(x) (i.e., the sample cumulants of the variable) can be established. Unfortunately, this method can only be used for a 1-D variable, and thus it is usually applied to the calculation of the mutual information described above for BSS and ICA problems.

(5) Fisher's Method. [Fis] manipulates the output entropy of a network indirectly, by minimizing the mean squared difference between the output distribution, estimated with the Parzen window method, and a uniform target distribution. The partial derivative of the mean squared difference with respect to the output samples y(n) can be broken down into two convolutions: $K_u(z)$, the convolution between the uniform pdf u(z) and the gradient of the Gaussian kernel, $K_u(z) = \int u(y)\,G'_\sigma(z-y)\,dy$; and $K_G(z)$, the convolution between the Gaussian kernel $G(z, \sigma^2)$ and its gradient, $K_G(z) = \int G(y, \sigma^2)\,G'_\sigma(z-y)\,dy$. As shown in Fisher [Fis], the convolution $K_G(z)$ turns out to be the gradient of a wider Gaussian,

$$K_G(z) = -\frac{z}{2\sigma^2}\,G(z, 2\sigma^2)$$

and if the domain of z is symmetric, i.e. $b_i = -a_i = a$, $i = 1, \dots, k$, the convolution $K_u(z)$ can be written in closed form as products of error-function terms over the k coordinates, where $z = (z_1, \dots, z_k)^T$, $G(z, \sigma^2)$ is as above, and $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}dt$ is the error function. This method is indirect and still depends on the topology of the network, but it also shows the flexibility gained by using the Parzen window method; it has been used in practice with good results for the MACE [Fis].

Summarizing the above, we see that there is no direct, efficient nonparametric method for estimating the entropy or mutual information of a given discrete data set which is decoupled from the structure of the learning machine and can be applied to n-D variables. In the next sections, we will show how the quadratic entropy and the quadratic mutual information, rather than Shannon's entropy and mutual information, can be combined with the Gaussian kernel estimation of pdfs to develop the ideas of the "information potential" and the "cross information potential", resulting in an effective and general method for the calculation of the empirical entropy and mutual information.

Quadratic Entropy and Information Potential

The Development of the Information Potential

As mentioned in the previous section, the integration of Shannon's entropy with the Gaussian kernel estimation of the pdf becomes "inordinately difficult" [Vio]. However, if we choose the quadratic entropy and notice the fact that the integral of the product of two Gaussian functions can still be evaluated as another Gaussian function, as (A.1) shows, then we can come up with a simple method. For a data set as described above, we can use the Gaussian kernel method to estimate the pdf of X and then calculate the "entropy norm" as

$$V = \int_{-\infty}^{\infty}\hat{f}_X(x)^2\,dx = \int\left[\frac{1}{N}\sum_{i=1}^{N}G(x-a(i),\sigma^2)\right]\left[\frac{1}{N}\sum_{j=1}^{N}G(x-a(j),\sigma^2)\right]dx = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}G(a(i)-a(j), 2\sigma^2)$$

So Renyi's quadratic entropy and Havrda-Charvat's quadratic entropy lead to a much simpler entropy estimator for a set of discrete data points {a(i), i = 1, ..., N}:

$$H_{R2}(X|\{a\}) = -\log V(X|\{a\}), \qquad H_{h2}(X|\{a\}) = 1 - V(X|\{a\})$$
$$V(X|\{a\}) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}G(a(i)-a(j), 2\sigma^2)$$

The combination of the quadratic entropies with the Parzen window method leads to an entropy estimator that computes the interactions among pairs of samples. Notice that there is no approximation in these evaluations except the pdf estimation itself. We wrote the estimator in this way because there is a very interesting physical interpretation for it. Let us assume that we place physical particles at the locations prescribed by a(i) and a(j). Actually, the Parzen window method is just in the spirit of mass-energy: the integral of the product of two Gaussian kernels (each representing some kind of mass density) can be regarded as the interaction between particles a(i) and a(j), which results in the potential energy $G(a(i)-a(j), 2\sigma^2)$. Notice that it is always positive and decays with the squared distance between the particles. We can consider that a potential field exists for each particle in the space, with a field strength defined by the Gaussian kernel, i.e., an exponential decay with the squared distance. In the real world, physical particles interact with a potential energy inverse to the distance between them, but here the potential energy abides by a different law, which in fact is determined by the kernel in the pdf estimation. V above is the overall potential energy, including each pair of data particles. As pointed out previously, these potential energies are related to "information" and thus are called "information potentials" (IP); accordingly, data samples will be called "information particles" (IPT). Now the entropy is expressed in terms of the potential energy, and entropy maximization becomes equivalent to the minimization of the information potential. This is again a surprising similarity to statistical mechanics, where the entropy maximization principle has a corollary in the energy minimization principle.
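The pairwise-interaction estimator above can be sketched directly from its definition. This is a minimal illustration with my own function names, for samples given as lists of n-D vectors:

```python
import math

def information_potential(data, sigma):
    # V = (1/N^2) * sum_i sum_j G(a(i)-a(j), 2*sigma^2)
    N, n = len(data), len(data[0])
    var = 2 * sigma ** 2                      # variance of the pairwise kernel
    norm = (2 * math.pi * var) ** (n / 2)
    V = 0.0
    for a in data:
        for b in data:
            d_sq = sum((x - y) ** 2 for x, y in zip(a, b))
            V += math.exp(-d_sq / (2 * var)) / norm
    return V / (N * N)

def renyi_quadratic_entropy(data, sigma):
    # H_R2 = -log V: maximizing entropy == minimizing the information potential
    return -math.log(information_potential(data, sigma))
```

Spreading the samples apart lowers the potential and raises the quadratic entropy, which is the mechanical picture the text describes.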
It is a pleasant surprise to verify that the nonparametric estimation of entropy here ends up with a principle that resembles that of the physical particle world, which was one of the origins of the concept of entropy. We can also see from the expressions above that the Parzen window method implemented with the Gaussian kernel, coupled with Renyi's or Havrda-Charvat's entropy of higher order (α > 2), will compute each interaction among α-tuples of samples, providing even more information about the detailed structure and distribution of the data set.

Information Force

Just as in mechanics, the derivative of the potential energy is a force, in this case an information-driven force that moves the data samples in the space of the interactions to change the distribution of the data, and thus the entropy of the data. Therefore

$$\frac{\partial}{\partial a(i)}G\left(a(i)-a(j), 2\sigma^2\right) = -G\left(a(i)-a(j), 2\sigma^2\right)\frac{a(i)-a(j)}{2\sigma^2}$$

can be regarded as the force that a particle at the position of sample a(j) impinges upon a(i), and it will be called an information force. If all the data samples are free to move in a certain region of the space, then the information forces between each pair of samples will drive all the samples to a state with minimum information potential. If we add all the contributions of the information forces from the ensemble of samples on a(i), we have the overall effect of the information potential on sample a(i), i.e.

$$\frac{\partial V}{\partial a(i)} = -\frac{1}{N^2\sigma^2}\sum_{j=1}^{N}G\left(a(i)-a(j), 2\sigma^2\right)\left(a(i)-a(j)\right)$$

The information force is the realization of the interaction among "information particles". The entropy will change in the direction (for each information particle) of the information force. Accordingly, entropy maximization or minimization can be implemented in a simple and effective way.

The Calculation of the Information Potential and Force

The above has given the concepts of the information potential and the information force. Here, the procedure for calculating the information potential and the information force will be given according to the formulas above.
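The total force on each sample can be accumulated pairwise, following the gradient formula above. This is a sketch under my own naming (and the sign convention of the gradient expression in the text), not the dissertation's code:

```python
import math

def information_forces(data, sigma):
    # f(i) = -(1/(N^2*sigma^2)) * sum_j G(a(i)-a(j), 2*sigma^2) * (a(i)-a(j))
    N, n = len(data), len(data[0])
    var = 2 * sigma ** 2
    norm = (2 * math.pi * var) ** (n / 2)
    forces = []
    for a in data:
        f = [0.0] * n
        for b in data:
            d = [x - y for x, y in zip(a, b)]
            g = math.exp(-sum(c * c for c in d) / (2 * var)) / norm
            for k in range(n):
                f[k] -= g * d[k] / (N * N * sigma ** 2)
        forces.append(f)
    return forces
```

By symmetry of the pairwise kernel, the forces on a pair of particles are equal and opposite, so they sum to zero over the whole ensemble.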
The procedure itself, and the accompanying illustration, may even help to further understand the idea of the information potential and the information force. To calculate them, two matrices can be defined:

$$D = \{d(i,j)\},\quad d(i,j) = a(i)-a(j); \qquad V = \{v(i,j)\},\quad v(i,j) = G\left(d(i,j), 2\sigma^2\right)$$

[Figure: The structure of the matrices D and V.]

Notice that each element of D is a vector in $R^n$ space, while each element of V is a scalar. It is easy to show from the above that

$$V = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}v(i,j), \qquad f(i) = -\frac{1}{N^2\sigma^2}\sum_{j=1}^{N}v(i,j)\,d(i,j)$$

where V is the overall information potential and f(i) is the force that a(i) receives. We can also define the information potential for each particle a(i) as $v(i) = \frac{1}{N}\sum_j v(i,j)$; obviously $V = \frac{1}{N}\sum_i v(i)$. From this procedure we can clearly see that the information potential relies on the difference between each pair of data points, and therefore makes full use of the information in their relative positions, i.e., the data distribution.

Quadratic Mutual Information and Cross Information Potential

QMI and the Cross Information Potential (CIP)

For the given data set $\{a(i) = (a_1(i)^T, a_2(i)^T)^T,\ i = 1, \dots, N\}$ of a variable $X = (x_1, x_2)^T$ described above, the joint and marginal pdfs can be estimated by the Gaussian kernel method as

$$\hat{f}_{x_1x_2}(x_1,x_2) = \frac{1}{N}\sum_{i=1}^{N}G\left(x_1-a_1(i), \sigma^2\right)G\left(x_2-a_2(i), \sigma^2\right)$$
$$\hat{f}_{x_1}(x_1) = \frac{1}{N}\sum_{i=1}^{N}G\left(x_1-a_1(i), \sigma^2\right), \qquad \hat{f}_{x_2}(x_2) = \frac{1}{N}\sum_{i=1}^{N}G\left(x_2-a_2(i), \sigma^2\right)$$

Following the same procedure as in the development of the information potential, we can obtain the three terms in the ED-QMI and CS-QMI based only on the given data set:

$$V_J = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}G\left(a_1(i)-a_1(j), 2\sigma^2\right)G\left(a_2(i)-a_2(j), 2\sigma^2\right)$$
$$V_k = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}G\left(a_k(i)-a_k(j), 2\sigma^2\right), \quad k = 1, 2$$
$$V_c = \frac{1}{N}\sum_{i=1}^{N}\prod_{k=1}^{2}\left[\frac{1}{N}\sum_{j=1}^{N}G\left(a_k(i)-a_k(j), 2\sigma^2\right)\right]$$

If we define matrices similar to those above, then we have

$$D = \{d(i,j)\},\ d(i,j) = a(i)-a(j); \qquad D_k = \{d_k(i,j)\},\ d_k(i,j) = a_k(i)-a_k(j),\ k = 1, 2$$
$$V = \{v(i,j)\},\ v(i,j) = G\left(d(i,j), 2\sigma^2\right); \qquad V_k = \{v_k(i,j)\},\ v_k(i,j) = G\left(d_k(i,j), 2\sigma^2\right); \qquad v(i,j) = \prod_k v_k(i,j)$$
where v(i,j) is the information potential in the joint space and thus is called the joint potential; $v_k(i,j)$ is the information potential in a marginal space and thus is called a marginal potential; v(i) is the joint information potential energy for IPT a(i); and $v_k(i)$ is the marginal information potential energy for the marginal IPT $a_k(i)$ in the marginal space indexed by k. Based on these quantities, the three terms above can be expressed as

$$V_J = \frac{1}{N^2}\sum_{i}\sum_{j}v_1(i,j)\,v_2(i,j), \qquad V_c = \frac{1}{N}\sum_{i}v_1(i)\,v_2(i), \qquad V_M = V_1 V_2$$

So the ED-QMI and CS-QMI can be expressed as

$$I_{ED}(x_1,x_2) = V_{ED} = V_J - 2V_c + V_M, \qquad I_{CS}(x_1,x_2) = V_{CS} = \log\frac{V_J V_M}{V_c^2}$$

From the above, we can see that both QMIs can be expressed through the cross-correlations between the marginal information potentials at different levels: $v_1(i,j)v_2(i,j)$, $v_1(i)v_2(i)$ and $V_1V_2$. Thus, the measure $V_{ED}$ is called the Euclidean distance cross information potential (ED-CIP), and the measure $V_{CS}$ is called the Cauchy-Schwartz cross information potential (CS-CIP). The quadratic mutual information and the corresponding cross information potential can easily be extended to the case of multiple variables, e.g. $X = (x_1, \dots, x_K)^T$. In this case, we have similar matrices D and V and all the similar IPs and marginal IPs, and the ED-QMI and CS-QMI with their corresponding ED-CIP and CS-CIP become

$$I_{ED}(x_1, \dots, x_K) = V_{ED} = \frac{1}{N^2}\sum_{i}\sum_{j}\prod_{k=1}^{K}v_k(i,j) - \frac{2}{N}\sum_{i}\prod_{k=1}^{K}v_k(i) + \prod_{k=1}^{K}V_k$$
$$I_{CS}(x_1, \dots, x_K) = V_{CS} = \log\frac{\left(\frac{1}{N^2}\sum_i\sum_j\prod_k v_k(i,j)\right)\prod_k V_k}{\left(\frac{1}{N}\sum_i\prod_k v_k(i)\right)^2}$$

Cross Information Forces (CIF)

The cross information potential is more complex than the information potential: three different terms (or potentials) contribute to it, so the force that one data point a(i) receives comes from these three sources. A force in the joint space can be decomposed into marginal components, and the marginal force in each marginal space should be considered separately to simplify the analysis. The cases of the ED-CIP and the CS-CIP are different and should also be considered separately. Only the cross information potential between two variables will be dealt with here; the case of multiple variables can readily be obtained in a similar way.

First, let us look at the CIF of the ED-CIP, $\partial V_{ED}/\partial a_k(i)$, k = 1, 2. By a derivation procedure similar to that of the information force in the IP field, we can obtain the following:

$$C_k = \{c_k(i,j)\}, \qquad c_k(i,j) = v_k(i,j) - v_k(i) - v_k(j) + V_k, \quad k = 1, 2$$
$$\frac{\partial V_{ED}}{\partial a_k(i)} = -\frac{1}{N^2\sigma^2}\sum_{j=1}^{N}c_l(i,j)\,v_k(i,j)\,d_k(i,j), \qquad l \ne k$$

where $d_k(i,j)$, $v_k(i,j)$, $v_k(i)$ and $V_k$ are defined as before; the $C_k$ are cross matrices which serve as force modifiers. For the CIF of the CS-CIP, similarly we have

$$\frac{\partial V_{CS}}{\partial a_k(i)} = \frac{1}{V_J}\frac{\partial V_J}{\partial a_k(i)} - \frac{2}{V_c}\frac{\partial V_c}{\partial a_k(i)} + \frac{1}{V_M}\frac{\partial V_M}{\partial a_k(i)}$$

where each partial derivative is again a sum of $v_l$-weighted pairwise terms $v_k(i,j)\,d_k(i,j)$, as in the ED case.

[Figure: Illustration of "real IPTs" and "virtual IPTs".]

An Explanation of the QMI

Another way to look at the CIP comes from the expression for the factorized marginal pdfs. From the above, we have

$$\hat{f}_{x_1}(x_1)\hat{f}_{x_2}(x_2) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}G\left(x_1-a_1(i), \sigma^2\right)G\left(x_2-a_2(j), \sigma^2\right)$$

This suggests that in the joint space there are $N^2$ "virtual IPTs" $\{(a_1(i)^T, a_2(j)^T)^T,\ i, j = 1, \dots, N\}$ whose pdf, estimated by the Parzen window method, is exactly the factorized marginal pdf of the "real IPTs". The relation between the two types of IPTs is illustrated in the figure above. From this description, we can see that the ED-CIP is the square of the Euclidean distance between the real IP field (formed by the real IPTs) and the virtual IP field (formed by the virtual IPTs), and the CS-CIP is related to the angle between the real IP field and the virtual IP field.
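The three-term ED-QMI estimator above can be sketched for two scalar signals. This is a minimal illustration under my own naming, assuming samples are plain Python lists:

```python
import math

def _pairwise_kernels(col, sigma):
    # v_k(i,j) = G(a_k(i) - a_k(j), 2*sigma^2) for one scalar marginal signal
    var = 2 * sigma ** 2
    norm = math.sqrt(2 * math.pi * var)
    return [[math.exp(-(x - y) ** 2 / (2 * var)) / norm for y in col] for x in col]

def ed_qmi(col1, col2, sigma):
    # I_ED = V_J - 2*V_c + V_M built from the two marginal kernel matrices
    N = len(col1)
    v1 = _pairwise_kernels(col1, sigma)
    v2 = _pairwise_kernels(col2, sigma)
    VJ = sum(v1[i][j] * v2[i][j] for i in range(N) for j in range(N)) / N ** 2
    m1 = [sum(row) / N for row in v1]   # marginal potentials v_1(i)
    m2 = [sum(row) / N for row in v2]   # marginal potentials v_2(i)
    Vc = sum(m1[i] * m2[i] for i in range(N)) / N
    VM = (sum(m1) / N) * (sum(m2) / N)
    return VJ - 2 * Vc + VM
```

Because the three terms assemble into a squared distance between the joint estimate and the product of the marginals, the result is non-negative, and it vanishes when one signal carries no information about the other (e.g. when one signal is constant).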
When the real IPTs are organized such that each virtual IPT has at least one real IPT in the same position, the CIP is zero and the two marginal variables $x_1$ and $x_2$ are statistically independent; when the real IPTs are distributed along a diagonal line, the difference between the distributions of real IPTs and virtual IPTs is maximized. The two extreme cases are illustrated in the figures below. It should be noticed that $x_1$ and $x_2$ are not necessarily scalars; they can be multidimensional variables, and their dimensions can even be different. (CIPs are general measures for the statistical relation between two variables based merely on the given data.)

[Figure: Illustration of independent IPTs.]
[Figure: Illustration of highly correlated variables.]

CHAPTER
LEARNING FROM EXAMPLES

A learning machine is usually a network. Neural networks are of particular interest in this dissertation; actually, almost all adaptive systems can be regarded as network models, no matter whether they are linear or nonlinear, feedforward or recurrent. In this sense, the learning machines studied here are neural networks. So learning in this circumstance is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded [Men]. The environmental stimulation, as pointed out in Chapter 1, is usually in the form of "examples", and thus learning is about how to obtain information from "examples". "Learning from examples" is the topic of this chapter, which includes a review and discussion of learning systems, learning mechanisms, the information-theoretic viewpoint on learning, "learning from examples" by the information potential, and finally a discussion of generalization.

Learning Systems

According to the abstract model described in Chapter 1, a learning system is a mapping network. The flexibility of the mapping highly depends on
the structure of the system. The structures of several typical network systems will be reviewed in this section. Network models can basically be divided into two categories: static models and dynamic models. The static model can also be called a memoryless model. In a network, memory about the signal's past is obtained by using delayed connections (connections through delay units); in the continuous-time case, delay connections become feedback connections. (In this dissertation, only discrete-time signals and systems are studied.) Generally speaking, if there are delay units in a network, then the network has memory. For instance, the transversal filter [Hay, Wid, Hon], the general IIR filter [Hay, Wid, Hon], the time-delay neural network (TDNN) [Lan, Wai], the gamma neural network [deV, Pri], the general recurrent neural networks [Hay], etc., are all dynamic network systems with memory or delay connections. If a network has delay connections, it has to be described by difference equations (in the continuous-time case, differential equations), while a static network can be expressed by algebraic equations (linear or nonlinear). There is also another taxonomy for the structure of learning or adaptive systems; for instance, linear models and nonlinear models form another categorization. The following starts with the static linear model.

Static Models

(1) Linear Model. Possibly the simplest mapping network structure is the linear model; mathematically, it is a linear transformation. The input-output relation of the network is defined by

$$y = W^T x, \qquad y = (y_1, \dots, y_k)^T \in R^k, \quad x \in R^m, \quad W = (w_1, \dots, w_k) \in R^{m\times k}$$

where x is the input signal and y is the output signal; W is the linear transformation matrix, where each column $w_i$ (i = 1, ..., k) is a vector. Each output, or group of outputs, spans a subspace of the input signal space. Eigenanalysis (principal component analysis) [Oja, Dia, Kun, Dud] and generalized eigenanalysis [XuD, Cha, Dud] seek the signal subspace with maximum signal-to-noise ratio (SNR) or signal-to-signal ratio.
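The linear model's mapping is just a matrix-vector product, column by column. A minimal sketch (my own naming, lists standing in for matrices):

```python
def linear_model(W, x):
    # y = W^T x: W is m-by-k (rows index inputs, columns index outputs),
    # so each column w_i of W projects the input onto one output y_i
    m, k = len(W), len(W[0])
    return [sum(W[r][c] * x[r] for r in range(m)) for c in range(k)]
```

Choosing the columns of W (e.g. as principal eigenvectors) is what specializes this generic projection into PCA or a discriminant subspace, as discussed in the text.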
For pattern classification, subspace methods such as Fisher discriminant analysis are also very useful tools [Oja, Dud]. Linear models can also be used for inverse problems such as BSS and ICA [Com, Cao, Bel, Dec, Car].

(2) Multilayer Perceptron (MLP). The perceptron initiated the mathematical analysis of learning, and it is the first machine which learns directly from examples [Vap]. Although the perceptron demonstrated an amazing learning ability, its performance is still limited by its single-layer structure [Min]. The MLP extends the perceptron by putting more layers into the network structure. For ease of mathematical analysis, the nonlinear function in each node is usually a continuous, differentiable function, e.g. the sigmoid function

$$f(u) = \frac{1}{1 + e^{-u}}$$

A typical input-output relation of the network, for one hidden layer, is

$$z_i = f\left(\sum_{l}w_{il}x_l + b_i\right), \qquad y_j = f\left(\sum_{i}a_{ji}z_i + c_j\right)$$

The universal approximation property of the MLP is a very desirable feature for a learning system; this is one reason why the MLP is so popular. The MLP is a kind of "global" model whose basic building block is a hyperplane, the projection represented by the sum of products at each node. The nonlinear function at each node distorts its hyperplane into a ridge function, which also serves as a selector. So the overall functional surface of an MLP is the combination of these ridge functions, and the number of hidden nodes gives the number of ridge functions. Therefore, as long as the number of nodes is large enough, the overall functional surface can approximate any mapping. This is an intuitive understanding of the universal approximation property of the MLP.

[Figure: Multilayer perceptron.]

(3) Radial-Basis Function (RBF) Network. The RBF network has two layers. The hidden layer is the nonlinear layer, whose input-output relation is a radial-basis function, e.g. the Gaussian function

$$z_i = \exp\left(-\frac{\|x-\mu_i\|^2}{2\sigma_i^2}\right)$$

where $\mu_i$ is the mean (center) of the Gaussian function and determines its location in the input space, and $\sigma_i^2$ is the variance of the Gaussian function and determines its shape or sharpness.
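The one-hidden-layer forward pass above can be sketched directly. This is a generic illustration (my own naming, plain lists for the weight matrices), not the dissertation's implementation:

```python
import math

def sigmoid(u):
    # f(u) = 1 / (1 + exp(-u))
    return 1.0 / (1.0 + math.exp(-u))

def mlp_forward(x, W1, b1, W2, b2):
    # One-hidden-layer MLP: z = f(W1 x + b1), y = f(W2 z + b2)
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = [sigmoid(sum(w * zi for w, zi in zip(row, z)) + b)
         for row, b in zip(W2, b2)]
    return y
```

Each hidden node computes one ridge function (a squashed hyperplane projection), and the output layer combines them, which is the intuition behind the universal approximation argument in the text.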
The variance of the Gaussian function determines the shape, or sharpness, of the Gaussian function. The output layer is a linear layer, so the overall input-output relation of the network can be expressed as

z_i = exp(-||x - mu_i||^2 / (2 sigma_i^2)),  y_j = w_j^T z,  z = (z_1, ..., z_l)^T,

where the w_j are linear projections and mu_i, sigma_i are the same as above.

Figure: Radial-Basis Function Network (RBF Network)

The RBF network is also a universal approximator if the number of hidden nodes is large enough [Pog, Par, Hay]. However, unlike the MLP, the basic building block is not a "global" function but a "local" one, such as the Gaussian function. The overall mapping surface is approximated by the linear combination of such "local" surfaces. Intuitively, we can imagine that any shape of mapping surface can be approximated by the linear combination of small pieces of local surface if there are enough such basic building blocks. The RBF network is also an optimal regularization function [Pog, Hay]. It has been applied as extensively as the MLP in various areas.

Dynamic Models

Transversal Filter. The transversal filter, also referred to as a tapped-delay line filter or FIR filter, consists of two parts, as depicted in the corresponding figure: (1) the tapped-delay line; (2) the linear projection. The input-output relation can be expressed as

y(n) = sum_{i=0}^{q} w_i x(n-i) = w^T x(n),  w = (w_0, ..., w_q)^T,  x(n) = (x(n), ..., x(n-q))^T,

where the w_i are the parameters of the filter. Because of its versatility and ease of implementation, the transversal filter has become an essential signal processing structure in a wide variety of applications [Hay, Hon].

Figure: Transversal Filter

Figure: Gamma Filter

Gamma Model. As shown in the corresponding figure, the gamma filter is similar to the transversal filter except that the tapped delay line is replaced by the gamma memory line [deV, Pri]. The gamma memory is a delay tap with feedback. The transfer function of a one-tap gamma memory is

G(z) = mu z^{-1} / (1 - (1 - mu) z^{-1}).

The corresponding impulse response is the gamma function with one parameter mu:

g(n) = mu (1 - mu)^{n-1},  n >= 1.
For the pth tap of the gamma memory line, the transfer function and its impulse response (the gamma function) are

G_p(z) = [mu z^{-1} / (1 - (1 - mu) z^{-1})]^p,  g_p(n) = C(n-1, p-1) mu^p (1 - mu)^{n-p},  n >= p.

Compared with the tapped delay line, the gamma memory line is a recursive structure and has an impulse response of infinite length. Therefore, the "memory depth" can be adjusted by the parameter mu instead of being fixed by the number of taps, as it is in the tapped delay line. Compared with the general IIR filter, the stability analysis of the gamma memory is simple: the gamma memory line is stable (everywhere along the line) when 0 < mu < 2, since its only pole is at 1 - mu. Also, when mu = 1 the gamma memory line becomes the tapped delay line, so the gamma memory line is a generalization of the tapped delay line. The gamma filter is a good compromise between the FIR filter and the IIR filter, and it has been widely applied to a variety of signal processing and pattern recognition problems.

The All-Pole IIR Filter

Figure: The All-Pole IIR Filter

As shown in the corresponding figure, the all-pole IIR filter is composed of only the delayed feedback; there are no feedforward connections in the network structure. The transfer function of the filter is

H(z) = 1 / (1 - sum_{i=1}^{q} w_i z^{-i}).

Obviously, this is the inverse system of the FIR filter H(z) = 1 - sum_{i=1}^{q} w_i z^{-i}, which has been used in deconvolution problems [Hay]. There is also its counterpart for a two-input, two-output system, which has been used in blind deconvolution and blind source separation problems [Ngu, Wan]. In general, this type of filter may be very useful in inverse modeling or system identification problems.
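The gamma memory line described above admits a simple time-domain sketch. The following minimal example (Python with NumPy; one common time-domain form of the deVries-Principe recursion is assumed, and the input sequence is invented for illustration) runs the memory line and shows that with mu = 1 it collapses to the tapped delay line:

```python
import numpy as np

def gamma_memory(u, taps, mu):
    """Run a gamma memory line over the scalar sequence u.
    Assumed state update: x_0(n) = u(n),
    x_p(n) = (1 - mu) * x_p(n-1) + mu * x_{p-1}(n-1) for p >= 1.
    Row n of the returned array holds the tap outputs at time n."""
    x = np.zeros(taps + 1)
    out = np.zeros((len(u), taps + 1))
    for n, un in enumerate(u):
        prev = x.copy()          # tap states at time n-1
        x[0] = un
        for p in range(1, taps + 1):
            x[p] = (1.0 - mu) * prev[p] + mu * prev[p - 1]
        out[n] = x
    return out

u = np.arange(1.0, 9.0)          # simple test sequence 1..8
out = gamma_memory(u, taps=3, mu=1.0)
# With mu = 1, tap p at time n equals u(n - p): a pure tapped delay line.
```

For 0 < mu < 1 the same recursion smears the input over time, which is what allows the memory depth to be tuned by mu rather than by the number of taps.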
TDNN and Gamma Neural Network. In an MLP, each connection is instantaneous; there is no temporal structure in it. If the instantaneous connections are replaced by a filter, then each node will have the ability to process time signals. The time delay neural network (TDNN) is formed by replacing the connections in the MLP with transversal filters [Lan, Wai]. The gamma neural network is the result of replacing the connections in the MLP with gamma filters [deV, Pri]. These types of neural networks extend the ability of the MLP.

Figure: Multilayer Perceptron with Delayed Connections

General Recurrent Neural Network. A general nonlinear dynamic system is the multilayer perceptron with some delayed connections. As the corresponding figure shows, for instance, the output of node z_i may rely on the previous output of node y_k:

z_i(n) = f(w_i^T x(n) + b_i + d y_k(n-1)).

There may be other nodes with similar delayed connections. This type of neural network is powerful but complicated; it is difficult to analyze and adapt, although its flexibility and potential are high.

Learning Mechanisms

The central part of a learning mechanism is the criterion. The range of application of a learning system may be very broad. For instance, a learning system or adaptive signal processing system can be used for data compression, encoding or decoding signals, noise or echo cancellation, source separation, signal enhancement, pattern classification, system identification, control, etc. However, the criteria used to achieve such diverse purposes can basically be divided into only two types: one is based on energy measures, the other on information measures. As pointed out earlier, the energy measures can be regarded as special cases of information measures. In the following, various energy measures and information measures will be discussed. Once the criterion of a system is determined, the task left is to adjust the parameters of the system so as to optimize the criterion. There are a variety of optimization techniques.
The gradient method is perhaps the simplest, but it is a general method [Gil, Hes, Wid], based on the first order approximation of the performance surface. Its on-line version, the stochastic gradient method [Wid], is widely used in adaptive and learning systems. Newton's method [Gil, Hes, Wid] is a more sophisticated method, based on the second order approximation of the performance surface. Its variant, the conjugate gradient method [Hes], avoids the calculation of the inverse of the Hessian matrix and thus is computationally more efficient [Hes]. There are also other techniques which are efficient for specific applications, for instance the Expectation and Maximization algorithm for maximum likelihood estimation, or for a class of non-negative function maximization [Dem, Mcl, XuD]. The natural gradient method, based on information geometry, is used in the case where the parameter space is constrained [Ama]. In the following, these techniques will be briefly reviewed.

Learning Criteria

(1) MSE Criterion. The mean squared error (MSE) criterion is one of the most widely used criteria. For the learning system described earlier, if the given environmental data are {(x(n), d(n)) | n = 1, ..., N}, where x(n) is the input signal and d(n) is the desired signal, then the output signal is y(n) = q(x(n), W) and the error signal is e(n) = d(n) - y(n). The MSE criterion can be defined as

J = (1/N) sum_{n=1}^{N} (d(n) - y(n))^2.

It is basically the squared Euclidean distance between the desired signal d(n) and the output signal y(n) from the geometrical point of view, and the energy of the error signal from the point of view of the energy and entropy measures. Minimization of the MSE criterion will result in the output signal closest to the desired signal in the Euclidean distance sense. As mentioned earlier, if we assume the error signal is white Gaussian with zero mean, then the minimization of the MSE is equivalent to the minimization of the entropy of the error signal.
For a multiple output system, i.e., when the output signal and the desired signal are multidimensional, the error signal is also multidimensional, and the definition of the MSE criterion is the same as described earlier.

(2) Signal-to-Noise Ratio (SNR). The signal-to-noise ratio is also a frequently used criterion in the signal processing area; the purpose of many signal processing systems is to enhance the SNR. A well known example is principal component analysis (PCA), where a linear projection is desired such that the SNR in the output is maximized when the noise is assumed to be white. For Gaussian signals this can be written as an entropy difference,

H(w^T x) - H(w^T x_noise) = (1/2) log (w^T R_x w / (sigma^2 w^T w)),

which is equivalent to the SNR criterion. The solution to this problem is the eigenvector that corresponds to the largest eigenvalue of R_x. The PCA problem can also be formulated as the minimum reconstruction MSE problem [Kun]. It can also be regarded as an auto-association problem in a two-layer network with the constraint that the two layer weight matrices should be dual with each other (i.e., one is the transpose of the other). The minimization solution to the reconstruction problem is equivalent to the maximization solution above.

(3) Signal-to-Signal Ratio. For the same linear network, if the input signal is switched between two zero-mean signals x_1 and x_2, then the signal-to-signal ratio in the output of the linear projection will be

J(w) = (w^T R_{x1} w) / (w^T R_{x2} w),

where R_{x1} is the covariance matrix of x_1 and R_{x2} is the covariance matrix of x_2. Maximization of this criterion enhances the signal x_1 in the output and attenuates the signal x_2 at the same time. From the information-theoretic point of view, if both signals are Gaussian, then the entropy difference in the output will be

H(w^T x_1) - H(w^T x_2) = (1/2) log (w^T R_{x1} w / w^T R_{x2} w),

which is equivalent to a signal-to-signal ratio. The maximization solution is the generalized eigenvector with the largest generalized eigenvalue: R_{x1} w_optimal = lambda_max R_{x2} w_optimal.
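The generalized eigenvector solution just stated can be checked numerically. The following sketch (Python with NumPy; the two synthetic signals and their scales are invented for illustration, and the small generalized eigenproblem is reduced to an ordinary one for simplicity) finds the projection maximizing the signal-to-signal ratio:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two zero-mean signals with different dominant directions (invented data).
x1 = rng.standard_normal((2000, 2)) * np.array([3.0, 1.0])
x2 = rng.standard_normal((2000, 2)) * np.array([1.0, 2.0])

R1 = x1.T @ x1 / len(x1)   # covariance of x1
R2 = x2.T @ x2 / len(x2)   # covariance of x2

# Generalized eigenproblem R1 w = lambda R2 w, reduced here to the
# ordinary eigenproblem of inv(R2) R1 (adequate for this small sketch).
vals, vecs = np.linalg.eig(np.linalg.inv(R2) @ R1)
w = np.real(vecs[:, np.argmax(np.real(vals))])

ratio = (w @ R1 @ w) / (w @ R2 @ w)   # the signal-to-signal ratio J(w)
```

At the solution, the achieved ratio equals the largest generalized eigenvalue, which is what makes the criterion and the decomposition interchangeable.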
[Cha] also shows that when this criterion is applied to classification problems, it can be formulated as a hetero-association problem with an MSE criterion and a constraint.

(4) The Maximum Likelihood. Maximum likelihood estimation has been widely used in parametric model estimation [Dud]. It has also been extensively applied to "learning from examples." For instance, the hidden Markov model has been successfully applied to the speech recognition problem [Rab, Hua], and the training of most hidden Markov models is based on maximum likelihood estimation. In general, suppose there is a statistical model p(z, w), where z is a random variable and w is a set of parameters, and the true probability distribution q(z) is unknown. The problem is to find w so that p(z, w) is closest to q(z). We can simply apply the information cross-entropy criterion, i.e., the Kullback-Leibler criterion, to the problem:

J(w) = integral q(z) log (q(z) / p(z, w)) dz = -E[log p(z, w)] - H_S(z),

where H_S(z) is the Shannon entropy of z, which does not depend on the parameters w, and L(w) = E[log p(z, w)] is exactly the log likelihood function of p(z, w). So the minimization of this criterion is equivalent to the maximization of the log likelihood function L(w). In other words, maximum likelihood estimation is exactly the same as minimizing the Kullback-Leibler cross-entropy between the true probability distribution and the model probability distribution [Ama].

(5) The Information-Theoretic Measures for BSS and ICA. As introduced earlier, the maximization of the output entropy and the minimization of the mutual information between the outputs can be used in BSS and ICA problems. We will deal with this case in more detail later.

Optimization Techniques

(1) The Back-Propagation Algorithm. For the MLP network described above, the MSE criterion is still J = sum_{n=1}^{N} J(n). Let's look at a simple case with only one output node:

y = f(v^T z + a),  v = (v_1, ..., v_l)^T,  z = (z_1, ..., z_l)^T,  z_i = f(w_i^T x + b_i).

Then, by the chain rule, we have

dJ/dv = sum_{n=1}^{N} (dJ(n)/dy(n)) (dy(n)/dv).

We can see from this equation that the key point here is how to calculate the sensitivity dy(n)/dv of the network output.
In the MSE case, the term dJ(n)/dy(n) is the error signal: dJ(n)/dy(n) = e(n) = y(n) - d(n). The sensitivity can then be regarded as a mechanism which propagates the error e(n) back to the parameters v or w_i. To be more specific, if we consider the relation f'(x) = y(1 - y) for the sigmoid function y = f(x) = 1/(1 + e^{-x}) and apply the chain rule to the problem, we have

dy(n)/dv = {y(n)(1 - y(n))} z(n),  dy(n)/dz = {y(n)(1 - y(n))} v,

and correspondingly for the first-layer parameters, where {.} acts by component-wise multiplication. This is a linear process which propagates backward through the "dual network" to each parameter, and thus is called "back-propagation." If we need to back-propagate an error e(n), then the unit input of this dual network is replaced by e(n), and the procedure is called "error back-propagation." Actually, error back-propagation is nothing but a gradient method implementation in which the gradient is calculated by the chain rule applied to the network structure. The effectiveness of back-propagation lies in its locality of calculation, utilizing the topology of the network, which is significant for engineering implementations. For a detailed description, one can refer to Rumelhart et al. [Rum].

Figure: The Time Extension of the Recurrent Neural Network

For a dynamic system with delay connections, the whole network can be extended along time, with the delay connections linking the nodes between time slices. When the recurrent neural network shown earlier is unfolded this way, the structure in each time slice contains only the instantaneous connections, and the delay connections connect the corresponding nodes between time slices. Once a dynamic network is extended in time, the whole structure can be regarded as a large static network, and the back-propagation algorithm can be applied as usual. This is the so-called "back-propagation through time" (BPTT) [Wer, Wil, Hay]. There is another algorithm for the training of dynamic networks called "real time recurrent learning" (RTRL) [Wil, Hay].
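The one-output error back-propagation derivation above can be sketched and verified against a numerical gradient. In the following example (Python with NumPy; the network sizes, weights and training pair are invented for illustration), the analytic first-layer gradient is compared with a finite difference:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(3)
m, l = 3, 4                                  # input dim, hidden nodes
W = rng.standard_normal((l, m)); b = rng.standard_normal(l)
v = rng.standard_normal(l); a = 0.0
x = rng.standard_normal(m); d = 0.7          # one training pair

# Forward pass
z = sigmoid(W @ x + b)
y = sigmoid(v @ z + a)

# Error back-propagation for J = 0.5*e^2, e = y - d, using f'(u) = y(1-y)
e = y - d
delta_y = e * y * (1.0 - y)                  # error through the output node
grad_v = delta_y * z                         # dJ/dv
delta_z = delta_y * v * z * (1.0 - z)        # error at the hidden nodes
grad_W = np.outer(delta_z, x)                # dJ/dW

# Numerical check of one weight gradient
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
Jp = 0.5 * (sigmoid(v @ sigmoid(Wp @ x + b) + a) - d) ** 2
num = (Jp - 0.5 * e ** 2) / eps
```

The agreement between `num` and `grad_W[0, 0]` is exactly the locality property the text describes: each parameter's gradient is assembled from quantities available at its own node.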
Both BPTT and RTRL are gradient based methods, and both use the chain rule to calculate the gradient. The difference is that BPTT applies the chain rule from the end of a time block back to its beginning, while RTRL applies the chain rule from the beginning of a time block to its end, resulting in different memory and computational complexities [Hay].

(2) Newton's Method. The gradient method is based on the first order approximation of the performance surface and is simple, but its convergence may be slow. Newton's method is based on the second order approximation of the performance surface and the closed form optimization solution to a quadratic function. First, let's look at the optimization solution to a quadratic function F(x) = (1/2) x^T A x - h^T x + c, where A in R^{m x m} is a symmetric matrix (either positive definite or negative definite), h in R^m and x in R^m are vectors, and c is a scalar constant. There is a maximum solution x* if A is negative definite, or a minimum solution x* if A is positive definite, where in both cases x* satisfies the linear equation grad F(x) = 0, i.e., A x* = h, or x* = A^{-1} h. For a general cost function J(w), its second order approximation at w = w_n is

J(w) ~ J(w_n) + grad J(w_n)^T (w - w_n) + (1/2)(w - w_n)^T H(w_n)(w - w_n),

where H(w_n) is the Hessian matrix of J(w) at w = w_n. So the optimization point for this approximation is w = w_n - H(w_n)^{-1} grad J(w_n), and thus we have Newton's method as follows [Hes, Hay, Wid]:

w_{n+1} = w_n - H(w_n)^{-1} grad J(w_n).

As pointed out in Haykin [Hay], there are several problems in applying Newton's method to MLP training. For instance, Newton's method involves the calculation of the inverse of the Hessian matrix; this is computationally complex, and there is no guarantee that the Hessian matrix is nonsingular and always positive or negative definite. For a non-quadratic performance surface, there is no guarantee of the convergence of Newton's method. To overcome these problems, the quasi-Newton method [Hay], the conjugate gradient method [Hes, Hay], and other variants have appeared.
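On an exactly quadratic cost, the Newton step above reaches the optimum in a single iteration, since the Hessian of F is exactly A. A minimal sketch (Python with NumPy; the matrix, vector and starting point are invented for illustration):

```python
import numpy as np

# Quadratic cost F(x) = 0.5 x^T A x - h^T x + c with A positive definite.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
h = np.array([1.0, 2.0])

x = np.array([10.0, -10.0])              # arbitrary starting point
grad = A @ x - h                         # gradient of F at x
x_new = x - np.linalg.solve(A, grad)     # Newton step (solve, not inverse)

x_star = np.linalg.solve(A, h)           # closed-form minimizer x* = A^{-1} h
```

Using `solve` instead of an explicit inverse mirrors the computational concern raised in the text: the inverse Hessian itself is rarely needed, only its action on the gradient.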
(3) Quasi-Newton Method. This method uses an estimate of the inverse Hessian matrix without calculating the real inverse. This estimate is guaranteed to be positive definite for a minimization problem, or negative definite for a maximization problem. However, the computational complexity is still on the order of O(W^2), where W is the number of parameters [Hay].

(4) The Conjugate Gradient Method. The conjugate gradient method is based on the fact that the optimal point of a quadratic function can be obtained by sequential searches along the so-called conjugate directions, rather than by direct calculation of the inverse of the Hessian matrix. There is a guarantee that the optimal solution can be obtained within W steps for a quadratic function (W is the number of parameters). One method to obtain the conjugate directions is based on the gradient directions, i.e., a modification of the gradient directions results in a set of conjugate directions; thus the name "conjugate gradient method" [Hes, Hay]. The conjugate gradient method avoids the calculation of the inverse, and even the evaluation of the Hessian matrix itself, and thus is computationally efficient. The conjugate gradient method is perhaps the only second order optimization method which can be applied to large-scale problems [Hay].

(5) The Natural Gradient Method. When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. The basic point of the natural gradient method is as follows [Ama].
For a cost function J(w), if the small incremental vector dw is fixed in length, i.e., |dw| = epsilon where epsilon is a small constant, then the steepest descent direction of J(w) is -grad J(w) and the steepest ascent direction is grad J(w). However, if the length of dw is constrained in such a way that the quadratic form (dw)^T G (dw) = epsilon^2, where G is the so-called Riemannian metric tensor, which is always positive definite, then the steepest descent direction becomes -G^{-1} grad J(w) and the steepest ascent direction becomes G^{-1} grad J(w).

(6) The Expectation and Maximization (EM) Algorithm. The EM algorithm can be generalized and summarized by the following inequality, called the generalized EM inequality [XuD]. For a non-negative function of the form

f(D, Theta) = sum_{i=1}^{k} f_i(D, Theta),

where D = {d_i} is the data set and Theta is the parameter set, we have f(D, Theta_{n+1}) >= f(D, Theta_n) if

Theta_{n+1} = argmax_Theta sum_{i=1}^{k} f_i(D, Theta_n) log f_i(D, Theta).

This inequality suggests an iterative method for the maximization of the function f(D, Theta) with respect to the parameters Theta; that is the generalized EM algorithm (the functions f_i(D, Theta) and f(D, Theta) are not required to be pdfs, as long as they are non-negative). First, use the known parameters Theta_n to calculate the f_i(D, Theta_n), and thus sum_i f_i(D, Theta_n) log f_i(D, Theta); this is the so-called expectation step (sum_i f_i(D, Theta_n) log f_i(D, Theta) can be regarded as a generalized expectation). Second, find the maximum point Theta_{n+1} of this expectation function; this is the so-called maximization step. The process then repeats iteratively. With this inequality, it is not difficult to prove the Baum-Eagon inequality, which is the basis for the training of the well known hidden Markov model. The Baum-Eagon inequality can be stated as P(y) >= P(x), where P(x) = P({x_ij}) is a polynomial with non-negative coefficients, homogeneous of degree d in its variables x_ij; x = {x_ij} is a point in the domain PD: x_ij >= 0, sum_j x_ij = 1; and y = {y_ij} is another point in PD satisfying

y_ij = x_ij (dP(x)/dx_ij) / sum_j x_ij (dP(x)/dx_ij).
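The growth transform just stated can be sketched on a tiny example. The following code (Python with NumPy; the particular polynomial, its coefficients and the starting point are invented for illustration) iterates the Baum-Eagon update on the two-variable simplex and records the polynomial value:

```python
import numpy as np

def P(x):
    # A polynomial with non-negative coefficients, homogeneous of degree 3,
    # in x = (x1, x2) constrained to the simplex x1 + x2 = 1 (invented).
    return 2.0 * x[0] ** 2 * x[1] + 0.5 * x[1] ** 3

def grad_P(x):
    return np.array([4.0 * x[0] * x[1],
                     2.0 * x[0] ** 2 + 1.5 * x[1] ** 2])

def baum_eagon_step(x):
    g = x * grad_P(x)              # x_j * dP/dx_j
    return g / g.sum()             # renormalize back onto the simplex

x = np.array([0.9, 0.1])
history = [P(x)]
for _ in range(20):
    x = baum_eagon_step(x)
    history.append(P(x))
# By the Baum-Eagon inequality, P never decreases along this iteration.
```

Each step stays on the probability simplex by construction, which is exactly why the transform is convenient for hidden Markov model parameters.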
If we regard x as a parameter set, then this inequality also suggests an iterative way to maximize the polynomial P(x): the above y is a better estimate of the parameters ("better" meaning it makes the polynomial larger), and the process can go on iteratively. The polynomial can also be non-homogeneous, as long as its coefficients are non-negative. This is a general result which has been applied to train such general models as the multichannel hidden Markov model [XuD], where the calculation of the gradient dP(x)/dx_ij is still needed and is accomplished by back-propagation through time. So the forward and backward algorithms in the training of the hidden Markov model can be regarded as the forward process and back-propagation through time for the hidden Markov network [XuD]. The details of the EM algorithm can be found in Dempster and McLachlan [Dem, Mcl].

General Point of View

It can be seen from the above that there is a variety of learning criteria. Some are based on energy quantities; some are based on information-theoretic measures. In this chapter, a unifying point of view will be given.

InfoMax Principle

In the late 1980s, Linsker gave a rather general point of view about learning and statistical signal processing [Lin]. He pointed out that the transformation of a random vector X, observed at the input layer of a neural network, to a random vector Y, produced at the output layer of the network, should be so chosen that the activities of the neurons in the output layer jointly maximize information about the activities in the input layer. To achieve this, the mutual information I(Y; X) between the input vector X and the output vector Y should be used as the cost function, or criterion, for the learning process of the neural network. This is called the InfoMax principle. The InfoMax principle provides a mathematical framework for self-organization of the learning network that is independent of the rule used for its implementation. This principle can also be viewed as the neural network counterpart of the concept of channel capacity, which defines the Shannon limit on the rate of information transmission through a communication channel.
The InfoMax principle is depicted in the following figure.

Figure: InfoMax Scheme

When the neural network or mapping system is deterministic, the mutual information is determined by the output entropy, as can be seen from I(Y; X) = H(Y) - H(Y|X), where the conditional entropy term does not depend on the network parameters for a deterministic mapping.

(1) Maximization of the Mutual Information Between Scalar Outputs. As depicted in the corresponding figure, the objective of this learning scheme is to maximize the mutual information between two scalar outputs, such that the output y_a conveys the most information about y_b, and vice versa. An example of this scheme is the spatially coherent feature extractor [Bec, Hay], where the transformation of a pair of vectors X_a and X_b (representing adjacent non-overlapping regions of an image) by a neural system should be so chosen that the scalar output y_a of the system, due to the input X_a, maximizes information about the second scalar output y_b, due to X_b.

Figure: Maximization of the Mutual Information between Scalar Outputs

Figure: Processing of Two Neighboring Regions of an Image

Figure: Minimization of the Mutual Information between Scalar Outputs

(2) Minimization of the Mutual Information between Scalar Outputs. Similar to the previous scheme, this scheme tries to make the two scalar outputs as unrelated as possible. An example of this scheme is the spatially incoherent feature extractor [Ukr, Hay]. As depicted in the corresponding figure, the transformation of a pair of input vectors X_a and X_b, representing data derived from corresponding regions in a pair of separate images, by a neural system should be so chosen that the scalar output y_a, due to the input X_a, minimizes information about the second scalar output y_b, due to the input X_b, and vice versa.

Figure: Spatially Incoherent Feature Extraction

Figure: Minimization of Mutual Information among Outputs

(3) Statistical Independence between Outputs. This scheme requires that all the outputs of the system be independent of each other.
The examples of this scheme are the systems for Blind Source Separation and Independent Component Analysis described in the previous chapters, where the systems are usually full rank linear networks.

Figure: A General Learning Framework

A General Scheme

As can be seen from the above, none of the existing learning schemes is fully general. The InfoMax principle deals only with the mutual information between the input and the output, although it motivated the analysis of a learning process from an information-theoretic angle. The other schemes summarized by Haykin are also specific cases, with the additional limitations of model linearity and the Gaussian assumption. These learning schemes have not considered the case with external teacher signals, i.e., the supervised learning case. In order to unify all the schemes, a general learning framework is proposed here. As depicted in the corresponding figure, this general learning scheme is nothing but the abstract and general learning model described earlier, with the learning mechanism specified as the optimization of an information measure based on the response of the learning system Y and the desired, or teacher, signal. If the desired signal is the input signal X and the information measure is the mutual information, then this scheme degenerates into the InfoMax principle. If the desired signal is one or several of the output signals, then this scheme degenerates into the schemes summarized by Haykin and the case of BSS and ICA. Even for a supervised learning case, where there is an external teacher signal, the mutual information between the response of the learning system Y and the desired signal can be maximized under this scheme. That means, in general, the purpose of learning is to transmit as much information about the desired signal as possible into the output, or response, of the learning system Y. The extensively used MSE criterion is still contained in this scheme, where the information measure operates on the difference, or error, signal.
In this learning scheme, supervised learning can be defined as the case with an external desired signal; the learning system then adapts so that its response best represents the desired signal. If the desired signal is either the input of the system or the output of the system, this scheme becomes unsupervised learning, where the system self-organizes such that either the output signal best represents the input signal, or the outputs become independent of each other, or highly related to each other. The following gives two specific cases of this general point of view.

Learning as Information Transmission, Layer-by-Layer

For a layered network, each layer can itself be regarded as a learning system; the whole system is the concatenation of the layers. From the above general point of view, if the desired signal is either an external one or the input signal, then each layer should serve the same purpose for the learning: to transmit as much information about the desired signal as possible. In this way, the whole learning process is broken down into several small-scale learning processes, and each small learning process can proceed sequentially. This is an alternative learning scheme for a layered network, where the back-propagation learning algorithm has dominated for many years. The layer-by-layer learning scheme may simplify the whole learning process and shed more light on the essence of the learning process. The scheme is shown in the following figure; examples of the application of such a learning scheme will be given in a later chapter.

Information Filtering: Filtering beyond the Spectrum

Traditional filtering is based on the spectrum, i.e., an energy quantity. The basic interest of traditional filtering is to find some signal components, or a signal subspace, according to the spectrum. From the information-theoretic point of view, the signal components or signal subspace, linear or nonlinear, should be chosen not in the domain of the spectrum but in the domain of "the signal information structure."
A signal may contain various kinds of information; the list of these kinds of information constitutes a so-called "information spectrum." It is more desirable to choose signal components or a subspace according to such an "information spectrum" than according to the energy spectrum, which is the traditional way of filtering. The idea of information filtering proposed here generalizes the traditional way of filtering and brings more powerful tools to the signal processing area. Examples of the application of information filtering to pose estimation of SAR (synthetic aperture radar) images will be given in a later chapter.

Learning by Information Force

The general point of view is important, but the practical implementation is more challenging. In this section, we will see how the general learning scheme can be implemented, or further specified, by using the powerful tools of the information potential and the cross information potential. The general learning scheme can be depicted as in the following figure.

Figure: The General Learning Scheme by Information Potential

In the general learning scheme depicted in the figure, if the information measure used is the entropy, then the information potential can be used; if the information measure is the mutual information, then the cross information potential can be used. So the information potential here is a general term which stands for both the narrow sense information potential and the cross information potential; we may call such a general term the general information potential. Given a set of environmental data {(x(n), d(n)) | n = 1, ..., N}, there will be the response data set {y(n) | n = 1, ..., N}, y(n) = q(x(n), w); then the general information potential V({y(n)}) can be calculated according to the formulas given earlier. To optimize V({y(n)}), the gradient method can be used. The gradient of V({y(n)}) with respect to the parameters of the learning system, and the corresponding learning rule, are

dV/dw = sum_{n=1}^{N} (dV({y(n)})/dy(n)) (dy(n)/dw),  w <- w +/- eta dV/dw.
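A small numerical sketch can make the potential-and-force picture concrete. The following code (Python with NumPy) assumes the Gaussian-kernel, pairwise "quadratic information potential" form used for scalar samples, with an invented kernel size sigma and invented sample values; it computes the potential and the per-sample "information forces" as its gradient:

```python
import numpy as np

def info_potential_and_forces(y, sigma=1.0):
    """Quadratic information potential V = (1/N^2) sum_ij G(y_i - y_j; 2 sigma^2)
    for scalar samples y, and the information forces dV/dy_i.
    (Gaussian kernel; sigma is an assumed, illustrative kernel size.)"""
    d = y[:, None] - y[None, :]                 # pairwise differences y_i - y_j
    s2 = 2.0 * sigma ** 2
    G = np.exp(-d ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    V = G.sum() / len(y) ** 2
    # dV/dy_i: each pairwise kernel differentiated w.r.t. y_i (factor 2 from
    # the two symmetric appearances of y_i in the double sum)
    F = (-d / s2 * G).sum(axis=1) * 2.0 / len(y) ** 2
    return V, F

y = np.array([-1.0, 0.0, 2.0])
V, F = info_potential_and_forces(y)
```

The forces sum to zero, reflecting the translation invariance of the pairwise potential; whether the learning rule uses +F or -F depends on whether the entropy-related quantity is to be minimized or maximized.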
As described earlier, dV({y(n)})/dy(n) is the information force that the information particle y(n) receives in the information potential field. As pointed out above, dy(n)/dw is the sensitivity of the learning network output, and it serves as the mechanism of error back-propagation in the error back-propagation algorithm. Here, the learning rule can be interpreted as "information force back-propagation." So, from a physical point of view, such as a mass-energy point of view, the learning starts from the information potential field, where each information particle receives the information force from the field; this force then transmits through the network to the parameters, so as to drive them to a state which optimizes the information potential. The information force back-propagation is illustrated in the corresponding figure, where the network functions as a "lever" which connects the parameters and the data samples (information particles) and transmits the force that the field impinges on the information particles to the parameters.

Figure: Illustration of Information Force Back-Propagation

Discussion of Generalization by Learning

The basic purpose of learning is to generalize. As pointed out earlier, generalization is nothing but making full use of the information given, neither less nor more.
LQIRUPDWLRQ SRWHQWLDO SURYLGHV D SRZHUIXO WRRO WR DFKLHYH WKH EDODQFH RI PDNLQJ IXOO XVH RI JLYHQ LQIRUPDWLRQ ZKLOH DYRLGLQJ H[SOLFLW RU LPSOLFLW DVVXPSWLRQV WKDW DUH QRW JLYHQ 7R EH PRUH VSHFLILF WKH LQIRUPDWLRQ SRWHQWLDO GRHV QRW UHO\ RQ DQ\ H[WHUQDO DVVXPSWLRQ DQG LWV IRUPXODWLRQ WHOOV XV WKDW LW H[DPLQHV HDFK SDLU RI GDWD H[WUDFWLQJ PRUH GHWDLOHG LQIRUPDWLRQ IURP WKH GDWD VHW WKDQ WKH WUDGLWLRQDO 06( FULWHULRQ ZKHUH RQO\ WKH UHODWLYH SRVLWLRQ EHWZHHQ HDFK GDWD VDPSOH DQG WKHLU PHDQ LV FRQVLGHUHG DQG WKH UHODWLYH SRVLWLRQ RI HDFK SDLU RI GDWD VDPSOHV LV REYLRXVO\ LJQRUHG DQG WKXV WKH\ FDQ EH WUHDWHG LQGHSHQGHQWO\ ,Q WKLV DVSHFW WKH LQIRUPDWLRQ SRWHQWLDO LV VLPLODU WR WKH VXSSRUWLQJ YHFWRU PDFKLQH >9DS &RU@ ZKHUH D PD[LPXP PDUJLQ LV SXUVXHG IRU D OLQHDU FODVVLILHU DQG IRU WKLV SXUSRVH WKH GHWDLOHG GDWD GLVWULEXWLRQ LQIRUPDWLRQ LV DOVR QHHGHG 7KH VXSSRUWLQJ YHFWRU PDFKLQH KDV VKRZQ WR KDYH D YHU\ JRRG JHQHUDOL]DWLRQ DELOLW\ 7KH H[SHULPHQWDO UHVXOWV LQ &KDSWHU ZLOO DOVR VKRZ WKDW WKH LQIRUPDWLRQ SRWHQWLDO KDYH D YHU\ JRRG JHQHUn DOL]DWLRQ WRR DQG HYHQ EHWWHU UHVXOW WKDQ VXSSRUWLQJ YHFWRU PDFKLQH PAGE 110 &+$37(5 /($51,1* :,7+ 21/,1( /2&$/ 58/( $ &$6( 678'< 21 *(1(5$/,=(' (,*(1'(&20326,7,21 ,Q WKLV FKDSWHU WKH LVVXH RI OHDUQLQJ ZLWK RQOLQH ORFDO UXOHV ZLOO EH GLVFXVVHG $V SRLQWHG RXW LQ &KDSWHU OHDUQLQJ RU DGDSWLYH HYROXWLRQ RI D V\VWHP FDQ KDSSHQ ZKHQHYHU WKHUH DUH GDWD IORZLQJ LQWR WKH V\VWHP DQG WKXV VKRXOG EH RQOLQH )RU D ELRORJLFDO QHXUDO QHWZRUN WKH VWUHQJWK RI D V\QDSWLF FRQQHFWLRQ ZLOO HYROYH RQO\ ZLWK LWV LQSXW DQG RXWSXW DFWLYLWLHV )RU D OHDUQLQJ PDFKLQH DOWKRXJK WKH IHDWXUHV RI fRQOLQHf DQG fORFDOLW\f PD\ QRW EH QHFHVVDU\ LQ VRPH FDVHV D V\VWHP ZLWK VXFK IHDWXUHV ZLOO FHUWDLQO\ EH PXFK PRUH DSSHDOLQJ 7KH +HEELDQ UXOH LV WKH ZHOONQRZQ SRVWXODWHG UXOH IRU WKH DGDSWDWLRQ RI D QHX URELRORJLFDO V\VWHP >+HK@ +HUH LW ZLOO EH VKRZQ KRZ WKH +HEELDQ UXOH DQG WKH DQWL +HEELDQ UXOH FDQ EH PDWKHPDWLFDOO\ UHODWHG WR WKH HQHUJ\ DQG FURVV FRUUHODWLRQ RI D VLJQDO DQG KRZ 
these simple rules can be combined to achieve online local adaptation for a problem as intricate as generalized eigendecomposition. We will again see the role of the mass-energy concept.

Energy, Correlation and Decorrelation for the Linear Model

In an earlier chapter, a linear model was introduced, where the input-output relation was formulated and the system illustrated. In the following, it will be shown how the energy measure of a linear model can be related to the Hebbian and anti-Hebbian learning rules.

Signal Power, Quadratic Form, Correlation, Hebbian and Anti-Hebbian Learning

In the linear network, the output signal at the ith node is y_i = w_i^T x. So, given a data set {x(n), n = 1, ..., N}, the power of the output signal y_i is the quadratic form

    P_i = w_i^T S w_i,    S = E{x x^T} ≈ Σ_n x(n) x(n)^T.

The Hebbian and anti-Hebbian learning rules are

    Δw_i(n) ∝ y_i(n) x(n)                      (Hebbian, sample-by-sample mode)
    Δw_i ∝ Σ_n y_i(n) x(n) ∝ S w_i             (Hebbian, batch mode)
    Δw_i(n) ∝ −y_i(n) x(n)                     (anti-Hebbian, sample-by-sample mode)
    Δw_i ∝ −Σ_n y_i(n) x(n) ∝ −S w_i           (anti-Hebbian, batch mode)

where the adjustment of the projection w_i is proportional to the correlation between the input and output signals for Hebbian learning (or to the negative of that correlation for anti-Hebbian learning). So the direction of Hebbian batch learning is actually the direction of fastest ascent in the power field of the output signal, while anti-Hebbian batch learning moves the system weights in the direction of fastest descent of the power field. The sample-by-sample Hebbian and anti-Hebbian learning rules are just the stochastic versions of their corresponding batch-mode learning rules. Hence, these simple rules are able to seek both the steepest-ascent and steepest-descent directions in the power field using only local information.

Lateral Inhibition Connections, Anti-Hebbian Learning and Decorrelation

Lateral inhibition connections adapted with anti-Hebbian learning are known to decorrelate signals. Let c be the lateral inhibition connection from y_i to y_j, so that the inhibited output is y_j' = y_j + c y_i.
The cross-correlation between y_i and the inhibited output y_j' is then (note that the uppercase C denotes the cross-correlation while the lowercase c denotes the lateral inhibition connection)

    C(y_i, y_j') = Σ_n y_i(n) y_j'(n) = Σ_n y_i(n) y_j(n) + c Σ_n y_i(n)².

[Figure: Lateral Inhibition Connection]

Assume the energy of the signal y_i, E_i = Σ_n y_i(n)², is always greater than zero. Then there always exists a value

    c = −( Σ_n y_i(n) y_j(n) ) / ( Σ_n y_i(n)² )

which will make C(y_i, y_j') = 0, i.e., decorrelate the signals. The anti-Hebbian learning requires the adjustment of c to be proportional to the negative of the cross-correlation between the output signals:

    Δc = −η y_i(n) y_j'(n)                          (sample-by-sample mode)
    Δc = −η C(y_i, y_j') = −η Σ_n y_i(n) y_j'(n)    (batch mode)

where η is the learning step size. Accordingly, for the batch mode we have

    ΔC = (Δc) E_i = −η E_i C,    E_i = Σ_n y_i(n)².

It is obvious that C = 0 is the only stable fixed attractor of the dynamic process dC/dt = −E_i C. So the anti-Hebbian learning will converge to decorrelate the signals as long as the learning step size η is small enough. Summarizing the above: for a linear projection, Hebbian learning tends to maximize the output energy while anti-Hebbian learning tends to minimize it; and for a lateral inhibition connection, anti-Hebbian learning tends to minimize the cross-correlation between the two output signals.

Eigendecomposition and Generalized Eigendecomposition

Eigendecomposition and generalized eigendecomposition arise naturally in many signal processing problems. For instance, principal component analysis (PCA) is basically an eigenvalue problem with wide application in data compression, feature extraction and other areas [Kun, Dia]; as another example, Fisher linear discriminant analysis (LDA) is a generalized eigendecomposition problem [Dud, XuD]; signal detection and enhancement [Dia] and even blind source separation [Sou] can also be related to, or formulated as, an eigendecomposition or generalized eigendecomposition. Although solutions based on numerical methods have been well studied [Gol],
adaptive online solutions are more desirable in many cases [Dia]. Adaptive online structures and methods, such as Oja's rule [Oja] and the APEX rule [Kun], emerged in the past decade to solve the eigendecomposition problem. However, the study of adaptive online methods for generalized eigendecomposition is far from satisfactory. Mao and Jain [Mao] use a two-step PCA for LDA, which is clumsy and not efficient. Principe and Xu [Pra, Prb] only discuss the two-class constrained LDA case. Diamantaras and Kung [Dia] describe the problem as oriented PCA and present a rule only for the largest generalized eigenvalue and its corresponding eigenvector. More recently, Chatterjee et al. [Cha] formulated LDA from the point of view of heteroassociation and provided an iterative solution with a proof of convergence for its online version, but the method does not use local computations and is still computationally complex. Hence, a systematic online local algorithm for generalized eigendecomposition is not presently available. In this chapter, an online local rule to adapt both the forward and lateral connections of a single-layer network is proposed, which produces the generalized eigenvalues and the corresponding eigenvectors in descending order. The eigendecomposition and generalized eigendecomposition problems will be formulated here in a different way, which will lead to the proposed solutions: an information-theoretic formulation is given first, followed by a formulation based on energy measures.

The Information-Theoretic Formulation for Eigendecomposition and Generalized Eigendecomposition

As pointed out in an earlier chapter, the first component of PCA can be formulated as maximizing an entropy difference, and the first component of the generalized eigendecomposition can also be formulated as maximizing an
entropy difference. Here, more general formulations will be given. Suppose there is one zero-mean Gaussian signal x(n) ∈ R^m, n = 1, ..., N, with covariance matrix S = E{x x^T} ≈ Σ_n x(n) x(n)^T (the trivial constant scalar 1/N is ignored here for convenience), and one zero-mean white Gaussian noise with the identity matrix as its covariance matrix. After the linear transform w, the signal and the noise will still be Gaussian, with covariances w^T S w and w^T w, respectively. The entropies of the outputs when the input is the signal and the noise, respectively, are (according to the Gaussian entropy expression given in an earlier chapter)

    H(w^T x) = (1/2) log |w^T S w| + (1/2) log 2πe
    H(w^T noise) = (1/2) log |w^T w| + (1/2) log 2πe.

If we are going to find a linear transform such that the information about the signal at the output end, i.e. H(w^T x), is maximized, while the information about the noise at the output end, i.e. H(w^T noise), is minimized at the same time, the entropy difference can be used as the maximization criterion:

    H(w^T x) − H(w^T noise) = (1/2) log ( |w^T S w| / |w^T w| ),

or, equivalently, J(w) = (w^T S w) / (w^T w).
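As a small numerical illustration of this criterion (a sketch, not from the dissertation; the 2x2 matrix S below is an arbitrary symmetric positive definite choice), scanning unit vectors shows that the maximum of J(w) = w^T S w / w^T w equals the largest eigenvalue of S:

```python
import math

# Arbitrary symmetric positive definite "covariance" matrix (an assumption
# for illustration); its eigenvalues are (5 +/- sqrt(5)) / 2.
S = [[3.0, 1.0], [1.0, 2.0]]

def J(w):
    """Rayleigh quotient w^T S w / w^T w."""
    num = S[0][0]*w[0]*w[0] + 2*S[0][1]*w[0]*w[1] + S[1][1]*w[1]*w[1]
    den = w[0]*w[0] + w[1]*w[1]
    return num / den

# J is scale-invariant, so scanning unit vectors w = (cos t, sin t) over
# [0, pi) covers every direction.
best_t = max((k * math.pi / 5000 for k in range(5000)),
             key=lambda t: J((math.cos(t), math.sin(t))))
J_max = J((math.cos(best_t), math.sin(best_t)))

lambda_max = (5 + math.sqrt(5)) / 2   # largest eigenvalue of S, computed by hand
print(J_max, lambda_max)
```

The maximizing direction is the principal eigenvector of S, which is exactly the claim the entropy-difference criterion leads to next.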
This problem is not an easy one, but it has been studied before. Fortunately, the solution turns out to be the eigenvectors of S with the largest eigenvalues [Wil, Dud]:

    S w_i = λ_i w_i,    i = 1, ..., k  (k can be from 1 to m).

So the eigendecomposition can be regarded as finding a linear transform, in the case of a Gaussian signal and Gaussian noise, such that the entropy difference at the output is maximized; i.e., the output information entropy of the signal is maximized while the output information entropy of the noise is minimized at the same time. One may note that Renyi's entropy leads to the same result.

Similarly, for the generalized eigendecomposition, suppose there are two zero-mean Gaussian signals x_1(n) and x_2(n), n = 1, ..., N, with covariance matrices S_1 ≈ Σ_n x_1(n) x_1(n)^T and S_2 ≈ Σ_n x_2(n) x_2(n)^T, respectively (the trivial constant scalar 1/N is again ignored for convenience). The outputs after the linear transform will still be zero-mean Gaussian signals, with covariances w^T S_1 w and w^T S_2 w, respectively. So the output information entropies for these two signals are

    H(w^T x_1) = (1/2) log |w^T S_1 w| + (1/2) log 2πe
    H(w^T x_2) = (1/2) log |w^T S_2 w| + (1/2) log 2πe.

If we are looking for a linear transform such that, at the output, the information about the first signal is maximized while the information about the second signal is minimized, then we can again use the entropy difference as the maximization criterion. In this case the entropy difference will be (for both Shannon's entropy and Renyi's entropy)

    (1/2) log ( (w^T S_1 w) / (w^T S_2 w) ),

or, equivalently, J(w) = (w^T S_1 w) / (w^T S_2 w). Again, this is not an easy problem. Fortunately, the solution turns out to be the generalized eigenvectors with the largest generalized eigenvalues [Wil, Dud]:

    S_1 w_i = λ_i S_2 w_i,    i = 1, ..., k  (k can be from 1 to m).

So, in the case of Gaussian signals, the generalized eigendecomposition is the same as finding a linear transform such that the information about the first signal at the output end is maximized while the information about the second signal at the output end is minimized.

The Formulation
of Eigendecomposition and Generalized Eigendecomposition Based on the Energy Measures

Based on the energy criterion, the eigendecomposition can also be formulated as finding linear projections w_i ∈ R^m, i = 1, ..., k (k from 1 to m), which maximize the criteria

    J(w_i) = (w_i^T S w_i) / (w_i^T w_i),    subject to w_i^T w_j* = 0,  j = 1, ..., i−1,

where the w_j* ∈ R^m are the projections which maximize J(w_j). Obviously, when i = 1 there is no constraint on the maximization. Using Lagrange multipliers, we can verify that the solutions (λ_i, w_i*) of the optimization are eigenvalues and eigenvectors which satisfy S w_i* = λ_i w_i*, where the λ_i are the eigenvalues of S in descending order. From the preceding section, we know that the numerator is the power of the output signal of the projection w_i when the input is applied. The denominator can actually be regarded as the power of a white noise source applied to the same linear projection in the absence of x(n), since w_i^T w_i = w_i^T I w_i, where I is the identity matrix, i.e. the covariance matrix of the noise. So the eigendecomposition is actually the optimization of a signal-to-noise ratio (maximizing the signal power with respect to an alternate white noise source applied to the same linear projection), which is an interesting observation for signal processing applications. The constraints simply require the orthogonality of each pair of projections. Since the w_j* are eigenvectors of S, equivalent constraints can be written as

    w_i^T S w_j* = λ_j w_i^T w_j* = 0.

For the generalized case, let y_i1(n) = v_i^T x_1(n) denote the ith output when the input is x_1(n); then v_i^T S_1 v_i = Σ_n y_i1(n)² is the energy of the ith output, and v_i^T S_1 v_j = Σ_n y_i1(n) y_j1(n) is the cross-correlation between the ith and jth outputs when the input is x_1(n). This suggests that the criteria are energy ratios of two signals after projection, where the constraints simply require the decorrelation of each pair of output signals. Therefore, the problem is formulated as an optimal signal-to-signal ratio with decorrelation constraints.

The Online Local Rule for Eigendecomposition

Oja's Rule and
the First Projection

As mentioned above, there is no constraint on the optimization of the first projection for the eigendecomposition, and the criterion is to make the output energy (or power) of the signal as large as possible while making the output energy (or power) of the white noise as small as possible. By the earlier result, we know that the normal vector S w_1 is the steepest-ascent direction of the output energy when the input is the signal x(n), while the normal vector −I w_1 = −w_1 is the steepest-descent direction of the output energy when the input is the white noise. Thus, we can postulate that the adjustment of w_1 should be a combination of the two normal vectors S w_1 and −w_1:

    Δw_1 ∝ S w_1 − a(w_1) w_1,

where a(w_1) is a positive scalar which balances the roles of the two normal vectors. If we choose a(w_1) = (w_1^T S w_1)/(w_1^T w_1), then this is the gradient method. The choice a(w_1) = w_1^T S w_1 leads to the so-called Oja's rule [Oja]:

    Δw_1 ∝ S w_1 − (w_1^T S w_1) w_1 = Σ_n y_1(n) [ x(n) − y_1(n) w_1 ]    (batch mode)
    Δw_1 ∝ y_1(n) [ x(n) − y_1(n) w_1 ]                                    (sample-by-sample mode)

Oja's rule will make w_1 converge to w_1*, the eigenvector with the largest eigenvalue of S, and will also make ||w_1|| converge to 1, i.e. ||w_1|| → 1 [Oja]. The convergence proof can be found in Oja [Oja]. In the next section, we present a geometrical explanation of the above rule so that its convergence can be easily understood.

[Figure: Geometrical Explanation of Oja's Rule]

Geometrical Explanation of Oja's Rule

When ||w_1|| = 1, the balancing scalar in Oja's rule is a(w_1) = w_1^T S w_1 = (w_1^T S w_1)/(w_1^T w_1), so in this case the updating term of Oja's rule is Δw_1 ∝ S w_1 − a(w_1) w_1 = Σ_n y_1(n)[x(n) − y_1(n) w_1].
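The sample-by-sample form of Oja's rule above can be sketched as follows (a sketch, not from the dissertation; the four training vectors and the step size are arbitrary choices). The data below have sample covariance diag(2, 0.5), so w should converge to (±1, 0) with unit norm:

```python
# Data whose sample covariance is diag(2, 0.5): the top eigenvector is e1.
data = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

eta = 0.02        # small fixed step size (assumption; a decaying step also works)
w = [0.5, 0.5]    # arbitrary nonzero initial weight vector

for _ in range(3000):                       # epochs over the data set
    for x in data:
        y = w[0]*x[0] + w[1]*x[1]           # output y = w^T x
        # Oja's rule: w <- w + eta * y * (x - y * w)
        w[0] += eta * y * (x[0] - y * w[0])
        w[1] += eta * y * (x[1] - y * w[1])

norm = (w[0]**2 + w[1]**2) ** 0.5
print(w, norm)   # w approaches (+/-1, 0), and ||w|| approaches 1 without explicit normalization
```

Note that the norm converges to 1 on its own, which is exactly the self-normalizing behavior explained geometrically in the surrounding text.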
This updating term is the same as the gradient direction, which is always perpendicular to w_1, because w_1^T ( S w_1 − (w_1^T S w_1) w_1 ) = 0 when ||w_1|| = 1. This is also true for the sample-by-sample case, where Δw ∝ y [x − y w] (all the indices are dropped for convenience). When ||w|| = 1, obviously w ⊥ (x − y w); i.e., the direction of the updating vector x − y w is perpendicular to w, as shown in panel (a) of the figure. So, in general, the updating vector x − y w in Oja's rule can be decomposed into two components: one is the gradient component Δw_⊥, and the other, Δw_w, is along the direction of the vector w, as shown in panels (b) and (c):

    Δw ∝ Δw_⊥ + Δw_w.

The gradient component Δw_⊥ will force w towards the right direction, i.e. the eigenvector direction, while the component Δw_w adjusts the length of w. As shown in panels (b) and (c), when ||w|| > 1 it tends to decrease ||w||, and when ||w|| < 1 it tends to increase ||w||. So it serves as a negative feedback control for ||w||, and the equilibrium point is ||w|| = 1. Therefore, even without explicit normalization of the norm of w, Oja's rule will still force ||w|| to 1. Unfortunately, when Oja's rule is used for the minor component (the eigenvector with the smallest eigenvalue, where the criterion is to be minimized), the updating of w becomes anti-Hebbian. In this case, Δw_w serves as a positive feedback control for ||w||, and Oja's rule becomes unstable. One simple method to stabilize Oja's rule for minor components is to perform an explicit normalization of the norm of w, so that Oja's rule is exactly equivalent to the gradient descent method. In spite of the normalization w = w/||w||, this method is comparable to the other methods in computational complexity, because all the methods need to calculate the value of w^T w.

Sanger's Rule and the Other Projections

For the other projections, the difference is the constraint. For the ith projection, we can project the normal vector S w_i onto the subspace orthogonal to all the previous eigenvectors w_j* to meet the constraint, and apply Oja's rule in that subspace to find the
optimal signal-to-noise ratio in that subspace. This is called the deflation method. Using the concept of the deflation method, Sanger [San] proposed the rule

    Δw_i ∝ ( I − Σ_{j<i} w_j w_j^T ) S w_i − (w_i^T S w_i) w_i,

which degenerates to Oja's rule when i = 1. Here I − Σ_{j<i} w_j w_j^T is the projection transform onto the subspace perpendicular to all the previous w_j. According to Oja's rule, w_1 will converge to the first eigenvector, with the largest corresponding eigenvalue, and ||w_1|| → 1. Based on this w_1 and the rule above, w_2 will converge to the second eigenvector, with the second largest eigenvalue, and ||w_2|| → 1. A similar situation holds for the rest of the w_i. Therefore, Sanger's rule sequentially produces the eigenvectors of S in descending order of their corresponding eigenvalues. The corresponding batch-mode and sample-by-sample adaptation rules for Sanger's method are

    Δw_i ∝ Σ_n y_i(n) { x(n) − Σ_{j<i} y_j(n) w_j − y_i(n) w_i }    (batch mode)
    Δw_i ∝ y_i(n) { x(n) − Σ_{j<i} y_j(n) w_j − y_i(n) w_i }        (sample-by-sample mode)

Sanger's rule is not local, because the updating of w_i involves all the previous projections w_j and their outputs y_j. In a biological neural network, the adaptation of the synapses should be local. In addition, locality makes the VLSI implementation of an algorithm much easier. We next introduce the local implementation of Sanger's rule.

APEX Model: The Local Implementation of Sanger's Rule

As stated above, the purpose of eigendecomposition is to find the projections whose outputs are most correlated with the input signals and decorrelated with each other. Starting from this point, and considering the earlier results, the structure below is proposed.

[Figure: Linear Projections with Lateral Inhibitions]

In this structure, the c_ji are lateral inhibition connections expected to decorrelate the output signals. The input-output relation for the ith projection is y_i = w_i^T x + Σ_{j<i} c_ji y_j, so the overall ith projection is v_i = w_i + Σ_{j<i} c_ji v_j, and the input-output relation can be written in terms of this overall projection.
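The (non-local) sample-by-sample Sanger rule stated above can be sketched as follows (a sketch, not from the dissertation; the data set, initialization and step size are arbitrary). With sample covariance diag(2, 0.5), the first row should align with e1 and the second with e2:

```python
# Data whose sample covariance is diag(2, 0.5).
data = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
eta = 0.02
W = [[0.6, 0.4], [0.4, 0.6]]   # rows are w_1 and w_2 (arbitrary initialization)

for _ in range(4000):
    for x in data:
        y = [W[i][0]*x[0] + W[i][1]*x[1] for i in range(2)]
        new_W = []
        for i in range(2):
            row = []
            for d in range(2):
                # Sanger residual: remove the reconstruction by w_j for all j <= i
                resid = x[d] - sum(y[j] * W[j][d] for j in range(i + 1))
                row.append(W[i][d] + eta * y[i] * resid)
            new_W.append(row)
        W = new_W

print(W)   # rows approach (+/-1, 0) and (0, +/-1), in descending eigenvalue order
```

Note how the update of w_2 explicitly uses w_1 and y_1 (the residual sum), which is exactly the non-locality the APEX model removes.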
That is, y_i = v_i^T x. For simplicity of exposition, we will just consider the second projection. For the first projection w_1, we already have Oja's rule (suppose it has already converged to the solution, the eigenvector with the largest eigenvalue of S, and is fixed). The second projection will represent all the other projections (the rule for all the rest can be obtained similarly). For the structure above, the overall second projection is v = w_2 + c_1 w_1. The problem can be restated as finding the projection v such that the criterion

    J(v) = (v^T S v) / (v^T v)

is maximized, subject to v^T S w_1 = 0, where w_1 is the solution for the first projection, i.e. the eigenvector with the largest eigenvalue of S, and can be assumed fixed during the adaptation of the second projection. The overall change of v can result from the variation of both the forward projection w_2 and the lateral inhibition connection c_1; i.e., we have

    Δv = Δw_2 + (Δc_1) w_1.

To make the problem tractable, we will consider how the overall projection v should change if we fix c_1, and how it should change if we fix w_2. By the basic principle above (that is, using the Hebbian rule to increase an output energy and the anti-Hebbian rule to decrease an output energy), if c_1 is fixed, the overall projection should evolve according to Oja's rule, so as to increase the energy v^T S v and at the same time decrease v^T v:

    Δv = S v − (v^T S v) v.

However, v is a virtual projection and relies on both c_1 and w_2. In this case, when c_1 is fixed, Δv = Δw_2, so the update can be implemented by

    Δw_2 = S v − (v^T S v) v.

When w_2 is fixed, the adaptation of c_1 should decorrelate the two signals y_1 and y_2. According to the earlier conclusion (i.e., using the anti-Hebbian rule to decrease the cross-correlation between two outputs), the adaptation of c_1 should be

    Δc_1 = −η Σ_n y_1(n) y_2(n) = −η w_1^T S v.

So, by this principle, we can postulate the adaptation rule with both parts together:

    Δw_2 = S v − (v^T S v) v = Σ_n y_2(n) x(n) − ( Σ_n y_2(n)² ) (w_2 + c_1 w_1)
    Δc_1 = −w_1^T S v = −Σ_n y_1(n) y_2(n).

Surprisingly, we may find that this rule is actually the same as Sanger's rule, if we write down the adaptation for the overall projection and compare:

    Δv = Δw_2 + w_1 (Δc_1) = S v − (v^T S v) v − w_1 w_1^T S v = ( I − w_1 w_1^T ) S v − (v^T S v) v.

However, we can see that the adaptation of w_2 is not local either; i.e., Δw_2 depends not only on its input, its output and w_2 itself, but also on c_1 and w_1, which are contained in the last term of Δw_2. That last term means that part of the adaptation of w_2 should be along the direction of w_1, and this can actually be implemented by adapting the lateral inhibition connection c_1 instead; i.e., the last term of Δw_2 can be put into Δc_1 instead of Δw_2. To keep the adaptation Δv unchanged, we can write new adaptation rules for both w_2 and c_1 as

    Δw_2 ∝ Σ_n y_2(n) { x(n) − y_2(n) w_2 }
    Δc_1 ∝ −Σ_n y_2(n) { y_1(n) + y_2(n) c_1 }

where the adaptations of both w_2 and c_1 are "local". This is actually the adaptation rule of the APEX model [Kun], and all the above gives an intuitive explanation of the APEX model; it also shows that the APEX model is nothing but a local implementation of Sanger's rule. Generally, the sample-by-sample adaptation of the APEX model is as follows:

    Δw_i ∝ y_i(n) { x(n) − y_i(n) w_i }
    Δc_j ∝ −y_i(n) { y_j(n) + y_i(n) c_j }      (APEX adaptation)

An Iterative Method for Generalized Eigendecomposition

Chatterjee et al. [Cha] formulate LDA as a heteroassociation problem and propose an iterative method for LDA. Since LDA is a special case of the generalized eigendecomposition, the iterative method can be further generalized. Using the same notation as before, the iterative method for the generalized eigendecomposition can be described as

    Δv_i = S_1 v_i − (v_i^T S_1 v_i) S_2 v_i − S_2 Σ_{j<i} v_j v_j^T S_1 v_i,    i = 1, ..., k.

This method assumes that the covariance matrices have already been calculated; the generalized eigenvectors can then be obtained iteratively by this rule.
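For the first generalized eigenvector (where the deflation sum is empty), the iterative rule above can be sketched as follows (a sketch, not from the dissertation; the matrices, step size and iteration count are arbitrary). At convergence, S_1 v = λ S_2 v with λ = v^T S_1 v:

```python
# Arbitrary symmetric positive definite matrices (assumptions for illustration).
# The generalized eigenvalues of (S1, S2) are 2 +/- sqrt(2)/2.
S1 = [[4.0, 1.0], [1.0, 2.0]]
S2 = [[2.0, 0.0], [0.0, 1.0]]

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1]

eta = 0.01
v = [1.0, 0.5]                       # arbitrary initial vector
for _ in range(20000):
    s1v, s2v = matvec(S1, v), matvec(S2, v)
    lam = dot(v, s1v)                # v^T S1 v, the running eigenvalue estimate
    # Delta v = S1 v - (v^T S1 v) S2 v  (first projection: no deflation term)
    v = [v[k] + eta * (s1v[k] - lam * s2v[k]) for k in range(2)]

lam = dot(v, matvec(S1, v))
resid = [matvec(S1, v)[k] - lam * matvec(S2, v)[k] for k in range(2)]
print(lam, resid)   # lam approaches the largest generalized eigenvalue; resid approaches zero
```

The fixed point automatically satisfies v^T S_2 v = 1, which matches the normalization used in the stability analysis later in the chapter.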
There is an alternative method which uses some optimal relation in the problem formulation, but it results in a more complex rule [Cha]. The convergence of such adaptive rules can be analyzed with stochastic approximation theory, which studies a sequence of step sizes η_k and estimates θ_k that are successive approximations of a desired parameter vector θ*. If the following assumptions are satisfied for all fixed θ (E{·} is the expectation operator), then the corresponding deterministic ODE is dθ/dt = f̄(θ), and θ_k will converge to the solution θ* of this ODE with probability 1 as k approaches infinity [Dia]:

    A1. The step-size sequence satisfies η_k → 0, Σ_k η_k = ∞, and Σ_k η_k^p < ∞ for some p > 1.
    A2. f(·,·) is a bounded and measurable vector-valued function.
    A3. For any fixed x, the function f(x, θ) is continuous and bounded, uniformly in x.
    A4. There is a function f̄(θ) = lim_{k→∞} E{ f(x_k, θ) }.

An Online Local Rule for Generalized Eigendecomposition

As stated earlier, the generalized eigendecomposition problem can be formulated as the problem of the optimal signal-to-signal ratio with decorrelation constraints. Here, the network structure of the APEX model will be used for this more complicated problem. In this structure, the w_i ∈ R^m are forward linear projection vectors and the c_ji are lateral inhibitive connections used to force decorrelation among the output signals, but the input is switched between the two zero-mean signals x_1(n) and x_2(n) at each time instant n. The overall projection is the combination of the two types of connections, e.g.
v_2 = w_2 + c_1 w_1, etc. The ith output for the input x_1(n) will be y_i1(n) = v_i^T x_1(n), etc. The proposed online local rule for this network for the generalized eigendecomposition will be discussed in the following sections.

[Figure: Linear Projections with Lateral Inhibitions and Two Inputs]

The Proposed Learning Rule for the First Projection

In this section, we will first discuss the batch-mode rule for the adaptation of the first projection, then the stability analysis of the batch-mode rule, and finally the corresponding adaptive online rule for the first projection.

A. The Batch-Mode Adaptation Rule

Since there is no constraint on the optimization of the first projection v_1, its output does not receive any lateral inhibition; thus v_1 = w_1. The normal vector for the power field w_1^T S_1 w_1 is H_1(w_1) = S_1 w_1 = Σ_n y_11(n) x_1(n), and the normal vector for the power field w_1^T S_2 w_1 is H_2(w_1) = S_2 w_1 = Σ_n y_12(n) x_2(n). To increase w_1^T S_1 w_1 and decrease w_1^T S_2 w_1 at the same time, the adaptation should be

    Δw_1 = H_1(w_1) − H_2(w_1) f(w_1),    w_1 ← w_1 + η Δw_1,

where η is the learning step size; the Hebbian term H_1(w_1) will "enhance" the output signal y_11(n); the anti-Hebbian term −H_2(w_1) will "attenuate" the output signal y_12(n); and the scalar f(w_1) plays the balancing role. If f(w_1) = (w_1^T S_1 w_1)/(w_1^T S_2 w_1) is chosen, then this is the gradient method. If f(w_1) = w_1^T w_1, then it becomes the method used in Diamantaras and Kung [Dia]. Similar to Oja's rule, the balancing scalar can be simplified to f(w_1) = w_1^T P w_1 (P = S_1 or S_2 or S_1 + S_2), because in this case the scalar can be computed as an output energy, e.g. w_1^T S_1 w_1 = Σ_n y_11(n)². In the sequel, the case f(w_1) = w_1^T S_1 w_1 will be discussed.

[Figure: The Regions Related to the Variation of the Norm ||w_1|| (λ_max and λ_min are the maximum and minimum eigenvalues of S_2)]

B. The Stability Analysis of the Batch-Mode Rule

The stationary points of the adaptation process can be obtained by solving the equation H_1(w_1) − H_2(w_1) f(w_1) = ( S_1 − f(w_1) S_2 ) w_1 = 0. Obviously, w_1 = 0 and all the generalized eigenvectors v* which satisfy S_1 v* = f(v*) S_2 v*
are stationary points. Notice that, in general, the length of v* is specified by f(v*) = λ_i, where the λ_i are the generalized eigenvalues corresponding to v*; so the v* are further denoted v*_λi. In the case of f(w_1) = w_1^T S_1 w_1, we have (v*_λi)^T S_1 v*_λi = λ_i and (v*_λi)^T S_2 v*_λi = 1. We will show that, when f(w_1) = w_1^T P w_1, there is only one stable stationary point, namely the solution w_1 = v*_λ1; all the rest are unstable stationary points. Let us look at the case f(w_1) = w_1^T S_1 w_1 (the rest are similar). First, it can be shown that w_1 = 0 is not stable. To show this, we can calculate the first-order approximation of the variation of ||w_1||², which is

    Δ(||w_1||²) = 2 w_1^T Δw_1 = 2η (w_1^T S_1 w_1) ( 1 − w_1^T S_2 w_1 ).

Since w_1^T S_1 w_1 ≥ 0, the sign of the variation depends entirely on w_1^T S_2 w_1. When w_1 is located within the region w_1^T S_2 w_1 < 1, Δ(||w_1||²) is positive and ||w_1|| will increase, while when w_1 is located outside the region, i.e. w_1^T S_2 w_1 > 1, Δ(||w_1||²) is negative and ||w_1|| will decrease. So the stable stationary points should be located on the hyperellipsoid w_1^T S_2 w_1 = 1; therefore, w_1 = 0 cannot be a stable stationary point. This can also be shown by Lyapunov local asymptotic stability analysis [Kha]. The behavior of the algorithm can be characterized by the following differential equation:

    dw_1/dt = H_1(w_1) − H_2(w_1) f(w_1) = S_1 w_1 − (w_1^T S_1 w_1) S_2 w_1.

Obviously, this is a nonlinear dynamic system. The instability at w_1 = 0 can be determined by the position of the eigenvalues of the linearization matrix A at that point:

    A = [ S_1 − 2 S_2 w_1 w_1^T S_1 − (w_1^T S_1 w_1) S_2 ] |_{w_1 = 0} = S_1.

Since S_1 is positive definite, all its eigenvalues are positive, so the dynamic process dw_1/dt = A w_1 cannot be stable at w_1 = 0. Similarly, w_1 = v*_λi, i = 2, ..., m, can be shown to be unstable too. Actually, in these cases the corresponding linearization matrix A will be

    A = S_1 − 2 S_2 v*_λi (v*_λi)^T S_1 − λ_i S_2.

By using S_1 v*_λ1 = λ_1 S_2 v*_λ1, (v*_λ1)^T S_1 v*_λ1 = λ_1 and (v*_λ1)^T S_2 v*_λ1 = 1, we have

    (v*_λ1)^T A v*_λ1 = λ_1 − λ_i > 0.

The inequality holds because λ_1 is the largest generalized eigenvalue. Similarly, by using (v*_λi)^T S_1 v*_λi = λ_i and (v*_λi)^T S_2 v*_λi = 1, we have
    (v*_λi)^T A v*_λi = −2λ_i < 0.

So the linearization matrix A at w_1 = v*_λi, i = 2, ..., m, is not definite; thus these points are all saddle points and unstable. The local stability of w_1 = v*_λ1 can be shown by the negative definiteness of the linearization matrix A at w_1 = v*_λ1:

    A = S_1 − 2λ_1 S_2 v*_λ1 (v*_λ1)^T S_2 − λ_1 S_2.

Actually, it is not difficult to verify that

    (v*_λ1)^T A v*_λ1 = −2λ_1 < 0
    (v*_λi)^T A v*_λi = λ_i − λ_1 < 0,    i = 2, ..., m
    (v*_λi)^T A v*_λj = 0,    i ≠ j.

Since all the generalized eigenvectors v*_λi, i = 1, ..., m, are linearly independent of each other and span the whole space, any nonzero vector x ∈ R^m can be written as a linear combination of the generalized eigenvectors, x = Σ_i a_i v*_λi, with at least one nonzero coefficient. Thus we have the quadratic form

    x^T A x = Σ_i a_i² (v*_λi)^T A v*_λi < 0.

So all the eigenvalues of the linearization matrix A are negative, and thus w_1 = v*_λ1 is stable. When f(w_1) = w_1^T P w_1 (P = S_2 or S_1 + S_2), the stability analysis can be obtained similarly. As shown previously, both the P and the S in the constraint v^T S v* = 0 have three choices; for simplicity of exposition, only P = S = S_1 will be used in the rest of this chapter. It should be noticed that when w_1 converges to v*_λ1, the scalar value f(w_1) = f(v*_λ1) = λ_1, so f(w_1) can serve as the estimate of the largest generalized eigenvalue.

C. The Local Online Adaptive Rule

When f(w_1) = w_1^T S_1 w_1 is used, the rule is the same as the adaptation rule in Chatterjee et al. [Cha]. However, here the calculations of the Hebbian term H_1(w_1), the anti-Hebbian term −H_2(w_1) and the balancing scalar f(w_1) are all local, avoiding direct matrix multiplications and resulting in a drastic reduction in computation. When an exponential window is used to estimate each term, we have

    w_1(n+1) = w_1(n) + η(n) Δw_1(n)
    Δw_1(n) = H_1(w_1, n) − H_2(w_1, n) f(w_1, n)
    H_1(w_1, n) = H_1(w_1, n−1) + α [ y_11(n) x_1(n) − H_1(w_1, n−1) ]
    H_2(w_1, n) = H_2(w_1, n−1) + α [ y_12(n) x_2(n) − H_2(w_1, n−1) ]
    f(w_1, n) = f(w_1, n−1) + α [ y_11(n)² − f(w_1, n−1) ]

where the step size η(n) should decrease with the time index n.
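The exponential-window online rule above can be sketched as follows (a sketch, not from the dissertation; the two cyclic sample streams, α, η and the loose convergence tolerances are all arbitrary choices). The streams below have covariances diag(2, 0.5) and diag(0.5, 0.5), so the largest generalized eigenvalue is 4, with eigenvector along e1 and normalization w^T S_2 w = 1, i.e. w → (±√2, 0):

```python
# Cyclic sample streams standing in for the two zero-mean inputs (assumptions
# for illustration): x1 has sample covariance diag(2, 0.5) and x2 has diag(0.5, 0.5).
x1s = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
x2s = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

alpha = 0.05                     # exponential-window forgetting factor
eta = 0.005                      # small step size (a decaying one also works)
w = [1.0, 0.3]                   # arbitrary initial forward weights
H1, H2, f = [0.0, 0.0], [0.0, 0.0], 0.0

for n in range(20000):
    x1, x2 = x1s[n % 4], x2s[n % 4]
    y11 = w[0]*x1[0] + w[1]*x1[1]          # output for signal 1
    y12 = w[0]*x2[0] + w[1]*x2[1]          # output for signal 2
    # exponential-window estimates of the Hebbian term, anti-Hebbian term and f
    for d in range(2):
        H1[d] += alpha * (y11 * x1[d] - H1[d])
        H2[d] += alpha * (y12 * x2[d] - H2[d])
    f += alpha * (y11 * y11 - f)
    for d in range(2):                     # local update: no matrix products
        w[d] += eta * (H1[d] - f * H2[d])

print(w, f)   # w hovers near (+/-sqrt(2), 0) and f near lambda_1 = 4
```

Every quantity in the update is a scalar or a vector formed from the current sample and the current output, which is the locality property the chapter emphasizes.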
The number of multiplications required by this rule at each time instant is linear in m (m is the dimension of the input signals), while the number required by the method of Chatterjee et al. [Cha] grows with m². The convergence of the stochastic algorithm can be shown by stochastic approximation theory, in the same way as in Chatterjee et al. [Cha]. The simulation results also show convergence when instantaneous values are used for the Hebbian and anti-Hebbian terms H_1(w_1) and H_2(w_1), i.e.

    w_1(n+1) = w_1(n) + η(n) Δw_1(n)
    Δw_1(n) = H_1(w_1, n) − H_2(w_1, n) f(w_1, n)
    H_1(w_1, n) = y_11(n) x_1(n)
    H_2(w_1, n) = y_12(n) x_2(n)
    f(w_1, n) = f(w_1, n−1) + α [ y_11(n)² − f(w_1, n−1) ]

Notice that in both versions, when convergence is achieved, the balancing scalar f(w_1, n) will approach its batch-mode version f(w_1) = w_1^T S_1 w_1. As shown above, the batch-mode scalar f(w_1) will approach the largest generalized eigenvalue λ_1 when w_1 approaches v*_λ1. So we can conclude that f(w_1, n) → λ_1, and all the quantities in both versions are fully utilized.

The Proposed Learning Rules for the Other Connections

In this section, the adaptation rules for both the lateral connection and the feedforward connection of the other projections are discussed. For simplicity, only v = w_2 + c_1 w_1 is considered; the other cases are similar. Suppose w_1 has already reached its final position, i.e. w_1 = v*_λ1, with (v*_λ1)^T S_1 v*_λ1 = λ_1 and (v*_λ1)^T S_2 v*_λ1 = 1. Again, we will first discuss the batch-mode rule for both the feedforward connection w_2 and the lateral inhibitive connection c_1, then its stability analysis, and finally the corresponding local online adaptive rule.

A. The Batch-Mode Adaptation Rule

Similar to before, the adaptation rule can be described in two parts: the decorrelation and the optimal signal-to-signal ratio search. The decorrelation between the output signals y_11(n) and y_21(n) can be achieved by anti-Hebbian learning of the inhibitive connection c_1, and the optimal signal-to-signal ratio search can be achieved by a rule for the feedforward connection w_2 similar to that of the previous section. So we have

    Δc_1 = −C(w_1, v),    c_1 ← c_1 + η_c Δc_1
    Δw_2 = H_1(v) − H_2(v) f(v),    w_2 ← w_2 + η_w Δw_2

where C(w_1, v) = w_1^T S_1 v = Σ_n y_11(n) y_21(n) is the cross-correlation between the two output signals y_11(n) and y_21(n); H_1(v) = S_1 v = Σ_n y_21(n) x_1(n) is the Hebbian term, which will "enhance" the output signal y_21(n); H_2(v) = S_2 v = Σ_n y_22(n) x_2(n) is the anti-Hebbian term, which will "attenuate" the output signal y_22(n); f(v) = v^T S_1 v = Σ_n y_21(n)² is the scalar playing a balancing role between the Hebbian term H_1(v) and the anti-Hebbian term H_2(v); η_c is the step size for the decorrelation process; and η_w is the step size for the feedforward adaptation.

First, let us consider the case where w_2 is fixed. Then, as pointed out earlier, the lateral inhibition connection c_1 will decorrelate the output signals y_11(n) and y_21(n). In fact, the variation of the cross-correlation is

    ΔC = w_1^T S_1 ( (Δc_1) w_1 ) = −η_c ( w_1^T S_1 w_1 ) C,

and C(n+1) = C(n) + ΔC = ( 1 − η_c w_1^T S_1 w_1 ) C(n). If η_c is small enough that |1 − η_c w_1^T S_1 w_1| < 1, then lim_{n→∞} C(n) = 0. When the decorrelation is achieved, i.e. C = 0, there will be no adjustment of c_1; namely, c_1 will remain the same. Second, let us consider the case with c_1 fixed. Then we have

    Δv = Δw_2 = H_1(v) − H_2(v) f(v).

By the conclusion of the previous section, we know that Δv is in the direction that increases the signal-to-signal ratio J(v). Combining these two points, intuitively we can say that, as long as the step size η_c for the decorrelation process is large enough relative to the step size η_w for the feedforward process, so that the decorrelation process is faster than the feedforward process, the optimal signal-to-signal ratio search will basically take place within the subspace that is S_1-orthogonal to the first generalized eigenvector, i.e. v^T S_1 v*_λ1 = 0, and the whole process will converge to the solution, i.e. v → v*_λ2 and f(v) → λ_2. However, we should notice that v → v*_λ2 does not necessarily mean that c_1 → 0.
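The geometric decay of the cross-correlation derived above can be sketched directly (a sketch, not from the dissertation; the two output sequences and the step size are arbitrary numbers). With w_2 fixed, updating c_1 by Δc_1 = −η_c C drives C(n+1) = (1 − η_c w_1^T S_1 w_1) C(n) to zero:

```python
# Two fixed output sequences over one batch: y11 from the first projection,
# y2 from the (un-inhibited) second forward projection (arbitrary values).
y11 = [1.0, -1.0, 0.5, -0.5, 2.0]
y2  = [0.8,  0.4, 0.9, -0.1, 1.1]

E = sum(y * y for y in y11)      # w_1^T S_1 w_1, the energy of y11 (here 6.5)
eta_c = 0.1                      # must satisfy |1 - eta_c * E| < 1
c1 = 0.0
history = []
for _ in range(200):
    # inhibited second output: y21(n) = y2(n) + c1 * y11(n)
    C = sum(a * (b + c1 * a) for a, b in zip(y11, y2))   # cross-correlation
    history.append(C)
    c1 += -eta_c * C             # anti-Hebbian batch update of the lateral weight

print(history[0], history[-1], c1)
```

The cross-correlation collapses geometrically (factor 1 − η_c E = 0.35 per step here), and c_1 settles at −C_0/E, the decorrelating value derived earlier in the chapter; note that this final c_1 is generally nonzero, matching the remark above.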
In the APEX model, by contrast, the lateral connections do converge to zero; here c_1 can take any value, but the overall projection will converge.

B. The Stability Analysis of the Batch-Mode Rule

The stationary points can be obtained by solving both Δc_1 = 0 and Δw_2 = 0. Obviously, v = 0 and v = v*_λi (i = 2, ..., m) are stationary points of the dynamic process. Based on the results of the previous section, it is not difficult to show that v = 0 and v = v*_λi (i ≠ 2) are all unstable. Actually, if the initial state of v is in the subspace S_1-orthogonal to v*_λ1, i.e. v^T S_1 v*_λ1 = 0, then Δc_1 will be 0 and the adjustment of v will be Δv = Δw_2 = S_1 v − (v^T S_1 v) S_2 v, which is also S_1-orthogonal to v*_λ1, i.e. (Δv)^T S_1 v*_λ1 = 0. So v + η Δv will also be S_1-orthogonal to v*_λ1. This means that once v is in the subspace which satisfies the decorrelation constraint, it will remain in this subspace under the rule. In this case, the adaptation becomes Δv = S_1 v − (v^T S_1 v) S_2 v within the subspace S_1-orthogonal to v*_λ1, which is exactly the same as the case of the first projection, except that the search is within that subspace. According to the earlier result, we know that the stationary points v = 0 and v = v*_λi (i ≠ 2) are all unstable even in the subspace. To show that v = v*_λ2 is stable, we can study the overall process Δv = Δw_2 + w_1 (Δc_1). Its corresponding differential equation is

    dv/dt = S_1 v − (v^T S_1 v) S_2 v − (η_c/η_w) w_1 w_1^T S_1 v,

where w_1 = v*_λ1 remains unchanged after the convergence of the first projection. The linearization matrix A at w_1 = v*_λ1 and v = v*_λ2 is

    A = S_1 − 2λ_2 S_2 v*_λ2 (v*_λ2)^T S_2 − λ_2 S_2 − (η_c/η_w) v*_λ1 (v*_λ1)^T S_1.

As a comparison, the corresponding linearization matrix B of the method in Chatterjee et al. [Cha] can be obtained similarly:

    B = S_1 − 2λ_2 S_2 v*_λ2 (v*_λ2)^T S_2 − λ_2 S_2 − S_2 v*_λ1 (v*_λ1)^T S_1.

[Figure: The distribution of the real parts of the eigenvalues of A in the Monte Carlo trials for randomly generated signals]

Notice that A is not symmetric, so the eigenvalues of A may be complex. To show the stability of v = v*_λ2, we need to show the negativeness of all the real parts
of the eigenvalues of the matrix A. Although there is no rigorous proof that the real parts of all the eigenvalues of A are negative (it is difficult to show this in our case because A is not symmetric), the Monte Carlo trials show negativity as long as the step size η_c is large enough relative to the step size η_w. The figure shows the results of … trials for randomly generated signals of dimension …, under the condition η_c = …η_w. As can be seen from the figure, all the real parts of the eigenvalues of A are negative. To compare the proposed method with the one in Chatterjee et al. [Cha], the eigenvalues of the linearization matrix B of that method are also calculated. The mean value of the real parts of the eigenvalues of A and of B is computed for each trial. The mean values are displayed in the figure, from which we can see that most of the mean values for A are even smaller than the corresponding mean values for B; that is, most of the real parts of the eigenvalues of A are smaller than those of B. This indicates that the dynamic process characterized by dv/dt = Av will converge faster than the dynamic process characterized by dv/dt = Bv, which may explain the observation that, in our simulations, the proposed method usually converges faster than the method in Chatterjee et al. [Cha]. A further figure shows the difference mean(A) − mean(B); all the values are negative, which means that the means of the real parts of the eigenvalues of A are less than those of B.

Figure: comparison of the means of the real parts of the eigenvalues of A (the proposed method) and B (the method in [Cha]) in the same trials.

Figure: the difference of the mean real parts of the eigenvalues of A and B.

C. The Local On-Line Adaptive Rule

To get an adaptive on-line algorithm, we can again use the exponential window to estimate the terms of the batch rule. Thus we have Δc(n) and Δw(n) computed from the running estimates
H_1(v, n) = H_1(v, n−1) + α[y_1(n)x_1(n) − H_1(v, n−1)]
H_2(v, n) = H_2(v, n−1) + α[y_2(n)x_2(n) − H_2(v, n−1)]
f(v, n) = f(v, n−1) + α[y(n)² − f(v, n−1)]
C_w(v, n) = C_w(v, n−1) + α[y_1(n)y_2(n) − C_w(v, n−1)]

where α is a scalar between 0 and 1. The convergence of this rule can be related to the solution of its corresponding deterministic ordinary differential equation through stochastic approximation theory [Dia, Cha]. The number of multiplications required by the proposed method for the first two projections at each time instant n is …m, versus the …m² + …m required by the method of Chatterjee et al. [Cha]. Simulation results also show convergence when instantaneous values are used for H_1(v, n), H_2(v, n) and C_w(v, n), i.e.

H_1(v, n) = y_1(n)x_1(n), H_2(v, n) = y_2(n)x_2(n), C_w(v, n) = y_1(n)y_2(n)

with the exponential-window estimate kept for f(v, n).

Simulations

Two-dimensional zero-mean colored Gaussian signals are generated, with … samples each. The table compares the results of the numerical method with those of the proposed adaptive methods after … on-line iterations. In the first experiment, all the terms are estimated on-line by an exponential window with α = …; in the second, H_1, H_2 and C use instantaneous values while f(w, ·) and f(v, ·) remain the same. As an example, part (a) of the figure shows the adaptation process of the first experiment. Part (b) compares the convergence speed of the proposed method and the method in Chatterjee et al. [Cha] for the adaptation of v in batch mode when w is fixed (there are … trials, each with the same initial condition). The vertical axis is the minimum number of iterations needed for convergence, with the best step size obtained by exhaustive search. Convergence is claimed when the difference between J(v) and J(v*) is less than … for … consecutive iterations. Parts (c) and (d) respectively show a typical evolution of J(v) and of C in one of the trials, where the eigenvalues of the linearization matrices are −… ± …j, −…, −… for A of the proposed method and −…, −…, −… for B of the method in Chatterjee et al. [Cha]. A further figure shows the process of the batch mode
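Each running estimate above has the same leaky-average form Q(n) = Q(n−1) + α[q(n) − Q(n−1)], where q(n) is the instantaneous product being tracked. A minimal sketch of that exponential-window estimator (the function names and the α value are illustrative):

```python
def exp_window_update(est, sample, alpha):
    """One exponential-window (leaky average) step:
    est(n) = est(n-1) + alpha * (sample(n) - est(n-1)), with 0 < alpha < 1."""
    return est + alpha * (sample - est)

def estimate_correlation(pairs, alpha=0.1):
    """Track E[y1 * y2] on-line from a stream of (y1, y2) samples,
    as the rule does for the cross-correlation term C_w."""
    c = 0.0
    for y1, y2 in pairs:
        c = exp_window_update(c, y1 * y2, alpha)
    return c
```

Because the window forgets old samples geometrically, the estimate tracks slow changes in the statistics, which is what makes the on-line rule adaptive.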
rule.

Conclusion and Discussion

In this chapter, the relationship between the Hebbian rule and the energy of the output of a linear transform, and the relationship between the anti-Hebbian rule and the cross-correlation of two outputs connected by a lateral inhibitory connection, are discussed. We can see that an energy quantity is based on the relative position of each sample with respect to the mean of all samples; thus each sample can be treated independently, and an on-line adaptation rule is relatively easy to derive. The information potential and the cross information potential, by contrast, are based on the relative position of each pair of data samples, so an on-line adaptation rule for the information potential or the cross information potential is relatively difficult to obtain.

The information-theoretic formulation, and the formulation based on energy quantities, of the eigendecomposition and the generalized eigendecomposition are introduced. The energy-based formulation can be regarded as a special case of the information-theoretic formulation when the data are Gaussian distributed.

Based on the energy formulation of the eigendecomposition and the relationship between the energy criteria and the Hebbian and anti-Hebbian rules, we can understand Oja's rule, Sanger's rule and the APEX model in an intuitive and effective way. Starting from such an understanding, we propose a structure similar to the APEX model and an on-line local adaptive algorithm for the generalized eigendecomposition. The stability analysis of the proposed algorithm is given, and the simulations show the validity and the efficiency of the proposed algorithm.

Based on the information-theoretic formulation, we can generalize the concept of the eigendecomposition and the generalized eigendecomposition by using the entropy difference. For non-Gaussian data and nonlinear mappings, the information potential can be used to implement the entropy difference, searching for an optimal mapping such that the output of the
mapping will convey the most information about the first signal x_1(n) while containing the least information about the second signal x_2(n) at the same time. This can be regarded as a special case of "information filtering."

Table: COMPARISON OF RESULTS. J(v_1*) and J(v_2*) are the generalized eigenvalues; v_1* and v_2* are the corresponding normalized eigenvectors. Columns: numerical method, first experiment, second experiment.

Figure: (a) evolution of J(v_1) and J(v_2) in the first experiment; (b) comparison of convergence speed, in terms of the minimum number of iterations, over … trials; (c) typical adaptation curve of J(v) for the two methods with the same initial condition and the best step size; (d) typical adaptation curve of the cross-correlation C in the same trial as (c). In (b), (c) and (d) the solid lines represent the proposed method and the dashed lines the method in Chatterjee et al. [Cha].

Figure: the evolution process of the batch mode rule.

CHAPTER …
APPLICATIONS

Aspect Angle Estimation for SAR Imagery

Problem Description

The relative direction of a vehicle with respect to the radar sensor in SAR (synthetic aperture radar) imagery is normally called the aspect angle of the observation, and it is an important piece of information for vehicle recognition. The figure shows typical SAR images of a tank or military personnel carrier at different aspect angles.

Figure: SAR images of a tank with different aspect angles.

We are given some training data (both SAR images and the corresponding true aspect angles). The problem is to estimate the aspect angle of the vehicle in a testing SAR
image, based on the information given in the training data. This is a very typical problem of "learning from examples." As can be seen from the figure, the poor resolution of SAR, combined with speckle and the variability of scattering centers, makes the determination of the aspect angle of a vehicle from its SAR image a non-trivial problem. All the data in the experiments are from the MSTAR public release database [Ved].

Problem Formulation

Let us use X to denote a SAR image. In the MSTAR database [Ved], a target chip is usually … by … pixels, so X can be regarded as a vector with dimension … × …. Alternatively, we can use only the center region of x, since a target is located in the center of each image in the MSTAR database. Let us use A to denote the aspect angle of a target SAR image. Then the given training data set can be denoted by {(x_i, a_i) | i = 1, …, N} (the upper-case X and A represent random variables; the lower-case x and a represent their samples). In general, for a given image x, the aspect angle estimation problem can be formulated as a maximum a posteriori probability (MAP) problem:

â = argmax_a f_{A|X}(a | x) = argmax_a f_{AX}(x, a) / f_X(x) = argmax_a f_{AX}(x, a)

where â is the estimate of the true aspect angle, f_{A|X}(a | x) is the a posteriori probability density function (pdf) of the aspect angle A given X, f_X(x) is the pdf of the image X, and f_{AX}(x, a) is the joint pdf of the image X and the aspect angle A. So the key issue here is to estimate the joint pdf f_{AX}(x, a). However, the very high dimensionality of the image variable X makes it very difficult to obtain a reliable estimate, and dimensionality reduction (or feature extraction) becomes necessary. An "information filter" y = q(x, w) (where w is the parameter set) is needed such that, when an image x is the input, its output y conveys the most information about the aspect angle and discards all the other, irrelevant information. Such an output is the feature for the aspect angle. Based on this feature variable Y, the aspect angle estimation problem can be reformulated by the same MAP
strategy:

â = argmax_a f_{AY}(y, a),  y = q(x, w)

where f_{AY}(y, a) is the joint pdf of the feature Y and the aspect angle A.

The crucial point of this aspect angle estimation scheme is how good the feature Y turns out to be. Actually, the problem of reliable pdf estimation in a high-dimensional space is now converted into the problem of building a reliable aspect angle "information filter" from the given training data set alone. To achieve this goal, the mutual information is used, and the problem of finding an optimal "information filter" can be formulated as

w_optimal = argmax_w I(Y = q(X, w), A)

that is, to find the optimal parameter set w_optimal such that the mutual information between the feature Y and the angle A is maximized. To implement this idea, the quadratic mutual information based on the Euclidean distance, I_ED, and its corresponding cross information potential, V_ED, between the feature Y and the angle A will be used. There is no assumption made on either the data or the "information filter"; the only thing used here is the training data set itself. In the experiments, it is found that a linear mapping with two outputs is good enough for the aspect angle information filter. The system diagram is shown below.

Figure: system diagram for the aspect angle information filter.

One may notice that the joint pdf f_{AY}(y, a) is a natural "by-product" of this scheme: the cross information potential is based on the Parzen window estimate of the joint pdf f_{AY}(y, a), so there is no need to estimate this joint pdf again by any other method. Since the angle variable A is periodic (e.g. 0 and 360 degrees should be the same), all the angles are put on the unit circle, i.e. the following transformation is used:

A_1 = cos(A), A_2 = sin(A)

So the actual angle variable used is the two-dimensional variable Ā = (A_1, A_2).

In the experiments, it is also found that the discrimination between two angles 180 degrees apart is very difficult. Actually, it can be seen from the figure that it is difficult to
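The unit-circle transformation above, together with the angle-doubling variant introduced next for the 180-degree ambiguity, can be sketched as follows (function names are illustrative; the fold180 flag implements the cos(2A), sin(2A) encoding):

```python
import math

def angle_to_circle(a_deg, fold180=False):
    """Map an angle in degrees onto the unit circle; with fold180=True the
    angle is doubled first, so a and a + 180 map to the same point."""
    k = 2.0 if fold180 else 1.0
    r = math.radians(k * a_deg)
    return math.cos(r), math.sin(r)

def circle_to_angle(a1, a2, fold180=False):
    """Invert the encoding; with fold180=True the recovered angle is the
    estimated angle divided by two, in [0, 180)."""
    deg = math.degrees(math.atan2(a2, a1)) % 360.0
    return (deg / 2.0) % 180.0 if fold180 else deg
```

With fold180=True, an angle of 190 degrees encodes to the same circle point as 10 degrees and decodes back to 10, which is exactly the behavior the text describes for the 180-degree-ignored experiments.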
tell where the front and where the back of a vehicle is, although the overall direction of the vehicle is clear to our eyes. Most of the experiments therefore estimate the angle only within 180 degrees (an angle and the same angle plus 180 degrees are treated as equal). Actually, the following transformation is used in this case:

A_1 = cos(2A), A_2 = sin(2A)

and the actual angle variable is again Ā = (A_1, A_2). Correspondingly, the estimated angles are divided by two.

Since the joint pdf is the Parzen estimate

f_{AY}(y, a) = (1/N) Σ_i G(y − y_i, σ_y²) G(a − a_i, σ_a²)

where σ_y² is the variance of the Gaussian kernel for the feature Y, σ_a² is the variance of the Gaussian kernel for the actual angle Ā, and all the angle data a_i lie on the unit circle, the search for the optimal angle â = argmax_a f_{AY}(y, a), y = q(x, w), can be implemented by scanning the unit circle in the (A_1, A_2) plane. The real estimated angle is then â, or â/2 for the case where the 180-degree difference is ignored.

Experiments of Aspect Angle Estimation

There are three classes of vehicles, with some different configurations; in total there are … different vehicle types: BMP_C…, BMP_…, BMP_…, BTR_C…, T_… and T_S….

To use the ED-CIP to implement the mutual information, the kernel sizes σ_y and σ_a have to be determined. The experiments show that the training process and the performance
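The scan over the unit circle described above can be sketched directly from the Parzen form of f_AY: evaluate the kernel sum at candidate circle points and keep the argmax. This is only an illustration (the kernel variances and the one-degree scan step are assumptions, not the dissertation's settings):

```python
import math

def gauss(d2, sigma2):
    """Unnormalised Gaussian kernel of a squared distance d2."""
    return math.exp(-d2 / (2.0 * sigma2))

def map_angle(y, feats, angles_deg, sig_y2=0.01, sig_a2=0.01, step=1.0):
    """Scan the unit circle for the angle maximising the Parzen estimate of
    the joint pdf f_AY(y, a) built from training (feature, angle) pairs."""
    best_a, best_p = 0.0, -1.0
    a = 0.0
    while a < 360.0:
        ca, sa = math.cos(math.radians(a)), math.sin(math.radians(a))
        p = 0.0
        for (f1, f2), t in zip(feats, angles_deg):
            ct, st = math.cos(math.radians(t)), math.sin(math.radians(t))
            dy2 = (y[0] - f1) ** 2 + (y[1] - f2) ** 2  # feature kernel
            da2 = (ca - ct) ** 2 + (sa - st) ** 2      # angle kernel, on the circle
            p += gauss(dy2, sig_y2) * gauss(da2, sig_a2)
        if p > best_p:
            best_a, best_p = a, p
        a += step
    return best_a
```

Because both kernels live in low-dimensional spaces (the two-output feature and the circle), the scan is cheap even though the original image space is huge, which is the point of the information-filter reformulation.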
WKH WUDLQLQJ GDWD VHW 7KH OHIW JUDSK VKRZV WKH RXWSXW GDWD GLVWULEXWLRQ IRU ERWK WUDLQLQJ DQG WHVWLQJ GDWD ,W FDQ EH VHHQ WKDW WKH WUDLQLQJ GDWD IRUP D FLUFOH WKH EHVW ZD\ WR UHSUHVHQW DQJOHV 7KH WHVWLQJ LPDJHV DUH ILUVW IHG LQWR WKH LQIRUPDWLRQ ILOWHU WR REWDLQ WKH IHDWXUHV 7KH WULDQJOHV LQ WKH OHIW JUDSK RI )LJXUH LQGLFDWH WKHVH IHDWXUHV 7KH DVSHFW DQJOHV DUH WKHQ HVWLPDWHG DFFRUGLQJ WR WKH PHWKRG GHVFULEHG DERYH 7KH ULJKW JUDSK VKRZV PAGE 153 WKH FRPSDULVRQ EHWZHHQ WKH HVWLPDWHG DQJOHV WKH GRWV LQGLFDWHG E\ [f DQG WKH WUXH YDOXH VROLG OLQHf WKH WHVWLQJ LPDJH DUH VRUWHG DFFRUGLQJ WR WKHLU WUXH DVSHFW DQJOHVf 2XWSXW GDWD DQJOH IHDWXUHf GLVWULEXWLRQ HVWLPDWHG DQJOH DQG WUXH YDOXH 'LDPRQGWUDLQLQJ GDWD 7ULDQJOHWHVWLQJ GDWD VROLG OLQHf )LJXUH 7UDLQLQJ %03B& GHJUHHf 7HVWLQJ %03B& GHJUHHf (UURU 0HDQ GHJUHHf (UURU 'HYLDWLRQ GHJUHHf 2XWSXW GDWD DQJOH IHDWXUHf GLVWULEXWLRQ HVWLPDWHG DQJOH DQG WUXH YDOXH 'LDPRQGWUDLQLQJ GDWD 7ULDQJOHWHVWLQJ GDWD VROLG OLQHf GLIIHUHQFH LV LJQRUHGf )LJXUH 7UDLQLQJ %03B& GHJUHHf 7HVWLQJ 7B6 GHJUHHf (UURU 0HDQ GHJUHHf (UURU 'HYLDWLRQ GHJUHHf PAGE 154 )LJXUH VKRZV WKH UHVXOW RI WKH WUDLQLQJ RQ WKH VDPH %03B& YHKLFOH EXW WKH DQJOH UDQJH LV IURP WR GHJUHH 7HVWLQJ LV GRQH RQ WKH VDPH %03B& ZLWKLQ WKH VDPH DQJOH UDQJH WR f EXW DOO WKH WHVWLQJ GDWD DUH QRW LQFOXGHG LQ WKH WUDLQLQJ GDWD VHW $V FDQ EH VHHQ WKH UHVXOWV EHFRPH ZRUVH GXH WR WKH GLIILFXOW\ RI WHOOLQJ WKH GLIIHUHQFH EHWZHHQ WZR LPDJHV ZLWK GHJUHH DQJOH GLIIHUHQFH 7KH ILJXUH DOVR VKRZV WKDW WKH PDMRU HUURU RFFXUV ZKHQ GHJUHH GLIIHUHQFH FDQ QRW EH FRUUHFWO\ UHFRJQL]HG 7KH ELJ HUURUV LQ WKH ILJXUH DUH DERXW GHJUHHf )LJXUH VKRZV WKH UHVXOW RI WUDLQLQJ RQ WKH SHUVRQQHO FDUULHU %03B& ZLWKLQ WKH UDQJH RI GHJUHH EXW WHVWLQJ RQ WKH WDQN 7B6 ZLWKLQ WKH VDPH UDQJH GHJUHHf 7KH WDQN LV TXLWH GLIIHUHQW IURP WKH SHUVRQQHO FDUULHU EHFDXVH WKH WDQN KDV D FDQQRQ EXW WKH FDUULHU KDVQfW 7KH JRRG UHVXOW LQGLFDWH WKH UREXVWQHVV DQG WKH JRRG JHQHUDOL]DWLRQ DELOLW\ RI WKH PHWKRG 7KH IROORZLQJ WZR H[SHULPHQWV ZLOO 
IXUWKHU JLYH XV DQ RYHUDOO LGHD RQ WKH SHUIRUPDQFH RI WKH PHWKRG DQG WKH\ IXUWKHU FRQILUP WKH UREXVWQHVV DQG WKH JRRG JHQHUDOLn ]DWLRQ DELOLW\ RI WKH PHWKRG ,QVSLUHG E\ WKH UHVXOW RI WKH PHWKRG ZH DSSO\ WKH WUDGLWLRQDO 06( FULWHULRQ E\ SXWWLQJ WKH GHVLUHG DQJOHV LQ WKH XQLW FLUFOH LQ WKH VDPH ZD\ DV WKH DERYH 7KH UHVXOWV DUH VKRZQ EHOORZ IURP ZKLFK ZH FDQ VHH WKDW ERWK PHWKRGV KDYH D FRPSDWLEOH SHUIRUPDQFH EXW ('&,3 PHWKRG FRQYHUJHV IDVWHU WKDQ WKH 06( PHWKRG ,Q WKH H[SHULPHQW WKH WUDLQLQJ LV EDVHG RQ LPDJHV IURP %03B& ZLWKLQ WKH UDQJH RI GHJUHHV 7KH UHVXOWV DUH VKRZQ LQ 7DEOH 7KH WHVWLQJ VHW fEPSBFBWOf PHDQV WKH YHKLFOH EPSBF ZLWKLQ WKH UDQJH RI GHJUHH EXW QRW LQFOXGHG LQ WKH WUDLQLQJ GDWD VHW WKH VHW fEPSBF Wf PHDQV WKH YHKLFOH EPSBF ZLWKLQ WKH UDQJH RI GHJUHH EXW WKH GHJUHH GLIIHUHQFH LV LJQRUHG LQ WKH HVWLPDWLRQ WKH VHW fWBBWUf PHDQV WKH YHKLFOH W ZKLFK ZLOO EH XVHG IRU WUDLQLQJ LQ WKH H[SHULPHQW PAGE 155 WKH VHW fWBBWHf PHDQV WKH YHKLFOH W EXW QRW LQFOXGHG LQ WKH VHW fWBBWUf 7DEOH 7KH 5HVXOW RI ([SHULPHQW 7UDLQLQJ RQ EPSBFBWU LPDJHVf f 9HKLFOH 5HVXOWV ('&,3f HUURU PHDQ HUURU GHYLDWLRQf 5HVXOWV 06(f HUURU PHDQ HUURU GHYLDWLRQf EPSBF WU f H Hf EPSBFBWO f f EPSBFBW f f W WU f f W WH f f EPSB f f EPSB f f EWUBF f f WBV f f 7DEOH 7KH 5HVXOW RI ([SHULPHQW 7UDLQLQJ RQ EPSBFBWU DQG W WU f 9HKLFOH 5HVXOWV ('&,3f HUURU PHDQ HUURU GHYLDWLRQf 5HVXOWV 06(f HUURU PHDQ HUURU GHYLDWLRQf EPSBFBWU f f EPSBF WH f f W BWU f f WBBWH f f EPSB f f EPSB f f EWUBF f f WBV f f PAGE 156 ,Q ([SHULPHQW WUDLQLQJ LV EDVHG WKH GDWD VHW fEPSBF OBWUf DQG WKH GDWD VHW fWBBWUf 7KH H[SHULPHQWDO UHVXOWV DUH VKRZQ LQ 7DEOH IURP ZKLFK ZH FDQ VHH WKH LPSURYHPHQW RI WKH SHUIRUPDQFH ZKHQ PRUH YHKLFOHV DQG PRUH GDWD DUH LQFOXGHG LQ WKH WUDLQLQJ SURFHVV 0RUH H[SHULPHQWDO UHVXOWV FDQ EH IRXQG LQ WKH SDSHU >;X'@ DQG WKH UHSRUWV RI WKH '$53$ SURMHFW RQ ,PDJH 8QGHUVWDQGLQJ WKH UHSRUWV FDQ EH IRXQG LQ WKH ZHE VLWH fKWWS ZZZFQHOXIOHGXaDWUf )URP WKH H[SHULPHQW UHVXOWV ZH FDQ VHH WKDW WKH HUURU PHDQ LV DURXQG 
GHJUHH 7KLV LV UHDVRQDEOH EHFDXVH WKH DQJOHV RI WKH WUDLQLQJ GDWD DUH DSSUR[Ln PDWHO\ GHJUHHV DSDUW EHWZHHQ WKH QHLJKERULQJ DQJOHV 2XWSXW GDWD DQJOH IHDWXUHf GLVWULEXWLRQ 'LDPRQGWUDLQLQJ GDWD 7ULDQJOHWHVWLQJ GDWD HVWLPDWHG DQJOH DQG WUXH YDOXH VROLG OLQHf )LJXUH 2FFOXVLRQ 7HVW ZLWK %DFNJURXQG 1RLVH 7KH LPDJHV FRUUHVSRQGLQJ WR Df Ef Ff Gf Hf DQG If DUH VKRZQ LQ )LJXUH PAGE 157 )LJXUH 7KH RFFOXGHG LPDJHV FRUUHVSRQGLQJ WR WKH SRLQWV LQ )LJXUH PAGE 158 2FFOXVLRQ 7HVW RQ $VSHFW $QJOH (VWLPDWLRQ 7R IXUWKHU WHVW WKH UREXVWQHVV DQG WKH JHQHUDOL]DWLRQ DELOLW\ RI WKH PHWKRG RFFOXVLRQ WHVWV DUH FRQGXFWHG ZKHUH WKH WHVWLQJ LQSXW 6$5 LPDJHV DUH FRQWDPLQDWHG E\ EDFNJURXQG QRLVH RU WKH YHKLFOH LPDJH LV RFFOXGHG E\ WKH 6$5 LPDJH RI WUHHV )LJXUH VKRZV WKH UHVXOW RI f2FFOXVLRQ 7HVWf ZKHUH D VTXDUHG ZLQGRZ ZLWK EDFNn JURXQG QRLVH HQODUJHV JUDGXDOO\ XQWLO DOO WKH LPDJH LV RFFOXGHG DQG UHSODFHG E\ WKH EDFNn JURXQG QRLVH DV VKRZQ LQ )LJXUH DQG )LJXUH )LJXUH VKRZV WKH RFFOXGHG LPDJHV FRUUHVSRQGLQJ WR WKH SRLQWV LQ )LJXUH :H FDQ VHH WKDW HYHQ ZKHQ WKH PRVW SDUW RI WKH WDUJHW LV RFFOXGHG WKH HVWLPDWLRQ LV VWLOO JRRG ZKLFK VLPSO\ YHULILHV WKH UREXVWQHVV DQG WKH JHQHUDOL]DWLRQ DELOLW\ RI WKH PHWKRG :KHQ WKH RFFOXGLQJ VTXDUH HQODUJHV WKH RXWn SXW SRLQW IHDWXUH SRLQWf JRHV DZD\ IURP WKH FLUFOH EXW WKH GLUHFWLRQ LV HVVHQWLDOO\ SHUSHQn GLFXODU WR WKH FLUFOH ZKLFK PHDQV WKH QHDUHVW SRLQW LQ WKH FLUFOH LV HVVHQWLDOO\ XQFKDQJHG DQG WKH HVWLPDWLRQ RI WKH DQJOH EDVLFDOO\ UHPDLQV WKH VDPH )LJXUH 6$5 ,PDJH RI 7UHHV 7KH VTXDUHG UHJLRQ ZDV FXW IRU WKH RFFOXVLRQ SXUSRVH PAGE 159 )LJXUH LV D 6$5 LPDJH RI WUHHV 2QH UHJLRQ ZDV FXW WR RFFOXGH WKH WDUJHW LPDJHV WR VHH KRZ UREXVW WKH PHWKRG LV DQG KRZ JRRG WKH JHQHUDOL]DWLRQ FDQ EH PDGH E\ WKH PHWKRG $V VKRZQ LQ )LJXUH DQG )LJXUH WKH FXW UHJLRQ RI WUHHV LV VOLG RYHU WKH WDUJHW LPDJH IURP WKH ORZHU ULJKW FRPHU WR WKH XSSHU OHIW FRPHU 7KH RFFOXVLRQ LV PDGH E\ DYHUn DJLQJ WKH RYHUODSSHG WDUJHW SL[HOV DQG WUHH SL[HOV )LJXUH VKRZV WZR SDUWLFXODU RFFOXn VLRQV LQ WKH 
ULJKW RQH RI ZKLFK WKH PRVW SDUW RI WKH WDUJHW LV RFFOXGHG EXW WKH HVWLPDWLRQ LV VWLOO JRRG )LJXUH VKRZV WKH RYHUDOO UHVXOWV ZKHQ VOLGLQJ WKH RFFOXVLRQ VTXDUH UHJLRQ 2QH PD\ QRWLFH WKDW WKH UHVXOW JHWV EHWWHU ZKHQ WKH ZKROH LPDJH LV RYHUODSSHG E\ WKH WUHH LPDJH 7KH H[SODQDWLRQ LV WKDW WKH RFFOXVLRQ LV WKH DYHUDJH RI ERWK WKH WDUJHW SL[HOV DQG WKH WUHH SL[HOV LQ WKLV FDVH DQG WKH FHQWHU UHJLRQ RI WKH WUHH LPDJH KDV VPDOO SL[HO YDOXHV ZKLOH WKH FHQWHU UHJLRQ RI WKH WDUJHW LPDJH KDV ODUJH SL[HO YDOXHV WKHUHIRUH ZKHQ WKH ZKROH WDUn JHW LPDJH LV RYHUODSSHG E\ WKH WUHH LPDJH WKH RFFOXVLRQ RI WKH WDUJHW WKH FHQWHU UHJLRQ RI WKH WDUJHW LPDJHf EHFRPHV HYHQ OLJKWHU 2XWSXW GDWD DQJOH IHDWXUHf GLVWULEXWLRQ 'LDPRQGfÂ§WUDLQLQJ GDWD 7ULDQJOHfÂ§WHVWLQJ GDWD HVWLPDWHG DQJOH DQG WUXH YDOXH VROLG OLQHf )LJXUH 2FFOXVLRQ 7HVW ZLWK 6$5 ,PDJH RI 7UHHV 7KH LPDJHV FRUUHVSRQGLQJ WR WKH SRLQWV Df DQG Ef DUH VKRZQ LQ )LJXUH 7KH LPDJHV FRUUHVSRQGLQJ WR WKH SRLQWV Ff DQG Gf DUH VKRZQ LQ )LJXUH PAGE 160 (VWLPDWHG $QJOH RR (VWLPDWHG $QJOH L L n L fÂ§LfÂ§LfÂ§LfÂ§@fÂ§LfÂ§LfÂ§U )LJXUH 2FFOXVLRQ ZLWK 6$5 ,PDJH RI 7UHHV 2XWSXW GDWD GLVWULEXWLRQ 'LDPRQG WUDLQLQJ GDWD 7ULDQJOH WHVWLQJ GDWDf 8SSHU ,PDJHV DUH RFFOXGHG LPDJHV /RZHU ,PDJHV VKRZ WKH RFFOXGHG UHJLRQV 7KH WUXH DQJOH LV fÂ§ (VWLPDWHG $QJOH (VWLPDWHG $QJOH )LJXUH 2FFOXVLRQ ZLWK 6$5 ,PDJH RI 7UHHV 2XWSXW GDWD GLVWULEXWLRQ 'LDPRQG WUDLQLQJ GDWD 7ULDQJOH WHVWLQJ GDWDf 8SSHU ,PDJHV DUH RFFOXGHG LPDJHV /RZHU ,PDJHV VKRZ WKH RFFOXGHG UHJLRQV 7KH WUXH DQJOH LV PAGE 161 $XWRPDWLF 7DUJHW 5HFRJQLWLRQ L$75W ,Q WKLV VHFWLRQ ZH ZLOO VHH KRZ LPSRUWDQW WKH PXWXDO LQIRUPDWLRQ ZLOO EH IRU WKH SHUn IRUPDQFH RI SDWWHUQ UHFRJQLWLRQ DQG KRZ WKH FURVV LQIRUPDWLRQ SRWHQWLDO FDQ EH DSSOLHG WR DXWRPDWLF WDUJHW UHFRJQLWLRQ RI 6$5 ,PDJHU\ )LUVW OHWfV ORRN DW WKH ORZHU ERXQG RI UHFRJQLWLRQ HUURU VSHFLILHG E\ )DQRfV LQHTXDOLW\ >)LV@ 3Fr Ff +V^F?\f? 
log(N_c), where c is the variable for the identity of the classes, y is the feature variable on which the classification is based, N_c denotes the number of classes, and H_S(c|y) is Shannon's conditional entropy of c given y. Fano's inequality means that the classification error is lower-bounded by a quantity determined by the conditional entropy of the class identity given the recognition feature y. By a simple manipulation we get

(H_S(c) − I_S(c, y) − 1) / log(N_c) ≤ P(ĉ ≠ c)

which means that, to minimize the lower bound on the error probability, the mutual information between the class identity c and the feature y should be maximized.

Problem Description and Formulation

Let us use X to denote the variable for target images and C to denote the variable for the class identity. We are given a set of training images and their corresponding class identities {(x_i, c_i) | i = 1, …, N}. A classifier needs to be established based only on this training data set, such that, given a target image x, it can classify the image. Again, the problem can be formulated as a MAP problem:

ĉ = argmax_c P_{C|X}(c | x) = argmax_c f_{CX}(x, c)

where P_{C|X}(c | x) is the a posteriori probability of the class identity C given the image X, and f_{CX}(x, c) is the joint pdf of the image X and the class identity C. So, similarly, the key issue here is to estimate the joint pdf f_{CX}(x, c).
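The rearranged Fano bound above can be evaluated directly once H_S(c), I_S(c, y) and the number of classes are known. A minimal sketch (base-2 entropies are assumed here purely for illustration; the bound is clipped at zero since a probability cannot be negative):

```python
import math

def fano_lower_bound(h_c, mi_cy, n_classes):
    """Fano-style lower bound on the error probability:
    P(error) >= (H(c) - I(c; y) - 1) / log2(n_classes), clipped at 0."""
    return max(0.0, (h_c - mi_cy - 1.0) / math.log2(n_classes))
```

For example, with H(c) = 3 bits, I(c; y) = 1 bit and 8 classes, the bound is (3 − 1 − 1)/3 = 1/3; raising the mutual information shrinks the bound, which is the motivation for maximizing I(c, y) when training the feature extractor.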
However, the very high dimensionality of the image variable X makes it very difficult to obtain a reliable estimate, and dimensionality reduction (or feature extraction) is again necessary. An "information filter" y = q(x, w) (where w is the parameter set) is needed such that, when an image x is its input, its output y conveys the most information about the class identity and discards all the other, irrelevant information. Such an output is the feature for classification. Based on the classification feature y, the classification problem can be reformulated by the same MAP strategy:

ĉ = argmax_c f_{CY}(y, c),  y = q(x, w)

where f_{CY}(y, c) is the joint pdf of the classification feature Y and the class identity C. As in the aspect angle estimation problem, the crucial point of this classification scheme is how good the classification feature Y is. Actually, the problem of reliable pdf estimation in a high-dimensional space is converted into the problem of building a reliable "information filter" for classification based only on the given training data set. To achieve this goal, the mutual information is used as the information measure, as also suggested by Fano's inequality, and the problem of finding an optimal "information filter" can be formulated as

w_optimal = argmax_w I(q(X, w), C)

that is, to find the optimal parameter set w_optimal such that the mutual information between the classification feature and the class identity C is maximized. To implement this idea, the quadratic mutual information based on the Euclidean distance, I_ED, and its corresponding cross information potential, V_ED, will be used again. There is no assumption made on either the data or the "information filter"; the only thing used here is the training data set itself. In the experiments, it is found that a linear mapping with … outputs for the … classes is good enough for the classification of such high-dimensional images. The system diagram is shown in the figure.
Figure: system diagram for the classification information filter; the image X is passed through the filter, the cross information potential field is computed with the class identity C, and the information forces are back-propagated to adjust the filter.

The joint pdf f_{CY}(y, c) is estimated by the Parzen window over the training samples: only the kernels centered at samples whose class identity c_i equals c contribute, and the others contribute zero. So there is no need to estimate the joint pdf f_{CY}(y, c) again by any other method. The ED-QMI information force in this particular case can be interpreted as repulsion among the "information particles" (IPTs) with different class identities, and mutual attraction among the IPTs within the same class. Based on the joint pdf f_{CY}(y, c), the Bayes classifier can be built up:

ĉ = argmax_c f_{CY}(y, c),  y = q(x, w)

Since the class identity variable C is discrete, the search for the maximum can simply be implemented by comparing each value of f_{CY}(y, c).

Experiment and Result

The experiment is conducted on the MSTAR database [Ved]. There are three classes (vehicles): BMP, BTR and T. For each one there are some different configurations (subclasses), as listed below; there are also … types of confuser.

BMP: BMP_C…, BMP_…, BMP_…. BTR: BTR_C…. T: T_…, T_…S, T_…. Confuser: S….

The training data set is composed of … types of vehicle (BMP_C…, BTR_C… and T_…) with a depression angle of … degrees; all the testing data have a depression angle of … degrees. The classifier is built within an aspect angle range of … degrees. The final goal is to combine the result of the aspect angle estimation with the target recognition, such that, with the aspect angle information, the difficult overall recognition task (with all aspect angles) can be divided and conquered. Since a SAR image of a target is based on the reflection of the target, different aspect angles may result in quite different characteristics in SAR imagery, so organizing classifiers according to aspect angle information is a good strategy. The figure shows the images used for training.

The classification feature extractor has three outputs. For illustration purposes, … outputs are used in the figures to show the output data distribution. The first figure shows the initial state, with the classes mixed up. The next figure shows the result after several
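The class-conditional Parzen estimate and the discrete MAP decision described above can be sketched in a few lines. This is an illustrative toy (the kernel variance and the cluster data are assumptions), not the dissertation's trained system:

```python
import math

def joint_pdf(y, c, train, sigma2=0.25):
    """Parzen estimate of f_CY(y, c): Gaussian kernels are summed only over
    training samples whose class identity equals c (zero otherwise)."""
    total = 0.0
    for (f1, f2), ci in train:
        if ci == c:
            d2 = (y[0] - f1) ** 2 + (y[1] - f2) ** 2
            total += math.exp(-d2 / (2.0 * sigma2))
    return total / len(train)

def classify(y, train, classes, sigma2=0.25):
    """MAP decision over a discrete class set: since C is discrete, the argmax
    is found simply by comparing f_CY(y, c) for each candidate class."""
    return max(classes, key=lambda c: joint_pdf(y, c, train, sigma2))
```

Because the classes are discrete, no scan is needed at all; the decision is a direct comparison of a handful of kernel sums, which is what makes this Bayes classifier cheap at test time.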
iterations, where the classes are starting to separate. The final figure shows the output data distribution at the final stage of the training, where the classes are clearly separated and each class tends to shrink toward a single point.

Figure: the SAR images of the three vehicles used for training the classifier (… degrees).

Figure: initial output data distribution for classification. Left graph: lines illustrate the "information forces"; right graph: detailed distribution.

Figure: intermediate output data distribution for classification (same layout).

Figure: output data distribution at the final stage of training (same layout).

Table: confusion matrix for classification by ED-CIP, with rows BMP_C…, BMP_…, BMP_…, BTR_C…, T_…, T_…, T_S… and columns BMP, BTR, T.

The table shows the classification result. Even with a limited number of training data, the classifier shows a very good generalization ability. By setting a threshold to allow …% rejection, a detection test is further conducted on all these data and on the data for two other confusers; a good result is shown in the corresponding table. These results are obtained with kernel size σ_y = … and step size …. As a comparison, two further tables give the corresponding results of the support vector machine (more detailed results are presented in the Image Understanding workshop [Pri]), from which we can see that the classification result of ED-CIP is even better than that of the support vector machine.

Table: confusion matrix for detection with detection probability … (ED-CIP), including a reject column.

Table: confusion matrix for classification by the support vector machine (SVM).

Table: confusion matrix for detection with detection probability … (SVM), including a reject column.

Training MLP Layer-by-Layer with CIP
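The detection test described above augments the confusion matrix with a reject decision driven by a threshold on the classifier's score. A minimal sketch of that bookkeeping (the labels, scores and threshold are hypothetical):

```python
def confusion_with_reject(pairs, scores, threshold):
    """Build a confusion count dict keyed by (true, decided), where a
    prediction whose score falls below the threshold is mapped to 'reject'."""
    counts = {}
    for (true_c, pred_c), s in zip(pairs, scores):
        decided = pred_c if s >= threshold else "reject"
        counts[(true_c, decided)] = counts.get((true_c, decided), 0) + 1
    return counts
```

Raising the threshold trades errors for rejections, which is how a fixed rejection rate (as in the detection tables) is obtained on held-out and confuser data.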
During the first neural network era, which ended in the 1960s, there was only Rosenblatt's algorithm [Ros] to train the one-layer perceptron, and there was no known algorithm to train MLPs. However, the much higher computational power of the MLP compared with the perceptron was recognized in that period [Min]. In the late 1980s, the back-propagation algorithm was introduced to train MLPs, contributing to the revival of neural computation. Ever since, the back-propagation algorithm has been used almost exclusively to train MLPs, to the point that some researchers even confuse the network topology with the training algorithm by calling MLPs "back-propagation networks." It has been widely accepted that training the hidden layers requires back-propagation of errors from the output layers.

As pointed out in an earlier chapter, Linsker's InfoMax can be extended to a more general case. The MLP network can be regarded as a communication channel, or "information filter," at each layer. The goal of training such a network is to transmit as much information about the desired signal as possible at the output of each layer. This can be implemented by maximizing the mutual information between the output of each layer and the desired signal. Notice that we are not using back-propagation of errors across layers: the network is trained incrementally, in a strictly feedforward way, from the input layer to the output layer. This may seem impossible, since we are not using the information of the top layer to train the input layer; training in this way simply guarantees that the maximum possible information about the desired signal is transferred from the input layer to each layer. The cross information potential can provide an explicit, immediate response to each network layer, without the need to back-propagate from the output layer.

To test the method, the "frequency doubler" problem is selected, which is representative of nonlinear temporal processing. The
input signal is a sine wave, and the desired output signal is also a sine wave but with the frequency doubled, as shown in the figure. A focused TDNN with one hidden layer is used: one input node with … delay taps, two nodes in the hidden layer with the tanh nonlinearity, and one linear output node, as shown in the figure. The ED-QMI (ED-CIP) is used for training. The hidden layer is trained first, followed by the output layer. The training curves are shown in the figure. The outputs of the hidden nodes and of the output node after training are shown in the figure, which tells us that the frequency of the final output is doubled. The kernel sizes for the training of both the hidden layer and the output layer are σ = … for the output of each layer and σ_d = … for the desired signal.

This problem can also be solved with the MSE criterion and the BP algorithm, and the error may be smaller. So the point here is not to use CIP as a substitute for BP in MLP training; it is an illustration that the BP algorithm is not the only possible way to train networks with hidden layers. From the experimental results we can see that, even without the involvement of the output layer, CIP can still guide the hidden layer to learn what is needed. The plot of the two hidden node outputs already reveals the doubled frequency, which means the hidden nodes give the best representation of the desired output obtainable from the transformation of the input; the output layer simply selects what is needed. These results, on the other hand, further confirm the validity of the proposed CIP method.

From the training curves we can see sharp increases in CIP, which suggests that the step size should be varied and adapted during the training process. How to choose the kernel size of the Gaussian function in the CIP method is still an open problem; for these results, it was determined experimentally.

Figure: TDNN as a frequency doubler.

Figure: training curves, CIP versus iterations.

Figure panels: first hidden node; second hidden node; the two hidden node outputs plotted together; the output of the network
BLIND SOURCE SEPARATION AND INDEPENDENT COMPONENT ANALYSIS

Problem Description and Formulation

Blind source separation is a specific case of ICA. The observed data X = AS is a linear mixture (A in R^{m x m}, nonsingular) of independent source signals S = (S_1, ..., S_m)^T, with the S_i independent of each other. There is no further information about the sources or the mixing matrix; this is why it is called "blind." The problem is to find a projection W in R^{m x m}, Y = WX, so that Y = S up to a permutation and scaling. Comon [Com] and Cao and Liu [Cao], among others, have shown that this result is obtained for a linear mixture when the outputs are independent of each other. Based on the IP or CIP criteria, the problem can be restated as finding a projection W in R^{m x m}, Y = WX, such that the IP is minimized (maximum quadratic entropy) or the CIP is minimized (minimum QMI). The different cases are discussed in the following sections.

Figure: The system diagram for BSS with IP or CIP (the information forces are backpropagated through the demixing network).

Blind Source Separation with CS-QMI (CS-CIP)

As introduced in Chapter 2, CS-QMI can be used as an independence measure. Its corresponding cross information potential, CS-CIP, is used here for blind source separation. For ease of illustration, only the two-source, two-sensor problem is tested. Two experiments are presented.

Experiment 1 tests the performance of the method on a very sparse data set. Two different colored Gaussian noise segments are used as sources, with a small number of data points in each segment. The data distributions of the source, mixed and recovered signals are plotted in the corresponding figure, and the training curve shows how the SNR of the demixing-mixing product matrix WA changes with iteration. Both figures show that the method works well.

Figure: Data distributions for Experiment 1 (source, mixed and recovered signals). Figure: Training curve for Experiment 1, SNR (dB) vs. iterations.
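The experiments report performance as the SNR of the demixing-mixing product P = WA. The dissertation does not spell out the formula here, so the sketch below uses one common convention (my assumption, not necessarily the exact metric used): for each row of P, treat the largest-magnitude entry as the recovered source and everything else as crosstalk.

```python
import numpy as np

def demixing_snr_db(W, A):
    """Average per-row SNR (dB) of the product P = W @ A.

    For perfect separation P is a scaled permutation, so each row has a single
    nonzero entry and the SNR diverges; a small floor keeps the value finite.
    """
    P = np.asarray(W, float) @ np.asarray(A, float)
    snrs = []
    for row in P:
        p2 = row ** 2
        sig = p2.max()                 # dominant (recovered) source power
        noise = p2.sum() - sig         # residual crosstalk power
        snrs.append(10 * np.log10(sig / max(noise, 1e-12)))
    return float(np.mean(snrs))
```

For example, with W equal to the inverse of the mixing matrix the metric saturates at a very high value, while an arbitrary W typically scores near 0 dB.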
Experiment 2 uses two speech signals from the TIMIT database as source signals. The mixing matrix is chosen so that the two mixing directions are similar, which makes the problem harder. Whitening is first performed on the mixed signals. An on-line implementation is tried in this experiment, in which a short-time window slides over the speech data. At each window position, the speech data within the window are used to calculate the CS-CIP related forces, which are backpropagated to adjust the demixing matrix. As the window slides, all the speech data contribute to the demixing, and the contributions accumulate. The training curve (SNR vs. sliding index) shows that the method converges fast and works very well; we can even say that it can track a slow change of the mixing. Although whitening is done before the CIP method, we believe that the whitening process can also be incorporated into the method. ED-QMI (ED-CIP) can also be used, and similar results have been obtained. For blind source separation the result is not sensitive to the kernel size of the cross information potential; a very large range of kernel sizes will work.

Figure: Two speech signals from the TIMIT database as the two source signals. Figure: Training curve for the speech signals, SNR (dB) vs. iterations.

Blind Source Separation by Maximizing Quadratic Entropy

Bell and Sejnowski [Bel] have shown that a linear network with a nonlinear function at each output node can separate a linear mixture of independent signals by maximizing the output entropy. Here, quadratic entropy and the corresponding information potential are used to implement the maximum entropy idea for BSS. Again, for ease of exposition, only the two-source, two-sensor problem is tested. The source signals are the same speech signals from the TIMIT database as above. The mixing matrix is near singular; after whitening it becomes near orthogonal. The signal scattering plots for both the source and the mixed signals are shown in the corresponding figure.
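Whitening, used as a preprocessing step above, can be sketched in a few lines. This is a minimal eigendecomposition-based version of my own (the dissertation does not prescribe a particular implementation):

```python
import numpy as np

def whiten(X):
    """Whiten row-wise signals X (channels x samples): zero mean, identity covariance.

    Returns the whitened signals and the whitening matrix V, chosen so that
    cov(V @ (X - mean)) = I.
    """
    X = np.asarray(X, float)
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / Xc.shape[1]            # sample covariance
    w, E = np.linalg.eigh(C)               # C = E diag(w) E^T
    V = E @ np.diag(1.0 / np.sqrt(w)) @ E.T
    return V @ Xc, V
```

After whitening, whatever mixing remains is (close to) a rotation, which is why the later sections can assume a mixing matrix of the form kR.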
Two narrow, line-shaped distribution areas can be visually spotted in the scattering plot of the mixed signals, corresponding to the mixing directions. Usually, if such lines are clear, the BSS will be relatively easy. To test the IP method, a "bad" segment with only a small number of samples is chosen, in which no obvious line-shaped narrow distribution area can be seen. All the experiments are done only on this "bad" segment. The parameters used are the Gaussian kernel size sigma, the initial step size s, and the decaying factor a of the step size; the step size decays according to s(n) = s(n-1) * a, where n is the iteration index. Data points from the same "bad" segment are used for training, "tanh" functions are used in the output space, and results are reported over a fixed range of training iterations.

Figure: Signal scattering plots (source signals; mixed signals after whitening). Figure: A "bad" segment of the source signals (lines indicate the mixing directions). Figure: The mixed signals for the "bad" segment (after whitening). Figures: Experiment results for several settings of (sigma, s, a), each showing the output signals' scattering plot and the training curve (demixing SNR in dB vs. iterations), with DDD the desired demixing direction and ADD the actual demixing direction.
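Maximizing the quadratic entropy is done in practice by minimizing the information potential V(y) = (1/N^2) sum_ij G(y_i - y_j, 2 sigma^2) and propagating the resulting "information forces" dV/dy_i back through the network. A minimal sketch for scalar outputs (my own illustration; names are illustrative):

```python
import numpy as np

def info_potential_and_forces(y, sigma=0.3):
    """Information potential V(y) = (1/N^2) sum_ij G(y_i - y_j, 2*sigma^2)
    and its gradient (the information forces) dV/dy_i, for scalar samples y."""
    y = np.asarray(y, float).ravel()
    n = y.size
    diff = y[:, None] - y[None, :]          # pairwise differences y_i - y_j
    var = 2.0 * sigma ** 2
    G = np.exp(-diff ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    V = G.sum() / n ** 2
    # dV/dy_k = (2/N^2) sum_j G'(y_k - y_j), with G'(d) = -(d / var) G(d)
    F = (2.0 / n ** 2) * (-(diff / var) * G).sum(axis=1)
    return V, F
```

Under gradient descent on V the samples repel each other (entropy maximization spreads the output distribution), and the forces sum to zero because the pairwise interactions are antisymmetric.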
Blind Source Separation with ED-QMI (ED-CIP) and the Mini-Max Method

For simplicity of exposition, and without changing the essence of the problem, we discuss only the case with two sources and two sensors. In the mixing model only x_1(t), x_2(t) are observed; the source signals s_1(t), s_2(t) are statistically independent and unknown, and the mixing directions M_1 and M_2 are different and also unknown. The problem is to find a demixing system that recovers the source signals up to a permutation and scaling. Equivalently, the problem is to find statistically orthogonal (independent) directions W_1 and W_2, rather than the geometrically orthogonal (uncorrelated) directions of PCA [Com, Cao, Car]. Nevertheless, geometrical orthogonality exists between demixing and mixing directions: either W_1 is orthogonal to M_2 or W_2 is orthogonal to M_1. Wu et al. [WuH] have shown that even when there are more sources than sensors, i.e. no statistically orthogonal demixing directions exist, the mixing directions can still be identified as long as there are some signal segments in which some sources are zero or near zero. Looking for the mixing directions is therefore more essential than searching for demixing directions, and the nonstationary nature of the sources plays an important role. In matrix form the mixing and demixing are

x(t) = (x_1(t), x_2(t))^T = M_1 s_1(t) + M_2 s_2(t),    y(t) = (W_1^T x(t), W_2^T x(t))^T.

If s_2 is zero or near zero, the distribution of the observed signals in the (x_1, x_2) plane lies along the direction of M_1, forming a "narrow band" data distribution, which is good for finding the mixing direction M_1. If s_1 and s_2 are comparable in energy, the mixing directions are smeared, which is considered "bad." Two opposite examples are shown in the figures. Since there are "good" and "bad" data segments, we seek a technique to choose the "good" ones while discarding the "bad" segments. It should be pointed out that this issue is rarely addressed in the BSS literature; most methods treat all data equally and simply apply a criterion to achieve independence of the demixing system outputs. Minimizing ED-CIP can be used for this purpose. In addition, ED-CIP can be used to distinguish "good" segments from "bad" ones. Wu et al. [WuH] utilize the nonstationarity of speech signals and the eigenspread of different speech segments to choose "good" segments. However, how to decompose signals in the frequency domain to find "good" frequency bands remains obscure. It is well known that an instantaneous mixture has the same mixture in all frequency bands, while a convolutive mixture in general has different mixtures in different frequency bands (which is why BSS for a convolutive mixture is a much more difficult problem than BSS for an instantaneous mixture). For an instantaneous mixture, different frequency bands may reveal the same mixing direction, so it is natural to look for "good" frequency bands in which the mixing directions are easier to find. For a convolutive mixture, treating different frequency bands differently may also be important, but only the instantaneous case is discussed here.

Let h(t, eta) denote the impulse response of an FIR filter with parameter vector eta. Applying this filter to the observed signals, new observed signals are obtained:

x'(t) = h(t, eta) * x(t) = M_1 (h * s_1)(t) + M_2 (h * s_2)(t).

Obviously, the mixing directions remain unchanged.
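That the mixing directions survive any common filtering follows from linearity: filtering every sensor with the same FIR filter commutes with an instantaneous mixing matrix. A quick numerical check of this fact (illustrative code, not from the dissertation; the mixing matrix and filter taps are arbitrary choices):

```python
import numpy as np

# Two independent sources, an instantaneous 2x2 mixture, one shared FIR filter.
rng = np.random.default_rng(1)
S = rng.standard_normal((2, 500))
A = np.array([[1.0, 0.9], [0.85, 1.0]])
h = np.array([0.5, -0.3, 0.2])          # the same filter applied to every channel

X = A @ S

def filt(Z):
    """Apply the shared FIR filter h to each row of Z."""
    return np.stack([np.convolve(z, h, mode="full") for z in Z])

# Filtering the mixtures equals mixing the filtered sources: h*(A S) = A (h*S).
lhs = filt(X)
rhs = A @ filt(S)
print(np.allclose(lhs, rhs))   # True
```

Hence the filtered observations x' still lie in the span of the original mixing directions M_1 and M_2.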
The problem here is how to choose eta so that only one filtered source signal dominates the dynamic range, so that the corresponding mixing direction is clearly revealed.

First, consider the case when the mixing matrix is M = kR, where k is a positive scalar and R is a rotation transform (orthonormal matrix), so that the mixing directions lie near the diagonal directions. Obviously, when there is only one source, x_1' and x_2' are linearly dependent. So a necessary condition for a "good" segment is high dependence between x_1' and x_2'. A more important question is whether high dependence between x_1' and x_2' can guarantee that there is only one dominating filtered source signal. The answer is yes. On one hand, since the source signals are independent, as long as the filter length is short enough (the frequency band large enough), the filtered source signals will scatter either in a wide region or in a narrow one along the natural bases (otherwise the source signals would not be independent). On the other hand, the mixing is approximately a rotation toward the diagonal, and a narrow band distribution along these directions means high dependence between the two variables. So if a narrow distribution appears in the (x_1', x_2') plane, it must be the result of only one dominating source signal.

To maximize the dependence between x_1' and x_2' based on the data set {x'(eta, i), i = 1, ..., N}, where eta holds the parameters of the filter and N is the number of filtered samples, ED-CIP can be used:

eta_optimal = argmax_{||eta|| = 1} V_ED({x'(eta, i), i = 1, ..., N}),

where ||eta|| = 1 means the FIR filter is constrained to unit norm. One narrow distribution can be associated with only one mixing direction. Once a desired filter with parameters eta_1 and outputs {x_1'} is obtained, the remaining problem is how to obtain the second, the third, and so on, so that the narrow distribution associated with another mixing direction appears. One idea is to let the outputs of the new filter be highly dependent on each other and, at the same time, independent of the outputs of all previous filters, e.g.

eta_optimal = argmax_eta [ lambda * V_c(x') - (1 - lambda) * V_c(x', x_1') ],

where lambda is a weight that can be varied between its extremes. After several "good" data sets {x_1'}, ..., {x_n'} are obtained, the demixing can be found by minimizing the ED-CIP of the outputs of the demixing on all chosen data sets, y_i = W x_i', i = 1, ..., n:

W_optimal = argmin_W [ V_c(y_1) + ... + V_c(y_n) ].

This is why the method is called the "Mini-Max" method. If the mixing M is not a rotation, whitening can be done so that the mixing matrix becomes close to the kR form mentioned above. If the mixing directions (after whitening) are far from the diagonal directions, a rotation transform can be further introduced before the filters; its parameters are trained by the same criterion and converge to the state where the overall mixing directions are near the diagonal. So the procedure is: (1) whitening; (2) training the parameters of a rotation transform; (3) training the parameters of the filters. Since the mixing directions can then be identified easily (by narrow scattering distributions), this method is also expected to enhance the demixing performance when the observation is corrupted by noise, i.e. x = M_1 s_1 + M_2 s_2 + Noise.

The same "bad" segment and mixing matrix as in the previous section are used here. Whitening is first done, and white Gaussian noise is then added to the mixed signals to make an even worse segment. In the noise-free scattering plot the mixing directions are already difficult to find, and the noisy case is even worse.

Figure: The "bad" segment with added noise.

By directly minimizing the ED-CIP of the outputs of a demixing system, the average demixing performance converges for both the noise-free and the noisy case. Based only on the limited number of data points in the "bad" segment, the Mini-ED-CIP method still achieves a good performance (compare with the results obtained by the IP method in the previous section).
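The argument that a single dominating source makes the two sensor signals highly dependent is easy to check numerically: when one source is attenuated, the correlation between the sensors approaches one in magnitude. A toy check (my own illustration; the mixing matrix and gains are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
s1 = rng.standard_normal(2000)
s2 = rng.standard_normal(2000)
A = np.array([[1.0, -0.5], [0.6, 1.0]])    # instantaneous 2x2 mixture

def sensor_correlation(gain2):
    """Mix s1 with s2 scaled by gain2 and return |corr(x1, x2)|."""
    X = A @ np.vstack([s1, gain2 * s2])
    return abs(np.corrcoef(X[0], X[1])[0, 1])

print(sensor_correlation(1.0))    # comparable sources: moderate correlation
print(sensor_correlation(0.01))   # s2 nearly silent: correlation near 1
```

Correlation is only a second-order proxy for the dependence that ED-CIP measures, but it already shows why a dominating source produces a narrow, line-shaped joint distribution.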
This further verifies the validity of ED-CIP. By applying the Max-ED-CIP method to train the FIR filters, frequency bands with only one dominating source signal are found, and the scattering distributions of the outputs of those filters match the mixing directions. Mini-Max-ED-CIP is then applied to these results to find the demixing system, obtaining an improved average demixing performance both for the case without noise and for the case with noise.

In this section it was pointed out that finding the mixing directions is more essential than obtaining the demixing directions. Maximizing ED-CIP can help to obtain frequency bands in which the mixing directions are easier to find, and the Mini-Max-ED-CIP method improves the demixing performance over the Mini-ED-CIP method. Although the experiments presented here are specific ones, they further confirm the effectiveness of the ED-CIP method. The work on Mini-Max-ED-CIP is preliminary, but it suggests the opposite extreme (maximizing mutual information) for BSS, compared with all the current methods (which minimize mutual information). As ancient philosophy suggests, two opposite extremes can often exchange. It is worthwhile to explore this direction for BSS and even for blind deconvolution.

Table: Demixing performance comparison (average demixing SNR in dB) between Mini-CIP and Mini-Max-CIP, for the cases without and with noise.

Figure: Performance by minimizing ED-CIP (demixing SNR vs. iterations, without and with noise). Figures: The results of filters FIR1 and FIR2 obtained by Max-ED-CIP, without and with noise: (a) distribution of the outputs of FIR1; (b) source signals filtered by FIR1, with the power ratio of the two filtered signals; (c) distribution of the outputs of FIR2; (d) source signals filtered by FIR2, with the power ratio of the two filtered signals. Figure: The performance by Mini-Max ED-CIP (demixing SNR vs. iterations, without and with noise).

CHAPTER
CONCLUSIONS AND FUTURE WORK

In this chapter we summarize the issues addressed in this dissertation and the contributions made towards their solutions. The initial goal was to establish a general nonparametric method for the estimation of information entropy and mutual information based only on data samples, without any other assumption. From a physical point of view, the world is a "mass-energy" system, and it turns out that entropy and mutual information can also be viewed from this standpoint. Based on a more general measure of entropy, Renyi's entropy, we interpret entropy as a rescaled norm of the pdf and propose the idea of the quadratic mutual information. Based on these general measures, the concepts of "information potential" and "cross information potential" are proposed. The ordinary energy definition for a signal and the proposed IP and CIP are put together to give a unifying point of view on these fundamental measures, which are crucial for signal processing and adaptive learning in general. With such a fundamental tool, a general information-theoretic learning framework is given, which contains all currently existing information-theoretic learning schemes as special cases. More importantly, we not only give a general learning principle but also an effective implementation of this principle, breaking the barriers of model linearity and Gaussian data assumptions, which are the major limitations of most existing methods. In Chapter 4, a case study on learning with an on-line local rule is presented, establishing the link between the power field, a special case of the information potential field, and the famous biological learning rules, the Hebbian and the anti-Hebbian rules.
Based on this basic understanding, we developed an on-line local learning algorithm for the generalized eigendecomposition of signals. Simulations and experiments with these methods were conducted on several problems, such as aspect angle estimation for SAR imagery target recognition, layer-by-layer training of multilayer neural networks, and blind source separation. The results further confirm the proposed methodology.

The major problem left open is a further theoretic justification of the quadratic mutual information. The basis for the QMI as an independence measure is strong. We further provide some intuitive arguments that it is also appropriate as a dependence measure, and we apply the criteria successfully to several problems. However, there is still no rigorous theoretical proof that the QMI is appropriate for mutual information maximization.

The problem of on-line learning with IP or CIP was mentioned in Chapter 4. Since IP or CIP examines such detailed information as the relative position of each pair of data samples, it is very difficult to design an on-line algorithm for them. The on-line rule for an energy measure is relatively easy to obtain because it only examines the relative position of each data sample to the mean point; thus each data point is treated relatively independently of the others, while IP or CIP must take into account the relation of each data sample to all the others. One solution may come from the use of a mixture model, where the means of subclasses of the data are used: the relative position between each data sample and each subclass mean is then considered. Each mean acts like a "heavy" data point with more "mass" than an ordinary data sample, and these "heavy" data points may serve as a kind of memory in the learning process. The IP or CIP then becomes the IP or CIP of each sample in the field of these "heavy mean particles." Based on this scheme, an on-line algorithm may be developed. The Gaussian mixture model and the EM algorithm mentioned in Chapter 2 may be powerful tools to obtain such "heavy information particles."

The computational complexity of the IP or CIP method is of order O(N^2), where N is the number of data samples. With the "heavy information particles" (suppose there are M such particles, with M much smaller than N and possibly fixed), the computational complexity may be reduced to order O(MN). It may therefore be very significant to study this possibility further.

In terms of algorithmic implementation, how to choose the kernel size for IP and CIP was not discussed in the previous chapters; we chose the kernel size empirically in our experiments. It has been observed that the CIP is not sensitive to the kernel size, but the kernel size may be crucial for the IP. Further study of this issue, or even a method to select the optimal kernel size, is important for the IP and CIP methods.

The IP and CIP methods are general and may find many applications in practical problems. Finding more applications will also be an important part of future work.

APPENDIX A
THE INTEGRATION OF THE PRODUCT OF GAUSSIAN KERNELS

Let G(y, S) = (2 pi)^{-k/2} |S|^{-1/2} exp(-(1/2) y^T S^{-1} y) be the Gaussian function in k-dimensional space, where S is the covariance matrix and y in R^k. Let a_i and a_j in R^k be two data points in the space, and S_1 and S_2 be the covariance matrices of two Gaussian kernels in the space. Then

integral G(y - a_i, S_1) G(y - a_j, S_2) dy = G(a_i - a_j, S_1 + S_2).    (A.1)

Similarly, the integral of the product of three Gaussian kernels can also be obtained. The following is the proof of (A.1).

Proof. Let d = a_i - a_j. Shifting the integration variable by a_j, (A.1) becomes

integral G(y - d, S_1) G(y, S_2) dy = G(d, S_1 + S_2).

Let c = (S_1^{-1} + S_2^{-1})^{-1} S_1^{-1} d. Completing the square in the exponent gives

(y - d)^T S_1^{-1} (y - d) + y^T S_2^{-1} y = (y - c)^T (S_1^{-1} + S_2^{-1}) (y - c) + d^T (S_1 + S_2)^{-1} d,

where the residual term follows from the matrix inversion lemma [Gol],

(A + C B C^T)^{-1} = A^{-1} - A^{-1} C (B^{-1} + C^T A^{-1} C)^{-1} C^T A^{-1},    (A.2)

which yields S_1^{-1} - S_1^{-1} (S_1^{-1} + S_2^{-1})^{-1} S_1^{-1} = (S_1 + S_2)^{-1}. For the normalization constants, the determinant identity |S_1 + S_2| = |S_1| |S_1^{-1} + S_2^{-1}| |S_2| gives

G(y - d, S_1) G(y, S_2) = G(d, S_1 + S_2) G(y - c, (S_1^{-1} + S_2^{-1})^{-1}).    (A.3)

Since the last factor of (A.3) is a Gaussian pdf whose integral over y equals 1, integrating (A.3) proves (A.1).
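Identity (A.1) is easy to sanity-check numerically in one dimension by comparing brute-force quadrature of the product of two kernels against the closed form. This is an illustrative check of my own, not part of the original proof; the centers and variances are arbitrary:

```python
import numpy as np

def G(y, var):
    """1-D Gaussian kernel with variance var."""
    return np.exp(-y ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

ai, aj = 0.7, -0.4          # kernel centers
v1, v2 = 0.3, 0.8           # kernel variances

# Riemann sum of the product over a wide, fine grid ...
dy = 1e-4
y = np.arange(-30.0, 30.0, dy)
lhs = (G(y - ai, v1) * G(y - aj, v2)).sum() * dy

# ... versus the closed form of (A.1): a single kernel in the distance a_i - a_j.
rhs = G(ai - aj, v1 + v2)
print(abs(lhs - rhs))       # agreement to quadrature accuracy
```

This is exactly the property that lets the information potential and the cross information potential be computed from pairwise kernel evaluations without any explicit integration.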
APPENDIX B
SHANNON ENTROPY OF A MULTIDIMENSIONAL GAUSSIAN VARIABLE

For a Gaussian random variable X in R^k with pdf f_X(x) = (2 pi)^{-k/2} |S|^{-1/2} exp(-(1/2) (x - mu)^T S^{-1} (x - mu)), where mu is the mean and S is the covariance matrix, Shannon's information entropy is

H_S(X) = (1/2) log|S| + (k/2) log 2 pi + k/2.    (B.1)

Proof.

H_S(X) = -E[log f_X(x)]
       = (1/2) log|S| + (k/2) log 2 pi + (1/2) E[(x - mu)^T S^{-1} (x - mu)]
       = (1/2) log|S| + (k/2) log 2 pi + (1/2) E[tr(S^{-1} (x - mu)(x - mu)^T)]
       = (1/2) log|S| + (k/2) log 2 pi + (1/2) tr(S^{-1} S)
       = (1/2) log|S| + (k/2) log 2 pi + k/2,

where tr[.] is the trace operator.

APPENDIX C
RENYI ENTROPY OF A MULTIDIMENSIONAL GAUSSIAN VARIABLE

For the same Gaussian random variable X in R^k, Renyi's information entropy of order alpha is

H_alpha(X) = (1/2) log|S| + (k/2) log 2 pi + (k/2) (log alpha) / (alpha - 1).    (C.1)

Proof. Evaluating the Gaussian integral directly,

integral f_X(x)^alpha dx = (2 pi)^{k(1 - alpha)/2} |S|^{(1 - alpha)/2} alpha^{-k/2},

so

H_alpha(X) = (1 / (1 - alpha)) log integral f_X(x)^alpha dx
           = (1/2) log|S| + (k/2) log 2 pi + (k/2) (log alpha) / (alpha - 1).

APPENDIX D
HC ENTROPY OF A MULTIDIMENSIONAL GAUSSIAN VARIABLE

For a Gaussian random variable X in R^k ...

REFERENCES

[Ace] A. Acero, Acoustical and Environmental Robustness in Automatic Speech Recognition, Kluwer Academic Publishers, Boston.
[Acz] J. Aczel and Z. Daroczy, On Measures of Information and Their Characterizations, Academic Press, New York.
[Cao] X.-R. Cao and R.-W. Liu, "General Approach to Blind Source Separation," IEEE Transactions on Signal Processing.
[Cha] D. Chandler, Introduction to Modern Statistical Mechanics, Oxford University Press, New York.
[Fis] W. Fisher, "Nonlinear Extensions to the Minimum Average Correlation Energy Filter," Ph.D. dissertation, Department of Electrical and Computer Engineering, University of Florida, Gainesville.
[Gal] A. R. Gallant and H. White, "There Exists a Neural Network That Does Not Make Avoidable Mistakes," IEEE International Conference on Neural Networks, Vol. 1, San Diego.
[Gil] P. E. Gill, W. Murray and M. H. Wright, Practical Optimization, Academic Press, New York.
[Lin] R. Linsker, "An Application of the Principle of Maximum Information Preservation to Linear Systems," in Advances in Neural Information Processing Systems (D. S. Touretzky, ed.), Morgan Kaufmann, San Mateo, CA.
[Mao] J. Mao and A. K. Jain, "Artificial Neural Networks for Feature Extraction and Multivariate Data Projection," IEEE Transactions on Neural Networks.
[Mcl] G. McLachlan and K. E. Basford, Mixture Models: Inference and Applications to Clustering, Marcel Dekker, Inc., New York.
... "... Independent Component Analysis," IEEE Transactions on Signal Processing.
[Plu] M. Plumbley and F. Fallside, "An Information-Theoretic Approach to Unsupervised Connectionist Models," in Proceedings of the Connectionist Models Summer School (D. Touretzky, G. Hinton and T. Sejnowski, eds.), Morgan Kaufmann, San Mateo, CA.
[Pog] T. Poggio and F. Girosi, "Networks for Approximation and Learning," Proceedings of the IEEE.
[Pri] J. C. Principe, B. de Vries and P. Guedes de Oliveira, "The Gamma Filter: A New Class of Adaptive IIR Filters with Restricted Feedback," IEEE Transactions on Signal Processing.
[Pria] J. C. Principe, D. Xu and C. Wang, "Generalized Oja's Rule for Linear Discriminant Analysis with Fisher Criterion," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Munich, Germany.
[Prib] J. C. Principe and D. Xu, "Classification with Linear Networks Using an On-line Constrained LDA Algorithm," Proceedings of the IEEE Workshop on Neural Networks for Signal Processing VII, Amelia Island, FL.
[Pri] J. C. Principe, Q. Zhao and D. Xu, "A Novel ATR Classifier Exploiting Pose Information," Proceedings of the Image Understanding Workshop, Monterey, California.
[Rab] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ.
[Ren] A. Renyi, "Some Fundamental Questions of Information Theory," in Selected Papers of Alfred Renyi, Akademiai Kiado, Budapest.
[Ren] A. Renyi, "On Measures of Entropy and Information," in Selected Papers of Alfred Renyi, Akademiai Kiado, Budapest.
[Ros] F. Rosenblatt, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain," Psychological Review.
[Ros] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan Books, Washington, DC.
[Rua] D. E. Rumelhart and J. L. McClelland, eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, MA.
[Rub] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning Representations by Back-Propagating Errors," Nature (London).
[Ruc] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing, Vol. 1, MIT Press, Cambridge, MA.
[Sha] C. E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal.
[Sha] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana.
[Sil] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York.
[Wid] B. Widrow, A Statistical Theory of Adaptation, Pergamon Press, Oxford.
[Wid] B. Widrow, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, New Jersey.
[Wil] S. S. Wilks, Mathematical Statistics, John Wiley & Sons, Inc., New York.
BIOGRAPHICAL SKETCH

Dongxin Xu was born in January 1963 in Jiangsu, China. He earned his bachelor's degree in electrical engineering from Xi'an Jiaotong University, China, and received his Master of Science degree in computer science from the Institute of Automation, Chinese Academy of Sciences, Beijing, China. After that he did research on speech signal processing, speech recognition, pattern recognition, artificial intelligence and neural networks in the National Laboratory of Pattern Recognition in China for several years. He has since been a Ph.D. student in the Department of Electrical and Computer Engineering, University of Florida, working in the Computational NeuroEngineering Laboratory on various topics in signal processing. His main research interests are adaptive systems; speech coding, enhancement and recognition; image processing; digital communication; and statistical signal processing.

Each member of the supervisory committee certified: "I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy."
This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy, May 1999. Winfred M. Phillips, Dean, College of Engineering.
Principeâ€™s active thought and appreciÂ¬ ated very much his style of supervision which give a lot of space to students to explore on their own. I am grateful for his introducing me to the area of the information-theoretic learning and the guidance throughout the development of this dissertation. I would also like to thank my committee members Dr. John Harris, Dr. Donald Childers, Dr. Jacob Hammer, Dr. Mark Yang and Dr. Tan Wong for their guidance and disÂ¬ cussion they provided. Their comments are critical and constructive. Special thank goes to John Fisher for introducing his work to me, which actually inspired this work. Special thank also goes to Chuan Wang for introducing me to CNEL and the friendship he provided. The discussions with Hsiao-Chun Wu were fruitful. The special thank is also due to him. I would also like to thank the other CNEL fellows. The list includes, but not limited to, Likang Yen, Craig Fancourt, Frank Candocia, Qun Zhao for their help and friendship. I would like to thank my brother, sister and my friend Yuan Yao for their constant love, support and encouragement. Finally, I would like to thank my wife, Shu, for her love, support, patience and sacriÂ¬ fice, which made this dissertation possible. Â« IV TABLE OF CONTENTS Page ACKNOWLEDGEMENTS iii ABSTRACT viii CHAPTERS 1 INTRODUCTION 1 1.1 Information and Energy: A Brief Review 1 1.2 Motivation 6 1.3 Outline 15 2 ENERGY, ENTROPY AND INFORMATION POTENTIAL 17 2.1 Energy, Entropy and Information of Signals 17 2.1.1 Energy of Signals 17 2.1.2 Information Entropy 20 2.1.3 Geometrical Interpretation of Entropy 24 2.1.4 Mutual Information 27 2.1.5 Quadratic Mutual Information 31 2.1.6 Geometrical Interpretation of Mutual Information 38 2.1.7 Energy and Entropy for Gaussian Signal 39 2.1.8 Cross-Correlation and Mutual Information for Gaussian Signal .... 
  2.2 Empirical Energy, Entropy and MI: Problem and Literature Review
    2.2.1 Empirical Energy
    2.2.2 Empirical Entropy and Mutual Information: The Problem
    2.2.3 Nonparametric Density Estimation
    2.2.4 Empirical Entropy and Mutual Information: The Literature Review
  2.3 Quadratic Entropy and Information Potential
    2.3.1 The Development of Information Potential
    2.3.2 Information Force (IF)
    2.3.3 The Calculation of Information Potential and Force
  2.4 Quadratic Mutual Information and Cross Information Potential
    2.4.1 QMI and Cross Information Potential (CIP)
    2.4.2 Cross Information Forces (CIF)
    2.4.3 An Explanation to QMI

3 LEARNING FROM EXAMPLES
  3.1 Learning System
    3.1.1 Static Models
    3.1.2 Dynamic Models
  3.2 Learning Mechanisms
    3.2.1 Learning Criteria
    3.2.2 Optimization Techniques
  3.3 General Point of View
    3.3.1 InfoMax Principle
    3.3.2 Other Similar Information-Theoretic Schemes
    3.3.3 A General Scheme
    3.3.4 Learning as Information Transmission Layer-by-Layer
    3.3.5 Information Filtering: Filtering beyond Spectrum
  3.4 Learning by Information Force
  3.5 Discussion of Generalization by Learning

4 LEARNING WITH ON-LINE LOCAL RULE: A CASE STUDY ON GENERALIZED EIGENDECOMPOSITION
  4.1 Energy, Correlation and Decorrelation for Linear Model
    4.1.1 Signal Power, Quadratic Form, Correlation, Hebbian and Anti-Hebbian Learning
    4.1.2 Lateral Inhibition Connections, Anti-Hebbian Learning and Decorrelation
  4.2 Eigendecomposition and Generalized Eigendecomposition
    4.2.1 The Information-Theoretic Formulation for Eigendecomposition and Generalized Eigendecomposition
    4.2.2 The Formulation of Eigendecomposition and Generalized Eigendecomposition Based on the Energy Measures
  4.3 The On-line Local Rule for Eigendecomposition
    4.3.1 Oja's Rule and the First Projection
    4.3.2 Geometrical Explanation to Oja's Rule
    4.3.3 Sanger's Rule and the Other Projections
    4.3.4 APEX Model: The Local Implementation of Sanger's Rule
  4.4 An Iterative Method for Generalized Eigendecomposition
  4.5 An On-line Local Rule for Generalized Eigendecomposition
    4.5.1 The Proposed Learning Rule for the First Projection
    4.5.2 The Proposed Learning Rules for the Other Connections
  4.6 Simulations
  4.7 Conclusion and Discussion

5 APPLICATIONS
  5.1 Aspect Angle Estimation for SAR Imagery
    5.1.1 Problem Description
    5.1.2 Problem Formulation
    5.1.3 Experiments of Aspect Angle Estimation
    5.1.4 Occlusion Test on Aspect Angle Estimation
  5.2 Automatic Target Recognition (ATR)
    5.2.1 Problem Description and Formulation
    5.2.2 Experiment and Result
  5.3 Training MLP Layer-by-Layer with CIP
  5.4 Blind Source Separation and Independent Component Analysis
    5.4.1 Problem Description and Formulation
    5.4.2 Blind Source Separation with CS-QMI (CS-CIP)
    5.4.3 Blind Source Separation by Maximizing Quadratic Entropy
    5.4.4 Blind Source Separation with ED-QMI (ED-CIP) and MiniMax Method

6 CONCLUSIONS AND FUTURE WORK

APPENDICES

A THE INTEGRATION OF THE PRODUCT OF GAUSSIAN KERNELS
B SHANNON ENTROPY OF MULTI-DIMENSIONAL GAUSSIAN VARIABLE
C RENYI ENTROPY OF MULTI-DIMENSIONAL GAUSSIAN VARIABLE
D H-C ENTROPY OF MULTI-DIMENSIONAL GAUSSIAN VARIABLE

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ENERGY, ENTROPY AND INFORMATION POTENTIAL FOR NEURAL COMPUTATION

By

Dongxin Xu

May 1999

Chairman: Dr. José C. Principe
Major Department: Electrical and Computer Engineering

The major goal of this research is to develop general nonparametric methods for the estimation of entropy and mutual information, giving a unifying point of view for their use in signal processing and neural computation.
In many real world problems, the information is carried solely by data samples without any other a priori knowledge. The central issue of "learning from examples" is to estimate the energy, entropy or mutual information of a variable only from its samples and to adapt the system parameters by optimizing a criterion based on the estimation. By using alternative entropy measures such as Renyi's quadratic entropy, coupled with the Parzen window estimation of the probability density function for data samples, we developed an "information potential" method for entropy estimation. In this method, data samples are treated as physical particles and the entropy turns out to be related to the potential energy of these "information particles." Entropy maximization or minimization is then equivalent to the minimization or the maximization of the "information potential." Based on the Cauchy-Schwartz inequality and the Euclidean distance metric, we further proposed the quadratic mutual information as an alternative to Shannon's mutual information. There is also a "cross information potential" implementation for the quadratic mutual information that measures the correlation between the "marginal information potentials" at several levels. "Learning from examples" at the output of a mapper by the "information potential" or the "cross information potential" is implemented by propagating the "information force" or the "cross information force" back to the system parameters. Since the criteria are decoupled from the structure of learning machines, they are general learning schemes. The "information potential" and the "cross information potential" provide a microscopic expression for the macroscopic measure of the entropy and mutual information at the data sample level. The algorithms examine the relative position of each data pair and thus have a computational complexity of $O(N^2)$.
An on-line local algorithm for learning is also discussed, where the energy field is related to the famous biological Hebbian and anti-Hebbian learning rules. Based on this understanding, an on-line local algorithm for the generalized eigendecomposition is proposed. The information potential methods have been successfully applied to various problems such as aspect angle estimation in synthetic aperture radar (SAR) imagery, target recognition in SAR imagery, layer-by-layer training of multilayer neural networks and blind source separation. The good performance of the methods on these various problems confirms the validity and efficiency of the information potential methods.

CHAPTER 1
INTRODUCTION

1.1 Information and Energy: A Brief Review

Information plays an important role both in the life of a person and of a society, especially in today's information age. The basic purpose of all kinds of scientific research is to obtain information in a particular area. One of the most important tasks of space programs is to get information about cosmic space and celestial bodies, such as evidence whether there is life on Mars. A central problem of the Internet is how to transmit, process and store information in computer networks. "Like it or not, we are information dependent. It is a commodity as vital as the air we breathe, as any of our metabolic energy requirements. For better or worse, we're all inescapably embedded in a universe of flows, not only of matter and energy but also of whatever it is we call information" [You87: page 1]. The notion of information is so fundamental and universal that only the notion of energy can be compared with it. The parallel and analogy between these two fundamental notions are well known. Most of the greatest inventions and discoveries in scientific and human history can be related to either the conversion, transfer and storage of energy or the transmission and storage of information.
For instance, the use of fire and water, the invention of simple machines such as the lever and the wheel, the invention of the steam engine, and the discoveries of electricity and atomic energy are all connected to energy, while the appearance of speech in prehistoric times and the invention of writing at the dawn of human history, followed by the inventions of paper, printing, the telegraph, photography, the telephone, radio, television and finally the computer and the computer network, are examples of information. Many inventions and discoveries can be used for both purposes. Fire, as an example, can be used for cooking, heating and transmitting signals. Electricity, as another example, can be used for transmitting both energy and information [Ren60]. There are a variety of forms of energy and of information. If we disregard the actual form of energy (mechanical, thermal, chemical, electrical, atomic, etc.) and the real content of information, what is left is the pure quantity [Ren60]. The principle of energy conservation was formulated and developed in the middle of the last century, while the essence of information was studied later, in the 1940s. With the quantity of energy, we can come to the conclusion that a small amount of U-235 contains a large amount of atomic energy, and our world came into the atomic age. With the pure quantity of information, we can tell that an optical cable can transmit much more information than an ordinary electrical telephone line, and in general, that the capacity of a communication channel can be specified in terms of the rate of information quantity.
Although the quantitative measure of information originated from the study of communication, it is such a fundamental concept and method that it has been widely applied to many areas such as statistics, physics, chemistry, biology, life science, psychology, psychobiology, cognitive science, neuroscience, cybernetics, computer science, economics, operations research, linguistics and philosophy [You87, Kub75, Kap92, Jum86]. The study of the quantitative measure of information in communication systems started in the 1920s. In 1924, Nyquist showed that the speed $W$ of transmission of intelligence over a telegraph circuit with a fixed line speed is proportional to the logarithm of the number $m$ of current values used to encode the message: $W = k \log m$, where $k$ is a constant [Nyq24, Chr81]. In 1928, Hartley generalized this to all forms of communication, letting $m$ represent the number of symbols available at each selection of a symbol to be transmitted. Hartley explicitly addressed the issue of a quantitative measure for information and pointed out that it should be independent of psychological factors (i.e., objective) [Har28, Chr81]. Later, in 1948, Shannon published his celebrated paper "A Mathematical Theory of Communication," which explored the statistical structure of a message and extended Nyquist and Hartley's logarithmic measure for information to a probabilistic logarithm:

$$I = -\sum_{k=1}^{N} p_k \log p_k \quad \text{for the probability structure } p_k > 0 \ (k = 1, \dots, N), \ \sum_{k=1}^{N} p_k = 1$$

When $p_k = 1/N$ in the equiprobable case, Shannon's measure degenerates to Hartley's measure [Sha48, Sha62]. Shannon's measure can also be regarded as a measure of uncertainty. It laid the foundation for information theory. There is a striking formal similarity between Shannon's measure and the entropy in statistical mechanics. This was one of the reasons that led von Neumann to suggest to Shannon that he call his uncertainty measure the entropy [Tri71].
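Shannon's measure and its reduction to Hartley's measure in the equiprobable case can be checked numerically. The following is a minimal sketch, not part of the dissertation; the function name and the example distributions are ours:

```python
import math

def shannon_entropy(p):
    """Shannon's measure I = -sum_k p_k log p_k (natural log)."""
    assert abs(sum(p) - 1.0) < 1e-12 and all(pk > 0 for pk in p)
    return -sum(pk * math.log(pk) for pk in p)

# Equiprobable case: p_k = 1/N reduces Shannon's measure to Hartley's log N.
N = 8
uniform = [1.0 / N] * N
assert abs(shannon_entropy(uniform) - math.log(N)) < 1e-12

# Any non-uniform distribution has lower entropy than the uniform one.
skewed = [0.5, 0.25, 0.125, 0.125]
print(shannon_entropy(skewed) < math.log(4))  # True
```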
"Entropie" was a German word coined in 1865 by Clausius to represent the capacity for change of matter [Chr81]. The second law of thermodynamics, formulated by Clausius, is also known as the entropy law. Its best-known statement has been in the form, "Heat cannot by itself pass from a colder to a hotter system." Or, more formally, the entropy of a closed system will never decrease, but can only increase until it reaches its maximum [You87]. The entropy maximum principle of a closed system has a corollary that is an energy minimum principle [Cha87]; i.e., the energy of the closed system will reach its minimum when the entropy of the system reaches its maximum. Clausius' entropy was initially an abstract and macroscopic idea. It was Boltzmann who first gave the entropy a microscopic and probabilistic interpretation. Boltzmann's work showed that entropy could be understood as a statistical law measuring the probable states of the particles in a closed system. In statistical mechanics, each particle in a system occupies a point in a "phase space," and so the entropy of a system came to constitute a measure for the probability of the microscopic state (distribution of particles) of any such system. According to this interpretation, a closed system will approach a state of thermodynamic equilibrium because equilibrium is overwhelmingly the most probable state of the system. The probabilistic interpretation of entropy is one of the cornerstones of the modern relationship between measures of entropy and the amount of information in a message. That is, both the information entropy and the statistical mechanical entropy are measures of the uncertainty or disorder of a system [You87].
One interesting problem concerning entropy, which puzzled physicists for almost 80 years, is Maxwell's Demon, a hypothetical entity which could theoretically sort the molecules of a gas into either of two compartments, say, the faster molecules going into A, the slower into B, resulting in a lowering of the temperature in B while raising it in A without expenditure of work. But according to the second law of thermodynamics, i.e., the entropy law, the temperature of a closed system will eventually be even and the entropy thus maximized. In 1929, Szilard pointed out that the sorting of the molecules depends on information about the speed of the molecules, which is obtained by measurement or observation of the molecules, and that any such measurement or observation will invariably involve dissipation of energy and increase entropy. While Szilard did not produce a working model, he showed mathematically that entropy and information were fundamentally interconnected, and his formula was analogous to the measures of information developed by Nyquist and Hartley and eventually by Shannon [You87]. Contrary to closed systems, open systems with energy flux in and out tend to self-organize and to develop and maintain a structural identity, resisting the entropy drift of closed systems and their irreversible thermodynamic fate [You87, Hak88]. In this area, the work of Prigogine and his colleagues on nonlinear, nonequilibrium processes made a particular contribution, providing a powerful explanation of how order in the form of stable structures can be built up and maintained in a universe whose ingredients seem otherwise subject to a law of increasing entropy [You87]. The work of Boltzmann and others gave the relationship between entropy maximization and state probabilities; that is, the most probable microscopic state of an ensemble is a state of uniformity described by maximizing its entropy subject to constraints specifying its observed macroscopic condition [Chr81].
The maximization of Shannon's entropy, as a comparison, can be used as the basis for equiprobability assumptions (equiprobability should be assumed upon total ignorance of the probability distribution). Information-theoretic entropy maximization subject to known constraints was explored by Jaynes in 1957 as a basis for statistical mechanics, which in turn makes it a basis for thermostatics and thermodynamics [Chr81]. Jaynes also pointed out: "in making inferences on the basis of partial information we must use that probability distribution which has maximum entropy subject to whatever is known. This is the only unbiased assignment we can make; to use any other would amount to arbitrary assumption of information which by hypothesis we do not have" [Jay57:1, page 623]. More general than Jaynes' maximum entropy principle is Kullback's minimum cross-entropy principle, which introduces the concept of cross-entropy or "directed divergence" of a probability distribution P from another probability distribution Q. The maximum entropy principle can be viewed as a special case of the minimum cross-entropy principle when Q is a uniform distribution [Kap92]. In addition, Shannon's mutual information is nothing but the directed divergence between the joint probability distribution and the factorized marginal distributions.

1.2 Motivation

The above gives a brief review of various aspects of energy, entropy and information, from which we can see how fundamental and general the concepts of energy and entropy are, and how these two fundamental concepts are related to each other. In this dissertation, the major interests and the issues addressed concern the energy and entropy of signals, especially the empirical energy and entropy measures of signals, which are crucial in signal processing practice. First, let's take a look at the empirical energy measures for signals. There are many kinds of signals in the world.
No matter what kind, a signal can be abstracted as $X(n) \in R^m$, where $n$ is the time index (only discrete-time signals are considered in this dissertation) and $R^m$ represents an m-dimensional real space (only real signals are considered in this dissertation; a complex signal can be thought of as a two-dimensional real signal). The empirical energy and power of a finite signal $x(n) \in R$, $n = 1, \dots, N$, are

$$E(x) = \sum_{n=1}^{N} x(n)^2, \qquad P(x) = \frac{1}{N}\sum_{n=1}^{N} x(n)^2 \qquad (1.1)$$

The difference between two signals $x_1(n)$ and $x_2(n)$, $n = 1, \dots, N$, can be measured by the empirical energy or power of the difference signal $d(n) = x_1(n) - x_2(n)$:

$$E_d(x_1, x_2) = \sum_{n=1}^{N} d(n)^2, \qquad P_d(x_1, x_2) = \frac{1}{N}\sum_{n=1}^{N} d(n)^2 \qquad (1.2)$$

The difference between $x_1$ and $x_2$ can also be measured by the cross-correlation (inner product)

$$C(x_1, x_2) = \sum_{n=1}^{N} x_1(n) x_2(n) \qquad (1.3)$$

or its normalized version

$$\bar{C}(x_1, x_2) = \frac{\sum_{n=1}^{N} x_1(n) x_2(n)}{\left(\sum_{n=1}^{N} x_1(n)^2\right)^{1/2} \left(\sum_{n=1}^{N} x_2(n)^2\right)^{1/2}} \qquad (1.4)$$

The geometrical illustration of these quantities is shown in Figure 1-1.

[Figure 1-1: Geometrical Illustration of Energy Quantities]

Since $E(x) = C(x, x)$, the cross-correlation can be regarded as an energy-related quantity. We know that for a random signal $x(n)$ with pdf (probability density function) $f_x(x)$, the Shannon information entropy is

$$H(x) = -\int f_x(x) \log f_x(x)\, dx \qquad (1.5)$$
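Equations (1.1) through (1.4) translate directly into code. A minimal Python sketch (the function names and example signals are ours, not the dissertation's):

```python
import math

def energy(x):                         # (1.1) E(x) = sum_n x(n)^2
    return sum(v * v for v in x)

def power(x):                          # (1.1) P(x) = E(x) / N
    return energy(x) / len(x)

def cross_correlation(x1, x2):         # (1.3) inner product of the two signals
    return sum(a * b for a, b in zip(x1, x2))

def normalized_correlation(x1, x2):    # (1.4), always lies in [-1, 1]
    return cross_correlation(x1, x2) / math.sqrt(energy(x1) * energy(x2))

x1 = [1.0, 2.0, -1.0]
x2 = [2.0, 4.0, -2.0]                  # x2 = 2*x1: perfectly correlated

print(energy(x1))                      # 6.0
print(power(x1))                       # 2.0
# E(x) = C(x, x): cross-correlation is an energy-related quantity.
print(cross_correlation(x1, x1) == energy(x1))   # True
print(normalized_correlation(x1, x2))            # 1.0 for proportional signals
```

For the proportional signals above, the normalized correlation (1.4) attains its extreme value 1, matching the geometric picture of two collinear vectors.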
A very fundamental and important question arises natuÂ¬ rally: If a discrete data set {x(n) e Rm\n= 1, ..., N} is given, what is the information entropy related to this data set, or how can we estimate the entropy for this data set. This empirical entropy problem was addressed before in the literatures [Chr80, Chr81, Bat94, Vio95, Fis97], etc. Parametric methods can be used for pdf estimation and then entropy estimation, which is straightforward but less general. Nonparametric methods for pdf estiÂ¬ mation can be used as the basis for the general entropy estimation (no assumption about data distribution is required). One example is the historgram method [Bat94] which is easy to implement in one dimensional space but difficult to apply to high dimensional space, and also difficult to analyze mathematically. Another popular nonparametric pdf estimaÂ¬ tion method is the Parzen window method, the so-called kernel or potential function method [Par62, Dud73, Chr81]. Once the Parzen window method is used, the perplexing problem left is the calculation of the integral in the entropy or mutual information formula. Numerical methods are extremely complex in this case and thus only suitable for one 9 dimensional variable [Pha96]. Approximation can also be made by using sample mean [Vio95] which requires a large amount of data and may not be a good approximation for a small data set. The indirect method of Fisher [Fis97] can not be used for entropy estimaÂ¬ tion but only for entropy maximization purposes. 
For the blind source separation (BSS) or independent component analysis (ICA) problem [Com94, Cao96, Car98b, Bel95, Dec96, Car97, Yan97], one popular contrast function is the empirical mutual information between the outputs of a demixing system, which can be implemented as the difference between the sum of the marginal entropies and the joint entropy, where the joint entropy is usually related to the input entropy and the determinant of the linear demixing matrix, and the marginal entropies are estimated based on moment expansions of the pdf such as the Edgeworth expansion and the Gram-Charlier expansion [Yan97, Dec96]. The moment expansions have to be truncated in practice and are only appropriate for a one-dimensional (1-D) signal because, in a multi-dimensional space, the expansions become extremely complicated. So, from the above brief review, we can see that an effective and general entropy estimation method is lacking. One major point of this dissertation is to formulate and develop such an effective and general method for the empirical entropy problem and to give a unifying point of view about signal energy and entropy, especially the empirical signal energy and entropy. Surprisingly, if we regard each data sample mentioned above as a physical particle, then the whole discrete data set is just like a set of particles in a statistical mechanical system. It might be interesting to think about what the information entropy of this data set is and how this can be related to physics. According to modern science, the universe is a mass-energy system. In this mass-energy spirit, we may ask whether the information entropy, especially the empirical information entropy, would somehow have mass-energy properties. In this dissertation, the empirical information entropy is related to the "potential energy" of "data particles" (data samples). Thus, a data sample is called an "information particle" (IPT). In fact, data samples are basic units conveying information; they indeed are "particles" which transmit information. Accordingly, the empirical entropy can be related to a potential energy, called the "information potential" (IP), of the "information particles" (IPTs). With the information potential, we can further study how it can be used in a learning system or an adaptive system for signal processing, and how a learning system can self-organize with information flux in and out (often in the form of a flux of data samples), just like an open physical system, which will exhibit some order with energy flux in and out. Information theory originated from the study of communication and has been widely used for design and practice in that area and many others. However, its application to learning systems or adaptive systems such as perceptual systems, either artificial or natural, is just in its infancy. Some early researchers tried to use information theory to explain perceptual processes, e.g., Attneave, who pointed out in 1954 that "a major function of the perceptual machinery is to strip away some of the redundancy of stimulation, to describe or encode information in a form more economical than that in which it impinges on the receptors" [Hay94: page 444]. However, only in the late 1980s did Linsker propose the principle of maximum information preservation (InfoMax) [Lin88, Lin89] as the basic principle for the self-organization of neural networks, which requires the maximization of the mutual information between the output and the input of the network so that the information about the input is best preserved in the output. Linsker further applied the principle to linear networks with Gaussian assumptions on the input data distribution and the noise distribution, and derived the way to maximize the mutual information in this particular case [Lin88, Lin89].
In 1988, Plumbley and Fallside proposed the similar minimum information loss principle [Plu88]. In the same period, other researchers used information-theoretic principles but still with the limitation of a linear model or a Gaussian assumption, for instance, Becker and Hinton's spatially coherent features [Bec89, Bec92] and Ukrainec and Haykin's spatially incoherent features [Ukr92]. In recent years, information-theoretic approaches for BSS and ICA have drawn a lot of attention. Although they certainly broke the limitation of model linearity and the Gaussian assumption, the methods are still not general enough. There are two typical information-theoretic methods in this area: maximum entropy (ME) and minimum mutual information (MMI) [Bel95, Yan97, Yan98, Pha96]. Both methods use the entropy relation of a full-rank linear mapping: $H(Y) = H(X) + \log|\det(W)|$, where $Y = WX$ and $W$ is a full-rank square matrix. Thus the estimation of information quantities is coupled with the network structure. Moreover, ME requires that the nonlinearity in the outputs match the cdf (cumulative distribution function) of the source signals [Bel95], and MMI uses the above-mentioned expansion methods or numerical methods to estimate the marginal entropies [Yan97, Yan98, Pha96]. On the whole, a general method and a unifying point of view about the estimation of information quantities are lacking. Human beings and animals in general are examples of systems that can learn from the interactions with their environments. Such interactions are usually in the form of "examples" (also called "data samples"). For instance, children learn to speak by listening, learn to recognize objects by being presented with exemplars, learn to walk by trying, etc. In general, children learn by the stimulation from their environment.
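The entropy relation $H(Y) = H(X) + \log|\det(W)|$ quoted above can be verified in closed form for Gaussian signals, where the differential entropy is $\frac{1}{2}\log((2\pi e)^n \det\Sigma)$ and the covariance of $Y = WX$ is $W\Sigma W^T$, so the determinants differ exactly by $\det(W)^2$. A small numeric check (the example matrices are ours):

```python
import math

def gaussian_entropy(cov):
    """Differential entropy of an n-D Gaussian: 0.5*log((2*pi*e)^n * det(cov))."""
    n = len(cov)
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]   # 2x2 determinant
    return 0.5 * math.log((2.0 * math.pi * math.e) ** n * det)

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

cov_x = [[2.0, 0.5], [0.5, 1.0]]      # covariance of the Gaussian input X
W = [[1.0, 1.0], [0.0, 3.0]]          # full-rank mixing matrix
det_w = W[0][0] * W[1][1] - W[0][1] * W[1][0]
cov_y = matmul(matmul(W, cov_x), transpose(W))   # covariance of Y = W X

lhs = gaussian_entropy(cov_y)
rhs = gaussian_entropy(cov_x) + math.log(abs(det_w))
print(abs(lhs - rhs) < 1e-9)   # True: H(Y) = H(X) + log|det(W)|
```

The coupling the text objects to is visible here: the entropy of Y is only accessible through the network matrix W, which is what ties ME and MMI estimates to the network structure.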
Adaptive systems for signal processing [Wid85, Hay94, Hay96] are also learning systems, ones that evolve through interaction with input, output and desired (or teacher) signals. To study the general principles of a learning system, we first need to set up an abstract model for the system and its environment. As illustrated in Figure 1-2, an abstract learning system is a mapping $R^k \to R^m$: $Y = q(X, W)$, where $X \in R^k$ is the input signal, $Y \in R^m$ is the output signal, and $W$ is a set of parameters of the mapping. The environment is modeled by the doublet $(X, D)$, where $D$ is a desired signal (teacher signal). The learning mechanism is a set of rules or procedures that adjust the parameters $W$ so that the mapping achieves a desired goal.

[Figure 1-2: Illustration of a Learning System]

There are a variety of learning systems: linear or nonlinear, feedforward or recurrent, full-rank or dimension-reducing, the perceptron and the multilayer perceptron (MLP) with global basis functions or radial-basis-function networks with local basis functions, etc. Different system structures may have different properties and uses [Hay98]. The environment doublet $(X, D)$ also has a variety of forms. A learning process can have a desired signal or not (very often the input signal is the implicit desired signal). Some statistical properties of $X$ or $Y$ or $D$ can be given or assumed. Most often, only a discrete data set $Q = \{(X_i, D_i) \mid i = 1, \dots, N\}$ is provided. Such a scheme is called "learning from examples" and is a general case [Hay94, Hay98]. This dissertation is more interested in "learning from examples" than in any scheme with assumptions about the data. Of course, if a priori knowledge about the data is available, a learning method should incorporate this knowledge. There are also a lot of learning mechanisms. Some of them make assumptions about the data, and others do not. Some are coupled with the structure and topology of the learning system, while others are independent of the system.
A general learning mechanism should not depend on the data and should be decoupled from the learning system. There is no doubt that the area is rich in diversity but lacks unification. There are no known concepts more abstract and fundamental than energy and information. To look for the essence of learning, one should start from these two basic ideas. Obviously, learning is about obtaining knowledge and information. Based on the above learning system model, we can say that learning is nothing but transferring onto the machine parameters the information contained in the environment, or, to be more specific, in a given data set. This dissertation will try to give a unifying point of view for learning systems and to implement it by using the proposed information potential. The basic purpose of learning is to generalize. The ability of animals to learn something general from their past experiences is the key to their survival in the future. Regarding the generalization ability of a learning machine, one very fundamental question is: what is the best we can do to generalize for a given learning system and a given set of environmental data? One thing is very clear: the information contained in the given data set is a quantity that cannot be changed by any learning method, and no learning method can go beyond it. Thus, it is the best that a learning system can possibly obtain. Generalization, from this point of view, is not to create something new but to utilize fully the information contained in the observed data, neither less nor more. By "less," we mean that the information obtained by a learning system is less than the information contained in the given data. By "more," we mean that, implicitly or explicitly, a learning method assumes something that is not given. This is also the spirit of Jaynes [Jay57] mentioned above, and a similar point of view can be found in Christensen [Chr80, Chr81].
The environmental data for a learning system are usually not collected all at one time but are accumulated during a learning process. Whenever one datum appears, or after a small set of data is obtained, learning should take place and the parameters of the learning system should be updated. This is the problem of the on-line learning method, which is also an issue that this dissertation is going to deal with. Another problem that this dissertation is interested in is that of "local" learning algorithms. In a biological nervous system, what can be changed is the strength of the synaptic connections. The change of a synaptic connection can only depend on its local information, i.e., its input and output. For an engineering system, it is much easier to implement the learning rule in either hardware or software if the rule is "local"; i.e., the update of a connection in a learning network relies only on its input and output. Hebb's rule is a famous neuropsychological postulate of how a synaptic connection will evolve with its input and output [Heb49, Hay98]. It will be shown in this dissertation how Hebbian-type algorithms can be related to the energy and entropy of signals.

1.3 Outline

In Chapter 2, the basic ideas of energy and information entropy and their relationship will be reviewed. Since the information entropy directly relies on the pdf of the variable, the Parzen window nonparametric method will be reviewed for the development of the ideas of the information potential and the cross information potential. Finally, the derivation will be given, the idea of the information force in an information potential field will be introduced for its use in learning systems, and the calculation procedures for the information potential, the cross information potential and all the forces in the corresponding information potential fields will be described. In Chapter 3, a variety of learning systems and learning mechanisms will be reviewed.
A unifying point of view on learning based on information theory will be given, the information potential implementation of this unifying idea will be described, and the generalization of learning will be discussed. In Chapter 4, the on-line local algorithms for a linear system with energy criteria will be reviewed. The relationship between the Hebbian and anti-Hebbian rules and the energy criteria will be discussed. An on-line local algorithm for generalized eigendecomposition will be proposed, with a discussion of its convergence properties, such as convergence speed and stability. Chapter 5 will give several application examples. First, the information potential method will be applied to aspect angle estimation for SAR images. Second, the same method will be applied to SAR automatic target recognition. Third, the training of a layered neural network by the information potential method will be described. Fourth, the method will be applied to independent component analysis and blind source separation. Chapter 6 will conclude the dissertation and provide a survey of future work in this area.

CHAPTER 2
ENERGY, ENTROPY AND INFORMATION POTENTIAL

2.1 Energy, Entropy and Information of Signals

2.1.1 Energy of Signals

From the statistical point of view, the energy of a 1-D stationary signal is related to its variance. For a 1-D stationary signal x(n) with variance \sigma^2 and mean m, its energy (precisely, short-time energy or power) is

    E_x = E[x^2] = \sigma^2 + m^2    (2.1)

where E[\cdot] is the expectation operator. If m = 0, then the energy is equal to the variance, E_x = \sigma^2. So, basically, energy is a quantity related to second-order statistics.

For two 1-D signals x_1(n) and x_2(n) with means m_1 and m_2 respectively, the covariance is r = E[(x_1 - m_1)(x_2 - m_2)] = E[x_1 x_2] - m_1 m_2, and we have the cross-correlation between the two signals:

    c_{12} = C_{x_1 x_2} = E[x_1 x_2] = r + m_1 m_2    (2.2)

If at least one signal is zero-mean, c_{12} = r.
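Equations (2.1) and (2.2) are straightforward to check on synthetic data. The following Python/NumPy sketch (not part of the original text; the signal parameters are arbitrary choices for illustration) estimates the energy and cross-correlation from samples:

```python
import numpy as np

def energy(x):
    """Short-time energy (power) of a 1-D signal: E_x = E[x^2], eq. (2.1)."""
    return float(np.mean(x ** 2))

def cross_correlation(x1, x2):
    """Cross-correlation c_12 = E[x1 * x2] = r + m1*m2, eq. (2.2)."""
    return float(np.mean(x1 * x2))

# Illustrative signal: mean m = 2, standard deviation sigma = 3,
# so eq. (2.1) predicts E_x = sigma^2 + m^2 = 9 + 4 = 13.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=200_000)
```

With 200,000 samples the estimate `energy(x)` lands close to the predicted value 13, and `cross_correlation(x, x)` coincides with the energy, as (2.2) requires when the two signals are identical.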
For a 2-D signal X = (x_1, x_2)^T, all the second-order statistics are given in a covariance matrix \Sigma, and we have

    E[XX^T] = \Sigma + \begin{bmatrix} m_1^2 & m_1 m_2 \\ m_1 m_2 & m_2^2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & r \\ r & \sigma_2^2 \end{bmatrix}    (2.3)

Usually, the first-order statistics have nothing to do with the information, so we will consider only the zero-mean case; thus we have E[XX^T] = \Sigma.

For a 2-D signal, there are three energy quantities in the covariance matrix: \sigma_1^2, \sigma_2^2 and r. One may ask what the overall energy quantity for a 2-D signal is. From linear algebra [Nob88], there are three choices: the first is the determinant of \Sigma, which is a volume measure in the 2-D signal space and is equal to the product of all the eigenvalues of \Sigma; the second is the trace of \Sigma, which is equal to the sum of all the eigenvalues of \Sigma; the third is the product of all the diagonal elements. Thus, we have

    J_1 = \log|\Sigma|
    J_2 = \mathrm{tr}(\Sigma) = \sigma_1^2 + \sigma_2^2    (2.4)
    J_3 = \log(\sigma_1^2 \sigma_2^2)

where \mathrm{tr}(\cdot) is the trace operator. The log function in J_1 and J_3 is used to reduce the dynamic range of the original quantities; it is also related to the information of the signal, which will become clear later in this chapter. The component signals x_1 and x_2 will be called marginal signals in this dissertation. If the two marginal signals x_1 and x_2 are uncorrelated, then J_1 = J_3. In general, we have

    J_3 \ge J_1    (2.5)

where the equality holds if and only if the two marginal signals are uncorrelated. This is the so-called Hadamard's inequality [Nob88, Dec96].
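The three measures in (2.4) and Hadamard's inequality (2.5) can be illustrated numerically. The following Python/NumPy sketch (not from the original text; the covariance entries are made-up values) computes J_1, J_2 and J_3 and checks the inequality:

```python
import numpy as np

def overall_energy_measures(cov):
    """The three overall energy measures of eq. (2.4)."""
    j1 = float(np.log(np.linalg.det(cov)))    # log-determinant: volume measure
    j2 = float(np.trace(cov))                 # trace: sum of the eigenvalues
    j3 = float(np.sum(np.log(np.diag(cov))))  # log of the product of the variances
    return j1, j2, j3

corr_cov = np.array([[2.0, 0.8],
                     [0.8, 1.5]])             # correlated marginals (r = 0.8)
diag_cov = np.diag([2.0, 1.5])                # uncorrelated marginals
```

For the correlated case J_3 > J_1 strictly; for the diagonal covariance the two coincide, which is exactly the equality condition of Hadamard's inequality.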
In general, for a positive semi-definite matrix \Sigma we have the same inequality, where J_1 is the logarithm of the determinant of the matrix and J_3 is the logarithm of the product of the diagonal elements (note that the logarithm is a monotonically increasing function). When the two marginal signals are uncorrelated and their variances are equal, J_1 and J_2 are equivalent in the sense that

    J_1 = 2\log J_2 - 2\log 2 = 2\log\sigma^2    (2.6)

For an n-D signal X = (x_1, \ldots, x_n)^T with zero mean, we have the covariance matrix

    \Sigma = E[XX^T] = \begin{bmatrix} \sigma_1^2 & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & \sigma_n^2 \end{bmatrix}    (2.7)

where \sigma_i^2 (i = 1, \ldots, n) are the variances of the marginal signals x_i, and r_{ij} (i, j = 1, \ldots, n, i \ne j) are the cross-correlations between the marginal signals x_i and x_j. The three possible overall energy measures are

    J_1 = \log|\Sigma|
    J_2 = \mathrm{tr}(\Sigma) = \sum_{i=1}^n \sigma_i^2    (2.8)
    J_3 = \log\Bigl(\prod_{i=1}^n \sigma_i^2\Bigr)

Hadamard's inequality is J_3 \ge J_1, where the equality holds if and only if \Sigma is diagonal, i.e., the marginal signals are uncorrelated with each other. J_2 is equal to the sum of all the eigenvalues of \Sigma and is invariant under any orthonormal transformation (rotation). When the marginal signals are uncorrelated with each other and their variances are equal, J_2 and J_1 are equivalent in the sense that they are related by a monotonically increasing function:

    J_1 = n\log J_2 - n\log n = n\log\sigma^2    (2.9)

2.1.2 Information Entropy

Compared with energy, the information entropy of a signal involves all the statistics of the signal, and thus is more profound and more difficult to implement. As mentioned in Chapter 1, the study of abstract quantitative measures of information started in the 1920s, when Nyquist and Hartley proposed a logarithmic measure [Nyq24, Har28]. Later, in 1948, Shannon pointed out that this measure is valid only if all events are equiprobable [Sha48]. Furthermore, he coined the term "information entropy," which is the mathematical expectation of Nyquist and Hartley's measures.
In 1960, Renyi generalized Shannon's idea by using an exponential function rather than a linear function to calculate the mean [Ren60, Ren61]. Later on, other forms of information entropy appeared (e.g., Havrda and Charvat's measure, Kapur's measure) [Kap94]. Although Shannon's entropy is the only one that possesses all the postulated properties (which will be given later) for an information measure, the other forms, such as Renyi's and Havrda-Charvat's, are equivalent with regard to entropy maximization [Kap94]. In a real problem, which form to use depends upon other requirements, such as ease of implementation.

For an event with probability p, according to Hartley's idea, the information given when this event happens is I(p) = \log\frac{1}{p} = -\log p [Har28]. Shannon further developed Hartley's idea, resulting in Shannon's information entropy for a variable with the probability distribution \{p_k \mid k = 1, \ldots, n\}:

    H_S = \sum_{k=1}^n p_k \log\frac{1}{p_k}, \qquad \sum_{k=1}^n p_k = 1, \; p_k \ge 0    (2.10)

In the general theory of means, a mean of the real numbers x_1, \ldots, x_n with weights p_1, \ldots, p_n has the form

    \varphi^{-1}\Bigl(\sum_{k=1}^n p_k \varphi(x_k)\Bigr)    (2.11)

where \varphi(x) is a Kolmogorov-Nagumo function, an arbitrary continuous and strictly monotonic function defined on the real numbers [Ren60, Ren61]. So, in general, the entropy measure should be

    H = \varphi^{-1}\Bigl(\sum_{k=1}^n p_k \varphi(I(p_k))\Bigr)    (2.12)

As an information measure, \varphi(\cdot) cannot be arbitrary, since information is "additive." To meet the additivity condition, \varphi(\cdot) can be either \varphi(x) = x or \varphi(x) = 2^{(1-\alpha)x}. If the former is used, (2.12) becomes Shannon's entropy (2.10).
If the latter is used, Renyi's entropy with order \alpha is obtained [Ren60, Ren61]:

    H_{R\alpha} = \frac{1}{1-\alpha}\log\Bigl(\sum_{k=1}^n p_k^\alpha\Bigr), \qquad \alpha > 0, \; \alpha \ne 1    (2.13)

In 1967, Havrda and Charvat proposed another entropy measure, which is similar to Renyi's measure but has a different scaling [Hav67, Kap94] (it will be called Havrda-Charvat's entropy, or H-C entropy for short):

    H_{H\alpha} = \frac{1}{1-\alpha}\Bigl(\sum_{k=1}^n p_k^\alpha - 1\Bigr), \qquad \alpha > 0, \; \alpha \ne 1    (2.14)

There are also other entropy measures, for instance H_\infty = -\log(\max_k p_k) [Kap94]. Different entropy measures may have different properties. There are more than a dozen properties of Shannon's entropy. We will discuss five basic properties, since all the other properties can be derived from them [Sha48, Sha62, Kap92, Kap94, Acz75].

(1) The entropy measure H(p_1, \ldots, p_n) is a continuous function of all the probabilities p_k, which means that a small change in the probability distribution will only result in a small change in the entropy.

(2) H(p_1, \ldots, p_n) is permutationally symmetric; i.e., exchanging the positions of any two or more p_k in H(p_1, \ldots, p_n) will not change the entropy value. Indeed, a permutation of the p_k in the distribution does not change the uncertainty or disorder of the distribution and thus should not affect the entropy.

(3) H(1/n, \ldots, 1/n) is a monotonically increasing function of n. For an equiprobable distribution, when the number of choices n increases, the uncertainty or disorder increases, and so does the entropy measure.

(4) Recursivity: If an entropy measure satisfies (2.15) or (2.16), then it has the recursivity property. It means that the entropy of n outcomes can be expressed in terms of the entropy of n-1 outcomes plus the weighted entropy of the two combined outcomes.

    H_n(p_1, p_2, \ldots, p_n) = H_{n-1}(p_1 + p_2, p_3, \ldots, p_n) + (p_1 + p_2)\,H_2\Bigl(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2}\Bigr)    (2.15)

    H_n(p_1, p_2, \ldots, p_n) = H_{n-1}(p_1 + p_2, p_3, \ldots, p_n) + (p_1 + p_2)^\alpha\,H_2\Bigl(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2}\Bigr)    (2.16)

where \alpha is the parameter in Renyi's entropy or H-C entropy.
(5) Additivity: If p = (p_1, \ldots, p_n) and q = (q_1, \ldots, q_m) are two independent probability distributions, and the joint probability distribution is denoted by p \cdot q, then the property H(p \cdot q) = H(p) + H(q) is called additivity.

The following table compares the three types of entropy with respect to the above five properties:

Table 2-1. Comparison of the Properties of the Three Entropies

    Properties   (1)   (2)   (3)   (4)   (5)
    Shannon's    yes   yes   yes   yes   yes
    Renyi's      yes   yes   yes   no    yes
    H-C's        yes   yes   yes   yes   no

From the table, we can see that the three types of entropy differ in recursivity and additivity. However, Kapur pointed out: "The maximum entropy probability distributions given by Havrda-Charvat and Renyi's measures are identical. This shows that neither additivity nor recursivity is essential for a measure to be used in maximum entropy principle" [Kap94: page 42]. So the three entropies are equivalent for entropy maximization, and any of them can be used.

As we can see from the above, Shannon's entropy has no parameter, but both Renyi's entropy and Havrda-Charvat's entropy have a parameter \alpha; each of them therefore constitutes a family of entropy measures. There is a relation between Shannon's entropy and Renyi's entropy [Ren60, Kap94]:

    H_{R\alpha} \ge H_S \ge H_{R\beta} \quad \text{if } 1 > \alpha > 0 \text{ and } \beta > 1; \qquad \lim_{\alpha \to 1} H_{R\alpha} = H_S    (2.17)

i.e., Renyi's entropy is a monotonically decreasing function of the parameter \alpha, and it approaches Shannon's entropy as \alpha approaches 1. Thus, Shannon's entropy can be regarded as one member of the Renyi entropy family. Similar results hold for Havrda-Charvat's entropy measure [Kap94]:

    H_{H\alpha} \ge H_S \ge H_{H\beta} \quad \text{if } 1 > \alpha > 0 \text{ and } \beta > 1; \qquad \lim_{\alpha \to 1} H_{H\alpha} = H_S    (2.18)

Thus, Shannon's entropy can also be regarded as one member of the Havrda-Charvat entropy family. So, both Renyi and Havrda-Charvat generalize Shannon's idea of information entropy.
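The three entropy families and the relations (2.17)-(2.18) are easy to check numerically. The following Python/NumPy sketch (not part of the original text; the example distribution is arbitrary) implements (2.10), (2.13) and (2.14) with the natural logarithm:

```python
import numpy as np

def shannon(p):
    """Shannon's entropy, eq. (2.10), natural log."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p)))

def renyi(p, alpha):
    """Renyi's entropy of order alpha, eq. (2.13)."""
    p = np.asarray(p, dtype=float)
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def havrda_charvat(p, alpha):
    """Havrda-Charvat's entropy of order alpha, eq. (2.14)."""
    p = np.asarray(p, dtype=float)
    return float((np.sum(p ** alpha) - 1.0) / (1.0 - alpha))

p = [0.5, 0.3, 0.2]   # an arbitrary example distribution
```

Evaluating `renyi(p, alpha)` for alpha slightly above 1 approaches `shannon(p)`, and the orderings H_{R0.5} > H_S > H_{R2} and H_{H0.5} > H_S > H_{H2} hold, as (2.17) and (2.18) state.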
When \alpha = 2, H_{H2} = 1 - \sum_{k=1}^n p_k^2 is called the quadratic entropy [Jum90]. In this dissertation, H_{R2} = -\log\sum_{k=1}^n p_k^2 will also be called quadratic entropy, for convenience and because of the dependence of the entropy quantity on the quadratic form of the probability distribution. The quadratic form will prove convenient, as we will see later.

For a continuous random variable Y with pdf f_Y(y), in analogy with the Boltzmann-Shannon differential entropy H_S(Y) = -\int_{-\infty}^{+\infty} f_Y(y)\log f_Y(y)\,dy, we can obtain the differential versions of these two types of entropy:

    H_{R\alpha}(Y) = \frac{1}{1-\alpha}\log\int_{-\infty}^{+\infty} f_Y(y)^\alpha\,dy, \qquad H_{R2}(Y) = -\log\int_{-\infty}^{+\infty} f_Y(y)^2\,dy
    H_{H\alpha}(Y) = \frac{1}{1-\alpha}\Bigl(\int_{-\infty}^{+\infty} f_Y(y)^\alpha\,dy - 1\Bigr)    (2.19)

and (2.17) and (2.18) will hold for the corresponding differential entropies.

2.1.3 Geometrical Interpretation of Entropy

From the above, we see that both Renyi's entropy and Havrda-Charvat's entropy contain the term \sum_{k=1}^n p_k^\alpha for a discrete variable, and both of them approach Shannon's entropy when \alpha approaches 1. This suggests that all these entropies are related to some kind of distance between the probability distribution point p = (p_1, \ldots, p_n) and the origin of the space R^n. As illustrated in Figure 2-1, the probability distribution point p = (p_1, \ldots, p_n) is restricted to a segment of the hyperplane defined by \sum_{k=1}^n p_k = 1 and p_k \ge 0 (in the left graph, the region is the line segment connecting the two points (1,0) and (0,1); in the right graph, the region is the triangular area bounded by the three lines connecting each pair of the points (1,0,0), (0,1,0) and (0,0,1)). The entropy of the probability distribution p = (p_1, \ldots, p_n) is a function of V_\alpha = \sum_{k=1}^n p_k^\alpha, which is the \alpha-norm of the point p raised to the power \alpha [Nov88, Gol93] and will be called the "entropy \alpha-norm." Renyi's entropy rescales the entropy \alpha-norm V_\alpha by a logarithm: H_{R\alpha} = \frac{1}{1-\alpha}\log V_\alpha; while Havrda-Charvat's entropy linearly rescales the entropy \alpha-norm V_\alpha: H_{H\alpha} = \frac{1}{1-\alpha}(V_\alpha - 1).

Figure 2-1.
Geometrical Interpretation of Entropy

So, both Renyi's entropy with order \alpha (H_{R\alpha}) and Havrda-Charvat's entropy with order \alpha (H_{H\alpha}) are related to the \alpha-norm of the probability distribution p. For the above-mentioned infinity entropy H_\infty, there is the relation \lim_{\alpha \to \infty} H_{R\alpha} = H_\infty = -\log(\max_k p_k) [Kap94]. Therefore, H_\infty is related to the infinity-norm of the probability distribution p. For Shannon's entropy, we have \lim_{\alpha \to 1} H_{R\alpha} = H_S and \lim_{\alpha \to 1} H_{H\alpha} = H_S. It might be interesting to consider Shannon's entropy as the result of the 1-norm of the probability distribution p. Actually, the 1-norm of any probability distribution is always 1 (\sum_{k=1}^n p_k = 1). If we plug V_1 = 1 and \alpha = 1 into H_{R\alpha} = \frac{1}{1-\alpha}\log V_\alpha and H_{H\alpha} = \frac{1}{1-\alpha}(V_\alpha - 1), we get 0/0. Its limit, however, is Shannon's entropy. So, in the limit sense, Shannon's entropy can be regarded as the function value of the 1-norm of the probability distribution. Thus, we can generally say that the entropy with order \alpha (either Renyi's or H-C's) is a monotonic function of the \alpha-norm of the probability distribution p, and the entropy (at least all the above-mentioned entropies) is essentially a monotonic function of the distance from the probability distribution point p to the origin. From linear algebra, all norms are equivalent in comparing distances [Gol93, Nob88]; thus, they are equivalent for distance maximization or distance minimization, in both unconstrained and constrained cases. Therefore, all the entropies (at least the above-mentioned ones) are equivalent for the purpose of entropy maximization or entropy minimization.

When \alpha > 1, both Renyi's entropy H_{R\alpha} and Havrda-Charvat's entropy H_{H\alpha} are monotonically decreasing functions of the entropy \alpha-norm V_\alpha. So, in this case, entropy maximization is equivalent to minimization of the entropy \alpha-norm V_\alpha, and entropy minimization is equivalent to maximization of the entropy \alpha-norm V_\alpha.
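The equivalence between entropy extremization and \alpha-norm extremization (here for \alpha = 2 > 1) can be illustrated numerically. The following Python/NumPy sketch (not part of the original text; the random 4-outcome distributions are arbitrary illustrations) shows that the uniform distribution minimizes V_2 over the simplex and therefore maximizes H_{R2} = -log V_2:

```python
import numpy as np

def v_alpha(p, alpha):
    """The 'entropy alpha-norm' V_alpha = sum_k p_k^alpha."""
    return float(np.sum(np.asarray(p, dtype=float) ** alpha))

rng = np.random.default_rng(1)
uniform = np.full(4, 0.25)                            # the maximum-entropy point
random_dists = rng.dirichlet(np.ones(4), size=1000)   # random points on the simplex

v2_uniform = v_alpha(uniform, 2.0)
v2_random = [v_alpha(q, 2.0) for q in random_dists]
```

Every sampled distribution has a larger V_2 than the uniform one, so the uniform distribution simultaneously minimizes V_2 and maximizes -log V_2.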
When \alpha < 1, both Renyi's entropy H_{R\alpha} and Havrda-Charvat's entropy H_{H\alpha} are monotonically increasing functions of the entropy \alpha-norm V_\alpha. So, in this case, entropy maximization is equivalent to maximization of the entropy \alpha-norm V_\alpha, and entropy minimization is equivalent to minimization of the entropy \alpha-norm V_\alpha.

Of particular interest in this dissertation are the quadratic entropies H_{R2} and H_{H2}, which are both monotonically decreasing functions of the entropy 2-norm V_2 of the probability distribution p and are related to the Euclidean distance from the point p to the origin. Entropy maximization is equivalent to the minimization of V_2, and entropy minimization is equivalent to the maximization of V_2. Moreover, since both H_{R2} and H_{H2} are lower bounds of Shannon's entropy, they might be more efficient than Shannon's entropy for entropy maximization.

For a continuous variable Y, the probability density function f_Y(y) is a point in a functional space. All the pdfs f_Y(y) constitute a similar region in a "hyperplane" defined by \int_{-\infty}^{+\infty} f_Y(y)\,dy = 1 and f_Y(y) \ge 0. A similar geometrical interpretation can also be given to the differential entropies. In particular, we have the entropy \alpha-norm

    V_\alpha = \int_{-\infty}^{+\infty} f_Y(y)^\alpha\,dy, \qquad V_2 = \int_{-\infty}^{+\infty} f_Y(y)^2\,dy    (2.20)

2.1.4 Mutual Information

Mutual information (MI) measures the relationship between two variables and thus is more desirable in many cases. Following Shannon [Sha48, Sha62], the mutual information between two random variables X_1 and X_2 is defined as

    I_S(X_1, X_2) = \iint f_{X_1 X_2}(x_1, x_2)\log\frac{f_{X_1 X_2}(x_1, x_2)}{f_{X_1}(x_1) f_{X_2}(x_2)}\,dx_1\,dx_2    (2.21)

where f_{X_1 X_2}(x_1, x_2) is the joint pdf of the joint variable (X_1, X_2)^T, and f_{X_1}(x_1) and f_{X_2}(x_2) are the marginal pdfs of X_1 and X_2 respectively. Obviously, mutual information is symmetric; i.e., I_S(X_1, X_2) = I_S(X_2, X_1).
It is not difficult to show the relation between mutual information and Shannon's entropy in (2.22) [Dec96, Hay98]:

    I_S(X_1, X_2) = H_S(X_1) - H_S(X_1 \mid X_2)
                  = H_S(X_2) - H_S(X_2 \mid X_1)    (2.22)
                  = H_S(X_1) + H_S(X_2) - H_S(X_1, X_2)

where H_S(X_1) and H_S(X_2) are the marginal entropies; H_S(X_1, X_2) is the joint entropy; H_S(X_1 \mid X_2) = H_S(X_1, X_2) - H_S(X_2) is the conditional entropy of X_1 given X_2, which measures the uncertainty of X_1 when X_2 is given, or the uncertainty left in (X_1, X_2) when the uncertainty of X_2 is removed; similarly, H_S(X_2 \mid X_1) is the conditional entropy of X_2 given X_1 (all the entropies involved are Shannon's entropies). From (2.22), it can be seen that the mutual information is the measure of the uncertainty removed from X_1 when X_2 is given; in other words, the mutual information measures the information that X_2 conveys about X_1 (or vice versa, since the mutual information is symmetric). It provides a measure of the statistical relationship between X_1 and X_2 that involves all the statistics of the related distributions, and is thus more general than the simple cross-correlation between X_1 and X_2, which involves only the second-order statistics of the variables.
It can be shown that the mutual information is non-negative, or, equivalently, that Shannon's entropy is reduced by conditioning, or that the sum of the marginal entropies is an upper bound of the joint entropy; i.e.,

    I_S(X_1, X_2) \ge 0
    H_S(X_1) \ge H_S(X_1 \mid X_2), \qquad H_S(X_2) \ge H_S(X_2 \mid X_1)    (2.23)
    H_S(X_1) + H_S(X_2) \ge H_S(X_1, X_2)

The mutual information is actually the Kullback-Leibler divergence (also called the cross-entropy) [Kul68, Dec96, Hay98] between the joint pdf f_{X_1 X_2}(x_1, x_2) and the factorized marginal pdf f_{X_1}(x_1) f_{X_2}(x_2). The Kullback-Leibler divergence between two pdfs f(x) and g(x) is defined as

    D_K(f, g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx    (2.24)

Jensen's inequality [Dec96, Ace92] says that, for a random variable X and a convex function h(x), the expectation of the convex function of X is no less than the convex function of the expectation of X; i.e.,

    E[h(X)] \ge h(E[X]) \quad \text{or} \quad \int h(x) f_X(x)\,dx \ge h\Bigl(\int x f_X(x)\,dx\Bigr)    (2.25)

where E[\cdot] is the mathematical expectation operator and f_X(x) is the pdf of X. From Jensen's inequality [Dec96, Kul68], or by using the derivation in Acero [Ace92], it can be shown that the Kullback-Leibler divergence is non-negative, and is zero if and only if the two distributions are the same; i.e.,

    D_K(f, g) \ge 0    (2.26)

where the equality holds if and only if f(x) = g(x). So, the Kullback-Leibler divergence can be regarded as a "distance" measure between the pdfs f(x) and g(x).
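The discrete analogue of (2.24), its non-negativity (2.26), and its asymmetry can be sketched in a few lines (Python/NumPy, not part of the original text; the two example distributions are arbitrary):

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D_K(p, q), the analogue of eq. (2.24)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
```

D_K(p, q) is strictly positive for these two distinct distributions, vanishes when the arguments coincide, and differs from D_K(q, p), which is why it is called a directed divergence.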
However, it is not symmetric; i.e., D_K(f, g) \ne D_K(g, f) in general, and it is thus called a "directed divergence." Obviously, the mutual information mentioned above is the Kullback-Leibler "distance" from the joint pdf f_{X_1 X_2}(x_1, x_2) to the factorized marginal pdf f_{X_1}(x_1) f_{X_2}(x_2): D_K(f_{X_1 X_2}(x_1, x_2), f_{X_1}(x_1) f_{X_2}(x_2)).

Based on Renyi's entropy, we can define Renyi's divergence measure with order \alpha for two pdfs f(x) and g(x) [Ren60, Ren61, Kap94]:

    D_{R\alpha}(f, g) = \frac{1}{\alpha - 1}\log\int f(x)\Bigl(\frac{f(x)}{g(x)}\Bigr)^{\alpha-1}\,dx    (2.27)

The relation between Renyi's divergence and the Kullback-Leibler divergence is [Kap92, Kap94]

    \lim_{\alpha \to 1} D_{R\alpha}(f, g) = D_K(f, g)    (2.28)

Based on Havrda-Charvat's entropy, there is also Havrda-Charvat's divergence measure with order \alpha for two pdfs f(x) and g(x) [Hav67, Kap92, Kap94]:

    D_{H\alpha}(f, g) = \frac{1}{\alpha - 1}\Bigl(\int f(x)\Bigl(\frac{f(x)}{g(x)}\Bigr)^{\alpha-1}\,dx - 1\Bigr)    (2.29)

There is also a similar relation between this divergence measure and the Kullback-Leibler divergence [Kap92, Kap94]:

    \lim_{\alpha \to 1} D_{H\alpha}(f, g) = D_K(f, g)    (2.30)

Unfortunately, as Renyi pointed out, D_{R\alpha}(f_{X_1 X_2}(x_1, x_2), f_{X_1}(x_1) f_{X_2}(x_2)) is not appropriate as a measure of the mutual information of the variables X_1 and X_2 [Ren60]. Furthermore, all these divergence measures (Kullback-Leibler, Renyi and Havrda-Charvat) are complicated due to the calculation of the integrals involved in their formulas. Therefore, they are difficult to implement in "learning from examples" and general adaptive signal processing applications, where the maximization or minimization of the measures is desired. In practice, simplicity becomes a paramount consideration. Therefore, there is a need for alternative measures which have the same maximizing or minimizing pdf solutions as the Kullback-Leibler divergence but are at the same time easy to implement, just as the quadratic entropy meets these two requirements.
For discrete variables X_1 and X_2 with probability distributions \{P_{X_1}^i \mid i = 1, \ldots, n\} and \{P_{X_2}^j \mid j = 1, \ldots, m\} respectively, and the joint probability distribution \{P_X^{ij} \mid i = 1, \ldots, n; j = 1, \ldots, m\}, the Shannon mutual information is defined as

    I_S(X_1, X_2) = \sum_{i=1}^n \sum_{j=1}^m P_X^{ij}\log\frac{P_X^{ij}}{P_{X_1}^i P_{X_2}^j}    (2.31)

2.1.5 Quadratic Mutual Information

As pointed out by Kapur [Kap92], there is no reason to restrict ourselves to Shannon's measure for entropy and to confine ourselves to Kullback-Leibler's measure for cross-entropy (density discrepancy or density distance). Entropy or cross-entropy is too deep and too complex a concept to be measured by a single measure under all conditions. The alternative measures for entropy discussed in 2.1.2 break this restriction on entropy; in particular, there are entropies with a simple quadratic form of the pdf. In this section, the possibility of "mutual information" measures involving only simple quadratic forms of pdfs will be discussed (the reason for using quadratic forms of pdfs will become clear later in this chapter). These measures will be called quadratic mutual information, although they may lack some properties of Shannon's mutual information.

Independence is a fundamental statistical relationship between two random variables (the extension of the idea of independence to multiple variables is not difficult; for simplicity of exposition, only the case of two variables will be discussed at this stage). It is defined by the joint pdf being equal to the product of the marginal pdfs. For instance, two variables X_1 and X_2 are independent of each other when

    f_{X_1 X_2}(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2)    (2.32)

where f_{X_1 X_2}(x_1, x_2) is the joint pdf and f_{X_1}(x_1) and f_{X_2}(x_2) are the marginal pdfs. As mentioned in the previous section, the mutual information can be regarded as a distance between the joint pdf and the factorized marginal pdf in the pdf functional space. When the distance is zero, the two variables are independent.
When the distance is maximized, the two variables are far away from the independent state and, roughly speaking, the dependence between them is maximized.

The Euclidean distance is a simple and straightforward distance measure for two pdfs. The squared distance between the joint pdf and the factorized marginal pdf will be called the Euclidean distance quadratic mutual information (ED-QMI). It is defined as

    D_{ED}(f, g) = \int (f(x) - g(x))^2\,dx
    I_{ED}(X_1, X_2) = D_{ED}(\,f_{X_1 X_2}(x_1, x_2),\; f_{X_1}(x_1) f_{X_2}(x_2)\,)    (2.33)

Obviously, the ED-QMI between X_1 and X_2, I_{ED}(X_1, X_2), is non-negative and is zero if and only if f_{X_1 X_2}(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2), i.e., X_1 and X_2 are independent of each other. So, it is appropriate for measuring the independence between X_1 and X_2. Although there is as yet no strict theoretical justification that the ED-QMI is an appropriate measure of the dependence between two variables, the experimental results described later in this dissertation, and the comparison between ED-QMI and Shannon's mutual information in some special cases described later in this chapter, all support the claim that ED-QMI is appropriate for measuring the degree of dependence between two variables; in particular, the maximization of this quantity gives reasonable results. For multiple variables, the extension of ED-QMI is straightforward:

    I_{ED}(X_1, \ldots, X_k) = D_{ED}\Bigl(\,f_X(x_1, \ldots, x_k),\; \prod_{i=1}^k f_{X_i}(x_i)\,\Bigr)    (2.34)

where f_X(x_1, \ldots, x_k) is the joint pdf and f_{X_i}(x_i) (i = 1, \ldots, k) are the marginal pdfs.

Another possible pdf distance measure is based on the Cauchy-Schwarz inequality [Har34]:

    \Bigl(\int f(x)^2\,dx\Bigr)\Bigl(\int g(x)^2\,dx\Bigr) \ge \Bigl(\int f(x) g(x)\,dx\Bigr)^2

where the equality holds if and only if f(x) = C_1 g(x) for a constant scalar C_1. If f(x) and g(x) are pdfs, i.e., \int f(x)\,dx = 1 and \int g(x)\,dx = 1, then f(x) = C_1 g(x) implies C_1 = 1. So, for two pdfs f(x) and g(x), the equality holds if and only if f(x) = g(x).
Thus, we may define the Cauchy-Schwarz distance between two pdfs as

    D_{CS}(f, g) = \log\frac{\bigl(\int f(x)^2\,dx\bigr)\bigl(\int g(x)^2\,dx\bigr)}{\bigl(\int f(x) g(x)\,dx\bigr)^2}    (2.35)

Obviously, D_{CS}(f, g) \ge 0, with equality if and only if f(x) = g(x) almost everywhere, and the integrals involved are all quadratic forms of pdfs. Based on D_{CS}(f, g), we have the Cauchy-Schwarz quadratic mutual information (CS-QMI) between two variables X_1 and X_2:

    I_{CS}(X_1, X_2) = D_{CS}(\,f_{X_1 X_2}(x_1, x_2),\; f_{X_1}(x_1) f_{X_2}(x_2)\,)    (2.36)

where the notations are the same as above. Directly from the above, we have I_{CS}(X_1, X_2) \ge 0, with equality if and only if X_1 and X_2 are independent of each other. So, I_{CS} is an appropriate measure of independence. However, the experimental results show that it might not be appropriate as a dependence measure. For multiple variables, the extension is also straightforward:

    I_{CS}(X_1, \ldots, X_k) = D_{CS}\Bigl(\,f_X(x_1, \ldots, x_k),\; \prod_{i=1}^k f_{X_i}(x_i)\,\Bigr)    (2.37)

For discrete variables X_1 and X_2 with probability distributions \{P_{X_1}^i \mid i = 1, \ldots, n\} and \{P_{X_2}^j \mid j = 1, \ldots, m\} respectively, and the joint probability distribution \{P_X^{ij} \mid i = 1, \ldots, n; j = 1, \ldots, m\}, the ED-QMI and CS-QMI are

    I_{ED}(X_1, X_2) = \sum_{i=1}^n \sum_{j=1}^m \bigl(P_X^{ij} - P_{X_1}^i P_{X_2}^j\bigr)^2
    I_{CS}(X_1, X_2) = \log\frac{\Bigl(\sum_{i=1}^n \sum_{j=1}^m (P_X^{ij})^2\Bigr)\Bigl(\sum_{i=1}^n \sum_{j=1}^m (P_{X_1}^i P_{X_2}^j)^2\Bigr)}{\Bigl(\sum_{i=1}^n \sum_{j=1}^m P_X^{ij} P_{X_1}^i P_{X_2}^j\Bigr)^2}    (2.38)

Figure 2-2. A Simple Example

Figure 2-3. The Surfaces and Contours of I_S, I_{ED} and I_{CS} vs. P_X^{11} and P_X^{21}

To get an idea of how similar and how different the measures I_S, I_{ED} and I_{CS} are, let's look at a simple case with two discrete random variables X_1 and X_2. As shown in Figure 2-2, X_1 is either 1 or 2 and its probability distribution is P_{X_1} = (P_{X_1}^1, P_{X_1}^2); i.e., P(X_1 = 1) = P_{X_1}^1 and P(X_1 = 2) = P_{X_1}^2. Similarly, X_2 can also be either 1 or 2, with the probability distribution P_{X_2} = (P_{X_2}^1, P_{X_2}^2) (P(X_2 = 1) = P_{X_2}^1 and P(X_2 = 2) = P_{X_2}^2). The joint probability distribution is P_X = (P_X^{11}, P_X^{12}, P_X^{21}, P_X^{22}); i.e., P((X_1, X_2) = (1,1)) = P_X^{11}, P((X_1, X_2) = (1,2)) = P_X^{12}, P((X_1, X_2) = (2,1)) = P_X^{21} and P((X_1, X_2) = (2,2)) = P_X^{22}.
Obviously, P_{X_1}^1 = P_X^{11} + P_X^{12}, P_{X_1}^2 = P_X^{21} + P_X^{22}, P_{X_2}^1 = P_X^{11} + P_X^{21} and P_{X_2}^2 = P_X^{12} + P_X^{22}.

First, let's look at the case where the distribution of X_1 is fixed at P_{X_1} = (0.6, 0.4). The free parameters left are then P_X^{11}, from 0 to 0.6, and P_X^{21}, from 0 to 0.4. As P_X^{11} and P_X^{21} vary over these ranges, the values of I_S, I_{ED} and I_{CS} can be calculated. Figure 2-3 shows how these values change with P_X^{11} and P_X^{21}: the left graphs are the surfaces of I_S, I_{ED} and I_{CS} versus P_X^{11} and P_X^{21}, and the right graphs are the contours of the corresponding left surfaces (each contour line has the same value). These graphs show that although the surfaces and contours of the three measures are different, they all reach the minimum value 0 on the same line P_X^{11} = 1.5 P_X^{21}, where the joint probabilities equal the corresponding factorized marginal probabilities. And the maximum values, although different, are also reached at the same points (P_X^{11}, P_X^{21}) = (0.6, 0) and (0, 0.4), where the joint probability matrices are

    \begin{bmatrix} P_X^{11} & P_X^{12} \\ P_X^{21} & P_X^{22} \end{bmatrix} = \begin{bmatrix} 0.6 & 0 \\ 0 & 0.4 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 & 0.6 \\ 0.4 & 0 \end{bmatrix}

respectively. These are just the cases where X_1 and X_2 have a one-to-one relation; i.e., X_1 can determine X_2 without any uncertainty, and vice versa.

If the marginal probability of X_2 is further fixed, e.g., P_{X_2} = (0.3, 0.7), then the free parameter can be P_X^{11}, from 0 to 0.3. In this case, both marginal probabilities of X_1 and X_2 are fixed, so the factorized marginal probability distribution is fixed and only the joint probability distribution changes. This case can also be regarded as the previous case with the further constraint P_X^{11} + P_X^{21} = 0.3. Figure 2-4 shows how the three measures change with P_X^{11} in this case, from which we can see that the minima are reached at the same point P_X^{11} = 0.18, and the maxima are also reached at the same point P_X^{11} = 0; i.e.,

    \begin{bmatrix} P_X^{11} & P_X^{12} \\ P_X^{21} & P_X^{22} \end{bmatrix} = \begin{bmatrix} 0 & 0.6 \\ 0.3 & 0.1 \end{bmatrix}

Figure 2-4. I_S, I_{ED} and I_{CS} vs. P_X^{11}
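The second case of this example (marginals fixed at P_{X_1} = (0.6, 0.4) and P_{X_2} = (0.3, 0.7), with P_X^{11} as the free parameter) can be reproduced numerically. The following Python/NumPy sketch (not part of the original text) sweeps P_X^{11} and checks that Shannon's mutual information (2.31) is minimized at the independent point P_X^{11} = 0.6 x 0.3 = 0.18 and maximized at P_X^{11} = 0:

```python
import numpy as np

def shannon_mi(pjoint):
    """Discrete Shannon mutual information, eq. (2.31); zero-probability cells are skipped."""
    pjoint = np.asarray(pjoint, dtype=float)
    pfact = np.outer(pjoint.sum(axis=1), pjoint.sum(axis=0))  # factorized marginal
    mask = pjoint > 0
    return float(np.sum(pjoint[mask] * np.log(pjoint[mask] / pfact[mask])))

def joint_from_p11(p11):
    """Joint distribution with marginals fixed at (0.6, 0.4) and (0.3, 0.7)."""
    return np.array([[p11, 0.6 - p11],
                     [0.3 - p11, 0.1 + p11]])

p11_grid = np.linspace(0.0, 0.3, 301)
mi_values = [shannon_mi(joint_from_p11(p)) for p in p11_grid]
```

Since the mutual information is convex in the joint distribution, the maximum over the sweep occurs at an endpoint, and the endpoint P_X^{11} = 0 gives the larger value, in agreement with Figure 2-4.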
From this simple example, we can see that although the three measures are different, they have the same minimum points and also the same maximum points in this particular case. It is known that both Shannon's mutual information I_S and the ED-QMI I_{ED} are convex functions of the pdfs [Kap92]. From the above graphs, we can confirm this fact and also conclude that the CS-QMI I_{CS} is not a convex function of the pdfs. On the whole, we can say that the similarity between Shannon's mutual information I_S and the ED-QMI I_{ED} is confirmed by their convexity with the guaranteed same minimum points.

Figure 2-5. Illustration of the Geometrical Interpretation of Mutual Information

2.1.6 Geometrical Interpretation of Mutual Information

From the previous section, we can see that both ED-QMI and CS-QMI contain the following three terms in their formulas:

    V_J = \iint f_{X_1 X_2}(x_1, x_2)^2\,dx_1\,dx_2
    V_M = \iint \bigl(f_{X_1}(x_1) f_{X_2}(x_2)\bigr)^2\,dx_1\,dx_2    (2.39)
    V_C = \iint f_{X_1 X_2}(x_1, x_2) f_{X_1}(x_1) f_{X_2}(x_2)\,dx_1\,dx_2

where V_J is obviously the entropy 2-norm (the squared 2-norm) of the joint pdf, V_M is the entropy 2-norm of the factorized marginal pdf, and V_C is the cross-correlation or inner product between the joint pdf and the factorized marginal pdf. With these three terms, the QMI can be expressed as

    I_{ED} = V_J - 2 V_C + V_M
    I_{CS} = \log V_J - 2\log V_C + \log V_M    (2.40)

Figure 2-5 illustrates the geometrical interpretation of all these quantities. I_S, as previously mentioned, is the K-L divergence between the joint pdf and the factorized marginal pdf; I_{ED} is the squared Euclidean distance between these two pdfs; and I_{CS} is related to the angle between these two pdfs.
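For discrete distributions, the integrals in (2.39) become sums, which makes the decomposition (2.40) a few lines of code. The following Python/NumPy sketch (not part of the original text; the example joint distributions are arbitrary) computes both QMI measures from V_J, V_M and V_C:

```python
import numpy as np

def qmi(pjoint):
    """ED-QMI and CS-QMI of eq. (2.40), via the discrete analogues of V_J, V_M, V_C in (2.39)."""
    pjoint = np.asarray(pjoint, dtype=float)
    pfact = np.outer(pjoint.sum(axis=1), pjoint.sum(axis=0))  # factorized marginal
    v_j = np.sum(pjoint ** 2)     # squared 2-norm of the joint
    v_m = np.sum(pfact ** 2)      # squared 2-norm of the factorized marginal
    v_c = np.sum(pjoint * pfact)  # inner product of the two
    i_ed = float(v_j - 2.0 * v_c + v_m)
    i_cs = float(np.log(v_j) - 2.0 * np.log(v_c) + np.log(v_m))
    return i_ed, i_cs

independent = np.outer([0.6, 0.4], [0.3, 0.7])  # joint equals the factorized marginal
one_to_one = np.array([[0.6, 0.0],
                       [0.0, 0.4]])             # X1 determines X2
```

Both measures vanish for the independent joint and are strictly positive for the one-to-one joint, as the definitions (2.33) and (2.36) require.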
Note that V_M can be factorized into the two marginal information potentials V_1 and V_2:

    V_M = \iint \bigl(f_{X_1}(x_1) f_{X_2}(x_2)\bigr)^2\,dx_1\,dx_2 = V_1 V_2
    V_1 = \int f_{X_1}(x_1)^2\,dx_1, \qquad V_2 = \int f_{X_2}(x_2)^2\,dx_2    (2.41)

2.1.7 Energy and Entropy for a Gaussian Signal

It is well known that for a Gaussian random variable X = (x_1, \ldots, x_k)^T \in R^k with pdf

    f_X(x) = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}}\exp\Bigl(-\frac{1}{2} x^T \Sigma^{-1} x\Bigr)

where \Sigma is the covariance matrix, the Shannon information entropy is

    H_S(X) = \frac{1}{2}\log|\Sigma| + \frac{k}{2}\log 2\pi + \frac{k}{2}    (2.42)

(see Appendix B for the derivation). Similarly, we can get the Renyi information entropy for X:

    H_{R\alpha}(X) = \frac{1}{2}\log|\Sigma| + \frac{k}{2}\log 2\pi - \frac{k}{2}\,\frac{\log\alpha}{1-\alpha}    (2.43)

(the derivation is given in Appendix C). For Havrda-Charvat's entropy, we have
The geometrical meaning of $J_2$ is the average squared Euclidean distance from the data points to the "mean point." If the signal is an error signal, this is the so-called MSE (mean squared error) criterion, which is widely applied in learning and adaptive systems. This criterion is not directly related to the information measure of the signal. Only when the signal is white Gaussian with zero mean do $J_2$ and $J_1$ become equivalent, as (2.9) shows. So, from the information-theoretic point of view, when an MSE criterion is used, it is implicitly assumed that the error signal is white Gaussian with zero mean.

As mentioned in 2.1.1, $J_1$ is basically the determinant of $\Sigma$, which is the product of all the eigenvalues of $\Sigma$ and can be regarded as a geometrical average of the eigenvalues, while $J_2$ is the trace of $\Sigma$, which is the sum of all the eigenvalues and can be regarded as an arithmetic average of the eigenvalues. Note that $|\Sigma| = 0$ cannot guarantee zero energy for all the marginal signals, but the maximization of $|\Sigma|$ does maximize the joint entropy of $X$; conversely, the maximization of $tr[\Sigma]$ cannot guarantee the maximum of the joint entropy of $X$, but the minimization of $tr[\Sigma]$ does drive all the marginal signals to zero. This is possibly the reason why the minimization of MSE is so popular in practice.

2.1.8 Cross-Correlation and Mutual Information for Gaussian Signal

Suppose $X = (x_1, x_2)^T$ is a zero-mean (without loss of generality, because both cross-correlation and mutual information have nothing to do with the mean) Gaussian random variable with covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_1^2 & r \\ r & \sigma_2^2 \end{pmatrix}$$

The joint pdf will be

$$f(x_1, x_2) = \frac{1}{2\pi|\Sigma|^{1/2}}\, e^{-\frac{1}{2}x^T \Sigma^{-1} x} \qquad (2.48)$$

and the two marginal pdfs are

$$f_1(x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{x_1^2}{2\sigma_1^2}}, \qquad f_2(x_2) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{x_2^2}{2\sigma_2^2}} \qquad (2.49)$$

Shannon's mutual information is

$$I_S(x_1, x_2) = H_S(x_1) + H_S(x_2) - H_S(x_1, x_2) = \frac{1}{2}\log\frac{1}{1-\rho^2}, \qquad \rho = \frac{r}{\sigma_1\sigma_2} \qquad (2.50)$$

where $\rho$ is the correlation coefficient between $x_1$ and $x_2$. By using (A.1) in Appendix A and letting $\beta = \sigma_1\sigma_2$, we have

$$V_J = \iint f(x_1, x_2)^2 dx_1 dx_2 = \frac{1}{4\pi\beta\sqrt{1-\rho^2}}$$
$$V_M = \iint f_1(x_1)^2 f_2(x_2)^2 dx_1 dx_2 = \frac{1}{4\pi\beta} \qquad (2.51)$$
$$V_C = \iint f(x_1, x_2) f_1(x_1) f_2(x_2) dx_1 dx_2 = \frac{1}{2\pi\beta\sqrt{4-\rho^2}}$$

The ED-QMI and CS-QMI then will be

$$I_{ED}(x_1, x_2) = \frac{1}{4\pi\beta}\left(\frac{1}{\sqrt{1-\rho^2}} - \frac{4}{\sqrt{4-\rho^2}} + 1\right)$$
$$I_{CS}(x_1, x_2) = \log\frac{4-\rho^2}{4\sqrt{1-\rho^2}} \qquad (2.52)$$

Figure 2-6. Mutual information vs. correlation coefficient for the Gaussian distribution

Similar to $I_S$, $I_{CS}$ is a function of only one parameter, $\rho$, and both are monotonically increasing functions of $\rho$ with the same minimum value 0, the same minimum point $\rho = 0$ and the same maximum point $\rho = 1$, in spite of the difference in their maximum values. $I_{ED}$ is a function of the two parameters $\rho$ and $\beta$. However, $\beta$ only serves as a scale factor and cannot change the shape of the function. Once $\beta$ is fixed, $I_{ED}$ is a monotonically increasing function of $\rho$ with the same minimum value 0, the same minimum point $\rho = 0$ and the same maximum point $\rho = 1$ as $I_S$ and $I_{CS}$, in spite of the difference in the maximum values. Figure 2-6 shows these curves, which tell us that the two proposed measures, ED-QMI and CS-QMI, are consistent with Shannon's MI in the Gaussian case regarding the minimum and maximum points.

2.2 Empirical Energy, Entropy and MI: Problem and Literature Review

In the previous section 2.1, the concepts of various energy, entropy and mutual information quantities were introduced. In practice, we face the problem of estimating these quantities from given sample data. In this section, the problems of empirical energy, entropy and MI will be discussed, and the related literature review will be given.

2.2.1 Empirical Energy

The problem of empirical energy is relatively simple and straightforward. For a given dataset $\{a(i) = (a_1(i), \ldots, a_n(i))^T \mid i = 1, \ldots, N\}$ of an n-D signal $X = (x_1, \ldots, x_n)^T$, it is not difficult to estimate the means and variances of the marginal signals and the covariances between the marginal signals.
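Indeed, these estimates are straightforward to compute; a minimal sketch in Python (the function name is ours), matching the maximum-likelihood formulas of (2.53):

```python
def sample_mean_cov(data):
    """Maximum-likelihood sample mean and covariance, as in (2.53).
    `data` is a list of N samples, each a list of n coordinates."""
    N, n = len(data), len(data[0])
    mean = [sum(a[i] for a in data) / N for i in range(n)]
    cov = [[sum((a[i] - mean[i]) * (a[j] - mean[j]) for a in data) / N
            for j in range(n)] for i in range(n)]
    return mean, cov

mean, cov = sample_mean_cov([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]])
```

Note the $1/N$ normalization (rather than $1/(N-1)$), which is what maximum likelihood yields for a Gaussian model.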
We have the sample mean and sample covariance matrix as follows [Dud73, Dud98]:

$$\hat{\mu} = \frac{1}{N}\sum_{j=1}^N a(j), \qquad \hat{\Sigma} = \frac{1}{N}\sum_{j=1}^N \left(a(j) - \hat{\mu}\right)\left(a(j) - \hat{\mu}\right)^T \qquad (2.53)$$

These are the results of maximum likelihood estimation [Dud73, Dud98].

2.2.2 Empirical Entropy and Mutual Information: The Problem

As shown in the previous section 2.1, the entropy and mutual information all rely on the probability density function (pdf) of the variables; thus they use all the statistics of the variables, but are more complicated and difficult to implement than the energy. To estimate the entropy or mutual information, the first thing we need to do is to estimate the pdf of the variables; then the entropy and mutual information can be calculated according to the formulas described in the previous section 2.1. For continuous variables, there are inevitable integrals in all the entropy and mutual information definitions described in 2.1, and these are the major difficulty after pdf estimation. Thus, the pdf estimation and the measures for entropy and mutual information should be appropriately chosen so that the corresponding integrals can be simplified. In the rest of this chapter, we will see the importance of this choice in practice. Different empirical entropies or mutual informations are actually the results of different choices.

If a priori knowledge about the data distribution is known or a model is assumed, then parametric methods can be used to estimate the pdf model parameters, and the entropies and mutual informations can then be estimated based on the model and the estimated parameters. However, in many real-world problems the only available information about the domain is contained in the data collected, and there is no a priori knowledge about the data. It is therefore practically significant to estimate the entropy of a variable, or the mutual information between variables, based merely on the given data samples, without further assumptions or any a priori model.
Thus, we are actually seeking nonparametric ways to estimate entropies and mutual informations. Formally, the problems can be described as follows:

• The Nonparametric Entropy Estimation: given a data set $\{a(i) \mid i = 1, \ldots, N\}$ for a signal $X$ ($X$ can be a scalar or an n-D signal), how to estimate the entropy of $X$ without any other information or assumptions.

• The Nonparametric Mutual Information Estimation: given a data set $\{a(i) = (a_1(i)^T, a_2(i)^T)^T \mid i = 1, \ldots, N\}$ for a signal $X = (x_1, x_2)^T$ ($x_1$ and $x_2$ can be scalar or n-D signals, and their dimensions can be different), how to estimate the mutual information between $x_1$ and $x_2$ without any assumption. This scheme can easily be extended to the mutual information of multiple signals.

For nonparametric methods, there are still two major difficulties: the nonparametric pdf estimation and the calculation of the integrals involved in the entropy and mutual information measures. In the following, a literature review on these two aspects will be given.

2.2.3 Nonparametric Density Estimation

The literature on nonparametric density estimation is fairly extensive. A complete discussion of this topic in such a small section is virtually impossible. Here, only a brief review of the relevant methods such as the histogram, the Parzen window method, orthogonal series estimates, the mixture model, etc. will be given.

• Histogram [Sil86, Weg72]: The histogram is the oldest and most widely used density estimator. For a 1-D variable $x$, given an origin $x_0$ and a bin width $h$, the bins of the histogram can be defined as the intervals $[x_0 + mh,\ x_0 + (m+1)h)$. The histogram is then defined by

$$\hat{f}(x) = \frac{1}{Nh}\,(\text{number of samples in the same bin as } x) \qquad (2.54)$$

The histogram can be generalized by allowing the bin widths to vary.
Formally, suppose the real line has been dissected into bins; then the histogram can be defined as

$$\hat{f}(x) = \frac{1}{N}\,\frac{\text{number of samples in the same bin as } x}{\text{width of the bin containing } x} \qquad (2.55)$$

For a multi-dimensional variable, the histogram presents several difficulties. First, contour diagrams to represent the data cannot be easily drawn. Second, the problem of choosing the origin and the bins (or cells) is exacerbated. Third, if rectangular bins are used for an n-D variable and the number of bins for each marginal variable is $m$, then the number of bins is on the order of $O(m^n)$. Fourth, since the histogram discretizes each marginal variable, it is difficult to carry out further mathematical analysis.

• Orthogonal Series Estimates [Hay98, Com94, Yan97, Weg72, Sil86, Wil62, Kol94]: This category includes the Fourier expansion, the Edgeworth expansion, the Gram-Charlier expansion, etc. We will just discuss the Edgeworth and Gram-Charlier expansions for a 1-D variable. Without loss of generality, we assume that the random variable $x$ is zero-mean. The pdf of $x$ can be expressed in terms of the Gaussian function $G(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ as

$$f(x) = G(x)\left(1 + \sum_{k=3}^{\infty} c_k H_k(x)\right) \qquad (2.56)$$

where the $c_k$ are coefficients which depend on the cumulants of $x$, e.g. $c_1 = 0$, $c_2 = 0$, $c_3 = k_3/6$, $c_4 = k_4/24$, $c_5 = k_5/120$, $c_6 = (k_6 + 10k_3^2)/720$, $c_7 = (k_7 + 35k_4k_3)/5040$, $c_8 = (k_8 + 56k_5k_3 + 35k_4^2)/40320$, etc. ($k_i$ are the $i$th-order cumulants); the $H_k(x)$ are the Hermite polynomials, which can be defined in terms of the $k$th derivative of the Gaussian function $G(x)$ as $G^{(k)}(x) = (-1)^k G(x)H_k(x)$, or explicitly $H_0(x) = 1$, $H_1(x) = x$, $H_2(x) = x^2 - 1$, etc., and there is a recursive relation $H_{k+1}(x) = xH_k(x) - kH_{k-1}(x)$. Furthermore, a biorthogonality property exists between the Hermite polynomials and the derivatives of the Gaussian function:

$$\int H_k(x)\,G^{(m)}(x)\,dx = (-1)^m m!\,\delta_{km}, \qquad k, m = 0, 1, \ldots \qquad (2.57)$$

where $\delta_{km}$ is the Kronecker delta, which is equal to 1 if $k = m$ and 0 otherwise. (2.56) is the so-called Gram-Charlier expansion.
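The Hermite polynomials entering (2.56) are cheap to evaluate via the recursion just given; a short sketch (function name ours):

```python
def hermite(k, x):
    """Hermite polynomial H_k(x) of (2.56)-(2.57), via the recursion
    H_{k+1}(x) = x*H_k(x) - k*H_{k-1}(x), with H_0 = 1, H_1 = x."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, x
    for n in range(1, k):
        h_prev, h = h, x * h - n * h_prev
    return h

# Explicit low orders match: H_2(x) = x^2 - 1, H_3(x) = x^3 - 3x.
assert hermite(2, 2.0) == 3.0
assert hermite(3, 2.0) == 2.0
```

Note these are the "probabilists'" Hermite polynomials, consistent with the explicit values $H_0 = 1$, $H_1 = x$, $H_2 = x^2 - 1$ given in the text.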
It is important to note that the natural order of the terms is not the best for the Gram-Charlier series. Rather, the grouping $k = (0), (3), (4, 6), (5, 7, 9), \ldots$ is more appropriate. In practice, the expansion has to be truncated. For BSS or ICA applications, truncation of the series at $k = (4, 6)$ is considered adequate. Thus, we have

$$f(x) \approx G(x)\left(1 + \frac{k_3}{3!}H_3(x) + \frac{k_4}{4!}H_4(x) + \frac{k_6 + 10k_3^2}{6!}H_6(x)\right) \qquad (2.58)$$

where the cumulants are $k_3 = m_3$, $k_4 = m_4 - 3m_2^2$, $k_6 = m_6 - 10m_3^2 - 15m_2m_4 + 30m_2^3$ (with moments $m_i = E[x^i]$). The Edgeworth expansion, on the other hand, can be written as

$$f(x) = G(x)\left(1 + \frac{k_3}{3!}H_3(x) + \frac{k_4}{4!}H_4(x) + \frac{10k_3^2}{6!}H_6(x) + \frac{k_5}{5!}H_5(x) + \frac{35k_3k_4}{7!}H_7(x) + \frac{280k_3^3}{9!}H_9(x) + \ldots\right) \qquad (2.59)$$

There is no essential difference between the Edgeworth expansion and the Gram-Charlier expansion. The key feature of the Edgeworth expansion is that its coefficients decrease uniformly, while the terms in the Gram-Charlier expansion do not tend uniformly to zero from the viewpoint of numerical errors. This is why the terms in the Gram-Charlier expansion should be grouped as mentioned above.

Both the Edgeworth and Gram-Charlier expansions will be truncated in real applications, which makes them a kind of approximation to the pdf. Furthermore, they usually can only be used for a 1-D variable. For multi-dimensional variables, they become very complicated.

• Parzen Window Method [Par62, Dud73, Dud98, Chr81, Vap95, Dev85]: The Parzen window method is also called the kernel estimation method, or the potential function method. Several nonparametric methods for density estimation appeared in the 1960s. Among these methods the Parzen window method is the most popular. According to the method, one first has to determine the so-called kernel function. For simplicity, and for later use in this dissertation, we consider a simple symmetric Gaussian kernel function:

$$G(x, \sigma^2) = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left(-\frac{x^T x}{2\sigma^2}\right) \qquad (2.60)$$

where $\sigma$ controls the kernel size and $x$ can be an n-D variable.
For a data set as described in 2.2.2, the density function will be

$$\hat{f}(x) = \frac{1}{N}\sum_{i=1}^N G(x - a(i), \sigma^2) \qquad (2.61)$$

which means that each data point carries a kernel function and the whole density is the average of all the kernel functions. The asymptotic theory for Parzen-type nonparametric density estimation was developed in the 1970s [Dev85]. It concludes that (i) Parzen's estimator is consistent (in various metrics) for a very wide class of densities; (ii) the asymptotic rate of convergence of Parzen's estimator is optimal for "smooth" densities. We will see later in this chapter how this density estimation method can be combined with the quadratic entropy and the quadratic mutual information to develop the ideas of the information potential and the cross information potential. The Parzen window method is thus selected not only for simplicity but also for its good asymptotic properties. In addition, this kernel function is actually consistent with the mass-energy spirit mentioned in Chapter 1. In fact, one data point should represent not only itself but also its neighborhood. The kernel function is, in this sense, much like a mass-density function, and from this point of view it naturally introduces the ideas of field and potential energy. We will see this more clearly later in this chapter.

• Mixture Model [McL88, McL96, Dem77, Rab93, Hua90]: The mixture model is a kind of "semi-parametric" method (or we may call it semi-nonparametric). The mixture model, especially the Gaussian mixture model, has been extensively applied in various engineering areas, such as the hidden Markov model in speech recognition and many other areas. Although the Gaussian mixture model assumes that the data samples come from several Gaussian sources, it can approximate quite diverse densities.
Generally, the density for an n-D variable $x$ is assumed to be

$$f(x) = \sum_{k=1}^K c_k\, G(x - \mu_k, \Sigma_k) \qquad (2.62)$$

where $K$ is the number of mixture sources, the $c_k$ are mixture coefficients which are non-negative and sum to one, $\sum_{k=1}^K c_k = 1$, and $\mu_k$ and $\Sigma_k$ are the means and covariance matrices of the Gaussian sources, where the Gaussian function is denoted by

$$G(x - \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$$

with the mean $\mu$ and covariance matrix $\Sigma$ as the parameters. All the parameters $c_k$, $\mu_k$ and $\Sigma_k$ can be estimated from the data samples by the EM algorithm in the maximum likelihood sense. One may notice the similarity between the Gaussian mixture model and the Gaussian kernel estimation method. Actually, the Gaussian kernel estimation method is the extreme case of the Gaussian mixture model where all the means are the data points themselves and all the mixture coefficients and all the covariance matrices are equal. In other words, each data point in the Gaussian kernel estimation method is treated as a Gaussian source with equal mixture coefficient and equal covariance.

There are also other nonparametric methods such as the k-nearest neighbor method [Dud73, Dud98, Sil86], the naive estimator [Sil86], etc. These estimated density functions are not "natural density functions"; i.e., their integrals are not equal to 1. Their unsmoothness at the data points also makes them difficult to apply to entropy or mutual information estimation.

2.2.4 Empirical Entropy and Mutual Information: The Literature Review

With the probability density function, we can then calculate the entropy or the mutual information, where the difficulty lies in the integrals involved. Shannon's entropy and Shannon's mutual information are the dominating measures used in the literature, and the logarithm usually brings big difficulties to their estimation. Some researchers have tried to avoid the use of Shannon's measures in order to gain some tractability.
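Before turning to entropy estimation, the Parzen estimate (2.60)-(2.61) reviewed above can be sketched concretely for the 1-D case (function names are ours):

```python
import math

def gaussian_kernel(x, sigma2):
    # 1-D case of the Gaussian kernel (2.60)
    return math.exp(-x * x / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def parzen_pdf(samples, sigma2):
    """Parzen window estimate (2.61): one kernel per sample, averaged."""
    N = len(samples)
    return lambda x: sum(gaussian_kernel(x - a, sigma2) for a in samples) / N

f_hat = parzen_pdf([-1.0, 0.0, 1.0], sigma2=0.25)
```

Because each kernel integrates to one, the estimate is itself a proper pdf, unlike the k-nearest-neighbor and naive estimators mentioned above.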
A summary of the various existing methods will now be given, organized in the following manner, starting with the simple histogram method.

• Histogram Based Method: If the pdf of a variable is estimated by the histogram method, the variable has to be discretized by histogram bins. Thus the integration in Shannon's entropy or mutual information becomes a summation, and there is no difficulty at all in its calculation. However, this is true only for a low-dimensional variable. As pointed out in the previous section, for a high-dimensional variable the computational complexity becomes too large for the method to be implementable. Furthermore, in spite of the simplicity it brings to the calculation, the discretization makes it impossible to carry out further mathematical analysis and to apply this method to the problem of optimization of entropy or mutual information, where differentiable continuous functions are needed for analysis. Nevertheless, such a simple method is still very useful in cases such as feature selection [Bat94], where only a static comparison of the entropy or mutual information is needed.

• The Case of a Full Rank Linear Transform: From probability theory, we know that for a full rank linear transform $Y = WX$, where $X = (x_1, \ldots, x_n)^T$ and $Y = (y_1, \ldots, y_n)^T$ are vectors in an n-dimensional real space and $W$ is an n-by-n full rank matrix, there is a relation between the density function of $X$ and the density function of $Y$: $f_Y(y) = \frac{f_X(x)}{|det(W)|}$ [Pap91], where $f_Y$ and $f_X$ are the densities of $Y$ and $X$ respectively, and $det(\cdot)$ is the determinant operator. Accordingly, we have the relation between the entropy of $Y$ and the entropy of $X$:

$$H(Y) = E[-\log f_Y(y)] = E[-\log f_X(x) + \log|det(W)|] = H(X) + \log|det(W)|$$

So, the output entropy $H(Y)$ can be expressed in terms of the input entropy $H(X)$. Although $H(X)$ may not be known, it may be fixed, and the relation can be used for the purpose of manipulating the output entropy $H(Y)$.
This is the basis for a series of methods in the BSS and ICA areas. For instance, the mutual information among the output marginal variables is

$$I(y_1, \ldots, y_n) = \sum_{i=1}^n H(y_i) - H(Y) = \sum_{i=1}^n H(y_i) - \log|det(W)| - H(X)$$

so that the minimization of the mutual information can be implemented by manipulating the marginal entropies and the determinant of the linear transform. In spite of its simplicity, this method is obviously coupled with the structure of the transform (full rank is required, etc.), and thus is less general.

• InfoMax Method: Let us look at a transformation $Z = (z_1, \ldots, z_n)^T$, $z_i = f(y_i)$, $(y_1, \ldots, y_n)^T = Y = WX$, where $f(\cdot)$ is a monotonically increasing (or decreasing, for cases other than BSS and ICA) function, and the linear transform is the same as before. Again, from probability theory [Pap91], we have $f_Z(z) = \frac{f_Y(y)}{|J(z)|}$, where $f_Z$ and $f_Y$ are the densities of $Z$ and $Y$ respectively, and $J(z)$ is the Jacobian of the nonlinear transforms expressed as a function of $z$. Thus, there is the relation

$$H(Z) = H(Y) + E[\log|J(z)|] = H(X) + \log|det(W)| + E[\log|J(z)|]$$

where $E[\log|J(z)|]$ is approximated by the sample mean method [Bel95]. The maximization of the output entropy can then be manipulated through the two terms $\log|det(W)|$ and $E[\log|J(z)|]$. In addition to the sample mean approximation, this method requires a match between the nonlinear function and the cdf of the source signals when applied to BSS and ICA problems.

• Nonlinear Function by the Mixture Model: The above method can be generalized by using the mixture model to model the pdf of the sources [XuL97] and then the corresponding cdf, i.e., the nonlinear functions. Although this method avoids an arbitrary assumption on the cdf of the sources, it still suffers from problems such as the coupling with the structure of the learning machine.

• Numerical Method: The integration involved in the calculation of the entropy or mutual information is usually complicated.
A numerical method can be used to calculate the integration. However, this method can only be used for low-dimensional variables. [Pha96] used the Parzen window method to estimate the marginal densities and applied this approach to the calculation of the marginal entropies needed for the mutual information of the outputs of a linear transform, as described above. As pointed out by [Vio95], the integration in Shannon's entropy or mutual information becomes extremely complicated when the Parzen window is used for the density estimation. Applying the numerical method makes the calculation possible but restricts it to simple cases, and the method is also coupled with the structure of the learning machine.

• Edgeworth and Gram-Charlier Expansion Based Method: As described above, both expansions can be expressed in the form $f(x) = G(x)(1 + A(x))$, where $A(x)$ is a polynomial. By using the Taylor expansion, we have $\log(1 + A(x)) \approx A(x) - \frac{A(x)^2}{2} = B(x)$ for relatively small $A(x)$. Then

$$H(x) = -\int f(x)\log f(x)\,dx = -\int G(x)(1 + A(x))(\log G(x) + B(x))\,dx$$

Noticing that $G(x)$ is the Gaussian function and $A(x)$ and $B(x)$ are polynomials, this integration has an analytical result. Thus a relation can be established between the entropy and the coefficients of the polynomials $A(x)$ and $B(x)$ (i.e., the sample cumulants of the variable). Unfortunately, this method can only be used for a 1-D variable, and thus it is usually used in the calculation of the mutual information described above for BSS and ICA problems [Yan97, Yan98, Hay98].

• Parzen Window and Sample Mean: Similar to [Pha96], [Vio95] also uses the Parzen window method for the pdf estimation. To avoid the complicated integration, [Vio95] used the sample mean to approximate the integration, rather than the numerical method of Pham [Pha96]. This is clear when we express the entropy as $H(x) = E[-\log f(x)]$. This method can be used not only for 1-D variables but also for n-D variables.
Although this method is flexible, its sample mean approximation restricts its precision.

• An Indirect Method Based on Parzen Window Estimation: Fisher [Fis97] uses an indirect way for entropy optimization. If $Y$ is the output of a mapping and is bounded in a rectangular region $D = \{y \mid a_i \le y_i \le b_i,\ i = 1, \ldots, k\}$, entropy maximization can be implemented by minimizing the criterion

$$J = \int \left(\hat{f}_Y(y) - u(y)\right)^2 dy, \qquad u(y) = \begin{cases}\displaystyle\prod_{i=1}^k \frac{1}{b_i - a_i} & y \in D \\ 0 & \text{otherwise}\end{cases} \qquad (2.63)$$

where $u(y)$ is the uniform pdf in the region $D$ and $\hat{f}_Y(y)$ is the pdf of the output $y$ estimated by the Parzen window method described in the previous section. The gradient method can be used for the minimization of $J$. As an example, the partial derivatives of $J$ with respect to the weights $w_{ij}$ are

$$\frac{\partial J}{\partial w_{ij}} = \sum_{p=1}^k \sum_{n=1}^N \frac{\partial J}{\partial y_p(n)}\,\frac{\partial y_p(n)}{\partial w_{ij}} \qquad (2.64)$$

where the $y(n)$ are samples of the output. The partial derivative of the mean squared difference with respect to the output samples can be broken down in terms of

$$K_u(z) = u(z) * G_g(z) = \int u(y)\,G_g(z - y)\,dy$$
$$K_G(z) = G(z, \sigma^2) * G_g(z) = \int G(y, \sigma^2)\,G_g(z - y)\,dy \qquad (2.65)$$
$$G_g(z) = \frac{\partial}{\partial z}G(z, \sigma^2)$$

where $G_g(z)$ is the gradient of the Gaussian kernel, $K_u(z)$ is the convolution between the uniform pdf $u(z)$ and the gradient of the Gaussian kernel $G_g(z)$, and $K_G(z)$ is the convolution between the Gaussian kernel $G(z, \sigma^2)$ and its gradient $G_g(z)$. As shown in Fisher [Fis97], the convolution $K_G(z)$ turns out to be

$$K_G(z) = \frac{\partial}{\partial z}G(z, 2\sigma^2) = -\frac{z}{2\sigma^2}\,G(z, 2\sigma^2) \qquad (2.66)$$

If the domain $D$ is symmetric, i.e., $b_i = -a_i = a/2$, $i = 1, \ldots, k$, then the $i$th component of the convolution $K_u(z)$ is

$$[K_u(z)]_i = \frac{1}{a}\left(G\!\left(z_i + \frac{a}{2}, \sigma^2\right) - G\!\left(z_i - \frac{a}{2}, \sigma^2\right)\right)\prod_{j \ne i}\frac{1}{2a}\left(erf\!\left(\frac{z_j + a/2}{\sqrt{2}\sigma}\right) - erf\!\left(\frac{z_j - a/2}{\sqrt{2}\sigma}\right)\right) \qquad (2.67)$$

where $z = (z_1, \ldots, z_k)^T$, $G(z, \sigma^2)$ is the same as in (2.60), and $erf(x) = \int_0^x \frac{2}{\sqrt{\pi}}\exp(-t^2)\,dt$ is the error function. This method is indirect and still depends on the topology of the network, but it also shows the flexibility gained by using the Parzen window method. It has been used in practice with good results for the MACE [Fis97].
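The Parzen-plus-sample-mean idea of [Vio95] reviewed above takes only a few lines; a minimal 1-D illustration (function names ours; for simplicity we reuse the same samples for the density and for the sample mean, which is a simplification of [Vio95]):

```python
import math

def parzen(samples, sigma2):
    # 1-D Parzen window estimate with a Gaussian kernel, as in (2.61)
    N = len(samples)
    c = math.sqrt(2.0 * math.pi * sigma2)
    return lambda x: sum(math.exp(-(x - a) ** 2 / (2.0 * sigma2))
                         for a in samples) / (N * c)

def entropy_sample_mean(samples, sigma2):
    """H(x) = E[-log f(x)] approximated by the sample mean of -log f_hat."""
    f_hat = parzen(samples, sigma2)
    return -sum(math.log(f_hat(a)) for a in samples) / len(samples)

# A widely spread sample set should score higher entropy than a tight one.
h_wide = entropy_sample_mean([0.0, 2.0, 4.0, 6.0], sigma2=1.0)
h_tight = entropy_sample_mean([0.0, 0.2, 0.4, 0.6], sigma2=1.0)
assert h_wide > h_tight
```

The sample-mean step replaces the intractable integral, which is exactly the source of the precision limitation noted above.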
Summarizing the above, we see that there is no direct, efficient nonparametric method to estimate the entropy or mutual information for a given discrete data set which is decoupled from the structure of the learning machine and can be applied to n-D variables. In the next sections, we will show how the quadratic entropy and the quadratic mutual information, rather than Shannon's entropy and mutual information, can be combined with the Gaussian kernel estimation of pdfs to develop the ideas of the "information potential" and the "cross information potential," resulting in an effective and general method for the calculation of the empirical entropy and mutual information.

2.3 Quadratic Entropy and Information Potential

2.3.1 The Development of the Information Potential

As mentioned in the previous section, the integration of Shannon's entropy with the Gaussian kernel estimation of the pdf becomes "inordinately difficult" [Vio95]. However, if we choose the quadratic entropy and notice the fact that the integral of the product of two Gaussian functions can still be evaluated as another Gaussian function, as (A.1) shows, then we arrive at a simple method. For a data set as described in 2.2.2, we can use the Gaussian kernel method of (2.61) to estimate the pdf of $X$ and then calculate the "entropy 2-norm" as

$$V = \int_{-\infty}^{+\infty} \hat{f}_X(x)^2 dx = \int_{-\infty}^{+\infty}\left(\frac{1}{N}\sum_{i=1}^N G(x - a(i), \sigma^2)\right)\left(\frac{1}{N}\sum_{j=1}^N G(x - a(j), \sigma^2)\right)dx$$
$$= \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \int_{-\infty}^{+\infty} G(x - a(i), \sigma^2)\,G(x - a(j), \sigma^2)\,dx = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N G(a(i) - a(j), 2\sigma^2) \qquad (2.68)$$

So, Renyi's quadratic entropy and Havrda-Charvat's quadratic entropy lead to a much simpler entropy estimator for a set of discrete data points $\{a(i) \mid i = 1, \ldots, N\}$:

$$H_{R2}(X|\{a\}) = -\log V, \qquad H_{HC2}(X|\{a\}) = 1 - V$$
$$V = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N G(a(i) - a(j), 2\sigma^2) \qquad (2.69)$$

The combination of the quadratic entropies with the Parzen window method leads to an entropy estimator that computes the interactions among pairs of samples.
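The estimator (2.68)-(2.69) is just a double sum over sample pairs; a minimal 1-D sketch (function names are ours):

```python
import math

def information_potential(samples, sigma2):
    """V of (2.68)/(2.69): average pairwise interaction G(a(i)-a(j), 2*sigma^2)."""
    N = len(samples)
    var = 2.0 * sigma2                       # kernel variance doubles in (2.68)
    c = math.sqrt(2.0 * math.pi * var)
    return sum(math.exp(-(ai - aj) ** 2 / (2.0 * var))
               for ai in samples for aj in samples) / (N * N * c)

def renyi_quadratic_entropy(samples, sigma2):
    # H_R2 = -log V, as in (2.69)
    return -math.log(information_potential(samples, sigma2))

# Spreading the samples out lowers V and raises the quadratic entropy.
assert renyi_quadratic_entropy([0.0, 3.0, 6.0], 1.0) > \
       renyi_quadratic_entropy([0.0, 0.3, 0.6], 1.0)
```

Note that the only approximation is the Parzen pdf estimate itself; the integral in (2.68) is evaluated exactly.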
Notice that there is no approximation in these evaluations except the pdf estimation. We wrote (2.69) in this way because there is a very interesting physical interpretation for this estimator of entropy. Let us assume that we place physical particles at the locations prescribed by $a(i)$ and $a(j)$. Actually, the Parzen window method is just in the spirit of mass-energy. The integral of the product of two Gaussian kernels, each representing a kind of mass density, can be regarded as the interaction between particles $a(i)$ and $a(j)$, which results in the potential energy $G(a(i) - a(j), 2\sigma^2)$. Notice that this quantity is always positive and decreases with the squared distance between the particles. We can consider that a potential field exists for each particle in the space, with a field strength defined by the Gaussian kernel, i.e., an exponential decay with the squared distance. In the real world, physical particles interact with a potential energy inverse to the distance between them, but here the potential energy abides by a different law, which is in fact determined by the kernel used in the pdf estimation. $V$ in (2.69) is the overall potential energy, including each pair of data particles. As pointed out previously, these potential energies are related to "information" and thus are called "information potentials" (IP). Accordingly, data samples will be called "information particles" (IPT). Now the entropy is expressed in terms of the potential energy, and entropy maximization becomes equivalent to the minimization of the information potential. This is again a surprising similarity to statistical mechanics, where the entropy maximization principle has as a corollary the energy minimization principle. It is a pleasant surprise to verify that the nonparametric estimation of entropy here ends up with a principle that resembles the one of the physical particle world, which was one of the origins of the concept of entropy.
We can also see from (2.68) and (2.69) that the Parzen window method implemented with the Gaussian kernel and coupled with Renyi's or Havrda-Charvat's entropy of higher order ($\alpha > 2$) will compute interactions among $\alpha$-tuples of samples, providing even more information about the detailed structure and distribution of the data set.

2.3.2 Information Force (IF)

Just as in mechanics, the derivative of the potential energy is a force, in this case an information-driven force that moves the data samples in the space of the interactions to change the distribution of the data and thus the entropy of the data. Therefore,

$$\frac{\partial}{\partial a(i)} G(a(i) - a(j), 2\sigma^2) = G(a(i) - a(j), 2\sigma^2)\,\frac{a(j) - a(i)}{2\sigma^2} \qquad (2.70)$$

can be regarded as the force that a particle at the position of sample $a(j)$ impinges upon $a(i)$, and will be called an information force. If all the data samples are free to move in a certain region of the space, then the information forces between each pair of samples will drive all the samples to a state with minimum information potential. If we add all the contributions of the information forces from the ensemble of samples on $a(i)$, we have the overall effect of the information potential on sample $a(i)$; i.e.,

$$\frac{\partial V}{\partial a(i)} = -\frac{1}{N^2\sigma^2}\sum_{j=1}^N G(a(i) - a(j), 2\sigma^2)\,(a(i) - a(j)) \qquad (2.71)$$

This is the net force that sample $a(i)$ receives from all the other "information particles." The entropy will change in the direction (for each information particle) of the information force. Accordingly, entropy maximization or minimization can be implemented in a simple and effective way.

2.3.3 The Calculation of the Information Potential and Force

The above has introduced the concepts of the information potential and the information force. Here, the procedure for the calculation of the information potential and the information force will be given according to the formulas above. The procedure itself, and the plot here, may even help to further understand the idea of the information potential and the information force.
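The per-sample force of (2.70)-(2.71) can be sketched directly for 1-D samples (function name ours). By the antisymmetry of the pairwise terms, the forces cancel across the ensemble:

```python
import math

def information_forces(samples, sigma2):
    """dV/da(i) of (2.71) for each 1-D sample (the information force on a(i))."""
    N = len(samples)
    var = 2.0 * sigma2
    c = math.sqrt(2.0 * math.pi * var)
    forces = []
    for ai in samples:
        f = sum(math.exp(-(ai - aj) ** 2 / (2.0 * var)) * (aj - ai)
                for aj in samples) / (N * N * sigma2 * c)
        forces.append(f)
    return forces

f = information_forces([0.0, 1.0, 2.0], sigma2=0.5)
assert abs(sum(f)) < 1e-12   # internal pairwise forces cancel
# The gradient on the edge particles points inward (toward higher V);
# entropy maximization moves each sample against this gradient.
assert f[0] > 0 and f[2] < 0
```

Adapting a learning system then reduces to back-propagating these forces through the network, which is how the dissertation uses them in later chapters.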
To calculate the information potential and the information force, two matrices can be defined as in (2.72); their structures are illustrated in Figure 2-7.

$$D = \{d(ij)\}, \quad d(ij) = a(i) - a(j)$$
$$v = \{v(ij)\}, \quad v(ij) = G(d(ij), 2\sigma^2) \qquad (2.72)$$

Figure 2-7. The structure of the matrices D and v

Notice that each element of $D$ is a vector in $R^n$ space, while each element of $v$ is a scalar. It is easy to show from the above that

$$V = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N v(ij), \qquad f(i) = -\frac{1}{N^2\sigma^2}\sum_{j=1}^N v(ij)\,d(ij) \qquad (2.73)$$

where $V$ is the overall information potential and $f(i)$ is the force that $a(i)$ receives. We can also define the information potential for each particle $a(i)$ as $v(i) = \frac{1}{N}\sum_{j=1}^N v(ij)$. Obviously, $V = \frac{1}{N}\sum_{i=1}^N v(i)$. From this procedure, we can clearly see that the information potential relies on the difference between each pair of data points, and therefore makes full use of the information of their relative positions; i.e., the data distribution.

2.4 Quadratic Mutual Information and Cross Information Potential

2.4.1 QMI and Cross Information Potential (CIP)

For the given data set $\{a(i) = (a_1(i)^T, a_2(i)^T)^T \mid i = 1, \ldots, N\}$ of a variable $X = (x_1, x_2)^T$ as described in 2.2.2, the joint and marginal pdfs can be estimated by the Gaussian kernel method as

$$\hat{f}_{x_1 x_2}(x_1, x_2) = \frac{1}{N}\sum_{i=1}^N G(x_1 - a_1(i), \sigma^2)\,G(x_2 - a_2(i), \sigma^2)$$
$$\hat{f}_{x_1}(x_1) = \frac{1}{N}\sum_{i=1}^N G(x_1 - a_1(i), \sigma^2), \qquad \hat{f}_{x_2}(x_2) = \frac{1}{N}\sum_{i=1}^N G(x_2 - a_2(i), \sigma^2) \qquad (2.74)$$

Following the same procedure as in the development of the information potential, we can obtain the three terms of ED-QMI and CS-QMI based only on the given data set:

$$V_J = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N G(a(i) - a(j), 2\sigma^2) = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N G(a_1(i) - a_1(j), 2\sigma^2)\,G(a_2(i) - a_2(j), 2\sigma^2)$$
$$V_M = V_1 V_2, \qquad V_k = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N G(a_k(i) - a_k(j), 2\sigma^2), \quad k = 1, 2 \qquad (2.75)$$
$$V_C = \frac{1}{N}\sum_{i=1}^N\left(\frac{1}{N}\sum_{j=1}^N G(a_1(i) - a_1(j), 2\sigma^2)\right)\left(\frac{1}{N}\sum_{j=1}^N G(a_2(i) - a_2(j), 2\sigma^2)\right)$$

If we define matrices similar to (2.72), then we have

$$D = \{d(ij)\}, \quad d(ij) = a(i) - a(j); \qquad D_k = \{d_k(ij)\}, \quad d_k(ij) = a_k(i) - a_k(j), \quad k = 1, 2$$
$$v = \{v(ij)\}, \quad v(ij) = G(d(ij), 2\sigma^2); \qquad v_k = \{v_k(ij)\}, \quad v_k(ij) = G(d_k(ij), 2\sigma^2), \quad k = 1, 2 \qquad (2.76)$$
$$v(i) = \frac{1}{N}\sum_{j=1}^N v(ij),$$
$$v_k(i) = \frac{1}{N}\sum_{j=1}^N v_k(ij), \qquad k = 1, 2$$

where $v(ij)$ is the information potential in the joint space, thus called the joint potential; $v_k(ij)$ is the information potential in the marginal space, thus called the marginal potential; $v(i)$ is the joint information potential energy for IPT $a(i)$; and $v_k(i)$ is the marginal information potential energy for the marginal IPT $a_k(i)$ in the marginal space indexed by $k$. Based on these quantities, the above three terms can be expressed as

$$V_J = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N v_1(ij)\,v_2(ij), \qquad V_M = V_1 V_2, \qquad V_C = \frac{1}{N}\sum_{i=1}^N v_1(i)\,v_2(i) \qquad (2.77)$$

So, ED-QMI and CS-QMI can be expressed as

$$I_{ED}(x_1, x_2) = V_{ED} = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N v_1(ij)\,v_2(ij) - \frac{2}{N}\sum_{i=1}^N v_1(i)\,v_2(i) + V_1 V_2$$
$$I_{CS}(x_1, x_2) = V_{CS} = \log\frac{\left(\dfrac{1}{N^2}\displaystyle\sum_{i=1}^N\sum_{j=1}^N v_1(ij)\,v_2(ij)\right)V_1 V_2}{\left(\dfrac{1}{N}\displaystyle\sum_{i=1}^N v_1(i)\,v_2(i)\right)^2} \qquad (2.78)$$

From the above, we can see that both QMIs can be expressed in terms of the cross-correlations between the marginal information potentials at different levels: $v_1(ij)v_2(ij)$, $v_1(i)v_2(i)$ and $V_1V_2$. Thus, the measure $V_{ED}$ is called the Euclidean distance cross information potential (ED-CIP), and the measure $V_{CS}$ is called the Cauchy-Schwartz cross information potential (CS-CIP). The quadratic mutual information and the corresponding cross information potential can easily be extended to the case of multiple variables, e.g. $X = (x_1, \ldots, x_K)^T$. In this case, we have similar matrices $D$ and $v$ and all the corresponding IPs and marginal IPs. Then we have the ED-QMI and CS-QMI and their corresponding ED-CIP and CS-CIP as follows:

$$I_{ED}(x_1, \ldots, x_K) = V_{ED} = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N \prod_{k=1}^K v_k(ij) - \frac{2}{N}\sum_{i=1}^N \prod_{k=1}^K v_k(i) + \prod_{k=1}^K V_k$$
$$I_{CS}(x_1, \ldots, x_K) = V_{CS} = \log\frac{\left(\dfrac{1}{N^2}\displaystyle\sum_{i=1}^N\sum_{j=1}^N \prod_{k=1}^K v_k(ij)\right)\displaystyle\prod_{k=1}^K V_k}{\left(\dfrac{1}{N}\displaystyle\sum_{i=1}^N \prod_{k=1}^K v_k(i)\right)^2} \qquad (2.79)$$

2.4.2 Cross Information Forces (CIF)

The cross information potential is more complex than the information potential: three different terms (or potentials) contribute to it, so the force that one data point $a(i)$ receives comes from these three sources. A force in the joint space can be decomposed into marginal components, and the marginal force in each marginal space should be considered separately to simplify the analysis. The cases of ED-CIP and CS-CIP are different; they should also be considered separately.
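The ED-CIP assembled from the marginal potentials of (2.75)/(2.78) can be sketched for two 1-D marginals (function names are ours). When one marginal is constant, every $v_2(ij)$ is equal and the three terms cancel, so the estimate is zero:

```python
import math

def pair_potential(d, var):
    # 1-D Gaussian interaction G(d, var)
    return math.exp(-d * d / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def ed_qmi(a1, a2, sigma2):
    """ED-CIP V_ED = V_J - 2*V_C + V_M built from the marginal potentials."""
    N = len(a1)
    var = 2.0 * sigma2
    v1 = [[pair_potential(a1[i] - a1[j], var) for j in range(N)] for i in range(N)]
    v2 = [[pair_potential(a2[i] - a2[j], var) for j in range(N)] for i in range(N)]
    VJ = sum(v1[i][j] * v2[i][j] for i in range(N) for j in range(N)) / N ** 2
    V1 = sum(map(sum, v1)) / N ** 2          # marginal IP of x1
    V2 = sum(map(sum, v2)) / N ** 2          # marginal IP of x2
    VC = sum((sum(v1[i]) / N) * (sum(v2[i]) / N) for i in range(N)) / N
    return VJ - 2.0 * VC + V1 * V2

x = [0.0, 1.0, 2.0, 3.0]
assert ed_qmi(x, x, 1.0) > 0.0                          # correlated marginals
assert abs(ed_qmi(x, [5.0] * 4, 1.0)) < 1e-12           # one marginal constant
```

Since $V_{ED}$ is exactly the squared $L_2$ distance between the joint Parzen estimate and the product of the marginal Parzen estimates, it is non-negative by construction.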
Only the cross information potential between two variables will be dealt with here; the case of multiple variables can be readily obtained in a similar way.

First, let's look at the CIF of ED-CIP, $\partial V_{ED}/\partial a_k(i)$, $k = 1, 2$. By a derivation procedure similar to that of the information force in the IP field, we can obtain the following:

$$C_k = \{c_k(ij)\}, \quad c_k(ij) = v_l(ij) - v_l(i) - v_l(j) + V_l, \quad k = 1, 2, \;\; l \neq k$$
$$\frac{\partial V_{ED}}{\partial a_k(i)} = -\frac{1}{N^2\sigma^2}\sum_{j=1}^{N} c_k(ij)\,v_k(ij)\,d_k(ij), \quad i = 1, \ldots, N, \;\; k = 1, 2 \tag{2.80}$$

where all $d_k(ij)$, $v_k(ij)$, $v_k(i)$, $V_k$ are defined as before, and the $C_k$ are cross matrices which serve as force modifiers.

For the CIF of CS-CIP, we similarly have

$$f_k(i) = \frac{\partial V_{CS}}{\partial a_k(i)} = \frac{1}{V_J}\frac{\partial V_J}{\partial a_k(i)} - \frac{2}{V_C}\frac{\partial V_C}{\partial a_k(i)} + \frac{1}{V_k}\frac{\partial V_k}{\partial a_k(i)}$$
$$= -\frac{1}{\sigma^2}\left[\frac{\sum_{j=1}^{N} v_1(ij)v_2(ij)d_k(ij)}{\sum_{i=1}^{N}\sum_{j=1}^{N} v_1(ij)v_2(ij)} - \frac{\sum_{j=1}^{N}\big(v_l(i)+v_l(j)\big)v_k(ij)d_k(ij)}{N\sum_{i=1}^{N} v_1(i)v_2(i)} + \frac{\sum_{j=1}^{N} v_k(ij)d_k(ij)}{\sum_{i=1}^{N}\sum_{j=1}^{N} v_k(ij)}\right], \quad l \neq k \tag{2.81}$$

Figure 2-8. Illustration of "real IPT" and "virtual IPT"

2.4.3 An Explanation of QMI

Another way to look at the CIP comes from the expression of the factorized marginal pdfs. From the above, we have

$$f_{x_1}(x_1)\,f_{x_2}(x_2) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} G(x_1 - a_1(i), \sigma^2)\,G(x_2 - a_2(j), \sigma^2) \tag{2.82}$$

This suggests that in the joint space there are $N^2$ "virtual IPTs" $\{(a_1(i), a_2(j))^T,\; i, j = 1, \ldots, N\}$ whose pdf, estimated by the Parzen window method, is exactly the factorized marginal pdf of the "real IPTs." The relation between all types of IPTs is illustrated in Figure 2-8. From this description, we can see that the ED-CIP is the square of the Euclidean distance between the real IP field (formed by the real IPTs) and the virtual IP field (formed by the virtual IPTs), and that the CS-CIP is related to the angle between the real IP field and the virtual IP field, as Figure 2-5 shows.
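The force bookkeeping of (2.80) and (2.81) builds on the single-field quantities of (2.72) and (2.73), which can be sketched numerically as follows. The sample data, kernel size, and dimensionality here are arbitrary choices, not values from the text.

```python
import numpy as np

def info_potential_and_force(a, sigma=1.0):
    """Information potential V and per-sample forces f(i) for samples a
    (shape N x n), with Gaussian kernels G(., 2*sigma^2) as in (2.72)-(2.73).
    A minimal sketch, assuming the 1/N^2 normalization used in the text."""
    N, n = a.shape
    d = a[:, None, :] - a[None, :, :]              # d(ij) = a(i) - a(j): matrix D
    dist2 = (d ** 2).sum(axis=2)
    two_s2 = 2.0 * sigma ** 2
    v = np.exp(-dist2 / (2 * two_s2)) / (2 * np.pi * two_s2) ** (n / 2)  # matrix V
    V = v.sum() / N ** 2                           # overall information potential
    # f(i) = dV/da(i); each symmetric pair (i,j) contributes twice
    f = -(v[:, :, None] * d).sum(axis=1) / (N ** 2 * sigma ** 2)
    return V, v, f
```

Because each pairwise force is equal and opposite, the forces over all particles sum to zero, which is a handy sanity check on an implementation.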
When the real IPTs are organized such that each virtual IPT has at least one real IPT in the same position, the CIP is zero and the two marginal variables $x_1$ and $x_2$ are statistically independent; when the real IPTs are distributed along a diagonal line, the difference between the distributions of real IPTs and virtual IPTs is maximized. These two extreme cases are illustrated in Figure 2-9 and Figure 2-10. It should be noticed that $x_1$ and $x_2$ are not necessarily scalars. Actually, they can be multidimensional variables, and their dimensions can even be different. CIPs are general measures for the statistical relation between two variables (based merely on the given data).

Figure 2-9. Illustration of Independent IPTs

Figure 2-10. Illustration of Highly Correlated Variables

CHAPTER 3
LEARNING FROM EXAMPLES

A learning machine is usually a network. Neural networks are of particular interest in this dissertation. Actually, almost all adaptive systems can be regarded as network models, no matter whether they are linear or nonlinear, feedforward or recurrent. In this sense, the learning machines studied here are neural networks. So, learning, in this circumstance, is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded [Men70]. The environmental stimulation, as pointed out in Chapter 1, usually takes the form of "examples," and thus learning is about how to obtain information from "examples." "Learning from examples" is the topic of this chapter, which includes a review and discussion of learning systems, learning mechanisms, the information-theoretic viewpoint about learning, "learning from examples" by the information potential, and finally a discussion of generalization.

3.1 Learning System

According to the abstract model described in Chapter 1, a learning system is a mapping network.
The flexibility of the mapping depends strongly on the structure of the system. The structures of several typical network systems will be reviewed in this section.

Network models can basically be divided into two categories: static models and dynamic models. A static model can also be called a memory-less model. In a network, memory about the signal past is obtained by using delayed connections, i.e., connections through delay units. (In the continuous-time case, delay connections become feedback connections; in this dissertation, only discrete-time signals and systems are studied.) Generally speaking, if there are delay units in a network, then the network will have memory. For instance, the transversal filter [Hay96, Wid85, Hon84], the general IIR filter [Hay96, Wid85, Hon84], the time delay neural network (TDNN) [Lan88, Wai89], the gamma neural network [deV92, Pri93], and general recurrent neural networks [Hay98, Hay94] are all dynamic network systems with memory or delay connections. If a network has delay connections, it has to be described by difference equations (in the continuous-time case, differential equations), while a static network can be expressed by algebraic equations (linear or nonlinear).

There is also another taxonomy for the structure of learning or adaptive systems: linear models and nonlinear models belong to different categories. The following will start with the static linear model.

3.1.1 Static Models

E. Linear Model

Possibly the simplest mapping network structure is the linear model. Mathematically, it is a linear transformation. As shown in Figure 3-1, the input-output relation of the network is defined by (3.1):

$$y = W^T x, \qquad y = (y_1, \ldots, y_k)^T \in R^k, \quad x \in R^m, \quad W = (w_1, \ldots, w_k) \in R^{m \times k}, \quad w_i \in R^m \tag{3.1}$$

where x is the input signal, y is the output signal, and W is the linear transformation matrix in which each column $w_i$ (i = 1, ..., k) is a vector.
Each output, or group of outputs, spans a subspace of the input signal space. Eigenanalysis (principal component analysis) [Oja82, Dia96, Kun94, Dud73, Dud98] and generalized eigenanalysis [XuD98, Cha97, Dud73, Dud98] seek the signal subspace with maximum signal-to-noise ratio (SNR) or signal-to-signal ratio. For pattern classification, subspace methods such as Fisher discriminant analysis are also very useful tools [Oja82, Dud73, Dud98]. Linear models can also be used for inverse problems such as BSS and ICA [Com94, Cao96, Car98b, Bel95, Dec96, Car97, Yan97]. The linear model is simple, and it is very effective for a wide range of problems. Understanding the learning behavior of a linear model may also help the understanding of nonlinear systems.

Figure 3-1. Linear Model

F. Multilayer Perceptron (MLP)

The multilayer perceptron is the extension of the perceptron model [Ros58, Ros62, Min69]. The perceptron is similar to the linear model in Figure 3-1 but with a nonlinear function in each output node, e.g. the hard limit function

$$f(x) = \begin{cases} 1, & x \geq 0 \\ -1, & x < 0 \end{cases}$$

The perceptron initiated the mathematical analysis of learning, and it was the first machine to learn directly from examples [Vap95]. Although the perceptron demonstrated an amazing learning ability, its performance is limited by its single-layer structure [Min69]. The MLP extends the perceptron by adding more layers to the network structure, as shown in Figure 3-2. For ease of mathematical analysis, the nonlinear function in each node is usually a continuous, differentiable function, e.g. the sigmoid function $f(x) = 1/(1+e^{-x})$. A typical input-output relation of the network in Figure 3-2 is given by (3.2):

$$z_i = f(w_i^T x + b_i), \quad i = 1, \ldots, l$$
$$y_j = f(v_j^T z + a_j), \quad j = 1, \ldots, k, \qquad z = (z_1, \ldots, z_l)^T \tag{3.2}$$

where $b_i$ and $a_j$ are the biases for the nodes $z_i$ and $y_j$ respectively, and $v_j \in R^l$ and $w_i \in R^m$ are the linear projections for the nodes $y_j$ and $z_i$ respectively.
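The mapping (3.2) can be sketched as a short forward pass. The sigmoid nonlinearity follows the text; the layer sizes and weights below are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W, b, V, a):
    """One-hidden-layer MLP of eq. (3.2): z = f(W^T x + b), y = f(V^T z + a).
    Shapes: x (m,), W (m, l), b (l,), V (l, k), a (k,). A sketch only."""
    z = sigmoid(W.T @ x + b)   # hidden layer: ridge functions of the input
    y = sigmoid(V.T @ z + a)   # output layer
    return z, y
```

With sigmoid nodes every output lies strictly inside (0, 1), which the assertions below exercise.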
The layer of nodes z is called the hidden layer, which is neither input nor output. MLPs may have more than one hidden layer, and the nonlinear function f(·) may be different for different nodes. Each node in an MLP is a simple processing element abstracted functionally from a real neuron cell, called the McCulloch-Pitts model [Hay98, Ru86a]. Collective behavior emerges when these simple elements are connected with each other to form a network whose overall function can be very complex [Ru86a].

One of the most appealing properties of the MLP is its universal approximation ability. It has been shown that, as long as there are enough hidden nodes, an MLP can approximate any functional mapping [Hec87, Gal88, Hay94, Hay98]. Since a learning system is nothing but a mapping from an abstract point of view, the universal approximation property of the MLP is a very desirable feature for a learning system. This is one reason why the MLP is so popular. The MLP is a kind of "global" model whose basic building block is a hyperplane: the projection represented by the sum of products at each node. The nonlinear function at each node distorts its hyperplane into a ridge function which also serves as a selector. So the overall functional surface of an MLP is a combination of these ridge functions, and the number of hidden nodes determines the number of ridge functions. Therefore, as long as the number of nodes is large enough, the overall functional surface can approximate any mapping. This is an intuitive understanding of the universal approximation property of the MLP.

Figure 3-2. Multilayer Perceptron

G. Radial-Basis Function (RBF) Network

As shown in Figure 3-3, the RBF network has two layers. The hidden layer is the nonlinear layer, whose input-output relation is a radial-basis function, e.g. the Gaussian function $z_i = e^{-\|x-\mu_i\|^2/(2\sigma_i^2)}$, where $\mu_i$ is the mean (center) of the Gaussian function and determines its location in the input space, and $\sigma_i$ is the variance of the Gaussian function and determines its shape or sharpness. The output layer is a linear layer. So the overall input-output relation of the network can be expressed as

$$z_i = e^{-\frac{\|x-\mu_i\|^2}{2\sigma_i^2}}, \qquad y_j = w_j^T z, \qquad z = (z_1, \ldots, z_n)^T \tag{3.3}$$

where the $w_j$ are linear projections, and $\sigma_i$ and $\mu_i$ are as above.

Figure 3-3. Radial-Basis Function Network (RBF Network)

The RBF network is also a universal approximator if the number of hidden nodes is large enough [Pog90, Par91, Hay98]. However, unlike the MLP, its basic building block is not a "global" function but a "local" one, such as the Gaussian function. The overall mapping surface is approximated by a linear combination of such "local" surfaces. Intuitively, we can imagine that a mapping surface of any shape can be approximated by the linear combination of small pieces of local surface if there are enough such basic building blocks. The RBF network is also an optimal regularization function [Pog90, Hay98]. It has been applied as extensively as the MLP in various areas.

3.1.2 Dynamic Models

H. Transversal Filter

The transversal filter, also referred to as a tapped-delay-line filter or FIR filter, consists of two parts (as depicted in Figure 3-4): (1) the tapped delay line, and (2) the linear projection. The input-output relation can be expressed as

$$y(n) = \sum_{i=0}^{q} w_i x(n-i) = w^T x, \qquad w = (w_0, \ldots, w_q)^T, \quad x = (x(n), \ldots, x(n-q))^T \tag{3.4}$$

where the $w_i$ are the parameters of the filter. Because of its versatility and ease of implementation, the transversal filter has become an essential signal processing structure in a wide variety of applications [Hay96, Hon84].

Figure 3-4. Transversal Filter

Figure 3-5. Gamma Filter

I. Gamma Model

As shown in Figure 3-5, the gamma filter is similar to the transversal filter except that the tapped delay line is replaced by the gamma memory line [deV92, Pri93]. The gamma memory is a delay tap with feedback. The transfer function of a one-tap gamma memory is

$$G(z) = \frac{\mu z^{-1}}{1 - (1-\mu)z^{-1}} \tag{3.5}$$

The corresponding impulse response is the gamma function with parameter p = 1:

$$g(n) = \mu(1-\mu)^{n-1}, \quad n \geq 1 \tag{3.6}$$

For the pth tap of the gamma memory line, the transfer function and its impulse response (the gamma function) are

$$G_p(z) = \left(\frac{\mu z^{-1}}{1-(1-\mu)z^{-1}}\right)^p, \qquad g_p(n) = \binom{n-1}{p-1}\mu^p(1-\mu)^{n-p}, \quad n \geq p \tag{3.7}$$

Compared with the tapped delay line, the gamma memory line is a recursive structure and has an infinite-length impulse response. Therefore, the "memory depth" can be adjusted by the parameter μ instead of being fixed by the number of taps in the tapped delay line. Compared with the general IIR filter, the stability analysis of the gamma memory is simple: when 0 < μ < 2, the gamma memory line is stable (everywhere in the line). Moreover, when μ = 1, the gamma memory line becomes the tapped delay line, so the gamma memory line is a generalization of the tapped delay line. The gamma filter is a good compromise between the FIR filter and the IIR filter. It has been widely applied to a variety of signal processing and pattern recognition problems.

J. The All-Pole IIR Filter

Figure 3-6. The All-Pole IIR Filter

As shown in Figure 3-6, the all-pole IIR filter is composed of only delayed feedback; there are no feedforward connections in the network structure. The transfer function of the filter is

$$H(z) = \frac{1}{1 - \sum_{i=1}^{n} w_i z^{-i}} \tag{3.8}$$

Obviously, this is the inverse system of the FIR filter $H(z) = 1 - \sum_{i=1}^{n} w_i z^{-i}$, which has been used in deconvolution problems [Hay94a]. There is also its counterpart for the two-input, two-output system, which has been used in blind source and blind source separation problems [Ngu95, Wan96].
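The inverse relation between the all-pole filter (3.8) and the corresponding FIR filter can be checked numerically. This sketch uses arbitrarily chosen stable weights; it is an illustration, not the dissertation's implementation.

```python
import numpy as np

def fir(w, x):
    """y(n) = x(n) - sum_i w_i x(n-i): the FIR system H(z) = 1 - sum w_i z^-i."""
    y = np.copy(x)
    for i, wi in enumerate(w, start=1):
        y[i:] -= wi * x[:-i]
    return y

def all_pole(w, x):
    """y(n) = x(n) + sum_i w_i y(n-i): the all-pole system of eq. (3.8)."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] + sum(w[i - 1] * y[n - i]
                          for i in range(1, len(w) + 1) if n - i >= 0)
    return y
```

Cascading the two systems recovers the original sequence sample by sample, which is exactly why this structure is useful in deconvolution and inverse problems.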
In general, this type of filter may be very useful in inverse or system identification problems.

K. TDNN and Gamma Neural Network

In an MLP, each connection is instantaneous and there is no temporal structure. If the instantaneous connections are replaced by filters, then each node gains the ability to process time signals. The time delay neural network (TDNN) is formed by replacing the connections in the MLP with transversal filters [Lan88, Wai89]. The gamma neural network is the result of replacing the connections in the MLP with gamma filters [deV92, Pri93]. These types of neural networks extend the ability of the MLP.

Figure 3-7. Multilayer Perceptron with Delayed Connections

L. General Recurrent Neural Network

A general nonlinear dynamic system is a multilayer perceptron with some delayed connections. As Figure 3-7 shows, for instance, the output of node $z_l$ may rely on the previous output of node $y_k$:

$$z_l(n) = f(w_l^T x(n) + b_l + d\,y_k(n-1)) \tag{3.9}$$

There may be other nodes with similar delayed connections. This type of neural network is powerful but complicated; although its flexibility and potential are high, its adaptation is difficult to analyze.

3.2 Learning Mechanisms

The central part of a learning mechanism is the criterion. The range of application of a learning system may be very broad. For instance, a learning system or adaptive signal processing system can be used for data compression, encoding or decoding signals, noise or echo cancellation, source separation, signal enhancement, pattern classification, system identification and control, etc. However, the criteria used to achieve such diverse purposes can basically be divided into only two types: one based on energy measures, the other based on information measures. As pointed out in Chapter 2, energy measures can be regarded as special cases of information measures. In the following, various energy measures and information measures will be discussed.
Once the criterion of a system is determined, the remaining task is to adjust the parameters of the system so as to optimize the criterion. There is a variety of optimization techniques. The gradient method is perhaps the simplest, yet it is a general method [Gil81, Hes80, Wid85] based on the first-order approximation of the performance surface. Its on-line version, the stochastic gradient method [Wid63], is widely used in adaptive and learning systems. Newton's method [Gil81, Hes80, Wid85] is a more sophisticated method based on the second-order approximation of the performance surface. A variant, the conjugate gradient method [Hes80], avoids the calculation of the inverse of the Hessian matrix and is thus computationally more efficient [Hes80]. There are also other techniques which are efficient for specific applications, for instance the Expectation-Maximization algorithm for maximum likelihood estimation or for a class of non-negative function maximization [Dem77, Mcl96, XuD95, XuD96], and the natural gradient method, based on information geometry, for the case where the parameter space is constrained [Ama98]. In the following, these techniques will also be briefly reviewed.

3.2.1 Learning Criteria

• MSE Criterion

The mean squared error (MSE) criterion is one of the most widely used criteria. For the learning system described in Chapter 1, if the given environmental data are $\{(x(n), d(n)) \mid n = 1, \ldots, N\}$, where x(n) is the input signal and d(n) is the desired signal, then the output signal is $y(n) = q(x(n), W)$ and the error signal is $e(n) = d(n) - y(n)$. The MSE criterion can be defined as

$$J = \frac{1}{N}\sum_{n=1}^{N} e^2(n) = \frac{1}{N}\sum_{n=1}^{N}\big(d(n)-y(n)\big)^2 \tag{3.10}$$

From the geometrical point of view it is basically the squared Euclidean distance between the desired signal d(n) and the output signal y(n); from the point of view of energy and entropy measures it is the energy of the error signal. Minimization of the MSE criterion yields the output signal closest to the desired signal in the Euclidean distance sense. As mentioned in Chapter 2, if we assume the error signal is white Gaussian with zero mean, then the minimization of the MSE is equivalent to the minimization of the entropy of the error signal.

For a multiple-output system, i.e., when the output signal and the desired signal are multi-dimensional, the error signal is also multi-dimensional, and the definition of the MSE criterion is the same as described in Chapter 2.

• Signal-to-Noise Ratio (SNR)

The signal-to-noise ratio is also a frequently used criterion in the signal processing area; the purpose of many signal processing systems is to enhance the SNR. A well-known example is principal component analysis (PCA), where a linear projection is desired such that the SNR in the output is maximized (when the noise is assumed to be white Gaussian). For the linear model described above, $y = w^T x$, $y \in R$, $x \in R^m$, $w \in R^m$, if the input x is zero-mean and its covariance matrix is $R_x = E[xx^T]$, then the output power (short-time energy) is $E[y^2] = w^T E[xx^T]w = w^T R_x w$. If the input is $x_{noise}$, a zero-mean white Gaussian noise whose covariance matrix is the identity matrix I, then the output power of the noise is $w^T w$. The SNR in the output of the linear projection will be

$$J = \frac{w^T R_x w}{w^T w} \tag{3.11}$$

From the information-theoretic point of view, the output entropies will be

$$H(w^T x_{noise}) = \frac{1}{2}\log(w^T w) + \frac{1}{2}\log 2\pi + \frac{1}{2}, \qquad H(w^T x) = \frac{1}{2}\log(w^T R_x w) + \frac{1}{2}\log 2\pi + \frac{1}{2} \tag{3.12}$$

where the input signal x is assumed to be a zero-mean Gaussian signal. Then the entropy difference is

$$J = H(w^T x) - H(w^T x_{noise}) = \frac{1}{2}\log\frac{w^T R_x w}{w^T w} \tag{3.13}$$

which is equivalent to the SNR criterion. The solution to this problem is the eigenvector that corresponds to the largest eigenvalue of $R_x$.
The PCA problem can also be formulated as the minimum reconstruction MSE problem [Kun94]:

$$\min_w \; J = E\big[\|x - ww^T x\|^2\big] \tag{3.14}$$

(3.14) can also be regarded as an auto-association problem in a two-layer network with the constraint that the two layers' weights be dual to each other (i.e., one is the transpose of the other). The minimization solution to (3.14) is equivalent to the maximization solution to (3.12) or (3.13).

• Signal-to-Signal Ratio

For the same linear network, if the input signal is switched between two zero-mean signals $x_1$ and $x_2$, then the signal-to-signal ratio in the output of the linear projection will be

$$J = \frac{w^T R_{x_1} w}{w^T R_{x_2} w} \tag{3.15}$$

where $R_{x_1}$ is the covariance matrix of $x_1$ and $R_{x_2}$ is the covariance matrix of $x_2$. Maximizing this criterion enhances the signal $x_1$ in the output while attenuating the signal $x_2$. From the information-theoretic point of view, if both signals are Gaussian, then the entropy difference in the output will be

$$J = H(w^T x_1) - H(w^T x_2) = \frac{1}{2}\log\frac{w^T R_{x_1} w}{w^T R_{x_2} w} \tag{3.16}$$

which is equivalent to a signal-to-signal ratio. The maximization solution to (3.15) or (3.16) is the generalized eigenvector with the largest generalized eigenvalue:

$$R_{x_1} w_{optimal} = \lambda_{max} R_{x_2} w_{optimal} \tag{3.17}$$

[Cha97] also shows that when this criterion is applied to classification problems, it can be formulated as a heteroassociation problem with an MSE criterion and a constraint.

• Maximum Likelihood

Maximum likelihood estimation has been widely used in parametric model estimation [Dud98, Dud73]. It has also been extensively applied to "learning from examples." For instance, the hidden Markov model has been successfully applied to the speech recognition problem [Rab93, Hua90], and the training of most hidden Markov models is based on maximum likelihood estimation. In general, suppose there is a statistical model p(z, w), where z is a random variable and w is a set of parameters, and the true probability distribution q(z) is unknown. The problem is to find w so that p(z, w) is closest to q(z). We can simply apply the information cross-entropy criterion, i.e., the Kullback-Leibler criterion, to the problem:

$$J(w) = \int q(z)\log\frac{q(z)}{p(z,w)}dz = -E[\log p(z,w)] - H_S(z) \tag{3.18}$$

where $H_S(z)$ is the Shannon entropy of z, which does not depend on the parameters w, and $L(w) = E[\log p(z,w)]$ is exactly the expected log-likelihood of p(z, w). So the minimization of (3.18) is equivalent to the maximization of the log-likelihood L(w). In other words, maximum likelihood estimation is exactly the minimization of the Kullback-Leibler cross-entropy between the true probability distribution and the model probability distribution [Ama98].

• Information-Theoretic Measures for BSS and ICA

As introduced in Chapter 2, the maximization of the output entropy and the minimization of the mutual information between the outputs can be used in BSS and ICA problems. This case will be dealt with in more detail later.

3.2.2 Optimization Techniques

• The Back-Propagation Algorithm

In general, for a function $R^m \to R$: $J = f(w)$, the gradient $\frac{\partial J}{\partial w}$ is the steepest ascent direction for J and $-\frac{\partial J}{\partial w}$ is the steepest descent direction, and the first-order approximation of the function at $w = w_n$ is

$$J = J(w_n) + \Delta w^T \frac{\partial J}{\partial w}\bigg|_{w=w_n} \tag{3.19}$$

So, for the maximization of the function, w can be updated along the steepest ascent direction, i.e., $w_{n+1} = w_n + \mu\frac{\partial J}{\partial w}$, where μ is the step size. For the minimization of the function, the updating rule follows the steepest descent direction, i.e., $w_{n+1} = w_n - \mu\frac{\partial J}{\partial w}\big|_{w=w_n}$ [Wid85]. If the gradient can be expressed as a summation over data samples, as in the case of the MSE criterion $J = \sum_{n=1}^{N} J(n)$ with $J(n) = \frac{1}{2}(d(n)-y(n))^2$, then each datum can be used to update the parameters whenever it appears, i.e., $w_{n+1} = w_n \pm \mu\frac{\partial}{\partial w}J(n)$. This is called the stochastic gradient method [Wid63].
For the MLP network described above, the MSE criterion is still $J = \sum_{n=1}^{N} J(n)$. Let's look at a simple case with only one output node: $y = f(v^T z + a)$, $v = (v_1, \ldots, v_l)^T$, $z = (z_1, \ldots, z_l)^T$, $z_i = f(w_i^T x + b_i)$, $i = 1, \ldots, l$. Then, by the chain rule, we have

$$\frac{\partial J}{\partial v} = \sum_{n=1}^{N}\frac{\partial}{\partial v}J(n) = \sum_{n=1}^{N}\frac{\partial J(n)}{\partial y(n)}\frac{\partial y(n)}{\partial v} \tag{3.20}$$

We can see from this equation that the key point is how to calculate the sensitivity of the network output, $\frac{\partial}{\partial v}y(n)$. The term $\frac{\partial}{\partial y(n)}J(n)$ in the MSE case is the error signal $\frac{\partial}{\partial y(n)}J(n) = e(n) = y(n) - d(n)$. The sensitivity can then be regarded as a mechanism which propagates the error e(n) back to the parameters v or $w_i$. To be more specific, using the relation $\frac{dy}{dx} = y(1-y)$ for the sigmoid function $y = f(x) = 1/(1+e^{-x})$ and applying the chain rule to the problem, we have

$$\xi(n) = 1 \cdot \big(y(n)(1-y(n))\big)$$
$$\frac{\partial y(n)}{\partial v} = \xi(n)z(n), \qquad \frac{\partial y(n)}{\partial z(n)} = \xi(n)v \tag{3.21}$$
$$\xi_1(n) = \frac{\partial y(n)}{\partial z(n)} \bullet \big(z(n)(1-z(n))\big), \qquad \frac{\partial y(n)}{\partial w_i} = \xi_{1,i}(n)\,x(n)$$

where • is the operator for component-wise multiplication. The process in (3.21) is a linear process which back-propagates 1 through the "dual network" system to each parameter, and is thus called "back-propagation." If we need to back-propagate an error e(n), then the 1 in ξ(n) of (3.21) is replaced by e(n), and (3.21) is called "error back-propagation." Actually, "error back-propagation" is nothing but a gradient-method implementation in which the gradient is calculated by the chain rule applied to the network structure. The effectiveness of back-propagation lies in its locality of calculation, which exploits the topology of the network; this is significant for engineering implementations. For a detailed description, one can refer to Rumelhart et al. [Ru86b, Ru86c].

Figure 3-8. The Time Extension of the Recurrent Neural Network in Figure 3-7
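The error back-propagation of (3.20) and (3.21) for a one-hidden-layer, single-output MLP can be sketched and checked against finite differences. The layer sizes, weights, and target below are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_single_output(x, d, W, b, v, a):
    """Gradients of J(n) = 0.5*(d - y)^2 for the one-output MLP of (3.20)-(3.21),
    computed by error back-propagation. Shapes: x (m,), W (m, l), b (l,),
    v (l,), a scalar. A sketch only."""
    z = sigmoid(W.T @ x + b)          # forward pass, hidden layer
    y = sigmoid(v @ z + a)            # forward pass, output node
    e = y - d                         # error signal dJ/dy
    xi = e * y * (1 - y)              # sensitivity at the output node
    grad_v = xi * z                   # dJ/dv
    xi1 = xi * v * z * (1 - z)        # component-wise back-propagation, (3.21)
    grad_W = np.outer(x, xi1)         # dJ/dW, column i is dJ/dw_i
    return y, grad_v, grad_W

rng = np.random.default_rng(4)
m, l = 3, 4                           # hypothetical sizes
x = rng.normal(size=m); d = 0.7
W = rng.normal(size=(m, l)); b = rng.normal(size=l)
v = rng.normal(size=l); a = 0.1
y, gv, gW = backprop_single_output(x, d, W, b, v, a)
```

A central finite-difference check on each component of the gradient confirms that the chain-rule bookkeeping is correct.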
For a dynamic system with delay connections, the whole network can be extended along time, with the delay connections linking the nodes between time slices. The recurrent neural network of Figure 3-7 is shown extended in Figure 3-8, in which the structure within each time slice contains only the instantaneous connections, while the delay connections connect the corresponding nodes between time slices. Once a dynamic network is extended in time, the whole structure can be regarded as one large static network, and the back-propagation algorithm can be applied as usual. This is the so-called "back-propagation through time" (BPTT) [Wer90, Wil90, Hay98]. There is another algorithm for the training of dynamic networks, called "real-time recurrent learning" (RTRL) [Wil89, Hay98]. Both BPTT and RTRL are gradient-based methods and both use the chain rule to calculate the gradient. The difference is that BPTT applies the chain rule from the end of a time block to its beginning, while RTRL applies it from the beginning of a time block to its end, resulting in different memory and computational complexities [Hay98].

• Newton's Method

The gradient method is based on the first-order approximation of the performance surface and is simple, but its convergence may be slow. Newton's method is based on the second-order approximation of the performance surface and the closed-form optimization solution of a quadratic function. First, let's look at the optimization solution of a quadratic function $F(x) = \frac{1}{2}x^T A x - h^T x + c$, where $A \in R^{m \times m}$ is a symmetric matrix, either positive definite or negative definite, $h \in R^m$ and $x \in R^m$ are vectors, and c is a scalar constant. There is a maximum solution $x_0$ if A is negative definite, or a minimum solution $x_0$ if A is positive definite; in both cases $x_0$ satisfies the linear equation $\frac{\partial}{\partial x}F(x) = 0$, i.e., $Ax = h$, or $x_0 = A^{-1}h$. For a general cost function J(w), the second-order approximation at $w = w_n$ is

$$J(w) = J(w_n) + \frac{\partial J}{\partial w}\bigg|_{w=w_n}^{T}(w - w_n) + \frac{1}{2}(w-w_n)^T H(w_n)(w-w_n) \tag{3.22}$$
For a general cost function J(w), its second order approximation at w = wn will be J(w) = J(wn) + (w-wâ€ž) + |(w-wâ€ž)rf/(wâ€ž)(w-w/z) (3.22) 87 where H(wn) is the Hessian matrix of J(w) at w = wn. So, the optimization point â€”l Q for (3.22) is w â€” wn = â€”H(wn) -Zâ€”J(w ). Thus we have Newtonâ€™s method as fol- ow lows [Hes80, Hay98, Wid85]: _l A (3.23) As pointed in Haykin [Hay98], there are several problems for Newtonâ€™s method to be applied to the MLP training. For instance, Newtonâ€™s method involves the calculation of the inverse of the Hessian matrix. It is computationally complex and there is no guarantee that the Hessian matrix is nonsingular and always positive or negative defiÂ¬ nite. For a nonquadratic performance surface, there is no guarantee for the converÂ¬ gence of Newtonâ€™s method. To overcome these problems, there appear the Quasi- Newton method [Hay98] and the conjugate gradient method [Hes80, Hay98], etc. â€¢ Quasi-Newton Method This method uses an estimate of the inverse Hessian matrix without the calculation of the real inverse. This estimate is guaranteed to be positive definite for a minimization problem or negative definite for a maximization problem. However, the computational complexity is still in the order of 0(PV2) where W is the number of parameters [Hay98], â€¢ The Conjugate Gradient Method The conjugate gradient method is based on the fact that the optimal point of a quaÂ¬ dratic function can be obtained by a sequential searches along the so called conjugate directions rather than the direct calculation of the inverse of the Hessian matrix. There is a guarantee that the optimal solution can be obtained within W steps for a quadratic 88 function (W is the number of parameters). One method to obtain the conjugate direcÂ¬ tions is based on the gradient directions; i.e., the modification of the gradient direcÂ¬ tions may result in the one set of conjugate directions, thus the name â€œconjugate gradient methodâ€ [Hes80, Hay98]. 
The conjugate gradient method can avoid the calÂ¬ culation of the inverse and even the evaluation of the Hessian matrix itself, and thus is computational efficient. The conjugate gradient method is perhaps the only second- order optimization method which can be applied to large-scale problems [Hay98]. â€¢ The Natural Gradient Method When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. The basic point of the natural gradient method is as follows [Ama98]: For a cost function J(w), if the small incremental vector dw is fixed with its length; 2 2 i.e., \dw\ = 8 where 8 is a small constant, then the steepest descent direction of 0 3 J(w) is â€”~J(w) and the steepest ascent direction is -i-J(w). However, if the length dw dw T 2 of dw is constrained in such a way that the quadratic form (dw) G(dw) = s where G is so called Riemannian metric tensor which is always positive definite, then the â€”i 3 steepest descent direction will be â€” G -^-J(w), and the steepest ascent direction will dw be G X-^-J(w). dw â€¢ The Expectation and Maximization (EM) Algorithm The EM algorithm can be generalized and summarized as the following inequality called the generalized EM inequality [XuD95], which can be described as follows: 89 For a non-negative function f(D, 0) = ^ _/].(Â£), 0), fÂ¡(D, 0) > 0, V(D, 0), i = 1 k â€¢ â€¢ D = {dt e R } is the data set, 0 is the parameter set, we have /(A eâ€ž+!)>/(Â£>, eâ€ž), If 0â€ž + j = argmax Â£/,(A 9) Ã = 1 (3.24) This inequality suggests an iterative method for the maximization of the function f(D, 0) with respect to the parameters 0, that is the generalized EM algorithm (all functions fÂ¡(D, 0) and f(D, 0) are not required to be a pdf function, as long as they are non-negative functions). 
First, use the known parameters Qn to calculate fÂ¡(D, Qn) i and thus 0n)log/)(,D, 0), this is so called expectation step I i = l ( 0n)logfÂ¿(D, 0) can be regarded as a generalized expectation); Second, find i = l / the maximum point 0^ + , for the expectation function ^T/)(A 0/z)logfÂ¡(D, 0), this /= l is so called maximization step. The process can go on iteratively. With this inequality, it is not difficult to prove the Baum-Eagon inequality which is the basis for the training of the well known hidden markov model. The Baum-Eagon ineÂ¬ quality can be stated as P(y)>P(x) where P(x) = /â€™({x^}) is a polynomial with nonnegative coefficients homogeneous of degree d in its variables x^; x = {xÂ¡j} is a 9/ point in the domain PD: xfj > 0 ^ xÂ¡j =1 i = 1 j = and 9/ d j= i xtj-â€”P(x) * 0 for all /; y = {yÂ¡j} is another point in the PD satisfying j = i " ya = xi lJdx -P(x) / j = 1 . If we regard x as a parameter set, then this inequality also suggests an iterative way to maximize the polynomial P(x). That is the above y is a better estimation of parameters (better means makes the polynomial larger) and the process can go on iteratively. The polynomial can also be non-homoge- neous but with nonnegative coefficients. This is a general result which has been 90 applied to train such general model as the multi-channel hidden markov model [XuD96], where the calculation of the gradient P(x) is still needed and which is dxij accomplished by the back-propagation through time. So, the forward and backward algorithm in the training of the hidden markov model can be regarded as the forward process and back-propagation through time for the hidden markov network [XuD96], The details about the EM algorithm can be found in Dempster and McLachlan [Dep77, Mcl96]. 3.3 General Point of View It can be seen from the above that there are variety of learning criteria. Some of them are based on energy quantities, some of them are based on information-theoretic meaÂ¬ sures. 
In this chapter, a unifying point of view will be given.

3.3.1 InfoMax Principle

In the late 1980s, Linsker gave a rather general point of view about learning or statistical signal processing [Lin88, Lin89]. He pointed out that the transformation of a random vector X observed at the input layer of a neural network to a random vector Y produced at the output layer of the network should be so chosen that the activities of the neurons in the output layer jointly maximize information about the activities in the input layer. To achieve this, the mutual information I(Y, X) between the input vector X and the output vector Y should be used as the cost function or criterion for the learning process of the neural network. This is called the InfoMax principle. The InfoMax principle provides a mathematical framework for self-organization of the learning network that is independent of the rule used for its implementation. This principle can also be viewed as the neural network counterpart of the concept of channel capacity, which defines the Shannon limit on the rate of information transmission through a communication channel. The InfoMax principle is depicted in the following figure:

Figure 3-9. InfoMax Scheme

When the neural network or mapping system is deterministic, the mutual information is determined by the output entropy, as shown by $I(Y, X) = H(Y) - H(Y|X)$, where $H(Y)$ is the output entropy and $H(Y|X) = 0$ is the conditional output entropy when the input is given (since the input-output relation is deterministic, the conditional entropy is zero). So, in this case, the maximization of the mutual information is equivalent to the maximization of the output entropy.

3.3.2 Other Similar Information-Theoretic Schemes

Haykin summarized other information-theoretic learning schemes in [Hay98], which all use the mutual information as the learning criterion but are formulated in different ways.
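As a quick check of the deterministic-map identity $I(Y, X) = H(Y) - H(Y|X) = H(Y)$ from Section 3.3.1, here is a minimal discrete example; the uniform input and the many-to-one map $g(x) = x \bmod 2$ are hypothetical choices:

```python
import numpy as np

# For a deterministic map y = g(x), H(Y|X) = 0, so I(Y, X) = H(Y).
# Toy check with X uniform on {0,1,2,3} and g(x) = x mod 2.

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

px = np.full(4, 0.25)                 # input distribution
g = lambda x: x % 2                   # deterministic mapping

# push the input distribution through g to get p(y)
py = np.zeros(2)
for x, p in enumerate(px):
    py[g(x)] += p

# joint distribution p(x, y) = p(x) * 1{y = g(x)}
pxy = np.zeros((4, 2))
for x, p in enumerate(px):
    pxy[x, g(x)] = p

I = entropy(px) + entropy(py) - entropy(pxy.ravel())  # I(X;Y)
# Since the map is deterministic, I equals H(Y): here 1 bit
```

Maximizing $I(Y, X)$ over a family of deterministic maps therefore reduces to maximizing the output entropy $H(Y)$ alone.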
There are three other scenarios, described in the following. Although the formulations are different, the spirit is the same as the InfoMax principle [Hay98].

- Maximization of the Mutual Information Between Scalar Outputs

As depicted in Figure 3-10, the objective of this learning scheme is to maximize the mutual information between two scalar outputs such that the output ya conveys the most information about yb, and vice versa. An example of this scheme is the spatially coherent feature extractor [Bec89, Bec92, Hay98], where, as depicted in Figure 3-11, the transformation of a pair of vectors Xa and Xb (representing adjacent, nonoverlapping regions of an image) by a neural system should be so chosen that the scalar output ya of the system due to the input Xa maximizes information about the second scalar output yb due to Xb.

Figure 3-10. Maximization of the Mutual Information between Scalar Outputs
Figure 3-11. Processing of Two Neighboring Regions of an Image
Figure 3-12. Minimization of the Mutual Information between Scalar Outputs

- Minimization of the Mutual Information Between Scalar Outputs

Similar to the previous scheme, this scheme tries to make the two scalar outputs as irrelevant to each other as possible. An example of this scheme is the spatially incoherent feature extractor [Ukr92, Hay98]. As depicted in Figure 3-13, the transformation of a pair of input vectors Xa and Xb, representing data derived from corresponding regions in a pair of separate images, by a neural system should be so chosen that the scalar output ya due to the input Xa minimizes information about the second scalar output yb due to the input Xb, and vice versa.

Figure 3-13. Spatially Incoherent Feature Extraction
Figure 3-14. Minimization of the Mutual Information among Outputs

- Statistical Independence between Outputs

This scheme requires that all the outputs of the system be independent of each other.
The examples of this scheme are the systems for blind source separation and independent component analysis described in the previous chapters, where the systems are usually full-rank linear networks.

Figure 3-15. A General Learning Framework

3.3.3 A General Scheme

As can be seen from the above, none of the existing learning schemes is fully general. The InfoMax principle deals only with the mutual information between the input and the output, although it motivated the analysis of the learning process from an information-theoretic angle. The other schemes summarized by Haykin are also specific cases, even with the limitation of model linearity and Gaussian assumptions. These learning schemes have not considered the case with external teacher signals, i.e., the supervised learning case. In order to unify all the schemes, a general learning framework is proposed here. As depicted in Figure 3-15, this general learning scheme is nothing but the abstract and general learning model described in Chapter 1, with the learning mechanism specified as the optimization of an information measure based on the response of the learning system Y and the desired or teacher signal D. If the desired signal D is the input signal X and the information measure is the mutual information, then this scheme degenerates to the InfoMax principle. If the desired signal D is one or some of the output signals, then this scheme degenerates to the schemes summarized by Haykin and to the case of BSS and ICA. Even for a supervised learning case, where there is an external teacher signal D, the mutual information between the response of the learning system Y and the desired signal D can be maximized under this scheme. That means that, in general, the purpose of learning is to transmit as much information about the desired signal D as possible to the output or response of the learning system Y.
The extensively used MSE criterion is also contained in this scheme: the difference or error signal Y - D is assumed to be white Gaussian with zero mean, and the minimization of the entropy of the error signal is equivalent to the minimization of the MSE criterion according to Chapter 2.

In this learning scheme, supervised learning can be defined as the case with an external desired signal. In this case, the order of the learning system emerges such that its response best represents the desired signal. If the desired signal is either the input of the system or the output of the system, this scheme becomes unsupervised learning, where the system will self-organize such that either the output signal best represents the input signal, or the outputs are independent of each other or highly related to each other. The following gives two specific cases of this general point of view.

3.3.4 Learning as Information Transmission Layer-by-Layer

For a layered network, each layer can itself be regarded as a learning system. The whole system is the concatenation of the layers. From the above general point of view, if the desired signal is either an external one or the input signal, then each layer should serve the same purpose for learning: to transmit as much information about the desired signal as possible. In this way, the whole learning process is broken down into several small-scale learning processes, and each small learning process can proceed sequentially. This is an alternative learning scheme for a layered network, where the back-propagation learning algorithm has dominated for more than 10 years. The layer-by-layer learning scheme may simplify the whole learning process and shed more light on the essence of the learning process. The scheme is shown in the following figure. Examples of the application of such a learning scheme will be given in Chapter 5.
3.3.5 Information Filtering: Filtering beyond the Spectrum

Traditional filtering is based on the spectrum, i.e., an energy quantity. The basic interest of traditional filtering is to find some signal components or a signal subspace according to the spectrum. From the information-theoretic point of view, the signal components or signal subspace, linear or nonlinear, should be chosen not in the domain of the spectrum but in the domain of "the signal information structure." A signal may contain various kinds of information, and the list of these kinds of information may be called the "information spectrum." It is more desirable to choose signal components or subspaces according to such an "information spectrum" than according to the energy spectrum, which is the traditional way of filtering. The idea of information filtering proposed here generalizes the traditional way of filtering and brings more powerful tools to the signal processing area. Examples of the application of information filtering to pose estimation of SAR (synthetic aperture radar) images will be given in Chapter 5.

3.4 Learning by Information Force

The general point of view is important, but the practical implementation is more challenging. In this section, we will see how the general learning scheme can be implemented, or further specified, by using the powerful tools of the information potential and the cross information potential. The general learning scheme is depicted in the following figure:

Figure 3-17. The General Learning Scheme by Information Potential

In the general learning scheme depicted in Figure 3-17, if the information measure used is the entropy, then the information potential can be used; if the information measure is the mutual information, then the cross information potential can be used. So, the information potential in Figure 3-17 is a general term which stands for both the narrow-sense information potential and the cross information potential.
We may call such a general term the general information potential. Given a set of environmental data $\{(x(n), d(n)) \mid n = 1, \ldots, N\}$, there will be a response data set $\{y(n) \mid n = 1, \ldots, N\}$ with $y(n) = q(x(n), w)$; the general information potential $V(\{y(n)\})$ can then be calculated according to the formulas in Chapter 2. To optimize $V(\{y(n)\})$, the gradient method can be used. The gradient of $V(\{y(n)\})$ with respect to the parameters of the learning system, and the corresponding update, are

$$\frac{\partial}{\partial w}V(\{y(n)\}) = \sum_{n=1}^{N} \frac{\partial V(\{y(n)\})}{\partial y(n)} \frac{\partial y(n)}{\partial w}, \qquad w = w \pm \eta \frac{\partial}{\partial w}V(\{y(n)\}) \qquad (3.25)$$

As described in Chapter 2, $\frac{\partial}{\partial y(n)}V(\{y(n)\})$ is the information force that the information particle $y(n)$ receives in the information potential field. As pointed out above, $\frac{\partial}{\partial w}y(n)$ is the sensitivity of the learning network output, and it serves as the mechanism of error back-propagation in the error back-propagation algorithm. Here, (3.25) can be interpreted as "information force back-propagation." So, from a physical point of view such as a mass-energy point of view, the learning starts from the information potential field, where each information particle receives the information force from the field; the force is then transmitted through the network to the parameters so as to drive them to a state which optimizes the information potential. The information force back-propagation is illustrated in Figure 3-18, where the network functions as a "lever" which connects the parameters and the data samples (information particles) and transmits the force that the field impinges on the information particles to the parameters.

Figure 3-18. Illustration of Information Force Back-Propagation

3.5 Discussion of Generalization by Learning

The basic purpose of learning is to generalize. As pointed out in Chapter 1, generalization is nothing but making full use of the information given, neither less nor more.
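As a numerical sketch of the information force in (3.25), the following assumes the quadratic information potential with a Gaussian kernel, $V(\{y\}) = \frac{1}{N^2}\sum_i\sum_j G(y_i - y_j, 2\sigma^2)$, for scalar outputs (the kernel size $\sigma = 1$ is an arbitrary choice) and checks the analytic force $\partial V/\partial y_i$ against a numerical derivative:

```python
import numpy as np

# Information forces for the quadratic information potential
# V({y}) = (1/N^2) sum_i sum_j G(y_i - y_j, 2*sigma^2), Gaussian kernel,
# following the Parzen-window form of Chapter 2 (sigma is an assumed value).

sigma = 1.0

def gauss(d, var):
    return np.exp(-d**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def potential(y):
    d = y[:, None] - y[None, :]
    return gauss(d, 2 * sigma**2).mean()

def forces(y):
    d = y[:, None] - y[None, :]
    # dV/dy_i: derivative of each pairwise kernel; factor 2 counts (i,j) and (j,i)
    k = gauss(d, 2 * sigma**2) * (-d / (2 * sigma**2))
    return 2 * k.sum(axis=1) / y.size**2

rng = np.random.default_rng(0)
y = rng.normal(size=6)
F = forces(y)

# finite-difference check of the analytic force on the first particle
eps = 1e-6
yp = y.copy(); yp[0] += eps
ym = y.copy(); ym[0] -= eps
F0_num = (potential(yp) - potential(ym)) / (2 * eps)
```

Note that the forces sum to zero over all particles, in keeping with the mechanical analogy: the field only pushes particles relative to each other.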
A similar point of view can be found in Christensen [Chr80: page vii], where he pointed out: "The generalizations should represent all of the information which is available. The generalizations should represent no more information than is available." Ideas of this kind are found in ancient wisdom. The ancient Chinese philosopher Confucius pointed out: "Say 'know' when you know; say 'don't know' when you don't know; that is real knowledge." Although Confucius' word is about the right attitude that a scholar should take, when we think about machine learning today, this is still the right "attitude" that a machine should take in order to obtain information from its environment. The information potential provides a powerful tool to achieve the balance of making full use of the given information while avoiding explicit or implicit assumptions that are not given. To be more specific, the information potential does not rely on any external assumption, and its formulation tells us that it examines each pair of data points, extracting more detailed information from the data set than the traditional MSE criterion, where only the relative position between each data sample and the mean is considered, the relative position of each pair of data samples is ignored, and the samples can thus be treated independently. In this respect, the information potential is similar to the support vector machine [Vap95, Cor95], where a maximum margin is pursued for a linear classifier and, for this purpose, detailed data distribution information is also needed. The support vector machine has been shown to have very good generalization ability. The experimental results in Chapter 5 will show that the information potential also has very good generalization, with even better results than the support vector machine.
CHAPTER 4
LEARNING WITH AN ON-LINE LOCAL RULE: A CASE STUDY ON GENERALIZED EIGENDECOMPOSITION

In this chapter, the issue of learning with on-line local rules will be discussed. As pointed out in Chapter 1, learning or adaptive evolution of a system can happen whenever there are data flowing into the system, and thus should be on-line. For a biological neural network, the strength of a synaptic connection evolves only with its input and output activities. For a learning machine, although the features of "on-line" and "locality" may not be necessary in some cases, a system with such features will certainly be much more appealing. The Hebbian rule is the well-known postulated rule for the adaptation of a neurobiological system [Heb49]. Here, it will be shown how the Hebbian rule and the anti-Hebbian rule can be mathematically related to the energy and cross-correlation of a signal, and how these simple rules can be combined to achieve on-line local adaptation for a problem as intricate as generalized eigendecomposition. We will again see the role of the mass-energy concept.

4.1 Energy, Correlation and Decorrelation for a Linear Model

In Chapter 3, a linear model was introduced, where the input-output relation is formulated in (3.1) and the system is illustrated in Figure 3-1. In the following, it will be shown how the energy measure of a linear model can be related to the Hebbian and anti-Hebbian learning rules.

4.1.1 Signal Power, Quadratic Form, Correlation, Hebbian and Anti-Hebbian Learning

In Figure 3-1, the output signal of the i-th node is $y_i = w_i^T x$. So, given a data set $\{x(n) \mid n = 1, \ldots, N\}$, the power of the output signal $y_i$ is the quadratic form

$$P = \frac{1}{N}\sum_{n=1}^{N} y_i(n)^2 = w_i^T S w_i, \qquad S = E\{xx^T\} = \frac{1}{N}\sum_{n=1}^{N} x(n)x(n)^T \qquad (4.1)$$

where the covariance matrix S of the input signal is estimated from samples and n is the time index. One consequence of the quadratic form (4.1) is that it can be interpreted as a field in the space of the weights.
The change of the power "field" with the projection $w_i$ is shown in Figure 4-1, where the surfaces "P = constant" are hyper-ellipsoids. The normal vector of the surface "P = constant" is $Sw_i$, which is proportional to $\nabla_{w_i}P$ (the gradient of P). This means that the normal vector $Sw_i$ is the direction of steepest ascent of the power P.

Figure 4-1. The Power "Field" P of the Input Signal

The Hebbian and anti-Hebbian learning rules, although initially motivated by biological considerations [Heb49], happen to be consistent with the normal vector direction. These rules can be summarized as follows:

$$\Delta w_i(n) \propto y_i(n)x(n) \;\text{(sample-by-sample mode)}, \qquad \Delta w_i \propto \sum_{n=1}^{N} y_i(n)x(n) \;\text{(batch mode)} \qquad \text{Hebbian} \quad (4.2)$$

$$\Delta w_i(n) \propto -y_i(n)x(n) \;\text{(sample-by-sample mode)}, \qquad \Delta w_i \propto -\sum_{n=1}^{N} y_i(n)x(n) \;\text{(batch mode)} \qquad \text{Anti-Hebbian} \quad (4.3)$$

where the adjustment of the projection $w_i$ is proportional to the input-output correlation for Hebbian learning (or to the negative of the correlation for anti-Hebbian learning). So, the direction of batch Hebbian learning is actually the direction of fastest ascent in the power field of the output signal, while batch anti-Hebbian learning moves the system weights in the direction of fastest descent of the power field. The sample-by-sample Hebbian and anti-Hebbian learning rules are just the stochastic versions of their corresponding batch-mode rules. Hence, these simple rules are able to seek both the directions of steepest ascent and descent in the input power field using only local information.

4.1.2 Lateral Inhibition Connections, Anti-Hebbian Learning and Decorrelation

Lateral inhibition connections adapted with anti-Hebbian learning are known to decorrelate signals.
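The claim that the batch Hebbian direction in (4.2) is the normal vector $Sw$ of the power field is easy to check numerically; the data below are synthetic and the $1/N$ scaling is a convenience:

```python
import numpy as np

# Batch Hebbian update: (1/N) sum_n y(n) x(n) = ((1/N) sum_n x(n) x(n)^T) w
# = S w, i.e. the normal of the power surface P = w^T S w.

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))          # rows are samples x(n)
w = rng.normal(size=3)

y = X @ w                              # outputs y(n) = w^T x(n)
hebbian = X.T @ y / len(X)             # (1/N) sum_n y(n) x(n)

S = X.T @ X / len(X)                   # sample covariance (zero-mean input)
# gradient of P = w^T S w is 2 S w, so the batch Hebbian direction is S w
```

The identity is exact (up to floating point), since both sides are the same matrix product rearranged.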
As shown in Figure 4-2, $c$ is the lateral inhibition connection from $y_i$ to $y_j$; the outputs after the connection are $\tilde{y}_i = y_i$ and $\tilde{y}_j = c y_i + y_j$. The cross-correlation between $\tilde{y}_i$ and $\tilde{y}_j$ is given in (4.4) (note that the upper-case $C$ denotes the cross-correlation, while the lower-case $c$ denotes the lateral inhibition connection):

$$C(\tilde{y}_i, \tilde{y}_j) = \sum_n \tilde{y}_i(n)\tilde{y}_j(n) = c\sum_n y_i(n)^2 + \sum_n y_i(n)y_j(n) \qquad (4.4)$$

Figure 4-2. Lateral Inhibition Connection

Assume the energy of the signal $y_i$, $\sum_n y_i(n)^2$, is always greater than 0. Then there always exists a value

$$c = -\sum_n y_i(n)y_j(n) \Big/ \sum_n y_i(n)^2 \qquad (4.5)$$

which makes $C(\tilde{y}_i, \tilde{y}_j) = 0$; i.e., it decorrelates the signals $\tilde{y}_i$ and $\tilde{y}_j$. Anti-Hebbian learning requires the adjustment of $c$ to be proportional to the negative of the cross-correlation between the output signals, as (4.6) shows:

$$\Delta c = -\eta\,\tilde{y}_i(n)\tilde{y}_j(n) \;\text{(sample-by-sample mode)}, \qquad \Delta c = -\eta\,C(\tilde{y}_i, \tilde{y}_j) \;\text{(batch mode)} \qquad (4.6)$$

where $\eta$ is the learning step size. Accordingly, we have (4.7) for the batch mode:

$$\Delta C = (\Delta c)\sum_n y_i(n)^2 = -\eta E C \qquad \Big(E = \sum_n y_i(n)^2 > 0\Big) \qquad (4.7)$$

It is obvious that 0 is the only stable fixed attractor of the dynamic process $dC/dt = -EC$. So, anti-Hebbian learning will converge to decorrelate the signals as long as the learning step size $\eta$ is small enough. Summarizing the above, we can say that for a linear projection, Hebbian learning tends to maximize the output energy while anti-Hebbian learning tends to minimize it, and for a lateral inhibition connection, anti-Hebbian learning tends to minimize the cross-correlation between the two output signals.

4.2 Eigendecomposition and Generalized Eigendecomposition

Eigendecomposition and generalized eigendecomposition arise naturally in many signal processing problems.
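The lateral-connection dynamics of (4.4)-(4.7) above can be sketched numerically; the signals below are synthetic, and the step size is deliberately chosen small relative to the energy $E$:

```python
import numpy as np

# Batch anti-Hebbian adaptation of a single lateral inhibition weight c:
# outputs after the connection are y_i and c*y_i + y_j; the update
# Delta c = -eta * C drives the cross-correlation C to zero, with fixed
# point c* = -sum_n y_i y_j / sum_n y_i^2, as in (4.5).

rng = np.random.default_rng(2)
yi = rng.normal(size=500)
yj = 0.7 * yi + rng.normal(size=500)     # correlated with yi

E = np.sum(yi**2)
c_star = -np.sum(yi * yj) / E            # decorrelating solution (4.5)

c, eta = 0.0, 0.5 / E                    # step size small relative to E
for _ in range(100):
    C = np.sum(yi * (c * yi + yj))       # cross-correlation of current outputs
    c -= eta * C                         # anti-Hebbian batch update (4.6)

# c converges to c_star and the outputs are decorrelated
```

Each iteration multiplies $C$ by $(1 - \eta E)$, which is exactly the contraction predicted by (4.7).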
For instance, principal component analysis (PCA) is basically an eigenvalue problem with wide application in data compression, feature extraction and other areas [Kun94, Dia96]; as another example, Fisher linear discriminant analysis (LDA) is a generalized eigendecomposition problem [Dud73, XuD98]; signal detection and enhancement [Dia96] and even blind source separation [Sou95] can also be related to or formulated as an eigendecomposition or generalized eigendecomposition. Although solutions based on numerical methods have been well studied [Gol93], adaptive, on-line solutions are more desirable in many cases [Dia96]. Adaptive on-line structures and methods such as Oja's rule [Oja82] and the APEX rule [Kun94] emerged in the past decade to solve the eigendecomposition problem. However, the state of adaptive on-line methods for generalized eigendecomposition is far from satisfactory. Mao and Jain [Mao95] use a two-step PCA for LDA, which is clumsy and not efficient; Principe and Xu [Pr97a, Pr97b] only discuss the two-class constrained LDA case; Diamantaras and Kung [Dia96] describe the problem as oriented PCA and present a rule only for the largest generalized eigenvalue and its corresponding eigenvector. More recently, Chatterjee et al. [Cha97] formulated LDA from the point of view of heteroassociation and provided an iterative solution with a proof of convergence for its on-line version, but the method does not use local computations and is still computationally complex. Hence, a systematic, on-line local algorithm for generalized eigendecomposition is not presently available. In this chapter, an on-line local rule to adapt both the forward and lateral connections of a single-layer network is proposed, which produces the generalized eigenvalues and the corresponding eigenvectors in descending order.
The problems of eigendecomposition and generalized eigendecomposition will be formulated here in a different way, which will lead to the proposed solutions. An information-theoretic problem formulation is given first, followed by a formulation based on energy measures.

4.2.1 The Information-Theoretic Formulation for Eigendecomposition and Generalized Eigendecomposition

As pointed out in Chapter 3, the first component of PCA can be formulated as maximizing an entropy difference, and the first component of the generalized eigendecomposition can also be formulated as maximizing an entropy difference. Here, more general formulations will be given. Suppose there is one zero-mean Gaussian signal $x(n) \in R^m$, $n = 1, \ldots, N$, with covariance matrix $S = E(xx^T) = \sum_{n=1}^{N} x(n)x(n)^T$ (the trivial constant scalar $1/N$ is ignored here for convenience), and one zero-mean white Gaussian noise with covariance matrix equal to the identity matrix I. After the linear transform shown in Figure 3-1, the signal and the noise will still be Gaussian, with covariance matrices $w^T S w$ and $w^T w$ respectively. According to (2.42) in Chapter 2, the entropies of the outputs when the input is the signal and the noise are

$$H(w^T x) = \tfrac{1}{2}\log\left|w^T S w\right| + \tfrac{1}{2}\log 2\pi + \tfrac{1}{2}, \qquad H(w^T \mathit{noise}) = \tfrac{1}{2}\log\left|w^T w\right| + \tfrac{1}{2}\log 2\pi + \tfrac{1}{2} \qquad (4.8)$$

If we are going to find a linear transform such that the information about the signal at the output end, i.e. $H(w^T x)$, is maximized while the information about the noise at the output end, i.e. $H(w^T \mathit{noise})$, is minimized at the same time, the entropy difference can be used as the maximization criterion:

$$J = H(w^T x) - H(w^T \mathit{noise}) = \tfrac{1}{2}\log\frac{\left|w^T S w\right|}{\left|w^T w\right|} \qquad (4.9)$$

or, equivalently,

$$J = \frac{\left|w^T S w\right|}{\left|w^T w\right|} \qquad (4.10)$$

This problem is not an easy one, but it has been studied before.
Fortunately, the solution turns out to be the eigenvectors of S with the largest eigenvalues [Wil62, Dud73]:

$$Sw_i = \lambda_i w_i, \quad i = 1, \ldots, k, \quad k \text{ can be from 1 to } m \qquad (4.11)$$

So, the eigendecomposition can be regarded as finding a linear transform, in the case of a Gaussian signal and Gaussian noise, such that the entropy difference at the output is maximized; i.e., the output information entropy of the signal is maximized while the output information entropy of the noise is minimized at the same time. One may note that Renyi's entropy leads to the same result.

Similarly, for the generalized eigendecomposition, suppose there are two zero-mean Gaussian signals $x_1(n)$, $x_2(n)$, $n = 1, \ldots, N$, with covariance matrices $S_1 = E[x_1 x_1^T] = \sum_{n=1}^{N} x_1(n)x_1(n)^T$ and $S_2 = E[x_2 x_2^T] = \sum_{n=1}^{N} x_2(n)x_2(n)^T$ respectively (the trivial constant scalar $1/N$ is again ignored for convenience). The outputs after the linear transform will still be zero-mean Gaussian signals with covariance matrices $w^T S_1 w$ and $w^T S_2 w$ respectively. So the output information entropies for these two signals will be

$$H(w^T x_1) = \tfrac{1}{2}\log\left|w^T S_1 w\right| + \tfrac{1}{2}\log 2\pi + \tfrac{1}{2}, \qquad H(w^T x_2) = \tfrac{1}{2}\log\left|w^T S_2 w\right| + \tfrac{1}{2}\log 2\pi + \tfrac{1}{2} \qquad (4.12)$$

If we are looking for a linear transform such that, at the output, the information about the first signal is maximized while the information about the second signal is minimized, then we can use the entropy difference as the maximization criterion. In this case, the entropy difference will be (for both Shannon's entropy and Renyi's entropy)

$$J = \tfrac{1}{2}\log\frac{\left|w^T S_1 w\right|}{\left|w^T S_2 w\right|} \qquad (4.13)$$

or, equivalently,

$$J = \frac{\left|w^T S_1 w\right|}{\left|w^T S_2 w\right|} \qquad (4.14)$$

Again, this is not an easy problem.
Fortunately, the solution turns out to be the generalized eigenvectors with the largest generalized eigenvalues [Wil62, Dud73]:

$$S_1 w_i = \lambda_i S_2 w_i, \quad i = 1, \ldots, k, \quad k \text{ can be from 1 to } m \qquad (4.15)$$

So, in the case of Gaussian signals, the generalized eigendecomposition is the same as finding a linear transform such that the information about the first signal at the output end is maximized while the information about the second signal at the output end is minimized.

4.2.2 The Formulation of Eigendecomposition and Generalized Eigendecomposition Based on Energy Measures

Based on the energy criterion, the eigendecomposition can also be formulated as finding linear projections $w_i \in R^m$, $i = 1, \ldots, k$ (k from 1 to m) (Figure 3-1) which maximize the criteria in (4.16):

$$J(w_i) = \frac{w_i^T S w_i}{w_i^T w_i}, \qquad \text{subject to } w_i^T w_j^o = 0, \quad j = 1, \ldots, i-1 \qquad (4.16)$$

where $w_j^o \in R^m$ are the projections which maximize $J(w_j)$. Obviously, when $i = 1$, there is no constraint on the maximization of (4.16). Using Lagrange multipliers, we can verify that the solutions $(\lambda_i = J(w_i^o))$ of the optimization are eigenvectors and eigenvalues satisfying $Sw_i^o = \lambda_i w_i^o$, where $\lambda_i$ are the eigenvalues of S in descending order. From Section 4.1, we know that the numerator in (4.16) is the power of the output signal of the projection $w_i$ when the input is applied. The denominator can actually be regarded as the power of a white noise source applied to the same linear projection in the absence of $x(n)$, since $w_i^T w_i = w_i^T I w_i$, where I is the identity matrix, i.e. the covariance matrix of the noise. So, the eigendecomposition is actually the optimization of a signal-to-noise ratio (maximizing the signal power with respect to an alternate white noise source applied to the same linear projection), which is an interesting observation for signal processing applications. The constraints in (4.16) simply require the orthogonality of each pair of projections.
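The criterion $J(w)$ in (4.16) is the classical Rayleigh quotient, so its unconstrained maximum is the largest eigenvalue of $S$, attained at the corresponding eigenvector. A quick check on an arbitrary positive definite $S$:

```python
import numpy as np

# Rayleigh quotient J(w) = (w^T S w)/(w^T w): maximized by the top
# eigenvector of S, with maximum value equal to the top eigenvalue.

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
S = A @ A.T + 4 * np.eye(4)              # a positive definite "covariance"

def J(w):
    return (w @ S @ w) / (w @ w)

eigvals, eigvecs = np.linalg.eigh(S)     # ascending eigenvalues
w_top = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue

# J attains the largest eigenvalue at w_top and is no larger anywhere else
samples = [J(rng.normal(size=4)) for _ in range(100)]
```

Sampling random directions never exceeds $\lambda_{\max}$, which is the content of the Lagrange-multiplier argument above.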
Since the $w_j^o$ are eigenvectors of S, equivalent constraints can be written as

$$w_i^T w_j^o \lambda_j = w_i^T S w_j^o = \sum_n y_i(n)y_j(n) = 0 \qquad (4.17)$$

which means exactly the decorrelation of each pair of output signals. This derivation can be summarized by saying that an eigendecomposition finds a set of projections such that the outputs are most correlated with the input while the outputs themselves are decorrelated from each other.

Similarly, the criterion in (4.14) is equivalent to the following criteria [Wil62, Dud73, XuD98]. Let $x_l(n) \in R^m$, $n = 1, 2, \ldots$, $l = 1, 2$ be two zero-mean ergodic stationary random signals. The auto-correlation matrix $E\{x_l(n)x_l(n)^T\}$ can be estimated by $S_l = \sum_n x_l(n)x_l(n)^T$. The problem is to find $v_i \in R^m$, $i = 1, \ldots, k$ (k from 1 to m) which maximize

$$J(v_i) = \frac{v_i^T S_1 v_i}{v_i^T S_2 v_i}, \qquad \text{subject to } v_i^T S v_j^o = 0, \quad j = 1, \ldots, i-1 \qquad (4.18)$$

where $v_j^o$ is the j-th optimal projection vector, which maximizes $J(v_j)$; S in the constraints can be $S_1$, $S_2$ or $S_1 + S_2$, and $S_1$, $S_2$ are assumed positive definite. Obviously, when $i = 1$, there is no constraint on the maximization of (4.18). After $v_1^o$ is obtained, the $v_i^o$ ($i = 2, \ldots, k$) are obtained sequentially in descending order of $J(v_i)$. Using Lagrange multipliers we can verify that the optimization solutions $(\lambda_i = J(v_i^o) > 0)$ are the generalized eigenvalues and eigenvectors satisfying $S_1 v_i^o = \lambda_i S_2 v_i^o$, which can be used to justify the equivalence of the three alternative choices of S. In fact, $v_i^T S_1 v_j^o = v_i^T S_2 v_j^o \lambda_j$ and $v_i^T(S_1 + S_2)v_j^o = v_i^T S_2 v_j^o(1 + \lambda_j)$; thus any one of the three choices implies the others, and they are equivalent. This is why the problem is called generalized eigendecomposition. Let $y_{li}(n) = v_i^T x_l(n)$ denote the i-th output when the input is $x_l(n)$; then $v_i^T S_l v_i = \sum_n y_{li}(n)^2$ is the energy of the i-th output, and $v_i^T S_l v_j = \sum_n y_{li}(n)y_{lj}(n)$ is the cross-correlation between the i-th and j-th outputs when the input is $x_l(n)$.
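A standard numerical way to obtain the solutions of (4.18) (a sketch, not the on-line rule developed in this chapter) is to whiten with respect to $S_2$ and solve an ordinary symmetric eigenproblem; the defining relation (4.15) and the decorrelation constraints can then be verified:

```python
import numpy as np

# Generalized eigendecomposition S1 v = lambda S2 v via Cholesky whitening:
# with S2 = L L^T, eigenvectors u of M = L^{-1} S1 L^{-T} map back to
# generalized eigenvectors v = L^{-T} u.

rng = np.random.default_rng(4)
A1, A2 = rng.normal(size=(2, 3, 3))
S1 = A1 @ A1.T + 3 * np.eye(3)           # positive definite "covariances"
S2 = A2 @ A2.T + 3 * np.eye(3)

L = np.linalg.cholesky(S2)
M = np.linalg.solve(L, np.linalg.solve(L, S1).T).T   # L^{-1} S1 L^{-T}
lam, U = np.linalg.eigh(M)
V = np.linalg.solve(L.T, U)              # columns are generalized eigenvectors

resid = S1 @ V - S2 @ V * lam            # S1 v_i - lambda_i S2 v_i, per column
cross = V.T @ S2 @ V                     # decorrelation: diagonal (here identity)
```

The whitening route makes the equivalence with an ordinary eigenproblem explicit, and $V^T S_2 V = I$ shows the constraint $v_i^T S_2 v_j^o = 0$ is satisfied automatically.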
This suggests that the criteria in (4.18) are energy ratios of the two signals after projection, where the constraints simply require the decorrelation of each pair of output signals. Therefore, the problem is formulated as an optimal signal-to-signal ratio with decorrelation constraints.

4.3 The On-line Local Rule for Eigendecomposition

4.3.1 Oja's Rule and the First Projection

As mentioned above, there is no constraint on the optimization of the first projection for the eigendecomposition, and the criterion is to make the output energy (or power) of the signal as large as possible while making the energy (or power) of the output of the white noise as small as possible. By the result in 4.1, we know that the normal vector $Sw_1$ is the steepest ascent direction of the output energy when the input is the signal $x(n)$, while the normal vector $-Iw_1 = -w_1$ is the steepest descent direction of the output energy when the input is the white noise. Thus, we can postulate that the adjustment of $w_1$ should be a combination of the two normal vectors $Sw_1$ and $-w_1$:

$$\Delta w_1 \propto Sw_1 - \alpha I w_1 = Sw_1 - \alpha w_1 \qquad (4.19)$$

where $\alpha$ is a positive scalar which balances the roles of the two normal vectors. If we choose $\alpha = J(w_1) = w_1^T S w_1 / w_1^T w_1$, then (4.19) is the gradient method. The choice $\alpha = w_1^T S w_1$ leads to the so-called Oja's rule [Oja82]:

$$\Delta w_1 \propto Sw_1 - (w_1^T S w_1)w_1 = \sum_n y_1(n)[x(n) - y_1(n)w_1] \;\text{(batch mode)}, \qquad \Delta w_1 \propto y_1(n)[x(n) - y_1(n)w_1] \;\text{(sample-by-sample mode)} \qquad (4.20)$$

Oja's rule will make $w_1$ converge to $w_1^o$, the eigenvector with the largest eigenvalue of S, and also make $\|w_1\|$ converge to 1; i.e., $\|w_1\| \to 1$ [Oja82]. The convergence proof can be found in Oja [Oja82]. In the next section, we present a geometrical explanation of the rule so that its convergence can be easily understood.

Figure 4-3.
Geometrical Explanation of Oja's Rule

4.3.2 Geometrical Explanation of Oja's Rule

When $\|w_1\| = 1$, the balancing scalar in Oja's rule is $\alpha = w_1^T S w_1 = (w_1^T S w_1)/(w_1^T w_1)$. So, in this case, the updating term of Oja's rule, $\Delta w_1 \propto Sw_1 - \alpha w_1 = \sum_n y_1(n)[x(n) - y_1(n)w_1]$, is the same as the gradient direction, which is always perpendicular to $w_1$ (because $w_1^T(Sw_1 - \alpha w_1) = 0$). This is also true for the sample-by-sample case, where $\Delta w \propto y[x - yw]$ (all indices are omitted for convenience). When $\|w\| = 1$, obviously $w^T(x - yw) = 0$; i.e., the direction of the updating vector $x - yw$ is perpendicular to w, as shown in Figure 4-3 (a). So, in general, the updating vector $x - yw$ in Oja's rule can be decomposed into two components: the gradient component $\Delta w_g$ and a component $\Delta w_w$ along the direction of the vector w (as shown in Figure 4-3 (b) and (c)):

$$\Delta w \propto \Delta w_g + \Delta w_w \qquad (4.21)$$

The gradient component $\Delta w_g$ forces w towards the right direction, i.e. the eigenvector direction, while the component $\Delta w_w$ adjusts the length of w. As shown in Figure 4-3 (b) and (c), when $\|w\| > 1$ it tends to decrease $\|w\|$, and when $\|w\| < 1$ it tends to increase $\|w\|$. So, it serves as a negative feedback control for $\|w\|$, with equilibrium point $\|w\| = 1$. Therefore, even without explicit normalization of the norm of w, Oja's rule will still force $\|w\|$ to 1. Unfortunately, when Oja's rule is used for the minor component (the eigenvector with the smallest eigenvalue, where the criterion in (4.16) is to be minimized), the update of w becomes anti-Hebbian. In this case, $\Delta w_w$ serves as a positive feedback control for $\|w\|$, and Oja's rule becomes unstable. One simple method to stabilize Oja's rule for minor components is to perform an explicit normalization of the norm of w, so that Oja's rule is exactly equivalent to the gradient descent method.
Despite the normalization $w = w/\|w\|$, this method is comparable to the other methods in computational complexity, because all of them need to compute the value of $w^T w$.

4.3.3 Sanger's Rule and the Other Projections

For the other projections, the difference is the constraint in (4.16). For the $i$-th projection, we can project the normal vector $S w_i$ onto the subspace orthogonal to all the previous eigenvectors $w_j^o$ to meet the constraint, and apply Oja's rule in that subspace to find the optimal signal-to-noise ratio there. This is called the deflation method. Using the deflation idea, Sanger [San89] proposed the rule in (4.22), which degenerates to Oja's rule when $i = 1$:

$$\Delta w_i \propto \Big(I - \sum_{j=1}^{i-1} w_j w_j^T\Big) S w_i - (w_i^T S w_i) w_i \qquad (4.22)$$

where $I - \sum_{j=1}^{i-1} w_j w_j^T$ is the projection transform onto the subspace perpendicular to all the previous $w_j$, $j = 1, \ldots, i-1$. According to Oja's rule, $w_1$ will converge to the first eigenvector, the one with the largest eigenvalue, and $\|w_1\| \to 1$. Based on this $w_1$ and the rule in (4.22), $w_2$ will converge to the second eigenvector, with the second largest eigenvalue, and $\|w_2\| \to 1$. The same happens for the rest of the $w_i$. Therefore, Sanger's rule sequentially yields the eigenvectors of $S$ in descending order of their corresponding eigenvalues. The corresponding batch mode and sample-by-sample adaptation rules for Sanger's method are

$$\Delta w_i \propto \sum_n y_i(n)\Big\{x(n) - \sum_{j=1}^{i-1} y_j(n) w_j - y_i(n) w_i\Big\} \quad \text{(batch mode)}$$
$$\Delta w_i \propto y_i(n)\Big\{x(n) - \sum_{j=1}^{i-1} y_j(n) w_j - y_i(n) w_i\Big\} \quad \text{(sample-by-sample mode)} \qquad (4.23)$$

Sanger's rule is not local, because the update of $w_i$ involves all the previous projections $w_j$ and their outputs $y_j$. In a biological neural network, the adaptation of the synapses should be local. In addition, locality makes the VLSI implementation of an algorithm much easier. We next introduce the local implementation of Sanger's rule.
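Sanger's rule (4.23) can be sketched in the same style. The matrix form below uses the lower-triangular trick, in which row $i$ of the update automatically sees the deflated input; this is our illustration, not code from the original:

```python
import numpy as np

def sanger_rule(x, k, eta=0.005, epochs=250, seed=0):
    """Sample-by-sample Sanger rule (4.23) for the top-k eigenvectors.

    Row i of W is driven by the deflated input
    x(n) - sum_{j<i} y_j(n) w_j - y_i(n) w_i, written compactly as
    dW = eta * (y x^T - lower_triangular(y y^T) W).
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((k, x.shape[1])) * 0.1
    for _ in range(epochs):
        for xn in x:
            y = W @ xn
            # tril keeps, for row i, only the terms with j <= i
            W += eta * (np.outer(y, xn) - np.tril(np.outer(y, y)) @ W)
    return W
```

The rows converge, in descending eigenvalue order, to the leading eigenvectors of the sample covariance, as stated above for Sanger's rule.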
4.3.4 APEX Model: The Local Implementation of Sanger's Rule

As stated above, the purpose of eigendecomposition is to find the projections whose outputs are most correlated with the input signals and decorrelated with each other. Starting from this point and considering the results in 4.1, the structure in Figure 4-4 is proposed.

Figure 4-4. Linear Projections with Lateral Inhibitions

In Figure 4-4, the $c_{ji}$ are lateral inhibition connections expected to decorrelate the output signals. The input-output relation for the $i$-th projection is

$$y_i = w_i^T x + \sum_{j=1}^{i-1} c_{ji} y_j = \Big(w_i + \sum_{j=1}^{i-1} c_{ji} w_j\Big)^T x \qquad (4.24)$$

So the overall $i$-th projection is $v_i = w_i + \sum_{j=1}^{i-1} c_{ji} w_j$, and the input-output relation can be written as $y_i = v_i^T x$. For simplicity of exposition, we will consider only the second projection (for the first projection $w_1$ we already have Oja's rule; suppose it has already converged to the solution, the eigenvector with the largest eigenvalue of $S$, and is fixed). The second projection will represent all the other projections (the rules for the rest can be obtained similarly). For the structure in Figure 4-4, the overall second projection is $v_2 = w_2 + c_{12} w_1$. The problem can be restated as finding the projection $v_2$ such that the following criterion is maximized:

$$J(v_2) = \frac{v_2^T S v_2}{v_2^T v_2}, \quad \text{subject to } v_2^T S w_1 = 0 \qquad (4.25)$$

where $w_1$ is the solution for the first projection, i.e., the eigenvector with the largest eigenvalue of $S$, and can be assumed fixed during the adaptation of the second projection. The overall change of $v_2$ can result from the variation of both the forward projection $w_2$ and the lateral inhibition connection $c_{12}$; i.e., we have

$$\Delta v_2 = \Delta w_2 + (\Delta c_{12}) w_1 \qquad (4.26)$$

To make the problem tractable, we will consider how the overall projection $v_2$ should change if we fix $c_{12}$, and how it should change if we fix $w_2$.
By the basic principle in 4.1 (that is, using the Hebbian rule to increase an output energy and the anti-Hebbian rule to decrease an output energy), if $c_{12}$ is fixed, the overall projection should evolve according to Oja's rule so as to increase the energy $v_2^T S v_2$ and at the same time decrease $v_2^T v_2$:

$$\Delta v_2 = S v_2 - (v_2^T S v_2) v_2 \qquad (4.27)$$

However, $v_2$ is a virtual projection and relies on both $c_{12}$ and $w_2$. In this case, when $c_{12}$ is fixed, $\Delta v_2 = \Delta w_2$. So (4.27) can be implemented by (4.28):

$$\Delta w_2 = S v_2 - (v_2^T S v_2) v_2 \qquad (4.28)$$

When $w_2$ is fixed, the adaptation of $c_{12}$ should decorrelate the two signals $y_1$ and $y_2$. According to the conclusion in 4.1 (i.e., using the anti-Hebbian rule to decrease the cross-correlation between two outputs as in Figure 4-4), the adaptation of $c_{12}$ should be

$$\Delta c_{12} = -\sum_n y_1(n) y_2(n) = -w_1^T S v_2 \qquad (4.29)$$

So, by the principle in 4.1, we can postulate the adaptation rule as (4.28) and (4.29) together:

$$\Delta w_2 = S v_2 - (v_2^T S v_2) v_2 = \sum_n y_2(n) x(n) - \Big(\sum_n y_2(n)^2\Big) w_2 - \Big(\sum_n y_2(n)^2\Big) c_{12} w_1$$
$$\Delta c_{12} = -w_1^T S v_2 = -\sum_n y_1(n) y_2(n) \qquad (4.30)$$

Surprisingly, we find that this rule is actually the same as Sanger's rule if we write down the adaptation of the overall projection as (4.31) and compare it with (4.22):

$$\Delta v_2 = \Delta w_2 + w_1(\Delta c_{12}) = S v_2 - (v_2^T S v_2) v_2 - (w_1^T S v_2) w_1 \qquad (4.31)$$

However, from (4.30) we can see that the adaptation of $w_2$ is not local either; i.e., $\Delta w_2$ depends not only on its input, output and $w_2$ itself, but also on $c_{12}$ and $w_1$, which appear in the last term of $\Delta w_2$ in (4.30). That last term means that part of the adaptation of $w_2$ should be along the direction of $w_1$, and this can actually be implemented by adapting the lateral inhibition connection $c_{12}$ instead; i.e., the last term of $\Delta w_2$ in (4.30) can be moved into $\Delta c_{12}$.
From (4.30), we have

$$\Delta v_2 = \Delta w_2 + w_1(\Delta c_{12}) = \sum_n y_2(n)\{x(n) - y_2(n) w_2\} - \Big(\sum_n y_2(n)^2\Big) c_{12} w_1 - \Big(\sum_n y_1(n) y_2(n)\Big) w_1$$
$$= \sum_n y_2(n)\{x(n) - y_2(n) w_2\} - \Big(\sum_n y_2(n)\{y_1(n) + c_{12} y_2(n)\}\Big) w_1 \qquad (4.32)$$

To keep the adaptation of $v_2$ unchanged, we can write new adaptation rules for both $w_2$ and $c_{12}$ as

$$\Delta w_2 = \sum_n y_2(n)\{x(n) - y_2(n) w_2\}$$
$$\Delta c_{12} = -\sum_n y_2(n)\{y_1(n) + c_{12} y_2(n)\} \qquad (4.33)$$

where the adaptations of both $w_2$ and $c_{12}$ are "local." (4.33) is actually the adaptation rule of the APEX model [Kun94]; all of the above gives an intuitive explanation of the APEX model and also shows that the APEX model is nothing but a local implementation of Sanger's rule. In general, the sample-by-sample adaptation for the APEX model is as follows:

$$\Delta w_i \propto y_i(n)\{x(n) - y_i(n) w_i\}$$
$$\Delta c_{ji} \propto -y_i(n)\{y_j(n) + y_i(n) c_{ji}\} \quad \text{(APEX adaptation)} \qquad (4.34)$$

4.4 An Iterative Method for Generalized Eigendecomposition

Chatterjee et al. [Cha97] formulate LDA as a heteroassociation problem and propose an iterative method for LDA. Since LDA is a special case of the generalized eigendecomposition, the iterative method can be further extended to the generalized eigendecomposition. Using the same notation as in 4.2, the iterative method for the generalized eigendecomposition can be described as

$$\Delta v_i = S_1 v_i - (v_i^T S_1 v_i) S_2 v_i - S_2 \sum_{j=1}^{i-1} v_j v_j^T S_1 v_i, \quad i = 1, \ldots, k \qquad (4.35)$$

This method assumes that the covariance matrices have already been calculated; the generalized eigenvectors are then obtained iteratively by (4.35). There is another, alternative method which uses a certain optimality relation in the problem formulation but results in a more complex rule [Cha97]:

$$\Delta v_i = S_1 v_i - (v_i^T S_1 v_i) S_2 v_i - S_2 \sum_{j=1}^{i-1} v_j v_j^T S_1 v_i + S_2 v_i - (v_i^T S_2 v_i) S_1 v_i - S_1 \sum_{j=1}^{i-1} v_j v_j^T S_2 v_i \qquad (4.36)$$

For two zero-mean signals $x_1(n)$ and $x_2(n)$, the covariance matrices can be estimated on-line by using

$$S_1(n) = S_1(n-1) + \gamma(n)[x_1(n) x_1(n)^T - S_1(n-1)], \quad S_2(n) = S_2(n-1) + \gamma(n)[x_2(n) x_2(n)^T - S_2(n-1)] \qquad (4.37)$$

where $\gamma(n)$ is a scalar gain sequence [Cha97].
Based on (4.37), an adaptive on-line algorithm for the generalized eigendecomposition can be the same as (4.35) or (4.36), except that all the terms there are estimated on-line; i.e.,

$$S_1 = S_1(n), \quad S_2 = S_2(n), \quad v_i = v_i(n), \quad v_j = v_j(n) \qquad (4.38)$$

The convergence of this adaptive on-line algorithm can be shown by stochastic approximation theory [Cha97, Dia96], whose major point is that a stochastic algorithm will converge to the solution of its corresponding deterministic ordinary differential equation (ODE) with probability 1 under certain conditions [Dia96, Cha97]. Formally, we have a stochastic recursive algorithm:

$$\theta_{k+1} = \theta_k + \beta_k f(x_k, \theta_k), \quad k = 0, 1, 2, \ldots \qquad (4.39)$$

where $\{x_k \in R^m\}$ is a sequence of random vectors, $\{\beta_k\}$ is a sequence of step-size parameters, $f$ is a continuous and bounded function, and $\theta_k \in R^p$ is a sequence of approximations of a desired parameter vector $\theta^o$. If the following assumptions are satisfied for all fixed $\theta$ ($E\{\cdot\}$ is the expectation operator), then the corresponding deterministic ODE for (4.39) is $d\theta/dt = \bar{f}(\theta)$, and $\theta_k$ will converge to the solution $\theta^o$ of this ODE with probability 1 as $k$ approaches $\infty$ [Dia96].

A-1. The step-size sequence satisfies $\beta_k \to 0$ and $\sum_{k=0}^{\infty} \beta_k = \infty$.

A-2. $f(\cdot, \cdot)$ is a bounded and measurable $R^p$-valued function.

A-3. For any fixed $x$, the function $f(x, \cdot)$ is continuous and bounded (uniformly in $x$).

A-4. There is a function $\bar{f}(\theta) = \lim_{k \to \infty} E\{f(x_k, \theta)\}$.

4.5 An On-line Local Rule for Generalized Eigendecomposition

As stated in 4.2.2, the generalized eigendecomposition problem can be formulated as the problem of the optimal signal-to-signal ratio with decorrelation constraints. Here, the network structure of the APEX model will be used for this more complicated problem.
As shown in Figure 4-5, the $w_i \in R^m$ are forward linear projection vectors and the $c_{ji}$ are lateral inhibitive connections used to force decorrelation among the output signals, but the input is switched between the two zero-mean signals $x_1(n)$ and $x_2(n)$ at each time instant $n$. The overall projection is the combination of the two types of connections, e.g., $v_2 = c_{12} w_1 + w_2$, etc. The $i$-th output for the input $x_l(n)$ is $y_{il}(n) = v_i^T x_l(n)$, etc. The proposed on-line local rule for the network in Figure 4-5 for the generalized eigendecomposition is discussed in the following sections.

Figure 4-5. Linear Projections with Lateral Inhibitions and Two Inputs

4.5.1 The Proposed Learning Rule for the First Projection

In this section, we first discuss the batch mode rule for the adaptation of the first projection, then the stability analysis of the batch mode rule, and finally the corresponding adaptive on-line rule for the first projection.

A. The Batch Mode Adaptation Rule

Since there is no constraint on the optimization of the first projection $v_1$, its output does not receive any lateral inhibition; thus $v_1 = w_1$, as shown in Figure 4-5. The normal vector for the power field $w_1^T S_1 w_1$ is $H_1(w_1) = S_1 w_1 = \sum_n y_{11}(n) x_1(n)$, and the normal vector for the power field $w_1^T S_2 w_1$ is $H_2(w_1) = S_2 w_1 = \sum_n y_{12}(n) x_2(n)$. To increase $w_1^T S_1 w_1$ and decrease $w_1^T S_2 w_1$ at the same time, the adaptation should be

$$\Delta w_1 = H_1(w_1) - H_2(w_1) f(w_1), \quad w_1 \leftarrow w_1 + \eta \Delta w_1 \qquad (4.40)$$

where $\eta$ is the learning step size, the Hebbian term $H_1(w_1)$ "enhances" the output signal $y_{11}(n)$, the anti-Hebbian term $-H_2(w_1)$ "attenuates" the output signal $y_{12}(n)$, and the scalar $f(w_1)$ plays the balancing role. If $f(w_1) = (w_1^T S_1 w_1)/(w_1^T S_2 w_1)$ is chosen, then (4.40) is the gradient method. If $f(w_1) = w_1^T w_1$, then (4.40) becomes the method used in Diamantaras and Kung [Dia96].
Similar to Oja's rule, the balancing scalar $f(w_1)$ can be simplified to $f(w_1) = w_1^T P w_1$ ($P = S_1$, or $S_2$, or $(S_1 + S_2)$), because in this case the scalar reduces to an output energy, e.g., $w_1^T S_1 w_1 = \sum_n y_{11}(n)^2$. In the sequel, the case $f(w_1) = w_1^T S_1 w_1$ will be discussed.

Figure 4-6. The Regions Related to the Variation of the Norm $\|w\|$ (the spheres $\|w\|^2 = 1/\lambda_{max}$ and $\|w\|^2 = 1/\lambda_{min}$, where $\lambda_{max}$ and $\lambda_{min}$ are the maximum and minimum eigenvalues of $S_2$)

B. The Stability Analysis of the Batch Mode Rule

The stationary points of the adaptation process (4.40) can be obtained by solving the equation $H_1(w_1) - H_2(w_1) f(w_1) = (S_1 - f(w_1) S_2) w_1 = 0$. Obviously, $w_1 = 0$ and all the generalized eigenvectors $v^o$ which satisfy $S_1 v^o = f(v^o) S_2 v^o$ are stationary points. Notice that in general the length of $v^o$ is specified by $f(v^o) = \lambda_i$ (the $\lambda_i$ are the generalized eigenvalues corresponding to the $v^o$), so the $v^o$ are further denoted by $v^o_{\lambda_i}$. In the case of $f(w_1) = w_1^T S_1 w_1$, we have $(v^o_{\lambda_i})^T S_1 v^o_{\lambda_i} = \lambda_i$ and $(v^o_{\lambda_i})^T S_2 v^o_{\lambda_i} = 1$. We will show that when $f(w_1) = w_1^T P w_1$, there is only one stable stationary point, namely the solution $w_1 = v^o_{\lambda_1}$; all the rest are unstable stationary points.

Let us look at the case $f(w_1) = w_1^T S_1 w_1$; the rest are similar. First, it can be shown that $w_1 = 0$ is not stable. To show this, we can calculate the first-order approximation of the variation of $\|w_1\|^2$, which is

$$\Delta(\|w_1\|^2) = 2 w_1^T (\Delta w_1) = 2\eta\big(w_1^T S_1 w_1 - (w_1^T S_1 w_1)(w_1^T S_2 w_1)\big) = 2\eta\, w_1^T S_1 w_1 \big(1 - w_1^T S_2 w_1\big)$$

Since $w_1^T S_1 w_1 > 0$, the sign of the variation depends entirely on $1 - w_1^T S_2 w_1$. As shown in Figure 4-6, when $w_1$ is located within the region $D_2$, i.e., $w_1^T S_2 w_1 < 1$, $\Delta(\|w_1\|^2)$ is positive and $\|w_1\|$ will increase, while when $w_1$ is located outside the region $D_2$, i.e., $w_1^T S_2 w_1 > 1$, $\Delta(\|w_1\|^2)$ is negative and $\|w_1\|$ will decrease. So the stable stationary points must be located on the hyper-ellipsoid $w_1^T S_2 w_1 = 1$, and therefore $w_1 = 0$ cannot be a stable stationary point. This can also be shown by Lyapunov local asymptotic stability analysis [Kha92].
The behavior of the algorithm described by (4.40) can be characterized by the following differential equation:

$$\frac{dw_1}{dt} = \Phi(w_1) = H_1(w_1) - H_2(w_1) f(w_1) = S_1 w_1 - (w_1^T S_1 w_1) S_2 w_1 \qquad (4.41)$$

Obviously, this is a nonlinear dynamic system. The instability at $w_1 = 0$ can be determined by the position of the eigenvalues of the linearization matrix $A$:

$$A = \frac{\partial \Phi}{\partial w_1}\Big|_{w_1 = 0} = \big[S_1 - 2 S_2 w_1 w_1^T S_1 - (w_1^T S_1 w_1) S_2\big]\Big|_{w_1 = 0} = S_1 \qquad (4.42)$$

Since $S_1$ is positive definite, all its eigenvalues are positive. So the dynamic process $dw_1/dt = A w_1$ cannot be stable at $w_1 = 0$; i.e., (4.40) is not stable at $w_1 = 0$. Similarly, $w_1 = v^o_{\lambda_i}$, $i = 2, \ldots, m$, can be shown to be unstable too. Actually, in these cases the corresponding linearization matrix $A$ is

$$A = \big[S_1 - 2 S_2 w_1 w_1^T S_1 - (w_1^T S_1 w_1) S_2\big]\Big|_{w_1 = v^o_{\lambda_i}} = S_1 - 2\lambda_i S_2 v^o_{\lambda_i} (v^o_{\lambda_i})^T S_2 - \lambda_i S_2, \quad i = 2, \ldots, m \qquad (4.43)$$

By using $(v^o_{\lambda_1})^T S_2 v^o_{\lambda_i} = 0$ ($i = 2, \ldots, m$), $(v^o_{\lambda_1})^T S_1 v^o_{\lambda_1} = \lambda_1$ and $(v^o_{\lambda_1})^T S_2 v^o_{\lambda_1} = 1$, we have

$$(v^o_{\lambda_1})^T A v^o_{\lambda_1} = \lambda_1 - \lambda_i > 0 \qquad (4.44)$$

The inequality in (4.44) holds because $\lambda_1$ is the largest generalized eigenvalue. Similarly, by using $(v^o_{\lambda_i})^T S_1 v^o_{\lambda_i} = \lambda_i$ and $(v^o_{\lambda_i})^T S_2 v^o_{\lambda_i} = 1$, we have

$$(v^o_{\lambda_i})^T A v^o_{\lambda_i} = -2\lambda_i < 0 \qquad (4.45)$$

So the linearization matrices $A$ at $w_1 = v^o_{\lambda_i}$, $i = 2, \ldots, m$, are not definite; these points are thus all saddle points and unstable. The local stability of $w_1 = v^o_{\lambda_1}$ can be shown by the negative definiteness of the linearization matrix $A$ at $w_1 = v^o_{\lambda_1}$:

$$A = \big[S_1 - 2 S_2 w_1 w_1^T S_1 - (w_1^T S_1 w_1) S_2\big]\Big|_{w_1 = v^o_{\lambda_1}} = S_1 - 2\lambda_1 S_2 v^o_{\lambda_1} (v^o_{\lambda_1})^T S_2 - \lambda_1 S_2 \qquad (4.46)$$

Actually, it is not difficult to verify (4.47).
$$(v^o_{\lambda_1})^T A v^o_{\lambda_1} = \lambda_1 - 2\lambda_1 - \lambda_1 = -2\lambda_1 < 0$$
$$(v^o_{\lambda_i})^T A v^o_{\lambda_i} = \lambda_i - \lambda_1 < 0, \quad i = 2, \ldots, m \qquad (4.47)$$
$$(v^o_{\lambda_i})^T A v^o_{\lambda_j} = 0, \quad i \neq j$$

Since all the generalized eigenvectors $v^o_{\lambda_i}$, $i = 1, \ldots, m$, are linearly independent of each other and span the whole space, any non-zero vector $x \in R^m$ can be written as a linear combination of the generalized eigenvectors $v^o_{\lambda_i}$ with at least one coefficient being non-zero; i.e., $x = \sum_{i=1}^{m} a_i v^o_{\lambda_i}$. Thus, by (4.47), we have the quadratic form $x^T A x$ as follows:

$$x^T A x = \sum_{i=1}^{m} a_i^2 (v^o_{\lambda_i})^T A v^o_{\lambda_i} < 0 \qquad (4.48)$$

So all the eigenvalues of the linearization matrix $A$ are negative, and thus $w_1 = v^o_{\lambda_1}$ is stable. When $f(w_1) = w_1^T P w_1$ with $P = S_2$ or $(S_1 + S_2)$, the stability analysis can be similarly obtained. As shown previously, both the $P$ and the $S$ in the constraint $v_i^T S v_j^o = 0$ have three choices. For simplicity of exposition, only $P = S = S_1$ will be used in the rest of this chapter. It should be noticed that when $w_1$ converges to $v^o_{\lambda_1}$, the scalar value $f(w_1) = f(v^o_{\lambda_1}) = \lambda_1$; so $f(w_1)$ can serve as the estimate of the largest generalized eigenvalue.

C. The Local On-Line Adaptive Rule

When $f(w_1) = w_1^T S_1 w_1$ is used, (4.40) is the same as (4.35), the adaptation rule in Chatterjee et al. [Cha97]. However, here the calculations of the Hebbian term $H_1(w_1)$, the anti-Hebbian term $-H_2(w_1)$ and the balancing scalar $f(w_1)$ are all local, avoiding the direct matrix multiplications in (4.35) and resulting in a drastic reduction in computation. When an exponential window is used to estimate each term in (4.40), we have

$$w_1(n+1) = w_1(n) + \eta(n) \Delta w_1(n)$$
$$\Delta w_1(n) = H_1(w_1, n) - H_2(w_1, n) f(w_1, n)$$
$$H_1(w_1, n) = H_1(w_1, n-1) + \alpha[y_{11}(n) x_1(n) - H_1(w_1, n-1)] \qquad (4.49)$$
$$H_2(w_1, n) = H_2(w_1, n-1) + \alpha[y_{12}(n) x_2(n) - H_2(w_1, n-1)]$$
$$f(w_1, n) = f(w_1, n-1) + \alpha[y_{11}(n)^2 - f(w_1, n-1)]$$

where the step size $\eta(n)$ should decrease with the time index $n$. The number of multiplications required by (4.49) is $8m + 2$ ($m$ is the dimension of the input signals), while the number of multiplications required by the method (4.35) of Chatterjee et al. [Cha97] is $6m^2 + 3m$. The convergence of the stochastic algorithm in (4.49) can be shown by stochastic approximation theory in the same way as in Chatterjee et al. [Cha97]. The simulation results also show convergence when instantaneous values are used for the Hebbian and anti-Hebbian terms $H_1(w_1)$ and $H_2(w_1)$; i.e.,

$$w_1(n+1) = w_1(n) + \eta(n) \Delta w_1(n)$$
$$\Delta w_1(n) = H_1(w_1, n) - H_2(w_1, n) f(w_1, n)$$
$$H_1(w_1, n) = y_{11}(n) x_1(n) \qquad (4.50)$$
$$H_2(w_1, n) = y_{12}(n) x_2(n)$$
$$f(w_1, n) = f(w_1, n-1) + \alpha[y_{11}(n)^2 - f(w_1, n-1)]$$

Notice that in both (4.49) and (4.50), when convergence is achieved, the balancing scalar $f(w_1, n)$ approaches its batch mode version $f(w_1) = w_1^T S_1 w_1$. As shown above, the batch mode scalar $f(w_1)$ approaches the largest generalized eigenvalue $\lambda_1$ when $w_1$ approaches $v^o_{\lambda_1}$. So we can conclude that $f(w_1, n) \to \lambda_1$, and all the quantities in both (4.49) and (4.50) are fully utilized.

4.5.2 The Proposed Learning Rules for the Other Connections

In this section, the adaptation rules for both the lateral connections and the feedforward connections of the other projections are discussed. For simplicity, only $v_2 = c_{12} w_1 + w_2$ is considered; the other cases are similar. Suppose $w_1$ has already reached its final position; i.e., $w_1 = v^o_{\lambda_1}$, $(v^o_{\lambda_1})^T S_1 v^o_{\lambda_1} = \lambda_1$ and $(v^o_{\lambda_1})^T S_2 v^o_{\lambda_1} = 1$. Again, we first discuss the batch mode rule for both the feedforward connection $w_2$ and the lateral inhibitive connection $c_{12}$, then its stability analysis, and finally the corresponding local on-line adaptive rule.

A. The Batch Mode Adaptation Rule

Similar to 4.3.4, the adaptation rule can be described in two parts: the decorrelation and the optimal signal-to-signal ratio search. The decorrelation between the output signals $y_{11}(n)$ and $y_{21}(n)$ can be achieved by anti-Hebbian learning of the inhibitive connection $c_{12}$, and the optimal signal-to-signal ratio search can be achieved by a rule similar to that of the previous section 4.5.1 for the feedforward connection $w_2$.
So, we have r Ac12 = C(wâ€žv2) f C,2 = c12-T!cAc12 < \ (4.51) [ Aw2 = i/1(v2)-//2(v2)/(v2) [w2 = vv2 + tiwA>v2 T where C(W|,v2) = WjS'jVj = V y\\{,n)y2X(n) is the cross-correlation between two â– /i output signals yn(Â«) and y2\(n), H\(v2) = Â«S'iV2 = V y2i(7j)jcj(n) is the Hebbian term which will â€œenhanceâ€ the output signal y21(Â«),//2(v2) = 52v2 = ^ y22(n)x2(n) is the anti-Hebbian term which will â€œattenuateâ€ the output signal y22(n), T 2 /(v2) = v25| v2 = V y2j(n) is the scalar playing a balancing role between the Heb- bian term H^{y2) and the anti-Hebbian term H2{v2), r)c is the step size for decorrelation process, r|w is the step size for feedforward adaptation. 128 First, letâ€™s consider the case where w2 is fixed. Then, as pointed out in 4.1, the lateral inhibition connection c]2 in (4.51) will decorrelate the output signals yu(n) and y21(Â«). In fact, we have the variation for the cross-correlation AC = wfsj(Av2) = w[5'1(-ric(Ac12)vv1) = -r)c(w[iS'1w1)C, and C(Â«+l) = C(n) + AC = (1 â€” ri(;,w15'1w1)C(Â«). If r|c is small enough such that 1 â€” r|w1S'1wI| < 1, then lim C(n) = 0. When the decorrelation is achieved; i.e., C = 0, there will be no adjust- n oo ment in cl2, namely c12 will remain the same. Second, letâ€™s consider the case with c]2 fixed. Then we have Av2 = Aw2 = H\ (v2) â€” H2(v2)f(v2). By the conclusion in the previous section 4.5.1, we know that Av2 is in the direction to increase the signal-to-signal ratio J(y2). Combining these two points, intuitively, we can say that as long as the step size r|c for the decorrelation process is large enough relative to the step size r|w for the feedforward process such that the decorrelation process is faster than the feedforward process, then the optimal signal-to-signal ratio search will basically take place within the subspace that is T o S] orthogonal to the first eigenvector; i.e., v2*Sj v^, = 0, and the whole process will conÂ¬ verge to the solution; i.e., v2 â€”Â» vÂ£2 and /(v2) â€”> X2. 
However, we should notice that $v_2 \to v^o_{\lambda_2}$ does not necessarily mean $c_{12} \to 0$, which is the case for the APEX model. Actually, $c_{12}$ can take any value, but the overall projection will converge.

B. The Stability Analysis of the Batch Mode Rule

The stationary points of (4.51) can be obtained by solving both $\Delta c_{12} = 0$ and $\Delta w_2 = 0$. Obviously, $v_2 = 0$ and $v_2 = v^o_{\lambda_i}$ ($i = 2, \ldots, m$) are stationary points of the dynamic process of (4.51). Based on the results in the previous section 4.5.1, it is not difficult to show that $v_2 = 0$ and $v_2 = v^o_{\lambda_i}$ ($i = 3, \ldots, m$) are all unstable. Actually, if the initial state of $v_2$ is in the subspace $S_1$-orthogonal to $v^o_{\lambda_1}$, i.e., $v_2^T S_1 v^o_{\lambda_1} = 0$ and $v_2^T S_2 v^o_{\lambda_1} = (v_2^T S_1 v^o_{\lambda_1})/\lambda_1 = 0$, then $\Delta c_{12}$ will be 0 and the adjustment of $v_2$ will be $\Delta v_2 = \Delta w_2 = S_1 v_2 - (v_2^T S_1 v_2) S_2 v_2$, which is also $S_1$-orthogonal to $v^o_{\lambda_1}$; i.e., $(\Delta v_2)^T S_1 v^o_{\lambda_1} = 0$. So $v_2 + \eta \Delta v_2$ will also be $S_1$-orthogonal to $v^o_{\lambda_1}$. This means that once $v_2$ is in the subspace which satisfies the decorrelation constraint, it remains in this subspace under the rule in (4.51). In this case, the adaptation of (4.51) becomes $\Delta v_2 = S_1 v_2 - (v_2^T S_1 v_2) S_2 v_2$ in the subspace $S_1$-orthogonal to $v^o_{\lambda_1}$, which is exactly the same as the case of the first projection except that the search takes place within the subspace orthogonal to $v^o_{\lambda_1}$. According to the result in 4.5.1, we know that the stationary points $v_2 = 0$ and $v_2 = v^o_{\lambda_i}$ ($i = 3, \ldots, m$) are all unstable even in the subspace. To show that $v_2 = v^o_{\lambda_2}$ is stable, we can study the overall process $\Delta v_2 = \Phi(v_2) = \Delta w_2 - \frac{\eta_c}{\eta_w} w_1 (\Delta c_{12})$. Its corresponding differential equation is

$$\frac{dv_2}{dt} = \Phi(v_2) = S_1 v_2 - (v_2^T S_1 v_2) S_2 v_2 - \frac{\eta_c}{\eta_w} w_1 w_1^T S_1 v_2 \qquad (4.52)$$

where $w_1 = v^o_{\lambda_1}$ remains unchanged after the convergence of the first projection.
The linearization matrix $A$ of (4.52) at $w_1 = v^o_{\lambda_1}$ and $v_2 = v^o_{\lambda_2}$ is

$$A = \frac{\partial \Phi}{\partial v_2}\Big|_{w_1 = v^o_{\lambda_1},\, v_2 = v^o_{\lambda_2}} = \Big[S_1 - \frac{\eta_c}{\eta_w} w_1 w_1^T S_1 - 2 S_2 v_2 v_2^T S_1 - (v_2^T S_1 v_2) S_2\Big]\Big|_{w_1 = v^o_{\lambda_1},\, v_2 = v^o_{\lambda_2}}$$
$$= S_1 - \frac{\eta_c}{\eta_w} v^o_{\lambda_1} (v^o_{\lambda_1})^T S_1 - 2\lambda_2 S_2 v^o_{\lambda_2} (v^o_{\lambda_2})^T S_2 - \lambda_2 S_2 \qquad (4.53)$$

For comparison, the corresponding linearization matrix $B$ of (4.35), the method in Chatterjee et al. [Cha97], can be similarly obtained:

$$B = S_1 - S_2 v^o_{\lambda_1} (v^o_{\lambda_1})^T S_1 - 2\lambda_2 S_2 v^o_{\lambda_2} (v^o_{\lambda_2})^T S_2 - \lambda_2 S_2 \qquad (4.54)$$

Figure 4-7. The distribution of the real parts of the eigenvalues of $A$ in 1000 trials for signals with dimension 10

Notice that $A$ is not symmetric, so the eigenvalues of $A$ may be complex. To show the stability of $v_2 = v^o_{\lambda_2}$ for (4.51), we need to show the negativeness of all the real parts of the eigenvalues of the matrix $A$. Although there is no rigorous proof that the real parts of all the eigenvalues of $A$ are negative (it is difficult to show this because $A$ is not symmetric), Monte Carlo trials show the negativeness as long as the step size $\eta_c$ is large enough relative to the step size $\eta_w$. Figure 4-7 shows the results of 1000 trials for randomly generated signals with dimension 10 under the condition $\eta_c = \eta_w$. As can be seen from the figure, all the real parts of the eigenvalues of $A$ are negative. To compare the proposed method with the one in Chatterjee et al. [Cha97], the eigenvalues of the linearization matrix for that method (i.e., $B$ in (4.54)) are also calculated. The mean value of the real parts of the 10 eigenvalues of $A$ and of $B$ is calculated for each trial.
The mean values are displayed in Figure 4-8, from which we can see that most of the mean values for $A$ are even less than the corresponding mean values for $B$, which suggests that most of the real parts of the eigenvalues of $A$ are even less than those of $B$. This indicates that the dynamic process characterized by $dv_2/dt = A v_2$ will converge faster than the dynamic process characterized by $dv_2/dt = B v_2$, which may explain the observation that the proposed method usually has a faster convergence speed than the method in Chatterjee et al. [Cha97] in our simulations. Figure 4-9 further shows the difference of the means for $A$ and $B$, i.e., mean($A$) - mean($B$); all the values are negative, which means that the means of the real parts of the eigenvalues of $A$ are less than those of $B$.

Figure 4-8. The comparison of the mean of the real parts of the eigenvalues of $A$ and $B$ in the same trials as in Figure 4-7

Figure 4-9. The difference of the mean real parts of the eigenvalues of $A$ and $B$

C. The Local On-Line Adaptive Rule

To get an adaptive on-line algorithm, we can again use the exponential window to estimate the terms in (4.51). Thus, we have

$$\Delta c_{12}(n) = C(w_1, v_2, n)$$
$$\Delta w_2(n) = H_1(v_2, n) - H_2(v_2, n) f(v_2, n)$$
$$H_1(v_2, n) = H_1(v_2, n-1) + \alpha[y_{21}(n) x_1(n) - H_1(v_2, n-1)]$$
$$H_2(v_2, n) = H_2(v_2, n-1) + \alpha[y_{22}(n) x_2(n) - H_2(v_2, n-1)] \qquad (4.55)$$
$$f(v_2, n) = f(v_2, n-1) + \alpha[y_{21}(n)^2 - f(v_2, n-1)]$$
$$C(w_1, v_2, n) = C(w_1, v_2, n-1) + \alpha[y_{11}(n) y_{21}(n) - C(w_1, v_2, n-1)]$$

where $\alpha$ is a scalar between 0 and 1. The convergence of (4.55) can again be related to the solution of its corresponding deterministic ordinary differential equation, characterized by (4.51), through stochastic approximation theory [Dia96, Cha97].

The number of multiplications required by the proposed method for the first two projections at each time instant $n$ is $16m + 9$, versus $8m^2 + 8m$ required by the method in (4.35) of Chatterjee et al. [Cha97]. Simulation results also show convergence when instantaneous values are used for $H_1(v_2, n)$, $H_2(v_2, n)$ and $C(w_1, v_2, n)$; i.e.,

$$\Delta c_{12}(n) = C(w_1, v_2, n)$$
$$\Delta w_2(n) = H_1(v_2, n) - H_2(v_2, n) f(v_2, n)$$
$$H_1(v_2, n) = y_{21}(n) x_1(n)$$
$$H_2(v_2, n) = y_{22}(n) x_2(n) \qquad (4.56)$$
$$f(v_2, n) = f(v_2, n-1) + \alpha[y_{21}(n)^2 - f(v_2, n-1)]$$
$$C(w_1, v_2, n) = y_{11}(n) y_{21}(n)$$

4.6 Simulations

Two 3-dimensional zero-mean colored Gaussian signals are generated with 500 samples each. Table 4-1 compares the results of the numerical method with those of the proposed adaptive methods after 15000 on-line iterations. In Experiment 1, all the terms are estimated on-line by an exponential window as in (4.55), with $\alpha = 0.003$; in Experiment 2, $H_1$, $H_2$ and $C$ use instantaneous values as in (4.56), while $f(w_1)$ and $f(v_2)$ remain the same. As an example, Figure 4-10 (a) shows the adaptation process of Experiment 2. Figure 4-10 (b) compares the convergence speed of the proposed method and the method in Chatterjee et al. [Cha97] for the adaptation of $v_2$ in batch mode when $w_1 = v^o_{\lambda_1}$. There are 100 trials (each with the same initial condition). The vertical axis is the minimum number of iterations for convergence (with the best step size obtained by exhaustive search). Convergence is declared when the difference between $f(v_2)$ and $J(v_2)$ is less than 0.01 for 10 consecutive iterations. Figure 4-10 (c) and (d) respectively show a typical evolution of $J(v_2)$ and $C$ in one of the 100 trials, where the eigenvalues of the linearization matrices are $-28.3 + 6.7i$, $-28.3 - 6.7i$, $-1.5$ for $A$ of the proposed method, and $-21.5$, $-1.7$, $-0.4$ for $B$ of the method in Chatterjee et al. [Cha97]. Figure 4-11 shows the process of the batch mode rule in (4.51).
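For a concrete feel for the on-line behavior described in these simulations, here is a sketch of the instantaneous variant (4.50) for the first projection on synthetic data. This code and its signal models, step sizes, and names are our illustrations, not the dissertation's implementation; the Rayleigh ratio of the learned $w_1$ can be checked against the sample generalized eigenvalue.

```python
import numpy as np

def gen_eig_first_online(x1, x2, eta=0.002, alpha=0.02, epochs=150, seed=0):
    """Instantaneous on-line rule (4.50) for the first projection.

    x1, x2 are (N, m) zero-mean signals; H1 and H2 are the instantaneous
    Hebbian / anti-Hebbian terms, and f is the exponentially windowed
    output energy acting as the balancing scalar (f -> lambda_1).
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x1.shape[1]) * 0.5
    f = 0.0
    for _ in range(epochs):
        for a, b in zip(x1, x2):
            y11 = w @ a                    # output for signal x1
            y12 = w @ b                    # output for signal x2
            f += alpha * (y11 ** 2 - f)    # balancing scalar estimate
            w += eta * (y11 * a - f * y12 * b)
    return w, f
```

Every update uses only the current sample pair, the outputs, and the running scalar $f$, which is the locality property claimed above; no covariance matrix is ever formed.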
4.7 Conclusion and Discussion

In this chapter, the relationship between the Hebbian rule and the energy of the output of a linear transform, and the relationship between the anti-Hebbian rule and the cross-correlation of two outputs connected by a lateral inhibitive connection, are discussed. We can see that an energy quantity is based on the relative position of each sample with respect to the mean of all samples; thus each sample can be treated independently and an on-line adaptation rule is relatively easy to derive. The information potential and the cross information potential, by contrast, are based on the relative position of each pair of data samples, so an on-line adaptation rule for the information potential or the cross information potential is relatively difficult to obtain.

The information-theoretic formulation and the formulation based on energy quantities for the eigendecomposition and the generalized eigendecomposition are introduced. The energy-based formulation can be regarded as a special case of the information-theoretic formulation when the data are Gaussian distributed.

Based on the energy formulation for the eigendecomposition and the relationship between the energy criteria and the Hebbian and anti-Hebbian rules, we can understand Oja's rule, Sanger's rule and the APEX model in an intuitive and effective way. Starting from such an understanding, we propose a structure similar to the APEX model and an on-line local adaptive algorithm for the generalized eigendecomposition. The stability analysis of the proposed algorithm is given, and the simulations show the validity and the efficiency of the proposed algorithm. Based on the information-theoretic formulation, we can generalize the concept of the eigendecomposition and the generalized eigendecomposition by using the entropy difference in 4.2.1.
For non-Gaussian data and nonlinear mappings, the information potential can be used to implement the entropy difference, searching for an optimal mapping such that the output of the mapping conveys the most information about the first signal $x_1(n)$ while containing the least information about the second signal $x_2(n)$ at the same time. This can be regarded as a special case of "information filtering."

Table 4-1. Comparison of results. $J(v^o_{\lambda 1})$ and $J(v^o_{\lambda 2})$ are the generalized eigenvalues; $v^o_{\lambda 1}$ and $v^o_{\lambda 2}$ are the corresponding normalized eigenvectors.

                          Numerical Method   Experiment 1   Experiment 2
$J(v^o_{\lambda 1})$          45.9296570       45.9295867     45.9296253
$v^o_{\lambda 1}(1)$          -0.1546873       -0.1550365      0.1549409
$v^o_{\lambda 1}(2)$          -0.8400303       -0.8396349      0.8397703
$v^o_{\lambda 1}(3)$           0.5200200        0.5205544     -0.5203643
$J(v^o_{\lambda 2})$           6.1679926        6.1678943      6.1679234
$v^o_{\lambda 2}(1)$          -0.2162832       -0.2147684      0.2175495
$v^o_{\lambda 2}(2)$           0.9668235        0.9672048     -0.9664919
$v^o_{\lambda 2}(3)$           0.1359184        0.1356071     -0.1362553

Figure 4-10. (a) Evolution of $J(v_1)$ and $J(v_2)$ in Experiment 2. (b) Comparison of convergence speed in terms of the minimum number of iterations. (c) Typical adaptation curve of $J(v_2)$ for the two methods when the initial condition is the same and the best step size is used. (d) Typical adaptation curve of $C$ in the same trial as (c). In (b), (c) and (d), the solid lines represent the proposed method while the dashed lines represent the method in Chatterjee et al. [Cha97].

Figure 4-11.
The Evolution Process of the Batch Mode Rule

CHAPTER 5
APPLICATIONS

5.1 Aspect Angle Estimation for SAR Imagery

5.1.1 Problem Description

The relative direction of a vehicle with respect to the radar sensor in SAR (synthetic aperture radar) imagery is normally called the aspect angle of the observation, which is an important piece of information for vehicle recognition. Figure 5-1 shows typical SAR images of a tank or military personnel carrier at different aspect angles.

Figure 5-1. SAR Images of a Tank with Different Aspect Angles

We are given some training data (both SAR images and the corresponding true aspect angles). The problem is to estimate the aspect angle of the vehicle in a testing SAR image based on the information given in the training data. This is a very typical problem of "learning from examples." As can be seen from Figure 5-1, the poor resolution of SAR, combined with speckle and the variability of scattering centers, makes the determination of the aspect angle of a vehicle from its SAR image a nontrivial problem. All the data in the experiments are from the MSTAR public release database [Ved97].

5.1.2 Problem Formulation

Let us use $X$ to denote a SAR image. In the MSTAR database [Ved97], a target chip is usually 128-by-128, so $X$ can be regarded as a vector with dimension $128 \times 128 = 16384$. Alternatively, we can use just the center region of $80 \times 80 = 6400$ pixels, since a target is located at the center of each image in the MSTAR database. Let us use $A$ to denote the aspect angle of a target SAR image. Then the given training data set can be denoted by $\{(x_i, a_i) \mid i = 1, \ldots, N\}$ (the upper-case $X$ and $A$ represent random variables and the lower-case $x$ and $a$ represent their samples).
In general, for a given image x, the aspect angle estimation problem can be formulated as a maximum a posteriori probability (MAP) problem:

    â = argmax_a f_{A|X}(a|x) = argmax_a f_{AX}(x, a) / f_X(x) = argmax_a f_{AX}(x, a)    (5.1)

where â is the estimate of the true aspect angle, f_{A|X}(a|x) is the a posteriori probability density function (pdf) of the aspect angle A given X, f_X(x) is the pdf of the image X, and f_{AX}(x, a) is the joint pdf of the image X and the aspect angle A. The last equality holds because f_X(x) does not depend on a. So, the key issue here is to estimate the joint pdf f_{AX}(x, a). However, the very high dimensionality of the image variable X makes it very difficult to obtain a reliable estimation. Dimensionality reduction (or feature extraction) becomes necessary. An "information filter" y = q(x, w) (where w is the parameter set) is needed such that when an image x is the input, its output y conveys the most information about the aspect angle and discards all the other irrelevant information. Such an output is the feature for the aspect angle. Based on this feature variable Y, the aspect angle estimation problem can be reformulated by the same MAP strategy:

    â = argmax_a f_{AY}(y, a),    y = q(x, w)    (5.2)

where f_{AY}(y, a) is the joint pdf of the feature Y and the aspect angle A. The crucial point for this aspect angle estimation scheme is how good the feature Y turns out to be. Actually, the problem of reliable pdf estimation in a high dimensional space is now converted to the problem of building a reliable aspect angle "information filter" based only on the given training data set. To achieve this goal, the mutual information is used, and the problem of finding an optimal "information filter" can be formulated as

    w_optimal = argmax_w I(Y = q(X, w), A)    (5.3)

that is, to find the optimal parameter set w_optimal such that the mutual information between the feature Y and the angle A is maximized.
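The chain of equalities in (5.1) can be checked numerically: dividing the joint density by f_X(x) rescales every candidate angle by the same constant, so the argmax is unchanged. A toy sketch over a discretized angle grid (the density values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
angles = np.linspace(0.0, 180.0, 64, endpoint=False)   # candidate aspect angles

# Toy values of the joint density f_AX(x, a) over the angle grid, for one
# fixed image x (arbitrary positive numbers; only relative size matters).
f_joint = rng.random(64) + 1e-3

# f_X(x) is the marginal over a: a constant with respect to the candidate a.
f_x = f_joint.sum() * (180.0 / 64)           # Riemann sum over the grid

posterior = f_joint / f_x                    # f_{A|X}(a|x) = f_AX(x, a) / f_X(x)

# Both criteria in (5.1) pick the same angle:
assert angles[np.argmax(f_joint)] == angles[np.argmax(posterior)]
```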
To implement this idea, the quadratic mutual information based on the Euclidean distance, I_ED, and its corresponding cross information potential, V_ED, between the feature Y and the angle A will be used. There will be no assumption made on either the data or the "information filter." The only thing used here will be the training data set itself. In the experiments, it is found that a linear mapping with two outputs is good enough for the aspect angle information filter (Y = (Y1, Y2)^T). The system diagram is shown below.

Figure 5-2. System Diagram for Aspect Angle Information Filter

One may notice that the joint pdf f_{AY}(y, a) is the natural "by-product" of this scheme. Recall that the cross information potential is based on the Parzen window estimation of the joint pdf f_{AY}(y, a). So, there is no need to further estimate the joint pdf f_{AY}(y, a) by any other method. Since the angle variable A is a periodic one (e.g., 0 should be the same as 360), all the angles are put on the unit circle; i.e., the following transformation is used:

    A1 = cos(A),  A2 = sin(A)    (5.4)

So, the actual angle variable used is A = (A1, A2), a two dimensional variable.

In the experiments, it is also found that the discrimination between two angles with a 180 degree difference is very difficult. Actually, it can be seen from Figure 5-1 that it is difficult to tell where the front is and where the back of a vehicle is, although the overall direction of the vehicle is clear to our eyes. Most of the experiments are just to estimate the angle within 180 degrees, e.g., 240 degrees will be treated as 240 - 180 = 60 degrees. Actually, the following transformation is used in this case:

    A1 = cos(2A),  A2 = sin(2A)    (5.5)

In this case the actual angle variable is again A = (A1, A2). Correspondingly, the estimated angles will be divided by 2.

Since the joint pdf is estimated as

    f_{AY}(y, a) = (1/N) Σ_{i=1}^{N} G(y - y_i, σ_y²) G(a - a_i, σ_a²)

where σ_y² is the variance for the Gaussian kernel for the feature Y, σ_a² is the variance for the Gaussian kernel for the actual angle A, and all the angle data a_i are on the unit circle, the search for the optimal angle â = argmax_a f_{AY}(y, a), y = q(x, w), can be implemented by scanning the unit circle in the (A1, A2) plane. Then the real estimated angle can be â/2 for the case where the 180 degree difference is ignored.

5.1.3 Experiments of Aspect Angle Estimation

There are three classes of vehicles with some different configurations. In total, there are 7 different vehicle types: BMP2_C21, BMP2_9563, BMP2_9566, BTR70_C71, T72_132, T72_812, T72_S7.

To use the ED-CIP to implement the mutual information, the kernel sizes σ_y² and σ_a² have to be chosen, but the performance is not sensitive to them. The typical values are σ_y² = 0.1 and σ_a² = 0.1. There will be no big performance difference if σ_y² = 0.01 or σ_y² = 1.0 or σ_a² = 1.0 is used. The step size is usually around 1.5 x 10^-8. It can be adjusted according to the training process.

Figure 5-3. Training: BMP2_C21 (0-180 degree); Testing: BMP2_C21 (0-180 degree). Error Mean: 3.45 (degree); Error Deviation: 2.58 (degree). Left graph: output data (angle feature) distribution (diamond: training data; triangle: testing data). Right graph: estimated angle and true value (solid line).

Figure 5-3 shows a typical result. The training data are chosen from BMP2_C21 within the angle range from 0 to 180 degrees, 53 images in total, with their corresponding angles approximately 3.5 degrees apart between each neighboring angle pair. The testing data are from the same vehicle in the same 0-180 degree range but not included in the training data set. The left graph shows the output data distribution for both training and testing data. It can be seen that the training data form a circle, the best way to represent angles. The testing images are first fed into the information filter to obtain the features. The triangles in the left graph of Figure 5-3 indicate these features.
The aspect angles are then estimated according to the method described above. The right graph shows the comparison between the estimated angles (the dots indicated by x) and the true values (solid line) (the testing images are sorted according to their true aspect angles).

Figure 5-4. Training: BMP2_C21 (0-360 degree); Testing: BMP2_C21 (0-360 degree). Error Mean: 12.40 (degree); Error Deviation: 20.56 (degree). Left graph: output data (angle feature) distribution (diamond: training data; triangle: testing data). Right graph: estimated angle and true value (solid line).

Figure 5-5. Training: BMP2_C21 (0-180 degree); Testing: T72_S7 (0-360 degree). Error Mean: 6.18 (degree); Error Deviation: 5.19 (degree). Left graph: output data (angle feature) distribution (diamond: training data; triangle: testing data). Right graph: estimated angle and true value (solid line; 180 degree difference is ignored).

Figure 5-4 shows the result of training on the same BMP2_C21 vehicle but with the angle range from 0 to 360 degrees. Testing is done on the same BMP2_C21 within the same angle range (0 to 360), but all the testing data are not included in the training data set. As can be seen, the results become worse due to the difficulty of telling the difference between two images with a 180 degree angle difference. The figure also shows that the major errors occur when the 180 degree difference cannot be correctly recognized (the big errors in the figure are about 180 degrees). Figure 5-5 shows the result of training on the personnel carrier BMP2_C21 within the range of 180 degrees but testing on the tank T72_S7 within the same range (0-180 degrees). The tank is quite different from the personnel carrier because the tank has a cannon but the carrier has not. The good result indicates the robustness and the good generalization ability of the method. The following two experiments will further give us an overall idea of the performance of the method, and they further confirm its robustness and good generalization ability.
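The encoding (5.4)/(5.5) and the unit-circle scan used in these experiments can be sketched together. The learned feature map q(x, w) is replaced here by a made-up stand-in that places features directly on the circle, and the kernel sizes are the reported σ_y² = σ_a² = 0.1; everything else follows the scan-and-divide-by-2 recipe described above:

```python
import numpy as np

def encode(a_deg, k=2.0):
    """(5.5): map an angle onto the unit circle; with k = 2, angles 180
    degrees apart coincide (k = 1 gives the plain encoding (5.4))."""
    t = np.deg2rad(k * np.asarray(a_deg, dtype=float))
    return np.stack([np.cos(t), np.sin(t)], axis=-1)

def estimate_angle(y, Y_train, A_train, var_y=0.1, var_a=0.1, k=2.0):
    """Scan the unit circle in the (A1, A2) plane, maximize the Parzen
    estimate of the joint pdf f_AY(y, a), and return the angle in degrees."""
    scan = np.arange(0.0, 360.0, 0.5)                   # candidate values of k*A
    C = np.stack([np.cos(np.deg2rad(scan)), np.sin(np.deg2rad(scan))], axis=1)
    ky = np.exp(-((Y_train - y) ** 2).sum(1) / (2 * var_y))   # feature kernels
    d2 = ((C[:, None, :] - A_train[None, :, :]) ** 2).sum(-1)
    f = (np.exp(-d2 / (2 * var_a)) * ky[None, :]).mean(1)     # Parzen joint pdf
    return scan[np.argmax(f)] / k           # divide by 2 to undo the doubling

a_train = np.arange(0.0, 180.0, 3.5)        # training angles, ~3.5 degrees apart
A_train = encode(a_train)                   # training angles on the unit circle
Y_train = A_train.copy()                    # stand-in for the learned features

y_test = encode(100.0)
a_hat = estimate_angle(y_test, Y_train, A_train)
assert abs(a_hat - 100.0) < 3.5             # within the training angle spacing
```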
Inspired by these results, we also apply the traditional MSE criterion, putting the desired angles on the unit circle in the same way as above. The results are shown below, from which we can see that both methods have comparable performance, but the ED-CIP method converges faster than the MSE method.

In Experiment 1, the training is based on 53 images from BMP2_C21 within the range of 180 degrees. The results are shown in Table 5-1. The testing set "bmp2_c21_t1" means the vehicle bmp2_c21 within the range of 0-180 degrees but not included in the training data set; the set "bmp2_c21_t2" means the vehicle bmp2_c21 within the range of 180-360 degrees but with the 180 degree difference ignored in the estimation; the set "t72_132_tr" means the vehicle t72_132 which will be used for training in Experiment 2; the set "t72_132_te" means the vehicle t72_132 but not included in the set "t72_132_tr."

Table 5-1. The Result of Experiment 1. Training on bmp2_c21_tr (53 images) (0-180).

| Vehicle | Results (ED-CIP) error mean (error deviation) | Results (MSE) error mean (error deviation) |
|---|---|---|
| bmp2_c21_tr | 0.54 (0.40) | 1.05e-5 (8.293e-6) |
| bmp2_c21_t1 | 2.76 (2.37) | 2.48 (2.12) |
| bmp2_c21_t2 | 2.63 (2.10) | 2.79 (2.43) |
| t72_132_tr | 7.12 (5.36) | 7.42 (5.12) |
| t72_132_te | 4.75 (3.21) | 4.09 (3.02) |
| bmp2_9563 | 4.25 (3.62) | 3.77 (3.16) |
| bmp2_9566 | 3.81 (3.16) | 3.60 (2.97) |
| btr70_c71 | 3.18 (2.84) | 2.88 (2.47) |
| t72_s7 | 6.65 (5.04) | 6.95 (5.27) |

Table 5-2. The Result of Experiment 2. Training on bmp2_c21_tr and t72_132_tr.
(0-180)

| Vehicle | Results (ED-CIP) error mean (error deviation) | Results (MSE) error mean (error deviation) |
|---|---|---|
| bmp2_c21_tr | 1.99 (1.52) | 0.18 (0.14) |
| bmp2_c21_te | 2.96 (2.41) | 0.18 (0.11) |
| t72_132_tr | 1.97 (1.48) | 0.17 (0.13) |
| t72_132_te | 3.01 (2.66) | 0.17 (0.13) |
| bmp2_9563 | 2.97 (2.35) | 2.54 (1.90) |
| bmp2_9566 | 3.32 (2.44) | 2.80 (2.19) |
| btr70_c71 | 2.80 (2.33) | 2.42 (1.83) |
| t72_s7 | 3.80 (2.57) | 3.38 (2.40) |

In Experiment 2, training is based on the data set "bmp2_c21_tr" and the data set "t72_132_tr." The experimental results are shown in Table 5-2, from which we can see the improvement of the performance when more vehicles and more data are included in the training process.

More experimental results can be found in the paper [XuD98] and the reports of the DARPA project on Image Understanding (the reports can be found at the web site "http://www.cnel.ufl.edu/~atr/"). From the experimental results, we can see that the error mean is around 3 degrees. This is reasonable because the angles of the training data are approximately 3 degrees apart between neighboring angles.

Figure 5-6. Occlusion Test with Background Noise. Left graph: output data (angle feature) distribution (diamond: training data; triangle: testing data). Right graph: estimated angle and true value (solid line). The images corresponding to (a), (b), (c), (d), (e) and (f) are shown in Figure 5-7.

Figure 5-7. The occluded images corresponding to the points in Figure 5-6.

5.1.4 Occlusion Test on Aspect Angle Estimation

To further test the robustness and the generalization ability of the method, occlusion tests are conducted, where the testing input SAR images are contaminated by background noise or the vehicle image is occluded by the SAR image of trees. Figure 5-6 shows the result of the "occlusion test," where a squared window with background noise enlarges gradually until the whole image is occluded and replaced by the background noise, as shown in Figure 5-6 and Figure 5-7.
Figure 5-7 shows the occluded images corresponding to the points in Figure 5-6. We can see that even when most of the target is occluded, the estimation is still good, which verifies the robustness and the generalization ability of the method. When the occluding square enlarges, the output point (feature point) moves away from the circle, but the direction is essentially perpendicular to the circle, which means the nearest point on the circle is essentially unchanged and the estimate of the angle basically remains the same.

Figure 5-8. SAR Image of Trees. The squared region was cut out for the occlusion purpose.

Figure 5-8 is a SAR image of trees. One region was cut out to occlude the target images, to see how robust the method is and how well it generalizes. As shown in Figure 5-10 and Figure 5-11, the cut region of trees is slid over the target image from the lower right corner to the upper left corner. The occlusion is made by averaging the overlapped target pixels and tree pixels. Figure 5-10 shows two particular occlusions; in the right one, most of the target is occluded but the estimation is still good. Figure 5-9 shows the overall results when sliding the occlusion square region. One may notice that the result gets better when the whole image is overlapped by the tree image. The explanation is that the occlusion is the average of both the target pixels and the tree pixels in this case, and the center region of the tree image has small pixel values while the center region of the target image has large pixel values; therefore, when the whole target image is overlapped by the tree image, the occlusion of the target (the center region of the target image) becomes even lighter.

Figure 5-9. Occlusion Test with SAR Image of Trees. Left graph: output data (angle feature) distribution (diamond: training data; triangle: testing data). Right graph: estimated angle and true value (solid line).
The images corresponding to the points (a) and (b) are shown in Figure 5-10. The images corresponding to the points (c) and (d) are shown in Figure 5-11.

Figure 5-10. Occlusion with SAR Image of Trees. Output data distribution (diamond: training data; triangle: testing data). Upper images are occluded images; lower images show the occluded regions. The true angle is 101.19 degrees; the estimated angles are 100.6 and 105.2 degrees.

Figure 5-11. Occlusion with SAR Image of Trees. Output data distribution (diamond: training data; triangle: testing data). Upper images are occluded images; lower images show the occluded regions. The true angle is 101.19 degrees; the estimated angles are 160.6 and 99.6 degrees.

5.2 Automatic Target Recognition (ATR)

In this section, we will see how important the mutual information is for the performance of pattern recognition, and how the cross information potential can be applied to automatic target recognition in SAR imagery. First, let's look at the lower bound of recognition error specified by Fano's inequality [Fis97]:

    P(ĉ ≠ c) ≥ (H_s(c|y) - 1) / log(O(c))    (5.6)

where c is a variable for the identity of classes, y is a feature variable based on which a classification will be conducted, O(c) denotes the number of classes, and H_s(c|y) is Shannon's conditional entropy of c given y. Fano's inequality means the classification error is lower bounded by a quantity which is determined by the conditional entropy of the class identity given the recognition feature y.
By a simple manipulation, we get

    P(ĉ ≠ c) ≥ (H_s(c) - I(c, y) - 1) / log(O(c))    (5.7)

which means that to minimize the lower bound of the error probability, the mutual information between the class identity c and the feature y should be maximized.

5.2.1 Problem Description and Formulation

Let's use X to denote the variable for target images, and C to denote the variable for the class identity. We are given a set of training images and their corresponding class identities {(x_i, c_i) | i = 1, ..., N}. A classifier needs to be established based only on this training data set such that, when given a target image x, it can classify the image. Again, the problem can be formulated as a MAP problem:

    ĉ = argmax_c P_{C|X}(c|x) = argmax_c f_{CX}(x, c)    (5.8)

where P_{C|X}(c|x) is the a posteriori probability of the class identity C given the image X, and f_{CX}(x, c) is the joint pdf of the image X and the class identity C. So, similarly, the key issue here is to estimate the joint pdf f_{CX}(x, c). However, the very high dimensionality of the image variable X makes it very difficult to obtain a reliable estimation. Dimensionality reduction (or feature extraction) is again necessary. An "information filter" y = q(x, w) (where w is the parameter set) is needed such that when an image x is its input, its output y conveys the most information about the class identity and discards all the other irrelevant information. Such an output is the feature for classification. Based on the classification feature y, the classification problem can be reformulated by the same MAP strategy:

    ĉ = argmax_c f_{CY}(y, c),    y = q(x, w)    (5.9)

where f_{CY}(y, c) is the joint pdf of the classification feature Y and the class identity C. Similar to the aspect angle estimation problem, the crucial point for this classification scheme is how good the classification feature Y is.
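Inequality (5.7) can be sanity-checked on a small discrete example: compute H_s(c), I(c, y) and the error of the best (MAP) classifier for a toy joint distribution, and verify the error is no smaller than the bound. The toy channel below (3 classes, y a noisy copy of c) is made up; entropies are in bits, so log(O(c)) is taken base 2:

```python
import numpy as np

def H2(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Toy joint P(c, y): 3 equiprobable classes, y correct with probability 0.6.
P = np.full((3, 3), 0.2 / 3)          # off-diagonal mass: 0.2 per wrong symbol
np.fill_diagonal(P, 0.6 / 3)
assert np.isclose(P.sum(), 1.0)

Hc = H2(P.sum(axis=1))                            # H_s(c)
Hc_given_y = H2(P.flatten()) - H2(P.sum(axis=0))  # H(c, y) - H(y)
Icy = Hc - Hc_given_y                             # mutual information I(c, y)

fano_bound = (Hc - Icy - 1.0) / np.log2(3)        # right-hand side of (5.7)

# Bayes (MAP) error of the best classifier based on y:
p_error = 1.0 - P.max(axis=0).sum()

assert p_error >= fano_bound        # (5.7): error is at least the Fano bound
```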
Actually, the problem of reliable pdf estimation in a high dimensional space is now converted to the problem of building a reliable "information filter" for classification based only on the given training data set. To achieve this goal, the mutual information measure is used, as also suggested by Fano's inequality, and the problem of finding an optimal "information filter" can be formulated as

    w_optimal = argmax_w I(Y = q(X, w), C)    (5.10)

that is, to find the optimal parameter set w_optimal such that the mutual information between the classification feature Y and the class identity C is maximized. To implement this idea, the quadratic mutual information based on the Euclidean distance, I_ED, and its corresponding cross information potential, V_ED, will be used again. There will be no assumption made on either the data or the "information filter." The only thing used here will be the training data set itself. In the experiments, it is found that a linear mapping with 3 outputs for the 3 classes is good enough for the classification of such high dimensional images (80 by 80). The system diagram is shown in Figure 5-12.

Figure 5-12. System Diagram for Classification Information Filter

The joint pdf f_{CY}(y, c) is still the natural "by-product" of this scheme. Actually, the cross information potential is based on the Parzen window estimation of the joint pdf

    f_{CY}(y, c) = (1/N) Σ_{i=1}^{N} G(y - y_i, σ_y²) δ(c - c_i)    (5.11)

where σ_y² is the variance of the Gaussian kernel function for the feature variable y, and δ(c - c_i) is the Kronecker delta function; i.e.,

    δ(c - c_i) = 1 if c = c_i, 0 otherwise    (5.12)

So, there is no need to estimate the joint pdf f_{CY}(y, c) again by any other method. The ED-QMI information force in this particular case can be interpreted as repulsion among the "information particles" (IPTs) with different class identities, and attraction among the IPTs within the same class.
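A minimal sketch of (5.11) and (5.12), together with the classification by comparison described next: the Gaussian kernel acts on the continuous feature, the Kronecker delta restricts the sum to same-class samples, and the class value with the largest f_CY(y, c) wins. The 2-D cluster features below are made up, standing in for the information filter outputs:

```python
import numpy as np

def f_CY(y, Y_train, c_train, c, var_y=0.1):
    """Parzen estimate (5.11): Gaussian kernel on the feature, Kronecker
    delta on the class identity (only same-class samples contribute)."""
    mask = (c_train == c)                              # delta(c - c_i) of (5.12)
    d2 = ((Y_train[mask] - y) ** 2).sum(axis=1)
    norm = 2.0 * np.pi * var_y                         # 2-D Gaussian normalizer
    return np.exp(-d2 / (2.0 * var_y)).sum() / (norm * len(c_train))

def classify(y, Y_train, c_train):
    """Bayes classifier: compare f_CY(y, c) over the discrete class values."""
    classes = np.unique(c_train)
    return classes[np.argmax([f_CY(y, Y_train, c_train, c) for c in classes])]

# Toy 2-D features: three well-separated clusters, one per vehicle class.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
Y_train = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])
c_train = np.repeat([0, 1, 2], 20)

assert classify(np.array([1.9, 0.1]), Y_train, c_train) == 1
assert classify(np.array([0.1, 2.1]), Y_train, c_train) == 2
```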
Based on the joint pdf f_{CY}(y, c), the Bayes classifier can be built:

    ĉ = argmax_c f_{CY}(y, c),    y = q(x, w)    (5.13)

Since the class identity variable C is discrete, the search for the maximum in (5.13) can be simply implemented by comparing each value of f_{CY}(y, c).

5.2.2 Experiment and Result

The experiment is conducted on the MSTAR database [Ved97]. There are three classes (vehicles): BMP2, BTR70 and T72. For each one, there are some different configurations (sub-classes), as shown below. There are also 2 types of confusers.

- BMP2: BMP2_C21, BMP2_9563, BMP2_9566.
- BTR70: BTR70_C71.
- T72: T72_132, T72_S7, T72_812.
- Confusers: 2S1, D7.

The training data set is composed of 3 types of vehicle: BMP2_C21, BTR70_C71 and T72_132, with a depression angle of 17 degrees. All the testing data have a 15 degree depression angle. The classifier is built within the range of 0-30 degrees aspect angle. The final goal is to combine the result of aspect angle estimation with the target recognition such that, with the aspect angle information, the difficult overall recognition task (with all aspect angles) can be divided and conquered. Since a SAR image of a target is based on the reflection of the target, different aspect angles may result in quite different characteristics in SAR imagery. So, organizing classifiers according to aspect angle information is a good strategy.

Figure 5-13 shows the images for training. The classification feature extractor has three outputs. For illustration purposes, 2 outputs are used in Figure 5-14, Figure 5-15 and Figure 5-16 to show the output data distribution. Figure 5-14 shows the initial state with the 3 classes mixed up. Figure 5-15 shows the result after several iterations, where the classes are starting to separate. Figure 5-16 shows the output data distribution at the final stage of the training, where the 3 classes are clearly separated and each class tends to shrink to one point.

Figure 5-13.
The SAR Images of Three Vehicles for Training the Classifier (0-30 degree)

Figure 5-14. Initial Output Data Distribution for Classification. Left graph: lines are an illustration of "information forces;" right graph: detailed distribution.

Figure 5-15. Intermediate Output Data Distribution for Classification. Left graph: lines are an illustration of "information forces;" right graph: detailed distribution.

Figure 5-16. Output Data Distribution at the Final Stage for Classification. Left graph: lines are an illustration of "information forces;" right graph: detailed distribution.

Table 5-3. Confusion Matrix for Classification by ED-CIP

| | BMP2 | BTR70 | T72 |
|---|---|---|---|
| BMP2_C21 | 18 | 0 | 0 |
| BMP2_9563 | 11 | 0 | 0 |
| BMP2_9566 | 15 | 0 | 0 |
| BTR70_C71 | 0 | 17 | 0 |
| T72_132 | 0 | 0 | 18 |
| T72_812 | 0 | 2 | 9 |
| T72_S7 | 0 | 0 | 15 |

Table 5-3 shows the classification result. With a limited number of training data, the classifier still shows a very good generalization ability. By setting a threshold to allow 10% rejection, a detection test is further conducted on all these data and the data for the two confusers. A good result is shown in Table 5-4. The results in Table 5-3 and Table 5-4 are obtained by using the kernel size σ² = 0.1 and the step size 5.0 x 10^-5. As a comparison, Table 5-5 and Table 5-6 give the corresponding results of the support vector machine (more detailed results are presented in the 1998 Image Understanding Workshop [Pri98]), from which we can see that the classification result of ED-CIP is even better than that of the support vector machine.

Table 5-4. Confusion Matrix for Detection (with detection probability = 0.9) (ED-CIP)

| | BMP2 | BTR70 | T72 | Reject |
|---|---|---|---|---|
| BMP2_C21 | 18 | 0 | 0 | 0 |
| BMP2_9563 | 11 | 0 | 0 | 2 |
| BMP2_9566 | 15 | 0 | 0 | 2 |
| BTR70_C71 | 0 | 17 | 0 | 0 |
| T72_132 | 0 | 0 | 18 | 0 |
| T72_812 | 0 | 2 | 9 | 7 |
| T72_S7 | 0 | 0 | 15 | 0 |
| 2S1 | 0 | 3 | 0 | 24 |
| D7 | 0 | 1 | 0 | 14 |

Table 5-5.
Confusion Matrix for Classification by Support Vector Machine (SVM)

| | BMP2 | BTR70 | T72 |
|---|---|---|---|
| BMP2_C21 | 18 | 0 | 0 |
| BMP2_9563 | 11 | 0 | 0 |
| BMP2_9566 | 15 | 0 | 0 |
| BTR70_C71 | 0 | 17 | 0 |
| T72_132 | 0 | 0 | 18 |
| T72_812 | 5 | 2 | 4 |
| T72_S7 | 0 | 0 | 15 |

Table 5-6. Confusion Matrix for Detection (with detection probability = 0.9) (SVM)

| | BMP2 | BTR70 | T72 | Reject |
|---|---|---|---|---|
| BMP2_C21 | 18 | 0 | 0 | 0 |
| BMP2_9563 | 11 | 0 | 0 | 2 |
| BMP2_9566 | 15 | 0 | 0 | 2 |
| BTR70_C71 | 0 | 17 | 0 | 0 |
| T72_132 | 0 | 0 | 18 | 0 |
| T72_812 | 0 | 1 | 2 | 8 |
| T72_S7 | 0 | 0 | 12 | 3 |
| 2S1 | 0 | 0 | 0 | 27 |
| D7 | 0 | 0 | 0 | 16 |

5.3 Training the MLP Layer-by-Layer with CIP

During the first neural network era, which ended in the 1970s, there was only Rosenblatt's algorithm [Ros58, Ros62] to train the one layer perceptron, and there was no known algorithm to train MLPs. However, the much higher computational power of the MLP compared with the perceptron was recognized in that period [Min69]. In the late 1980s, the back-propagation algorithm was introduced to train MLPs, contributing to the revival of neural computation. Ever since, the back-propagation algorithm has been utilized almost exclusively to train MLPs, to the point that some researchers even confuse the network topology with the training algorithm by calling MLPs "back-propagation networks." It has been widely accepted that training the hidden layers requires backpropagation of errors from the output layers.

As pointed out in Chapter 3, Linsker's InfoMax can be further extended to a more general case. The MLP network can be regarded as a communication channel or "information filter" at each layer. The goal of the training of such a network is to transmit as much information about the desired signal as possible at the output of each layer. As shown in (3.16), this can be implemented by maximizing the mutual information between the output of each layer and the desired signal. Notice that we are not using the back-propagation of errors across layers. The network is incrementally trained in a strictly feedforward way, from the input layer to the output layer.
This may seem impossible since we are not using the information of the top layer to train the input layer. Training in this way simply guarantees that the maximum possible information about the desired signal is transferred from the input layer to each layer. The cross information potential can give an explicit immediate response to each network layer without the need to backpropagate from the output layer.

To test the method, the "frequency doubler" problem is selected, which is representative of nonlinear temporal processing. The input signal is a sinewave and the desired output signal is still a sinewave but with the frequency doubled (as shown in Figure 5-17). A focused TDNN with one hidden layer is used. There is one input node with 5 delay taps, two nodes in the hidden layer with the tanh nonlinear function, and one linear output node (as shown in Figure 5-17). The ED-QMI or ED-CIP is used for training. The hidden layer is trained first, followed by the output layer. The training curves are shown in Figure 5-18. The outputs of the hidden nodes and the output node after training are shown in Figure 5-19, which tells us that the frequency of the final output is doubled. The kernel sizes for the training of both the hidden layer and the output layer are σ² = 0.01 for the output of each layer and σ_d² = 0.01 for the desired signal.
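The focused TDNN just described (one input with 5 delay taps, two tanh hidden nodes, one linear output) can be sketched as a forward pass. The weights are random placeholders rather than CIP-trained ones, and taking the delay line to hold the current sample plus five delayed ones is an assumption about the exact tap count:

```python
import numpy as np

def tdnn_forward(x, W1, b1, w2, b2, taps=6):
    """Focused TDNN: a tap-delay line on the input feeds a tanh hidden
    layer; a single linear node forms the output."""
    n = len(x) - taps + 1
    # Each row is one delay-line state: [x(t), x(t-1), ..., x(t-taps+1)]
    X = np.stack([x[i:i + taps][::-1] for i in range(n)])
    H = np.tanh(X @ W1 + b1)          # two hidden nodes with tanh nonlinearity
    return H @ w2 + b2                # one linear output node

rng = np.random.default_rng(2)
W1 = 0.5 * rng.standard_normal((6, 2))   # input taps -> 2 hidden nodes
b1 = np.zeros(2)
w2 = rng.standard_normal(2)              # hidden nodes -> linear output
b2 = 0.0

t = np.arange(200)
x = np.sin(2.0 * np.pi * t / 40.0)       # input sinewave (period 40 samples)
y = tdnn_forward(x, W1, b1, w2, b2)
assert y.shape == (195,)                 # one output per full delay-line state
```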
The output layer simply selects what is needed. These results, on the other hand, further confirm the validÂ¬ ity of the CIP method proposed. From the training curves, we can see the sharp increases in CIP which suggest that the step size should be varied and adapted during the training process. How to choose the kerÂ¬ nel size of Gaussian function in CIP method is still an open problem. For these results, it is determined experimentally. Figure 5-17. TDNN as a Frequency Doubler 163 Figure 5-18. Training Curve. CIP vs. Iterations First Hidden Node Second Hidden Node Plot the output of two hidden nodes together The output of the network Figure 5-19. The output of the nodes after training 164 5.4 Blind Source Separation and Independent Component Analysis 5.4.1 Problem Description and Formulation Blind source separation is a specific case of ICA. The observed data X - AS is a linÂ¬ ear mixture (A e Rmxm is non-singular) of independent source signals (S = (5,, SÂ¿ independent with each other). There is no further information about the sources and the mixing matrix. This is why it is called â€œblind.â€ The problem is to find a projection W e Rmxm, Y = WX so that Y = S up to a permutation and scaling. Comon [Com94] and Cao and Liu [Cao96] among others have already shown that this result will be obtained for a linear mixture when the outputs are independent of each other. Based on the IP or CIP criteria, the problem can be re-stated as finding a projection W g Rmxm, Y = WX so that the IP is minimized (maximum quadratic entropy) or CIP is minimized (minimum QMI). The system diagram is shown in Figure 5-20. The different cases will be discussed in the following sections. Back-Propagation Figure 5-20. The System Diagram for BSS with IP or CIP 165 5.4.2 Blind Source Separation with CS-QMIÃCS-CIPI As introduced in Chapter 2, CS-QMI can be used as an independence measure. 
Its corresponding cross information potential, CS-CIP, will be used here for blind source separation. For ease of illustration, only the 2-source-2-sensor problem is tested. There are two experiments presented here.

Figure 5-21. Data Distribution for Experiment 1 (source, mixed signal, recovered)

Figure 5-22. Training Curve for Experiment 1. SNR (dB) vs. iterations

Experiment 1 tests the performance of the method on a very sparse data set. Two different colored Gaussian noise segments are used as sources, with 30 data points for each segment. The data distributions for the source signals, mixed signals and recovered signals are plotted in Figure 5-21. Figure 5-22 is the training curve, which shows how the SNR of the de-mixing-mixing product matrix (WA) changes with iterations (the SNR approaches 36.73 dB). Both figures show that the method works well.

Figure 5-23. Two Speech Signals from the TIMIT Database as the Two Source Signals

Figure 5-24. Training Curve for Speech Signals. SNR (dB) vs. iterations

Experiment 2 uses two speech signals from the TIMIT database as source signals (shown in Figure 5-23). The mixing matrix is [1, 3.5; 0.8, 2.6], where the two mixing directions [1, 3.5] and [0.8, 2.6] are similar. Whitening is first done on the mixed signals. An on-line implementation is tried in this experiment, in which a short-time window slides over the speech data. In each window position, the speech data within the window are used to calculate the CS-CIP, the related forces and the back-propagated forces to adjust the de-mixing matrix. As the window slides, all the speech data contribute to the de-mixing and the contributions are accumulated. The training curve (SNR vs. sliding index; the SNR approaches 49.15 dB) is shown in Figure 5-24, which tells us that the method converges fast and works very well. We can even say that it can track a slow change of the mixing.
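The SNR of the de-mixing-mixing product WA reported in these training curves can be made concrete. The sketch below mixes two toy sources with the matrix [1, 3.5; 0.8, 2.6] used above, shows that any W for which WA is a permutation times a diagonal recovers the sources (the inherent permutation/scaling ambiguity), and scores separation by a per-row dominant-entry SNR of WA. The exact SNR definition is not spelled out in the text, so the one here is an assumption:

```python
import numpy as np

def demix_snr_db(W, A):
    """SNR (dB) of the product P = WA: per row, the dominant entry's energy
    over the residual energy, averaged in dB across rows. A perfect separator
    makes P a permutation times a diagonal, so the residual is ~zero."""
    E = (W @ A) ** 2
    peak = E.max(axis=1)
    return float(np.mean(10.0 * np.log10(peak / (E.sum(axis=1) - peak))))

rng = np.random.default_rng(3)
S = rng.laplace(size=(2, 1000))          # two independent toy sources
A = np.array([[1.0, 3.5], [0.8, 2.6]])   # mixing matrix from the experiment
X = A @ S                                # observed mixtures

# Any permutation/diagonal times A^-1 separates: here the rows come out
# swapped and rescaled, illustrating "Y = S up to a permutation and scaling."
W = np.array([[0.0, 1.0], [1.0, 0.0]]) @ np.diag([2.0, -0.5]) @ np.linalg.inv(A)
Y = W @ X
assert np.allclose(Y[0], -0.5 * S[1]) and np.allclose(Y[1], 2.0 * S[0])

# The separator scores far higher than a do-nothing demixer:
assert demix_snr_db(W, A) > 100.0 > demix_snr_db(np.eye(2), A)
```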
Although whitening is done before the CIP method, we believe that the whitening process can also be incorporated into this method. ED-QMI (ED-CIP) can also be used, and similar results have been obtained. For blind source separation, the result is not sensitive to the kernel size of the cross information potential. A very large range of kernel sizes will work, e.g., from 0.01 to 100.

5.4.3 Blind Source Separation by Maximizing Quadratic Entropy

Bell and Sejnowski [Bel95] have shown that a linear network with a nonlinear function at each output node can separate a linear mixture of independent signals by maximizing the output entropy. Here, the quadratic entropy and the corresponding information potential will be used to implement the maximum entropy idea for BSS. Again, for ease of exposition, only the 2-source-2-sensor problem is tested. The source signals are the same speech signals from the TIMIT database as above. The mixing matrix is [1 0.8; 3.5 2.78], near singular. It becomes [-0.5248 0.5273; 0.5876 0.467] after whitening, which is near orthogonal. The signal scattering plots are shown in Figure 5-25 for both the source and mixed signals.

Two narrow line-shaped distribution areas can be visually spotted in Figure 5-25, which correspond to the mixing directions. Usually, if such lines are clear, the BSS will be relatively easy. To test the IP method, a "bad" segment with only 600 samples is chosen, where no obvious line-shaped narrow distribution area can be seen (as shown in Figure 5-26). Figure 5-27 shows the mixed signals of this "bad" segment. All the experiments are done only on this "bad" segment.

The parameters used are the Gaussian kernel size σ², the initial step size s, and the decaying factor of the step size a; the step size decays according to s(n) = s(n-1)a, where n is the time index. Data points in the same "bad" segment are used for training.
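The whitening step used before these experiments can be sketched via the eigendecomposition of the sample covariance; after the transform the mixture covariance is the identity, so only a rotation remains to be found. A sketch using the near-singular mixing matrix quoted above:

```python
import numpy as np

def whiten(X):
    """Zero-mean, identity-covariance transform via the eigendecomposition
    of the sample covariance of X (rows = channels)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    C = np.cov(Xc)
    w, V = np.linalg.eigh(C)
    T = V @ np.diag(1.0 / np.sqrt(w)) @ V.T    # the matrix C^{-1/2}
    return T @ Xc

rng = np.random.default_rng(4)
S = rng.standard_normal((2, 5000))
A = np.array([[1.0, 0.8], [3.5, 2.78]])        # near-singular mixing matrix
Z = whiten(A @ S)

# After whitening the mixture covariance is the identity:
assert np.allclose(np.cov(Z), np.eye(2), atol=1e-8)
```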
All results are for iterations from 0 to 10000; 'tanh' functions are used in the output space.

Figure 5-25. Signal Scattering Plots (left: source signals; right: mixed signals after whitening)

Figure 5-26. A "bad" Segment of the Source Signals (lines indicate mixing directions)

Figure 5-27. The Mixed Signals for the "bad" Segment (after whitening)

Figure 5-28. Experiment Result, σ² = 0.01, s = 0.4, a = 0.9999. Output signals' scattering plot and training curve: demixing SNR (dB) vs. iterations for the desired demixing direction (DDD, approaching 27.0956 dB) and the actual demixing direction (ADD)

Figure 5-29. Experiment Result, σ² = 0.02, s = 0.4, a = 1.0. Output signals' scattering plot and training curve (DDD approaching 24.7210 dB)

Figure 5-30. Experiment Result, σ² = 0.02, s = 0.2, a = 1.0. Output signals' scattering plot and training curve (DDD approaching 24.6759 dB)

Figure 5-31. Experiment Result, σ² = 0.01, s = 1.0, a = 1.0. Output signals' scattering plot and training curve (DDD approaching 20.7904 dB)

5.4.4 Blind Source Separation with ED-QMI (ED-CIP) and the MiniMax Method

For simplicity of exposition and without changing the essence of the problem, we'll discuss only the case with 2 sources and 2 sensors. Figure 5.14 shows the mixing model, where only x1(t), x2(t) are observed. The source signals s1(t), s2(t) are statistically independent and unknown. The mixing directions M1 and M2 are different and also unknown. The problem is to find a demixing system (Figure 5.15) that recovers the source signals up to a permutation and scaling.
Equivalently, the problem is to find statistically orthogonal (independent) directions W1 and W2 rather than geometrically orthogonal (uncorrelated) directions as in PCA [Com94, Cao96, Car98a]. Nevertheless, geometrical orthogonality exists between demixing and mixing directions, e.g., either W1 ⊥ M1 or W1 ⊥ M2. Wu et al. [WuH98] have shown that even when there are more sources than sensors, i.e., when no statistically orthogonal demixing directions exist, the mixing directions can still be identified as long as there are some signal segments in which some sources are zero or near zero. Looking for the mixing directions is therefore more essential than searching for demixing directions, and the non-stationary nature of the sources plays an important role.

[x1(t); x2(t)] = [m11, m12; m21, m22][s1(t); s2(t)] = M1 s1(t) + M2 s2(t)   (5.14)

[y1(t); y2(t)] = [w11, w21; w12, w22][x1(t); x2(t)] = W1 x1(t) + W2 x2(t)   (5.15)

From Figure 5.14, if s2 is zero or near zero, the distribution of the observed signals in the (x1, x2) plane will lie along the direction of M1, forming a "narrow band" data distribution, which is good for finding the mixing direction M1. If s1 and s2 are comparable in energy, the mixing directions will be smeared, which is considered "bad." Figure 5-25 and Figure 5-26 give two opposite examples. Since there are "good" and "bad" data segments, we seek a technique to choose "good" segments while discarding "bad" ones. It should be pointed out that this issue is rarely addressed in the BSS literature. Most methods treat data equally and simply apply a criterion to achieve independence of the demixing system outputs. Minimizing ED-CIP can be used for this purpose. In addition, ED-CIP can be used to distinguish "good" segments from "bad" ones. Wu et al. [WuH98] utilize the non-stationarity of speech signals and the eigen-spread of different speech segments to choose "good" segments.
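The geometric picture behind Eqs. (5.14)-(5.15) can be checked numerically: a demixing row orthogonal to one mixing direction annihilates that source and recovers the other up to scale. A small sketch (Python; the variable names and random sources are illustrative assumptions of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(size=(2, 1000))           # independent sources s1(t), s2(t)
M = np.array([[1.0, 0.8],                # columns M1, M2 are the mixing directions
              [3.5, 2.6]])
x = M @ s                                # observations, as in Eq. (5.14)

M2 = M[:, 1]
w1 = np.array([-M2[1], M2[0]])           # w1 is orthogonal to M2
y1 = w1 @ x                              # = (w1 . M1) * s1(t): s2 is cancelled
print(np.corrcoef(y1, s[0])[0, 1])       # correlation with s1 is essentially 1
```

This is exactly the W1 ⊥ M2 case mentioned above; the recovered signal equals s1 up to the scale factor w1 · M1.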
However, how to decompose signals in the frequency domain to find "good" frequency bands remains obscure. It is well known that an instantaneous mixture has the same mixture in all frequency bands, while a convolutive mixture will in general have different mixtures in different frequency bands (therefore, BSS for a convolutive mixture is a much more difficult problem than BSS for an instantaneous mixture). For an instantaneous mixture, different frequency bands may reveal the same mixing direction. So it is necessary to find "good" frequency bands in which the mixing directions are easier to find. For a convolutive mixture, treating different frequency bands differently may also be important, but we'll only discuss the problem related to the instantaneous mixture here. Let h(t, η) denote the impulse response of an FIR filter with parameters η. Applying this filter to the observed signals, new observed signals are obtained:

[x1'(t); x2'(t)] = h(t, η) * [x1(t); x2(t)] = M1 (h * s1) + M2 (h * s2)   (5.16)

Obviously, the mixing directions remain unchanged. The problem here is how to choose η so that only one filtered source signal dominates the dynamic range, so that the corresponding mixing direction is clearly revealed. First, let's consider the case where the mixing matrix M = kR, with k a positive scalar and R a rotation transform (orthonormal matrix), and the mixing directions are near 45° or 135°. Obviously, when there is only one source, x1' and x2' are linearly dependent. So the necessary condition for judging a "good" segment is high dependence between x1' and x2'. But a more important question is whether high dependence between x1' and x2' can guarantee that there is only one dominating filtered source signal. The answer is yes.
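Eq. (5.16) says that filtering every sensor with the same FIR filter leaves the mixing directions untouched, because convolution commutes with an instantaneous mixture. A quick numerical check (Python; the filter taps and sizes are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.normal(size=(2, 500))                  # independent sources
M = np.array([[1.0, 0.8], [3.5, 2.6]])         # instantaneous mixing
x = M @ s

h = np.array([0.5, 0.3, 0.2])                  # one FIR filter for every sensor
x_f = np.array([np.convolve(xi, h) for xi in x])
s_f = np.array([np.convolve(si, h) for si in s])

# Filtering commutes with the mixing: h * (M s) = M (h * s),
# so the filtered observations still lie along M1 and M2.
print(np.allclose(x_f, M @ s_f))               # True
```

The same identity fails for a convolutive mixture, where each path has its own filter, which is why that case is much harder.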
On the one hand, since the source signals are independent, as long as the filter length is short enough (the frequency band wide enough), the filtered source signals will scatter either over a wide region or in a narrow one along the natural bases (otherwise, the source signals would not be independent). On the other hand, the mixing is a rotation of about 45 or 135 degrees (or equivalent), and a narrow band distribution along these directions implies high dependence between the two variables. So, if a narrow distribution appears in the (x1', x2') plane, it must be the result of only one dominating source signal. To maximize the dependence between x1' and x2' based on the data set {x'(η, i), i = 1, ..., N}, where η are the parameters of the filter and N is the number of filtered samples, ED-CIP can be used:

η_optimal = argmax_{||η|| = 1} V_ED({x'(η, i), i = 1, ..., N})   (5.17)

where ||η|| = 1 means the FIR filter is constrained to unit norm. One narrow distribution can be associated with only one mixing direction. Once a desired filter with parameters η1 and outputs {x1'} is obtained, the remaining problem is how to obtain the second, the third, etc., so that a narrow distribution associated with another mixing direction appears. One idea is to let the outputs of the new filter be highly dependent on each other and at the same time independent of all the outputs of the previous filters, e.g., η2_optimal = argmax_{||η2|| = 1} [ρ V_C(x2') - (1 - ρ) V_C(x2', x1')], where ρ is a weight that can change from 0 to 0.5 or to 1. After several "good" data sets {x1'}, ..., {xn'} are obtained, the demixing can be found by minimizing the ED-CIP of the outputs of the demixing on all chosen data sets:

yi = W xi',  i = 1, ..., n
W_optimal = argmin_W [V_C(y1) + ... + V_C(yn)]   (5.18)

This is why the method is called the "Mini-Max" method. If the mixing M is not a rotation, whitening can be done so that the mixing matrix becomes close to the kR form mentioned above.
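The dependence measure maximized in (5.17) can be sketched with the Euclidean-distance QMI between two scalar sample sets: a Parzen estimate of the squared distance between the joint density and the product of its marginals. The following Python sketch shows the structure of such an estimator (unnormalized kernels; names and details are our assumptions, not the dissertation's code):

```python
import numpy as np

def ed_qmi(x, y, sigma2=1.0):
    """Parzen estimate of integral (f_xy - f_x f_y)^2, up to a constant.

    gx[i, j] and gy[i, j] are Gaussian kernels of variance 2*sigma2 on
    the pairwise distances; the three averages are the joint, cross and
    marginal information potentials.  The value is nonnegative, zero
    when the sample joint factorizes, larger for stronger dependence.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    gx = np.exp(-(x[:, None] - x[None, :]) ** 2 / (4.0 * sigma2))
    gy = np.exp(-(y[:, None] - y[None, :]) ** 2 / (4.0 * sigma2))
    v_joint = np.mean(gx * gy)
    v_cross = np.mean(gx.mean(axis=1) * gy.mean(axis=1))
    v_marg = gx.mean() * gy.mean()
    return float(v_joint - 2.0 * v_cross + v_marg)

rng = np.random.default_rng(2)
a = rng.normal(size=200)
b = rng.normal(size=200)                 # drawn independently of a
print(ed_qmi(a, a) > ed_qmi(a, b))       # the dependent pair scores higher
```

Maximizing such a criterion over unit-norm filter parameters η, as in (5.17), can be done with any constrained optimizer; the sketch only shows the criterion itself.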
If the mixing directions (after whitening) are far from the 45- or 135-degree direction, a rotation transform can be further introduced before the filters. The parameters of the rotation are trained by the same criterion and will converge to a state where the overall mixing direction is near the 45- or 135-degree direction. So the procedure will be: 1) whitening; 2) training the parameters of a rotation transform; 3) training the parameters of the filters. Since mixing directions can be identified easily by narrow scattering (distribution), this method is also expected to enhance the demixing performance when the observation is corrupted by noise, i.e., x = M1 s1 + M2 s2 + Noise. The same "bad" segment and mixing matrix as in the previous section will be used here (shown in Figure 5-26). Whitening is done first, and the mixed signals after whitening are shown in Figure 5-27. White Gaussian noise (SNR = 0 dB) is added to the mixed signals, making an even worse segment (shown in Figure 5-32). From Figure 5-27, we can see that the mixing directions are difficult to find. The case in Figure 5-32 is even worse due to the noise.

Figure 5-32. The "bad" Segment in Figure 5-27 + Noise (SNR = 0 dB)

By directly minimizing the ED-CIP of the outputs of a demixing system, the results in Figure 5-33 are obtained, from which we can see the average demixing performance converges to 32.18 dB for the case without noise and 15.20 dB for the case with noise. Based only on the limited number of data points in the "bad" segment (the first 400 data points are used), the Mini-ED-CIP method still achieves good performance (compare these results with those of the IP method in the previous section). This further verifies the validity of ED-CIP.
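Step 1 of the procedure, whitening, can be sketched as multiplying the centered observations by the inverse square root of their sample covariance; afterwards the residual mixing is approximately a scaled rotation, the kR form assumed above. A minimal sketch (Python; the function and variable names are ours):

```python
import numpy as np

def whiten(x):
    """Whiten observations x (channels x samples): after this transform
    the sample covariance is the identity matrix."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T    # inverse matrix square root
    return W @ x

rng = np.random.default_rng(3)
s = rng.normal(size=(2, 5000))
x = np.array([[1.0, 0.8], [3.5, 2.78]]) @ s      # the near-singular mixing above
z = whiten(x)
print(np.round(z @ z.T / z.shape[1], 3))         # close to the identity matrix
```

Even for this near-singular mixing matrix, the whitened covariance is the identity, so the remaining indeterminacy is only a rotation (plus sign and permutation).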
By applying the Max-ED-CIP method to train FIR filters, we get the results shown in Figure 5-34 and Figure 5-35, where frequency bands with only one dominating source signal are found, and the scattering distributions of the outputs of those filters match the mixing directions. Mini-Max-ED-CIP is further applied to these results to find the demixing system, obtaining an improved 38.50 dB average demixing performance for the case without noise and 24.39 dB for the case with noise (Figure 5-36 and Table 5-7). In this section, it was pointed out that finding mixing directions is more essential than obtaining demixing directions. Maximizing ED-CIP can help to obtain frequency bands in which mixing directions are easier to find. The Mini-Max-ED-CIP method can improve the demixing performance over the Mini-ED-CIP method. Although the experiments presented here are specific, they further confirm the effectiveness of the ED-CIP method. The work on Mini-Max-ED-CIP is preliminary, but it suggests the other extreme (maximizing mutual information) for BSS, compared with all current methods (minimizing mutual information). As ancient philosophy suggests, two opposite extremes can often exchange. It is worthwhile to explore this direction for BSS and even blind deconvolution.

Table 5-7. Demixing Performance Comparison

                 Without noise    With noise
Mini-CIP         32.18 dB         15.20 dB
Mini-Max-CIP     38.50 dB         24.39 dB

Figure 5-33. Performance by Minimizing ED-CIP (without noise: demixing SNR approaching 32.18 dB; with noise: approaching 15.20 dB)

Figure 5-34. The Results of Filters FIR 1 and FIR 2 Obtained by Max-ED-CIP (the case without noise). (a) distribution of the outputs of FIR 1; (b) source signals filtered by FIR 1, ratio of the two signals from -0.87 dB to 13.21 dB; (c) distribution of the outputs of FIR 2; (d) source signals filtered by FIR 2, ratio from -0.87 dB to -19.86 dB

Figure 5-35. The Results of Filters FIR 3 and FIR 4 Obtained by Max-ED-CIP (the case with noise). (a) distribution of the outputs of FIR 3; (b) source signals filtered by FIR 3, ratio from -0.87 dB to 13.02 dB; (c) distribution of the outputs of FIR 4; (d) source signals filtered by FIR 4, ratio from -0.87 dB to -13.84 dB

Figure 5-36. The Performance by Mini-Max-ED-CIP (without noise: demixing SNR approaching 38.50 dB; with noise: approaching 24.39 dB)

CHAPTER 6
CONCLUSIONS AND FUTURE WORK

In this chapter, we summarize the issues addressed in this dissertation and the contributions made towards their solutions. The initial goal was to establish a general nonparametric method for estimating information entropy and mutual information based only on data samples, without any other assumptions. From a physical point of view, the world is a "mass-energy" system, and it turns out that entropy and mutual information can also be viewed from this perspective. Based on another general measure of entropy, Renyi's entropy, we interpret entropy as a rescaled norm of a pdf and propose the idea of the quadratic mutual information. Based on these general measures, the concepts of "information potential" and "cross information potential" are proposed. The ordinary energy definition for a signal and the proposed IP and CIP are put together to give a unifying point of view about these fundamental measures, which are crucial for signal processing and adaptive learning in general. With such a fundamental tool, a general information-theoretic learning framework is given which contains all the existing information-theoretic learning methods as special cases. More importantly, we not only give a general learning principle, but also give an effective implementation of this general principle.
We break the barrier of model linearity and the Gaussian assumption on data, which are the major limitations of most existing methods. In Chapter 4, a case study on learning with an on-line local rule is presented. We establish the link between the power field, a special case of the information potential field, and the famous biological learning rules: the Hebbian and anti-Hebbian rules. Based on this understanding, we developed an on-line local learning algorithm for the generalized eigendecomposition of signals. Simulations and experiments with these methods were conducted on several problems, such as aspect-angle estimation for SAR imagery, target recognition, layer-by-layer training of multilayer neural networks, and blind source separation. The results further confirm the proposed methodology.

The major problem left is the further theoretical justification of the quadratic mutual information. The basis for the QMI as an independence measure is strong. We further provide some intuitive arguments that it is also appropriate as a dependence measure, and we apply the criteria successfully to several problems. However, there is still no rigorous theoretical proof that the QMI is appropriate for mutual information maximization.

The problem of on-line learning with IP or CIP is mentioned in Chapter 4. Since IP or CIP examines such detailed information as the relative position of each pair of data samples, it is very difficult to design an on-line algorithm for IP and CIP. The on-line rule for an energy measure is relatively easy to obtain because it only examines the relative position of each data sample with respect to the mean point; thus, each data point is relatively independent of the others, while IP or CIP must account for the relation of each data sample to all the others. One solution to this problem may come from the use of a mixture model, where the means of subclasses of all the data are used.
Then only the relative position between each data sample and each subclass mean needs to be considered. Each mean acts like a "heavy" data point with more "mass" than an ordinary data sample, and these "heavy" data points may serve as a kind of memory in the learning process. The IP or CIP then becomes the IP or CIP of each sample in the field of these "heavy mean particles." Based on this scheme, an on-line algorithm may be developed. The Gaussian mixture model and the EM algorithm mentioned in Chapter 3 may be powerful tools for obtaining such "heavy information particles."

The computational complexity of the IP or CIP method is O(N²), where N is the number of data samples. With the "heavy information particles" (suppose there are M such "particles," with M << N and possibly fixed), the computational complexity may be reduced to O(MN). So it may be very worthwhile to study this possibility further.

In terms of algorithmic implementation, how to choose the kernel size for IP and CIP was not discussed in the previous chapters; we chose the kernel size empirically in our experiments. It has been observed that the CIP is not sensitive to the kernel size, but the kernel size may be crucial for the IP. Further study of this issue, or even a method to select the optimal kernel size, is important for the IP and CIP methods. The IP and CIP methods are general and may find many applications in practical problems; finding more applications will also be important future work.

APPENDIX A
THE INTEGRATION OF THE PRODUCT OF GAUSSIAN KERNELS

Let G(y, Σ) = (2π)^(-k/2) |Σ|^(-1/2) exp(-(1/2) yᵀ Σ⁻¹ y) be the Gaussian function in k-dimensional space, where Σ is the covariance matrix and y ∈ Rᵏ.
Let a_i ∈ Rᵏ and a_j ∈ Rᵏ be two data points in the space, and let Σ1 and Σ2 be two covariance matrices for two Gaussian kernels in the space. Then we have

∫ G(y - a_i, Σ1) G(y - a_j, Σ2) dy = G(a_i - a_j, Σ1 + Σ2)   (A.1)

where the integral is over Rᵏ. Similarly, the integral of the product of three Gaussian kernels can also be obtained. The following is the proof of (A.1).

Proof:

1. Let d = a_i - a_j; then (A.1) becomes

∫ G(y - a_i, Σ1) G(y - a_j, Σ2) dy = ∫ G(y - d, Σ1) G(y, Σ2) dy = G(d, Σ1 + Σ2)   (A.2)

2. Let c = (Σ1⁻¹ + Σ2⁻¹)⁻¹ Σ1⁻¹ d. Then we have

(y - d)ᵀ Σ1⁻¹ (y - d) + yᵀ Σ2⁻¹ y = (y - c)ᵀ (Σ1⁻¹ + Σ2⁻¹) (y - c) + dᵀ (Σ1 + Σ2)⁻¹ d   (A.3)

Indeed, by the matrix inversion lemma [Gol93],

(A + C B Cᵀ)⁻¹ = A⁻¹ - A⁻¹ C (B⁻¹ + Cᵀ A⁻¹ C)⁻¹ Cᵀ A⁻¹   (A.4)

and letting A = Σ1, B = Σ2 and C = I (the identity matrix), we have

Σ1⁻¹ - Σ1⁻¹ (Σ1⁻¹ + Σ2⁻¹)⁻¹ Σ1⁻¹ = (Σ1 + Σ2)⁻¹   (A.5)

Since Σ1 and Σ2 are symmetric, we have

(y - d)ᵀ Σ1⁻¹ (y - d) + yᵀ Σ2⁻¹ y
= yᵀ Σ1⁻¹ y - 2 dᵀ Σ1⁻¹ y + dᵀ Σ1⁻¹ d + yᵀ Σ2⁻¹ y
= yᵀ (Σ1⁻¹ + Σ2⁻¹) y - 2 cᵀ (Σ1⁻¹ + Σ2⁻¹) y + cᵀ (Σ1⁻¹ + Σ2⁻¹) c - cᵀ (Σ1⁻¹ + Σ2⁻¹) c + dᵀ Σ1⁻¹ d
= (y - c)ᵀ (Σ1⁻¹ + Σ2⁻¹) (y - c) + dᵀ [Σ1⁻¹ - Σ1⁻¹ (Σ1⁻¹ + Σ2⁻¹)⁻¹ Σ1⁻¹] d
= (y - c)ᵀ (Σ1⁻¹ + Σ2⁻¹) (y - c) + dᵀ (Σ1 + Σ2)⁻¹ d   (A.6)

3. Since |A||B| = |AB| (if AB exists) and |A|⁻¹ = |A⁻¹| (if A⁻¹ exists), we have

|Σ1| |Σ2| |Σ1⁻¹ + Σ2⁻¹| = |Σ1 (Σ1⁻¹ + Σ2⁻¹) Σ2| = |Σ1 + Σ2|

so that

|Σ1 + Σ2|⁻¹ |Σ1| |Σ2| |Σ1⁻¹ + Σ2⁻¹| = 1   (A.7)

4. Based on (A.3) and (A.7), we have

G(y - d, Σ1) G(y, Σ2) = G(y - c, (Σ1⁻¹ + Σ2⁻¹)⁻¹) G(d, Σ1 + Σ2)   (A.8)

Indeed, applying (A.3) and (A.7),

G(y - d, Σ1) G(y, Σ2)
= (2π)^(-k) |Σ1|^(-1/2) |Σ2|^(-1/2) exp(-(1/2)[(y - d)ᵀ Σ1⁻¹ (y - d) + yᵀ Σ2⁻¹ y])
= (2π)^(-k) |Σ1|^(-1/2) |Σ2|^(-1/2) exp(-(1/2)[(y - c)ᵀ (Σ1⁻¹ + Σ2⁻¹) (y - c) + dᵀ (Σ1 + Σ2)⁻¹ d])
= G(y - c, (Σ1⁻¹ + Σ2⁻¹)⁻¹) G(d, Σ1 + Σ2)   (A.9)

since, by (A.7), |(Σ1⁻¹ + Σ2⁻¹)⁻¹|^(1/2) |Σ1 + Σ2|^(1/2) = |Σ1|^(1/2) |Σ2|^(1/2).

5.
Since G(·, ·) is a Gaussian pdf and integrates to 1, we have

∫ G(y - d, Σ1) G(y, Σ2) dy
= ∫ G(y - c, (Σ1⁻¹ + Σ2⁻¹)⁻¹) G(d, Σ1 + Σ2) dy
= G(d, Σ1 + Σ2) ∫ G(y - c, (Σ1⁻¹ + Σ2⁻¹)⁻¹) dy
= G(d, Σ1 + Σ2)   (A.10)

So (A.2) is proved, and equivalently (A.1) is proved.

APPENDIX B
SHANNON ENTROPY OF A MULTI-DIMENSIONAL GAUSSIAN VARIABLE

For a Gaussian random variable X ∈ Rᵏ with pdf f_X(x) = (2π)^(-k/2) |Σ|^(-1/2) exp(-(1/2)(x - μ)ᵀ Σ⁻¹ (x - μ)), where μ is the mean and Σ is the covariance matrix, Shannon's information entropy is

H_S(X) = (1/2) log|Σ| + (k/2) log 2π + k/2   (B.1)

Proof:

H_S(X) = E[-log f_X(X)]
= E[(k/2) log 2π + (1/2) log|Σ| + (1/2)(X - μ)ᵀ Σ⁻¹ (X - μ)]
= (1/2) log|Σ| + (k/2) log 2π + (1/2) E[tr((X - μ)ᵀ Σ⁻¹ (X - μ))]
= (1/2) log|Σ| + (k/2) log 2π + (1/2) E[tr((X - μ)(X - μ)ᵀ Σ⁻¹)]
= (1/2) log|Σ| + (k/2) log 2π + (1/2) tr(E[(X - μ)(X - μ)ᵀ] Σ⁻¹)
= (1/2) log|Σ| + (k/2) log 2π + (1/2) tr(Σ Σ⁻¹)
= (1/2) log|Σ| + (k/2) log 2π + (1/2) tr(I)
= (1/2) log|Σ| + (k/2) log 2π + k/2   (B.2)

where tr[·] is the trace operator.

APPENDIX C
RENYI ENTROPY OF A MULTI-DIMENSIONAL GAUSSIAN VARIABLE

For a Gaussian random variable X ∈ Rᵏ with pdf f_X(x) as above, Renyi's information entropy of order α is

H_Rα(X) = (1/2) log|Σ| + (k/2) log 2π + (k/2) (log α)/(α - 1)   (C.1)

Proof: using (A.1), we have

∫ f_X(x)^α dx = (2π)^(-kα/2) |Σ|^(-α/2) ∫ exp(-(α/2)(x - μ)ᵀ Σ⁻¹ (x - μ)) dx
= (2π)^(-kα/2) |Σ|^(-α/2) (2π)^(k/2) |Σ/α|^(1/2)
= (2π)^(k(1-α)/2) α^(-k/2) |Σ|^((1-α)/2)   (C.2)

so that

H_Rα(X) = (1/(1 - α)) log ∫ f_X(x)^α dx
= (1/(1 - α)) [(k(1 - α)/2) log 2π + ((1 - α)/2) log|Σ| - (k/2) log α]
= (1/2) log|Σ| + (k/2) log 2π + (k/2) (log α)/(α - 1)   (C.3)

APPENDIX D
H-C ENTROPY OF A MULTI-DIMENSIONAL GAUSSIAN VARIABLE

For a Gaussian random variable X ∈ Rᵏ with pdf f_X(x) as above, Havrda-Charvat's information entropy is

H_HC(X) = (1/(1 - α)) [(2π)^(k(1-α)/2) α^(-k/2) |Σ|^((1-α)/2) - 1]   (D.1)

Proof: using (C.2), we have

H_HC(X) = (1/(1 - α)) [∫ f_X(x)^α dx - 1]
= (1/(1 - α)) [(2π)^(k(1-α)/2) α^(-k/2) |Σ|^((1-α)/2) - 1]   (D.2)

REFERENCES

[Ace92] A. Acero, Acoustical and Environmental Robustness in Automatic Speech Recognition, Kluwer Academic Publishers, Boston, 1992.

[Acz75] J. Aczel and Z. Daroczy, On Measures of Information and Their Characterizations, Academic Press, New York, 1975.

[Ama98] S. Amari, "Natural Gradient Works Efficiently in Learning," Neural Computation, Vol.10, No.2, pp.251-276, February, 1998.

[Att54] F. Attneave, "Some Informational Aspects of Visual Perception," Psychological Review, Vol.61, pp.183-193, 1954.

[Bat94] R. Battiti, "Using Mutual Information for Selecting Features in Supervised Neural Net Learning," IEEE Transactions on Neural Networks, Vol.5, No.4, pp.537-550, July, 1994.

[Bec89] S. Becker and G. E. Hinton, "Spatial Coherence as an Internal Teacher for a Neural Network," Technical Report GRG-TR-89-7, Department of Computer Science, University of Toronto, Ontario, 1989.

[Bec92] S. Becker and G. E. Hinton, "A Self-Organizing Neural Network That Discovers Surfaces in Random-dot Stereograms," Nature (London), Vol.355, pp.161-163, 1992.

[Bel95] A. J. Bell and T. J. Sejnowski, "An Information-Maximization Approach to Blind Separation and Blind Deconvolution," Neural Computation, Vol.7, No.6, pp.1129-1159, November, 1995.

[Car97] J.-F. Cardoso, "Infomax and Maximum Likelihood for Blind Source Separation," IEEE Signal Processing Letters, Vol.4, No.4, pp.112-114, April, 1997.

[Car98a] J.-F. Cardoso, "Multidimensional Independent Component Analysis," Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.1941-1944, Seattle, 1998.

[Car98b] J.-F. Cardoso, "Blind Signal Separation: A Review," Proceedings of the IEEE, 1998, to appear.

[Cao96] X.-R. Cao and R.-W. Liu, "General Approach to Blind Source Separation," IEEE Transactions on Signal Processing, Vol.44, pp.562-571, March, 1996.

[Cha87] D.
Chandler, Introduction to Modern Statistical Mechanics, Oxford University Press, New York, 1987.

[Cha97] C. Chatterjee, V. P. Roychowdhury, J. Ramos and M. D. Zoltowski, "Self-Organizing Algorithms for Generalized Eigen-decomposition," IEEE Transactions on Neural Networks, Vol.8, No.6, pp.1518-1530, November, 1997.

[Chr80] R. Christensen, Entropy MiniMax Sourcebook, Vol.3, Computer Implementation, First Edition, Entropy Limited, Lincoln, MA, 1980.

[Chr81] R. Christensen, Entropy MiniMax Sourcebook, Vol.1, General Description, First Edition, Entropy Limited, Lincoln, MA, 1981.

[Com94] P. Comon, "Independent Component Analysis, A New Concept?" Signal Processing, Vol.36, pp.287-314, April, 1994, Special Issue on Higher-Order Statistics.

[Cor95] C. Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, Vol.20, No.3, pp.273-297, 1995.

[Dec96] G. Deco and D. Obradovic, An Information-Theoretic Approach to Neural Computing, Springer, New York, 1996.

[Dem77] A. P. Dempster, N. M. Laird and D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm (with Discussion)," Journal of the Royal Statistical Society B, Vol.39, pp.1-38, 1977.

[Dev85] L. Devroye and L. Gyorfi, Nonparametric Density Estimation: The L1 View, Wiley, New York, 1985.

[deV92] B. deVries and J. C. Principe, "The Gamma Model: A New Neural Model for Temporal Processing," Neural Networks, Vol.5, pp.565-576, 1992.

[Dia96] K. I. Diamantaras and S. Y. Kung, Principal Component Neural Networks: Theory and Applications, John Wiley & Sons, Inc., New York, 1996.

[Dud73] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.

[Dud98] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification and Scene Analysis, Preliminary Preprint Version, to be published by John Wiley & Sons, Inc.

[Fis97] J. W.
Fisher, "Nonlinear Extensions to the Minimum Average Correlation Energy Filter," Ph.D. dissertation, Department of Electrical and Computer Engineering, University of Florida, Gainesville, 1997.

[Gal88] A. R. Gallant and H. White, "There Exists a Neural Network That Does Not Make Avoidable Mistakes," IEEE International Conference on Neural Networks, Vol.1, pp.657-664, San Diego, 1988.

[Gil81] P. E. Gill, W. Murray and M. H. Wright, Practical Optimization, Academic Press, New York, 1981.

[Gol93] G. Golub and C. Van Loan, Matrix Computations, Second Edition, Johns Hopkins University Press, Baltimore, 1993.

[Hak88] H. Haken, Information and Self-Organization: A Macroscopic Approach to Complex Systems, Springer-Verlag, New York, 1988.

[Har28] R. V. Hartley, "Transmission of Information," Bell System Technical Journal, Vol.7, pp.535-563, 1928.

[Har34] G. H. Hardy, J. E. Littlewood and G. Polya, Inequalities, University Press, Cambridge, 1934.

[Hav67] J. H. Havrda and F. Charvat, "Quantification Methods of Classification Processes: Concept of Structural a-Entropy," Kybernetika, Vol.3, pp.30-35, 1967.

[Hay94] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan Publishing Company, New York, 1994.

[Hay94a] S. Haykin, Blind Deconvolution, Prentice Hall, Englewood Cliffs, New Jersey, 1994.

[Hay96] S. Haykin, Adaptive Filter Theory, Third Edition, Prentice Hall, Englewood Cliffs, NJ, 1996.

[Hay98] S. Haykin, Neural Networks: A Comprehensive Foundation, Second Edition, Prentice Hall, Englewood Cliffs, NJ, 1998.

[Heb49] D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory, Wiley, New York, 1949.

[Hec87] R. Hecht-Nielsen, "Kolmogorov's Mapping Neural Network Existence Theorem," 1st IEEE International Conference on Neural Networks, Vol.3, pp.11-14, San Diego, 1987.

[Hes80] M. Hestenes, Conjugate Direction Methods in Optimization, Springer-Verlag, New York, 1980.

[Hon84] M. L. Honig and D. G.
Messerschmitt, Adaptive Filters: Structures, Algorithms, and Applications, Kluwer Academic Publishers, Boston, 1984.

[Hua90] X. D. Huang, Y. Ariki and M. A. Jack, Hidden Markov Models for Speech Recognition, University Press, Edinburgh, 1990.

[Jay57] E. T. Jaynes, "Information Theory and Statistical Mechanics, I, II," Physical Review, Vol.106, pp.620-630, and Vol.108, pp.171-190, 1957.

[Jum86] G. Jumarie, Subjectivity, Information, Systems: Introduction to a Theory of Relativistic Cybernetics, Gordon and Breach Science Publishers, New York, 1986.

[Jum90] G. Jumarie, Relative Information: Theories and Applications, Springer-Verlag, New York, 1990.

[Kap92] J. N. Kapur and H. K. Kesavan, Entropy Optimization Principles with Applications, Academic Press, Inc., New York, 1992.

[Kap94] J. N. Kapur, Measures of Information and Their Applications, John Wiley & Sons, New York, 1994.

[Kha92] H. K. Khalil, Nonlinear Systems, Macmillan, New York, 1992.

[Kol94] J. E. Kolassa, Series Approximation Methods in Statistics, Springer-Verlag, New York, 1994.

[Kub75] L. Kubat and J. Zeman (Eds.), Entropy and Information in Science and Philosophy, Elsevier Scientific Publishing Company, Amsterdam, 1975.

[Kul68] S. Kullback, Information Theory and Statistics, Dover Publications, Inc., New York, 1968.

[Kun94] S. Y. Kung, K. I. Diamantaras and J. S. Taur, "Adaptive Principal Component Extraction (APEX) and Applications," IEEE Transactions on Signal Processing, Vol.42, No.5, pp.1202-1217, May, 1994.

[Lan88] K. J. Lang and G. E. Hinton, "The Development of the Time-Delay Neural Network Architecture for Speech Recognition," Technical Report CMU-CS-88-152, Carnegie-Mellon University, Pittsburgh, PA, 1988.

[Lin88] R. Linsker, "Self-Organization in a Perceptual Network," Computer, Vol.21, pp.105-117, 1988.

[Lin89] R.
Linsker, "An Application of the Principle of Maximum Information Preservation to Linear Systems," in Advances in Neural Information Processing Systems 1 (edited by D. S. Touretzky), pp.186-194, Morgan Kaufmann, San Mateo, CA, 1989.

[Mao95] J. Mao and A. K. Jain, "Artificial Neural Networks for Feature Extraction and Multivariate Data Projection," IEEE Transactions on Neural Networks, Vol.6, No.2, pp.296-317, March, 1995.

[Mcl88] G. J. McLachlan and K. E. Basford, Mixture Models: Inference and Applications to Clustering, Marcel Dekker, Inc., New York, 1988.

[Mcl96] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, John Wiley & Sons, Inc., New York, 1996.

[Men70] J. M. Mendel and R. W. McLaren, "Reinforcement-Learning Control and Pattern Recognition Systems," in Adaptive, Learning, and Pattern Recognition Systems: Theory and Applications, Vol.66 (edited by J. M. Mendel and K. S. Fu), pp.287-318, Academic Press, New York, 1970.

[Min69] M. L. Minsky and S. A. Papert, Perceptrons, MIT Press, Cambridge, MA, 1969.

[Ngu95] H. L. Nguyen and C. Jutten, "Blind Sources Separation for Convolutive Mixtures," Signal Processing, Vol.45, No.2, pp.209-229, August, 1995.

[Nob88] B. Noble and J. W. Daniel, Applied Linear Algebra, Prentice-Hall, Englewood Cliffs, NJ, 1988.

[Nyq24] H. Nyquist, "Certain Factors Affecting Telegraph Speed," Bell System Technical Journal, Vol.3, pp.332-333, 1924.

[Oja82] E. Oja, "A Simplified Neuron Model as a Principal Component Analyzer," Journal of Mathematical Biology, Vol.15, pp.267-273, 1982.

[Oja83] E. Oja, Subspace Methods of Pattern Recognition, John Wiley, New York, 1983.

[Pap91] A. Papoulis, Probability, Random Variables, and Stochastic Processes, Third Edition, McGraw-Hill, Inc., New York, 1991.

[Par62] E. Parzen, "On the Estimation of a Probability Density Function and the Mode," Annals of Mathematical Statistics, Vol.33, pp.1065-1076, 1962.

[Par91] J. Park and I. W.
Sandberg, "Universal Approximation Using Radial-Basis-Function Networks," Neural Computation, Vol.3, pp.246-257, 1991.

[Pha96] D. T. Pham, "Blind Separation of Instantaneous Mixture of Sources via an Independent Component Analysis," IEEE Transactions on Signal Processing, Vol.44, No.11, pp.2768-2779, November, 1996.

[Plu88] M. D. Plumbley and F. Fallside, "An Information-Theoretic Approach to Unsupervised Connectionist Models," in Proceedings of the 1988 Connectionist Models Summer School (edited by D. Touretzky, G. Hinton and T. Sejnowski), pp.239-245, Morgan Kaufmann, San Mateo, CA, 1988.

[Pog90] T. Poggio and F. Girosi, "Networks for Approximation and Learning," Proceedings of the IEEE, Vol.78, pp.1481-1497, 1990.

[Pri93] J. C. Principe, B. deVries and P. Guedes de Oliveira, "The Gamma Filters: A New Class of Adaptive IIR Filters with Restricted Feedback," IEEE Transactions on Signal Processing, Vol.41, No.2, pp.649-656, 1993.

[Pri97a] J. C. Principe, D. Xu and C. Wang, "Generalized Oja's Rule for Linear Discriminant Analysis with Fisher Criterion," Proceedings of the 1997 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3401-3404, Munich, Germany, 1997.

[Pri97b] J. C. Principe and D. Xu, "Classification with Linear Networks Using an On-line Constrained LDA Algorithm," Proceedings of the 1997 IEEE Workshop on Neural Networks for Signal Processing VII, pp.286-295, Amelia Island, FL, 1997.

[Pri98] J. C. Principe, Q. Zhao and D. Xu, "A Novel ATR Classifier Exploiting Pose Information," Proceedings of the 1998 Image Understanding Workshop, Vol.2, pp.833-838, Monterey, California, 1998.

[Rab93] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.

[Ren60] A. Renyi, "Some Fundamental Questions of Information Theory," in Selected Papers of Alfred Renyi, Vol.2, pp.526-552, Akademiai Kiado, Budapest, 1976.

[Ren61] A.
Renyi, â€œOn Measures of Entropy and Information,â€ in Selected Papers of Alfred Renyi, Vol. 2. pp.565-580, Akademiai Kiado, Budapest, 1976. [Ros58] F. Rosenblatt, â€œThe Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,â€ Psychological Review, Vol.65, pp.386-408, 1958. [Ros62] R. Rosenblatt, Principles of Neurodynamics: Perceptron and Theory of Brain Mechanisms, Spartan Books, Washington DC, 1962. [Ru86a] D. E. Rumelhart and J. L. McClelland, eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, MA, 1986. 194 [Ru86b] D. E. Rumelhart, G E. Hinton and R. J. Williams, â€œLearning Representations of Back-Propagation Errors,â€ Nature (London), Vol.323, pp.533-536, 1986. [Ru86c] D. E. Rumelhart, G E. Hinton and R. J. Williams, â€œLearning Internal RepresentaÂ¬ tions by Error Propagation,â€ in Parallel Distributed Processing, Vol.l, Chapter 8, MIT Press, Cambridge, MA, 1986. [Sha48] C. E. Shannon, â€œA Mathematical Theory of Communication,â€ Bell System TechÂ¬ nical Journal, Vol.27, pp.379-423, pp.623-653, 1948. [Sha62] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1962. [Sil86] B. W. Silverman, Density Estimation For Statistics and Data Analysis, Chapman and Hall, New York, 1986. [Tri71] M. Tribus and E.C. Mclrvine, â€œEnergy and Information,â€ Scientific American, Vol.225, September, 1971. [Ukr92] A. Ukrainec and S. Haykin, â€œEnhancement of Radar Images Using Mutual InforÂ¬ mation Based Unsupervised Neural Network,â€ Canadian Conference on ElectriÂ¬ cal and Computer Engineering, pp.MA6.9.1-MA6.9.4, Toronto, Canada, 1992. [Vap95] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995 [Ved97] Veda Incorporated, MSTAR data set, 1997. [Vio95] P. Viola, N. Schraudolph and T. 
Sejnowski, â€œEmpirical Entropy Manipulation for Real-World Problems,â€ Proceedings of Neural Information Processing System (NIPS 8) Conference, pp.851-857, Denver, Colorado, 1995. [Wai89] A. Waibel, T. Hanazawa, G Hinton, K. Shikano and K. J. Lang, â€œPhoneme RecÂ¬ ognition Using Time-Delay Neural Networks,â€ IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-37, pp.328-339, 1989. [Wan96] C. Wang, H. Wu and J. Principe, â€œCorrelation Estimation Using Teacher Forcing Hebbian Learning and Its Application,â€ in Proceedings 1996 IEEE International Conference on Neural Networks, pp.282-287, Washington DC, June, 1996. [Weg72] E. J. Wegman, â€œNonparametric Probability Density Estimation: I. A Summary of Available Methods,â€ Technometrics, Vol.14, No.3, August, 1972. [Wer90] P. J. Werbos, â€œBackpropagation Through Time: What It Does and How to Do It,â€ Proceedings of the IEEE, Vol.78, pp. 1550-1560, 1990. 195 [Wid63] B. Widrow, A Statistical Theory of Adaptation, Pergamon Press, Oxford, 1963. [Wid85] B. Widrow, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, New Jersey, 1985. [Wil62] S. S. Wilks, Mathematical Statistics, John Wiley & Sons, Inc, New York, 1962. [Wil89] R. J. Williams and D. Zipser, â€œA Learning Algorithm for Continually Running Fully Recurrent Neural Networks,â€ Neural Computation, Vol.l. pp.270-280, 1989. [WÃ190] R. J. Williams and J. Peng, â€œAn Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories,â€ Neural Computation, Vol.2, pp.490-501, 1990. [WuH98] H.-C. Wu, J. Principe and D. Xu, â€œExploring the Tempo-Frequency Micro- Structure of Speech for Blind Source Separation,â€ Proceedings of 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol.2, pp.1145-1148, 1998. [XuD95] D. 
Xu, â€œEM Algorithm and Baum-Eagon Inequality, Some Generalization and Specification,â€ Technical Report, CNEL, Department of Electrical and Computer Engineering, University of Florida, Gainesville, November, 1995. [XuD96] D. Xu, C. Fancourt and C. Wang, â€œMulti-Channel HMM,â€ 1996 International Conference on Acoustic, Speech & Signal Processing, Vol. 2, pp.841-844, Atlanta, GA, 1996. [XuD98a] D. Xu, J. Fisher and J. C. Principe, â€œA Mutual Information Approach to Pose Estimation,â€ Algorithms for Synthetic Aperture Radar Imagery V, SPIE 98, Vol 3370, pp.218-229, Orlando, FL, 1998. [XuD98] D. Xu, J. C. Principe and H-C. Wu, â€œGeneralized Eigendecomposition with an On-Line Local Algorithmâ€, IEEE Signal Processing Letter, Vol.5, No. 11, pp.298- 301, November, 1998. [XuL97] L. Xu, C-C. Cheung, H. H. Yang and S. Amari, â€œIndependent Component AnalyÂ¬ sis by the Information-Theoretic Approach with Mixture of Densities,â€ proceedÂ¬ ings of 1997 International Conference on Neural Networks (ICNNâ€™97), ppl821- 1826, Houston, TX, 1997. [Yan97] H. H. Yang and S. I. Amari, â€œAdaptive On-Line Learning Algorithms for Blind Separation: Maximum Entropy and Minimum Mutual Information,â€ Neural Computation, Vol.9, No.7, pp.1457-1482, October, 1997. 196 [Yan98] H.H. Yang, S.I. Amari and A.Cichocki, â€œInformation-Theoretic Approach to BSS in Non-Linear Mixture,â€ Signal Processing, Vol.64, No.3, pp.291-300, February, 1998. [You87] P. Young, The Nature of Information, Praeger, New York, 1987. BIOGRAPHICAL SKETCH Dongxin Xu was bom on January 26, 1963, in Jiangsu China. He earned his bachelorâ€™s degree in electrical engineering from Xiâ€™an Jiaotong University, China, in 1984. In 1987, he received his Master of Science degree in computer science from the Institute of AutoÂ¬ mation, Chinese Academy of Sciences, Beijing, China. 
He then spent seven years doing research on speech signal processing, speech recognition, pattern recognition, artificial intelligence and neural networks at the National Laboratory of Pattern Recognition in China. Since 1995, he has been a Ph.D. student in the Department of Electrical and Computer Engineering, University of Florida, where he has worked in the Computational NeuroEngineering Laboratory on various topics in signal processing. His main research interests are adaptive systems; speech coding, enhancement and recognition; image processing; digital communication; and statistical signal processing.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

José C. Principe, Chairman
Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Assistant Professor of Electrical and Computer Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Assistant Professor of Electrical and Computer Engineering

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May 1999

Winfred M. Phillips
Dean, College of Engineering