
Full Citation 
Material Information 

Title: 
Representation of locally stationary signals using lowpass moments 

Physical Description: 
vii, 154 leaves : ill. ; 29 cm. 

Language: 
English 

Creator: 
Celebi, Samel, 1966-

Publication Date: 
1995 
Subjects 

Subject: 
Signal processing -- Mathematical models (lcsh); Electric filters -- Mathematical models (lcsh); Electrical and Computer Engineering thesis, Ph. D (lcsh); Dissertations, Academic -- Electrical and Computer Engineering -- UF (lcsh)

Genre: 
bibliography (marcgt); nonfiction (marcgt)
Notes 

Thesis: 
Thesis (Ph. D.)--University of Florida, 1995.

Bibliography: 
Includes bibliographical references (leaves 149-153).

Statement of Responsibility: 
by Samel Celebi. 

General Note: 
Typescript. 

General Note: 
Vita. 
Record Information 

Bibliographic ID: 
UF00082352 

Volume ID: 
VID00001 

Source Institution: 
University of Florida 

Holding Location: 
University of Florida 

Rights Management: 
All rights reserved by the source institution and holding location. 

Resource Identifier: 
aleph: 002070213; oclc: 34462281; notis: AKQ8475


Full Text 
REPRESENTATION OF LOCALLY STATIONARY SIGNALS
USING LOWPASS MOMENTS
By
SAMEL CELEBI
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1995
ACKNOWLEDGEMENTS
Simply looking at this book, it is hard to guess how much of an effort goes into a dissertation study. This book is a record of only a small portion that has been skimmed out of days and nights of hard work, often accompanied by financial, mental and emotional distress. In order to overcome these difficulties, one needs the continuous support of his/her environment. I feel very fortunate in that sense.
First and foremost, I would like to thank my chairman, Professor Jose C. Principe, for the encouragement he provided during the course of my studies. His availability to his students and the abundance of the ideas he offered are irreplaceable. Special thanks are due to Dr. John G. Harris, whose constructive comments and careful proofreading helped to improve the quality of the thesis. I also thank my committee members Dr. Donald G. Childers, Dr. Fred J. Taylor and Dr. Sencer Yerelan for the guidance they granted me.
During my first semester in the graduate program at the University of Florida, I was sponsored by the Turkish Ministry of Education. For this I am deeply grateful. I would like to take this opportunity to thank my family for their patience and the moral support they gave me.
It has been a pleasure to be a part of the Computational Neuroengineering Laboratory. I benefitted from it both socially and academically. I'd like to thank all my labmates for the friendship they extended me.
Finally, I thank Paula very much for being with me during all this. It would have been very hard, if not impossible, without her support and love.
TABLE OF CONTENTS
page
ACKNOWLEDGEMENTS ................................................................................. ii
ABSTRACT ......................................................................................................... vi
CHAPTERS
1 INTRODUCTION ..................................................................................... 1
2 GAMMA KERNELS ................................................................................. 4
Introduction ................................................................................ 4
Gamma Kernels .......................................................................... 4
Gamma Filter ....................................................................... 6
Gamma Filter as a Memory Model ........................................ 8
Laguerre Kernels ............................................................................ 9
Laguerre Polynomials ............................................... ........... 11
Laguerre Polynomials and the Confluent Hypergeometric Equation 12
Parametric Representation of Signals Using Gamma Kernels ........... 13
Statement of the Problem ........................................... .......... 16
Structure of the Gamma Space .................................... ......... 17
Optimum Value of the Weights .................................................. 19
Optimum Value of the Time Scale λ ......................................... 22
Previous Work .................................................................. 22
Optimum λ .............................................................................. 22
Estimation of optimum λ using moment matching ............... 25
Example ............................................................................ 29
Estimation of optimum λ using a matched filter ................... 32
Online Adaptation of the Weights and the Time Scale ................ 34
Conclusions ................................................................................ 37
iii
3 TIME-FREQUENCY DISTRIBUTIONS .................................................. 39
Introduction ........................................................................................ 39
Uncertainty Principle .......................................................................... 39
Time-Frequency Representations ..................................................... 42
Cohen's Class of Bivariate Distributions .................................. .. 45
Spectrogram and Its Relation to Wigner Distribution ................... 49
Conclusions ................................................................................. 50
4 INTERPOLATION ............................................................................... 52
Introduction ............................................................................... 52
Polynomial Interpolation .................................................................... 52
General Problem of Finite Interpolation ......................................... 53
Example ........................................................................................ 54
Systems Possessing the Interpolation Property ............................ 55
Hermite Interpolation .................................................................. 56
Hermite Interpolation of ARMA Signals ................................. .. 59
Issues in Hermite Interpolation ...................................... ......... 60
Region of convergence ........................................... .......... 60
Bandwidth of interpolation ..................................... ........... 62
Maximum error versus the bandwidth .................................. 65
Conclusions ................................................................................. 66
5 REPRESENTATION OF LOCALLY STATIONARY SIGNALS USING
POISSON MOMENTS ................................................ ............ 69
Introduction .................................................................................. 69
Gamma Filter and the Poisson Moments: Analysis Formula ............... 70
Reconstruction of a Signal from Its Poisson Moments: Synthesis
Formula .................................................................................. 71
Poisson Moments and the Magnitude Spectrum ............................... 73
Spectral Hermite Interpolation ......................................... ......... 75
Spectrogram with Exponential Decaying Window ......................... 76
Variance and Bias of the Spectrogram Estimator ................................. 78
Choice of the Time Scale ................................................................. 83
Examples ......................................................... 87
Asymptotic Properties of the Poisson Moments ............................... 97
Impulse Distribution Expansion Versus Taylor's Series Expansion... 97
Choice of Time Scale and Measurement Time .............................. 98
Poisson Moment of a Signal and Its Time Delayed Versions ........... 105
Signal Classification Using the Poisson Moments ............................. 107
Estimation of the Time Delay ....................................................... 109
Example .......................................................................................... 110
Conclusions .................................................................................... 112
6 SYSTEM IDENTIFICATION USING POISSON MOMENTS .......... 115
Introduction ........................................................................................ 115
Poisson Moments and Their Relation to Function Derivatives ......... 116
Method I ........................................................................................ 118
Method II ........................................................................................ 119
Example ........................................................................................ 119
Effect of λ ......................................................................................... 120
PMF in the Identification of TimeVarying Parameter Systems ........... 129
Least Squares and Recursive Least Squares Estimates ..................... 130
Conclusions ........................................................................................ 130
7 CONCLUSIONS ...................................................................................... 133
APPENDICES
A SUMMARY OF THE PROPERTIES OF THE GAMMA AND THE
LAGUERRE KERNELS .................................................................... 136
B SUMMARY OF PARKS' AND MASNADI'S WORKS ........................ 139
C TIME-FREQUENCY SPREAD OF THE GAMMA KERNEL ............. 142
D A PROOF ON THE SCHUR COMPONENT ......................................... 144
E COMPUTATION OF THE SCHUR COMPONENT FOR
ARMA SYSTEMS ............................................................................... 147
REFERENCES ................................................................................................... 149
BIOGRAPHICAL SKETCH ............................................................................... 154
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
REPRESENTATION OF LOCALLY STATIONARY SIGNALS
USING LOWPASS MOMENTS
By
SAMEL CELEBI
December 1995
Chairman: Dr. Jose C. Principe
Major Department: Electrical Engineering
The gamma filter is a generalized feedforward structure which can be implemented as
a cascade of identical first order lowpass filters. Although functionally it is an infinite
impulse response (IIR) filter, topologically it mimics a finite impulse response (FIR) filter.
As a result of this blend, it inherits the computational power of the former and the easy
stability analysis of the latter. Being a single input multiple output structure, the gamma
filter is characterized by its order and the time scale which linearly warps the time axis.
Furthermore, when orthogonalized, the gamma filter transforms into the well-known Laguerre filter.
The gamma kernels correspond to the impulse responses of the gamma filter outputs.
This family of decaying exponentials forms a complete set in the Hilbert space. The first
problem attacked in this study is the approximation of a given signal as a weighted sum of
gamma kernels such that the mean square error is minimized. Conditions for the optimum
weights and time scale are derived and they are decoupled from each other. A vector space
interpretation of the role of the time scale is also provided. According to this interpretation
the time scale parameter has the effect of changing the relative angle between the desired
signal and the gamma projection space without altering the structure of the gamma space.
Two numerical methods to estimate the time scale are included.
The gamma filter produces at its taps the Poisson moments of the input signal. Poisson moments correspond to the Taylor series coefficients of the recent magnitude spectrum of the input signal at zero frequency. Localized in time and frequency, Poisson moments offer a computationally inexpensive, alternate time-frequency representation for locally stationary signals, such as speech. In the analog domain the moments can be physically measured and fed into a nonlinear mapper, such as an artificial neural network, for tasks such as classification, prediction and identification. For a faithful representation of the spectrum, it is sufficient to observe the moments at the information rate rather than the usually higher Nyquist rate. This convenience removes a great deal of burden from the shoulders of the processor following the gamma filter.
Very much like the Laplace transform, Poisson moments have the virtue of transforming linear differential equations into algebraic ones. Based on this property, one can linearly solve for the unknown parameters of an autoregressive moving average (ARMA) system simply by observing the Poisson moments of the system input and output at a finite number of time instants. Although the method does not put any restrictions on the choice of the filter time scale and the observation times, in practice these parameters play a significant role in determining the accuracy of the results. If the parameters are not selected properly, the results tend to be ill-conditioned. This study provides some insight into the choice of the time scale.
CHAPTER 1
INTRODUCTION
The purpose of this study is to find alternate signal representation schemes for locally stationary temporal signals, such as speech. These alternate representations are expected to express the signal in a low dimensional space with sufficient fidelity and little computational cost. Analog implementation and resistance to noise are also among the desirable features of the scheme.
Having restricted the scope of our search to linear structures, we realized that the desired structure had to incorporate short term memory in order to be able to process nonstationary signals. In that respect, Finite Impulse Response (FIR) filters are not a good choice due to the coupling between the filter size and the depth of the past information they can provide. The other alternative, Infinite Impulse Response (IIR) filters, on the other hand, suffer from instability problems.
Given these constraints, we concentrated our studies on the gamma filter and the family of exponential kernels that correspond to its impulse response: the gamma kernels. Developed by Principe and deVries [1]-[3], the gamma filter is a generalized feedforward filter. While functionally being an IIR, topologically it mimics an FIR structure. As an outcome of this blend, the gamma filter inherits the computational power of the former and the mathematical tractability and stability of the latter. Furthermore, compared to an FIR filter of the same order, it has only one more degree of freedom.
In the early stages of our research we observed that the gamma filter was more efficient1 than the conventional tapped delay line when applied to classical signal processing
1. In terms of the number of parameters required for a given mean square error.
tasks such as prediction and system identification. Encouraged by these results, we have
decided to explore this structure in more detail from a representation point of view.
Another finding that motivated us to analyze the gamma filter is the fact that structures similar to the gamma filter are pervasive in various environments. Depending on the medium, the gamma filter may take the form of a cascade of RC circuits, a leaky transmission line, or even a network of synaptic connections [4]. There is also some evidence that some of the preprocessing of time-varying biological signals is done by structures similar to the gamma filter [5].
The core of this study is composed of analyzing the gamma filter from three perspectives: parametric and nonparametric signal representation, and system identification. Because of the distinct character of these individual topics, rather than organizing the thesis in the conventional way, where there is a separate results section, we prefer to present the results within the body of each chapter, followed by some concluding remarks.
The flow of the thesis can be summarized as follows. In Chapter 2 we introduce the gamma kernels and attack the problem of representing signals as a weighted sum of these kernels, which are shown to be complete in the Hilbert space. The solution of this parametric representation problem turns out to be a generalization of the Wiener-Hopf normal equation. This chapter also provides a vector space framework interpretation of the problem, clearly demonstrating the advantage of using the gamma structure over that of the FIR filter. In addition to the derivation of the optimality conditions, we link the gamma kernels to the Kummer functions, which encompass several well-known functions such as the Bessel, Gaussian, Hermite, Laguerre and exponential functions.
In Chapter 5, we take a nonparametric representation point of view and suggest using the gamma filter as an alternate time-frequency representation scheme. The gamma filter generates at its taps the time moments of the input signal. These are called the Poisson moments and they can be regarded as signatures of the input. In this chapter we show how the Poisson moments carry information about the spectral content of the signal. This information is localized both in time and frequency and it is readily available at the outputs of the gamma filter as a measurable quantity. Hence, the moments offer a computationally inexpensive time-frequency representation.
Chapter 3 is tailored to provide background information on time-frequency distributions, a topic that is frequently visited in Chapter 5. In doing so, we pay particular attention to the spectrogram method. By the same token, Chapter 4 provides background information on interpolation theory, with special attention to Hermite interpolation. This is done in an attempt to prepare the reader for the nonparametric representation problem studied in Chapter 5.
Chapter 6 ponders the problem of identifying autoregressive moving average (ARMA) systems using the gamma filter. ARMA systems are described by linear differential equations. Very much like the Laplace transform, the Poisson moments transform linear differential equations into algebraic ones. Using this, we will show how to algebraically determine the unknown parameters of a linear system by observing a finite number of Poisson moments of the system input and output.
Finally, Chapter 7 underscores the highlights of the study and cites possible future
research issues.
CHAPTER 2
GAMMA KERNELS
Introduction
The core of this study is composed of the utilization of the gamma kernels and their orthogonalized counterpart, the Laguerre kernels, in the parametric and nonparametric representation of signals and in the identification of systems. This chapter begins by introducing these two families of kernels and elaborates on their properties. Although the gamma and the Laguerre kernels span the same space, individually they help to explain different aspects of the problem. Therefore, we will introduce both families and link them to each other when necessary.
Gamma Kernels
In the course of this study, the term kernel refers to the implementation of an operator. The gamma kernel, in particular, refers to the implementation of a generalized delay operator. The gamma kernel was devised to substitute for the unit delay operator, which performed poorly in the processing of signals with redundancies. We define the gamma kernels as

g_n(λ,t) = (λ^n t^{n−1} / (n−1)!) e^{−λt},   λ > 0,  t ≥ 0,  n = 1, 2, 3, ...   (1)

This family of kernels constitutes a subset of the eigenfunctions of autoregressive moving average (ARMA) systems. Therefore, it is a good candidate for modeling signals that accept the linear model. The kernels are complete in the Hilbert space (the L2 space), which spans all the finite energy signals. This implies that any finite energy signal can be approximated arbitrarily closely using a finite number of gamma kernels. The parameter λ controls the time scale (or the memory depth) of the kernels by changing the width of the kernels. Increasing λ has the effect of linearly compressing the kernels, while decreasing λ has the opposite effect. Figure 1 depicts the kernels in the time domain for λ = 1. For reasons that will become clearer in later sections that deal with the memory structures, the area under each kernel is normalized to unity, i.e.,
∫_0^∞ g_n(λ,t) dt = 1   (2)
The Laplace transform of the kernels is given as

G_n(s) = (λ / (s + λ))^n = G(s)^n   (3)

where G(s) = L{g_1(λ,t)} = λ / (s + λ).
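As a quick numerical sanity check of (1) and (2), the kernels can be evaluated and integrated directly. The sketch below is illustrative; the function and variable names are ours, not part of the original development.

```python
import math

def gamma_kernel(n, lam, t):
    # g_n(lam, t) = lam^n * t^(n-1) * exp(-lam*t) / (n-1)!   cf. Eq. (1)
    if t < 0:
        return 0.0
    return lam**n * t**(n - 1) * math.exp(-lam * t) / math.factorial(n - 1)

# Riemann-sum check of the unit-area normalization, cf. Eq. (2)
lam, dt = 1.5, 1e-3
for n in (1, 2, 3, 4):
    area = sum(gamma_kernel(n, lam, k * dt) * dt for k in range(1, 40000))
    assert abs(area - 1.0) < 1e-2
```

The same sums also confirm the "compression" role of λ: doubling λ halves the first moment of each kernel.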
Gamma Filter
Equation (3) suggests a practical implementation for the kernels in the form of a cascade of identical first order lowpass filters. This structure, shown in Figure 2, is called the gamma filter [1]. The impulse response of the filter from the input to the kth tap is given by g_k(λ,t). The Kth order gamma filter transfer function is characterized by a pole at s = −λ with multiplicity K. The time scale λ is responsible for setting the position of this pole. The positions of the zeros, however, are determined by the weights w_1, w_2, ..., w_K and the time scale λ.
Figure 1 Continuous time gamma kernels g_1(λ,t) through g_4(λ,t) for λ = 1.

Figure 2 Gamma filter: the input x(t) drives a cascade of identical stages (1st stage, 2nd stage, ..., Kth stage), realizable in discrete or continuous time, with tap weights w_1, ..., w_K.
Gamma kernels satisfy the following differential equation

d/dt g_{k+1}(λ,t) = λ [g_k(λ,t) − g_{k+1}(λ,t)]   (4)

If the time derivative is replaced by backward time differences, the filter can also be realized in the discrete time domain. For this case, the kernels satisfy the following difference equation

g_{k+1}(n) = λ g_k(n−1) + (1−λ) g_{k+1}(n−1)   (5)

The discrete time kernels and their corresponding z-transform are given by

g_k(n) = C(n−1, k−1) λ^k (1−λ)^{n−k},  n ≥ k
G_k(z) = (λ z^{−1} / (1 − (1−λ) z^{−1}))^k   (6)
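A short simulation can confirm that the tap recursion of (5) reproduces the closed-form discrete kernels; the sketch below assumes the standard closed form g_k(n) = C(n−1, k−1) λ^k (1−λ)^{n−k} and uses illustrative variable names.

```python
from math import comb

lam, N, K = 0.6, 30, 4

# Impulse responses of the taps via the recursion of Eq. (5),
# driven by a unit impulse delta(n) at the input (stage 0).
g = [[0.0] * N for _ in range(K + 1)]
g[0][0] = 1.0
for k in range(K):
    for n in range(1, N):
        g[k + 1][n] = lam * g[k][n - 1] + (1 - lam) * g[k + 1][n - 1]

# Closed form: g_k(n) = C(n-1, k-1) lam^k (1-lam)^(n-k), n >= k
for k in range(1, K + 1):
    for n in range(k, N):
        closed = comb(n - 1, k - 1) * lam**k * (1 - lam) ** (n - k)
        assert abs(g[k][n] - closed) < 1e-12
```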
Notice that for the special case of λ = 1, the discrete time gamma filter becomes a finite impulse response (FIR) filter. One can think of the gamma filter as a generalized feedforward structure where the unit delay operator z^{−1} is replaced with a generalized delay structure G(z) that accommodates a local feedback. Due to the inclusion of this feedback, the gamma filter becomes a recursive structure. Recursive structures are computationally more efficient than their feedforward counterparts; however, they may suffer from instability problems. That is why FIR filters are almost exclusively used in adaptive signal processing. The instability problem can be taken care of by restricting the feedback connections to be local. In the gamma filter, stability is guaranteed as long as one of the identical stages is stable, which prescribes λ > 0 for the continuous time case and 0 < λ < 2 for the discrete time case.
The Gamma Filter as a Memory Model
We start this section by quoting two definitions, one for the memory depth and the other for the memory resolution, proposed by Principe [1][2][6] in an attempt to quantify and compare different memory structures.

Definition: Memory depth D is the temporal mean value of the impulse response g(t) of the filter used for storing temporal information. Considering the impulse response as a probability density, this corresponds to the first moment of the impulse response, given by

D = ∫_0^∞ t g(t) dt = −dG(s)/ds |_{s=0}   (continuous time)
D = Σ_{n=0}^∞ n g(n) = −z dG(z)/dz |_{z=1}   (discrete time)   (7)
Definition: Memory resolution R is the number of degrees of freedom (i.e., memory state variables) per unit time.

The conventional digital signal processing structures are built around the tapped delay line (G(z) = z^{−1}), which is an FIR filter. The memory depth of an FIR filter is equal to the number of filter parameters, that is, the number of its taps. This strict coupling of the memory depth to the number of filter parameters leads to poor modeling of lowpass signals, which usually have a long sustenance in time but few degrees of freedom. Furthermore, the high memory resolution of the FIR filters makes these filters susceptible to noise on lowpass signals. Contrary to FIRs, the memory depth of an IIR filter can be made arbitrarily large at the expense of resolution. These filters, however, suffer from instability problems. The gamma filter, in the meantime, incorporates a cascade of generalized delay units in the form of an overall feedforward structure. Restricting the feedbacks to be local solves the instability problem while preserving the decoupling of the memory depth from the number of system parameters. For the case of a Kth order gamma filter, the memory depth equals

D_gamma = −z dG_K(z)/dz |_{z=1} = K/λ   (8)

The resolution of the gamma filter is given by

R_gamma = K / D_gamma = K / (K/λ) = λ   (9)

Note that the gamma structure obeys the relation

order = depth × resolution
K = D × R   (10)

This expression tells us that the gamma filter is able to trade the memory depth for the resolution by varying its time scale parameter λ. The memory depth and resolution for the FIR structures, on the other hand, are fixed at D_FIR = K and R_FIR = 1.
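The depth formula D = K/λ of (8) is easy to confirm by computing the first moment of a simulated tap impulse response, cf. (7); a minimal sketch with illustrative names:

```python
lam, K, N = 0.3, 5, 4000

# Impulse response at the K-th tap via the recursion
# g_{k+1}(n) = lam*g_k(n-1) + (1-lam)*g_{k+1}(n-1), cf. Eq. (5)
g_prev = [1.0] + [0.0] * (N - 1)      # stage 0: the input impulse delta(n)
for _ in range(K):
    g_next = [0.0] * N
    for n in range(1, N):
        g_next[n] = lam * g_prev[n - 1] + (1 - lam) * g_next[n - 1]
    g_prev = g_next

depth = sum(n * g_prev[n] for n in range(N))   # first moment, cf. Eq. (7)
assert abs(depth - K / lam) < 1e-6             # D = K / lam, cf. Eq. (8)
```

With K = 5 and λ = 0.3 the depth is 16.67 samples, illustrating how a low λ buys depth at the price of resolution.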
Laguerre Kernels
The gamma kernels lead to the gamma filter, which can be thought of as a link between FIR and IIR structures. They are also useful tools in understanding memory models. Although they have other appealing properties, such as being produced recursively from each other using convolution, they still lack one important feature: orthogonality.1 These kernels are linearly independent of each other, but not orthogonal. They can, however, be orthogonalized using the Gram-Schmidt procedure. When the Gram-Schmidt procedure is applied to the gamma kernels, one obtains the Laguerre kernels, which have the form

1. Another advantage of the gamma kernels over the Laguerre kernels is that we can expand them to the continuous index case, where k is a real number:
g_k(λ,t) = λ^k t^{k−1} e^{−λt} / Γ(k)

L_n(λ,t) = √(2λ) Σ_{v=0}^{n} C(n,v) ((−2λt)^v / v!) e^{−λt},   n = 0, 1, 2, ...   (11)
These kernels are complete in the L2 space and they form an orthonormal set such that

∫_0^∞ L_n(λ,t) L_m(λ,t) dt = δ_nm,   δ_nm = 1 if n = m, 0 if n ≠ m   (12)

The Laplace domain representation of the Laguerre kernels is

L_n(s) = √(2λ) (s − λ)^n / (s + λ)^{n+1}   (13)
Similar to that of the gamma filter, (13) suggests a filter structure in the form of a first order lowpass filter followed by a cascade of identical first order allpass filters. Again, the overall structure is characterized by a single multiple pole at s = −λ.
The Laguerre kernels may be linked to the gamma kernels through the following relation

L_n(λ,t) = √(2/λ) Σ_{i=1}^{n+1} C(n, i−1) (−2)^{i−1} g_i(λ,t)   (14)

One can also realize the Laguerre kernels in the discrete domain. The corresponding z-domain representation will then be

L_n(z) = (√(λ(2−λ)) z^{−1} / (1 − (1−λ) z^{−1})) ((z^{−1} − (1−λ)) / (1 − (1−λ) z^{−1}))^n,   n = 0, 1, 2, ...   (15)
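The orthonormality relation (12) can be spot-checked by numerical integration, building the kernels from the Laguerre polynomials via the relation L_n(λ,t) = √(2λ) L_n(2λt) e^{−λt} of (20); an illustrative sketch (all names are ours):

```python
import math

def laguerre_poly(n, t):
    # Laguerre polynomial: L_n(t) = sum_i C(n,i) (-t)^i / i!,  cf. Eq. (18)
    return sum(math.comb(n, i) * (-t) ** i / math.factorial(i) for i in range(n + 1))

def laguerre_kernel(n, lam, t):
    # Laguerre kernel via Eq. (20)
    return math.sqrt(2 * lam) * laguerre_poly(n, 2 * lam * t) * math.exp(-lam * t)

# Numerical check of the orthonormality relation, cf. Eq. (12)
lam, dt, T = 0.8, 1e-3, 50.0
steps = int(T / dt)
for n, m in [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]:
    ip = sum(laguerre_kernel(n, lam, k * dt) * laguerre_kernel(m, lam, k * dt) * dt
             for k in range(steps))
    target = 1.0 if n == m else 0.0
    assert abs(ip - target) < 1e-2
```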
Before proceeding to the problem of signal representation using the Laguerre kernels, we would like to introduce an orthogonal family of polynomials that sheds some light on the origins of the Laguerre kernels.
Laguerre Polynomials
A family of polynomials closely related to the Laguerre kernels is the family of Laguerre polynomials. Laguerre polynomials have been studied extensively in the literature [7]. These polynomials are orthogonal on the interval (0,∞) with respect to the weight function e^{−t}. In other words,

∫_0^∞ L_n(t) L_m(t) e^{−t} dt = δ_nm   (16)

where L_n(t) is a polynomial. Laguerre polynomials satisfy the equation1

L_n(t) = (e^t / n!) d^n/dt^n (t^n e^{−t}),   n = 0, 1, 2, ...   (17)

which can be expanded as

L_n(t) = Σ_{i=0}^{n} C(n,i) (−t)^i / i!   (18)
Like any other orthogonal polynomial system, the Laguerre polynomials can be computed using a triple recursion relation. The recursive relation (also called the Christoffel-Darboux formula) [7] is

L_{n+1}(t) = ((2n + 1 − t) / (n + 1)) L_n(t) − (n / (n + 1)) L_{n−1}(t)   (19)

Comparing (11) and (18), the Laguerre polynomials can be related to the Laguerre kernels through the expression

L_n(λ,t) = √(2λ) L_n(2λt) e^{−λt}   (20)

1. A more general definition for the Laguerre polynomials also exists, where the polynomials L_n^α(t) are orthogonal with respect to the weight t^α e^{−t} and satisfy the following equation [7]:
L_n^α(t) = (e^t t^{−α} / n!) d^n/dt^n (e^{−t} t^{n+α})
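The triple recursion (19) and the explicit sum (18) can be checked against each other numerically; a small sketch with illustrative names:

```python
import math

def laguerre_explicit(n, t):
    # Explicit sum, cf. Eq. (18)
    return sum(math.comb(n, i) * (-t) ** i / math.factorial(i) for i in range(n + 1))

def laguerre_recursive(n, t):
    # Triple recursion, cf. Eq. (19): L_{n+1} = ((2n+1-t) L_n - n L_{n-1}) / (n+1)
    Lm1, L = 1.0, 1.0 - t          # L_0 and L_1
    if n == 0:
        return Lm1
    for k in range(1, n):
        Lm1, L = L, ((2 * k + 1 - t) * L - k * Lm1) / (k + 1)
    return L

for n in range(8):
    for t in (0.0, 0.5, 1.7, 3.2):
        assert abs(laguerre_recursive(n, t) - laguerre_explicit(n, t)) < 1e-9
```

The recursion is the numerically preferred route for large n, since the alternating sum in (18) loses precision.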
Laguerre Polynomials and the Confluent Hypergeometric Equation
The second-order ordinary differential equation

t x''(t) + (γ − t) x'(t) − α x(t) = 0   (21)

is called the confluent hypergeometric equation [8][9][10]. The parameters α and γ can assume any real or complex value, except that γ ≠ 0, −1, −2, .... The variable t may be complex. One of the solutions of (21) is called the Kummer function (or the confluent hypergeometric function) and it is denoted by 1F1(α,γ;t). This function can be computed by evaluating the infinite series

1F1(α,γ;t) = 1 + (α/γ)(t/1!) + (α(α+1)/(γ(γ+1)))(t²/2!) + ...   (22)

Many well-known transcendental functions, including the Laguerre polynomials, happen to be special cases of Kummer's function with different selections of the parameters α and γ. Table 1 lists some of these functions. For an extensive list see [9]. Referring to Table 1, it can be seen that the Laguerre polynomials satisfy the differential equation

t L_n''(t) + (1 − t) L_n'(t) + n L_n(t) = 0,   n = 0, 1, 2, ...   (23)

whose solution is

L_n(t) = 1F1(−n, 1; t)   (24)
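The series (22) and the identity (24) lend themselves to a direct numerical check; since the series terminates when α = −n, the comparison is exact up to rounding. An illustrative sketch (names are ours):

```python
import math

def kummer_1F1(a, g, t, terms=60):
    # Partial sum of Eq. (22): 1F1(a,g;t) = sum_k (a)_k/(g)_k * t^k/k!
    s, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) / (g + k) * t / (k + 1)
        s += term
    return s

def laguerre(n, t):
    # Explicit Laguerre polynomial, cf. Eq. (18)
    return sum(math.comb(n, i) * (-t) ** i / math.factorial(i) for i in range(n + 1))

# Eq. (24): L_n(t) = 1F1(-n, 1; t)
for n in range(6):
    for t in (0.3, 1.0, 2.5):
        assert abs(kummer_1F1(-n, 1, t) - laguerre(n, t)) < 1e-9

# and the exponential row of Table 1: 1F1(a, a; t) = e^t
assert abs(kummer_1F1(2.0, 2.0, 1.3) - math.exp(1.3)) < 1e-9
```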
Parametric Representation of Signals Using the Gamma Kernels
Decomposition of signals in terms of linear sums of basis functions is a powerful tool that allows the representation of a signal as a discrete table of numbers, in lieu of its original form. The new representation is essentially the projection of the original signal onto the hyperplane spanned by the basis functions φ_i(t):

x̂(t) = Σ_i w_i φ_i(t)   (25)

Choice of these basis functions may vary depending on the application. The basis functions can be fixed as well as adaptive. Examples of popular basis functions are complex exponentials, which lead to the Fourier series expansion; sinc functions, which are used in representing bandlimited signals in terms of their time samples [11]; and Gabor functions, which have the best time-frequency resolution properties and consequently allow the
maximum information transfer rate [12]. Among these, complex exponentials secure themselves a special place by being the eigenfunctions of linear systems. Even though complex exponentials prove to be an invaluable tool in analyzing linear systems, they do not seem to be very efficient (in terms of the number of bases required) in representing transient signals and signals with low degrees of freedom and long sustenance in time. This is mainly due to the fixed projection manifold these basis functions define.

Table 1 Some well-known transcendental functions derived from Kummer's function

Function              t       α         γ      Relation to 1F1(α,γ;t)
Bessel                2jt     ν+1/2     2ν+1   Γ(1+ν) e^{jt} (t/2)^{−ν} J_ν(t)
Modified Bessel       2t      ν+1/2     2ν+1   Γ(1+ν) e^{t} (t/2)^{−ν} I_ν(t)
Coulomb Wave          2jt     L+1−jη    2L+2   e^{jt} F_L(η,t) t^{−(L+1)} / C_L(η)
Incomplete Gamma      −t      a         a+1    a t^{−a} γ(a,t)
Exponential           t       a         a      e^{t}
Trigonometric         2jt     1         2      e^{jt} sin(t) / t
Hyperbolic            2t      1         2      e^{t} sinh(t) / t
Hermite               t²      −n        1/2    (−1)^n (n!/(2n)!) H_{2n}(t)
Error Function        −t²     1/2       3/2    (√π/2) erf(t) / t
Exponential Integral  t       1         1      e^{t} Ei(t)
Laguerre              t       −n        1      L_n(t)
A long-neglected family of linear system eigenfunctions is the family of exponentials given by

$\phi_i(t) = t^{r_i}\, e^{-\lambda_i t}\, u(t), \quad 0 \le r_i < \infty, \quad i = 0, 1, \ldots, \quad \lambda_i > 0$ (26)
Closely related to the complex exponentials, this family proves to be an efficient tool in representing signals with redundancies [13]. In this study, we propose to use the gamma kernels to linearly expand a given signal. The gamma kernels can be considered a subset of the exponential family in which all the time scales are constrained to be the same, i.e. $\lambda_i = \lambda$. The Laguerre kernels would be an equally good candidate, since these two kernel families span the same space. Both kernels happen to be eigenfunctions of autoregressive moving average (ARMA) systems. Therefore, they are inherently able to capture the dynamics of signals that accept the linear model.
When the number of bases used in the decomposition is finite, the representation problem transforms into an approximation problem. In order to assess the goodness of the approximation (or equivalently the closeness of the representation) one needs a measure called a norm. Two of the popular norms are $L^2$ and $L^\infty$, which are the special cases ($p = 2, \infty$) of the $L^p$ norm defined as

$L^p\{f(t)\} = \left[\int_0^\infty |f(t)|^p \, dt\right]^{1/p}$ (27)

The $L^2$ norm, which will be used throughout this study, finds application in system identification and prediction, while $L^\infty$ has been shown to be of interest for control synthesis problems. The $L^2$ norm can be interpreted as the Euclidean distance. In optimization it has the advantage of yielding a set of linear equations in terms of the basis weights. The approximations obtained using the $L^2$ norm are called the least squares or mean square error approximations.
Statement of the Problem
We would like to approximate a signal d(t) as a weighted sum of gamma (Laguerre) kernels

$d(t) \approx y(t) = \sum_{n=1}^{K} w_n\, g_n(\lambda, t) = W^T G$ (28)

$W = [w_1, w_2, \ldots, w_K]^T$ : weight vector
$G = [g_1(\lambda, t), \ldots, g_K(\lambda, t)]^T$ : basis vector

such that the mean square error (MSE) J

$J = \int_0^\infty e^2(t)\, dt, \qquad e(t) = d(t) - y(t)$ (29)
is minimized. In other words, we are trying to approximate the signal d(t) by its projection onto the K-dimensional subspace of gamma (Laguerre) kernels. This way we will obtain a parametric representation of d(t) in terms of the weight vector W and the time scale λ. The gamma and Laguerre kernels each constitute a complete set in $L^2$. That means that by selecting the approximation order K sufficiently high, the mean square error J can be made arbitrarily small. There are two questions that need to be answered in this optimization
problem:
i) What should be the optimum value of the weights?
ii) What should be the optimum value of the time scale λ such that the MSE is minimized?
The answer to the first question was given by Wiener, Kolmogorov and Krein [14][15] early this century. The solution is embodied in the Wiener-Hopf equations (normal equations). Here we will be attacking a generalization of these equations due to the extra time scale parameter λ involved in the gamma bases. Because of this parameter, the problem is called a parametric least squares approximation. A thorough discussion of the parametric least squares problem can be found in [16]. Before proceeding further with the calculation of the optimum weights and the time scale, we would like to inspect the structure of the space spanned by the set of gamma bases {g_n(λ, t) : n = 1, 2, ...}, to get insight into the roles of the weights and the time scale in the approximation.
Structure of the Gamma Space
Here we will inspect the gamma bases in a vector space framework. Notice that although the gamma bases are linearly independent, they are not orthogonal. If we regard each basis as a vector in $L^2$ space, then the cosine of the angle between two vectors will be equal to their correlation coefficient. The correlation between the bases $g_n(\lambda,t)$ and $g_m(\lambda,t)$ is given by [6]

$\langle g_n, g_m \rangle = \int_0^\infty g_n(\lambda,t)\, g_m(\lambda,t)\, dt = \frac{\lambda^{n+m}}{(n-1)!\,(m-1)!} \int_0^\infty t^{n+m-2}\, e^{-2\lambda t}\, dt$ (30)

Making the substitutions $\lambda = \beta/2$ and $n+m-2 = k-1$, we can rewrite this as

$\langle g_n, g_m \rangle = \frac{(k-1)!\,\lambda}{(n-1)!\,(m-1)!\,2^{n+m-1}} \int_0^\infty \frac{\beta^k\, t^{k-1}}{(k-1)!}\, e^{-\beta t}\, dt$ (31)

The integrand is recognized as $g_k(\beta,t)$, the area under which is unity due to the normalization given in (2). Therefore,

$\langle g_n, g_m \rangle = \frac{\lambda}{2^{n+m-1}} \binom{n+m-2}{n-1}$ (32)

With that in mind, the cross-correlation coefficient is given as

$\rho_{nm} = \frac{\langle g_n, g_m \rangle}{\|g_n\|\,\|g_m\|} = \frac{(n+m-2)!}{\sqrt{(2n-2)!\,(2m-2)!}}$ (33)

Hence we obtain the angle between $g_n(t)$ and $g_m(t)$ as

$\angle(g_n, g_m) = \cos^{-1}\left[\frac{(n+m-2)!}{\sqrt{(2n-2)!\,(2m-2)!}}\right]$ (34)
Equation (34) tells us that the angles among the gamma basis vectors do not change as a function of λ. This is an important property of the space constructed using the gamma bases: λ conformally maps the basis vectors onto another manifold, conformal mappings being the class of mappings that preserve angle relations. Figure 3 demonstrates the angular relation among the first three bases. Even though the space itself is parametric in λ, the infrastructure of the space does not change, because λ leaves the relations among the bases intact.

We can view λ as an extra degree of freedom which has the effect of changing the relative angle of the manifold with respect to the desired signal d(t). The lengths of the bases actually do change as a function of λ. However, in a decomposition problem, scaling of the basis functions is insignificant, since a proper choice of the weights will counterbalance it.
Figure 3 Angular relation among the bases
Figure 4 The effect of changing λ on the gamma space
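The angular relations of Figure 3 can be reproduced numerically. The following sketch (an illustration, not from the original) integrates the gamma kernels on a dense grid and confirms both the angles predicted by (34) and their independence of λ:

```python
import math
import numpy as np

def gamma_kernel(n, lam, t):
    """Gamma kernel g_n(lam, t) = lam^n t^(n-1) e^(-lam t)/(n-1)!, for t >= 0."""
    return lam**n * t**(n - 1) * np.exp(-lam * t) / math.factorial(n - 1)

def angle_deg(n, m, lam):
    """Angle between g_n and g_m in L2, computed by numerical integration."""
    t, dt = np.linspace(0.0, 80.0 / lam, 400_000, retstep=True)
    gn, gm = gamma_kernel(n, lam, t), gamma_kernel(m, lam, t)
    rho = np.dot(gn, gm) / math.sqrt(np.dot(gn, gn) * np.dot(gm, gm))
    return math.degrees(math.acos(rho))

# The angles match (34) and, as the text states, do not depend on lambda.
for lam in (0.5, 1.0, 3.0):
    assert abs(angle_deg(1, 2, lam) - 45.0) < 0.1
    assert abs(angle_deg(2, 3, lam) - 30.0) < 0.1
    assert abs(angle_deg(1, 3, lam) - 65.9) < 0.1
```

The common step size dt cancels in the correlation coefficient, so only the relative shape of the kernels matters, which is precisely the conformal-mapping property of λ.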
Optimum Value of the Weights
For a fixed λ, the best choice of weights can be found from the condition $\partial J / \partial W = 0$, which yields the normal equations [14]

$R(\lambda)\, W_{opt} = P(\lambda)$ (35)

(From (34), the angles shown in Figure 3 are $\theta_{12} = 45°$, $\theta_{23} = 30°$, $\theta_{13} = 65.9°$.)

The P vector in (35) holds the cross-correlation values between the bases and d(t). It has the form

$P(\lambda) = [\langle d, g_1 \rangle, \ldots, \langle d, g_K \rangle]^T$ (36)

In (35), R(λ) is the Gram matrix of the bases. Its (n,m)th element is $\langle g_n, g_m \rangle$, which was already derived in (32). The result is duplicated here for convenience:

$\langle g_n, g_m \rangle = \frac{\lambda}{2^{n+m-1}} \binom{n+m-2}{n-1}$ (37)
Using (32), the Gram matrix R(λ) can be written as

$R(\lambda) = \lambda\, R_c$ (38)

where $R_c$ is a constant matrix whose (n,m)th element is

$[R_c]_{nm} = \frac{1}{2^{n+m-1}} \binom{n+m-2}{n-1}$ (39)
For convenience, we will write $g_n(\lambda,t)$ as $g_n(t)$ from this point on unless there is a possibility of confusion. With that in mind, the optimum weights are given by

$W_{opt} = R^{-1}(\lambda)\, P(\lambda)$ for the gamma kernels (40)

If the approximation problem defined in (28) is carried out using the Laguerre kernels instead of the gamma kernels, the autocorrelation matrix in (35) becomes an identity matrix, and in return the optimum weights have a much simpler form

$W_{opt} = P(\lambda)$ for the Laguerre kernels (41)

where $P(\lambda) = [\langle d, L_0 \rangle, \ldots, \langle d, L_{K-1} \rangle]^T$.
For a given λ, the mean squared error J is quadratic in the weights. Therefore $W_{opt}$ has a single solution, and that solution depends on λ. That is why this is called a parametric least squares problem.

Let us evaluate the mean squared error when the weights are chosen to be optimum. This is called the Schur complement:

$J_{sch}(\lambda) = J(\lambda)\big|_{W = W_{opt}} = E_d - 2 W_{opt}^T P + W_{opt}^T R\, W_{opt} = E_d - W_{opt}^T P$ (42)

where the last equality follows from $R\, W_{opt} = P$. In (42), $E_d$ denotes the energy of the desired signal d(t). One gets a simpler form for the Schur complement if the Laguerre kernels are used as the basis. Combining (41) and (42),

$J_{sch}(\lambda) = E_d - W_{opt}^T W_{opt} = E_d - \sum_{n=0}^{K-1} w_{opt,n}^2$ for the Laguerre kernels (43)
The value of the Schur complement $J_{sch}(\lambda)$ is the same for the gamma kernels and the Laguerre kernels; however, the weights of the decomposition may be different. In the following section we will attempt to minimize $J_{sch}(\lambda)$ and decouple the optimum value of the time scale λ from that of the optimum weights.
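The decomposition just described can be sketched numerically. The snippet below (an illustration; the desired signal d(t) = t e^{-t} is an assumption, not from the text) builds the Gram matrix R(λ) and cross-correlation vector P(λ) by numerical integration, solves the normal equations (35), and evaluates the Schur complement (42):

```python
import math
import numpy as np

def gamma_kernel(n, lam, t):
    """Gamma kernel g_n(lam, t) = lam^n t^(n-1) e^(-lam t)/(n-1)!, for t >= 0."""
    return lam**n * t**(n - 1) * np.exp(-lam * t) / math.factorial(n - 1)

def schur_complement(d, lam, K, t, dt):
    """J_sch(lam) of (42): residual energy after projecting d onto g_1..g_K."""
    G = np.stack([gamma_kernel(n, lam, t) for n in range(1, K + 1)])
    R = G @ G.T * dt              # Gram matrix R(lam), cf. (32)
    P = G @ d * dt                # cross-correlation vector P(lam), cf. (36)
    W = np.linalg.solve(R, P)     # normal equations (35)
    Ed = np.dot(d, d) * dt        # energy of the desired signal
    return Ed - W @ P             # Schur complement (42)

t, dt = np.linspace(0.0, 60.0, 600_000, retstep=True)
d = t * np.exp(-t)                # assumed desired signal: d(t) = g_2(1, t)

# d lies in the gamma space at lam = 1, so J_sch is ~0 there, larger elsewhere
J_match = schur_complement(d, 1.0, K=3, t=t, dt=dt)
J_off = schur_complement(d, 4.0, K=3, t=t, dt=dt)
assert J_match < 1e-3 and J_match < J_off
```

Since d(t) = t e^{-t} equals g_2(1, t) exactly, the projection at λ = 1 leaves essentially zero residual, while a mismatched λ does not; this is the dependence of J_sch on λ that the next section examines.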
Optimum Value of the Time Scale λ
Previous work
In this section we will address the problem of finding the optimum time scale for signal approximation using the gamma and the Laguerre kernels, and derive an optimality condition that is independent from the optimum weight condition. Determination of the optimum time scale is not a new issue. The literature contains numerous publications on the approximation of signals using the Laguerre kernels [17]-[24], all in the quest of the optimum time scale. Although no exact solution has been found yet, the works of Parks [22] and Masnadi [24], who managed to solve the problem for special cases, are worth mentioning here. In his work, Parks gives an estimate of the optimum λ for a special class of continuous time signals which are defined by two moments. Later, Fu [23] extended this study to discrete time signals. Meanwhile, Masnadi showed that when the order of approximation is chosen larger than a lower bound, and when dealing with signals with no repeated poles, the optimum λ is the only real root of a polynomial. He gave the coefficients of this polynomial in terms of the poles of the signal that is to be approximated. Both Parks' and Masnadi's works are summarized in Appendix B. In this study we will derive the optimality conditions for the time scale and devise a moment-based method to estimate the optimum value satisfying that condition.
Optimum λ
Let us look at the value of the mean squared error for a given λ when the optimum weights are chosen. As explained before, this is called the Schur complement, and it occurs when e(t), the error of approximation, is orthogonal to all the bases used (i.e. the normal equations are satisfied). Before proceeding to the calculation of the optimum λ from the condition $\partial J_{sch}/\partial\lambda = 0$, let us take a look at the dependence of $J_{sch}$ on λ. Figure 5 depicts the dependence of $J_{sch}$ on λ when d(t) is chosen as the impulse response of an arbitrary third order elliptic filter. This figure is obtained by evaluating the optimum weights in (40) for different values of λ and approximation order K, and calculating the corresponding mean square error numerically (see Appendix E). The structure of this figure is typical for both the continuous time and discrete time cases. As stated before, the $J_{sch}$ curves are identical for the choice of gamma and Laguerre kernels, since these two kernel families span the same space.
Figure 5 Typical structure of $J_{sch}$ curves
As can be seen from Figure 5, the $J_{sch}$ curves for the Kth order and (K+1)st order approximations coincide at the points where $\partial J_{sch}/\partial\lambda = 0$ is satisfied. That is, two consecutive curves touch each other at the local minima or local maxima. A proof of this property can be found in Appendix D; for a discrete time version of this proof see Silva [25]. Since $J_{sch}$ has the same value for orders K and K+1 at the local minima, this implies that the (K+1)st kernel does not contribute to the approximation of d(t) at these points. In other words, the nonzero component of the (K+1)st kernel that is orthogonal to the first K kernels is also orthogonal to d(t). For the Laguerre kernels this corresponds to the (K+1)st Laguerre kernel $L_K(\lambda,t)$ itself, since this is orthogonal to $L_0(\lambda,t), \ldots, L_{K-1}(\lambda,t)$ by definition. We can write the condition for the optimum λ as

$\frac{\partial J_{sch}}{\partial\lambda} = 0 \iff w_{K+1} = 0$ (44)
Unlike the Laguerre kernels, the (K+1)st gamma kernel is not orthogonal to its predecessors. Let us denote the component of $g_{K+1}(\lambda,t)$ that is orthogonal to $g_1(\lambda,t), \ldots, g_K(\lambda,t)$ by $\hat{g}_{K+1}(\lambda,t)$. It can be obtained by subtracting the projections onto the first K kernels from the (K+1)st kernel. Actually, this new kernel turns out to be a scaled version of $L_K(\lambda,t)$. For the gamma kernels, the general form of the ith orthogonal component is given by

$\hat{g}_i(\lambda,t) = \sum_{v=1}^{i} a_{iv}\, g_v(\lambda,t) = A_i^T G, \quad i = 1, 2, \ldots$

where $a_{iv} = (-1)^v\, 2^{v-1} \binom{i-1}{v-1}$ and $A_i = [a_{i1}, \ldots, a_{ii}]^T$ (45)
The optimality condition of (44) can be rewritten for the gamma kernels as

$\frac{\partial J_{sch}}{\partial\lambda} = 0 \iff A_{K+1}^T\, P(\lambda) = 0$ (46)

The cross-correlation vector P was already defined in (36); this time its dimension is K+1. There are no weight terms in (44) and (46), which means that the optimum value of λ can be computed separately from that of the weights. Given d(t), one can first determine the best orientation of the manifold of order K (i.e. λ) to represent d(t), and then on this manifold compute the weights of the bases using the normal equation (40) or (41), depending on the kernel used.
Challenges in computing the optimum time scale
Equations (44) and (46) state necessary conditions for the optimum time scale. However, these are only necessary conditions, not sufficient ones. Due to the nonlinear dependence on λ, the $J_{sch}$ function has a nonconvex shape. For an order K approximation, $J_{sch}$ has K minima, only one being the global minimum. The optimality conditions specified in (44) and (46) hold for all the minima and therefore do not distinguish the global minimum from the local ones. In a way this is expected, since the condition was obtained by setting the gradient of $J_{sch}$ to zero. Gradient techniques do not suffice to find the global extremum by themselves in nonconvex functions. They bear only local information about the function characteristics, and therefore they need to be backed up by global information about the function itself.
Another challenge in computing the optimum time scale is the involvement of the cross-correlation operation. One possibility for overcoming this is to substitute the expectation operator by a time average and, under the assumption of ergodicity, compute the optimal weights. Unfortunately, for the case of (46) there are two problems with that approach: first, the gamma bases are infinite in time, which means that a correlation computed with a finite window may produce severe truncation. The second problem is related to the fact that λ is an implicit variable in the correlation. Therefore the conventional procedure to estimate the normal equation does not hold for our case. In the next section we derive a numerical estimation method which overcomes these difficulties by factoring λ out of the correlation operation, in return transforming the condition given in (46) into a polynomial equation in λ.
Estimation of the optimum λ using moment matching
In this section we will address the problem of estimating the value of λ that minimizes the mean square error J. As stated before, the optimum λ has to satisfy the normal equation in (46) (or (44) for the Laguerre kernels), while the optimum weights have to satisfy the normal equation in (40) (or (41)). We propose to use the moment matching method, which attempts to estimate λ by equating a finite number of time moments of y(t) to those of d(t), i.e. we want to make

$m_i\{y(t)\} = m_i\{d(t)\}, \quad i = 0, \ldots, K$ (47)

Using (47) we propose to create K+1 linearly independent equations which will enable us to solve for the K weights and λ. The operator $m_i\{\cdot\}$ reads as the ith time moment of its argument. Let us first derive the ith time moment of the nth gamma basis, which is defined as

$m_i\{g_n(\lambda,t)\} = \int_0^\infty t^i\, g_n(\lambda,t)\, dt$ (48)
Matching the moments in time is the dual of a Taylor series approximation around f = 0 in the frequency domain: if the first K moments are matched in the time domain, this corresponds to a Kth order Taylor series approximation performed in the frequency domain.
Inserting the definition of $g_n(\lambda,t)$ into (48) yields

$m_i\{g_n(\lambda,t)\} = \frac{\lambda^n}{(n-1)!} \int_0^\infty t^{n+i-1}\, e^{-\lambda t}\, dt = \frac{(n+i-1)!}{(n-1)!\,\lambda^i}$ (49)
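The closed form (49) is easy to verify against direct numerical integration of (48); the following sketch (illustrative, with an arbitrarily chosen λ = 0.7) does so:

```python
import math
import numpy as np

def gamma_kernel(n, lam, t):
    """Gamma kernel g_n(lam, t) = lam^n t^(n-1) e^(-lam t)/(n-1)!, for t >= 0."""
    return lam**n * t**(n - 1) * np.exp(-lam * t) / math.factorial(n - 1)

def moment_numeric(i, n, lam):
    """m_i{g_n(lam, t)} of (48), by direct numerical integration."""
    t, dt = np.linspace(0.0, 200.0 / lam, 1_000_000, retstep=True)
    return float(np.dot(t**i, gamma_kernel(n, lam, t)) * dt)

def moment_closed(i, n, lam):
    """Closed form (49): (n+i-1)! / ((n-1)! lam^i)."""
    return math.factorial(n + i - 1) / (math.factorial(n - 1) * lam**i)

# the two agree for a range of moment orders i and kernel indices n
for i in range(4):
    for n in range(1, 5):
        ref = moment_closed(i, n, 0.7)
        assert abs(moment_numeric(i, n, 0.7) - ref) < 1e-3 * max(1.0, ref)
```

Note how λ factors completely out of the moment as $\lambda^{-i}$; this is exactly the property the estimation method exploits below.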
From this we can calculate the ith time moment of the estimate y(t)
mi{y(t)} =mi {GT,,.+,W,,,i,}
1
Ti ' i l.+ T
(50)
The weight vector W and basis vector G were already defined in (28). Now their dimen
sion is K+1. Here qi is a row vector given as
(51)
Inserting (50) into (47) one gets
qiW
i mi {d(t) }
A*
i = O...K
(52)
These K+1 equations can be put into matrix form:

$Q\, W = \Lambda\, M_d$ (53)

where

$M_d = \begin{bmatrix} m_0\{d(t)\} \\ \vdots \\ m_K\{d(t)\} \end{bmatrix}_{(K+1)\times 1}, \quad \Lambda = \mathrm{diag}(1, \lambda, \lambda^2, \ldots, \lambda^K), \quad Q = \begin{bmatrix} q_0 \\ \vdots \\ q_K \end{bmatrix}_{(K+1)\times(K+1)}$ (54)
Now let us force equation (53) to obey the normal equations for the weights, i.e. let us substitute W with $R^{-1}(\lambda)\, P(\lambda) = \frac{1}{\lambda}\, R_c^{-1}\, P(\lambda)$, yielding

$Q\, R_c^{-1}\, P(\lambda) = \lambda\, \Lambda\, M_d$ (55)

We can merge Q and $R_c^{-1}$ together to form the constant matrix $H = Q\, R_c^{-1}$, so that

$H\, P(\lambda) = \lambda\, \Lambda\, M_d$ (56)

and the unknown cross-correlation vector P can be solved as

$P(\lambda) = H^{-1}\, \lambda\, \Lambda\, M_d$ (57)

where H and λΛ are (K+1)×(K+1) and $M_d$ is (K+1)×1.
There is one other normal equation that we have to satisfy:

$A_{K+1}^T\, P(\lambda) = 0$ (58)

Premultiplying (57) by $A_{K+1}^T$ yields the polynomial f(λ):

$f(\lambda) = A_{K+1}^T\, P(\lambda) = A_{K+1}^T\, H^{-1}\, \lambda\, \Lambda\, M_d = 0$ (59)

of order K+1 with constant coefficients. Since we have enforced the normal equations (40) and (58) during the derivation of (59), the roots of the polynomial f(λ) turn out to be estimates of the local minima of $J_{sch}(\lambda)$. f(λ) is a polynomial with constant coefficients, and its roots can be computed using numerical techniques. To demonstrate this method, an example is included.
Example
In this example, the signal to be decomposed is chosen as

$d(t) = t^2 e^{-0.4t} - e^{-0.2t}, \quad t \ge 0$ (60)
Figure 6 shows the normalized mean squared error $J_{sch}$ as a function of λ for approximation orders K = 3 and K = 4. J is computed by determining the optimum weights from (40) for various values of λ. Normalization is done with respect to $E_d$, the energy of the signal to be approximated.

Figure 6 Normalized J as a function of λ

For order K = 3, the optimum value of λ is around $\lambda_{opt} = 0.2$. At $\lambda = \lambda_{opt}$, $J_{normalized} \approx 0.095$. Using the moment matching technique, we will get an estimate of $\lambda_{opt}$ that is close to the theoretical one.
For K = 3, the $M_d$ vector that holds the time moments of d(t) is computed as

$M_d = [m_0\{d(t)\},\ m_1\{d(t)\},\ m_2\{d(t)\},\ m_3\{d(t)\}]^T$ (61)
The matrices $R_c$, Q and H can easily be formed as

$R_c = \frac{1}{32}\begin{bmatrix} 16 & 8 & 4 & 2 \\ 8 & 8 & 6 & 4 \\ 4 & 6 & 6 & 5 \\ 2 & 4 & 5 & 5 \end{bmatrix}, \quad Q = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 2 & 6 & 12 & 20 \\ 6 & 24 & 60 & 120 \end{bmatrix}$ (62)

$H = Q\, R_c^{-1} = \begin{bmatrix} 0 & 8 & -16 & 16 \\ -8 & 56 & -128 & 112 \\ -64 & 432 & -992 & 800 \\ -528 & 3504 & -7872 & 6048 \end{bmatrix}$ (63)
Also, from (45),

$A_4 = [-1,\ 6,\ -12,\ 8]^T$ (64)
Substituting these into (59), one gets

$f(\lambda) = A_4^T\, H^{-1}\, \lambda\, \Lambda\, M_d = (0.39 + 2.93\lambda - 131.8\lambda^2 + 476\lambda^3)\,\lambda = 0$ (65)

The roots of f(λ) are $\lambda_1 = 0$, $\lambda_2 = 0.04$, $\lambda_3 = 0.08$, $\lambda_4 = 0.236$. When $\lambda_{opt}$ is chosen as 0.236, this yields $J_{normalized} \approx 0.11$, which is slightly worse than the theoretical optimum. Figure 7 illustrates the approximation of d(t) obtained with K = 3 and λ = 0.236.
Figure 7 d(t) and its 3rd order approximation
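The matrices of (62)-(64) can be regenerated programmatically from (39), (51) and (45); the sketch below does so for K = 3 and checks them against the printed values (signs restored where the typescript lost them):

```python
import math
import numpy as np

K = 3
binom = math.comb

# R_c from (39): (n,m)th element C(n+m-2, n-1) / 2^(n+m-1)
Rc = np.array([[binom(n + m - 2, n - 1) / 2.0**(n + m - 1)
                for m in range(1, K + 2)] for n in range(1, K + 2)])

# Q from (51)/(54): the ith row q_i has entries (n+i-1)!/(n-1)!
Q = np.array([[math.factorial(n + i - 1) / math.factorial(n - 1)
               for n in range(1, K + 2)] for i in range(K + 1)])

H = Q @ np.linalg.inv(Rc)        # H = Q Rc^{-1}, cf. (56)

# A_{K+1} from (45): a_v = (-1)^v 2^(v-1) C(K, v-1)
A4 = np.array([(-1)**v * 2**(v - 1) * binom(K, v - 1) for v in range(1, K + 2)])

assert np.allclose(32 * Rc, [[16, 8, 4, 2], [8, 8, 6, 4],
                             [4, 6, 6, 5], [2, 4, 5, 5]])      # (62)
assert np.allclose(H, [[0, 8, -16, 16], [-8, 56, -128, 112],
                       [-64, 432, -992, 800],
                       [-528, 3504, -7872, 6048]])             # (63)
assert np.array_equal(A4, [-1, 6, -12, 8])                     # (64)
```

With these constant matrices in hand, only $M_d$ depends on the particular d(t); the coefficients of the polynomial (65) then follow from one matrix-vector product, and its roots can be found with any numerical root finder (e.g. `numpy.roots`).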
Although we matched the first K+1 time moments of y(t) to those of d(t), one still gets consistent estimates of $\lambda_{opt}$ when different moments are used. Table 2 shows the estimates obtained by matching different moments, which corroborates this claim.
Table 2: Time scale estimates

Moments used    Estimate of λ_opt
0,1,2,3         0.236
0,1,2,4         0.237
0,1,3,4         0.238
0,2,3,4         0.238
1,2,3,4         0.239
2,3,4,5         0.233
1,2,3,5         0.238
As mentioned before, matching the moments in the time domain is equivalent to matching the Taylor series coefficients in the frequency domain around zero frequency. In that sense the method has some limitations. Due to the locality of the information provided by the moments, this method gives satisfactory results for signals whose energy is concentrated around the dc frequency. For signals with higher frequency components, the method will tend to approximate the envelope of the signal. In representing bandpass signals it is advisable to use a complex time scale. The real part of the time scale can be estimated using the moment method, this time with the desired signal being the bandpass signal frequency-shifted to the baseband.

The formulation of the estimation method was carried out such that the number of moments equaled the number of unknowns, K+1 (the K weights and the time scale). If desired, more moments can be used, which transforms this into an overdetermined problem that can be solved using least squares techniques.
Estimation of the optimum λ using a matched filter
The left hand side of (44) represents the cross-correlation between the desired signal and the Kth Laguerre kernel. Kuo and Celebi [26] suggested implementing this equation for different values of λ in the form of a parallel organization of matched filters (Figure 8). This arrangement is very popular in communication systems, because it implements the optimal receiver for a signal constellation in white noise. However, here we are interested in the zero crossings of the matched filter (correlator) outputs instead of the more familiar peak value. Let us state how the system can be utilized. The desired signal is applied to the block of Figure 8, and λ is varied from 0 to $\lambda_{max}$ in increments of $\lambda_{max}/m$ to produce m time signals $s_j(t)$, constructed by correlating the input signal with the Kth order Laguerre kernel. A change of sign from the jth correlator output to the (j+1)st indicates an extremum point on the MSE curve somewhere in between these two time scale values. The position of the zero-crossing can either be estimated or the time scale interval can be further tightened. The individual values then have to be tested to find the one that minimizes the mean square error of the approximation. A rule of thumb is to pick $\lambda_{max}$ approximately 2.5 times the bandwidth of d(t). This upper bound stems from the fact that when λ exceeds it, the last basis does not have a significant contribution to the approximation.
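The correlator bank of Figure 8 can be sketched directly. In the snippet below (illustrative; the desired signal d(t) = t² e^{-0.5t} is an assumption, chosen so that it lies exactly in the K = 3 gamma space at λ = 0.5), the Kth Laguerre kernel is correlated with d(t) over a grid of time scales and the sign changes of the outputs are located:

```python
import math
import numpy as np

def laguerre_poly(n, x):
    """Laguerre polynomial L_n(x) via the three-term recurrence."""
    prev, cur = np.ones_like(x), 1.0 - x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, ((2 * k + 1 - x) * cur - k * prev) / (k + 1)
    return cur

def laguerre_kernel(n, lam, t):
    """L_n(lam, t) = sqrt(2 lam) L_n(2 lam t) e^(-lam t), cf. (20)."""
    return math.sqrt(2 * lam) * laguerre_poly(n, 2 * lam * t) * np.exp(-lam * t)

K = 3
t, dt = np.linspace(0.0, 60.0, 300_000, retstep=True)
d = t**2 * np.exp(-0.5 * t)          # assumed signal, in the gamma space at lam=0.5

lams = np.linspace(0.05, 2.5, 200)   # grid of time scales, 0 < lam <= lam_max
corr = np.array([np.dot(d, laguerre_kernel(K, lam, t)) * dt for lam in lams])

# sign changes of the correlator outputs mark the extrema of J_sch(lam), cf. (44)
crossings = lams[:-1][np.sign(corr[:-1]) != np.sign(corr[1:])]
assert len(crossings) >= 2 and min(abs(crossings - 0.5)) < 0.02
```

Because d(t) lies exactly in the K = 3 gamma space at λ = 0.5, one zero-crossing sits at λ = 0.5, where the MSE is exactly zero; the remaining crossings mark other stationary points, which must still be compared against each other to find the global minimum.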
Figure 9a depicts the dependence of $J_{sch}(\lambda)$ on λ for approximation order K = 3 and the choice of a desired signal whose Laplace transform is given by

$D(s) = \frac{s^3}{(s+1)^2\,(s^2 + s + 4.25)}$ (66)
Figure 9b illustrates the zero-crossings of the correlator outputs, which coincide with the minima of the $J_{sch}$ curve.
Figure 8 Estimation of the optimum time scale using a matched filter implementation: d(t) is correlated with $L_K(j\lambda_{max}/m, t)$ for j = 1, ..., m, and sign changes between consecutive correlator outputs give the optimum λ estimates
Figure 9 a) MSE curve for order K=3. b) Weighted correlator outputs
Online Adaptation of the Weights and the Time Scale
It is possible to solve the optimization problem of (28) using online adaptation, in other words, by gradually adapting the optimization parameters in the direction opposite to the mean square error gradient, i.e.

$\frac{dW}{dt} = -\eta_w\, \nabla_W J(\lambda, W), \qquad \frac{d\lambda}{dt} = -\eta_\lambda\, \frac{\partial}{\partial\lambda} J(\lambda, W)$ (67)

Since this approach is based on a gradient technique, the online adaptation of (67) will only find stationary points of $J_{sch}$, not the global minimum.
Consider the system identification problem depicted in Figure 10. Our task is to adapt the weights and the time scale of the gamma filter such that the mean square error is minimized. This is equivalent to the problem of (28) in the sense that we are approximating the impulse response of the unknown plant as a weighted sum of gamma kernels. The weight update term is obtained by evaluating

$\frac{dW}{dt} = -\eta_w\, \frac{\partial}{\partial W} E\{e^2(t)\}$ (68)
Figure 10 Adaptive system identification problem
where the vector X holds the signals at the taps, i.e.

$X(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_K(t) \end{bmatrix} = \begin{bmatrix} x(t) * g_1(t) \\ \vdots \\ x(t) * g_K(t) \end{bmatrix}$ (69)

Evaluating the partial derivative in (68), one gets

$\nabla W(t) = \frac{dW(t)}{dt} = 2\eta_w\, E\{e(t)\, X(t)\}, \qquad \text{step size } \eta_w > 0$ (70)
Notice that the online adaptation is basically an integration operation

$W(t) = \int_0^t \nabla W(u)\, du$ (71)

which inherently implements time averaging. Hence, if the input signal x(t) is ergodic and the step size is chosen sufficiently small, the ensemble average operator of (70) can be conveniently removed, yielding the least mean squares (LMS) adaptation rule.
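A sampled-time sketch of the resulting LMS weight adaptation is given below (illustrative: the plant weights, time scale, grid, and step size are assumptions, not values from the text). The ensemble average in (70) is dropped, and the taps are formed by discrete convolution with sampled gamma kernels:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def gamma_kernel(n, lam, t):
    """Gamma kernel g_n(lam, t) = lam^n t^(n-1) e^(-lam t)/(n-1)!, for t >= 0."""
    return lam**n * t**(n - 1) * np.exp(-lam * t) / math.factorial(n - 1)

dt = 0.01
t = np.arange(0.0, 200.0, dt)          # simulation grid
tk = np.arange(0.0, 25.0, dt)          # kernel support (g_n ~ 0 beyond this)
x = rng.standard_normal(t.size)        # ergodic white input

K, lam = 3, 1.0
taps = np.stack([np.convolve(x, gamma_kernel(n, lam, tk))[:t.size] * dt
                 for n in range(1, K + 1)])   # tap signals x_n = x * g_n, cf. (69)

w_true = np.array([0.5, -1.0, 2.0])    # assumed unknown plant weights
d = w_true @ taps                      # plant output

W, eta = np.zeros(K), 2.0
for i in range(t.size):
    e = d[i] - W @ taps[:, i]          # instantaneous error e(t)
    W += 2 * eta * e * taps[:, i]      # sampled (70) with the expectation dropped

assert np.allclose(W, w_true, atol=0.05)
```

Since the plant here is exactly representable in the gamma space, the instantaneous-gradient updates drive the error to zero and the weights converge to the plant's weights.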
The update term for λ can be derived similarly by partially differentiating the mean square error with respect to λ, i.e.

$\frac{d\lambda}{dt} = -\eta_\lambda\, \frac{\partial J}{\partial\lambda} = 2\eta_\lambda\, E\left\{e(t)\, \frac{\partial y(t)}{\partial\lambda}\right\} = 2\eta_\lambda\, E\left\{e(t)\, W^T \frac{\partial G}{\partial\lambda}\right\}$ (72)

The vector $G_{K\times 1}$ holds the gamma kernels, and its form was already given in (28). As derived in Appendix A, the derivative of the gamma kernels with respect to λ can be written as a difference of kernels:

$\frac{\partial g_n(\lambda,t)}{\partial\lambda} = \frac{n}{\lambda}\left[g_n(\lambda,t) - g_{n+1}(\lambda,t)\right]$

Taking this into account, we can write

$\frac{d\lambda}{dt} = 2\eta_\lambda\, E\left\{e(t) \sum_{n=1}^{K} w_n\, \frac{n}{\lambda}\left[x_n(t) - x_{n+1}(t)\right]\right\}, \qquad \eta_\lambda > 0$ (73)

where $x_{K+1}(t) = x(t) * g_{K+1}(\lambda,t)$, so that K+1 tap signals are involved.
Again, if the input signal x(t) is ergodic and the step size is small enough, the expectation
operator in (73) can be conveniently ignored.
Conclusions
This chapter introduced the gamma kernels, a special class of exponentials, and the filter structure associated with them. The gamma kernels were conceived as generalized delay operators, with the purpose of substituting the tapped delay lines which performed poorly in processing signals with redundancies. The gamma kernels are characterized by a single time scale parameter that linearly controls the amount of delay the kernels induce.
Having linked the gamma kernels to their orthogonal counterpart, the Laguerre kernels, we identified the two as special cases of Kummer's function, which subsumes numerous well-known transcendental functions (Hermite, Bessel, Gaussian, etc.).
The main problem attacked in this chapter was that of representing a given signal in the linear projection space of the gamma kernels such that the Euclidean distance between the signal and its projection is minimized. We formulated this problem in a vector space framework and interpreted the role of the time scale as changing the relative angle between the projection manifold and the signal of interest. The solution of the problem turned out to be an extension of the Wiener-Hopf equation. We decoupled the solution for the optimum time scale from that for the optimum weights. The optimum time scale problem remains analytically unsolved; we developed two techniques, one based on moment matching and the other on matched filtering, to numerically estimate the optimum time scale.
There are several open questions that still must be addressed. Allowing the time scale to be complex would make the processing of bandpass signals possible. Some related work has been done in the area of Kautz filters [27], but there are many unanswered questions.

In addition to the time moment method presented in this study, other global information measures can be sought to solve the problem of the optimum time scale. This global information can help gradient descent techniques avoid local minima during adaptation.
The highly organized structure of the $J_{sch}$ curves must be investigated further. There may be potential information stored in these curve structures that can be extracted to help position the global time scale. Furthermore, we have observed very similar $J_{sch}$ curve structures when the gamma kernels are replaced with the Hermite kernels [28]. This indicates the possibility that other members of the Kummer family may exhibit similar phenomena.
CHAPTER 3
TIME-FREQUENCY REPRESENTATIONS
Introduction
Standard spectral estimation methods are based on the implicit assumption that the signals being analyzed are characterized by a stationary spectrum, in other words a spectrum that does not vary in time. This assumption is not justified for a large number of signal processing applications. The spectral characteristics of signals arising in nature often evolve in time due to the changing dynamics of their underlying processes; prime examples are human speech, seismic signals and biological signals. It is often desirable to view the spectral evolution of these signals. This can be achieved by expressing the time varying signal as a joint time-frequency distribution which depicts the signal energy (or intensity) per unit time, per unit frequency. There are several time-frequency distribution methods available, each with its own motivation and properties. The most prominent methods are the Wigner-Ville distribution, the Page distribution, the Rihaczek distribution and the spectrogram. In this chapter we will briefly go over these techniques and show where the proposed method of spectral estimation using Poisson moments falls in the global picture of time-frequency distributions. We start by introducing the uncertainty principle, which is an important concept in understanding time-frequency distributions.
Uncertainty Principle
The Fourier transform is the most commonly used spectral analysis tool. It decomposes a signal into its frequency components. In doing so, however, it fails to reflect when these frequency components occur or how they evolve, at least not directly. Hence, while possessing infinite frequency resolution, the Fourier transform representation lacks time resolution. In the case of time varying signals, where spectral characteristics vary over time, this poses a problem, since the spectral variations cannot be seen explicitly in this domain. Instead, the Fourier transform displays an average of the stationary segments of the overall signal. One way of attaining time resolution is by placing a time window over the time segment of interest and evaluating the so-called short term Fourier transform (STFT) of the signal. This method makes sure that only the portion of the signal that is of interest is taken into account. There is an inherent side effect of time windowing, though, which shows itself in the form of spectral smoothing and thereby loss of frequency resolution. Spectral smoothing stems from the fact that windowing corresponds to the convolution of the original spectrum with that of the window in the frequency domain. Decreasing the window width in an attempt to increase the time resolution will cause the window spectrum to widen, which in return further degrades the frequency resolution. Hence, time resolution is acquired at the cost of frequency resolution. This stems from the fact that both the window function and its spectrum cannot be made arbitrarily narrow. For techniques that try to acquire locality in time using windows, there is an inherent trade-off between the time and the frequency resolutions. In his attempt to quantify this relation, Gabor [12] defined the effective time duration¹ (Δt) and the effective bandwidth (Δf) of an arbitrary waveform and showed that their product is lower-bounded by a constant as follows:

$\Delta t\, \Delta f \ge \frac{1}{2}$ (74)

This is called the uncertainty principle. At a formal level, it resembles the uncertainty principle in quantum mechanics. The inequality of (74) states that a narrow pulse yields

1. Gabor defines the effective (r.m.s.) duration of a signal f(t) as the variance of its analytic part $\hat{f}(t)$, where $\hat{f}(t) = f(t) + j\tilde{f}(t)$. Here $\tilde{f}(t)$ symbolizes the Hilbert transform of f(t). A parallel definition holds for the effective bandwidth.
a wide spectrum. Due to the coupling of the time duration to the bandwidth there is
always an ambiguity attached to the time frequency measurements that employ window
ing. This ambiguity stems from our attempt to express a function of one independent vari
able (time), in terms of two variables, namely time and frequency. The frequency
.d
parameter (f) corresponds to the operator J in the time domain, and hence it is not an
independent variable1 [29]. To summarize, as far as the signal measurements go one
should talk about the energy within a band during a specified time interval rather than the
instantaneous frequency which cannot be measured.
Gaussian shaped signals (with a possible modulation) satisfy (74) with equality. Those
are the signals with the smallest timebandwidth product and they have the form2
S(t) = exp (t +tj2 "fot +jO (75)
Gabor [12] coined the phrase elementary signal for these signals. He further suggested
expanding a signal into a double sum of delayed and frequency shifted versions of these
elementary pulses. Later, Hellstrom [30] made Gabor's expansion exact by using an inte
gral expansion. Gabor believed that in the timefrequency plane, each of these pulses rep
resented a quantum of information which he called a logon and depicted these information
cells with nonoverlapping rectangles of constant area. It should be remarked that, due to
the fact that the delayed versions of Gaussian signals overlap with each other and the fact
that Gabor's definition of timebandwidth product was arbitrary, the approach of repre
senting signals via logons is only an approximate one. Further critique of Gabor's pioneer
ing work can be found in Lerner [31].
1. Throughout this study, j represents the imaginary unit \sqrt{-1}.
2. Triangular waveforms have slightly worse (10% higher) time-bandwidth products than the Gaussian waveforms.
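The inequality of (74) and the claim of footnote 2 can be checked numerically. The sketch below is our illustration, not part of the dissertation; it assumes the r.m.s. definitions of duration and bandwidth, with the spectral moments computed via the FFT:

```python
import numpy as np

def tb_product(s, t):
    """R.m.s. time-bandwidth product of a finite-energy signal sampled on t."""
    dt = t[1] - t[0]
    e_t = np.abs(s)**2
    tm = np.sum(t * e_t) / np.sum(e_t)                 # mean time
    var_t = np.sum((t - tm)**2 * e_t) / np.sum(e_t)    # squared r.m.s. duration
    S = np.fft.fft(s)
    f = np.fft.fftfreq(len(t), dt)
    e_f = np.abs(S)**2
    fm = np.sum(f * e_f) / np.sum(e_f)
    var_f = np.sum((f - fm)**2 * e_f) / np.sum(e_f)    # squared r.m.s. bandwidth
    return np.sqrt(var_t * var_f)

t = np.linspace(-20, 20, 2**14)
gauss = np.exp(-t**2 / 2)
tri = np.clip(1 - np.abs(t), 0, None)                  # triangular pulse on [-1, 1]

p_gauss, p_tri = tb_product(gauss, t), tb_product(tri, t)
# p_gauss ≈ 1/(4π) ≈ 0.0796 (the lower bound); p_tri ≈ 0.0872, about 10% higher
```

The Gaussian lands on the bound 1/(4\pi), while the triangular pulse sits roughly 10% above it, consistent with footnote 2.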
Time-Frequency Representations
The basic goal of the timefrequency representations is to devise a joint function of
time and frequency, that is a distribution, which represents the energy or the intensity of a
signal simultaneously in time and frequency. It is required that these distributions possess
some desirable properties. Of these, nonnegativity and invariance with respect to time delays and frequency shifts in the signal are important ones [32]. Another desirable feature is the satisfaction of the marginal conditions. This implies that the integral of the
distribution over the entire frequency axis at a given time should yield the signal power at
that moment.
Gabor [12] and Ville [33] were the first ones to address the problem of timefrequency
distributions. Inspired by similar developments in quantum mechanics, their original moti
vation was to improve on the spectrogram which is basically a magnitude squared short
time Fourier transform (STFT) commonly used for speech signal analysis. The spectral
estimation technique that will be introduced in Chapter 5 can be formulated as a spectro
gram technique. For that reason we will start with this relatively simple technique and
relate it to other timefrequency representations.
We can write the spectrogram as the magnitude squared Fourier transform of a windowed signal. That is

P(t,f) = \left| \int s(u)\, h^*(u-t)\, e^{-j2\pi f u}\, du \right|^2 = \left| \int S(v)\, H^*(v-f)\, e^{j2\pi v t}\, dv \right|^2 \qquad (76)

Here s(t) is the signal of interest, while h(t) is a window that tapers to zero. As explained in the previous section, like any other window-based technique, the spectrogram has a time-frequency resolution tradeoff associated with it. Equation (76) can alternatively be written as

P(t,f) = \iint s(x)\, s^*(y)\, h^*(x-t)\, h(y-t)\, e^{-j2\pi f(x-y)}\, dx\, dy \qquad (77)
This bivariate energy density has the units of Joules per Hertz per second. Substituting

x = u + \frac{\tau}{2}, \qquad y = u - \frac{\tau}{2} \qquad (78)

we can rewrite it in the form

P(t,f) = \iint s\!\left(u+\frac{\tau}{2}\right) s^*\!\left(u-\frac{\tau}{2}\right) h^*\!\left(u-t+\frac{\tau}{2}\right) h\!\left(u-t-\frac{\tau}{2}\right) e^{-j2\pi f\tau}\, du\, d\tau \qquad (79)

In expanding (76) to this equivalent form, we have mapped the one-dimensional window h(t) to the two-dimensional kernel h^*(t+\tau/2)\,h(t-\tau/2), which is a function of time and lag \tau. If h(t) is equal to rect(t/T), that is, a rectangular window of width T, the two-dimensional kernel would manifest a diamond shape as depicted in Figure 11.
Figure 11 Two-dimensional spectrogram kernel
The width T of the kernel along the t-axis determines the time resolution, while the reciprocal of the width 2T along the \tau-axis determines the frequency resolution [32]. Thus, when using the spectrogram method, it is impossible to improve the frequency resolution without inducing a loss of time resolution.
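Equation (76) translates directly into a short-time FFT. The sketch below is our illustration (the function name and parameters are ours, not the dissertation's); note that the frame-local FFT differs from the e^{-j2\pi fu} phasing of (76) only by a pure phase, which the squared magnitude discards. The window length sets the tradeoff discussed above:

```python
import numpy as np

def spectrogram(s, h, hop, fs):
    """P(t,f) = |sum_u s(u) h*(u-t) exp(-j2*pi*f*u)|^2, evaluated frame by frame."""
    n = len(h)
    frames = np.array([s[i:i+n] * np.conj(h)
                       for i in range(0, len(s) - n + 1, hop)])
    S = np.fft.rfft(frames, axis=1)          # Fourier transform of each windowed frame
    return np.abs(S)**2, np.fft.rfftfreq(n, 1.0/fs)

fs = 1000.0
t = np.arange(0, 1, 1/fs)
s = np.sin(2*np.pi*100*t)                    # a 100 Hz tone
P, f = spectrogram(s, np.hanning(128), hop=32, fs=fs)
f_peak = f[np.argmax(P.mean(axis=0))]        # strongest frequency, within one bin of 100 Hz
```

With a 128-sample window the frequency resolution is fs/128 ≈ 7.8 Hz; doubling the window halves the bin spacing at the cost of time resolution.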
The form of the distribution given in (79) can be considered to be the Fourier transform of the time-indexed autocorrelation function R(t,\tau), i.e.

P(t,f) = \int e^{-j2\pi f\tau} R(t,\tau)\, d\tau \qquad (80)

where

R(t,\tau) = \int \phi(u-t,\tau)\, s\!\left(u+\frac{\tau}{2}\right) s^*\!\left(u-\frac{\tau}{2}\right) du, \qquad \phi(t,\tau) = h^*\!\left(t+\frac{\tau}{2}\right) h\!\left(t-\frac{\tau}{2}\right) \qquad (81)
As can be seen in this expression, the kernel function \phi(t,\tau) plays an important role in determining the characteristics of the time-indexed autocorrelation, which is to be estimated utilizing time averages. Then, a question arises as to which properties the kernel function should have for the estimation of the autocorrelation function. Since a time average would smooth out any time-varying property of nonstationary signals, the kernel function needs to assign a large weight to s(u+\tau/2)\,s^*(u-\tau/2) when u is close to t and a small value when u is far away from t. On the other hand, to give a reasonable accuracy to the estimate R(t,\tau), the range of the time average should be increased for a large value of \tau compared to the range of the time average for a small value of \tau.
An alternative selection of the time window is the decaying exponential window, i.e. h(t) = e^{-\lambda t} u(t). This window emphasizes the recent past of the signal1. The exponential window is closely related to the gamma filter structure, since it is embedded in the gamma filter in the form of a leaky integrator. In Chapter 5, we will propose to use the gamma filter as an alternate time-frequency representation scheme. For now, it should suffice to mention that the proposed method falls under the category of the spectrogram, with its two-dimensional kernel as depicted in Figure 12. As seen in this figure, the kernel assigns larger values to the points close to the origin, as desired. In the meantime, it provides a bigger range of time average for large values of \tau, which in return allows a better autocorrelation estimate.

1. The time-frequency spread of the decaying exponential window is studied in Appendix C.
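Because the exponential window obeys a first-order recursion, the spectral estimate at a given analysis frequency can be updated with a single leaky-integrator step per sample. The sketch below is our illustration (the values of \lambda, f_0 and the sampling rate are arbitrary); it checks the recursion against the direct windowed sum:

```python
import numpy as np

fs, lam, f0 = 1000.0, 50.0, 120.0
n = np.arange(1000)
s = np.cos(2*np.pi*f0*n/fs)

a = np.exp(-lam/fs)                      # pole of the leaky integrator
demod = s * np.exp(-2j*np.pi*f0*n/fs)    # shift the analysis frequency to DC

# Direct evaluation: correlate with the sampled window h[k] = a**k
direct = np.convolve(demod, a**n)[:len(s)]

# Recursive evaluation: y[k] = a*y[k-1] + demod[k], one multiply-add per sample
y = np.empty(len(s), dtype=complex)
acc = 0.0 + 0.0j
for k in range(len(s)):
    acc = a*acc + demod[k]
    y[k] = acc
# |y[k]|**2 is the exponentially windowed spectrogram value at (t = k/fs, f = f0)
```

The two evaluations agree exactly; the recursive form is what the leaky integrator inside the gamma filter computes in hardware.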
Figure 12 Exponentially decaying window kernel: a) top view b) three-dimensional view
Cohen's Class of Bivariate Distributions
In an attempt to circumvent the spectrogram's time-frequency resolution tradeoff, Ville [33] tried to derive an exact signal energy distribution based on one introduced by Wigner [34] to calculate the two-dimensional probability distribution in time and frequency of a quantum particle, defining what is known as the Wigner-Ville distribution (WD):

W(t,\omega) = \frac{1}{2\pi} \int s^*\!\left(t-\frac{\tau}{2}\right) s\!\left(t+\frac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau \qquad (82)
Its characteristic function, introduced in modified form to radar theory by Woodward [35], is well known as the ambiguity function. The WD is one of the most commonly used time-frequency representations. Following an approach similar to the one that led us to (79), it can be shown that the two-dimensional kernel of the WD is \delta(t) (Figure 13). That is, while possessing infinite time resolution, its frequency resolution is proportional to the data length used. The WD can discern nonstationarities. One drawback of the WD is the generation of cross-terms. Cross-terms occur at the arithmetic mean between any two frequencies (positive or negative) present in the original signal. This interference obscures the output of the WD. Working with the analytic part of the signal will suppress the cross-terms due to positive-negative frequency pairs.
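The cross-term behavior is easy to reproduce. The sketch below is our illustration (it assumes the discrete convention W(n,f) = \sum_\tau s[n+\tau] s^*[n-\tau] e^{-j4\pi f\tau/f_s}); it evaluates one time slice of the WD of a two-tone analytic signal. Besides the auto-terms at 125 and 250 Hz, a strong cross-term appears at their arithmetic mean, 187.5 Hz:

```python
import numpy as np

fs, N = 1000.0, 512
n = np.arange(N)
f1, f2 = 125.0, 250.0
s = np.exp(2j*np.pi*f1*n/fs) + np.exp(2j*np.pi*f2*n/fs)   # analytic two-tone signal

def wvd_slice(s, n0, L, fs):
    """One time slice of the discrete Wigner-Ville distribution at sample n0."""
    M = 2*L
    tau = np.concatenate([np.arange(0, L), np.arange(-L, 0)])  # FFT ordering of the lag
    r = s[n0 + tau] * np.conj(s[n0 - tau])                     # instantaneous autocorrelation
    W = np.real(np.fft.fft(r))
    f = np.arange(M) * fs / (2*M)      # bin k of exp(-j2*pi*k*tau/M) maps to f = k*fs/(2M)
    return W, f

W, f = wvd_slice(s, n0=256, L=128, fs=fs)
# auto-terms at 125 Hz and 250 Hz; cross-term at (125 + 250)/2 = 187.5 Hz
```

For this choice of parameters the cross-term is actually the largest entry of the slice, which illustrates why the interference obscures the output.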
Figure 13 Two-dimensional kernel associated with the Wigner-Ville distribution
A year after Wigner, Kirkwood [36] came up with another distribution and argued that for certain problems it had a simpler form. Following this, Rihaczek [37] rederived this distribution based upon complex energy density functions as

e(t,\omega) = \frac{1}{\sqrt{2\pi}}\, s(t)\, S^*(\omega)\, e^{-j\omega t} \qquad (83)
In 1952, Page [38] developed the concept of the running spectrum and defined an instantaneous power spectrum as the rate of change of the energy density spectrum of the signal segment from 0 to t as t is increased:

P^-(t,\omega) = \frac{\partial}{\partial t} \left| \int_0^t s(x)\, e^{-j\omega x}\, dx \right|^2 \qquad (84)

Levin [39] used the same definition for the segment (t,\infty) and defined a new function as the average of both types of instantaneous power spectra:

P(t,\omega) = \frac{P^-(t,\omega) + P^+(t,\omega)}{2} \qquad (85)

where P^+(t,\omega) is the same as P^-(t,\omega) defined in (84), except for the fact that the integration limits are (t,\infty).
It was later realized by Cohen [40][41] that these time-frequency representations have more in common than is readily apparent, and he proposed what has now become known as Cohen's class of bilinear time-frequency representations, a unifying framework for several of the time-frequency distributions that have been proposed in the literature.
The general form of Cohen's bilinear distribution is as follows:

P(t,\omega) = \frac{1}{4\pi^2} \iiint e^{-j\theta t - j\tau\omega + j\theta u}\, \phi(\theta,\tau)\, s^*\!\left(u-\frac{\tau}{2}\right) s\!\left(u+\frac{\tau}{2}\right) du\, d\tau\, d\theta \qquad (86)

The function \phi(\theta,\tau) is the kernel1. By choosing different kernels, different distributions are obtained. For instance, the Wigner and Page distributions are obtained by taking \phi(\theta,\tau) = 1 and e^{j\theta|\tau|/2}, respectively (Table 3).
1. In general the kernels may depend explicitly on time and frequency. However, the time- and frequency-shift invariant distributions are those for which the kernel is independent of time and frequency.
A relatively new class of kernels is due to Choi and Williams [42]. This kernel, named after its inventors, is given by the expression

\phi(\theta,\tau) = e^{-\theta^2\tau^2/\sigma}, \qquad \sigma > 0:\ \text{scaling factor} \qquad (87)

This kernel is effective in diminishing the effects of cross-terms while retaining most of the properties which are useful for a time-frequency distribution.
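The behavior of (87) is easy to visualize numerically: the kernel equals 1 along both axes of the (\theta,\tau) plane, where the auto-terms of a signal concentrate, and decays away from the axes, where the cross-terms live. The sketch below is our illustration, evaluating the kernel for two values of the scaling factor:

```python
import numpy as np

def choi_williams(theta, tau, sigma):
    """Choi-Williams kernel phi(theta, tau) = exp(-theta^2 tau^2 / sigma)."""
    return np.exp(-(theta**2) * (tau**2) / sigma)

theta = np.linspace(-10, 10, 201)
tau = np.linspace(-10, 10, 201)
T, U = np.meshgrid(theta, tau)

phi_tight = choi_williams(T, U, sigma=1.0)    # strong cross-term suppression
phi_loose = choi_williams(T, U, sigma=100.0)  # closer to the WD kernel phi = 1

# phi = 1 whenever theta = 0 or tau = 0, so the marginal properties are preserved,
# while off-axis (cross-term) regions are attenuated, more so for small sigma.
```

Choosing \sigma thus trades cross-term suppression against closeness to the WD.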
All of these distributions have advantages and disadvantages of their own (for a
detailed discussion see Cohen [41]). Of the distributions cited here, the WD possesses a large number of desirable properties and provides the most concentration in the time-frequency plane for a signal. This makes the WD an attractive tool for the analysis of time-varying spectra. Given the aforementioned connection between the spectrogram and the representation using the gamma filter, we would like to relate the spectrogram to the WD, which we take as a reference.
Spectrogram and Its Relation to the Wigner Distribution
As shown in (76), the spectrogram of a signal s(t) is obtained by taking the squared magnitude of the short-time Fourier transform of the signal computed with the window h(t). The basic properties and effectiveness of the spectrogram for a particular signal depend on the functional form of the window. As seen in Table 3, the spectrogram is a member of Cohen's class of bilinear time-frequency representations. The relationship between the spectrogram and the Wigner distribution of a signal was given by Claasen [43] as a convolution integral:

S_f(t,\omega) = \frac{1}{2\pi} \iint W_s(\tau,\xi)\, W_h(\tau - t,\, \xi - \omega)\, d\tau\, d\xi \qquad (88)
Table 3 Some distributions and their kernels

Reference           Kernel \phi(\theta,\tau)                                        Distribution
Wigner-Ville        1                                                               \frac{1}{2\pi}\int e^{-j\omega\tau}\, s^*(t-\frac{\tau}{2})\, s(t+\frac{\tau}{2})\, d\tau
Kirkwood-Rihaczek   e^{j\theta\tau/2}                                               \frac{1}{\sqrt{2\pi}}\, s(t)\, S^*(\omega)\, e^{-j\omega t}
Page                e^{j\theta|\tau|/2}                                             \frac{\partial}{\partial t}\left|\int_0^t s(x)\, e^{-j\omega x}\, dx\right|^2
Spectrogram         \int h^*(u-\frac{\tau}{2})\, h(u+\frac{\tau}{2})\, e^{-j\theta u}\, du    \left|\int e^{-j\omega\tau}\, s(\tau)\, h(\tau-t)\, d\tau\right|^2
Margenau-Hill       \cos(\theta\tau/2)                                              \mathrm{Re}\left\{\frac{1}{\sqrt{2\pi}}\, s(t)\, S^*(\omega)\, e^{-j\omega t}\right\}
The spectrogram is a time-frequency smoothed version of the WD, with the smoothing function given by the WD of the window h(t). A direct consequence of this is the fact that for a given signal waveform, the effective area in the time-frequency plane of the spectrogram is slightly greater than that of the WD [44]. The time-frequency extent of the spectrogram is minimized when the window is matched to the signal. The nature of the spectrogram, by which it is relatively free of interference terms, has prompted research on determining windows that would maximize the concentration of signals in the time-frequency plane. Since the WD provides the maximally concentrated distribution and the spectrogram is related to it through (88), it seems obvious that a window function whose WD is most concentrated in the time-frequency plane would achieve the minimal spread. The Gaussian window is such a window, its WD being a bivariate Gaussian function in the time-frequency plane.
In order to achieve optimal concentration, the parameters of the window (namely its time-frequency width and orientation in the time-frequency plane) must be matched to those of the signal components. Such an approach is embodied in a data-adaptive transform [45], where the window parameters are adapted to minimize a concentration measure at each location in the time-frequency domain.
Conclusions
This chapter provides background for Chapter 5, where an alternate time-frequency representation scheme that utilizes the gamma filter is proposed. This new representation scheme can be identified as a spectrogram method that uses a decaying exponential window to discern nonstationarities. In an attempt to familiarize the reader with the general concept of time-frequency distributions, this chapter gave a general outline of the available techniques, paying particular attention to the spectrogram.
Starting out with the uncertainty principle, it was shown that any time-frequency distribution that incorporates windowing, including the spectrogram, trades time resolution for frequency resolution. The extent of the tradeoff is governed by the time-frequency spread of the window used. One rough way of quantifying the spread is by computing the time-bandwidth product of the window. The time-bandwidth product of the exponentially decaying window is derived in Appendix C and compared with that of the Gaussian window, which assumes the lower bound.
Not all time-frequency distribution methods are bound by the limitations of the uncertainty principle; examples include the Wigner-Ville distribution (WD), the Page distribution and the Choi-Williams distribution. However, these methods usually distort the distribution in some sense, e.g. by yielding negative results or cross-terms, or by failing to satisfy the marginal conditions.
Among the distributions cited here, the WD has the maximum time-frequency concentration. Hence, it can be used as a benchmark. We link the spectrogram method to the WD using a convolution equation. This equation formulates the spectrogram as a smoothed version of the WD. It is shown that the amount of smoothing is dictated by the WD of the window used.
CHAPTER 4
INTERPOLATION
Introduction
Suppose that we have been given certain information about a function. Perhaps we know its values, or its derivatives at certain points, or maybe its moments. How can we use this finite amount of information to construct other functions that will approximate the original? This is the question we will address in this section. The process of approximation using a finite amount of functional data is called interpolation. Interpolation can be divided into i) the analysis section, where the information on the function is gathered, and ii) the synthesis section, where the approximation is done by fitting prototype functions to the available data. The logic behind interpolation is that if an approximation agrees with the original function at certain points, then it should also come close to the original function at the intermediate points. It should be noted that the choice of the prototype functions, the way we gather the functional data, as well as the process of function fitting are problem dependent, and they rely mostly on experience. In the next section we will introduce the most common form of interpolation, the polynomial interpolation, and then proceed to the general problem of finite interpolation.
Polynomial Interpolation
Polynomial interpolation is an approximation technique where a polynomial p_n(s) of degree n is equated to the function f(s) at n+1 distinct points. These points are called the interpolation knots. The appropriateness of using polynomial interpolation for function approximation is justified by the following two theorems.
Theorem: Existence and Uniqueness
Given n+1 distinct (real or complex) points s_0, s_1, ..., s_n and n+1 values f_0, f_1, ..., f_n, there exists a unique polynomial p_n(s) \in P_n (the set of all nth order polynomials) for which

p_n(s_i) = f_i \qquad i = 0, 1, ..., n
For proof refer to Davis [46].
Theorem (Weierstrass)
Let f(s) \in C[a,b], the space of functions continuous on [a,b]. Given \varepsilon > 0, we can find a polynomial p_n(s) of sufficiently high degree for which

|f(s) - p_n(s)| \le \varepsilon \qquad a \le s \le b
Together, these two theorems state that one can construct a unique polynomial that fits the function f(s) at given points, and the difference between the function and the interpolating polynomial can be made arbitrarily small by selecting the number of points sufficiently large. A natural question that comes to one's mind is whether we can do a similar approximation on the basis of n+1 arbitrary pieces of linear information, rather than the values of the function itself. Further, can we use prototype functions other than polynomials for the approximation? These questions lead to the following general problem.
General Problem of Finite Interpolation
Let X be a linear space of dimension n and let L_1, L_2, ..., L_n be given functionals defined on X. For a given set of values f_1, f_2, ..., f_n, can we find an element of X, say x, such that

L_i(x) = f_i \qquad i = 1, 2, ..., n \qquad (90)

The answer is yes, if the L_i are independent in X*1.
Lemma: Let X have dimension n. If x_1, x_2, ..., x_n are independent in X and L_1, L_2, ..., L_n are independent in X*, then

\det |L_i(x_j)| \ne 0 \qquad (91)

The converse is also true. See Davis [46] for a proof.
Theorem
Let a linear space X have dimension n and let L_1, L_2, ..., L_n be elements of X*. The interpolation problem of (90) possesses a solution for arbitrary values w_1, w_2, ..., w_n if and only if the L_i are independent in X*. The solution will be unique [46][47].
Example
Choose four linearly independent functionals as

L_1[f] = \sin[f(0.4)] \qquad L_2[f] = \int_0^1 f(x)\, dx \qquad L_3[f] = f(0) \qquad L_4[f] = \frac{df(x)}{dx} \qquad (92)

Given the functional values L_i[f(x)] (i = 1, ..., 4), interpolate to f(x) = e^{-3x}\sin(x) using the interpolant function h(x)

h(x) = \sum_{i=1}^{4} a_i v_i(x) \qquad (93)

where

v_1(x) = \sin(x) \qquad v_2(x) = \cos(x) \qquad v_3(x) = x \qquad v_4(x) = x^2

1. X* refers to the space that encloses the functionals L_1, L_2, ..., L_n.
The weights a_i of the interpolating function h(x) should be chosen such that

L_i[f(x)] = L_i[h(x)] \qquad i = 1, ..., 4 \qquad (94)

This can be put in matrix form as

\begin{bmatrix} L_1[v_1] & \cdots & L_1[v_4] \\ \vdots & & \vdots \\ L_4[v_1] & \cdots & L_4[v_4] \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_4 \end{bmatrix} = \begin{bmatrix} L_1[f] \\ \vdots \\ L_4[f] \end{bmatrix} \qquad (95)

The weights of the interpolating function h(x) can be found by solving this set of linear equations. Figure 14 illustrates f(x) and its approximation by h(x).
Figure 14 Interpolation of f(x) by h(x)
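The example above can be reproduced by filling the matrix of (95) numerically and solving for the weights. The sketch below is our reconstruction: the evaluation point of the derivative functional L_4 is not specified above, so x = 0 is assumed for illustration, and L_1 is applied to the basis functions exactly as the matrix form (95) prescribes:

```python
import numpy as np
from scipy.integrate import quad

f  = lambda x: np.exp(-3*x) * np.sin(x)
df = lambda x: np.exp(-3*x) * (np.cos(x) - 3*np.sin(x))

basis  = [np.sin, np.cos, lambda x: x, lambda x: x**2]
dbasis = [np.cos, lambda x: -np.sin(x), lambda x: 1.0, lambda x: 2*x]

def functionals(g, dg):
    return np.array([np.sin(g(0.4)),          # L1, as printed in the text
                     quad(g, 0.0, 1.0)[0],    # L2
                     g(0.0),                  # L3
                     dg(0.0)])                # L4 (evaluation point assumed)

A = np.column_stack([functionals(v, dv) for v, dv in zip(basis, dbasis)])
b = functionals(f, df)
a = np.linalg.solve(A, b)

h = lambda x: sum(ai * v(x) for ai, v in zip(a, basis))
# h satisfies the matrix equation (95); at intermediate points, e.g. x = 0.5,
# it stays close to f
```

Note that the code enforces the linearized equations of (95); since L_1 as printed is nonlinear in f, the match at that functional holds in the matrix sense rather than as sin[h(0.4)] = sin[f(0.4)].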
Systems Possessing the Interpolation Property
Here is a list of commonly used interpolation functional and interpolant function pairs.
Example 1: interpolation at discrete points
X = P_n \qquad L_i(f) = f(z_i) \qquad i = 1, 2, ...
Interpolation is done at distinct points using only the function values.
Example 2: Taylor interpolation
X = P_n \qquad L_i(f) = f^{(i)}(z_0) \qquad i = 1, 2, ...
Interpolation is done at only one point using the derivatives of the function.
Example 3: Hermite (osculatory) interpolation
X = P_{2n-1} \qquad L_i(f) = f^{(j)}(z_k) \qquad k = 1, 2, ..., \quad j = 1, 2, ...
Hermite interpolation may be thought of as a combination of Example 1 and Example 2.
Interpolation is done at several points using the derivatives of the function in addition to
the values.
Example 4: Birkhoff (lacunary) interpolation
X = P_{2n-1} \qquad L_i(f) = f^{(j)}(z_k)
This is similar to the Hermite interpolation, except for the fact that some of the derivatives
or the function values might not be available.
Example 5: Trigonometric interpolation
X = linear combinations of 1, \cos(x), \cos(2x), ..., \cos(nx), \sin(x), \sin(2x), ..., \sin(nx). This is called a trigonometric polynomial of degree n.
L_i(f) = ith Fourier series coefficient
Hermite Interpolation
In this study, we will restrict our attention to Hermite interpolation. This is the kind of interpolation where we are trying to match both the values and the derivatives of the interpolant polynomial to those of the function. Given the function values and the derivatives

f(s_0),\ f^{(1)}(s_0),\ \ldots,\ f^{(m_0-1)}(s_0)
\vdots
f(s_n),\ f^{(1)}(s_n),\ \ldots,\ f^{(m_n-1)}(s_n) \qquad (96)

with the interpolation knots s_i being distinct and possibly complex, it can be shown that the Hermite polynomial [48]

H_m(s) = \sum_{i=0}^{n} \sum_{j=0}^{m_i-1} \sum_{k=0}^{m_i-1-j} \frac{f^{(j)}(s_i)}{j!\,k!}\, \frac{w(s)}{(s-s_i)^{m_i-j-k}} \left[ \frac{d^k}{ds^k} \frac{(s-s_i)^{m_i}}{w(s)} \right]_{s=s_i} \qquad (97)

where

w(s) = \prod_{i=0}^{n} (s-s_i)^{m_i}, \qquad m = \sum_{i=0}^{n} m_i - 1 \qquad (98)

interpolates to the values and the derivatives of f(s) given in (96), i.e.

H_m^{(j)}(s_i) = f^{(j)}(s_i) \qquad 0 \le i \le n, \quad 0 \le j < m_i \qquad (99)
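In practice the Hermite polynomial need not be assembled symbolically; SciPy's BPoly.from_derivatives builds the osculating polynomial directly from the data array of (96). A minimal sketch (our illustration, using f(s) = e^s on real knots):

```python
import numpy as np
from scipy.interpolate import BPoly

# Knots s0 = 0, s1 = 1 with multiplicities m0 = m1 = 2:
# match f and f' at both knots -> a cubic Hermite polynomial (m = 3)
knots = [0.0, 1.0]
data = [[1.0, 1.0],                 # f(0), f'(0) for f(s) = e^s
        [np.e, np.e]]               # f(1), f'(1)
H = BPoly.from_derivatives(knots, data)

# H matches the supplied values and derivatives, per (99) ...
assert np.isclose(H(0.0), 1.0) and np.isclose(H.derivative()(1.0), np.e)
# ... and the error in between is of the Cauchy-remainder form:
# |e(s)| <= max|f''''| * |w(s)| / 4!, with w(0.5) = 1/16, so bound ~ e/384
err = abs(H(0.5) - np.exp(0.5))
```

For this example the midpoint error is a few thousandths, inside the remainder bound e/384 ≈ 0.0071.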
It should be noted that when n = 0, Hermite interpolation becomes the Taylor series expansion. For analytic functions the following expression offers a more compact form for the Hermite polynomial [46]

H_m(s) = \frac{1}{2\pi j} \oint_C \frac{w(t) - w(s)}{w(t)\,(t-s)}\, f(t)\, dt \qquad (100)
The contour C should be selected such that it encloses all the interpolation knots s_i and f(s) remains analytic inside, as depicted in Figure 15. Recalling Cauchy's residue theorem

f(s) = \frac{1}{2\pi j} \oint_C \frac{f(t)}{t-s}\, dt \qquad (101)

one can write the remainder of interpolation (or the error of approximation) as

e(s) = f(s) - H_m(s) = \frac{w(s)}{2\pi j} \oint_C \frac{f(t)}{w(t)\,(t-s)}\, dt \qquad (102)

This equation leads to the Cauchy remainder given as

e(s) = w(s)\, \frac{f^{(m+1)}(\xi)}{(m+1)!} \qquad (103)
where \xi is an unknown number that lies within the ring \min\{|s_0|, \ldots, |s_n|\} \le |\xi| \le \max\{|s_0|, \ldots, |s_n|\}, determined by the interpolation knots and the function f.
Although it is not easy to compute the error of approximation using the Cauchy remainder, it has an instructive form. As seen in (103), the magnitude of the error depends on the product of the derivative of f(s), which we have no control over, and w(s), already defined in (98). This raises the following question: how can we select the interpolation knots s_0, s_1, ..., s_n in the interval [a,b] such that the maximum of |w(s)| is as small as possible, i.e.

w_{min} = \min_{s_i}\, \max_{s \in [a,b]}\, |w(s)| \qquad (104)
For the case when the multiplicity of all the interpolation knots is the same, i.e. m_0 = m_1 = ... = m_n, the answer to this optimization problem is given by the Tschebyscheff polynomials. It turns out that if the interpolation knots s_i are chosen as the roots of the Tschebyscheff polynomial, |w(s)| is minimized in the interval [a,b]. Further, if the interpolation knots s_i are all real, that dictates [46]

s_i = \cos\!\left[\frac{(2i+1)\pi}{2n+2}\right] \frac{b-a}{2} + \frac{a+b}{2} \qquad i = 0, 1, ..., n \qquad (105)
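The optimality claim above can be verified directly: with simple knots at the Tschebyscheff roots, the attainable \max|w(s)| on [-1,1] is 2^{1-N} for N knots, well below what equispaced knots give. A numerical sketch (our illustration):

```python
import numpy as np

N = 8                                         # number of simple knots (m_i = 1)
i = np.arange(N)
cheb = np.cos((2*i + 1) * np.pi / (2*N))      # roots of the Tschebyscheff polynomial T_N
equi = np.linspace(-1.0, 1.0, N)              # equispaced knots for comparison

s = np.linspace(-1.0, 1.0, 20001)
w = lambda knots: np.prod([s - si for si in knots], axis=0)

max_cheb = np.max(np.abs(w(cheb)))            # -> 2**(1-N) = 1/128
max_equi = np.max(np.abs(w(equi)))            # noticeably larger, worst near the ends
```

The equispaced product peaks near the interval ends, which is the familiar Runge effect; the Tschebyscheff choice equalizes the ripples.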
Figure 15 Contour of integration for evaluating the Hermite polynomial
Hermite Interpolation of Signals Produced by ARMA Models
Assume that f(s) fits into an ARMA model, i.e.

f(s) = \sum_{i=1}^{K} \frac{a_i}{s - p_i} \qquad (106)

For this class of functions, the error of approximation given in (102) reduces to the form

e(s) = w(s) \sum_{i=1}^{K} \frac{a_i}{w(p_i)\,(s-p_i)} = w(s) \sum_{i=1}^{K} \frac{b_i}{s-p_i} = w(s)\, y(s) \qquad (107)
Notice that when the function assumes an ARMA model, the approximation error can be written as a product of w(s) and y(s). y(s) has the same poles as f(s) does; however, its zeros may be different. The peaks of y(s) should occur at s = p_i. The zeros of w(s) should be selected to coincide with these peaks so as to cancel their effects, which implies s_i = p_i.
Issues in Hermite Interpolation
There are four questions that need to be answered regarding the Hermite interpolation problem:
1) What is the region of convergence of the approximation? That is, for what values of s does

\lim_{m \to \infty} |f(s) - H_m(s)| = 0 \qquad s \in ROC \qquad (108)

hold?
2) If two interpolation knots are used, for a given error bound and polynomial order, what can be the maximum separation between these two knots? Or, assuming that this is a spectral approximation, what is the bandwidth of the approximation?
3) For a given bandwidth and error bound, what order of polynomial is required?
4) How does the inclusion of derivative information in addition to the function values help to improve the approximation? That is, how does Hermite interpolation compare with the case where the multiplicity m_i of each knot is unity?
Region of convergence
Here, we will give an answer to the first question. Let's consider the simple two-point Hermite interpolation case and assume that the complex function fits into an ARMA model. Our purpose is to approximate the spectrum of f(s) in between the points s_0 and s_1 that lie on the imaginary axis (Figure 16). Select the interpolation knots on the imaginary axis as s_0 = 0 and s_1 = j\omega_0. Applying (107) we get

e(s) = \sum_{i=1}^{K} \frac{a_i}{s - p_i} \left[ \frac{(s-s_0)(s-s_1)}{(p_i-s_0)(p_i-s_1)} \right]^n \qquad (109)
Figure 16 Two point Hermite interpolation along the imaginary axis
The condition (108) holds if

|(s-s_0)(s-s_1)| < g \triangleq \min_i |(p_i-s_0)(p_i-s_1)| \qquad (110)

that is, if the left-hand side stays below the corresponding value for every pole p_i.
The contours of constant |(s-s_0)(s-s_1)| are called lemniscates. Inside the lemniscate of level g lies the region of convergence (ROC). Outside the lemniscate the approximation diverges even for large polynomial orders. The ROC always encloses the interpolation knots; however, for small g (which occurs when a pole of f(s) is close to one of the interpolation knots) the ROC may be composed of more than one disconnected region. Figure 17 illustrates the evolution of the ROC for different g values. In this figure the 'x' mark represents the location of the dominant pole p_d. This is the pole for which |(p_d-s_0)(p_d-s_1)| is minimum, and it determines the region of convergence. Intuitively, this is the pole that is closest to either of the interpolation knots.
Bandwidth of interpolation
As seen in Figure 17, when the poles of f(s) get close to the imaginary axis, g gets smaller, and in return the ROC splits into two disconnected regions. As a result, the approximation does not converge over the entire interval s = j\Omega where \Omega \in [0, \omega_0]. The natural question that comes to one's mind is: what is the maximum bandwidth \omega_{max} for which the approximation would converge? Or, in other words, how far apart can s_0 and s_1 be?
[Figure 17 consists of four panels showing the ROC in the s-plane for g = 1.7, 0.4, 0.3 and 0.2.]
Figure 17 Regions of convergence for various dominant pole locations
Let's concentrate on (110). Substitute s_0 = 0, s_1 = j\omega_0 and s = j\Omega. The frequencies inside the ROC have to satisfy

|\Omega(\Omega - \omega_0)| < g, \qquad g > 0 \qquad (111)

Equivalently,

-g < \Omega^2 - \Omega\omega_0 < g \qquad (112)

This yields two equations

\Omega^2 - \Omega\omega_0 + g > 0 \qquad \text{and} \qquad \Omega^2 - \Omega\omega_0 - g < 0 \qquad (113)

The real \Omega values that satisfy these two equations are given by

\frac{\omega_0 - \sqrt{\omega_0^2 + 4g}}{2} < \Omega < \frac{\omega_0 - \sqrt{\omega_0^2 - 4g}}{2} \qquad \text{and} \qquad \frac{\omega_0 + \sqrt{\omega_0^2 - 4g}}{2} < \Omega < \frac{\omega_0 + \sqrt{\omega_0^2 + 4g}}{2} \qquad (114)

This represents two regions. These two regions intersect when

\frac{\omega_{max} - \sqrt{\omega_{max}^2 - 4g}}{2} = \frac{\omega_{max} + \sqrt{\omega_{max}^2 - 4g}}{2} \qquad (115)
The maximum allowable bandwidth is given by

\omega_{max} = 2\sqrt{g} = 2\sqrt{|p_d|\, |p_d - j\omega_{max}|} \qquad (116)

This nonlinear equation can be solved by considering that the variables satisfy a triangular relation as illustrated in Figure 18.
x = |p_d|, \qquad y = |p_d - j\omega_{max}|
Figure 18 Triangular relation between the maximum bandwidth and the dominant pole
x and \theta represent the magnitude and the argument of the complex pole p_d, respectively. With this new notation one can rewrite (116) as

\omega_{max} = 2\sqrt{xy} \qquad (117)

The law of cosines dictates that

y^2 = x^2 + \omega_{max}^2 - 2x\omega_{max}\cos\theta, \qquad \text{or} \qquad y^2 = x^2 + 4xy - 4x^{3/2} y \cos\theta \qquad (118)

This quadratic equation in y yields one positive solution given by

y = 2\left(x - x^{3/2}\cos\theta\right) + \sqrt{4\left(x - x^{3/2}\cos\theta\right)^2 + x^2} \qquad (119)

Substituting this into (117) yields the maximum allowable bandwidth in terms of the magnitude and the argument of the dominant pole:

\omega_{max} = 2\sqrt{2x\left(x - x^{3/2}\cos\theta\right) + x\sqrt{4\left(x - x^{3/2}\cos\theta\right)^2 + x^2}} \qquad x = |p_d|, \quad \theta = \angle p_d \qquad (120)
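Instead of the closed form, the nonlinear equation (116) can also be solved numerically. The sketch below is our illustration (the pole value is an arbitrary example); it finds \omega_{max} by fixed-point iteration and verifies that the result satisfies (116):

```python
import numpy as np

p_d = -1.0 + 2.0j            # dominant pole (example value)

# Fixed-point iteration on (116): w <- 2*sqrt(|p_d| * |p_d - j*w|)
w = 1.0
for _ in range(200):
    w = 2.0 * np.sqrt(abs(p_d) * abs(p_d - 1j*w))

residual = abs(w - 2.0*np.sqrt(abs(p_d)*abs(p_d - 1j*w)))
# for this pole the iteration settles near w ≈ 6.25 rad/sec, with residual ≈ 0
```

The iteration is a contraction for this pole, so a couple of hundred steps drive the residual to machine precision; for other pole locations a root finder on (116) may be safer.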
Maximum error versus the bandwidth
In the previous section we derived the bandwidth of convergence as the polynomial order m \to \infty. Here, we will try to find an answer to a more practical question: what is the maximum bandwidth one can achieve for a given error bound \xi and a finite polynomial order m? That is, we want to make

|e(j\Omega)| = |f(j\Omega) - H_m(j\Omega)| \le \xi \qquad \Omega \in [0, \omega_0], \quad \xi > 0, \quad m\ \text{finite} \qquad (121)

where

e(j\Omega) = \sum_{i=1}^{K} \frac{a_i}{j\Omega - p_i}\, \frac{(j\Omega)^n (j\Omega - j\omega_0)^n}{p_i^n\, (p_i - j\omega_0)^n} \qquad (122)

and m = 2n. Moving the absolute value operator into the summation, one can get a pessimistic bound

|e(j\Omega)| \le \sum_{i=1}^{K} \frac{|a_i|\, \left|(j\Omega)^n (j\Omega - j\omega_0)^n\right|}{|p_i|^n\, |p_i - j\omega_0|^n\, |j\Omega - p_i|} \qquad (123)

The maximum value that the term |(j\Omega)^n (j\Omega - j\omega_0)^n| can attain in the interval \Omega \in [0, \omega_0] is (\omega_0/2)^{2n}. The minimum value that the terms in the denominator can have is |p_{ir}|, where p_{ir} is the real part of the pole p_i. With these in mind we can write

|e(j\Omega)| \le \left(\frac{\omega_0}{2}\right)^{2n} \sum_{i=1}^{K} \frac{|a_i|}{|p_{ir}|^{2n+1}} \qquad (124)

An upper bound for the bandwidth is

\omega_0(n) \le 2\left[\frac{\xi}{\gamma(n)}\right]^{\frac{1}{2n}}, \qquad \gamma(n) = \sum_{i=1}^{K} \frac{|a_i|}{|p_{ir}|^{2n+1}} \qquad (125)
For reasonably small error values, where \xi < \gamma(n), the bandwidth increases with the interpolation order n. This increase is roughly logarithmic. Figure 19 illustrates the dependence of the maximum percent error of approximation within the band as a function of the bandwidth \omega_0 for an 8th order ARMA system. Although the poles and zeros of this system were chosen randomly, the dependence is typical. Alternatively, Figure 20 depicts the maximum bandwidth for a given error bound versus the interpolation order n. These two graphs clearly demonstrate the advantage of using derivative information in addition to the function values.
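The bound (125) is straightforward to evaluate. The sketch below is our illustration; the poles, residues and error bound are arbitrary stand-ins, since the 8th order system behind Figures 19 and 20 is not tabulated in the text. It shows the growth of the admissible bandwidth with n:

```python
import numpy as np

poles = np.array([-2.0 + 1.0j, -2.0 - 1.0j, -3.0])   # example pole set
residues = np.array([1.0, 1.0, 1.0])                  # example residues a_i
xi = 1e-3                                             # error bound

def omega_bound(n):
    gamma = np.sum(np.abs(residues) / np.abs(poles.real)**(2*n + 1))  # eq. (125)
    return 2.0 * (xi / gamma)**(1.0 / (2*n))

bw = [omega_bound(n) for n in range(1, 6)]
# bw grows monotonically with n, from about 0.118 at n = 1 to about 2.0 at n = 5
```

The marginal gain per added derivative order shrinks as n grows, mirroring the roughly logarithmic behavior noted above.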
Conclusions
This chapter gave an outline of the general theory of interpolation, paying special attention to Hermite interpolation. It was shown that, given a set of linearly independent functional information about a function, a prototype function can be made to fit this information, thereby approximating the function in some sense. A special case of this is when the functional information is presented in the form of function values and derivatives at various points in the domain. For this case, the Hermite interpolation technique can be used to tailor a polynomial whose values and derivatives fit those of the function at the given points. This is the same kind of interpolation scheme that will be adopted in
Figure 19 Maximum percent error of approximation versus bandwidth w
Figure 20 Maximum bandwidth for a given error bound versus interpolation order n
Chapter 5 in approximating the magnitude spectrum of a signal from its frequency derivatives. In that sense, this chapter can be considered a prelude aimed at providing the background theory that we will frequently refer to in Chapter 5.
Due to the intractability of a general-purpose derivation, we analyzed the approximation error mostly in the context of Hermite interpolation and ARMA signals. It was shown that the approximation error can be decreased by incorporating higher order derivatives into the interpolation. The contribution of the higher order terms is roughly logarithmic. That is, the contribution becomes marginal for very high orders. In the case of relatively high approximation errors, the addition of high order derivatives may even degrade the approximation.
The separation between the interpolation points plays an important role in determining the approximation error. The maximum separation (that is, the bandwidth, in the case of spectral approximation) is dictated by the dominant pole of the function of interest in the vicinity of the interpolation points. The closer the pole is to either of the points, the smaller the bandwidth has to be. Throughout the chapter, we linked the parameters of the interpolation (number of derivatives, bandwidth, dominant pole) to the region of convergence and the approximation error. Furthermore, we demonstrated the advantage of using higher order derivatives with an example.
It should be remarked that Hermite interpolation is not the only tool that allows one to approximate a function in terms of its derivatives at distinct points. Hermite interpolation creates a polynomial to do the approximation. Although not very tractable, one can replace the polynomial with a rational function and fit it to the derivatives of the function of interest. This technique is closely related to the Padé approximation method [49][50]. Another prototype function that can be used is the family of sinc functions, as suggested by Linden and Abramson [51][52].
CHAPTER 5
REPRESENTATION OF LOCALLY STATIONARY SIGNALS
USING LOWPASS MOMENTS
Introduction
In the previous chapter, we have seen how the incorporation of derivative information decreases the interpolation error and broadens the region of support. In this chapter, we will demonstrate how to obtain the spectral derivatives of time-varying signals by using the gamma filter. The gamma filter generates the so-called Poisson moments of the input signal at its taps, which will be shown to correspond to the Taylor series coefficients, i.e. derivatives of the recent input signal spectrum around the zero frequency. The appeal of the proposed method stems from the fact that in the analog domain, the Poisson moments can readily be measured at the tap outputs of the gamma filter, rather than being computed offline by a computer. Being the Taylor series coefficients, the moments constitute a nonparametric representation that depicts the temporal characteristics of the input signal. Hence, they are computational correlates of the signal waveform, which can be directly fed into a nonlinear mapper, such as an artificial neural network (ANN) performing classification, identification or prediction. Alternatively, they can be used to approximate the signal spectrum based on the Hermite interpolation technique that was introduced in the previous chapter.
The gamma filter accommodates an inherent decaying-exponential time window that makes it possible to localize signals in time and, in return, represent nonstationary signals. The representation takes place in the form of a spectrogram approximation, which was previously introduced under the general topic of time-frequency distributions. In that respect, the proposed method is an alternate time-frequency distribution that is readily available at the outputs of the gamma filter. Furthermore, if the gamma filters are implemented in analog hardware, it suffices to sample these outputs only at the information rate1, rather than the Nyquist rate of the incoming signal. With this convenience, the speed of the processor following the gamma filter becomes independent of the highest frequency of the input signal, being constrained only by the stationarity period of the signal.
We start this chapter by introducing the Poisson moments. This is followed by a review of the synthesis formula and an analysis of the relationship between the moments and the signal magnitude spectrum.
Gamma Filter and the Poisson Moments: Analysis Formula
Fairman and Shen [53] proposed that a distribution f(t) can be expanded in terms of the derivatives of Dirac's delta function as follows²

f(t) = Σ_{i=0}^{∞} f_i(t₀) λ⁻ⁱ e^{λ(t₀−t)} δ^{(i)}(t − t₀)    (126)
This infinite summation is identified as the impulse series expansion. The term f_i(t₀) is called the i-th Poisson moment of f(t) at t = t₀, which is given by [55]

f_i(t₀) = f(t) * p_i(t) |_{t=t₀},    p_i(t) = g_{i+1}(t)/λ,    i = 0, 1, 2, …    (127)

where '*' stands for the convolution operator, while p_i(t) is recognized as a scaled version of the continuous-time gamma kernel already defined in (1). By definition, f₋₁(t) is set equal to f(t). Assuming f(t) is causal, the convolution of (127) can be expanded as
1. By information rate, we mean the rate at which a new information packet arrives, i.e. the rate at which the statistical characteristics of the signal change significantly.
2. For more information on distribution theory, see Schwartz [54].
f_i(t₀) = ∫₀^∞ f(t₀ − t) ((λt)ⁱ / i!) e^{−λt} dt    (128)
Obviously, the Poisson moment f_i(t₀) can be physically measured as the (i+1)st tap value of a scaled gamma filter¹ with input f(t), rather than by evaluating the convolution integral of (128). It is this computational convenience of the Poisson moments that essentially motivates their use (Figure 21).
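The tap measurement described above can be sketched numerically. In the discrete-time simulation below, every quantity (the test signal, time scale, record length and step size) is an illustrative assumption, not a value taken from this chapter; each gamma stage is integrated with a forward Euler rule, and the scaled tap outputs are checked against a direct evaluation of the convolution integral (128).

```python
import math
import numpy as np

# Sketch: the i-th Poisson moment f_i(t0) read off tap i+1 of a gamma filter
# (a cascade of identical one-pole stages dy/dt = lam*(u - y)), compared with
# a direct Riemann-sum evaluation of the convolution integral (128).
# Signal, lambda and step size are illustrative choices, not from the text.
lam, dt = 5.0, 1e-4
t = np.arange(0.0, 4.0, dt)
f = np.sin(2 * np.pi * 0.5 * t)          # causal test input

N = 4                                     # number of moments to measure
taps = [f]                                # tap 0 carries the input, f_{-1} = f
for _ in range(N):
    u, y = taps[-1], np.zeros_like(t)
    for n in range(1, len(t)):            # forward Euler for one gamma stage
        y[n] = y[n - 1] + dt * lam * (u[n - 1] - y[n - 1])
    taps.append(y)

moments_tap = [taps[i + 1][-1] / lam for i in range(N)]   # read at t0 = t[-1]

# Reference: f_i(t0) = int_0^t0 f(t0 - tau) (lam tau)^i / i! e^(-lam tau) dtau
moments_int = [np.sum(f[::-1] * (lam * t) ** i / math.factorial(i)
                      * np.exp(-lam * t)) * dt for i in range(N)]
```

Both routes agree to within the Euler discretization error; in analog hardware the tap route requires no computation at all, which is the point of the method.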
Equation (127) can be considered an analysis formula for the Poisson moments. Before introducing the problem of spectral approximation using the Poisson moments, we would like to introduce the synthesis formula, which is the counterpart of the analysis formula in the sense that it dictates how to restore a signal given its Poisson moment samples.
Figure 21 Poisson moments are generated by the gamma filter
Reconstruction of a Signal from Its Poisson Moment Samples: Synthesis Formula
Having seen how easily the Poisson moments can be obtained using a gamma filter, let's try to develop a synthesis formula that dictates the reconstruction of the signal from its moment samples. As is well known from Shannon's sampling theory, one can reconstruct a signal f(t) of bandwidth B from its time samples as long as the sampling frequency is equal to or higher than the Nyquist rate 2B [56]. If this condition is met, a bandlimited continuous-time signal can be written in terms of its samples as follows

f(t) = Σ_{n=−∞}^{∞} f(n/2B) sinc(2Bt − n)    (129)

1. An alternate name for the scaled gamma filter is Poisson Filter Chain (PFC) [55].
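Equation (129) can be checked numerically. In the sketch below, the bandwidth, the test signal and the truncation length are illustrative assumptions; since the infinite sum must be truncated, the test signal is chosen to have rapidly decaying samples.

```python
import numpy as np

# Sketch of (129): sinc reconstruction of a bandlimited signal from samples
# taken at the Nyquist rate 2B. Bandwidth and signal are illustrative.
B = 4.0                                   # bandwidth in Hz
n = np.arange(-200, 201)                  # truncation of the infinite sum
samples = np.sinc(3.0 * n / (2 * B)) ** 2 # f(n/2B); f(t) = sinc(3t)^2, BW 3 Hz

t = np.linspace(-1.0, 1.0, 101)           # reconstruction instants
# f(t) = sum_n f(n/2B) sinc(2Bt - n)      (np.sinc(x) = sin(pi x)/(pi x))
f_rec = np.array([np.sum(samples * np.sinc(2 * B * ti - n)) for ti in t])
f_true = np.sinc(3.0 * t) ** 2
```

Because the signal's 3 Hz bandwidth lies strictly inside B = 4 Hz and its samples decay like 1/n², the truncated sum matches the original signal essentially exactly.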
This basic reconstruction formula for the case of uniform sampling was later extended by Papoulis [57] to incorporate the samples of multiple linear filter outputs. Papoulis showed that it is possible to restore a signal from data sampled at 1/N times the Nyquist rate at the outputs of N arbitrary filters into which the signal has been fed. Restoration of the input signal from the samples of the tap outputs (i.e. Poisson moments) of an Nth order gamma filter is a special case of this general reconstruction theory. For the case of the gamma filter, the original signal f(t) can be written in terms of its filtered samples (Poisson moments) as follows

f(t) = Σ_{i=0}^{N−1} Σ_{n=−∞}^{∞} f_i(nT_N) k_i(t − nT_N),    T_N = N/(2B)    (130)
The interpolation kernels k_i(t) are given by

k_i(t) = ∫_{B−2B/N}^{B} Ψ_i(u;t) e^{j2πut} du,    i = 0, 1, …, N−1    (131)

Note that k_i(t) and Ψ_i(u;t) are not Fourier transform pairs. We state without proof that, if they exist, the kernels Ψ_i(u;t) should satisfy the simultaneous set of equations

Σ_{i=0}^{N−1} Ψ_i(u;t) P_i(u − 2mB/N) = e^{j2πmt/T_N}    (132)

over the parameter set 0 ≤ m ≤ N−1, B − 2B/N < u < B, where P_i(u) is the Fourier transform of p_i(t) already defined in (127). The analytical solution of (132) is usually very involved; therefore, numerical solution techniques are preferred. Refer to Marks [52] for a proof and some hints on the solution of this equation.
Poisson Moments and the Magnitude Spectrum
Having seen how the original signal can be restored from its Poisson moments, let's examine the relationship between the moments and the input signal spectrum. Assume a causal signal f(t) and rewrite (128) as

f_i(t₀) = ∫_{−∞}^{∞} f(t₀ − t) u(t₀ − t) u(t) ((λt)ⁱ / i!) e^{−λt} dt    (133)

where u(t) is the unit step function. Next, introduce f_w(t₀,t), the decaying-exponential and rectangular windowed version of the delayed and inverted f(t), such that

f_w(t₀, t) = f(t₀ − t) rect((t − t₀/2)/t₀) e^{−λt}    (134)

where

rect(t) = { 1, |t| < 1/2;  0, else }    (135)
Figure 22 depicts how the decaying exponential window placed over the signal f(t) emphasizes the recent past of the signal. Substituting (134) into (133), one obtains

f_i(t₀) = ∫_{−∞}^{∞} ((λt)ⁱ / i!) f_w(t₀, t) dt,    f_w(t₀, t) ↔ F_w(t₀, Ω)    (136)
The delay and the time inversion have no effect on the magnitude spectrum, which is generally the main concern in speech recognition. The decaying exponential windowing, however, helps to emphasize the recent past of the signal, thereby achieving time locality. Its side effect shows itself in the form of frequency domain smoothing. The amount of smoothing is associated with the frequency spread of the decaying exponential window, which is discussed in Appendix C.
In addition to the decaying exponential window, there is also a rectangular window, but for t₀ > 4/λ, its effect can be safely ignored. Therefore, |F_w(t₀,Ω)| can be considered an approximation to the magnitude spectrum of the recent past of the original signal f(t).
Figure 22 The exponential window emphasizes the past of the signal prior to t = t₀.
Having established the relationship between |F_w(t₀,Ω)| and the magnitude spectrum of the original signal, let's go back to (136). It can be shown that the i-th Poisson moment of f(t) at t₀ is related to F_w(t₀,Ω) as follows

f_i(t₀) = ((jλ)ⁱ / i!) · dⁱF_w(t₀, Ω)/dΩⁱ |_{Ω=0}    (137)

Analyzing (137), the moments are recognized as the Taylor series expansion coefficients of F_w(t₀,Ω) around Ω = 0. That is, we can write

F_w(t₀, Ω) = Σ_{i=0}^{∞} f_i(t₀) (Ω/(jλ))ⁱ    (138)
Using a finite number of Poisson moments, a Taylor series approximation can be obtained for F_w(t₀,Ω). In many applications, one does not even need to compute the polynomial of (138). Being the computational correlates of the recent magnitude spectrum, the moments themselves suffice to represent the spectrum. Hence, instead of computing the polynomial that approximates the spectrum, the moment vector can be fed directly into a nonlinear mapper, such as an ANN, for tasks such as prediction, identification and classification.
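The moment–spectrum relationship of (137)–(138) can be illustrated numerically. In the sketch below, the windowed waveform, the time scale and the evaluation frequency are all made-up assumptions: the moments are computed from the windowed signal as in (136), and the truncated series (138) is compared against the spectrum evaluated directly.

```python
import math
import numpy as np

# Sketch of (137)-(138): the Poisson moments act as Taylor coefficients of the
# windowed spectrum F_w(t0, Omega) around Omega = 0. All numbers illustrative.
lam, dt = 8.0, 1e-4
t = np.arange(0.0, 3.0, dt)                       # lag axis, t0 = 3 s
# f_w(t0, t): a smooth stand-in for f(t0 - t) times the window, as in (134)
fw = np.exp(-((t - 1.0) ** 2) / 0.1) * np.exp(-lam * t)

def Fw(om):
    """Windowed spectrum F_w(t0, Omega) by direct numerical integration."""
    return np.sum(fw * np.exp(-1j * om * t)) * dt

# Poisson moments per (136): f_i = int (lam t)^i / i! * f_w(t0, t) dt
N = 6
moments = [np.sum(fw * (lam * t) ** i) * dt / math.factorial(i)
           for i in range(N)]

om = 2.0                                          # test point, |Omega/lam| = 0.25
series = sum(moments[i] * (om / (1j * lam)) ** i for i in range(N))
```

With six moments the truncated series (138) already tracks F_w closely at this frequency; farther from the origin, more moments (or the per-band scheme of the next section) are needed.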
Spectral Hermite Interpolation
The Poisson moments carry valuable information about the smoothed spectrum of the input signal in the form of Taylor series coefficients at zero frequency. One shortcoming of the Taylor series expansion, however, is its locality, i.e. the approximation diverges at points away from the origin. Hence, it is not possible to characterize a wide band of frequencies using a finite number of Poisson moments. As a cure for this shortcoming, one can partition the frequency range of interest into several small subbands and carry out the approximation separately for each band. In practice, bandpass filters tuned to different center frequencies followed by mixers can be used to shift each band to the origin. The baseband signals at the outputs of these devices can be fed into the gamma filter to obtain the corresponding moment vectors. These moment vectors can then be concatenated to give the overall spectral picture. Since the moment vectors store the spectral derivatives at the edges of each band, the Hermite interpolation technique can be utilized to obtain an approximation of the magnitude spectrum¹.
Tracey and Principe [59] successfully implemented a similar scheme in their small-vocabulary word recognition task using artificial neural networks. A group of constant-Q bandpass filters was used to model the cochlea [60]. The authors used a cascade of a squaring device and an envelope detector instead of the mixer, thereby roughly approximating the power spectrum rather than the magnitude spectrum. As described above, the baseband signals were fed into gamma filters, the outputs of which are the Poisson moments. These moments were further fed into an ANN for word classification (Figure 23).
Spectrogram with Exponential Decaying Window
Despite the fact that time windowing degrades the frequency resolution due to the smoothing effect that comes along with it, windowing proves to be beneficial in reducing the variance of the spectrogram. As studied in Chapter 3, the spectrogram technique estimates the power spectral density of a signal in terms of the squared magnitude spectrum of the windowed (filtered) signal (Table 3). Although, in the case of an infinite amount of data, the spectrogram estimator is unbiased, it remains an inconsistent estimator irrespective of the data size used. The variance of the estimation error can be shown to be proportional to the true power density at a particular frequency [61][62].
There are several methods for reducing the error variance. The ideal way is to ensemble average the spectrogram estimates of all the realizations of the underlying process. Due to the unavailability of independent realizations in practice, this approach might not
1. Although they might not be mathematically as tractable as Hermite interpolation, there exist other interpolation techniques. See [58] for instance, which utilizes rational polynomials, and [51], [52], which utilize sinc functions for that purpose.
Figure 23 Piecewise spectral approximation via Poisson Moments
be feasible in most applications. For ergodic signals, an alternate way would be averaging the spectrograms of different segments of the data [63]. This approach trades frequency resolution for low estimator variance. Another variance reduction method is due to Daniell [64], who suggested smoothing the spectrogram using a lowpass filter in the frequency domain. Similar to Daniell's method, the method proposed in this study smooths the spectrogram using a lowpass filtering approach and in return reduces the estimator variance at the cost of frequency resolution.
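The segment-averaging idea can be demonstrated with a small Monte Carlo sketch (white-noise input; the trial count and segment sizes are arbitrary assumptions): averaging K segment periodograms leaves the mean level unchanged but cuts the variance roughly by a factor of K.

```python
import numpy as np

# Sketch of segment averaging [63] (Bartlett's method): compare the variance
# of one bin of a full-record periodogram with that of a K-segment average.
# White unit-variance noise, so the true (discrete) PSD level is 1 everywhere.
rng = np.random.default_rng(0)
n_trials, K, L = 400, 16, 128
bin_seg = 10                                  # frequency bin in a segment FFT

single, averaged = [], []
for _ in range(n_trials):
    x = rng.standard_normal(K * L)
    P_full = np.abs(np.fft.rfft(x)) ** 2 / (K * L)     # full-record periodogram
    single.append(P_full[bin_seg * K])                 # same physical frequency
    segs = x.reshape(K, L)
    P_seg = np.abs(np.fft.rfft(segs, axis=1)) ** 2 / L # K segment periodograms
    averaged.append(P_seg[:, bin_seg].mean())          # Bartlett average
```

The shorter segments resolve frequency K times more coarsely, which is exactly the resolution-for-variance trade described above.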
As illustrated in Figure 24, the spectrogram method can be decomposed into three operators: time windowing, Fourier transform and absolute value squared. The Poisson moment (PM) method, on the other hand, can be implemented using the following cascade of modules: time windowing, Fourier transform, absolute value, spectral Taylor series coefficients, Hermite interpolator and a squaring device. The first four of these modules are embedded in the gamma filter. If we ignore the squaring device, the PM method can be thought of as an approximation of the spectrogram, which itself is an estimate of the true power spectrum. For that reason, it becomes a challenge to quantify the performance of the PM method, since there is no reference available to compare it to. One way of quantifying the goodness of an estimator is by determining its bias and variance. In view of this, the next section is dedicated to the derivation of the bias and the variance of the spectrogram estimator.
Figure 24 Gamma filter offers an alternative spectrogram representation at its taps
Variance and Bias of the Spectrogram Estimator
In this section we will derive the variance and the bias of a spectrogram estimator that utilizes an exponential decaying window. Consider the windowed signal

x_w(t) = x(t) e^{−λt} u(t)    (139)
Using the Wiener-Khintchine theorem, the power spectrum of x_w(t) can be written in terms of the autocorrelation function. That is,

P_{x_w}(f) = ∫_{−∞}^{∞} R_{x_w}(τ) e^{−j2πfτ} dτ    (140)

where the autocorrelation is

R_{x_w}(τ) = ∫_{−∞}^{∞} x_w(t) x_w(t+τ) dt    (141)
The expected value of the spectrogram turns out to be

E{P_{x_w}(f)} = ∫_{−∞}^{∞} P_x(ξ) W(f−ξ) dξ    (142)

The expectation is carried out over all possible realizations of the signal. Here, P_x(f) = E{|X(f)|²} represents the true power spectrum of the signal x(t), while W(f) is given by

W(f) = F{e^{−λt}u(t) * e^{−λt}u(t)} = F{t e^{−λt}u(t)} = 1/(λ + j2πf)²    (143)
Noting that W(f) has lowpass characteristics, the smoothing effect of the window on the spectrum is easily seen from (142). The smoothed spectrogram is biased for finite λ, but as λ → 0, it becomes unbiased, i.e.

lim_{λ→0} E{P_{x_w}(f)} = P_x(f)    (144)
With these in mind, the covariance of the spectrogram can be written as

cov{P_{x_w}(f₁) P_{x_w}(f₂)} = E{P_{x_w}(f₁) P_{x_w}(f₂)} − E{P_{x_w}(f₁)} E{P_{x_w}(f₂)}    (145)

The first right-hand-side term can be expanded in terms of the autocorrelation function as

E{P_{x_w}(f₁) P_{x_w}(f₂)} = E{∫∫ R_{x_w}(τ) R_{x_w}(u) e^{−j2π(f₁τ + f₂u)} dτ du}    (146)

Substituting the definition of the autocorrelation function, one gets

E{P_{x_w}(f₁) P_{x_w}(f₂)} = ∫∫∫∫ E{x(t)x(τ)x(v)x(u)} e^{−λ(t+τ+v+u)} e^{−j2π[f₁(τ−t) + f₂(u−v)]} dt dτ du dv    (147)
This expression involves fourth-order moments, which are difficult to evaluate in the general case. Therefore, we will assume that the signal is white noise. This is an oversimplification, but it will serve to illustrate the point. So, for the white noise input,

E{x(t)x(τ)} = (N₀/2) δ(t−τ)    (148)

holds. Then, the expectation term in (147) conveniently expands as

E{x(t)x(τ)x(v)x(u)} = (N₀²/4) [δ(t−τ)δ(v−u) + δ(t−v)δ(τ−u) + δ(t−u)δ(τ−v)]    (149)
This yields

E{P_{x_w}(f₁) P_{x_w}(f₂)} = (N₀²/16) [1/λ² + 1/(λ² + π²(f₁+f₂)²) + 1/(λ² + π²(f₁−f₂)²)]    (150)

Following similar steps, it can be shown that

E{P_{x_w}(f)} = N₀/(4λ)    (151)

Hence,

cov{P_{x_w}(f₁) P_{x_w}(f₂)} = (N₀²/16) [1/(λ² + π²(f₁+f₂)²) + 1/(λ² + π²(f₁−f₂)²)]    (152)
In general, any nonwhite process x(t) can be generated by passing white noise of variance N₀²/2 through a filter of magnitude-squared response |H(f)|². Because the true power spectrum P_x(f) is related by

P_x(f) = (N₀/2) |H(f)|²    (153)

the following is approximately true

cov{P_{x_w}(f₁) P_{x_w}(f₂)} ≈ (1/4) P_x(f₁) P_x(f₂) [1/(λ² + π²(f₁+f₂)²) + 1/(λ² + π²(f₁−f₂)²)]    (154)
Analyzing this expression, it can be seen that the variance of the estimator decreases with increasing λ.
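This conclusion can be spot-checked with a Monte Carlo sketch (white-noise input; the record length, trial count and the two λ values are arbitrary assumptions): the exponentially windowed periodogram of white noise is computed for a small and a large time scale, and the larger λ gives a markedly smaller estimator variance, while its mean level shrinks as well, consistent with (151).

```python
import numpy as np

# Monte Carlo sketch: variance of the exponentially windowed spectrogram
# P(f) = |int x(t) e^(-lam t) e^(-j 2 pi f t) dt|^2 for white noise, at two
# time scales. Discrete approximation; all parameter values are illustrative.
rng = np.random.default_rng(1)
dt, T, trials, f0 = 1e-2, 20.0, 800, 1.0
t = np.arange(0.0, T, dt)
carrier = np.exp(-2j * np.pi * f0 * t)

def windowed_periodogram(x, lam):
    Xw = np.sum(x * np.exp(-lam * t) * carrier) * dt
    return np.abs(Xw) ** 2

est_small, est_large = [], []
for _ in range(trials):
    x = rng.standard_normal(len(t)) / np.sqrt(dt)   # unit-PSD white noise
    est_small.append(windowed_periodogram(x, 0.5))
    est_large.append(windowed_periodogram(x, 4.0))
```

The larger time scale averages more heavily in frequency, which is exactly the smoothing-for-variance trade discussed above.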
Having calculated the estimator variance at the output of the spectrogram, the next task is finding the variance of the spectral derivatives (i.e. Poisson moments). Again, due to the intractability of a general solution, we will work with the white noise signal for which (152) holds. Assume that we are interested in the variance of the i-th derivative of the spectrum at frequency f = f₁:

c_i = (dⁱ/dfⁱ) [P_x(f) + N(f)] |_{f=f₁}    (155)

N(f) represents the estimation error. The variance of c_i can be written as

var{c_i} = E{[N^{(i)}(f₁)]²}    (156)

If we treat the differentiation as a linear filtering operation with magnitude response |H(f)| = |f|ⁱ, the variance can be written in terms of the power spectrum P_N(f₁,f) of the error term as

var{c_i} = ∫_{−∞}^{∞} f^{2i} P_N(f₁,f) df    (157)
The power spectrum of N(f₁) is nothing but the Fourier transform of the covariance given in (152), i.e.

P_N(f₁,f) = ∫_{−∞}^{∞} cov{P_{x_w}(f₁) P_{x_w}(f₂)} e^{−j2πf f₂} df₂    (158)

Note that

P_N(f₁,f) ≤ P_N(0,f)    (159)

holds. We define

P_N(f) ≡ P_N(0,f) = (N₀²/(8λ)) e^{−2λ|f|}    (160)

Inserting this into (157), one gets a tight upper bound for the Taylor series variance as

var{c_i} ≤ (N₀²/(8λ)) ∫_{−∞}^{∞} f^{2i} e^{−2λ|f|} df = (N₀²/2) (2i)!/(2λ)^{2i+2}    (161)
Two important observations can be deduced from this. First, selecting large values of the time scale helps to decrease the variance of the moment estimates. Second, as expected, the variance of the derivatives increases as a function of the derivative order. These two results can be attributed to the lowpass nature of the time scale and the highpass nature of the derivatives.
Choice of the Time Scale
The choice of the time scale λ has a crucial role in determining the size and the accuracy of the representation matrix whose elements are Poisson moments associated with different bands and measurement times. λ by itself controls several parameters ranging from the effective decaying window length to the maximum allowable bandwidth. This section is devoted to the discussion of the overall effect of the time scale on the representation scheme.
Recall that the Poisson moments correspond to the derivatives of the recent magnitude spectrum around zero frequency. The recency in time is achieved as a result of the decaying exponential window that is embedded in the gamma filter. This exponential window helps to emphasize the recent past of the input signal, thereby localizing it in time. The effective length of the window is controlled by the time scale of the filter. A large time scale implies a small window length. The moments represent the average spectral content of the signal under the window. Hence, when dealing with nonstationary signals, one should avoid averaging segments with different statistical characteristics by keeping the window length smaller than the stationarity period. The stationarity period can be thought of as the rate at which the packets of information arrive. The effective length of the decaying exponential window in the root mean square (rms) sense is derived in Appendix C as

x_t(p) = −ln(1−p)/(2λ)    (162)

where p is the fraction of the window energy enclosed under the length x_t(p). According to this, 90% of the window energy is kept within the width x_t(0.9) = 1.15/λ. Thus, the lower bound for the time scale can be set as

λ ≥ −ln(1−p)/(2T_s)    (163)

where T_s is the stationarity period of the input signal. For speech signals this period varies in the range 10 ms to 100 ms [65].
Notice that the time scale is constrained by the information rate, not the Nyquist rate, which usually assumes a higher value. The information rate also corresponds to the speed at which the Poisson moments need to be sampled, irrespective of the maximum frequency content of the input signal. This relatively low sampling rate removes a considerable amount of burden from the processor following the gamma filter. Although the maximum frequency content of the signal has no direct effect on the moment sampling rate, it dictates the number of bands required to cover the entire frequency axis. Based on this argument, one can conclude that the low sampling rate is obtained by parallel processing of the frequency domain information using several bandpass filters.
While the period of stationarity imposes a lower bound on the time scale, frequency resolution is the determining factor on the other end. As already explained in Chapter 3, the side effect of the time windowing shows itself in the form of frequency domain blurring. That is, the resolution in the time domain is attained at the cost of resolution in the frequency domain. Any two frequency components that are closer to each other than the frequency spread of the decaying window will be identified as one, rather than two separate components. Thus, the time windowing acts as a spectral smoother. Consequently, the time scale cannot be chosen larger than the minimum frequency resolution desired in the representation. As derived in Appendix C, the frequency spread of the time window, or in other words the frequency resolution of the representation, is given by

x_f(p) = (λ/(2π)) tan(πp/2)    (164)

such that a fraction p of the total signal power lies within the band f ∈ [−x_f(p), x_f(p)]. Merging (163) and (164) together, one can write the limits of the time scale as

−ln(1−p)/(2T_s) ≤ λ ≤ 2πΔf/tan(πp/2)    (165)

where Δf is the desired frequency resolution.
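Equations (162)–(165) translate into a simple recipe for bracketing λ. A sketch follows; the 20 ms stationarity period and 200 Hz resolution are illustrative, speech-like assumptions rather than prescribed values.

```python
import math

# Sketch of (163) and (165): admissible range of the time scale lambda given
# the stationarity period T_s and the desired frequency resolution df, for an
# energy/power fraction p. The T_s and df below are illustrative choices.
def lambda_bounds(T_s, df, p=0.9):
    lam_min = -math.log(1.0 - p) / (2.0 * T_s)                  # eq. (163)
    lam_max = 2.0 * math.pi * df / math.tan(math.pi * p / 2.0)  # eq. (165)
    return lam_min, lam_max

lo_, hi_ = lambda_bounds(T_s=0.02, df=200.0)  # 20 ms stationarity, 200 Hz
# lo_ is about 57.6 rad/s and hi_ about 199 rad/s, so the lambda = 150 rad/s
# used later in this chapter's examples falls inside the admissible range.
```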
Having constrained the time scale from above in terms of the desired frequency resolution, the next question addresses the choice of the frequency resolution itself. Consider the two-point Hermite interpolation problem discussed in Chapter 4, where one tries to approximate the recent magnitude spectrum of a signal f(t) within the band of frequencies [0,ω] using the derivatives of the spectrum at the edges of the band. Let p_d be the dominant pole of F(s) in that band¹. Since it is based on first-order lowpass filtering, the decaying exponential window can be thought of as inserting an extra pole into the transfer function F(s) at s = −λ (Figure 25). We know from Chapter 4 that the size of the region of convergence and the maximum width of the band are inversely proportional to the minimum distance between the dominant pole and the interpolation knots at s₀ = 0 and s₁ = jω. For that reason, one should avoid choosing the time scale such that the minimum distance between the pole at s = −λ and either of the interpolation knots is smaller than that of the dominant pole.
1. The dominant pole is defined as the pole with significant weight that is closest to the band of interest. If there is more than one prospective dominant pole, the center of gravity of these poles is taken as the dominant pole.
Otherwise, the pole at s = −λ would become the dominant pole and, in return, further narrow the band of approximation. In other words, one should select the time scale such that it maximizes the minimum of the distances from the poles (both p_d and the window pole) to the interpolation knots, min{|p_d − s₀|, |p_d − s₁|}. This condition is guaranteed to hold if

λ ≥ |Re{p_d}|    (166)

Another interpretation of (166) is that, in terms of the frequency resolution, there is no benefit in choosing λ smaller than the real part of the dominant pole, since the dominant pole determines the smoothness of the true spectrum. In other words, there is no reason to try to achieve frequency resolution higher than the sharpness of the true spectrum, which is essentially determined by the dominant pole in that band. Hence, provided that the information rate condition (163) is satisfied, a suitable choice for the time scale that ensures both sufficient frequency and time resolution would be

λ = |Re{p_d}|    (167)
Figure 25 Time scale should be comparable with the real part of the dominant pole
Examples
In this section we present examples of spectral estimates obtained by the Poisson moment method. As a test signal, we picked the speech utterance 'greasy' archived in the TIMIT database [66]. The original data was sampled at 8 kHz and stored in 16-bit format.
In the case of a finite amount of data, the true spectrum is not known. In other words, a reference that the estimates can be compared to is not available. This poses the biggest challenge in quantifying the goodness of the spectral estimates. One can resort to statistical measures such as estimate bias, variance, etc., as we did in this chapter, but these are average measures and do not help to assess the quality of individual estimates. One problem-dependent solution would be to test several estimation methods in the same application and compare their performances. In our case, for instance, one could compare the classification error rate of a speech recognition system that uses the proposed PM method as a preprocessor against other conventional front ends (tapped delay line, FFT, cepstral coefficients, etc.) in front of an ANN. This approach is not within the scope of this study, however. The interested reader can refer to the work of Tracey and Principe [59], who compared the PM and tapped delay line preprocessors in an isolated speech recognition problem.
Figure 26 shows the logarithm of the spectral estimate of a segment of the speech signal. This is obtained by first applying a rectangular window of width 20 ms over the speech signal and then computing the magnitude square of its fast Fourier transform. Notice the spiky nature of the spectral estimate.
Figure 27 shows the estimate when the rectangular window is replaced with an exponential window. The time scale of the exponential window was chosen to be 150 rad/sec. This choice is mainly determined by the stationarity duration of the speech sound, which was 20 ms in our case. As a result of the window type used, this estimate is smoother than the previous one. Although the original method is proposed for use in the continuous-time domain, we were obliged to carry out the simulation in the discrete-time domain
Figure 26 Spectral estimate using a rectangular window length of 20 ms
using highly interpolated versions of the sampled signal¹. In this approach, the signal is first exponentially windowed and its spectrum is then computed using the discrete-time Fourier transform. The derivatives of the spectrum are computed numerically to give an estimate of the Poisson moments. In Figure 27, the dashed line depicts the estimate obtained by Poisson moments. We used only 1 moment per band. There were 16 logarithmically distributed bands covering the frequency axis. This estimate may be thought of as a 16-point Fourier transform of an exponentially windowed signal, which is far from satisfactory. Figure 28 and Figure 29 display the estimation results for the cases of 3 moments/band and 4 moments/band, respectively. The improvement in the estimates obtained by the utilization of the moments is clearly seen. Finally, in Figure 30 we plot the maximum absolute error between the spectral estimate and its estimate obtained using the Poisson moment technique, as a function of the number of moments used per band. This figure demonstrates the decrease in the error achieved by using higher order moments. The reason we chose the maximum absolute error stems from the fact that it is a more pessimistic measure than the MSE, in the sense that it penalizes impulse-like errors more and, in the case of speech signals, lets us identify errors that occur around the formant frequencies where the main features of the signal are kept.
1. The original signal was sampled at 8 kHz. We upsampled it by a factor of 40.
Figure 27 Spectral estimate using an exponential window (solid line) and its estimate obtained by the Poisson moment technique (dashed line); 16 bands, 1 moment/band, λ = 150.
In the rest of this section, we analyze the dependence of the proposed method on the time scale and the number of bands used. For this purpose, we define two error criteria which complement each other. The first is the power of the spectral estimation error normalized with respect to the power of the desired spectrum (i.e. the magnitude spectrum of the exponentially windowed signal). This gives an idea of the average error along the entire spectrum. This criterion, however, does not penalize narrow, impulse-like errors substantially. For that reason, we pick the maximum absolute estimation error normalized with respect to the peak of the desired spectrum as the second criterion to evaluate the goodness of the estimates. It should be noted that, compared to the error power, this is not only a pessimistic criterion but, due to its nonlinear nature, it is also sensitive to the parameters of the method.
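The two criteria just described can be written down directly; the arrays below are made-up stand-ins for a desired spectrum and its estimate, purely to exercise the definitions.

```python
import numpy as np

# The two error criteria described above: normalized error power, and maximum
# absolute error normalized by the peak of the desired spectrum.
def error_criteria(S_des, S_est):
    S_des, S_est = np.asarray(S_des, float), np.asarray(S_est, float)
    err = S_est - S_des
    norm_err_power = np.sum(err ** 2) / np.sum(S_des ** 2)
    norm_max_abs = np.max(np.abs(err)) / np.max(np.abs(S_des))
    return norm_err_power, norm_max_abs

# Made-up magnitude spectra, not data from the experiments in this chapter:
p_err, m_err = error_criteria([1.0, 4.0, 2.0, 0.5], [1.1, 3.8, 2.2, 0.4])
```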
Figure 28 Spectral estimate using an exponential window (solid line) and its estimate obtained by the Poisson moment technique (dashed line) (3 moments/band).
Figure 29 Spectral estimate using an exponential window (solid line) and its estimate obtained by the Poisson moment technique (dashed line) (4 moments/band).
Figure 30 Maximum normalized absolute estimation error in the entire frequency domain as a function of the number of moments per band.
We estimated the spectrum of segments of 10 'greasy' words uttered by the same male speaker. In Figure 31, we plot the variation of the normalized error power as a function of the time scale for one of the utterances. The improvement in the estimates as a function of the time scale is clearly seen. This is somewhat expected because, for large values of the time scale, the time window gets smaller, decreasing the number of degrees of freedom of the spectral content of the signal.
In Figure 32, we test the PM method on the same signal segment against the second error criterion: the normalized maximum absolute error. This plot, too, demonstrates the improvement in the estimate as a function of the time scale and the moment order.
As mentioned before, the results displayed in Figure 31 and Figure 32 were obtained using a single word. Figure 33 and Figure 34 demonstrate the results of the same experiment averaged over the set of 10 utterances. In both of these figures, there exists a time scale threshold below which the utilization of high order moments can actually be disadvantageous. This is mainly due to the fact that the spectral estimates get spikier for small values of λ, causing noisier derivative estimates.
Figure 31 Variation of the normalized error power as a function of the time scale for different numbers of moments M used.
Figure 32 Dependence of the normalized maximum absolute error on the time scale for different numbers of moments M.
Figure 33 Dependence of the average error power on the time scale (M: number of moments).
Figure 34 Dependence of the average maximum absolute error on the time scale (M: number of moments).
An alternate way of displaying the information in Figure 33 and Figure 34 is shown in Figure 35 and Figure 36. According to these graphs, the fourth moment helps to decrease the error for time scale values larger than 200.
Figure 35 Logarithm of the average normalized error power.
Figure 36 Logarithm of the average normalized maximum absolute error.