Title Page
 Table of Contents
 Gamma kernels
 Time-frequency distributions
 Representation of locally stationary...
 System indentification using poisson...
 Biographical sketch

Title: Representation of locally stationary signals using lowpass moments
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00082352/00001
 Material Information
Title: Representation of locally stationary signals using lowpass moments
Physical Description: vii, 154 leaves : ill. ; 29 cm.
Language: English
Creator: Celebi, Samel, 1966-
Publication Date: 1995
Subject: Signal processing -- Mathematical models   ( lcsh )
Electric filters -- Mathematical models   ( lcsh )
Electrical and Computer Engineering thesis, Ph. D   ( lcsh )
Dissertations, Academic -- Electrical and Computer Engineering -- UF   ( lcsh )
Genre: bibliography   ( marcgt )
non-fiction   ( marcgt )
Thesis: Thesis (Ph. D.)--University of Florida, 1995.
Bibliography: Includes bibliographical references (leaves 149-153).
Statement of Responsibility: by Samel Celebi.
General Note: Typescript.
General Note: Vita.
 Record Information
Bibliographic ID: UF00082352
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: aleph - 002070213
oclc - 34462281
notis - AKQ8475

Table of Contents
    Title Page
        Page i
        Page ii
    Table of Contents
        Page iii
        Page iv
        Page v
        Page vi
        Page vii
        Page 1
        Page 2
        Page 3
    Gamma kernels
        Page 4
        Page 5
        Page 6
        Page 7
        Page 8
        Page 9
        Page 10
        Page 11
        Page 12
        Page 13
        Page 14
        Page 15
        Page 16
        Page 17
        Page 18
        Page 19
        Page 20
        Page 21
        Page 22
        Page 23
        Page 24
        Page 25
        Page 26
        Page 27
        Page 28
        Page 29
        Page 30
        Page 31
        Page 32
        Page 33
        Page 34
        Page 35
        Page 36
        Page 37
        Page 38
    Time-frequency distributions
        Page 39
        Page 40
        Page 41
        Page 42
        Page 43
        Page 44
        Page 45
        Page 46
        Page 47
        Page 48
        Page 49
        Page 50
        Page 51
        Page 52
        Page 53
        Page 54
        Page 55
        Page 56
        Page 57
        Page 58
        Page 59
        Page 60
        Page 61
        Page 62
        Page 63
        Page 64
        Page 65
        Page 66
        Page 67
        Page 68
    Representation of locally stationary signals using poisson moments
        Page 69
        Page 70
        Page 71
        Page 72
        Page 73
        Page 74
        Page 75
        Page 76
        Page 77
        Page 78
        Page 79
        Page 80
        Page 81
        Page 82
        Page 83
        Page 84
        Page 85
        Page 86
        Page 87
        Page 88
        Page 89
        Page 90
        Page 91
        Page 92
        Page 93
        Page 94
        Page 95
        Page 96
        Page 97
        Page 98
        Page 99
        Page 100
        Page 101
        Page 102
        Page 103
        Page 104
        Page 105
        Page 106
        Page 107
        Page 108
        Page 109
        Page 110
        Page 111
        Page 112
        Page 113
        Page 114
    System indentification using poisson moments
        Page 115
        Page 116
        Page 117
        Page 118
        Page 119
        Page 120
        Page 121
        Page 122
        Page 123
        Page 124
        Page 125
        Page 126
        Page 127
        Page 128
        Page 129
        Page 130
        Page 131
        Page 132
        Page 133
        Page 134
        Page 135
        Page 136
        Page 137
        Page 138
        Page 139
        Page 140
        Page 141
        Page 142
        Page 143
        Page 144
        Page 145
        Page 146
        Page 147
        Page 148
        Page 149
        Page 150
        Page 151
        Page 152
        Page 153
    Biographical sketch
        Page 154
        Page 155
        Page 156
Full Text








Simply looking at this book, it is hard to guess how much of an effort goes into a dis-

sertation study. This book is a record of only a small portion that has been skimmed out of

days and nights of hard work often accompanied by financial, mental and emotional dis-

tress. In order to overcome these difficulties, one needs continuous support of his/her envi-

ronment. I feel very fortunate in that sense.

First and foremost, I would like to thank my chairman Professor Jose C. Principe for

the encouragement he provided during the course of my studies. His availability to his stu-

dents and the abundance of the ideas he offered are irreplaceable. Special thanks are due

to Dr. John G. Harris whose constructive comments and careful proofreading helped to

improve the quality of the thesis. I also thank my committee members Dr. Donald G.

Childers, Dr. Fred J. Taylor and Dr. Sencer Yerelan for the guidance they granted me.

During my first semester in the graduate program at the University of Florida, I was

sponsored by the Turkish Ministry of Education. For this I am deeply grateful. I would

like to take this opportunity to thank my family for their patience and the morale support

they gave me.

It has been a pleasure to be a part of the Computational Neuroengineering Laboratory.

I benefitted from it both socially and academically. I'd like to thank all my labmates for the

friendship they extended me.

Finally, I thank Paula very much for being with me during all this. It would be very

hard, if not impossible, without her support and love.



ACKNOWLEDGEMENTS ................................................................................. ii

ABSTRACT ......................................................................................................... vi


1 INTRODUCTION ..................................................................................... 1

2 GAMMA KERNELS ................................................................................. 4

Introduction ................................................................................ 4
Gamma Kernels .......................................................................... 4
Gamma Filter ....................................................................... 6
Gamma Filter as a Memory Model ........................................ 8
Laguerre Kernels ............................................................................ 9
Laguerre Polynomials ............................................... ........... 11
Laguerre Polynomials and the Confluent Hypergeometric Equation 12
Parametric Representation of Signals Using Gamma Kernels ........... 13
Statement of the Problem ........................................... .......... 16
Structure of the Gamma Space .................................... ......... 17
Optimum Value of the Weights .................................................. 19
Optimum Value of the Time Scale X ......................................... 22
Previous Work .................................................................. 22
Optimum X .............................................................................. 22
Estimation of optimum X using moment matching ............... 25
Example ............................................................................ 29
Estimation of optimum X using a matched filter ................... 32
Online Adaptation of the Weights and the Time Scale ................ 34
Conclusions ................................................................................ 37


3 TIME-FREQUENCY DISTRIBUTIONS .................................................. 39

Introduction ........................................................................................ 39
Uncertainty Principle .......................................................................... 39
Time-Frequency Representations ..................................................... 42
Cohen's Class of Bivariate Distributions .................................. .. 45
Spectrogram and Its Relation to Wigner Distribution ................... 49
Conclusions ................................................................................. 50

4 INTERPOLATION ............................................................................... 52

Introduction ............................................................................... 52
Polynomial Interpolation .................................................................... 52
General Problem of Finite Interpolation ......................................... 53
Example ........................................................................................ 54
Systems Possessing the Interpolation Property ............................ 55
Hermite Interpolation .................................................................. 56
Hermite Interpolation of ARMA Signals ................................. .. 59
Issues in Hermite Interpolation ...................................... ......... 60
Region of convergence ........................................... .......... 60
Bandwidth of interpolation ..................................... ........... 62
Maximum error versus the bandwidth .................................. 65
Conclusions ................................................................................. 66

POISSON MOMENTS ................................................ ............ 69

Introduction .................................................................................. 69
Gamma Filter and the Poisson Moments: Analysis Formula ............... 70
Reconstruction of a Signal from Its Poisson Moments: Synthesis
Formula .................................................................................. 71
Poisson Moments and the Magnitude Spectrum ............................... 73
Spectral Hermite Interpolation ......................................... ......... 75
Spectrogram with Exponential Decaying Window ......................... 76
Variance and Bias of the Specrogram Estimator ................................. 78
Choice of the Time Scale ................................................................. 83
Examples ......................................................... 87
Asymptotic Properties of the Poisson Moments ............................... 97
Impulse Distribution Expansion Versus Taylor's Series Expansion... 97

Choice of Time Scale and Measurement Time .............................. 98
Poisson Moment of a Signal and Its Tune Delayed Versions ........... 105
Signal Classification Using the Poisson Moments ............................. 107
Estimation of the Time Delay ....................................................... 109
Example .......................................................................................... 110
Conclusions .................................................................................... 112


Introduction ........................................................................................ 115
Poisson Moments and Their Relation to Function Derivatives ......... 116
Method I ........................................................................................ 118
Method II ........................................................................................ 119
Example ........................................................................................ 119
Effect of. ......................................................................................... 120
PMF in the Identification of Time-Varying Parameter Systems ........... 129
Least Squares and Recursive Least Squares Estimates ..................... 130
Conclusions ........................................................................................ 130

7 CONCLUSIONS ...................................................................................... 133


LAGUERRE KERNELS .................................................................... 136

B SUMMARY OF PARKS' AND MASNADI'S WORKS ........................ 139


D A PROOF ON THE SCHUR COMPONENT ......................................... 144

ARMA SYSTEMS ............................................................................... 147

REFERENCES ................................................................................................... 149

BIOGRAPHICAL SKETCH ............................................................................... 154

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy




December 1995

Chairman : Dr. Jose C. Principe
Major Department: Electrical Engineering

The gamma filter is a generalized feedforward structure which can be implemented as

a cascade of identical first order lowpass filters. Although functionally it is an infinite

impulse response (IIR) filter, topologically it mimics a finite impulse response (FIR) filter.

As a result of this blend, it inherits the computational power of the former and the easy

stability analysis of the latter. Being a single input multiple output structure, the gamma

filter is characterized by its order and the time scale which linearly warps the time axis.

Furthermore, when orthogonalized, the gamma filter transforms into the well-known

Laguerre filter.

The gamma kernels correspond to the impulse responses of the gamma filter outputs.

This family of decaying exponentials form a complete set in the Hilbert space. The first

problem attacked in this study is the approximation of a given signal as a weighted sum of

gamma kernels such that the mean square error is minimized. Conditions for the optimum

weights and time scale are derived and they are decoupled from each other. A vector space

interpretation of the role of the time scale is also provided. According to this interpretation
the time scale parameter has the effect of changing the relative angle between the desired

signal and the gamma projection space without altering the structure of the gamma space.

Two numerical methods to estimate the time scale are included.
The gamma filter produces at its taps, the Poisson moments of the input signal. Pois-

son moments correspond to the Taylor's series coefficients of the recent magnitude spec-

trum of the input signal at zero frequency. Localized in time and frequency, Poisson

moments offer a computationally inexpensive, alternate time-frequency representation for

locally stationary signals, such as speech. In the analog domain the moments can be phys-

ically measured and fed into a nonlinear mapper, such as an artificial neural network for

tasks such as classification, prediction and identification. For a faithful representation of

the spectrum, it is sufficient to observe the moments at the information rate rather than the
usually higher Nyquist rate. This convenience removes a big deal of burden from the

shoulders of the processor following the gamma filter.

Very much like the Laplace transform, Poisson moments have the virtue of transform-

ing linear differential equations into algebraic ones. Based on this property, one can lin-

early solve for the unknown parameters of an autoregressive moving average (ARMA)

system simply by observing the Poisson moments of the system input and the output at

finite number of time instants. Although the method doesn't put any restrictions on the

choice of the filter time scale and the observation times, in practice, these parameters play

a significant role in determining the accuracy of the results. If the parameters are not

selected properly, the results tend to be ill-conditioned. This study provides some insight

into the choice of the time scale.



The purpose of this study is to find alternate signal representation schemes for locally

stationary temporal signals, such as speech. These alternate representations are expected

to express the signal in a low dimensional space with sufficient fidelity and little computa-

tional cost. Analog implementation and resistance to noise are also among the desirable

features of the scheme.

Having restricted the scope of our search to linear structures, we realized that the

desired structure had to incorporate short term memory in order to be able to process non-

stationary signals. In that respect, Finite Impulse Response (FIR) filters are not a good

choice due to the coupling between the filter size and the depth of the past information

they can provide. The other alternative, the Infinite Impulse Response (HR) filters, on the

other hand, suffered from instability problems.

Given these constraints, we concentrated our studies on the gamma filter and the fam-

ily of exponential kernels that correspond to its impulse response: the gamma kernels.

Developed by Principe and deVries [1]..[3], the gamma filter is a generalized feedforward

filter. While functionally being an IIR, topologically it mimics an FIR structure. As an

outcome of this blend, gamma filter inherits the computational power of the former and

mathematical tractability and the stability of the latter. Furthermore, compared to a FIR fil-

ter of the same order, it has only one more degree of freedom.

In the early stages of our research we observed that the gamma filter was more effi-

cient1 than the conventional tapped delay line when applied to classical signal processing

1. In terms of the number of parameters required for a given mean square error.

tasks such as prediction and system identification. Encouraged by these results, we have

decided to explore this structure in more detail from a representation point of view.

Another finding that motivated us to analyze the gamma filter is the fact that structures
similar to the gamma filter are pervasive in various environments. Depending on the
medium, the gamma filter may take the form of a cascade of RC circuits, or a leaky trans-

mission line or even a network of synaptic connections [4]. There is also some evidence

that some of the preprocessing of the time-varying biological signals are done by struc-

tures similar to the gamma filter [5].
The core of this study is composed of analyzing the gamma filter from three perspec-

tives: parametric and nonparametric signal representation and system identification.

Because of the distinct character of these individual topics, rather than organizing the the-

sis in the conventional way, where there is a separate results section, we prefer to present

the results within the body of each chapter followed by some concluding remarks.
The flow of the thesis can be summarized as follows: In Chapter 2 we introduce the

gamma kernels and attack the problem of representing signals as a weighted sum of these

kernels which are shown to be complete in the Hilbert space. The solution of this paramet-

ric representation problem turns out be a generalization of the Wiener-Hopf normal equa-

tion. This chapter also provides a vector space framework interpretation of the problem

clearly demonstrating the advantage of using the gamma structure over that of the FIR fil-

ter. In addition to the derivation of the optimality conditions, we link the gamma kernels to
the Kummerfunctions which encompasses several well-known functions such as Bessel,

Gaussian, Hermite, Laguerre and exponential.

In Chapter 5, we pick a nonparametric representation point of view and suggest to use

the gamma filter as an alternate time-frequency representation scheme. The gamma filter

generates at its taps the time moments of the input signal. These are called the Poisson
moments and they can be regarded as signatures of the input. In this chapter we show

how the Poisson moments carry information about the spectral content of the signal. This

information is localized both in time and frequency and it is readily available at the out-

puts of the gamma filter as a measurable quantity. Hence, the moments offer a computa-

tionally inexpensive time-frequency representation.

Chapter 3 is tailored to provide background information on the time-frequency distri-

butions, a topic that is frequently visited in Chapter 5. In doing so, we pay particular atten-

tion to the spectrogram method. By the same token, Chapter 4 provides background

information on the interpolation theory with special attention to the Hermite interpolation.

This is done in an attempt to prepare the reader for the nonparametric representation prob-

lem studied in Chapter 5.

Chapter 6 ponders the problem of identifying autoregressive moving average

(ARMA) systems using the gamma filter. ARMA systems are described by linear differen-

tial equations. Very much like the Laplace transform, the Poisson moments transform the

linear differential equations into algebraic ones. Using this, we will show how to algebra-

ically determine the unknown parameters of a linear system by observing a finite number

of Poisson moments of the system input and the output.

Finally, Chapter 7 underscores the highlights of the study and cites possible future

research issues.




The core of this study is composed of the utilization of the gamma kernels and their

orthogonalized counterpart, the Laguerre kernels, in parametric and nonparametric repre-
sentation of signals and in the identification of systems. This chapter begins with introduc-

ing these two family of kernels and elaborates on their properties. Although the gamma

and the Laguerre kernels span the same space, individually they help to explain different

aspects of the problem. Therefore, we will introduce both families and link them to each

other when necessary.

Gamma Kernels

In the course of this study, the term kernel refers to the implementation of an opera-

tor. The gamma kernel, in particular, refers to the implementation of a generalized delay

operator. The gamma kernel was devised to substitute for unit delay operator which per-

formed poorly in the processing of signals with redundancies. We define the gamma ker-

nels as

,t-I -At ;.>O,t> O, I Mwl
o ( t) e
) (n 1)! = 1, 2, 3, ... (1)

This family of kernels constitutes a subset of eigenfunctions of the autoregressive moving

average (ARMA) systems. Therefore, it is a good candidate for modeling signals that

accept the linear model. The kernels are complete in the Hilbert space (L2 space) which

spans all the finite energy signals. This implies that any finite energy signal can be approx-
imated arbitrarily closely using a finite number of gamma kernels. The parameter X con-
trols the time scale (or the memory depth) of the kernels by changing the width of the
kernels. Increasing X has the effect of linearly compressing the kernels, while decreasing X
has the opposite effect. Figure 1 depicts the kernels in the time domain for 1=l. For rea-
sons that will become clearer in later sections that deal with the memory structures, the
area under each kernel is normalized to unity i.e,

gn (,t) dt = 1 (2)

The Laplace transform of the kernels is given as

Gn(s) =( )= G (s)

where G (s) L { g (, t) } =

Gamma Filter

Equation (3) suggests a practical implementation for the kernels, in the form of a cas-
cade of identical first order lowpass filters. This structure shown in Figure 2 is called the
gamma Filter [1] .The impulse response of the filter from the input to the kh tap is given

by gk(,t). The Kt order gamma filter transfer function is characterized by a pole at s=-X
with multiplicity K. The time scale X is responsible for setting the position of this pole.
The position of the zeros however, are determined by the weights wl,w2,...,WK and the
time scale X.




0. 6 g2(%t)

0.4 / g3(t)


2 4 6 8 10

Figure 1 Continuous time gamma kernels for =l

Figure 2 Gamma filter.

Gamma Filter

x(t) [ r7 7wI

1st stage 2nd stage K1h stage

' DiscreteTime Continuous Time

Gamma kernels satisfy the following differential equation

agk+ 1( t) = [gk t) gk+ 1 (( t) ] (4)

If the time derivative is replaced by the backward time differences, the filter can also be
realized in the discrete time domain. For this case, the kernels satisfy the following differ-
ence equation

gk+ (n) = Xg (n- 1) + (l-X)gk+ (n- 1) (5)

The discrete time kernels and their corresponding z-transform are given by

gk (n)-= n-(lX nSk

G4(z) = z-(1 )

Notice that for the special case of -=1, the discrete time gamma filter becomes afinite
impulse response (FIR) filter. One can think of the gamma filter as a generalizedfeedfor-
ward structure where the unit delay operator z1 is replaced with a generalized delay struc-
ture G(z) that accommodates a local feedback. Due to the inclusion of this feedback, the
gamma filter becomes a recursive structure. Recursive structures are computationally
more efficient than their feedforward counterparts; however, they may suffer from instabil-
ity problems. That is why FIR filters are almost exclusively used in adaptive signal pro-
cessing. The instability problem can be taken care of by restricting the feedback
connections to be local. In the gamma filter, stability is guaranteed as long as one of the
identical stages is stable, which prescribes X>0 for the continuous time and 0 discrete time case.

The Gamma Filter as a Memory Model

We start this section by quoting two definitions, one for the memory depth and the

other for the memory resolution, proposed by Principe [1][2][6] in an attempt to quantify

and compare different memory structures.
Definition: Memory depth D is the temporal mean value of the impulse response g(t) of the
filter used for storing temporal information. Considering the impulse response as a proba-

bility density, this corresponds to the first moment of the impulse response given by

St ( d = dG(s) continuous time
= tg(t dt-- ds
o s=0 (7)

D= ng (n) = -z dz = discrete time

Definition: Memory resolution R is the number of degrees of freedom (i.e. memory state

variables) per unit time.

The conventional digital signal processing structures are built around the tapped delay

line (G(z)=z'1) which is an FIR filter. The memory depth of an FIR filter is equal to the

number of filter parameters, that is the number of its taps. This strict coupling of the mem-

ory depth to the number of filter parameters leads to poor modeling of lowpass frequency

signals which usually have a long sustenance in time, but few degrees of freedom. Further-

more, the high memory resolution of the FIR filters makes these filters susceptible to

noise on lowpass signals. Contrary to FIRs, memory depth of an IIR filter can be made
arbitrarily large at the expense of resolution. These filters, however, suffer from instability

problems. The gamma filter in the meantime, incorporates a cascade of generalized delay

units in the form of an overall feedforward structure. Restricting the feedbacks to be local

solves the instability problem while preserving the decoupling of the memory depth from

the number of system parameters. For the case of a Kt order gamma filter, the memory

depth equals

dG (z) K (8)
Dgamma = -Z dz Iz =

The resolution of the gamma filter is given by

R g a K (9)
gamm a (KI/)

Note that the gamma structure obeys the relation

order =depth x resolution
K= DxR

This expression tells us that the gamma filter is able to trade the memory depth for the res-

olution by varying its time scale parameter X. The memory depth and resolution for the

FIR structures, on the other hand, are fixed at DFIR=K and RFIR=.

Laguerre Kernels

The gamma kernels lead to the gamma filter which can be thought of as a link between

FIR and IR structures. They are also useful tools in understanding memory models.

Although they have other appealing properties such as being produced recursively from

each other using convolution, they still lack one important feature: orthogonality.1 These

kernels are linearly independent of each other, but not orthogonal. They can, however, be

orthogonalized using the Gram-Schmidt procedure. When the Gram-Schmidt procedure is

applied to the gamma kernels, one obtains the Laguerre kernels that have the form

1. Another advantage of the gamma kernels over the Laguerre kernels is that we can expand them to
the continuous index case where k is a real number

gk(Mt) = --e k21
r (k)

L (t) = A ? ( 2 -

n = 0, 1,2,...


These kernels are complete in the L2 space and they form an orthonormal set such that

Ln(t) Lm(t)dt = Snm

8 =(
nm 0

n m


The Laplace domain representation of the Laguerre kernels is

Ln () = (13)
(S + X)n+ I

Similar to that of the gamma filter, (13) suggests a filter structure in the form of a first

order lowpass filter followed by a cascade of identical first order allpass filters. Again, the

overall structure is characterized with a single multiple pole at s=-X.

Laguerre kernels may be linked to the gamma kernels through the following relation


Ln(t) = G 1 (-2) (i-1) g(Xt)
i= 1

One can also realize the Laguerre kernels in the discrete domain. The corresponding z-

domain representation will then be

Ln) = ( = 0122 1- (15)z
L -(z) n=0,1,2,... (15)

t, 0

Before proceeding to the problem of signal representation using the Laguerre kernels,

we would like to introduce an orthogonal family of polynomials that shed some light into
the origins of the Laguerre kernels.

Laguerre Polynomials

A family of polynomials that are closely related to the Laguerre kernels is the
Laguerre polynomials. Laguerre polynomials have been studied extensively in the litera-
ture [7]. These polynomials are orthogonal on the interval (0,oo) with respect to the weight
function et. In other words,

n(t)Lm (t)e-tdt = nm (16)

where Ln (t) is a polynomial. Laguerre polynomials satisfy the equation1

e' dn
S(t) (xle-x) n = 0, 1, 2,... (17)
n! dt

which can be expanded as

Ln (t) = I i (18)

Like any other orthogonal polynomial system, Laguerre polynomials can be computed

using the triple recursion relation. The recursive relation (also called the Christoffel-Dar-

boux formula) [7] is

1. A more general definition for the Laguerre polynomials also exists where the polynomials
La (t) are orthonormal with respect to the weight tee" and they satisfy the following equation [7]
( Ca d-t", a+,,_
l (rn (xd er
n! d?

(2n + 1 t) n (t) (19)
Ln + (t) = l (t) -t) (19)
n+1 n+1

Comparing (11) and (18), Laguerre polynomials can be related to the Laguerre kernels
through the expression

Ln (X, t) = F2Ln (2Xt) e (20)

Laguerre Polynomials and the Confluent Hypergeometric Equation

The second-order ordinary differential equation

dZ dx (t)
t-x (t) + (Y- t) dt- x (t) = 0 (21)
dt? dt

is called the confluent hypergeometric equation [8][9][10]. The parameters a and y can
assume any real or complex value except for '0,-,-1,-2,.... The variable t may be complex.
One of the solutions of (21) is called the Kummerfunction (or the confluent hypergeomet-
ric function) and it is denoted by 1FI(a,y ;t). This function can be computed
by evaluating the infinite series

at a(a+1) t
F, (, ; t) = 1++ + -+... (22)
yl1! y(y+ 1) 2!

Many well known transcendental functions, including the Laguerre polynomials, hap-
pen to be the special cases of Kummer's function with different selections of the parame-
ters a and y. Table 1 lists some of these functions. For an extensive list see [9]. Referring
to Table 1 it can be seen that the Laguerre polynomials satisfy the differential equation

SdLn (t)
t n (t) + (1 t) + n (t) = 0 n = 0, 1, 2,... (23)
d? dt

whose solution is

n (t) = 1F, (-n, 1;t) (24)

Parametric Representation of Signals Using the Gamma Kernels

Decomposition of signals in terms of the linear sums of basis functions is a powerful
tool that allows the representation of a signal as a discrete table of numbers, in lieu of its
original form. The new representation is essentially the projection of the original signal
onto the hyperplane spanned by the basis functions Oi(t).

(t) = wii (t) (25)

Choice of these basis functions may vary depending on the application. The basis
functions can be fixed as well as adaptive. Examples of popular basis functions are com-
plex exponentials, which lead to Fourier series expansion Sine functions which are used
in representing bandlimited signals in terms of their time samples [11] and Gabor func-
tions which have the best time-frequency resolution properties and consequently allow the
maximum information transfer rate [12]. Among these, complex exponentials secure
themselves a special place by being the eigenfunctions of linear systems. Even though
complex exponentials prove to be an invaluable tool in analyzing linear systems, they do
not seem to be very efficient (in terms of numbers of bases required) in representing tran-

Table 1 Some well-known transcendental functions derived from Kummer's function

Function t a y Relation to IFi(a,y ;t)

Bessel 2jt v+1/2 2v+l r(+v)e() J (t)

t --
ModifiedBessel 2t v+1/2 2v+l r(l+v) et () I(t)

Coulomb Wave 2jt L+l-jTl 2L+2 eitFL (i, t) t- 1 CL (Iq)

Incomplete Gamma -t a a+l at-aGamma (a, t)

Exponential t a a et

Trigonometric -2jt 1 2 sin (t)

Hyperbolic 2t 1 2 e sinh (t)

n! 1 -n
Hermite t/2 -n 1/2 (2n)- H2 (t)

Error Function -t2 1/2 3/2 -erf(t)

Exp. Integral -t 1 1 -e-Ei (t)

Laguerre t -n 1 (-l)"n!L, (t)

sient signals and signals with low degrees of freedom and long sustenance in time. This is
mainly due to the fixed projection manifold these basis functions define.

A long-neglected family of linear system eigenfunctions is the family of exponentials

given by

i (t) = ftre u (t) 0 r < i = 0, 1,... X > 0 (26)

Closely related to the complex exponentials, this family proves to be an efficient tool

in representing signals with redundancies [13]. In this study, we suggest to use the gamma
kernels in linearly expanding a given signal. The gamma kernels can be considered to be a

subset of the exponential family where all the time scales are limited to be the same, i.e.

ki=X. Laguerre kernels would be an equally good candidate, since these two kernel fami-

lies span the same space. Both kernels happen to be eigenfunctions of autoregressive mov-

ing average (ARMA) systems. Therefore, they are inherently able to capture the dynamics

of signals that accept the linear model.

When the number of bases used in the decomposition is finite, the representation prob-

lem transforms into an approximation problem. In order to assess the goodness of the
approximation (or equivalently the closeness of representation) one needs a measure that

is called the norm. Two of the popular norms are L2 and L" which are the special cases

(p=2,o) of LP norm that is defined as


L f(t)} = ifP(t) dt (27)

The L2 norm which will be used throughout this study finds application in system identifi-

cation and prediction, while L" has been shown to be of interest for control synthesis

problems.The L2 norm can be interpreted as the Euclidean distance. In optimization it has

the advantage of yielding a set of linear equations in terms of the basis weights. The

approximations obtained using L2 norm are called the least squares or mean square error


Statement of the Problem

We would like to approximate a signal d(t) as a weighted sum of gamma (Laguerre) ker-


d(t) = y (t) = wngn (, t) = WTG
1n=1 (28)

W= [w1, W2, ..., Wn] weight vector

G = [g (, t), ..., gK( t)] b
basis vector

such that the mean square error (MSE) J

J= =e2(t)dt (29)
e (t) = d (t) -y(t)

is minimized. In other words, we are trying to approximate the signal d(t) by its projection

on the K-dimensional subspace of gamma (Laguerre) kernels. This way we will obtain a

parametric representation of d(t) in terms of the weight vector W and the time scale X.

The gamma and Laguerre kernels each constitute a complete set in L2. That means by

selecting the approximation order K sufficiently high, the mean square error J can be made

arbitrarily small. There are two questions that need to be answered in this optimization


i) What should be the optimum value of the weights?

ii) What should be the optimum value of the time scale X such that the MSE is minimized?

The answer to the first question has been given by Wiener, Kolmogorov and Krein

[14][15] early this century. The solution is embraced in the Wiener-Hopfequations (nor-

mal equations). Here we will be attacking a generalization of this equation due to the extra

time scale parameter X involved in the gamma bases. As a result of this parameter
involved, the problem is called a parametric least squares approximation. A thorough dis-

cussion on parametric least squares problem can be found in [16]. Before proceeding fur-

ther with the calculation of the optimum weight and the time scale, we would like to

inspect the structure of the space spanned by the set of the gamma bases {gn(,,t) : n=l,2,

...}, to get insight into the roles of the weights and the time scale in the approximation.

Structure of the Gamma Space

Here we will inspect the gamma bases in a vector space framework. Notice that

despite the fact that the gamma bases are linearly independent, they are not orthogonal. If

we regard each basis as a vector in L2 space, then the cosine of the angle between two vec-

tors will be equal to their correlation coefficients. The correlation between the bases gn(t)

and gm(t) is given by [6]


= gn (, t) gm (, t) dt
o (30)
00 n+m+m-2
= e-2dt
(n- 1)! (m- 1)!

making the substitution X=j3/2 and n+m-2=k-1 we can rewrite this as

(k-1)! e t (31)
= (n- 1)! (m- 1)! 2n+m-1 (k- 1) ()

The integrand is recognized as gk(P,t) the area under which is unity due to normalization
given in (2). Therefore,

1 n+m-2
= 2n+n n 2 (32)

With that in mind, the cross correlation coefficient is given as

(n + m 2) !
Pnm 2 (n m-2 )! (33)
2 2 2 (n-l)!(m-1)!

hence, we obtain the angle between gn(t) and gm(t) as

(n+m-2)! ~ !1(4
Zgngm =Cos'[ + m 2) ] (34)
m l2(n- )!( m- 1)!

Equation (34) tells us that the angles among the gamma basis vectors do not change as
a function of X. This is an important property of the space constructed using the gamma
bases. X conformally maps the basis vectors onto another manifold. Conformal mapping is
the class of mapping that preserves the angle relations. Figure 3 demonstrates the angular
relation among the first three bases. In spite of the fact that, the space itself is parametric in
X, the infrastructure of the space does not change, because X leaves the relation among the
bases intact.
We can view X as an extra degree of freedom which has the effect of changing the rel-
ative angle of the manifold with respect to the desired signal d(t). Lengths of the bases
actually change as a function of X. However, in a decomposition problem, scaling of the
basis functions is insignificant since a proper choice of the weights will counterbalance it.

Figure 3 Angular relation among the bases

Figure 4 The effect of changing X on the gamma space

Optimum Value of the Weights

For a fixed X, the best choice of weights can be found from the condition
J// W = 0 which yields the normal equations [14]

R., (.) Wpt., = P, ()



the P vector in (35) holds the crosscorrelation values between the bases and d(t). It has the

P,1 = [...]T (36)

In (35) R(k) is the Gram matrix of the bases. Its (nm)th element is which
was already derived in (32). The result is duplicated here for convenience.

= 2 n (n-1) (37)

Using (32), the Gram matrix R(X) can be written as

R (X) = XRc (38)

where Rc is a constant matrix whose (n,m)th element is

1 (n+m-2 (39)
2n+m-1 n-1 )9

For convenience, we will write gn(X,t) as gn(t) from this point on unless there is a possibil-
ity of confusion. With that in mind, the optimum weights are given by

WPt' = R^ P () for gamma kernels (40)

If the approximation problem defined in (28) is carried out using the Laguerre kernels
instead of the gamma kernels the autocorrelation matrix in (35) becomes an identity
matrix and in return the optimum weights have a much simpler form

W'pt = p (k) for Laguerre kernels
where P () = [...]T

For a given %, the mean squared error J is quadratic in the weights. Therefore WOpt has a
single solution and that solution depends on X. That's why this is called a parametric least
squares problem.
Let's evaluate the mean squared error when the weights are chosen to be optimum.
This is called the Schur complement.

Jsch () = J () = WoP, =

Jsch () = = Ed- 2W tP + W o;tW(42)

RWopt = P

Jch () = Ed WT tP

In (42) Ed denotes the energy of the desired signal d(t). One gets a simpler form for the
Schur complement if Laguerre kernels are used as the basis. Combining (41) and (42)

Jsch () = Ed Wpt = Ed- opt, for Laguerre kernels (43)

The value of the Schur complement Jsch(X) is the same for the gamma kernels and the
Laguerre kernels, however, the weights of the decomposition may be different. In the fol-
lowing section we will attempt to minimize Jsch(X) and decouple the optimum value of
the time scale X from that of the optimum weights.

Optimum Value of Time Scale X

Previous work

In this section we will address the problem of finding the optimum time scale for sig-
nal approximation using the gamma and the Laguerre kernels and derive the optimality
condition that is independent from the optimum weight condition. Determination of the

optimum time scale is not a new issue. The literature contains numerous publications on

the approximation of signals using the Laguerre kernels [17].. [24], all in the quest of opti-

mum time scale. Although, no exact solution has been found yet, the works of Parks [22]

and Masnadi [24] who managed to solve the problem for special cases are worth mention-
ing here. In his work, Parks gives an estimate of optimum X for a special class of continu-

ous time signals which are defined by two moments. Later, Fu [23] extended this study to

the discrete time signals. Meanwhile, Masnadi showed that when the order of approxima-

tion was chosen larger than a lower bound and when dealing with signals with no repeti-
tive poles, optimum X was the only real root of a polynomial. He gave the coefficients of

this polynomial in terms of the poles of the signal that is to be approximated. Both Parks'

and Masnadi's works are summarized in Appendix B. In this study we will derive the opti-

mality conditions for the time scale and devise a moment based method to estimate the
optimum value satisfying that condition.

Optimum X

Let's look at the value of the mean squared error for a given X when the optimum

weights are chosen. As explained before, this is called the Schur complement and it

occurs when e(t), the error of approximation is orthogonal to all the bases used (i.e. nor-

mal equation is satisfied). Before proceeding to the calculation of optimum X's from the

condition alJ al = o let's take a look at the dependence of Jch on X. Figure 5 depicts how

the dependence of Jsch on X looks like when d(t) is chosen as the impulse response of an

arbitrary third order elliptical filter. This figure is obtained by evaluating the optimum

weights in (40) for different values of X and approximation order K and calculating the

corresponding mean square error numerically (see Appendix E).The structure of this fig-
ure is typical for both continuous time and discrete time cases. As stated before, Jsch

curves are identical for the choice of gamma and Laguerre kernels since these two kernel

families span the same space.

Figure 5 Typical structure of Jc curves

As can be seen from Figure 5, Jsch curves coincide with each other for Kth order and

K+1st order approximations at the points where al,Ch/a = o is satisfied. That is, two con-

secutive curves touch each other at the local minima or local maxima. A proof of this

property can be found in Appendix D. For a discrete time version of this proof see Silva

[25]. Since Jsch has the same value for orders K and K+l at the local minima, this implies

that the K+Ist kernel does not contribute to the approximation of d(t) at these points In
other words the nonzero component of the K+lst kernel that is orthogonal to the first K


kernels is also orthogonal to d(t). For the Laguerre kernels this corresponds to the K+Ist
Laguerre kernel LK(X,t) itself, since this is orthogonal to the Lo(,,t)...LK.I(X,t) by defini-
tion. We can write the condition for the optimum X as

=ch 0 0 = wK = 0 (44)

Unlike the Laguerre kernel, the K+1st gamma kernel is not orthogonal to its predeces-
sors. Let's denote the component of gK+i(t) that is orthogonal to gl(X,t),..., gK(%,t) with

g, + (X, t). It can be obtained by subtracting projections of the first K kernels from the
K+Ist kernel. Actually, this new kernel turns out be a scaled version of LK(t). For the
gamma kernels, the general form of the ih orthogonal component is given by

gi (, t) = aig, (, t) = ATG = 1, 2,...
(* ^ 1 (45)
where a, = V }(- ) ()

Af = [ali...aii]

The optimality condition of (44) can be rewritten for the gamma kernels as

ajsch 0-) T
= 0 = AK+ IP () = 0 (46)

The crosscorrelation vector P was already defined in (36), this time its dimension is
K+1. There are no weight terms in (44) and (46) which means that the optimum value of X
can be computed separately from that of the weights. Given d(t) one can first determine
the best orientation of the manifold (i.e. X) of order K to represent d(t) and then on this

manifold can compute the weights of the bases using the normal equation (40) or (41)
depending on the kernel used.

Challenges in computing the optimum time scale

Equations (44) and (46) state the necessary conditions for optimum time scale. How-

ever, these are only the necessary conditions, not the sufficient conditions. Due to the non-
linear dependence on X, Jsch function has a nonconvex shape. For an order K
approximation Jsch has K minima, only one being the global minimum. The optimality

conditions specified in (44) and (46) hold for all the minima and therefore does not distin-

guish the global minimum from the local ones. In a way this is expected since this condi-
tion was obtained by setting the gradient ofJsch to zero. Gradient techniques do not suffice
to find the global extremum by themselves in nonconvex functions. They bear only local

information about the function characteristics and therefore they need to be backed up by

global information about the function itself.

Another challenge in computing the optimum time scale is the involvement of the
crosscorrelation operation. One possibility of overcoming this is to substitute the expecta-
tion operator by a time operator, and under the assumption of ergodicity, compute the opti-
mal weight. Unfortunately for the case of (46), there are two problems with that approach:
first, the gamma bases are infinite in time, which means that the correlation computed with
a finite window may produce severe truncation. The second problem is related to the fact

that X is an implicit variable in the correlation. Therefore the conventional procedure to

estimate the normal equation does not hold for our case. In the next section we derive a

numerical estimation method which overcomes these difficulties by factoring X out of the

correlation operation and in return transforming the condition given in (46) into a polyno-
mial equation in terms of X.

Estimation of optimum X using moment matching

In this section we will address the problem of estimating the value of X that minimizes
the mean square error J. As stated before, the optimum X has to satisfy the normal equa-
tion in (46) (or (44) for Laguerre kernels) while the optimum weights have to satisfy the
normal equation in (40) (or (41)). We propose to use the moment matching method which
basically attempts to estimate X by equating a finite number of time moments of y(t) to
those of d(t), i.e. we want to make

mi{y(t) } = m{d(t)} I = O...K (47)

Using (47) we propose to create K+1 linearly independent equations which will enable us
to solve for K weights and k The operator mi{ } reads as the ith time moment of. Let's first
derive the ith time moment of the nth gamma basis, which is defined as

mi {(gn )} g, (, t)dt (48)

Matching the moments in time is the dual of Taylor's series approximation around f=0
in the frequency domain. In the time domain if the first K moments are matched, this will
correspond to the Kh order Taylor's series approximation performed in the frequency
Inserting the definition of gn(,t) into (48) yields

ntn+i-1 -t = (n+i-1)! 1
m {gn (, t) } = dt (49)
(n-1)! (n-1) (i

From this we can calculate the ith time moment of the estimate y(t)

mi{y(t)} =mi {GT,,.+,W,,,i,}
Ti- '- i l.+ T


The weight vector W and basis vector G were already defined in (28). Now their dimen-

sion is K+1. Here qi is a row vector given as


Inserting (50) into (47) one gets

i- mi {d(t) }

i = O...K


These K+1 sets of equations can be put into a matrix form.

.QW = AMd

mo {d (t) }1
Md= :
Im {d(t) -K+1xK+1



o: JK+lxK+ I


qi = (i 1)... (n+i-1)! (K+i- )!
= .I (n- )! (K (K-1)! ,1K+1

Now let's force equation (53) to obey the normal equations for the weights, i.e. let's sub-
stitute W with 1R1P yielding

QR-P = AM1
QR P = AMd


We can merge Q and Re-1 together to form a matrix H


HP = AMd

the unknown crosscorrelation vector P can be solved as

P () = H A Md
K+ixK+1 KI+1xf+l Kr+IlI


There is one other normal equation that we have to satisfy.


AK+ P(,) = 0

Premultiplying (57) with ATK+i yields the polynomial f(X)

f() = AP,P (,) = ATIAMd = 0


of order K+1 with constant coefficients. Since we have enforced the normal equations (40)

and (45) during the derivation of (59), roots of the polynomial f(%) turn out to be the esti-

mates of the local minima of Jsch(X). f(%) is a polynomial with constant coefficients and its

roots can be computed using numerical techniques. To demonstrate this method an exam-
ple is included.


In this example, the signal to be decomposed is chosen as

d(t) = t2e-0.4-te-O.2



Figure 6 shows the normalized mean squared error Jsch as a function of X for approxima-

tion orders K=3 and K=4. J is computed by determining the optimum weights from (40)

for various values of X. Normalization is done with respect to , the energy of the

signal to be approximated.

Figure 6 Normalized J as a function of X

For order K=3, optimum value of k is around )opt-=0.2. At =Xopt, Jnormalized-0.095.

Using the moment matching technique, we will get an estimate of opt that is close to the

theoretical one.


order K=3

For K=3, Md vector that holds the time moments of d(t) can be computed as

M = (61)
2 1-" m2 02S

Matrices Rc, Q and H can easily be formed as

16842 1 1 1 1
R 8 6 4 1 2 3 4 (62)
S32 4 5 2 6 12 20
2 455 62460120

0 8 -16 16
H --i1 -8 56 -18 112 63)
c -64 432 -992 8
5-28 3504 -7872 6048

Also from (45)

A4= [1 -6 12 -8] (64)

Substituting these into (59) one gets

f(X) = ATAHl Md
= (0.39+ 2.93X- 131.8X2 + 476X3)X = 0

Roots of f(X) are X1=0, X2=-0.04, X3=0.08, X4=0.236. When Xopt is chosen as 0.236 this

yields Jnormalzed.1 1 which is slightly worse than the theoretical optimum. Figure 7

illustrates the approximation of d(t) obtained with K=3 and X-0.236.

Figure 7 d(t) and its 3rd order approximation

Although we have matched the first K+1 time moments of y(t) to that of d(t), one still gets

consistent estimates of %opt when different moments are used. Table 2 shows the estimates

obtained by matching different moments which corroborate our claim.

Table 2 : Time scale estimates

Moments used estimate of

0,1,2,3 0.236
0,1,2,4 0.237
0,1,3,4 0.238
0,2,3,4 0.238
1,2,3,4 0.239
2,3,4,5 0.233
1,2,3,5 0.238

As mentioned before, matching the moments in the time domain is equivalent to

matching the Taylor's series coefficients in the frequency domain around the zero fre-

quency. In that sense the method has some limitations. Due to the locality of the informa-

tion provided by the moments, this method would give satisfactory results for signals
whose energy is concentrated around the de frequency. For signals with higher frequency

components, the method will tend to approximate the envelope of the signal. In represent-

ing bandpass signals it is advisable to use complex time scale. The real part of the time

scale can be estimated using the moment method, this time the desired signal being the
bandpass signal frequency shifted to the baseband.

The formulation of the estimation method was carried out such that the number of

moments equaled the number of K+1 unknowns (K weights and the time scale). If desired

more moments can be used which would transform into an overdetermined problem that

can be solved using least square techniques.

Estimation of the optimum X using a matched filter

The left hand side of (44) represents the crosscorrelation between the desired signal
and the Kth Laguerre kernel. Kuo and Celebi [26] suggested to implement this equation for

different values of X in the form of a parallel organization of matched filters (Figure 8).

This arrangement is very popular in communication systems because it implements the

optimal receiver for a signal constellation in white noise. However, here we are interested

in the zero crossings of the matched filter (correlator) outputs instead of the more familiar
peak value. Let us state how the system can be utilized. The desired signal is applied to the

block of Figure 8, and X is varied from 0 to hax in increments of ax/m to produce m

time signals sj(t) constructed by correlating the input signal with the Kh order Laguerre

kernel. A change of the sign from mth correlator output to the m+ls will indicate an extre-

mum point on the MSE curve somewhere in between these two time scale values. The

position of the zero-crossing can either be estimated or the time scale interval can be fur-

ther tightened. The individual values have to be further tested find the value that mini-

mizes the mean square error of the approximation. A rule of thumb for picking up ax is

selecting Xkm approximately 2.5 times the bandwidth of d(t). This upper bound stems

from the fact that when X exceeds it the last basis does not have a significant contribution
to the approximation.
Figure 9a depicts the dependence of Jsch() on X for approximation order K=3 and
choice of the desired signal whose Laplace transform is given by

D (s) = 2
(s + 1) (s + s + 4.25)

Figure 9b illustrates the zero-crossings of the correlator outputs which coincide with the
minima of the Jsch curve.


x d


SI d


_ i dt

di, dil4

-, optimum X

Figure 8 Estimation of the optimum time scale using a matched filter implementation



Figure 9 a) MSE curve for order K=3. b) Weighted correlator outputs

Online Adaptation of the Weights and the Time Scale

It is possible to solve the optimization problem of (28) using online adaptation, in other

words, by gradually adapting the optimization parameters in the opposite direction of the
mean square error gradient, i.e.

dW = -w-- J (,W)
=--T -J( W)

Since this approach is based on a gradient technique, online adaptation of (67) will only
find the stationary points on the Jsch not the global minimum.

Consider the system identification problem depicted in Figure 10. Our task is to adapt

the weights and the time scale of the gamma filter such that the mean square error is mini-
mized. This is equivalent to the problem of (28) in the sense that we are approximating the
impulse response of the unknown plant as a weighted sum of gamma kernels. The weight

update term is obtained by evaluating

dW a 2
-dt =-w ~ (w, W)= -twa-


Figure 10 Adaptive system identification problem

where the vector X holds the signals at the taps, i.e.

x, (t) x (t) *g (t
~x(t) x(t)*g2(t)

XK(o Wx(t)*gK(t)

Evaluating the partial derivative of (68) one gets

VW(t) dt = 2rlw


step size: 1, > 0 (70)


Notice that the online adaptation is basically an integration operation

W(t) = VW(u)du


which inherently implements time averaging. Hence, if the input signal x(t) is ergodic and
the step size is chosen sufficiently small, the ensemble average operator of (70) can be
conveniently removed yielding the least mean squares (LMS) adaptation rule.
The update term for X can be derived similarly by partial differentiating the mean
square error with respect to X, i.e.

dA aJ e (t) ~ X
dt= -- = -2l.u = 2

= 2rT

The vector GKx1 holds the gamma kernels and its form was already given in (28). As
derived in Appendix A, the derivative of gamma kernels with respect to X can be written as
the difference of the kernels. Taking this into account, we can write

dt= ( r > 0



1 -1
Y = 2 -2


K+ ]K+ 1xl

Again, if the input signal x(t) is ergodic and the step size is small enough, the expectation

operator in (73) can be conveniently ignored.


This chapter introduced the gamma kernels, a special class of exponentials, and the fil-

ter structure associated with it. Gamma kernels were inspired as generalized delay opera-

tors with the purpose of substituting the tapped delay lines which performed poorly in the
processing signals with redundancies. Gamma kernels are characterized by a single time

scale parameter that linearly controls the amount of the delay the kernels induce.

Having linked the gamma kernels to their orthogonal counterpart, the Laguerre ker-

nels, we identified the two as a special case of the Kummer's function which entails

numerous well-known transcendental functions (Hermite, Bessel, Gaussian, etc).

The main problem that was attacked in this chapter was that of representing a given

signal in the linear projection space of the gamma kernels such that the euclidian distance

between the signal and its projection is minimized. We formulated this problem in a vec-

tor space framework and interpreted the role of the time scale as of changing the relative

angle between the projection manifold and the signal of interest. The solution of the prob-

lem turned out to be an extension of the Wiener-Hopf equation. We decoupled the solution

of the optimum time scale from that of the optimum weights. The solution of the optimum

time scale problem is unsolved. We developed two techniques, one based on moment

matching and the other on matched filtering, to numerically estimate the optimum time


There are several open questions that still must be addressed. Allowing the time scale

to be complex would make the processing of the bandpass signals possible. Some related

work has been done in the area of Kautz filters [27], but there are many unanswered ques-


In addition to the time moment method presented in this study, other global informa-

tion measures can be sought to solve the problem of optimum time scale. This global

information can help the gradient descent techniques avoid local minima during adapta-


The highly organized structure of the Jsch curves must be investigated further. There

may be some potential information stored in these curve structures that can be extracted to

help us position the global time scale. Furthermore, we have observed very similar Jsch

curve structures when the gamma kernels are replaced with the Hermite kernels [28]. This

indicates the possibility that other members of the Kummer family may exhibit similar





Standard spectral estimation methods are based on the implicit assumption that the

signals being analyzed are characterized by a stationary spectrum, in other words a spec-
trum that does not vary in time. This assumption is not justified for large number of signal

processing applications. Spectral characteristics of signals arising in nature often evolve in

time due to the changing dynamics of their underlying processes. Prime examples are

human speech, seismic signals and biological signals. It is often desirable to view the

spectral evolution of these signals. This can be achieved by expressing the time varying

signal as a joint time-frequency distribution which depicts the signal energy (or the inten-

sity) per unit time, per unit frequency. There are several time-frequency distribution meth-

ods available, each with its own motivation and properties. The most prominent methods
are Wigner-Ville distribution, Page distribution, Rihaczek distribution and the spectro-

gram. In this chapter we will briefly go over these techniques and show where the pro-

posed method of spectral estimation using poisson moments falls in the global picture of

time-frequency distributions. We start by introducing the uncertainty principle which is

an important concept in understanding time-frequency distributions.

Uncertainty Principle

The Fourier transform is the most commonly used spectral analysis tool. It decom-

poses a signal into its frequency components. In doing so, however, it fails to reflect when

these frequency components occur or how they evolve, at least not directly. Hence, while

possessing an infinite frequency resolution, Fourier transform representation lacks time

resolution. In the case of time varying signals where spectral characteristics vary over

time, this poses a problem, since the spectral variations can not be seen explicitly in this

domain. Instead, the Fourier transform displays an average of the stationary segments of

the overall signal. One way of attaining time resolution is by placing a time window over

the time segment of interest and evaluating the so called short term Fourier transform

(STFT) of the signal. This method makes sure that only the portion of the signal that is of

interest is taken into account. There is an inherent side effect of time windowing, though,

which shows itself in the form of spectral smoothing and thereby loss of frequency resolu-

tion. Spectral smoothing stems from the fact that the windowing corresponds to the con-

volution of the original spectrum with that of the window in the frequency domain.

Decreasing the window width in an attempt to increase the time resolution will cause the

window spectrum to widen which in return further degrades the frequency resolution.

Hence, the time resolution is acquired at the cost of the frequency resolution. This stems

from the fact that, both the window function and its spectrum cannot be made arbitrarily

narrow. For techniques that try to acquire locality in time using windows, there is an inher-

ent trade-off between the time and the frequency resolutions. In his attempt to quantify
this relation, Gabor [12] defined the effective time duration1 (At) and the effective band-

width (Af) of an arbitrary waveform and showed that their product is lower-bounded by a

constant as follows.

AtAf> (74)

This is called the uncertainty principle. At a formal level, it resembles to the uncertainty

principle in quantum mechanics The inequality of (74) states that a narrow pulse yields

1. Gabor defines the effective (r.m.s.) duration of a signal f(t) as the variance of its analytic part
f(t), where f (t) = f(t) +j(t) Here (:t) symbolizes the Hilbert transform of f(t). A parallel def-
inition holds for the effective bandwidth.

a wide spectrum. Due to the coupling of the time duration to the bandwidth there is

always an ambiguity attached to the time frequency measurements that employ window-

ing. This ambiguity stems from our attempt to express a function of one independent vari-

able (time), in terms of two variables, namely time and frequency. The frequency
parameter (f) corresponds to the operator -J in the time domain, and hence it is not an

independent variable1 [29]. To summarize, as far as the signal measurements go one

should talk about the energy within a band during a specified time interval rather than the

instantaneous frequency which cannot be measured.

Gaussian shaped signals (with a possible modulation) satisfy (74) with equality. Those

are the signals with the smallest time-bandwidth product and they have the form2

S(t) = exp- (t- +tj2 "fot +jO (75)

Gabor [12] coined the phrase elementary signal for these signals. He further suggested

expanding a signal into a double sum of delayed and frequency shifted versions of these

elementary pulses. Later, Hellstrom [30] made Gabor's expansion exact by using an inte-

gral expansion. Gabor believed that in the time-frequency plane, each of these pulses rep-

resented a quantum of information which he called a logon and depicted these information

cells with nonoverlapping rectangles of constant area. It should be remarked that, due to

the fact that the delayed versions of Gaussian signals overlap with each other and the fact

that Gabor's definition of time-bandwidth product was arbitrary, the approach of repre-

senting signals via logons is only an approximate one. Further critique of Gabor's pioneer-

ing work can be found in Lerner [31].

1. Through out this study represents the imaginary unit J-i
2. Triangular waveforms have slightly worse (10% higher) time-bandwidth products than the Gaus-
sian waveforms.

Time-Frequency Representations

The basic goal of the time-frequency representations is to devise a joint function of

time and frequency, that is a distribution, which represents the energy or the intensity of a

signal simultaneously in time and frequency. It is required that these distributions possess

some desirable properties. Of those properties, nonnegativity, invariance with respect to
time delays and frequency shifts in the signal are important ones [32]. Another, desirable

feature is the satisfaction of the marginal conditions. This implies that the integral of the

distribution over the entire frequency axis at a given time should yield the signal power at

that moment.

Gabor [12] and Ville [33] were the first ones to address the problem of time-frequency

distributions. Inspired by similar developments in quantum mechanics, their original moti-

vation was to improve on the spectrogram which is basically a magnitude squared short

time Fourier transform (STFT) commonly used for speech signal analysis. The spectral

estimation technique that will be introduced in Chapter 5 can be formulated as a spectro-

gram technique. For that reason we will start with this relatively simple technique and

relate it to other time-frequency representations.

We can write the spectrogram as the magnitude squared Fourier transform of a win-

dowed signal. That is
2 2

P(t,f) = fs (u) h (u t) e-j2fudu S (v) H (v- f) I2vtdv (76)

Here s(t) is the signal of interest, while h(t) is a window that tapers to zero. As explained

in the previous section, like any other window based technique, spectrogram has a time-

frequency resolution trade-off associated with it. Equation (76) can be alternatively writ-
ten as

0 0(x)y (77)
P(t,f) = s (x) s* (y) h* (x- t) h (y t) e-2'(XY) dxdy (77)

This bivariate energy density has the units of Joules per Hertz per sec. Substituting

X = U+

y= u-

we can rewrite it in the form

P(t,f) = s(u+)s (u-1)h(u-t-)h (u-tJ )e fdudt
,- 2 -I"


In expanding (76) to this equivalent form, we have mapped the one dimensional window

h(t) to the two-dimensional kernel h*(t+r/2)h(t-v/2) which is a function of time and lag T.

If h(t) is equal to rect(/T), that is a rectangular window of width T, the two dimensional

kernel would manifest a diamond shape as depicted in Figure 11.

Figure 11 Two-dimensional spectrogram kernel

The width T of the kernel along the t-axis determines the time resolution, while the recip-

rocal of the width 2T along the T-axis determines the frequency resolution [32]. Thus,

when using the spectrogram method, it is impossible to improve the frequency resolution

without inducing a loss of time resolution.



The form of the distribution given in (79) can be considered to be the Fourier trans-
form of the time-indexed autocorrelation function R(t,j), i.e.

P (t,f) = e-J2fR (t, ) d (80)


R(t,r) = 4(u-t,t)s(u+2)s (u- )du
-.. (81)

*(t,t) =h (t +)h(t -)

As can be seen in this expression, the kernel function 4(t,t) plays an important role in
determining the characteristics of the time-indexed autocorrelation which is to be esti-
mated utilizing time averages. Then, a question arises as to which properties the kernel
function should have for the estimation of the autocorrelation function. Since a time aver-
age would smooth out any time-varying property of nonstationary signals, the kernel func-
tion needs to assign a large weight to s(u+T/2)s*(u--/2) when u is close to t and a small
value when u is far away from t On the other hand, to a give a reasonable accuracy to the
estimate R(t,t), the range of the time average should be increased for large value of t com-
pared to the range of the time average for a small value of T.
An alternative selection of the time window is the decaying exponential window,
i.e. h(t)=e'u(t). This window emphasizes the recent past of the signal1. Exponential win-
dow is closely related to the gamma filter structure since it is embedded in the gamma
filter in the form of a leaky integrator. In Chapter 5, we will propose to use gamma filter as
an alternate time-frequency representation scheme. For now, it should suffice to mention
that the proposed method falls under the category of spectrogram with its two-dimen-

1. Time-frequency spread of the decaying exponential window is studied in Appendix C.

sional kernel as depicted in Figure 12. As seen in this figure, the kernel assigns larger val-

ues to the points close to the origin as desired. In the meantime, it provides a bigger range

of time average for large values of t, which in return allows a better autocorrelation esti-


Figure 12 Exponential decaying window kernel a) top view b) 3 dimensional view

Cohen's Class of Bivariate Distributions

In an attempt to circumvent spectrogram's time-frequency resolution trade-off, Ville

[33] tried to derive an exact signal energy distribution, based on one represented by

Wigner [34] to calculate two dimensional probability distribution in time and frequency of

a quantum particle, defining what is known as Wigner-Ville distribution (WD).

(t, o) 1 fs (t- )s (t+ e) e-~rd
(t, 2i) =


Its characteristic function, introduced in modified form to radar theory by Woodward [35],
is well-known as the ambiguity function. WD is one of most commonly used time-fre-

quency representations. Following a similar approach that led us to (79), it can be shown

that the two-dimensional kernel of WD is 8(%) (Figure 13). That is, while owing infinite

time resolution, its frequency resolution is proportional to the data length used. WD can

discern nonstationarities. One drawback of WD is the generation of cross-terms. Cross-
terms occur at the arithmetic mean between any two frequencies (positive or negative)

present in the original signal. This interference obscures the output of the WD. Working

with the analytic part of the signal will suppress cross-terms due to positive-negative fre-

quency pairs.



Figure 13 Two-dimensional kernel associated with the Wigner-Ville distribution

A year after Wigner, Kirkwood [36] came up with another distribution and argued that

for certain problems it had a simple form. Following this, Rihaczek [37] rederived this dis-

tribution based upon complex energy density functions as

e (t, 0) = s (t) S* (O) e-t (83)

In 1952, Page [38] developed the concept of running spectrum and defined an instanta-
neous power spectrum as the rate of change of energy density spectrum of the signal seg-
ment from -0 to t as t is increased.

P" (t, () = s (x) e-Jdr (84)

Levin [39] used the same definition for the segment (t, o) and defined a new function as

the average of both types of instantaneous power spectra.

PF (t, m) + P+ (t, m)
P (t, o) = 2 (85)

where P+(t,w) is the same as P(t,o) defined in (84) except for the fact that the integration
limits are (t,oo).
It was later realized by Cohen [40] [41] that these time-frequency representations have
more in common than is readily apparent, and he proposed what has now become known
as the Cohen's class of bilinear time frequency representations, a unifying framework for
several of the time-frequency distributions that have been proposed in the literature.
The general form of Cohen's bilinear distribution is as follows:

P (t, 0) = ,Jfe-et-Jr+J (,t)s (t- ,)s (t + ~) duddO (86)

The function 4(t,t) is the kernel1. By choosing different kernels, different distributions are
obtained. For instance, Wigner and Page distributions are obtained by taking <(t,T)=l and
eti/V2, respectively (Table 3).

1. In general the kernels may depend explicitly on time and frequency. However, the time and fre-
quency shift invariant distributions are those for which the kernel is independent of time and fre-

A relatively new class of kernels is due to Choi and Williams [42]. This kernel, named
after its inventors is given by expression

e (t, ) = e > 0 : scaling factor (87)

This kernel is effective in diminishing the effects of cross terms while retaining most of
the properties which are useful for a time-frequency distribution.

All of these distributions have advantages and disadvantages of their own (for a
detailed discussion see Cohen [41]). Of the distributions cited here, WD possesses a large
number of desirable properties and provides the most concentration in the time-frequency

plane for a signal. This makes WD an attractive tool for the analysis of time-varying spec-
tra. Given the aforementioned connection between the spectrogram and the representation
using the gamma filter, we would like to relate the spectrogram to WD, which we take as a

Spectrogram and Its Relation to Wiener Distribution

As shown in (76), the spectrogram of a signal s(t) is obtained by taking the square
magnitude of the short-time Fourier transform of the signal and its window h(t). The basic
properties and effectiveness of the spectrogram for a particular signal depend on the func-
tional form of the window. As seen in Table 3, spectrogram is a member of the Cohen's
class of bilinear time-frequency representations. The relationship between the spectro-
gram and the Wigner distribution of a signal was given by Claasen [43] as a convolution

Sf(t, o) = W(t, ) Wh ( -t, fo- 0)drxd (88)
S~( O 2TR

Table 3 Some distributions and their kernels

Reference Kernel Distribution

1 Ije "s (t- )s(t+!)dr
Wigner-Ville 2J 2

S(t -s(O (0))s e -

o* li/2 t 1 \ )-j
Page it i2ix2

Sh* (- )h (u + T) Je-j' s (x) h ( t)
Spectrogram 2 5 2
e du

cos (Re) Re { -1s (t) S (w) e"}-j

The spectrogram is a time-frequency smoothed version of WD, with the smoothing

function given by the WD of the window h(t). A direct consequence of this is the fact that

for a given signal waveform, the effective area in the time-frequency plane of the spectro-

gram is slightly greater than that of the WD [44]. The time-frequency extent of the spec-

trogram is minimized when the window is matched to the signal. The nature of the

spectrogram, by which it is relatively free of interference terms, has prompted research on

determining windows that would maximize concentration of signals in the time-frequency

plane. Since WD provides maximally concentrated distribution and the spectrogram is

related to it through (88) it seems obvious that a window function whose WD is most con-

centrated in time-frequency plane would achieve the minimal spread. The Gaussian win-

dow is such a window with its WD being a bivariate Gaussian function in the time-

frequency plane.

In order to achieve optimal concentration, the parameters of the window (namely its

time-frequency width and orientation in the time-frequency plane) must be matched to

those of the signal components. Such an approach is embodied in a data-adaptive trans-

form [45] where the window parameters are adapted to minimize a concentration measure
at each location in time-frequency domain.


This chapter provides background for Chapter 5 where an alternate time-frequency

representation scheme that utilizes the gamma filter is proposed. This new representation

scheme can be identified as a spectrogram method that uses a decaying exponential win-

dow to discern nonstationarities. In an attempt to familiarize the reader with the general

concept of time-frequency distributions, this chapter gave a general outline of the available

techniques, paying particular attention to the spectrogram.

Starting out with the uncertainty principle, it was shown that any time-frequency dis-

tribution that incorporates windowing, including the spectrogram, trades the time resolu-

tion with the frequency resolution. The extent of the trade-off is governed by the time-

frequency spread of the window used. One rough way of quantifying the spread is by com-

puting the time-bandwidth product of the window. Time-bandwidth product of the expo-

nential decaying window is derived in Appendix C and compared with that of the

gaussian window, which assumes the lowerbound.

Not all time-frequency distribution methods are bound by the limitations of the uncer-

tainty principle, such as the Wigner-Ville distribution (WD), Page distribution and Choi-


Williams distribution. However, these methods usually distort the distribution in some

sense, e.g. by yielding negative results, cross terms or failing to satisfy the marginality


Among the distributions cited here, WD has the maximum time-frequency concentra-

tion. Hence, it can be used as a benchmark. We link the spectrogram method to the WD

using a convolution equation. This equation formulates the spectrogram as a smoothened

version of the WD. It is shown that the amount of smoothing is dictated by the WD of the

window used.




Suppose that we have been given certain information about a function. Perhaps, we

know its values, or its derivatives at certain points or maybe its moments. How can we use

this finite amount of information to construct other functions that will approximate the

original? This is the question we will address in this section. The process of approxima-

tion using a finite amount of functional data is called interpolation. Interpolation can be

divided into; i) the analysis section where the information on the function is gathered, and

ii) the synthesis section, where the approximation is done by fitting prototype functions

to the available data. The logic behind the interpolation is that if an approximation agrees

with the original function at certain points, then it should also come close to the original

function at the intermediate points. It should be noted that the choice of the prototype

functions, the way we gather the functional data, as well as the process of function fitting

are problem dependent and they rely mostly on experience. In the next section we will

introduce the most common form of interpolation; the polynomial interpolation and then

proceed to the general problem of finite interpolation.

Polynomial Interpolation

Polynomial interpolation is an approximation technique where a polynomial pn(s) of

degree n is equated to the function f(s) at n+l distinct points. These points are called the

interpolation knots. The appropriateness of using the polynomial interpolation for func-

tion approximation is justified by the following two theorems.

Theorem: Existence and Uniqueness

Given n+1 distinct (real or complex) points so, s1, ...,Sn and n+l values fo, f, ... fn,
there exists a unique polynomial pn(s) e Pn (set of all nth order polynomials) for which

Pn(si)=fi i=0,1,...,n
For proof refer to Davis [46].

Theorem (Weierstrass)

Let f(s) e C [a,b] where C is the set of complex numbers. Given 8>0 we can find a

polynomial pn(s) of sufficiently high degree for which

f(s) p (s) I E a

Together, these two theorems state that one can construct a unique polynomial that fits

the function f(s) at given points and the difference between the function and the interpolat-

ing polynomial can be made arbitrarily small by selecting the number of points suffi-
ciently large. A natural question that comes to one's mind is whether we can do a similar

approximation based on the basis of n+l arbitrary pieces of linear information, rather than

the values of the function itself. Further, can we use prototype functions other than poly-
nomials for the approximation? These questions lead to the following general problem.

General Problem of Finite Interpolation

Let X be a linear space of dimension n and let L1,L2,...,Ln be given functionals defined

on X. For a given set of values fl, f2,...,fn can we find an element of X, say x such that

L, (x) = fi i= 1, 2, ..., n (90)

The answer is yes, if the Li are independent in X* 1.

Lemma: Let X have dimension n. If x1, x2,...xn are independent in X and L1,L2,...,Ln are
independent in X* then

IL, (x) ) 0


The converse is also true. See Davis [46] for a proof.


Let a linear space X have a dimension n and let L1,L2,...,Ln be elements of X*. The inter-
polation problem of (90) possesses a solution for arbitrary values w1, w2,...,wn if and only
if L4 are independent in X*. The solution will be unique [46][47].


Choose four linearly independent functionals as

L1 [f] = Sin [f(0.4)]

L3 [f = f(0)

L2 f = ff(x) dx

L4 f df


Given the functional values L4[f(x)] (i=1,...,4) interpolate to f(x)=e-3xSin(x) using the
interpolant function h(x)

h(x)= aivi (x)


v (x) = Sin (x)

3 (x) = x

1. X* refers to the space that encloses the functionals LI,L2 ,...L.


v2() = Cos(x)
v4() = 2


The weights ai of the interpolating function h(x) should be chosen such that

Li [f (x)] = L, [h (x) ] i = 1,..., 4


This can be put in a matrix form as

L, [vY] ...L[ ] L, [ a L [A
S= L (95)

L4 [v1] ... L4 4] a4 L44 4

Weights of the interpolating function h(x) can be found by solving this set of linear equa-

tions. Figure 14 illustrates f(x) and its approximation by h(x).

Figure 14 Interpolation of f(x) by h(x)

Systems Possessing the Interpolation Property

Here is a list of commonly used interpolation functional and interpolant function pairs.

Example 1: interpolation at discrete points

x=Pn ~Lj(t)=f(zj) il,..









Interpolation is done at distinct points using only the function values.

Example 2: Taylor interpolation

X=Pn i(f)=fi)(Zo) i=1,2,...
Interpolation is done at only one point using the derivatives of the function.

Example 3: Hermite osculatoryy) interpolation

X=P2n-1 Li(f)=)(k) i=12,.... k=,2,...i j=1,2,...,k
Hermite interpolation may be thought of as a combination of Example 1 and Example 2.
Interpolation is done at several points using the derivatives of the function in addition to
the values.

Example 4: Birkoff (lacunarv) interpolation

X=P2n-I Li(f)=f)(zk)
This is similar to the Hermite interpolation, except for the fact that some of the derivatives
or the function values might not be available.

Example 5: Trigonometric interpolation

X= Linear combination of l,cos(x),Cos(nx),..., sin(x),sin(2x),..sin(nx). This is called a
trigonometric polynomial of degree n
L(f)-ith Fourier series coefficient

Hermite Interpolation

In this study, we will restrict our attention to Hermite interpolation. This is the kind of
interpolation where we are trying to match both the values and the derivatives of the inter-

polant polynomial to that of the function. Given the function values and the derivatives

f(so), /1) (o), ... m o-l) (S)
: : : (96)
f(sn), f (Sn)" .. m,-) (S"n)

with the interpolation knots si being distinct and possibly complex, it can be shown that
the Hermite polynomial [48]

Hmn m,-1 m-j-1 P) (s) w (s) k (S S mi
Hm (s) 2 2 j!k! (s-i)mi-J-k dSk w(s) Is=
i=0 j=O k=O (y -


n n
w(s) = n (s -si) m = m i-1 (98)
i=O i=0

interpolates to the values and the derivatives of f(s) given in (96), i.e.

Hm (si) = f (si) i5n (99)
0 5j < mi

It should be noted that when n-O, Hermite interpolation becomes the Taylor's series
expansion. For analytic functions the following expression offers a more compact form
for the Hermite polynomial [46]

1 w (t) -w(s)
Hm (s) w(t)- f(t)dt (100)
2rj w (t) (t s)

The contour C should be selected such that it encloses all the interpolation knots si and f(s)
remains analytical inside as depicted in Figure 15. Recalling Cauchy's residue theorem

1 Xf(t)
f(s) = dt (101)
f(S) = 2f t --s

One can write the remainder of interpolation (or the error of approximation) as

e (s) f(s) H (s) )dt (102)
2ryj w (t) (t s)

This equation leads to the Cauchy remainder given as

e (s) = w (s) (+1) (103)
(m+ 1)!

where 4 is an unknown number that lies within the ring
min{ Isl,...,lsnl } the function f.
Although, it is not easy to compute the error of approximation using the Cauchy
remainder, it has an instructive form. As seen in (103), the magnitude of the error depends
on the product of the derivative of f(s), which we have no control on, and w(s) already
defined in (98). This raises the following question: how can we select the interpolation
knots so,sl,...,Sn in the interval [ab] such that the maximum of Iw(s)l is as small as possi-
ble, i.e.

Wmin =min max Iw(s)l Isol s 1S < Isn
Si s (104)

For the case when multiplicity of all the interpolation knots is the same, i.e.
m0=ml=...=mn, the answer to that optimization problem is given by the Tschebyscheff
polynomials. It turns out that if the interpolation knots si are chosen as the roots of the
Tschebyscheff polynomial, Iw(s)l is minimized in the interval [a,b]. Further, if the interpo-
lation knots si are all real that dictates [46]

si Cos[ (2i- 1)] (b-) +
= n (b-a) +a

Figure 15 Contour of integration for evaluating the Hermite polynomial

Hermite Interpolation of Signals Produced by ARMA Models

Assume that f(s) fits into an ARMA model. i.e.

f(s) = s-i (106)

For this class of functions, the error of approximation given in (102) reduces to the form

e (s) i = w(s) bi w(s) y (s)
e (s) =w (s) w (pi) (s -pi) s-pi
i=1 i=l


Notice that when the function assumes an ARMA model, the approximation error can be

written as a product of w(s) and y(s). y(s) has the same poles as f(s) does, however its

zeros may be different. The peaks of y(s) should occur at s=Pi. Zeros of w(s) should be

selected to coincide with these peaks so as to cancel their effects which implies si=pi.

Poles of f(s)

points si

i = 0, 1, ..., n
mo = MI = ... =m n 2 0


Issues in Hermite Interpolation

There are four questions that need to be answered regarding the Hermite interpolation


1) What is the region of convergence of the approximation? i.e. for what values of s does

lim f(s) -Hm(s)\= 0 seROC (108)


2) If two interpolation knots are used, for a given error bound and polynomial order what

can be the maximum separation between these two knots? Or assuming that this is a spec-

tral approximation, what is the bandwidth of the approximation?

3) For a given bandwidth and error bound what order of polynomial is required?

4) How does the inclusion of derivative information in addition to the function values help

to improve the approximation? i.e. How does the Hermite interpolation compare with the

case where multiplicity mi of each knot is unity.

Region of convergence

Here, we will give an answer to the first question. Let's consider the simple two point

Hermite interpolation case and assume that the complex function fits into an ARMA

model. Our purpose is to approximate the spectrum of f(s) in between the points so and s1

that lie on the imaginary axis (Figure 16). Select the interpolation knots on the imaginary

axis as s0-0 and sl=jox Applying (107) we get

K Io
le~l (a= _[ (s-s0) (s-sl) 1(109)
e (s)- P \-Jo)- (109)
i= 1

Figure 16 Two point Hermite interpolation along the imaginary axis

The condition (108) holds if

g (-(s-so) (s-s)|I < (pi,-s) (pi-s )l (110)

for all the poles pi.

The contours of constant g are called lemniscates. Inside the lemniscate is the region

of convergence (ROC). Outside the lemniscate approximation diverges even for large

polynomial orders. ROC always encloses the interpolation knots, however for small g

(which occurs when a pole of f(s) is close to one of the interpolation knots) the ROC may

be composed of more than one disconnected region. Figure 17 illustrates the evolution of

the ROC for different g values. In this figure the 'x' mark represents the location of the

dominant pole pd. This is the pole for which I(pd-so) (Pd-sl)l is minimum determines the

region of convergence. Intuitively, this is the pole that is closest to either of the interpola-

tion knots.



--> -Re{s}

Bandwidth of interpolation

As seen in Figure 17, when the poles of f(s) get close to the imaginary axis, g gets

smaller and in return ROC splits into two disconnected regions. As a result, the approxi-

mation does not converge in the entireness of the interval s=jQ where fle [0,o)]. The natu-

ral question that comes to one's mind is that what is the maximum bandwidth ax that

the approximation would converge? Or in other words, how far apart can so and sl be?

Im(s) Im(s)
2, 2

g=1.7 / g=0.4

1) nSl \

I x x

0 Re(s) o ~ Re{s)

-2 -1 0 1 -2 -1 0 1
Im(s)RC he to s)

g=0.3 g=0.2
1 S1 1 S

o Refs) 0 Re(s)

-2 -1 0 1 -2 -1 0 1

Figure 17 Regions of convergence for various dominant pole locations

Let's concentrate on (110). Substitute S==O, sl=jo and s=ji. The frequencies inside

the ROC have to satisfy

I U()j (n 0)I < g


-_g < C2- -_ < g

This yields two equations

2-2 o-+ g > O

!C2- _o(-g > 0



The real Ql values that satisfy these two equations are given by

c- 02 + 4g (0- O -4g
2 2



o+ -4g 0+ 2+4g
2 2

This represents two regions. These two regions intersect when

0m ox2 -4
max max g

+ 2 -4g
Cmax max


The maximum allowable bandwidth is given by

Smax = 2J = 2 JIPPd-j I ma (116)

This nonlinear equation can be solved by considering that the variables satisfy a triangular

relation as illustrated in Figure 18.




x= Ipdl


Figure 18 Triangular relation between the maximum bandwidth and the dominant pole

x and 0 represent the magnitude and the argument of the complex pole pd, respectively.

With this new notation one can rewrite (116) as

Wmax = 2 j (117)

The law of cosines dictates that

y2 = x2 + o2 2xc CosO or
max max (118)

y2 = x2 + 4xy 4x yCosO

This quadratic equation in y yields one positive solution given by

y = 2(x-x2CosO) + j4(x-x2CosO)2+x2 (119)

Substituting this into (117) yields the maximum allowable bandwidth in terms of the mag-

nitude and the argument of the dominant pole.


Omax = 2 2X (x X32C 0) + 4 (x xM2Cos0) 2+ 2 (120)

x = Id 0 = ZPd

Maximum error versus the bandwidth

In the previous section we derived the bandwidth of convergence when the polyno-
mial order m->oo. Here, we will try to find an answer to a more practical question: what is
the maximum bandwidth one can achieve for a given error bound 4 and finite polynomial
order m. i.e. we want to make

le(jO)l = Hf(j) -Hm (jU)I 5 Q[0, o] 4>0 (121)
m is finite

That is

K (j)n n(j )
|e (j)) = !ai n (122)
i= P7 (Pi -J,)n (, -Pi)

where m=2n. Moving the absolute value operator into the summation one can get a pessi-
mistic bound

K I (jn)n n-j) <
I|e (j) I Ia n' 1 (123)
i=o p IP @i ,-J')I I -p,)|

The maximum value that the term I (ijO2)n -jw)"l can attain in the interval
e [0, w] is (w/2) 2n. The minimum value that the terms I (j pi) l and I (p, -jw) I
can have is pip where pir is the real part of the pole pi. With these in mind we can write

oe j2n K ail (124)
le (jO)j 2 (-) In n-+ 1

An upperbound for the bandwidth is

1 K ml
co(n) _<2[ -()] yT(n) = ~ inP+1 (125)
y (n) i=OlP il^W+

For reasonably small error values where 4 increases with the interpolation order n. This increase is roughly logarithmic. Figure 19
illustrates the dependence of the maximum percent error of approximation within the band
as a function of the bandwidth o for an 8th order ARMA system. Although the poles and
zeros of this system were chosen randomly, the dependence is typical. Alternatively, Fig-
ure 20 depicts the maximum bandwidth for a given error bound versus interpolation order
n. These two graphs clearly demonstrate the advantage of using derivative information in
addition to the function values.


This chapter gave an outline of the general theory of interpolation paying special atten-
tion to the Hermite interpolation. It was shown that given a set of linearly independent
functional information about a function, a prototype function can be made to fit to this
information, thereby approximating the function in some sense. A special case of this is
when the functional information is presented in the form of function values and derivatives
at various points in the domain. For this case, the Hermite interpolation technique can be
used to tailor a polynomial whose values and derivatives fit to that of the function at the
given points. This is the same kind of interpolation scheme that will be adopted in

maximum percent error

0 0.5 1 1.5 2 2.5 3
w (rad/sec)

Figure 19 Maximum percent error of approximation versus bandwidth w

maximum bandwidth versus order

1 1.5 2 2.5 3 3.5 4 4.5 5
order n

Figure 20 Maximum bandwidth for a given error bound versus interpolation order n



Chapter 5 in approximating the magnitude spectrum of a signal from its frequency deriva-
tives. In that sense, this chapter can be considered as a prelude aimed at providing the

background theory that we will frequently refer in Chapter 5.
Due to the intractability of a general purpose derivation, we analyzed the approxima-
tion error mostly in the context of the Hermite interpolation and ARMA signals. It was

shown that the approximation error can be decreased by incorporating higher order deriva-

tives into the interpolation. The contribution of the higher order terms is roughly logarith-

mic. That is, the contribution becomes marginal for very high orders. In the case of

relatively high approximation errors, addition of high order derivatives may even degrade

the approximation.

The separation between the interpolation points plays an important role in determining

the approximation error. The maximum separation (that is the bandwidth, in the case of

spectral approximation) is dictated by the dominant pole of the function of interest in the
vicinity of the interpolation points. The closer the pole is to either of the points, the

smaller has to be the bandwidth. Throughout the chapter, we linked the parameters of the

interpolation (number of derivatives, bandwidth, dominant pole) to the region of conver-
gence and approximation error. Furthermore, we demonstrated the advantage of using
higher order derivatives with an example.

It should be remarked that Hermite interpolation is not the only tool that allows one to

approximate a function in terms of its derivatives at distinct points. Hermite interpolation

creates a polynomial to do the approximation. Although, not very tractable, one can

replace the polynomial with a rational function and fit it to the derivatives of the function

of interest. This technique is closely related to the Pade' approximation method [49] [50].

Another prototype function that can be used is the family of Sinco's as suggested by Lin-

den and Abramson [51][52].




In the previous chapter, we have seen how the incorporation of the derivative informa-

tion decreases the interpolation error and broadens the region of support. In this chapter,

we will demonstrate how to obtain the spectral derivatives of the time-varying signals by

using the gamma filter. The gamma filter generates the so called Poisson moments of the

input signal at its taps, which will be shown to correspond to the Taylor's series coeffi-

cients, i.e. derivatives of the recent input signal spectrum around the zero frequency. The

appeal of the proposed method stems from the fact that in the analog domain, the Poisson

moments can readily be measured at the tap outputs of the gamma filter, rather than being

computed offline by a computer. Being the Taylor's series coefficients, the moments con-

stitute a nonparametric representation that depicts the temporal characteristics of the input

signal. Hence, they are computational correlates of the signal waveform, which can be

directly fed into a nonlinear mapper, such as an artificial neural network (ANN) perform-

ing classification, identification or prediction. Or alternatively, they can be used to

approximate the signal spectrum based on the Hermite interpolation technique that was

introduced in the previous chapter.

The gamma filter accommodates an inherent decaying-exponential time-window that

makes it possible to localize signals in time and in return represent nonstationary signals.

The representation takes place in the form of a spectrogram approximation which was pre-

viously introduced under the general topic of time-frequency distributions. In that respect,

the proposed method is an alternate time-frequency distribution that is readily available at

the outputs of the gamma filter. Furthermore, if the gamma filters are implemented in ana-

log hardware, it suffices to sample these outputs only at the information rate', rather than

the Nyquist rate of the incoming signal. With this convenience, the speed of the processor

following the gamma filter becomes independent of the highest frequency of the input sig-

nal, being only constrained by the stationarity period of the signal.

We start this chapter by first introducing Poisson moments. This is followed by the

review of the synthesis formula and the analysis of the relationship between the moments

and the signal magnitude spectrum.

Gamma Filter and the Poisson Moments: Analysis Formula

Fairman and Shen [53] proposed that a distribution f(t) can be expanded in terms of the

derivatives of Dirac's delta function as follows2

f(t) = fi (to) e-X(t-o) () (t- to) (126)

This infinite summation is identified as the impulse series expansion. The term fi(to) is

called the ith Poisson Moment of f(t) at t=t0 which is given by [55]

gi (t)
fi (t0) -f(t) 0i+ (t) | pi (t) = gi = 0,1,2,... (127)
t= to x

' stands for the convolution operator, while pi(t) is recognized as a scaled version of

the continuous-time gamma kernel gi(t) already defined in (1). By definition, f.l(t) is set

equal to f(t). Assuming f(t) is causal, the convolution of (127) can be expanded as

1. By information rate, we mean the rate at which a new information packet arrives, i.e. statistical
characteristics of the signal change significantly.
2. For more information on the distribution theory, see Schwartz [54].

f (to) =f(toQ-t) e~ dt (128)

Obviously, the Poisson moment fi(to) can be physically measured as the i+lst tap value of

a scaled gamma filter' with input f(t), rather than evaluating the convolution integral of

(128). It is this computational convenience of the Poisson moments that essentially moti-

vates their use (Figure 21).

Equation (127) can be considered to be an analysis formula for the Poisson moments.

Before introducing the problem of spectral approximation using the Poisson moments, we

would like to introduce the synthesis formula which is the counterpart of the analysis for-

mula in the sense that it dictates how to restore a signal given its Poisson moment samples.

k: time scale

Iststage k+l st stage

Vt) 1

fo fk'

Figure 21 Poisson moments are generated by the gamma filter

Reconstruction of a Signal from Its Poisson Moment Samples: Synthesis Formula

Having seen how easily the Poisson moments can be obtained using a gamma filter,

let's try to develop a synthesis formula that will dictate the reconstruction of the signal

from its moment samples. As is well known from Shannon's sampling theory, one can

1. An alternate name for the scaled gamma filter is Poisson Filter Chain (PFC) [55].

reconstruct a signal f(t) of bandwidth B from its time samples as long as the sampling fre-

quency is equal to or higher than the Nyquist rate 2B [56]. If this condition is met, a band-

limited continuous time signal can be written in terms of its samples as follows

f(t) = /f ( sincec (2Bt-n) (129)
n = -L

This basic reconstruction formula for the case of uniform sampling was later extended

by Papoulis [57] to incorporate the samples of multiple linear filter outputs. Papoulis

showed that it would be possible to restore a signal given data sampled at 1/Nt the

Nyquist rate from the output of N arbitrary filters into which the signal has been fed. Res-

toration of the input signal from the samples of tap outputs (i.e. Poisson moments) of an

Nth order gamma filter would be a special case of this general reconstruction theory. For

the case of the gamma filter, the original signal f(t) can be written in terms of its filtered

samples (poisson moments) as follows

N-1 -
f(t) = fi(nTN)k(t-nTN) T = (130)
i= On= -

The interpolation kernels ki(t) are given by

ki (t) = 'i (u;t) e2zutdu i = 0, 1, ...N- 1 (131)
B 2B/N

Note that ki(t) and Ti(u;t) are not Fourier transform pairs. We state without proof that, if

they exist, the kernels Ti(u;t) should satisfy the simultaneous set of equations

N-1 j2zmt
2B 2moB TN
S i (u;t) P (u ) = e (132)

over the parameter set 0_mN-1, B-B/N rier transform of pi(t) already defined in (127). The analytical solution of (132) is usually
very involved, therefore, numerical solution techniques are preferred. Refer to Marks [52]
for a proof and some hints on the solution of this equation.

Poisson Moments and the Magnitude Spectrum

Having seen how the original signal can be restored from its Poisson moments, let's
examine the relationship between the moments and the input signal spectrum. Assume a
causal signal f(t) and rewrite (128) as

f(t) = f(to-t)u(to-t)u(t)e dt (133)

where u(t) is the unit step function. Next, introduce fw(to,t), the decaying exponential and
rectangular windowed version of the delayed and inverted f(t), such that

[t- t/2 1-
fw(to, t) = f(to-t)rect -- o e (134)


1 ltt <1/2
rect(t) = ( (135)
0 else

Figure 22 depicts how the decaying exponential window placed over the signal f(t) empha-
sizes the recent past of the signal. Substituting (134) into (133) one obtains

/ F
S(t) = fw(to, t) dt fw(to, t) -+ Fw(to, ) (136)
f i ( to

The delay and the time inversion have no effect on the magnitude spectrum which is
generally the main concern in speech recognition. The decaying exponential windowing
however, helps to emphasize the recent past of the signal, thereby achieving time locality.
Its side effect shows itself in the form of frequency domain smoothing. The amount of
smoothing is associated with the frequency spread of the decaying exponential window
which is discussed in Appendix C.
In addition to the decaying exponential window, there is also a rectangular window,
but for to > 4/1, its effect can be safely ignored. Therefore, IFw(to,L)I can be considered
as an approximation to the magnitude spectrum of the recent past of the original signal


I f(t) 1. I

Figure 22 The exponential window emphasizes the past of the signal prior to t=to.

Having established the relationship between IFw(to,Q)l and the magnitude spectrum of
the original signal, let's go back to (136). It can be shown that the ith Poisson moment of f

(to) is related to Fw(to,1) as follows

f(t) = F,(to, ) (137)
f! dQQ

Analyzing (137), the moments are recognized as the Taylor series expansion coefficients

of Fw(t0,) around --0. That is we can write

F (to, Q) = i (to) fi (138)

Using a finite number of Poisson moments, a Taylor's series approximation can be

obtained for Fw(to,Q). In many applications, one does not even need to compute the poly-

nomial of (138). Being the computational correlates of the recent magnitude spectrum,

moments themselves suffice to represent the spectrum. Hence, instead of computing the

polynomial that approximates the spectrum, the moment vector can be directly fed into a

nonlinear mapper, such as an ANN for tasks such as prediction, identification and classifi-


Spectral Hermite Interpolation

The Poisson moments carry valuable information about the smoothed spectrum of the

input signal in the form of Taylor's series coefficients at zero frequency. One shortcoming

of the Taylor's series expansion is however, its locality, i.e. approximation diverges at

points away from the origin. Hence, it is not possible to characterize a wide bandwidth of

frequencies using a finite number of Poisson moments. As a cure for this shortcoming, one

can partition the frequency range of interest into several small subbands and carry out the

approximation separately for each band. In practice, bandpass filters tuned to different

center frequencies followed by mixers can be used to shift each band to the origin. The

baseband signals at the outputs of these devices can be fed into the gamma filter to obtain

the corresponding moment vectors. Consequently, these moment vectors can be concate-

nated together to give the overall spectral picture. Since the moment vectors store the

spectral derivatives at the edges of each band, Hermite interpolation technique can be uti-

lized to obtain an approximation of the magnitude spectrum1.

Tracey and Principe [59] successfully implemented a similar scheme in their small

vocabulary word recognition task using artificial neural networks. A group of constant Q

bandpass filters were used to model the cochlea [60]. The authors used a cascade of a

square device and an envelope detector instead of the mixer, thereby roughly approximat-

ing the power spectrum rather than the magnitude spectrum. As described above, the base-

band signals were fed into Gamma filters, outputs of which are the Poisson moments.

These moments are further fed into an ANN for word classification (Figure 23).

Spectrogram with Exponential Decaying Window

Despite the fact that the time windowing results in the degradation of the frequency

resolution, due to the smoothing effect that comes along with it, windowing proves to be

beneficial in reducing the variance of the spectrogram. As studied in Chapter 3, the spec-

trogram technique estimates the power spectral density of a signal in terms of the squared

magnitude spectrum of the windowed (filtered) signal (Table 3). Although, in the case of

infinite amount of data, the spectrogram estimator is unbiased, it remains to be an incon-

sistent estimator irrespective of the data size used The variance of the estimation error

can be shown to be proportional to the true power density at a particular frequency


There are several methods for reducing the error variance. The ideal way is to ensem-

ble average the spectrogram estimates of all the realizations of the underlying process.

Due to the unavailability of independent realizations in practice, this approach might not

1. Although, they might not be mathematically as tractable as the Hermite interpolation is, there
exist other interpolation techniques. See [58] for instance, which utilizes the rational polynomi-
als and [51], [52] that utilize the Sine functions for that purpose.

Figure 23 Piecewise spectral approximation via Poisson Moments

be feasible in most applications. For ergodic signals, an alternate way would be averaging

the spectrograms of different segments of the data [63]. This approach trades in the fre-

quency resolution for low estimator variance. Another variance reduction method is due

to Daniell [64] who suggested to smooth the spectrogram using a lowpass filter in the fre-

quency domain. Similar to Daniell's method the the method proposed in this study

smooths the spectrogram using a lowpass filtering approach and in return reduce the esti-

mator variance at the cost of frequency resolution.

As illustrated in Figure 24, the spectrogram method can be decomposed into three

operators: time windowing, Fourier transform and the absolute value squared. The Poisson

moment (PM) method, on the other hand can be implemented using the following cascade

of modules: time windowing, Fourier transformation, absolute value, spectral Taylor's

series coefficients, Hermite interpolator and a square device. The first four of these mod-

ules are embedded in the Gamma filter. If we ignore the square device, PM method can be

thought of as an approximation of the spectrogram, which itself is an estimate of the true

power spectrum. For that reason, it becomes a challenge to quantify the performance of
the PM method, since there is no reference available to compare it to. One way of quanti-

fying the goodness of an estimator is by determining its bias and variance. In view of this

next section is dedicated to the derivation of the bias and the variance of the spectrogram

Signal Taylor's series
window Coefficients

ident Hermite Spectrum
Interpolation Estimate

Gamma Filter

Figure 24 Gamma filter offers an alternative spectrogram representation at its taps

Variance and Bias of the Spectrogram Estimator

In this section we will derive the variance and the bias of a spectrogram estimator that

utilizes an exponential decaying window. Consider the windowed signal

x(t) = x (t) e-~tu (t)


Using the Wiener-Khintchine theorem, the power spectrum of xw(t) can be written in

terms of the autocorrelation function. That is

PxW (f) = fRx () e gftd (140)

where the autocorrelation is

R (x(T) = Xw,(t)xw(t+r)dt (141)

The expected value of the spectrogram turns out to be

E {Pxf)} = fPx()W(f-4)d4 (142)

The expectation is carried out over all possible realizations of the signal. Here,

Px(f)=E{ IX(f)12} represents the true power spectrum of the signal x(t), while W(f) is given

W(f) = F{e A u(t) 0e tu(t)} = F{te-u(t)} = 1 (143)
(% +j2nf)2

Noting that W(f) has lowpass characteristics, the smoothing effect of the window on the

spectrum is easily seen from (142). The smoothened spectrogram is biased for finite X, but

as X->0, it becomes unbiased, i.e.

lim E ({Px, ( } = Px,)


With these in mind, the covariance of the spectrogram can be written as

cov{Px, (f)Px(f2)} = E{Px (f1)P:(f2)} -E{Pxw(fl)}E{Px, f2)} (145)

The first right hand side term can be expanded in terms of the autocorrelation function as

E {Px (f) P (f2) } = E J f RX R(, R.(X. ) e-J2, "-f2u)ddu (146)

Substituting the definition of the autocorrelation function, one gets

E {Pxw(f )P:,2)} = JJJJE {x(t )x (v)x(u)}e (t++v+u) (147)
Se-j2x V (r t) +f (u v)] dtdrdudv

This expression involves fourth-order moments, which are difficult to evaluate in the gen-
eral case. Therefore, we will assume that the signal is white noise. This an oversimplifica-
tion, but will serve to illustrate the point. So, for the white noise input

E{x(t)x(T) } = -8 (t-x) = I (148)

holds. Then, the expectation term in (147) conveniently expands as

E {x (t) x () x (v) x(u) = [88v + 8tv + 8tu v] (149)

This yields

N 1 1 1 22
NF16 X +2 + 2 (f +f2) 2 + 2 (f -f2)21

Following similar steps, it can be shown that

E {Px(f) } (151)


N2 i i
coV{Px ) P(i)P 2)} = V2,2 6+1 2 .+.j -- 2 (152)
^,wV +f2w,)2 w 16)

In general, any nonwhite process x(t) can be generated by passing white noise of vari-
ance N02/2 through a filter of magnitude-squared response IH(f)12. Because the true power
spectrum Px(f) is related by

No 2
Px(f) = IH(f)2 (153)

then, the following is approximately true

+1 14)
coIvPxWOl)PxWV2)I- PxOl) PxV2) [2.2 nf 22 +2.1.2]f, ) (154)

Analyzing this expression, it can be seen that the variance of the estimator decreases with
increasing X..
Having calculated the estimator variance at the output of the spectrogram, the next task
is finding out the variance of the spectral derivatives (i.e. Poisson moments). Again, due
to the intractability of a general solution, we will work with white noise signal for which
(152) holds. Assume that we are interested in the variance of the ith derivative of the spec-
trum at frequency f=fl

i = [Px(f) + N(f)] (155)

N(f) represents the estimation error. The variance of ci can be written as

var{ci} = E i)(f) (156)

If we treat the differentiation as a linear filtering operation with magnitude response
IH(f)l=lfl1, the variance can be written in terms of the power spectrum PN(fl,f) of the error
term as
var{ci} = fiP(fl,f)df (157)

The power spectrum of N(f1) is nothing but the Fourier transform of the covariance given
in (152), i.e.

PN(f,f) = cov {Pxf1) Px, (f2) } e-2fjdf2 (158)

Note that

P (f1, J) P, (Of) (159)

holds. We define

P, (f) P(O, f = ~ + x = e (160)

Inserting this into (157), one gets a tight upperbound for the Taylor's series variance as

N ie df (6 (2i)
var{ci} < e-jdf 4 2i+1 (161)

Two important observations that can be deduced from this are, first: selecting large
values of the time scale helps to decrease the variance of the moment variance. Second, as
expected, variance of the derivatives increase as a function of the derivative order. These
two results can be attributed to the lowpass nature of the time scale and the highpass
nature of the derivatives.

Choice of The Time-Scale

The choice of the time scale X has a crucial role in determining the size and the accu-
racy of the representation matrix whose elements are Poisson moments associated with
different bands and measurement times. X by itself, controls several parameters ranging
from the effective decaying window length to the maximum allowable bandwidth. This
section is devoted to the discussion of the overall effect of the time scale on the representa-

tion scheme.
Recall that the Poisson moments correspond to the derivatives of the recent magnitude
spectrum around zero frequency. The recency in time is achieved as a result of the decay-
ing exponential window that is embedded in the gamma filter. This exponential window
helps to emphasize the recent past of the input signal, thereby localizing it in time. The
effective length of the window is controlled by the time scale of the filter. A large time
scale implies a small window length. The moments represent the average spectral content
of the signal under the window. Hence, when dealing with nonstationary signals, one
should avoid averaging segments with different statistical characteristics, by keeping the
window length smaller than the stationarity period. The stationarity period can be thought

of as the rate at which the packets of information arrives. The effective length of the
decaying exponential window in the root mean square (rms) sense is derived in Appendix

p) (162)
xl (p) = (162)
xt (P) Z,~

where p is the percent energy of the window enclosed under the length xt(p). According to
this, 90% of the window energy is kept within the width xt(0.9)=1.15/X. Thus, the lower-
bound for the time scale can be set as

[ n Pl


where Ts is the stationarity period of the input signal. For speech signals this period varies
in the range 10 ms to 100 ms [65].
Notice that the time scale is constrained by the information rate, not the Nyquist rate
which usually assumes a higher value. The information rate also corresponds to the speed
at which the Poisson moments need to be sampled irrespective of the maximum frequency
content of the input signal. This relatively low sampling rate removes considerable
amount of burden over the shoulders of the processor following the gamma filter.
Although, the maximum frequency content of the signal has no direct effect on the
moment sampling rate, it dictates the number of bands required to cover the entire fre-
quency axis. Based on this argument, one can conclude that the low sampling rate is
obtained by parallel processing the frequency domain information using several bandpass

While, the period of stationarity imposes a lowerbound on the time scale, frequency
resolution is the determining factor on the other end. As already explained in Chapter 3,
the side effect of the time windowing shows itself in the form of frequency domain blur-
ring. That is, the resolution in time domain is attained at the cost of resolution in the fre-
quency domain. Any two frequency components that are closer to each other more than
the frequency spread of the decaying window will be identified as one, rather than two
separate components. Thus, the time windowing acts as a spectral smoother. Conse-
quently, the choice of the time scale cannot be larger than the minimum frequency resolu-

tion desired in the representation. As derived in Appendix C, the frequency spread of the
time window, or in other words the frequency resolution of the representation is given by

tan (p)
xf(f) 2 (164)

such that p percent of the total signal power lies within the band fe[-xf(p),xt(p)]. Merging
(163) and (164) together, one can write the limits of the time scale as

In 2--
S p < < Af (165)
T tan (p)

where Af is the desired frequency resolution.
Having constrained the time scale from above in terms of the desired frequency resolu-
tion, the next question addresses the choice of frequency resolution, itself. Consider the
two point Hermite interpolation problem discussed in Chapter 4 where one tries to approx-
imate the recent magnitude spectrum of a signal f(t) within the band of frequencies [0,o]
using the derivatives of the spectrum at the edges of the band. Let pd be the dominant pole
of F(s) in that band1. Since it is based on a first order lowpass filtering, the decaying expo-
nential window can be thought to be inserting an extra pole into the transfer function F(s)
at s=-X (Figure 25). We know from Chapter 4 that the size of the region of convergence
and the maximum width of the band are inversely proportional to the minimum distance
between the dominant pole and the interpolation knots at s0=0 and sl=jo For that rea-
son, one should avoid choosing the time scale such that the minimum distance between the
pole at s=-X and either of the interpolation knots is smaller than that of the dominant pole.

1. The Dominant pole is defined as the pole with significant weight that is closest to the band of
interest. If there are more than one prospective dominant poles, the center of gravity of these
poles is taken as the dominant pole.

Otherwise, the pole at s=-X would become the dominant pole and in return, further narrow

the band of approximation. In other words, one should select the time scale such that it
maximizes min { Ijp, IPd si }. This condition is guaranteed to hold if

X>1te {pd}I (166)

Another interpretation of (166) is that in terms of the frequency resolution, there is no
benefit in choosing X smaller than the real part of the dominant pole since the dominant

pole determines the smoothness of the true spectrum. In other words, there is no reason to

try to achieve frequency resolution higher than the sharpness of the true spectrum which is

essentially determined by the dominant pole in that band. Hence, provided that the infor-
mation rate condition (163) is satisfied, a suitable choice for the time-scale that ensures

both sufficient frequency and time resolution would be

S= 9te {pd} (167)




SO o Re{s)

Figure 25 Te scale should be comparable with the real part of the dominant pole
Figure 25 Time scale should be comparable with the real part of the dominant pole


In this section we would like to present examples of spectral estimates obtained by the

Poisson moment method. As a test signal, we picked the speech utterance 'greasy'

archived in the TIMIT database [66]. The original data was sampled at 8 KHz and stored
in 16 bit data format.

In the case of finite amount of data, the true spectrum is not known. In other words, a

reference that the estimates can be compared to is not available. This poses the biggest

challenge in quantifying the goodness of the spectral estimates. One can resort to statisti-

cal measures such as estimate bias, variance, etc, like we did in this chapter, but these are

average measures and do not help to assess the quality of the individual estimates. One

problem dependent solution would to be to test several estimation methods in the same

application and compare their performances. In our case for instance, one can compare the

classification error rate of a speech recognition system that uses the proposed PM method

as a preprocessor and other conventional frontends (tapped delay line, FFT, cepstral coef-

ficients, etc) in front of an ANN. This approach is not within the scope of this study, how-

ever. Interested reader can refer to the work of Tracey and Principe [59] who compared the

PM and tapped delay line preprocesors in an isolated speech recognition problem.
Figure 26 shows the logarithm of the spectral estimate of a segment of the speech sig-

nal. This is obtained by first applying a rectangular window of width 20 ms over the

speech signal and then computing the magnitude square of its Fast Fourier Transform.

Notice the spiky nature of the spectral estimate.

Figure 27 shows the estimate when the rectangular window is replaced with an expo-

nential window. The time scale of the exponential window was chosen to be 150 rad/sec.

This choice is mainly determined by the stationarity duration of the speech sound, which

was 20ms for our case. As a result of the window type used, this estimate is smoother than

the previous one. Although the original method is proposed to be used in the continuous

time domain, we were obliged to carry out the simulation in the discrete time domain

frea (Hz)


Figure 26 Spectral estimate using a rectangular window length of 20 ms

using highly interpolated versions of the sampled signal.1 In this approach, the signal is

first exponentially windowed and later its spectrum is computed using the discrete time

Fourier transform. The derivatives of the spectrum are computed numerically to give an

estimate of the Poisson moments. In Figure 27, the dashed line depicts the estimate

obtained by Poisson moments. We used only 1 moment per band. There were 16 logarith-

mically distributed bands that cover the frequency axis. This estimate may be thought of as

a 16 point Fourier transform of an exponentially windowed signal, which is far from being

satisfactory. Figure 28 and Figure 29 display the estimate results for the cases of 3

moments/band and 4 moments/band, respectively. The improvement in the estimates

obtained by the utilization of the moments is clearly seen. Finally, in Figure 30 we plot

the maximum absolute error between the spectral estimate and its estimate obtained using

Poisson moment technique, as a function of the number of moments used per band. This

figure demonstrates the decrease in the error achieved by using higher order moments. The

1. The original signal was sampled at 8Khz. We upsampled it by a factor of 40.

reason we chose to use the maximum absolute error stems from the fact that this is a more

pessimistic measure compared to the MSE in the sense that it penalizes impulse like

errors more and in the case of speech signals, let's us identify errors that occur around the

formant frequencies where the main features of the signal are kept.

1.2 ii

Exponential windowed
0.8 -------- Estimate

16 bands
1 moment/band
0.4- \ X=150


0 500 1000 1500 2000 2500 3000 3500
Figure 27 Spectral estimate using an exponential window (solid line) and its estimate
obtained by Poisson moment technique (dashed line) (1 moment/band).

In the rest of this section, we will analyze the dependence of the proposed method on

the time scale and the number of bands that are used. For this purpose, we define two error

criteria which complement each other. The first one is the power of the spectral estimation

error normalized with respect to the power of the desired spectrum (i.e. magnitude spec-

trum of the exponentially windowed signal). This gives an idea of the average error along

the entire spectrum. This criterion however, does not penalize narrow, impulse like errors

substantially. For that reason, we pick the maximum absolute estimation error normalized

with respect to the peak of the desired spectrum as the second criterion to evaluate the

goodness of the estimates. It should be noted that compared to the error power, this is not

only a pessimistic criterion, but due to its nonlinear nature, it is also sensitive to the param-

eters of the method.

0 500 1000 1500 2000 2500 3000 3500
freq (Hz)

Figure 28 Spectral estimate using an exponential window (solid line) and its estimate
obtained by Poisson moment technique (dashed line) (3 moments/band).

0 500 1000 1500 2000 2500 3000 3500
freq (Hz)

Figure 29 Spectral estimate using an exponential window (solid line) and its estimate
obtained by Poisson moment technique (dashed line) (4 moments/band).

Maximum Absolute Error









0 i i i
1 2 3 4
number of moments per band

Figure 30 Maximum normalized absolute estimation error in the entire frequency domain

We estimated the spectrum of segments of 10 'greasy' words that were uttered by the

same male speaker. In Figure 31, we plot the variation of the normalized error power as a

function of the time scale for one of the utterances. The improvement in the estimates as a

function of the the time scale is clearly seen. This is somewhat expected because, for large

values of the time scale the time window gets smaller decreasing the number of degrees of

freedom of the spectral content of the signal.

In Figure 32, we test the PM method using the same signal segment against the second

error criterion: normalized maximum absolute error. This plot too, demonstrates the

improvement in the estimate as a function of the time scale and the moment order.

As mentioned before, the results displayed in Figure 31 and Figure 32 were obtained

using a single word. Figure 33 and Figure 34 demonstrate the results of the same experi-

ment averaged over the set of 10 utterances. In both of these figures, there exists a time

scale threshold below which the utilization of high order moments can be actually disad-

vantegous. This is mainly due to the fact that the spectral estimates get spikier for small

values of X causing noisier derivative estimates.

200 250 300

350 400 450 500

Figure 31 Variation of the normalized error power as a function of the time scale for different number of
moments M used

Logarithm of the Normalized Maximum Absolute Error



-0.8 \




- S

M=1 "'



S......... M=2

----- M=3

.-.... M=4

100 150 200 250 300

350 400 450

' F

Figure 32 Dependence of the normalized maximum absolute error power as a function of the time scale for

different number of moments M.


- - . ..











\ M=1
......... M=2
S-- M=3
-..-. M=4
- -.. .

7 .

%0 133 Dee 2o0 20 o o o 400 4o 500s

Figure 33 Dependence of the average error power on the time scale. (M: number of moments).

100 150

200 250 300

350 400 450

Figure 34 Dependence of the average maximum absolute error on the time scale. (M: number of moments).

An alternate way of displaying the information in Figure 33 and Figure 34 is as shown in

Figure 35 and Figure 36. According to these graphs the fourth moment helps to decrease

the error for time scale values larger than 200.

Logarithm of average normalized error power

Logarithm of average normalized maximum absolute error













\ ......... M=2

\ ----- M=3



................ .......................

" :..< ..... ....


University of Florida Home Page
© 2004 - 2010 University of Florida George A. Smathers Libraries.
All rights reserved.

Acceptable Use, Copyright, and Disclaimer Statement
Last updated October 10, 2010 - - mvs