Permanent Link: http://ufdc.ufl.edu/UF00082223/00001
 Material Information
Title: A new adaptive algorithm for the real-time equalization of acoustic fields
Physical Description: vii, 220 leaves : ill. ; 29 cm.
Language: English
Creator: Spaulding, Jeffrey James, 1960-
Publication Date: 1992
Subject: Electrical Engineering thesis, Ph.D.; Dissertations, Academic -- Electrical Engineering -- UF
Genre: bibliography (marcgt); non-fiction (marcgt)
Thesis: Thesis (Ph. D.)--University of Florida, 1992.
Bibliography: Includes bibliographical references (leaves 218-219)
Statement of Responsibility: by Jeffrey James Spaulding.
General Note: Typescript.
General Note: Vita.
 Record Information
Bibliographic ID: UF00082223
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: aleph - 001758321
oclc - 26708651
notis - AJH1378







Dedicated to

my Lord Jesus Christ
through Mary, Queen of Virgins,
in partial reparation of sins.


ACKNOWLEDGMENTS

I would like to express my gratitude to my advisor, Dr. Jose Principe, for his encouragement and support through the course of my studies and research. I would also like to express my gratitude to my supervisory committee members, Professor Taylor, Professor Childers, Professor Green, and Professor Siebein.

My work would not have been possible without the help and support of the students of the Computational Neuroengineering Laboratory. I am deeply appreciative. In addition, I would like to express my thanks to Dr. Richard Henry and Mr. Phil Brink of the United States Air Force for their constant support.

I would like to thank my friends and family. It has been difficult not to have had the time to devote to the people I love. I appreciate their understanding. I wish to especially thank my grandmother, Mrs. Ruth Moore, whose bequest made my studies possible.

Finally, I wish to thank my mother and father, without whom I would never have had the strength or courage to complete my program.


TABLE OF CONTENTS

ACKNOWLEDGMENTS ................................................... iii
ABSTRACT .......................................................... vi

1  BACKGROUND ..................................................... 1
       Introduction ............................................... 1
       Basic Physics of Room Acoustics ........................... 4
       Psychoacoustics ............................................ 23
       Consequences of Room Acoustics and Psychoacoustics
          in the Solution of the Inverse Problem ................. 34

2  AN EQUALIZATION STRATEGY FOR REAL-TIME SELF-ADJUSTMENT ........ 39
       Introduction ............................................... 39
       State-of-the-Art Equalizers ............................... 41
       The Adkins-Principe Equalizer Architecture ................ 48
       Automatic Adjustment Using Adaptive Filters ............... 57

3  LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE WITH
   UNCORRELATED INPUT DATA ....................................... 63
       The Wiener-Hopf Solution .................................. 64
       The Gradient Descent Algorithm ............................ 68
       The LMS Algorithm with Uncorrelated Input ................. 84

4  LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE WITH
   CORRELATED INPUT .............................................. 91
       Convergence of the LMS Algorithm with Correlated Input .... 91
       Conditions on μ for Convergence ........................... 93
       Conditions on μ for High Convergence Speed ................ 101
       Experimental Results ...................................... 102
       Conclusions ................................................ 108

5  DETERMINATION OF λmax WITH A DIVERGENT
   LMS (DLMS) ALGORITHM .......................................... 115
       Determination of λmax with a Divergent Gradient
          Descent Algorithm ...................................... 115
       Parameter Selection for the Divergent Gradient Descent
          (DGD) Algorithm ........................................ 118
       Determination of λmax with a Divergent LMS Algorithm ...... 120
       Parameter Selection for the Divergent LMS Algorithm ....... 128
       Integration of the DLMS Algorithm into the Equalizer
          Architecture ........................................... 139
       Summary and Conclusions ................................... 143

6  VALIDATION OF CONCEPTS ........................................ 146
       Test Plan .................................................. 146
       Test Results ............................................... 158

7  CONCLUSIONS AND RECOMMENDED EXTENSIONS OF RESEARCH ............ 199
       Review ..................................................... 199
       Recommended Extensions to this Research ................... 205

ECVT Plots of Contiguous Epochs of Audio Data ..................... 208

REFERENCES ........................................................ 218
BIOGRAPHICAL SKETCH ............................................... 220

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy

A NEW ADAPTIVE ALGORITHM FOR THE REAL-TIME
EQUALIZATION OF ACOUSTIC FIELDS

By

Jeffrey James Spaulding

May 1992

Chairman: Dr. Jose C. Principe
Major Department: Electrical Engineering

This dissertation presents a solution for the problem of acoustic equalization. Acoustic data are collected from a microphone at a listener location and compared with a source (CD player). An adaptive signal processing algorithm, based on the Least Mean Squares (LMS) algorithm, warps the CD signal to account for the filtering effects of the listening room. The algorithm that adapts the coefficients of a multirate equalizer is computationally efficient and can be performed in real time with current microprocessor technology.

Because music is nonstationary, the LMS algorithm must converge rapidly to a new set of optimal filter coefficients each time the input signal statistics change. Since the LMS is a gradient descent algorithm, fast convergence implies a step size selection that operates at the algorithm's edge of stability. An analytic expression exists to determine the step size for uncorrelated input data as a function of the maximum eigenvalue of the input data. This dissertation extends this work to cover the case of correlated signals such as music. The result provides verification of heuristic rules that have been proposed in the literature.

To date, formulations involving the maximum input signal power have been utilized as an estimate of the maximum eigenvalue. In this dissertation a new method of determining the maximum eigenvalue is proposed: the LMS algorithm is allowed to diverge. For a large and consistent domain of initial conditions, iterations, and purposely divergent step sizes, the maximum eigenvalue dominates the rate of algorithm divergence. Simulations were pursued to determine the bounds of these variables for robust operation. This methodology was utilized to analyze the performance of the adaptive equalizer on selected music epochs, and to validate the theory put forward in this dissertation.



For millennia architects have designed structures with the intent of minimizing distortion of
acoustic signals propagating from a speaker or a musical instrument to a listening audience. From
the Greek amphitheaters of antiquity to the modern concert halls of today, technologies developed
over the centuries have resulted in sophisticated techniques which have greatly improved acoustic
fidelity. Only in this century has technology been sufficiently advanced to approach the problem
of acoustic distortion from the perspective of equalization. Equalization anticipates the filtering
introduced by the enclosed space and pre-processes the acoustic signal to take these effects into
account. In this way when the acoustic signal arrives at the listener, the effects of the equalizer
filtering and the room induced filtering cancel each other.
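The cancellation idea can be illustrated with a toy example. The sketch below is illustrative only, not the dissertation's multirate design: it models the room as a hypothetical minimum-phase FIR filter H(z) = 1 + 0.5 z^-1 and pre-filters the source with its exact IIR inverse 1/H(z), so the cascade reproduces the source. The names `room` and `equalizer` are assumptions made for this sketch.

```python
def room(x):
    # hypothetical room response H(z) = 1 + 0.5 z^-1 (minimum phase)
    y, prev = [], 0.0
    for s in x:
        y.append(s + 0.5 * prev)
        prev = s
    return y

def equalizer(x):
    # exact inverse 1/H(z), realized as the IIR recursion y[n] = x[n] - 0.5*y[n-1]
    y = []
    for s in x:
        y.append(s - 0.5 * (y[-1] if y else 0.0))
    return y

source = [1.0, 0.0, -2.0, 3.0, 0.5]      # stand-in for the source signal
at_listener = room(equalizer(source))    # equalize first, then let the room filter
```

Because the room here is minimum phase, its inverse is stable; this is exactly the restriction imposed on the room model later in this chapter.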

The revolutionary development of the Compact Disk (CD) player, first introduced to the market
in 1982, has completely changed the direction of audio engineering. Research in audio engineering
is being directed towards creating all-digital audio systems. With the tremendous improvement in
microprocessor speed, audio signals can now be processed digitally in real-time. The research pre-
sented in this dissertation will provide a design for an equalizer which will be capable of real-time
self-adjustment. The design will be focused on providing a system to perform equalization in a
normal home listening room with standard audio components. It will be based on a new equalizer
architecture developed by Adkins and Principe at the University of Florida [1], which uses a multi-
rate filter bank. The Adkins-Principe architecture is enhanced so that the filter coefficients in each
band of the filter bank are continuously updated in real time, with coefficient updates
provided using principles from adaptive signal processing.

In chapter two, background information will be provided on the current state of the art of
acoustic equalization. Background information on room acoustics, psychoacoustics, and the state
of the art of equalizers will lead to conclusions regarding the limitations of acoustic equalizers.
An improved strategy of adaptive equalization based on the Adkins-Principe equalizer architecture
is proposed. Chapter two will discuss in detail the theory and operational characteristics of the
Adkins-Principe architecture. It will be shown to be an elegant and extremely efficient
implementation strategy. Chapter two will also discuss the relevant considerations in choosing
an adaptive algorithm for real-time updating of filter coefficients. On the basis of
algorithmic complexity and the speed of the current generation of microprocessors, the
Least Mean Squares (LMS) algorithm will be justified as an appropriate algorithm for this
application.

The LMS algorithm can be described as a stochastic gradient descent method which
converges due to its time-averaged behavior. For fast convergence speed and low misad-
justment in a nonstationary environment the algorithm must have an adaptive step size
which can be precisely controlled. When the statistics of the acoustic signal change dra-
matically, equalizer settings will be incorrectly set and the adaptation algorithm will need
to converge as quickly as possible to the new optimal equalizer settings in the least mean
squares sense. LMS step size is the key parameter to adjust in order to maximize conver-
gence speed. The theory guiding the optimal value of step size is well-understood for un-
correlated input data [2-3]. Chapter three will present the theory of LMS convergence for
the case of uncorrelated input data. The theoretical structure of the Wiener-Hopf formu-
lation, the stochastic gradient descent algorithm, and the LMS algorithm with uncorrelated
data will be discussed to provide the necessary framework for the more complicated case
of the LMS with correlated input data. Chapter four will extend the theory of the LMS algorithm
to cover the convergence properties of the mean square error for the case of correlated input
data. This theory is necessary to understand adaptation performance for acoustic data,
which are by nature correlated signals.
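For reference, the update being discussed can be stated concretely. The sketch below implements the textbook LMS recursion w ← w + 2μex with e = d − wᵀx; it is a generic illustration, not the dissertation's multirate equalizer, and the function `lms` and the 2-tap test system are assumptions made for this sketch.

```python
import random

def lms(x_vectors, desired, mu, n_taps):
    # textbook LMS: y = w.x, e = d - y, w <- w + 2*mu*e*x
    w = [0.0] * n_taps
    for x, d in zip(x_vectors, desired):
        y = sum(wi * xi for wi, xi in zip(w, x))
        e = d - y
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, x)]
    return w

# identify a known 2-tap system from uncorrelated (white) input
random.seed(0)
u = [random.uniform(-1.0, 1.0) for _ in range(2000)]
x_vectors = [[u[n], u[n - 1]] for n in range(1, len(u))]
desired = [0.5 * x[0] - 0.25 * x[1] for x in x_vectors]
w = lms(x_vectors, desired, mu=0.05, n_taps=2)   # w approaches [0.5, -0.25]
```

With uncorrelated input and a noise-free desired signal the weights converge geometrically, which is the baseline case against which the correlated-input analysis of chapter four is compared.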

The divergence properties of the stochastic descent algorithm and the LMS algorithm
are discussed in chapter five as a simple extension of the convergence properties examined
in chapter three. The results of this study indicate that the maximum eigenvalue dominates
a diverging gradient descent algorithm. It will be further shown that the optimal value of
the step size, after a change in the statistics of the input data, can be well-approximated as
a function of the maximum eigenvalue of the vectors input into the filters performing adaptive
equalization. The standard technique for estimating λmax is to assume that it is approximately
equal to the power in the input vector times the filter order. From systems theory
we know that this quantity is the sum of all eigenvalues, and thus is never less than λmax. This
estimation is excellent under certain restricted conditions, e.g., low filter orders and high
eigenvalue spreads. No such strong statements can be made more generally of this
approximation technique. A new method of approximating the maximum eigenvalue of a
system by allowing an adaptive gradient descent algorithm to diverge is introduced in chap-
ter five. The method is first developed for the stochastic gradient descent algorithm which
is based on the statistical properties of an ensemble of independent, identically distributed
systems. The gradient descent algorithm will provide a theoretical framework from which
to analyze the more complicated LMS algorithm which is based on the time-averaged prop-
erties of a single system.
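The divergence idea can be sketched in a few lines. The toy below is a schematic illustration under simplifying assumptions, not the dissertation's DLMS procedure: it runs deterministic gradient descent w ← (I − μR)w on a quadratic surface with a purposely unstable step size, so the weight norm grows by a factor of |1 − μλmax| per iteration, from which λmax can be recovered. The matrix R and the helper names are hypothetical.

```python
import math

def matvec(R, w):
    return [sum(Rij * wj for Rij, wj in zip(row, w)) for row in R]

def norm(w):
    return math.sqrt(sum(x * x for x in w))

def estimate_lambda_max(R, mu=1.0, iters=40):
    # purposely divergent gradient descent: w <- (I - mu*R) w; the dominant
    # growth factor of the weight norm is |1 - mu*lambda_max|
    w = [1.0] * len(R)
    for _ in range(iters):
        w_next = [wi - mu * gi for wi, gi in zip(w, matvec(R, w))]
        ratio = norm(w_next) / norm(w)      # -> |1 - mu*lambda_max|
        scale = norm(w_next)
        w = [x / scale for x in w_next]     # renormalize to avoid overflow
    return (ratio + 1.0) / mu               # valid when mu*lambda_max > 2

R = [[3.0, 1.0], [1.0, 2.0]]                # toy correlation matrix
est = estimate_lambda_max(R)                # eigenvalues of R are (5 ± sqrt(5))/2
```

The dissertation's contribution is the harder, stochastic version of this idea: establishing when the time-averaged behavior of a single diverging LMS run, rather than an ensemble, is dominated by λmax.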

Chapter six discusses the experiments used to test the equalizer architecture and the al-
gorithms used for real-time self-adjustment. The NeXT computer with a digitizing micro-
phone is used to collect and process data. The software used to simulate real-time
equalization with the proposed algorithms is discussed. Finally the test and evaluation re-
sults are presented.

Chapter seven summarizes the results of research presented in the dissertation. Specif-
ically it reviews the success of the new architecture proposed for real-time self-adjustment
of an audio equalizer. It reviews the theory of the convergence and divergence properties
of the LMS algorithm with statistically correlated input data. It reviews the theory and per-
formance of a new variation of the LMS algorithm which provides an estimate of the max-
imum eigenvalue of the input vectors into the equalizer filters. Finally chapter seven
discusses the many areas of this research that could be extended.

Because the physics of room acoustics and the principles of psycho-acoustics motivate
many of the key design characteristics of the acoustic equalizer, the remainder of chapter
one will provide background information on the pertinent areas of these fields.

Basic Physics of Room Acoustics

Equalization is an inverse problem, i.e., the adaptive equalizer will seek a configuration
such that its response will be the inverse of the room response. This section will develop
the simplest features of room acoustics because of the basic understanding it provides
into the nature of the signals and systems to which the algorithms and methods introduced
in this dissertation will be applied. As electrical engineers are more familiar with
electromagnetic radiation than acoustic, an effort will be made to develop this introduction
to the physics of room acoustics in an analogous manner to the standard method of intro-
ducing the theory of electromagnetic waveguides, with the most important similarities
pointed out where appropriate. A listening room may be thought of as an acoustic coupler,
or communications channel, between an acoustic source and a listener. A further limitation,
that the room can be described as a minimum phase system, will be imposed. The description of
the acoustic channel is particularly complicated because of the importance of both its tem-
poral and spatial dependency. Because of the wave nature of acoustic fields, the sound en-
ergy in an enclosed space may be considered to be the mechanical analogy of
electromagnetic radiation in a rectangular waveguide.

A discussion of the basic equations and definitions is first presented. Specifically, the
pressure equation will be discussed as the principal descriptor of acoustic phenomena. The
conservation of momentum equation will be shown to supply boundary conditions for the
wave equation. The reflection and transmission coefficients are then discussed.
It is shown that many features from optics have analogous results in acoustics, e.g., Snell's
Law and the Law of Reflection. The prominent factor in determining the reflectivity of a
wall is the wall's characteristic impedance. The impedance, together with the room's geometry,
defines the characteristics of the room. A brief discussion of acoustic impedance is
therefore presented. The wave equation for a room is developed in a method analogous to
methods used for electromagnetic waveguides. The application of this theory is discussed
for a room which is a rectangular cavity with different types of impedance. Lastly, the spatial
and temporal response of the room is outlined using methods from ray theory. Much
of the discussion is based on the introductory texts of Egan [4], Kinsler et al. [5], and Kuttruff.

Basic Equations and Definitions

Sound waves are longitudinal waves of regions of greater and lesser pressure. An
acoustic field is described by a wave equation which expresses the deviation from ambient
pressure, p, as a function of spatial location r and time t. The constant of proportionality
between the temporal and spatial derivatives is the speed of propagation in the medium, c.

∇²p = (1/c²) ∂²p/∂t²   (1-1)

For a time-harmonic plane wave the solution of the pressure equation is of the same form
as the expression for a plane wave of electromagnetic radiation. By separation of variables
the pressure equation reduces to the following familiar form, where k is the wavevector.

p(r, t) = p_0 e^{i(ωt − k·r)}   (1-2)

The velocity of the vibrating particles, v, is related to the pressure field according to the
conservation of momentum. The density of the medium is represented by ρ.

∇p = −ρ ∂v/∂t   (1-3)

By solving the above equations we can relate the pressure field to the velocity field by
an acoustic impedance, z = ρc.

v(r, t) = v_0 e^{i(ωt − k·r)}   (1-4)

p(r, t) = ρc v(r, t) = z v(r, t)   (1-5)


Sensory equipment, including the human auditory system, responds to the intensity of
the field. In the case of acoustic data, this corresponds to the average energy flowing across
a unit area normal to the direction of propagation per unit time.

I = ⟨p²⟩ / (ρc)   [W/m²]   (1-6)

In this discussion pressure waves will be considered to be plane waves in order to sim-
plify the mathematical treatment of the phenomena which will be discussed. Note that for
an acoustic point source the phase fronts would be spherical in an analogous manner to
electromagnetic point sources. The plane wave approximation is only valid when the radius
of curvature of the phase front is very large compared to the wavelength of the radiation.
We will assume in this discussion that the dimensions of the rooms being considered are
sufficiently large that this approximation is valid for frequencies over 1000 Hz. The room
under consideration will be a rectangular cavity. These highly idealized cases are studied
so that the important features of room responses are made clear. Before room responses
can be meaningfully discussed, the reflection of acoustic waves off a wall must be well understood.

Reflection of a Plane Wave by a Wall

The electromagnetic equivalent of reflection off a wall is a plane electromagnetic wave
reflecting off an infinite flat plate of complex impedance. Like the electromagnetic equivalent,
the media impedances determine the behavior of the wave for a given geometry. Consider
the situation depicted in figure 1.1. Let p_i be the incident pressure field, let p_r be the
reflected field, and let p_t be the transmitted field, with corresponding wavevectors k_i, k_r, and k_t.

Since we are assuming plane waves, we can drop the y-dependency from our equations
without loss of generality. Recall that the magnitude of a wavevector is defined as K = ω/c,
where c is the velocity of propagation in a given medium and ω is the radian frequency.
Let c_0 be the speed of propagation in air at standard temperature and pressure (STP) and let


figure 1.1 The Law of Reflection and Snell's Law are valid for acoustic
radiation as well as electromagnetic radiation. The figure
above indicates the coordinate system and nomenclature
which will be used throughout the discussion of room acoustics.

n be the ratio of the speed of propagation in air to that in the medium, i.e.,
n = c_0/c. Let u_x, u_y, and u_z be the unit vectors of our coordinate system as depicted in
figure 1.1. Then the incident, reflected, and transmitted wavevectors are expressed by
equations 1-7 through 1-9, where |K| = nK_0 and Θ is the angle between the wall normal
and the wavevector.

By matching the pressure field at the wall, the reflection and transmission coefficients
can be calculated.

K_i = n_1 K_0 [−u_x cos Θ_i + u_z sin Θ_i]   (1-7)

K_r = n_1 K_0 [u_x cos Θ_r + u_z sin Θ_r]   (1-8)

K_t = n_2 K_0 [−u_x cos Θ_t + u_z sin Θ_t]   (1-9)

The pressure waves propagating along the wavevectors given in equations 1-7 through 1-9
are shown in equations 1-10 through 1-12.

p_i = P_{i,0} e^{−i n_1 K_0 (−x cos Θ_i + z sin Θ_i)}   (1-10)

p_r = P_{r,0} e^{−i n_1 K_0 (x cos Θ_r + z sin Θ_r)}   (1-11)

p_t = P_{t,0} e^{−i n_2 K_0 (−x cos Θ_t + z sin Θ_t)}   (1-12)

The pressure field must be continuous at all points along the boundary, i.e.,
P_{i,interface} + P_{r,interface} = P_{t,interface}. These equations yield the Law of Reflection and
Snell's Law in precisely the same form as the electromagnetic equivalent.

n_1 K_0 sin Θ_i = n_1 K_0 sin Θ_r = n_2 K_0 sin Θ_t   (1-13)

sin Θ_i = sin Θ_r   (Law of Reflection)   (1-14)

n_1 sin Θ_i = n_2 sin Θ_t   (Snell's Law)   (1-15)

In addition, the conservation of momentum (equation 1-3) must be satisfied at the wall.
This will imply a matching of particle velocities at the interface. Note that because of the
plane wave assumption, only the derivative with respect to x is non-zero in grad p. The velocity
field propagating along the wavevectors given in equations 1-7 through 1-9 is described
by equations 1-16 through 1-18.

V_i = (n_1 K_0 / ωρ_1) cos Θ_i P_{i,0} e^{−i n_1 K_0 (−x cos Θ_i + z sin Θ_i)}   (1-16)

V_r = −(n_1 K_0 / ωρ_1) cos Θ_r P_{r,0} e^{−i n_1 K_0 (x cos Θ_r + z sin Θ_r)}   (1-17)

V_t = (n_2 K_0 / ωρ_2) cos Θ_t P_{t,0} e^{−i n_2 K_0 (−x cos Θ_t + z sin Θ_t)}   (1-18)

By a proper choice of the coordinate system, i.e., for the wall located at x = 0, and by an
application of Snell's Law and the Law of Reflection, the phase-related terms in V can be
ignored, as they are common to V_i, V_r, and V_t. Matching the particle velocities at the interface
yields the following expression for the reflection coefficient.

(P_{i,0} cos Θ_i)/z_1 − (P_{r,0} cos Θ_r)/z_1 = (P_{t,0} cos Θ_t)/z_2   (1-19)

z_2 P_{i,0} cos Θ_i − z_2 P_{r,0} cos Θ_r = z_1 (P_{i,0} + P_{r,0}) cos Θ_t   (1-20)

P_{r,0} (z_1 cos Θ_t + z_2 cos Θ_i) = P_{i,0} (z_2 cos Θ_i − z_1 cos Θ_t)   (1-21)

R(Θ, ω) = P_{r,0}/P_{i,0} = (z_2 cos Θ_i − z_1 cos Θ_t) / (z_2 cos Θ_i + z_1 cos Θ_t)   (1-22)

If we assume medium one is air, i.e., z_1 = ρ_0 c_0, and medium two is a hard wall, i.e.,
z_2 ≫ z_1, the above equation reduces to the form seen in most elementary books on acoustics,
R = (z_2 cos Θ_i − ρ_0 c_0) / (z_2 cos Θ_i + ρ_0 c_0). The transmission coefficient is easily derived
from the reflection coefficient as follows.

T(Θ, ω) = p_t/p_i = (p_i + p_r)/p_i = 1 + R = 2 z_2 cos Θ_i / (z_2 cos Θ_i + z_1 cos Θ_t)   (1-23)

With the expressions for R and T it is clear that the impedance of a material is its
key acoustic parameter, and it is this parameter which determines a room's
acoustic properties.
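The algebra above can be checked numerically. The snippet below (with purely illustrative impedance and index values) evaluates R from equation 1-22, taking the transmitted angle from Snell's Law, and confirms the pressure-continuity identity T = 1 + R.

```python
import math

def coefficients(z1, z2, n1, n2, theta_i):
    # transmitted angle from Snell's law, then R per (1-22) and T = 1 + R
    theta_t = math.asin(n1 * math.sin(theta_i) / n2)
    den = z2 * math.cos(theta_i) + z1 * math.cos(theta_t)
    R = (z2 * math.cos(theta_i) - z1 * math.cos(theta_t)) / den
    T = 2.0 * z2 * math.cos(theta_i) / den
    return R, T

# air (z1 ~ 415 rayl) against a much stiffer wall; since n = c0/c,
# n2 = 2 corresponds to a slower second medium (illustrative values)
R, T = coefficients(z1=415.0, z2=8000.0, n1=1.0, n2=2.0, theta_i=math.radians(30.0))
```

As z2 grows relative to z1, R approaches 1: a very hard wall reflects nearly all of the incident pressure, which is the limiting case discussed above.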

Acoustic Impedance

Acoustic impedance can be separated into a real and an imaginary part. The real part is
the acoustic resistance, the component associated with the dissipation of energy. The imaginary
part is the acoustic reactance, the component associated with temporary storage of energy.
These components are made explicit in the expression shown below.

z(Θ, ω) = R + jX   (1-24)

The acoustic reactance can be further factored into an inertance and a compliance term.
The acoustic inertance is defined as the effective mass of the element under consideration
divided by its surface area, M = m/S, with units of kg/m². The acoustic compliance is
defined as the volume displacement induced by the application of a unit pressure, C = dV/dp,
in units of m⁴·s²/kg. A mechanical interpretation of a locally reacting wall is shown in
figure 1.2.

z(Θ, ω) = R + j(ωM − 1/(ωC))   (1-25)

Let pressure correspond to voltage, velocity to current, resistance to resistance, compliance
to capacitance, and inertance to inductance. Then an acoustic system can be replaced
by an equivalent electrical system. It is precisely the behavior of these types of circuits
that we wish to equalize.
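Under the electrical analogy, equation 1-25 behaves exactly like a series RLC branch: the reactance vanishes at the resonance ω₀ = 1/√(MC), where the wall impedance becomes purely resistive. The numerical values below are hypothetical, chosen only to illustrate the formula.

```python
import math

def wall_impedance(omega, R, M, C):
    # equation 1-25: z = R + j*(omega*M - 1/(omega*C))
    return complex(R, omega * M - 1.0 / (omega * C))

R, M, C = 400.0, 1.2, 2.5e-6        # hypothetical resistance, inertance, compliance
omega0 = 1.0 / math.sqrt(M * C)     # resonance: the two reactive terms cancel
z0 = wall_impedance(omega0, R, M, C)
```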

figure 1.2 The mechanical analogy of room impedance is illustrated
in the figure by a spring for the reactive term and a damper
for the dissipative term.

figure 1.3 A room may be modelled by its electrical circuit equivalent
as shown above.

General Theory of Room Responses

The solution to the wave equation in an enclosed space is a familiar eigenvalue problem.
The equation reduces to a Helmholtz equation with boundary conditions as shown below.
The homogeneous part of the Helmholtz equation provides a complete set of orthogonal
eigenfunctions and associated eigenvalues, ψ_n(r) and λ_n.

∇²P + K²P = −iωρ q(r)   (1-26)

P_{i,interface} + P_{r,interface} = P_{t,interface}   (1-27)

∇p = −ρ ∂V/∂t  ⇒  ∂P/∂n = (iωρ/z) P   (1-28)

As a result our source distribution, q(r), can be expanded as a sum of eigenfunctions. V
is the enclosed volume of the room.

q(r) = Σ_n C_n ψ_n(r)   (1-29)

C_n = ∫_V q(r) ψ_n(r) dV   (1-30)

The unknown solution to the inhomogeneous Helmholtz equation can also be expressed
as a summation of eigenfunctions.

P_ω(r) = Σ_n D_n ψ_n(r)   (1-31)

Substitute the summations for P_ω(r) and q(r) into the inhomogeneous Helmholtz
equation and solve for the unknown {D_n} in terms of the known {C_n} as shown below.

Σ_n D_n [∇²ψ_n(r) + K² ψ_n(r)] = −iωρ Σ_n C_n ψ_n(r)   (1-32)

The equation has a simple solution for a point source located at r_0, i.e.,
q(r) = Q δ(r − r_0).

P_ω(r) = iωρQ Σ_n ψ_n(r) ψ_n(r_0) / (K_n² − K²)   (1-33)

Note that K_n is the wavevector associated with λ_n. P_ω(r) is recognized as the Green's
function for the room. The Green's function expresses the acoustic transfer function between
two points in the room. Note from the Green's function that acoustic fields satisfy
the reciprocity theorem just as in the electromagnetic analogy. The Green's function can
be extended to a source, S, emitting a continuum of frequencies as shown below.

S(t) = ∫ Q(ω) e^{iωt} dω   (1-34)

P(r, t) = ∫ P_ω(r) e^{iωt} dω   (1-35)

Equations 1-33 through 1-35 provide a complete wave-theoretical description. A discussion
of the implications of this formulation for a rectangular room is now in order.

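The modal series in equation 1-33 can be probed numerically. The sketch below is a deliberate 1-D simplification (rigid ends, modes K_n = nπ/L, ψ_n = cos(nπx/L)) with a small loss term jε added to the denominator so the sum stays finite at resonance; all values are illustrative. It shows that the response is far larger when the driving wavevector sits on a mode than between modes.

```python
import math

def modal_response(K, L, x, x0, n_modes=50, eps=0.02):
    # |sum_n psi_n(x) psi_n(x0) / (K_n^2 - K^2 + j*eps)| for a 1-D cavity
    # with rigid ends: K_n = n*pi/L, psi_n(x) = cos(n*pi*x/L)
    total = 0j
    for n in range(1, n_modes + 1):
        Kn = n * math.pi / L
        psi = math.cos(Kn * x) * math.cos(Kn * x0)
        total += psi / complex(Kn * Kn - K * K, eps)
    return abs(total)

L = 3.43                                          # hypothetical cavity length (m)
K1 = math.pi / L                                  # first axial mode
on_mode = modal_response(K1, L, 0.0, 0.0)         # driven at a resonance
between = modal_response(1.5 * K1, L, 0.0, 0.0)   # driven between resonances
```

These resonance peaks are exactly the spectral irregularities that the equalizer of later chapters must invert.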
Simple Rectangular Rooms

This section will apply the general wave theory to an empty rectangular cavity charac-
terized by a single impedance. While this example does not represent the complexities of
actual rooms, it nevertheless provides insight into the features of room responses with a
minimum of mathematical complexity. The connection between the reflection properties
of walls and the general wave theory will be made explicit through wall impedances. Three
classes of impedances will be considered. Consider the room depicted below.

The wave equation for this geometry is most easily applied in the rectangular coordinate system. The room eigenvalues are found by solving the homogeneous Helmholtz equation shown below.

∂²Ψ/∂x² + ∂²Ψ/∂y² + ∂²Ψ/∂z² + K²Ψ = 0    (1-36)

The equation is easily solved by the technique of separation of variables. Let Ψ(x, y, z) = X(x)Y(y)Z(z). Substitution of this expression into the wave equation


figure 1.4 A room with dimensions Lx, Ly, and Lz and walls with uniform impedance will be evaluated using the wave theoretic formulation.

leads to three ordinary differential equations plus the separation equation, as shown below.

d²X/dx² + K_x² X = 0    (1-37)

d²Y/dy² + K_y² Y = 0    (1-38)

d²Z/dz² + K_z² Z = 0    (1-39)

K² = K_x² + K_y² + K_z²    (1-40)
The solution of these equations is trivial.

X(x) = A₁ e^{-iK_x x} + A₂ e^{iK_x x}    (1-41)

Y(y) = B₁ e^{-iK_y y} + B₂ e^{iK_y y}    (1-42)

Z(z) = C₁ e^{-iK_z z} + C₂ e^{iK_z z}    (1-43)

The constants A₁, A₂, B₁, B₂, C₁, and C₂ are determined by matching boundary conditions. The solutions of three specific examples are outlined.

Case i. Re[z] = 0 and Im[z] very large.

The method of solution of the three ordinary differential equations is identical. Consider the equation in x. As a boundary condition, guarantee that conservation of momentum is satisfied at the walls (x = 0 and x = L_x).

∇P·n̂ = -(iωρ/z) P  ⟹  dX/dx = -(iωρ/z) X    (1-44)

As |z| → ∞, dX/dx → 0 at the walls, and the boundary conditions are satisfied as follows.

dX/dx |_{x=0} = iK_x A₂ − iK_x A₁ = 0  ⟹  A₂ = A₁    (1-45)

dX/dx |_{x=L_x} = iK_x A₁ (e^{iK_x L_x} − e^{-iK_x L_x}) = −2A₁ K_x sin(K_x L_x) = 0  ⟹  K_x = απ/L_x    (1-46)

α is any integer. The two boundary conditions have been used to solve for A₂ and K_x. Similar solutions are found for B₂, C₂, K_y, and K_z. Substituting these results into the expression for Ψ(x, y, z) and the separation equation leads to a solution for the room eigenmodes and eigenfrequencies as shown below.

Ψ(x, y, z) = D cos(απx/L_x) cos(βπy/L_y) cos(γπz/L_z)    (1-47)

f(α, β, γ) = cK/2π = (c/2) √[(α/L_x)² + (β/L_y)² + (γ/L_z)²]    (1-48)

The constant D = A₁B₁C₁ is determined from an initial condition.
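As a numerical illustration of equation 1-48, the sketch below lists the lowest eigenfrequencies of an assumed 5 m × 4 m × 3.1 m room (the "average" dimensions used later in this chapter) with c = 343 m/s; the room size and speed of sound are illustrative choices, not prescribed by the derivation.

```python
import math

def eigenfrequency(alpha, beta, gamma, Lx, Ly, Lz, c=343.0):
    """Eigenfrequency f(alpha, beta, gamma) of a hard-walled
    rectangular room (equation 1-48)."""
    return (c / 2.0) * math.sqrt((alpha / Lx) ** 2
                                 + (beta / Ly) ** 2
                                 + (gamma / Lz) ** 2)

# Lowest few modes of a 5 m x 4 m x 3.1 m room.
modes = sorted(
    (eigenfrequency(a, b, g, 5.0, 4.0, 3.1), (a, b, g))
    for a in range(3) for b in range(3) for g in range(3)
    if (a, b, g) != (0, 0, 0)
)
for f, idx in modes[:5]:
    print(f"{idx}: {f:.1f} Hz")  # lowest mode (1,0,0) is 34.3 Hz
```

At these low frequencies the modes are sparse and well separated; as shown later, their density grows as the cube of frequency.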

Case ii. Re[z] = 0 (purely reactive, z = iX).

Satisfy the new boundary conditions at x = 0 and x = L_x.

A₁(Kρc/X + iK_x) + A₂(Kρc/X − iK_x) = 0    (1-49)

A₁(Kρc/X − iK_x) e^{-iK_x L_x} + A₂(Kρc/X + iK_x) e^{iK_x L_x} = 0    (1-50)

A solution for A₂ and K_x can be found by solving the two transcendental equations or by using methods from the theory of conformal mapping. If X < 0, the corresponding eigenfrequencies are lower than for the hard wall case (Im[z] ≫ 1). If X > 0, the corresponding eigenfrequencies are higher than for the hard wall case. Because the impedance is completely reactive, the room will continue to support standing waves, but with phase terms included in the sinusoidal terms of the eigenmodes.

Case iii. Arbitrary z.

The form of the boundary conditions is the same as in case ii, with X replaced by z. The eigenmodes can no longer support standing waves because of the lossy component of z, which is represented by the decaying exponential factor below.

Ψ(x, y, z) = D e^{-(γ_x x + γ_y y + γ_z z)} cos(K_x x + φ_x) cos(K_y y + φ_y) cos(K_z z + φ_z)    (1-51)

It is clear from matching boundary conditions in the three cases outlined above that the room impedance is key to understanding the behavior of the room. By using the eigenmode expansion technique and the inhomogeneous Helmholtz equation, the pressure field can be determined for any location in the room. An initial condition is also required in order to evaluate D. The eigenfrequencies indicate those frequencies at which the room has resonant properties. Note that the number of eigenfrequencies increases as the cube of the upper frequency limit of the source. The wave theoretic technique provides a complete description of the acoustic signal in an enclosed space, and provides a solid theoretical foundation for the field of acoustics.

Temporal Properties of a Listening Room

Each reflection of an acoustic wave must be described by its delay in arrival from the
direct acoustic wave at the listener location, the direction from which the reflected wave ar-
rives at the listener location, and the strength of the reflection. In this section we will dis-
cuss the time delay and strength of reflected acoustic energy. The rate at which reflected
acoustic energy arrives at a listener location is an important factor in evaluating the quality
of a listening room. Ray theory provides a simple and relatively accurate method of determining the room's temporal properties. This theory is completely analogous to ray theory
from geometric optics, and it will therefore be applied with a minimum of discussion. Like
geometric optics, geometric acoustics ignores interference and refraction effects. Never-
theless it is accurate for rooms of ordinary size for acoustic frequency content in excess of
1000 Hz.

If a room is constructed of a series of planes, image sources can be found by successively reflecting the original source and each subsequent image source about the planes in a lattice constructed by using the room as a unit cell. The lattice and a cross section plane are shown in figure 1.5. A spherical shell is chosen with inner radius at distance ct and outer radius c(t+dt), where dt is much less than t. The number of sources in the spherical shell corresponds to the number of mirror reflections between t and t+dt. The volume of the spherical shell, assuming dt ≪ t, is 4π(ct)²c dt. The number of image sources can be found by dividing the volume of the shell by the volume of the room, V. As t increases, the number of mirror sources increases as

Number of Reflections = (4πc³t²/V) dt    (1-52)

As time increases, the reflections have less and less energy associated with them. The higher order reflections lose energy at each reflecting plane, the amount of which depends on the characteristic impedance of the wall. An approximation of how the energy decreases can be made from geometric acoustics by multiplying the initial energy of the direct ray inside the listening room by |R|² each time the ray path from an image intersects the room walls or an image room wall.
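The quadratic growth of the reflection count in equation 1-52 can be checked numerically. A minimal sketch, assuming a 62 m³ room (the 5 m × 4 m × 3.1 m example used later in this chapter) and c = 343 m/s:

```python
import math

def reflections_in_shell(t, dt, V, c=343.0):
    """Number of image sources in the spherical shell between radii
    c*t and c*(t+dt): shell volume 4*pi*(c*t)**2 * c*dt divided by
    the room volume V (equation 1-52, valid for dt << t)."""
    return 4.0 * math.pi * (c * t) ** 2 * c * dt / V

# Reflection density grows as t**2: compare two 1 ms windows
# in a 62 m**3 room.
early = reflections_in_shell(0.01, 0.001, 62.0)   # around 10 ms
late  = reflections_in_shell(0.10, 0.001, 62.0)   # around 100 ms
print(early, late)  # late/early = (0.10/0.01)**2 = 100
```

The hundredfold increase in reflection density between 10 ms and 100 ms is what gives the echogram of figure 1.6 its characteristic thickening.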


figure 1.5 Geometric acoustics can be used to explain the temporal
properties of reflected energy in a room. Image sources can
be formed by successively reflecting the original source and
each subsequent image source about the planes that are
constructed by using the room as a unit cell.

Figure 1.6 demonstrates the concept of an echogram, a graphical method of describing the temporal response of the room. The height of each line represents the intensity of a reflection.


During the discussion of psychoacoustics, the temporal properties of an audio field which generally indicate a pleasing listening room will be discussed.

figure 1.6 An echogram indicates that as the time delay from the direct field increases, the number of reflected waves increases and the intensity decreases. The degree of shading represents the number of reflections per unit time.

The rate of decay of the intensity field is an important factor in creating a good quality listening room. Recall that the intensity of a wave decreases as the square of the distance travelled, (ct)². The intensity of the field will also be decreased by the attenuating influence of air (or any medium through which the wave traverses). Let a be the e-folding constant of this attenuation. Each time the acoustic wave reflects off of a wall, the intensity decreases by the square of the wall reflectance. The decay of the intensity of the acoustic wave is summarized in the equation below, where I₀ is the intensity at t = 0 and n is the number of reflections of the ray per second.

I(t) = (I₀/(ct)²) e^{-act} |R|^{2nt} = (I₀/(ct)²) e^{-act + nt ln|R|²}    (1-53)

It is not possible to follow the path of each ray and perform the above calculation. In-
stead the calculation is made by computing the average number of reflections of a ray (n),

assuming the acoustic field is diffuse, i.e. the rays are uniformly distributed over all possible
directions of propagation. The assumption of a completely diffuse wave is never achieved
in an ordinary listening room, and the attenuation of the reverberant field will have large or
small deviations from the simple calculations being outlined depending on the amount by
which the actual acoustic field differs from our assumption of diffuseness.

First express the number of reflections per second which occur at walls perpendicular to the x-axis for a ray propagating at an angle θ with respect to the x-axis, n_x(θ).

n_x(θ) = (c/L_x) cos θ    (1-54)

Average over all angles θ.

⟨n_x⟩ = c/(2L_x)    (1-55)

Similar results are obtained for ⟨n_y⟩ and ⟨n_z⟩. The average number of reflections along all three axes is expressed below, where S is the surface area of the room and V is the room volume.

n = ⟨n_x⟩ + ⟨n_y⟩ + ⟨n_z⟩ = (c/2)(1/L_x + 1/L_y + 1/L_z) = cS/4V    (1-56)

Substituting the above into the intensity equation yields the following.

I(t) = (I₀/(ct)²) e^{-act + (cS/4V) t ln|R|²}    (1-57)

The reverberation time is taken as that time at which I(t)/I₀ = 1×10⁻⁶, and the expression for this time is known as Sabine's Law.

T = 55.3 V / [c (4aV − S ln|R|²)]    (1-58)
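Equation 1-58 is straightforward to evaluate. A minimal sketch, using the 5 m × 4 m × 3.1 m room dimensions quoted later in this chapter and an illustrative wall reflectance of 0.9 (both assumed values, not prescribed by the derivation):

```python
import math

def reverberation_time(V, S, R, a=0.0, c=343.0):
    """Reverberation time from equation 1-58: the time for the
    intensity to fall by 60 dB, T = 55.3*V / (c*(4*a*V - S*ln(R**2))).
    a is the air-attenuation constant, R the wall reflectance."""
    return 55.3 * V / (c * (4.0 * a * V - S * math.log(R ** 2)))

# 5 x 4 x 3.1 m room, |R| = 0.9, air attenuation neglected.
V = 5.0 * 4.0 * 3.1
S = 2.0 * (5.0 * 4.0 + 5.0 * 3.1 + 4.0 * 3.1)
print(round(reverberation_time(V, S, 0.9), 2))  # about 0.5 s
```

A half-second reverberation time is typical of a furnished domestic room; lowering R (more absorption) shortens it further.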

Spatial Properties of a Listening Room

The spatial distribution of acoustic energy is a key factor in constructing a listening room. A diffuse field can be defined as a field in which the intensity in a spherical shell subtended by a solid angle, dΩ, is independent of the direction in which the solid angle is pointed. For the case of a room which is a rectangular cavity with hard walls, no matter how small dΩ, there exists a time at which the field is diffuse. Of course there are no such walls, and clearly, for a real rectangular cavity, the field will not be diffuse. For example, images located in directions along the corners of the room experience more attenuating reflections than the other sources.

Fortunately walls are not perfectly smooth and they partially diffuse the acoustic energy
at each reflection. Consider the following simple model. Assume at each reflection the per-
centage of energy reflected specularly is s. Then 1-s represents the portion of energy which
is reradiated by the wall diffusely. This is, of course, an overly simplistic model. All inci-
dent energy is reradiated by the wall according to a directional distribution function. Nev-
ertheless this model does demonstrate approximately the diffusion process. In the steady
state the power present in the total reflected field and in the field reflected specularly can be
related to the reflection coefficient, R, and to s.
P ∝ Σ_n |R|^{2n}    (1-59)

P_s ∝ Σ_n s^n |R|^{2n}    (1-60)

The percentage of the reflected power which has been diffused increases as s decreases, as shown below.

d = (P − P_s)/P = Σ_n (1 − s^n) |R|^{2n} / Σ_n |R|^{2n}    (1-61)

figure 1.7 As the reflection constant increases and the specular reflection constant decreases, the percentage of diffuse energy in the room increases.
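Because both sums in equation 1-61 are geometric series, the diffuse fraction has a closed form. A minimal sketch, assuming the sums run over n = 0, 1, 2, …:

```python
def diffuse_fraction(R, s):
    """Steady-state fraction of the reflected power that has been
    diffused (equation 1-61). With the sums taken over n = 0, 1, 2,
    ... both series are geometric, giving the closed form
    d = 1 - (1 - R**2) / (1 - s*R**2)."""
    return 1.0 - (1.0 - R ** 2) / (1.0 - s * R ** 2)

# Lowering the specular constant s (more scattering per reflection)
# raises the diffuse fraction, as stated in the text.
print(diffuse_fraction(0.9, 0.9))
print(diffuse_fraction(0.9, 0.5))
```

The closed form makes the trend explicit: d → 0 as s → 1 (fully specular walls) and d grows as R approaches 1, since more surviving reflections give the walls more chances to scatter energy.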

Even more important than wall roughness are the diffusing effects of room furniture and the deviations of most rooms from a rectangular cavity. Even the presence of people in a room acts to diffuse the acoustic field. Furthermore, in the discussion of psychoacoustics it will be shown that a pleasant listening environment can be created with far less than a completely diffuse field.


An equalizer will not be capable of exactly reproducing the time varying pressure field recorded in a concert hall or recording studio. The engineering compromises which must be made in designing the equalizer must focus on restoring the most psychoacoustically significant features of the original signal. The design methodology will be such that importance is placed on maintaining only those features of the waveform which a listener will perceive in an average home listening room. This brief introduction to psychoacoustics is intended to motivate the signal processing objectives of the adaptive equalizer. Specifically, the level to which signal coloration can be accepted without a loss of listener enjoyment is discussed. This is largely determined by the frequency response and frequency resolution of our hearing, and by the masking properties of the signal itself. A description of how the room-induced temporal and spatial characteristics of the signal contribute to a pleasing sound is investigated. Finally, the dynamic range of human hearing is described. Stereophony is not discussed, as this research does not attempt to process signals to alter their stereophonic properties.


In the time domain, a system neither adds coloration to nor removes coloration from a signal if its impulse response can be characterized as shown below.

h(t) = kδ(t − T)    (1-62)

The function implies no change in the shape of the waveform. The output is a delayed and amplified (or attenuated) replica of the input. Performing a Fourier transform of the impulse response function gives the frequency domain restrictions for such a system.

H(ω) = k e^{-iωT}    (1-63)


The frequency domain requirements for an ideal room are the following:

i. The system must have a constant magnitude response, |H(ω)| = k.

ii. The phase response must be linearly proportional to frequency, φ(ω) = −ωT.

Violation of i. can lead to an emphasis of some frequencies over others in ways not intended by a composer or performer. Violation of ii. gives rise to non-linear phase. Non-linear phase causes certain frequencies to pass through the system faster than others.
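Both requirements can be verified numerically for a discrete delay-and-scale system. A minimal sketch, with an arbitrary illustrative gain k = 0.5 and delay T = 8 samples:

```python
import cmath
import math

# A pure delay-and-scale system, h[n] = k*delta[n - T] (a discrete
# version of equation 1-62), has a perfectly flat magnitude |H| = k
# and linear phase -omega*T.
k, T, N = 0.5, 8, 64
h = [0.0] * N
h[T] = k

def dtft(h, omega):
    """Discrete-time Fourier transform of a finite sequence at omega."""
    return sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(h))

for m in range(1, 4):
    omega = 2.0 * math.pi * m / N
    H = dtft(h, omega)
    print(f"|H| = {abs(H):.3f}, phase = {cmath.phase(H):.3f},"
          f" -omega*T = {-omega * T:.3f}")
```

At every frequency tested the magnitude is exactly k and the phase matches −ωT, so the output waveform is an unaltered, delayed replica of the input.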

To accurately reproduce the sensation produced by the original acoustic waveform it is
essential that the sound reproduction system have a flat magnitude response. Obviously
some degradation from this standard will not alter the perception of the listener. Neverthe-
less, the fundamental difference between middle C and any other note is that the energy of
the signal is concentrated at 262 Hz and its harmonics. An attenuation of the frequency re-
sponse at middle C, if other unmasked information is present in the signal, will negatively
impact the fidelity of the signal because of the lessening of emphasis on the C-like quality
of the sound. More precisely, we are interested in maintaining the pitch and timbre of the
signal. Pitch of a musical signal is primarily determined by frequency, but is also influ-
enced by the shape of the waveform and the loudness of the signal. The frequency depen-
dency is depicted in figure 1.8, where pitch in mels is plotted against frequency, for a signal
with a loudness level of 60 phons. Note that a change in frequency has a greater effect on
pitch above 1000 Hz than below.

The loudness dependency of pitch can be quite dramatic for pure sinusoidal tones. For
music signals the loudness dependency is much less significant. As an example, to main-
tain the same pitch when increasing the loudness of a musical signal from 60 to 100 phons
requires at most a frequency shift of 2%.

If a signal is not a pure tone, but consists of harmonic tones, due to the nonlinear pro-
cessing of the ear, sum and difference signals are generated. The difference signals deter-
mine the pitch perceived by the listener. For example a signal consisting of the first four
harmonics of a 100 Hz tone, all of approximately the same intensity as the fundamental, is

perceived as a signal of approximately 160 mels. If the fundamental is removed, the difference signal is unaltered, and the listener still perceives a 160 mel pitch. These effects


figure 1.8 Pitch is plotted as a function of frequency for a signal with a
loudness of 60 phons [6].

are not quite so dramatic in music signals as the harmonics are usually considerably less

intense than the fundamental.

Assume that a series of non-harmonically related frequencies do not mask each other. If we wish to maintain pitch, the relative loudness of the frequencies must be unaltered by the electronic equipment or the listening room. For a pure sinusoid it is possible to hear changes in loudness of only 0.5 phons [6]. Note also that the perceived loudness is a function not only of the acoustic intensity of the signal, but also of the signal frequency. Figure 1.9 illustrates this effect for two equal loudness contours. For frequencies over 1000 Hz, as frequency increases, the acoustic intensity must increase to maintain the same loudness.

Surprisingly, controlling the phase response has no appreciable effect on maintaining pitch. The phase response, except in extreme cases, cannot be perceived by listeners hearing music signals. Thus while a lack of phase distortion is required for the shape of the wave to be unaltered by a system, the same requirement is not generally necessary for a signal to be psychoacoustically equivalent. The results of Suzuki, Morita, and Shindo [7] indicate that the effects of nonlinear phase on the quality of transient sound are much smaller than one would expect from the degree of change in the signal waveform.


figure 1.9 Equal loudness contours are presented as a function of
frequency and acoustic intensity [5].

Hansen and Madsen [8] found similar results. In their experiment a signal had all of its frequency components shifted by 90° in phase, producing the most severe distortion possible in the shape of the waveform while maintaining the magnitude response. Only for fundamental frequencies below 1000 Hz was a distinctive difference perceived by the listeners involved in the test. The relative insensitivity to phase will be seen to be critical to the operation of an adaptive equalizer.

Maintaining signal timbre is also a basic requirement in a good sound system. Those features of the signal which are important to the timbre of sound are poorly understood, and the study of timbre is still an area of active research. The transient behavior of an instrument being excited affects the initial behavior of the pressure field. The transients affect the amount of energy present in each of the overtones. This is an important factor in determining the unique sound of a particular instrument. Amazingly, our hearing system is able to pick out these cues even when many instruments are orchestrated together.

Perhaps the most problematic factor for which to account is the level of masking caused by certain frequency components of a waveform on others. Masking is the degree to which the threshold of audibility of a signal is raised for non-harmonically related frequency components in the same signal, or in a completely separate waveform. Masking occurs when one signal has frequency components which are close to the frequency content of another signal. It has been clearly demonstrated that lower frequency components more effectively mask higher frequency components than vice versa. These effects can be explained by physiological phenomena. Lower frequencies produce harmonics which stimulate the region of the basilar membrane sensitive to higher frequencies, thereby masking the sound of higher frequencies. Figure 1.10 shows the masking effect of a 400 Hz sinusoid at 60 dB. The threshold increases most dramatically in the region slightly above 400 Hz.

figure 1.10 The masking effects of a 400 Hz signal are demonstrated above. The region of the basilar membrane which responds to the 400 Hz waveform will raise the audibility threshold of higher frequencies which stimulate the same region of the basilar membrane [5].

Determining exact design criteria to guarantee a true reproduction of pitch and timbre
is made difficult by the complicated dependency on signal frequency content, loudness,
masking, and the transient behavior of the energy distribution in the harmonics of instru-
ments. For example, if most of the time signals above a certain frequency are masked by
lower frequency signals, it would not be important to control the attenuation of these re-
gions of the spectrum. The simplest analysis assumes the worst case scenario in which
masking does not occur, regions of the spectra are of a loudness just at the audibility limit,
and frequencies must be controlled at a level at which any error in loudness would be below
the 0.5 phon detection limit. To maintain timbre we wish to be able to control transients in
the signal caused by the excitation of instruments. Fortunately the phase of the signal will
not have to be carefully preserved because of the relative insensitivity of our hearing to
these effects.

Practical implementation of these requirements is possible only if the frequency resolution of hearing is sufficiently large that filters of reasonable order can carry out the necessary inverse filtering. Frequency resolution for two sinusoids is known as the difference limen. It has a relatively constant value of 3 Hz below 500 Hz. Above 500 Hz the frequency resolution decreases approximately as a linear function of frequency (see figure 1.11).

Δf = 3 Hz        for f < 500 Hz
Δf = 0.0030 f    for f ≥ 500 Hz    (1-64)

Clearly it will be necessary to control pitch and timbre with a much finer degree of resolu-
tion at low frequencies than at high.
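Equation 1-64 is simple enough to transcribe directly, which makes the low-frequency/high-frequency asymmetry concrete:

```python
def difference_limen(f):
    """Approximate frequency resolution of hearing (equation 1-64):
    about 3 Hz below 500 Hz, then roughly 0.0030*f above."""
    return 3.0 if f < 500.0 else 0.0030 * f

print(difference_limen(200.0))    # 3.0 Hz
print(difference_limen(10000.0))  # 30.0 Hz
```

An equalizer therefore needs roughly ten times finer frequency control at 200 Hz than at 10 kHz to stay below the limits of audibility.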

The human hearing organ can be described as a parallel filter bank with characteristic bandwidths. Fletcher and Munson [9] found that the detection threshold of a tone in broadband noise is not affected by the noise bandwidth until it falls beneath the "critical bandwidth"; these bandwidths are shown in figure 1.12.

Psychoacoustic Responses to the Temporal Aspects of Reflected Energy

Reflections under certain circumstances serve to enhance the quality of sound by adding
to the direct wave in a way which is reinforcing. For musical signals the reverberation of
the room when within certain bounds gives a pleasing effect. Under other circumstances

figure 1.11 The difference limen gives the frequency resolution of two sinusoids. An acoustic equalizer will need to control the frequency response of low frequency bands with a finer resolution than high frequency bands [5].

reflections can cause highly undesirable effects. Although many effects can be explained
partially through the physiology of hearing, perhaps the most important factors, at least in
the reproduction of music, are the types of environments with which we have been habitu-
ated and from which our aesthetic senses have been influenced.

The discussion of the temporal effects of sound fields begins by finishing our discussion of coloration. Strong, evenly-distributed echoes can cause serious degradation in the accurate reproduction of an acoustic wave's spectral distribution. Consider a room with the following simple impulse response.

h(t) = Σ_n aⁿ δ(t − nT₀)    (1-65)

figure 1.12 Because our hearing can be described as a parallel filter bank, acoustic equalizers are designed to simulate the critical bandwidths. High quality equalizers are designed to have 1/3 octave bands because of their similarity to critical bandwidths, as seen above.

The power spectral density indicates equally spaced resonances.

|H(ω)|² = 1 / (1 − 2a cos(ωT₀) + a²)    (1-66)

Figure 1.13 illustrates the frequency response of the room for different echo periods, T₀. As T₀ becomes larger, the frequency components become more closely spaced. Because of masking effects, the threshold at which the spectral distribution of the audio signal is perceived to change decreases as T₀ increases.
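Equation 1-66 can be evaluated directly; the peak-to-valley ratio of the resulting comb is ((1+a)/(1−a))², which for the illustrative values a = 0.5 and T₀ = 10 ms (assumed here, not taken from the text) equals 9:

```python
import math

def psd(omega, a, T0):
    """Power spectral density of the echo train h(t) = sum of
    a**n * delta(t - n*T0) (equations 1-65 and 1-66)."""
    return 1.0 / (1.0 - 2.0 * a * math.cos(omega * T0) + a ** 2)

a, T0 = 0.5, 0.01  # 10 ms echo period: resonances every 100 Hz
peak = psd(2.0 * math.pi * 100.0, a, T0)    # on a resonance
valley = psd(2.0 * math.pi * 150.0, a, T0)  # halfway between
print(peak, valley)  # peak/valley = ((1+a)/(1-a))**2 = 9
```

A 9:1 power ratio between adjacent peaks and valleys is a severe comb-filter coloration; whether it is audible depends on the masking behavior described above.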

Figure 1.14 gives results for the threshold level at which an average listener perceives noticeable coloration induced by a reflected wave. These measurements are based on listening tests of six different music selections.

figure 1.13 As the echo period, T₀, increases, the frequency components become more closely spaced. Due to masking, coloration becomes less audible.

In addition to our sensitivity to the frequency domain, our hearing is affected by time domain properties of reflected waves. Reflected energy which is just barely perceived acts to increase the loudness of the signal and to change the apparent size of the source. At a higher loudness and a small delay, the perceived source direction moves. For a large delay the reflected energy is perceived as a highly annoying echo. Figure 1.15 shows the intensity at which a distinct echo is perceived as a function of time delay.

If the delay is sufficiently short, the reflected wave can be 10 dB higher than the direct wave without causing echoing. This phenomenon is known as the Haas effect after its discoverer, and is an important factor in the design of auditoria [10].

Acoustic clarity is a measure of our ability to resolve fine temporal structures in acoustic
waves. The faster the tempo and the higher the degree of attack on the instruments, the
higher is the necessary degree of room clarity. Many measures have been suggested. All
of them rely on determining the amount of acoustic power over a period of time represent-
ing "early" reflections divided by a normalization. One index used for determining the clar-
ity of rooms for musical signals is given in equation 1-67. A value of C = 0 dB provides




figure 1.14 As the echo delay decreases, the reflectivity of the room surfaces must decrease in order to avoid a perceptible change in the coloration of the acoustic signal. For long delays, relatively intense echoes may exist without causing noticeable coloration [5]. The figure shows the level of 10log(a) at which listeners perceive coloration, as a function of echo delay [msec].

figure 1.15 As the time delay of the echo increases, its intensity must decrease in order to avoid an annoyingly distinct sound. The figure shows the threshold intensity for hearing distinct echoing, as a function of the delay of the reflected wave [msec] [5].


sufficient clarity for fast staccato passages. A value of -3 dB is usually acceptable for symphonic music.

C = 10 log [ ∫₀^{t_e} h²(t) dt / ∫_{t_e}^∞ h²(t) dt ]    (1-67)

where t_e marks the end of the "early" reflection period.

Reverberation time of a field is a measure of how the room damps out an acoustic source. It is usually determined by measuring the time required for the field intensity to decrease by 60 dB from the moment the acoustic source is turned off. The listening room's reverberant field acts to mask details of a music performance which listeners find unpleasant, e.g. a minor lack of synchrony in an orchestra. The ideal reverberation time depends on the type of music being performed, and on local styles and current fads. For chamber music a reverberation time consistent with the rooms in which the compositions were originally performed, from 1.4 to 1.6 seconds, is preferred. For symphonies from the romantic period or pieces utilizing large choirs, a reverberation time of over 2 seconds is preferred. For opera a shorter reverberation time is preferred so that the libretto may be more easily understood. As an example, the La Scala opera house in Milan has a reverberation time of 1.2 seconds [11].

Psychoacoustic Response to the Spatial Aspects of Reflected Energy

For a room to have a good acoustic ambience the listener must have a sense of spaciousness. Originally researchers believed that spaciousness was caused by hearing spatially incoherent fields. It has been shown, however, that this gives rise only to a phantom source direction, and does not provide a sensation of spaciousness. The following conditions are required:

i. The field must be temporally incoherent.

ii. The intensity of the reflected energy must surpass the audibility threshold.

iii. The reflected energy must have components with time delays of less than 100 msec.

iv. For early reflections, the field must be spatially incoherent, with components arriving from the lateral directions being of primary importance.

These requirements are generally met in large rooms, but are increasingly difficult to ensure in smaller and smaller rooms. It has been shown [6] that spaciousness is independent of the style of music being performed.

The dynamic range of human hearing is 140 dB. If our hearing were more sensitive it would not be beneficial, as we would begin to hear the Brownian motion of air molecules. In processing music digitally, ideally a sufficient number of bits is available to ensure coverage of the entire dynamic range of musical works at a resolution consistent with our physiological capabilities. The standard CD format provides 16 bit encoding of amplitude information (approximately 96 dB). Equalizer technology should aim to provide no further degradation in dynamic range beyond the limit defined by the CD encoding.

Consequences of Room Acoustics and Psychoacoustics
in the Solution of the Inverse Problem

Several important inferences can be drawn about potential solution methods to the room
inverse problem from the discussion of room acoustics and psychoacoustics. Two obvious
approaches are available as solution methods: 1) a theoretical modelling of the physics of
the listening room acoustics, and 2) experimentally measuring the characteristics of the listening room.

The first approach has been briefly reviewed in the discussion of room acoustics. The key acoustic parameter, complex acoustic impedance, must be carefully measured for each material present in the listening room. Using these impedances and the usually complex geometry of most home listening rooms, the Helmholtz equation must be solved. Note that matching boundary conditions will be impossible to perform analytically, and will require a numerically intensive calculation using a finite element algorithm. As a result of the computations, a set of room eigenvalues and eigenmodes will be determined, from which the room's Green's function is determined. Even for the case of a simple rectangular room, the number of room eigenmodes increases as the cube of the upper frequency limit of the source. For a room of average dimensions, i.e. Lx = 5 m, Ly = 4 m, and Lz = 3.1 m, between 0 and 20,000 Hz there are over 57,000,000 [6] resonant room modes. As an additional complication, except at low frequencies, the modes are extremely dense in the frequency domain, i.e. the half-width of a mode is much greater than the separation between modes. Thus the majority of modes are mutually coupled.
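The cubic growth of the mode count can be checked numerically. A sketch using the standard asymptotic mode-count formula for a rectangular cavity; the formula itself (with its surface and edge correction terms) is an assumption brought in for illustration, not taken from the text:

```python
import math

def mode_count(f, V, S=0.0, L=0.0, c=343.0):
    """Approximate number of room modes below frequency f:
    N(f) ~ (4*pi/3)*V*(f/c)**3 + (pi/4)*S*(f/c)**2 + (L/8)*(f/c),
    where V is the volume, S the surface area, and L the total
    edge length. The leading term grows as the cube of f."""
    x = f / c
    return ((4.0 * math.pi / 3.0) * V * x ** 3
            + (math.pi / 4.0) * S * x ** 2
            + (L / 8.0) * x)

# 5 x 4 x 3.1 m room, 0 to 20 kHz: tens of millions of modes.
V = 5.0 * 4.0 * 3.1
S = 2.0 * (5.0 * 4.0 + 5.0 * 3.1 + 4.0 * 3.1)
L = 4.0 * (5.0 + 4.0 + 3.1)
print(f"{mode_count(20000.0, V, S, L):.3e}")
```

The estimate lands in the low tens of millions, the same order of magnitude as the 57,000,000 figure quoted above, and makes plain why a mode-by-mode inverse is hopeless.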

The aforementioned calculations are so computationally burdensome that they cannot reasonably be performed. Even if a complete set of modes were available, from which the Green's function inverse could be calculated, the processing problem would not be feasible. For the simple room geometry discussed above, there were 57,000,000 room modes, and an FIR filter length of 100 million tap weights would be required. Realizing that implementation of such a large filter is impossible, a selection of the most important room modes would need to be calculated. Because of the mutually coupled nature of the modes, and the changing spectral distribution of the acoustic source, the "important modes" would be constantly changing.

The rigorous eigenmode expansion of the room response, the subsequent calculation of the room's Green's function, and implementation of the inverse in hardware cannot be performed. As a simplification of the computational complexity of describing a home listening room, a geometric acoustics approach could be attempted. The previous discussion makes clear, however, that geometric acoustics gives insight only into the time averaged properties of the listening room. Depending on the nature of the acoustic source, the approximations could yield poor results. In addition, the complexity of characterizing the acoustic impedances of the room materials and the complicated geometry of the listening room continue to result in a huge computational problem. Figure 1.16 summarizes the rationale for rejecting the approach of modelling the physics of the room.

The experimental approach must be selected, and this approach is discussed in detail in
chapter two. Several implications of this method must be addressed in light of the basic
principles of room acoustics and psychoacoustics in order to design an intelligent technique.
The experimental approach can be summarized as follows. The listening room is
excited by a broadband noise source. A measurement is made of the acoustic pressure at
the listener location. This measurement is compared with the source signal band by band
with an octave filter bank. An inverse of the room is approximated by ensuring the energy
distribution of the source signal is maintained by the measured signal.
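The measurement step described above can be sketched as follows. This is an illustrative sketch, not the dissertation's implementation: the function names are assumptions, and an FFT-based band split stands in for the octave filter bank a real system would use.

```python
import numpy as np

def octave_band_energies(x, fs, f_low=31.25, n_bands=8):
    """Energy of signal x in successive octave bands starting at f_low Hz."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    energies, lo = [], f_low
    for _ in range(n_bands):
        band = (freqs >= lo) & (freqs < 2 * lo)
        energies.append(spectrum[band].sum())
        lo *= 2
    return np.array(energies)

def band_gains(source, measured, fs):
    """Gains that restore the source's band-energy distribution at the listener."""
    e_src = octave_band_energies(source, fs)
    e_mic = octave_band_energies(measured, fs)
    return np.sqrt(e_src / np.maximum(e_mic, 1e-12))
```

Driving `band_gains` with the source and a band-attenuated measurement yields gains above unity in exactly the attenuated bands, which is the sense in which the energy distribution is "maintained."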

Reason to Reject                              Wave Theoretic    Geometric
                                              Approach          Acoustics Approach

1. Acoustic Impedances of Materials                X                 X
2. Complicated Boundary Conditions                 X                 X
3. Computational Complexity                        X                 X
4. Resultant Filter Orders                         X
5. Discernability of Important Room Modes          X
6. Limited Accuracy                                                  X
7. Time-Averaged Properties Only                                     X

figure 1.16 The figure demonstrates the reasons that modelling the
physics of room acoustics is not feasible for solving practical
room inverse problems.

The discussion of psychoacoustics makes clear that acoustic high fidelity requires main-
taining signal pitch, and the level of spatial and temporal coherency, with sufficient spectral
resolution to match the psychoacoustic and physiological characteristics of human hearing.
Equalization cannot restore the true audio fidelity of the acoustic field as perceived at the lo-
cation of the signal recording. Without a significant increase in the sophistication of the
recording process, coherence information will not be encoded on the CD. As a
result, properties such as clarity, spaciousness, apparent source size, etc. will not be restored
by equalization. The coherence properties of the signal received by the listener in the home
listening room will be determined by the listening room itself.

By ignoring the signal coherence properties, the inverse problem is radically simplified.
In addition we have assumed that the listening rooms are minimum phase. Limiting the in-
verse problem to minimum phase systems does not significantly reduce the robustness of
the technology. From the principles of room acoustics it is known that for a listening room
to be non-minimum phase it must have a bad acoustic design, e.g. a rectangular cavity with
one axis much longer than the others terminated by walls with very low acoustic resistance.
The assumption of minimum phase has the important characteristic that equalization will
not require poles to be placed outside the unit circle.

The filter design problem has been reduced in scope to removing gross coloration ef-
fects of the stereo electronics and the listening room at a sufficiently high level of spectral
resolution that it matches the characteristics of human hearing. The summary of psycho-
acoustics indicated that the phase response of a system does not affect our perception of
pitch. The filter designer is left with the simpler task of calculating the inverse filter which
inverts the magnitude response of the listening room at a resolution of approximately 1/3
octave. The following figure demonstrates the simplified set of design criteria.

Design Factor                  Comment

1. Spatial Coherence           Not encoded on CD
2. Temporal Coherence          Not encoded on CD
3. Magnitude Response
4. Phase Response              Not psychoacoustically significant
5. Frequency Resolution        Must match critical bands
6. Amplitude Resolution        16 bits encoded on CD
7. Stability                   Room responses are usually minimum phase

figure 1.17 The figure demonstrates the design criteria for a
measurement based equalization strategy.




Chapter two will introduce the current state-of-the-art equalizer technology. It will be shown
that this technology does not make efficient use of an equalizer's limited degrees of freedom. Be-
cause of the finer frequency resolution required in the lower frequency bands, high filter orders are
required to control the small critical digital frequencies. In the high frequency bands however, the
spectral resolution will be finer than necessary. The Adkins-Principe architecture significantly im-
proved the traditional strategy by introducing a multi-rate octave filter bank. By downsampling
data in the lower bands according to the Nyquist limit, the critical digital frequency requirements
are made proportionally less severe, and the required inverse filtering can be performed by filters
of significantly lower order. As this is the basic approach of the technology introduced in this dis-
sertation, a complete description of this technique will be reviewed in chapter two.

While this strategy is a significant improvement, it does not make use of the time varying spec-
tral distribution of the source. The Adkins-Principe strategy continues to assign the limited degrees
of freedom available in the inverse filters to equalize the entire audio bandpass, even if the source
has no power in large spectral regions. It furthermore continues to equalize regions of the bandpass
where little or no coloration is being induced by the stereo electronics or the listening room. The
approach investigated in this work is to add into the Adkins-Principe architecture adaptive filters.
Adaptive filters by their very nature will assign the most processing resources to regions of the au-
dio bandpass in which the most coloration exists, significantly improving the efficiency and perfor-
mance of equalization. A brief description of the major classes of adaptive algorithms is included.
Because of the requirements of real-time processing, as well as the requirement of high fidelity,
the appropriate classes of adaptive algorithms are limited. Because of its low computational
complexity and satisfactory performance, the Least Mean Squares (LMS) algorithm is chosen
for further investigation in this research.

figure 2.0 The Adkins-Principe equalizer architecture is a parallel multi-rate filter bank.
LP = Lowpass Filter; HP = Highpass Filter; DEC = Decimation; INT = Interpolation

State-of-the-Art Acoustic Equalizers

It is clear from the preceding section on the physics of room acoustics that a theoretical
modelling of a listening room, even with an unreasonably simplistic geometry and con-
structed with materials of uniform impedance, is such a computationally burdensome en-
deavour that it cannot be performed for an individual's listening room. The current
equalizer technology seeks to excite a room with broadband noise in order to stimulate a
broad range of eigenmodes. This signal is measured at a listener location or locations, and
an inverse of the cascade of the home stereo system and the home listening room is estimat-
ed. Equalization is not a panacea which will restore a perfect reproduction of the pressure
field as propagated by the original rendition of the musical performance. In fact, acoustic
equalization cannot even reconstruct a pressure field which is psycho-acoustically equiva-
lent. Current technology attempts only to remove gross coloration. The strategy employed
is depicted in figure 2.1. Limitations of this strategy will be examined, with special empha-
sis on those items which the proposed adaptive equalizer will ameliorate.

[block diagram: noise source -> equalizer -> home electronics -> speakers ->
listening room -> microphone(s); the source and microphone signals each pass
through bandpass filters into a band by band comparator, which adjusts the
filter coefficients]

figure 2.1 State-of-the-art equalizers are designed as parallel filter
banks with filter coefficients that are adjusted by comparing
the power collected in each band by a microphone located in
the listening room, with the noise which is used to excite the room.

After room excitation with a broadband source, current techniques break the measured
signal into octave or 1/3 octave bands using a set of bandpass filters. The power in each
band is compared with the power initially radiated in the same band. The equalizer is ad-
justed so that its filter coefficients give sufficient gain in the attenuated bands that the re-
ceived signal has a flat frequency response, or any pre-programmed response which the
listener finds acoustically pleasing. The noise source is turned off, and the CD player is
turned on.

Source Excitation

The excitation source utilized in equalizers has been the subject of much debate. Sev-
eral articles [13-14] deal with the varying responses of rooms to different types of sources,
i.e. firecrackers, pistol blanks, etc. The most important characteristics of the test signal are
the signal's spectral content, the amplitude distribution of the source, and the duration of
the signal. State-of-the-art equalizers use pink, Gaussian noise for reasons which are now
discussed.

Except in small, undamped rooms at low excitation frequencies, the sound pressure
amplitude distribution is independent of the volume, shape, or acoustical properties of the
room. By examining the room's Green's function it is seen that the pressure field is the su-
perposition of many eigenfrequencies, each with its separate damping coefficient. This su-
perposition is complicated due to the finite half-width, and small separation, of the room
resonances. Since the eigenmodes are closely spaced and mutually coupling, the field may
be considered to result from a single frequency with randomly distributed amplitudes
and phases. Applying the Central Limit Theorem to both real and imaginary terms of the
pressure, P, leads to the Rayleigh distribution of |P|. Figure 2.2 demonstrates the similarity
of the distribution function of pressure to Gaussian excitation.

The spectral content must be sufficiently broad that the eigenmodes excited by a music
signal are also excited by Gaussian noise. Pink noise satisfies this requirement and has
the additional advantage that it will simplify the signal processing. Pink noise has the prop-
erty that equal power is radiated in each octave. In addition, unlike impulsive sources, the
pink noise source is of sufficient duration to "charge" the reactive elements of the listening
room, which will allow a better equalization of the room's steady state response.

figure 2.2 A Gaussian noise source is used with many state-of-the-art
equalizers because of its similarity to the pressure
distribution of acoustic data in an enclosed space.
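Since pink noise carries equal power per octave, its power spectral density must fall as 1/f. A minimal sketch of generating such a signal by spectral shaping follows; the method is an assumption for illustration, as the text does not specify a generator.

```python
import numpy as np

def pink_noise(n, rng=None):
    """Shape white Gaussian noise with a 1/sqrt(f) spectrum (3 dB/octave fall)."""
    rng = rng or np.random.default_rng()
    spectrum = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                  # avoid dividing by zero at DC
    spectrum /= np.sqrt(f)       # power ~ 1/f  ->  equal energy per octave
    return np.fft.irfft(spectrum, n)
```

Summing the resulting power spectrum over any two adjacent octaves gives approximately equal energies, which is the defining property used above.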

Bandpass Filters

Bandpass filters are designed to have bandwidths that approximately match the band-
width characteristics of our human physiology. All high quality equalizers employ a digital
1/3 octave parallel filter bank, and from figure 1.12, it is seen that this resolution will closely
match the critical bandwidths as discussed in the psycho-acoustics review. The center fre-
quencies and upper and lower cut-off frequencies are given in figure 2.3.

To maintain pitch the bandpass filters must be designed with sufficient control of the sig-
nal's spectral characteristics, while realizing that too fine a resolution will not improve our
subjective sense of pitch and will add processing time and expense to the equalizer. The
filter bank is made up of linear FIR lowpass and highpass filters cascaded to form bandpass
filters. Performance is limited by the low digital cut-off frequencies associated with the
lower frequency bands, which require high order filters. Control of the band at the level of
the difference limen (fig. 1.11) is not achieved. Except in the most unusual circumstances

1/3 Octave Filter Bank

Band Center Frequencies [Hz]    Band Limits [Hz]
25                              22-28
31.5                            28-35
40                              35-44
50                              44-57
63                              57-71
80                              71-88
100                             88-113
125                             113-141
160                             141-176
200                             176-225
250                             225-283
315                             283-353
400                             353-440
500                             440-565
630                             565-707
800                             707-880
1000                            880-1130
1250                            1130-1414
1600                            1414-1760
2000                            1760-2250
2500                            2250-2825
3150                            2825-3530
4000                            3530-4400
5000                            4400-5650
6300                            5650-7070
8000                            7070-8800
10000                           8800-11300
12500                           11300-14140
16000                           14140-17600
20000                           17600-22500

figure 2.3 The critical frequencies of a 1/3 octave bandpass filter bank are
shown above. Because of the similarities with the resolution
characteristics of human hearing, the 1/3 octave filter bank is
most often chosen for state-of-the-art equalizers.

the difference limen is an unnecessarily strict standard due to the masking effects of music
signals. The precise requirements are difficult to evaluate due to the complex nature of
masking.
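The band limits tabulated in figure 2.3 follow from successive 1/3-octave steps: each edge is the previous edge times 2^(1/3), with the center at the geometric mean of its edges. A short sketch (the 22 Hz starting edge is taken from the table; the function name is illustrative):

```python
def third_octave_bands(f_start=22.0, f_stop=22050.0):
    """(lower, center, upper) edges for successive 1/3-octave bands."""
    bands, lo = [], f_start
    while lo < f_stop:
        hi = lo * 2 ** (1 / 3)
        bands.append((lo, lo * 2 ** (1 / 6), hi))   # center = geometric mean
        lo = hi
    return bands
```

Over the audio bandpass this yields the thirty bands of figure 2.3, with the first band centered near the nominal 25 Hz.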


Signal Measurement

Any system which relies on current microphone technology for precise acoustic mea-
surement is problematic. A situation analogous to the Heisenberg Uncertainty Principle
arises when measuring the field: introduction of the microphone into the field disturbs the
field itself. To minimize this effect, high quality microphones are made as small as pos-
sible while maintaining their sensitivity. In addition, measuring the pressure field intensity
is not synonymous with measuring signal loudness, as was shown in the previous section.
The conversion from intensity to loudness can be accounted for in software. A more diffi-
cult matter is to completely characterize the microphone directionality, which will differ
from the directional properties of our hearing. In practice a precise understanding of the
microphone's directional response is unavailable. A high quality microphone must also
have a fast response if it is not to smear the details of the signal, affecting the perceived tim-
bre and clarity.

Signal Processing

Processing is performed by a dedicated chip which calculates the power in each band
collected from the CD player. The same procedure is performed on the signal received from
the microphone. The two signals are compared, and a calculation of the appropriate filter
coefficients is made in order to guarantee a flat frequency response, or any other preset fre-
quency response, across the audio bandpass. Unfortunately the details of the processing are
not discussed in the literature, probably because of sensitivity towards proprietary technol-
ogy on the part of manufacturers. The measurements of current state-of-the-art equalizers
indicate they are capable of maintaining a flat frequency response across the audio bandpass
to within 1 dB. At 40 dB between 100 and 1000 Hz for speech and music signals a band
ripple of 1 dB is not perceptible. At the band edges ripple can be as great as 2 dB before
listeners can perceive coloration. The number of bits used for all filter processing calcula-
tions must be at least 16 in order to avoid degrading the S/N of the CD player. High quality
equalizers allow separate equalization to be performed for each channel of the stereo. The
period of time required for the equalization can be as high as 15 seconds. The filter coeffi-
cients can be stored in one of several memories. A graphic interface is provided to allow
the user to manually set his listening preferences, which are taken into account in the cal-
culation of filter coefficients. These systems are a large improvement over traditional equal-
ization technology, which required time consuming manual adjustment of graphic
equalizers and depended on the skill of the individual making the adjustments.


In the discussion of room acoustics the temporal characteristics of the acoustic field were
introduced using the principles of geometric acoustics. This approach gave a qualitative
understanding, but it is of limited accuracy and validity. The spatial characteristics were
also presented using a simplistic approach to acoustic reflections. A rigorous theoretical
foundation of room acoustics was made from the wave equation. By solving boundary con-
ditions it was shown that the impedances of surface materials, and the room's geometry, are
the key parameters in developing a Green's function. Because of the oversimplification of
geometric acoustics, and the huge computational requirements of a wave theoretical ap-
proach, the filter coefficients required for an equalizer cannot be calculated.

State-of-the-art equalizers determine the room response by exciting a room with a
broadband source. A measurement is made of the acoustic field at the listener location,
and an inverse of the room response is calculated for implementation in an octave filter
bank. The excitation source, signal measurement, signal processing, and equalizer filters
of state-of-the-art equalizers were reviewed. This technology is focused on removing
those effects which are psycho-acoustically most significant. In the presentation of psycho-
acoustics, pitch was discussed as a key feature of the music signal. To preserve pitch the
equalizer must remove coloration introduced by the stereo electronics and the listening
room. Phase response was shown to be relatively unimportant in maintaining pitch. The
temporal and spatial characteristics of the acoustic field were discussed as important char-
acteristics in giving a listener the sense of fullness and spaciousness of the sound.

The state-of-the-art equalizer technology described above is able to remove most color-
ation of the stereo-room combination, and thus maintain pitch. The equalizer will not be
able to correct the following effects.

1) The equalizer will not be able to correct coloration due to strong regular reflections.
These types of effects are represented by transfer functions that are non-minimum phase,
and thus would require the equalizer to have poles outside the unit circle. This should not
be a problem unless the room in which the original recording was made, or the home lis-
tening room has a terrible acoustic design, e.g. a rectangular cavity with one axis much

longer than the others terminated in a wall with very low acoustic resistance.

2) The perceived timbre will not be affected by the equalizer. Timbre is related to transient
phenomena, and as the equalizer's filters are time invariant, the equalizer will not be able
to restore timbre related features.

3) Equalization cannot restore the true audio fidelity of the signal because of the loss of
information regarding both temporal and spatial coherence properties of the signal. Unfor-
tunately these characteristics are psycho-acoustically important. There are many reasons
for the lack of fidelity with respect to field coherence properties. Without a significant in-
crease in the level of sophistication in the recording process, the coherence information will
not be included on the CD itself. Microphones used in the recording process, as well as the
equalization process, do not encode information regarding the spatial characteristics of the
signal. Fast temporal structures may also be lost due to the non-zero time constants in all
microphones. In addition the state-of-the-art equalizers utilize only one microphone and
cannot account for stereophonic properties of the acoustic field. As a result of the above
limitations, properties such as clarity, spaciousness, apparent source size, apparent source
location, etc. will not be restored by equalizers. These properties will be maintained by the
characteristics of the listening room itself.

4) The equalization of a music reproduction system is valid for only one listener loca-
tion. If the listener moves locations, the equalization must be performed again. The adap-
tive equalizer will eliminate this problem.

5) The best source with which to excite the room is the actual acoustic signal of interest
- not pink noise. The equalizer's filters should utilize their limited degrees of freedom to
correct frequencies at which distortion is a maximum, and not the entire spectrum. This is
a fundamental limitation which affects the degree to which relative loudness, and hence
pitch, can be maintained. This limitation will be ameliorated by the adaptive equalizer.

The Adkins-Principe Equalizer Architecture

The Adkins-Principe (AP) architecture [1] uses a multi-rate filter bank to produce an
octave band structure by repeatedly filtering and decimating a single input signal. A sepa-
rate highpass filter equalizes each band, making the processing computationally parallel
and greatly reducing the necessary microprocessor speed for real-time operation. The ad-
vantage of multi-rate designs lies in their increased efficiency. The elegance of the structure
can be seen in the block diagram shown in figure 2.0. The proposed real-time equalization
strategy will be based on this architecture. The following section will discuss its character-
istics. The advantages of this architecture over the current state-of-the-art will be made
clear. Its operation will be shown to be consistent with maintaining pitch as described in
chapter one. In addition, the features of the architecture which must be carefully controlled
to assure proper signal reconstruction will be discussed.

Octave Bands Generation and Equalization

The highest frequency band is controlled by passing the original signal through a high-
pass filter. All lower bands are controlled by a cascade of lowpass, decimation, and high-
pass filters. This is accomplished as shown in figure 2.4.

figure 2.4 The tree structure efficiently implements the octave filter bank. At each
node the signal is equalized by a highpass filter along one branch, and
further bandlimited along a second branch.

The tree structure is an efficient strategy for implementing an octave filter bank. The
input to each branching node has been bandlimited and decimated. The lower branch per-
forms equalization with a highpass filter while the upper branch further reduces the CD
bandwidth. The characteristics of the upper and lower branches are described below.

Each of the upper branches attempts to perform the following lowpass operation.

    H_LP(e^jω) = 1,  |ω| ≤ π/M        (2-1)
               = 0,  otherwise

The upper cutoff of band i is equal to the lower cutoff of band i-1. The sampling rate
can be reduced by decimating by M according to Shannon's sampling theorem.

    x_dec[nT] = x[nMT]        (2-2)

The decimation rescales the digital frequencies of the DFT, as shown in figure 2.5.
Clearly the decimation rate must be coupled to the value of the digital cutoff frequency of
the lowpass filters in order to avoid aliasing.
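The frequency rescaling of figure 2.5 can be checked numerically. A toy sketch, with an illustrative test signal and M = 2 (not values from the text):

```python
import numpy as np

M = 2
n = np.arange(1024)
x = np.cos(2 * np.pi * 0.05 * n)    # digital frequency 0.05 cycles per sample
x_dec = x[::M]                      # eq. (2-2): keep every Mth sample

peak_bin = int(np.argmax(np.abs(np.fft.rfft(x_dec))))
freq = peak_bin / len(x_dec)        # digital frequency after decimation
```

The spectral peak of the decimated signal appears near 0.10 cycles per sample, confirming that decimation by M expands digital frequencies by M.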

figure 2.5 Decimation by M in the time domain expands the digital
frequencies by M, and reduces the peak value of the
spectrum by M. This figure demonstrates the effects for M
equals 2.

Decimation reduces the number of operations required for real-time processing by re-
ducing the data rate by M^(i-1) for band i. In addition, because the critical digital frequencies
are increased by M, due to the rescaling of digital frequencies engendered by decimation,
the required filter orders for all subsequent filtering operations are considerably smaller, or
a finer frequency resolution will be possible with the same order filters. This is an important
characteristic due to the finer resolution psycho-acoustically required in the lower frequency
bands. Figure 2.6 summarizes the upper branch.

At each branching node of the tree structure the bandlimited signal is also sent to a
highpass filter (lower branch) for equalization. If the room were non-filtering the highpass
filter would be designed as follows.

    H_HP(e^jω) = 1,  π/M ≤ |ω| ≤ π        (2-3)
               = 0,  otherwise

Note that the digital cutoff frequency is π/M and not π/M^2 because of the decimation.

Operation of the Upper Branch of the Tree Structure:
original signal -> lowpass filter -> lowpass signal -> decimate by M

figure 2.6 This figure demonstrates the processing of a signal
generated as the summation of two sine waves. The low
frequency sine wave is separated with a lowpass filter and
then decimated. The decimated part of the signal is thrown
away.

In the more realistic case of a room which filters its acoustic input, the highpass filters are
designed with user supplied gains for each band. The graphic interface used on an equalizer
implemented on the NeXT computer workstation [15] is shown in figure 2.7. The interface
indicates gains for seven bands, and interpolates between the frequencies indicated below.

figure 2.7 The graphic interface for an equalizer using the AP
architecture was implemented on the NeXT computer with a
user interface as shown in this figure. The highpass filters are
adjusted according to the selections made by dragging the
sliders with a mouse.

The combination of filter length and decimation rate must be such that the requirement
of at least 1/3 octave resolution is satisfied. In addition, band ripple, transition bandwidth,
and stopband attenuation must be such to guarantee that there is less than 1 dB band ripple
across the entire audio bandpass. The research documented in this dissertation will use the
filter design presented by Adkins and Principe [1]. Their extensive simulations resulted in
a recommended design of 4 bands, each utilizing symmetric finite impulse response (FIR)
filters of order 45, and a constant decimation rate of six. A summary of the frequency res-
olution of the design is given in figure 2.8.

          Frequency        # of frequencies    bandwidth of controlled
Band      Range [Hz]       controlled          frequencies [octaves]

Band 1    3675-22050       19                  0.045-0.250
Band 2    612.5-3675       19                  0.045-0.250
Band 3    102-612.5        19                  0.045-0.250
Band 4    17-102           19                  0.045-0.250

figure 2.8 Frequency resolution for a four band equalizer with filters of
order 45, and a constant decimation rate of 6, is shown.

The highpass FIR filters are designed by taking the inverse FFT of the frequency re-
sponse requested by the user with the equalizer's graphical interface. The filters are
smoothed by truncating the resulting time domain representation symmetrically about
time = 0 with a Kaiser window with β = 3.5. The symmetric nature of the filters guarantees
linear phase. The time domain representation is shifted to guarantee that the filters are caus-
al. A summary of the process is shown in figure 2.9.
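The design procedure just described can be sketched as follows. The order 45 and the Kaiser β = 3.5 are from the text; the 128-point frequency grid and the simple highpass target are assumptions for illustration.

```python
import numpy as np

N = 45                                   # FIR order used in the AP design
n_grid = 128                             # assumed frequency-grid size (illustrative)
desired = np.ones(n_grid)                # user-requested magnitude response
desired[: n_grid // 8] = 0.0             # toy highpass target: zero the low bins

h_full = np.fft.irfft(desired)           # zero-phase time domain representation
h = np.concatenate([h_full[-(N // 2):],  # truncate symmetrically about time = 0
                    h_full[: N // 2 + 1]])
h *= np.kaiser(N, 3.5)                   # Kaiser window with beta = 3.5
# h is symmetric, hence linear phase; indexing it 0..N-1 is the causal shift
```

The resulting taps are even-symmetric, which is exactly the property that guarantees linear phase, and the DC gain is near zero as a highpass design requires.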

Signal Reconstruction

After equalization has been performed with highpass filters, the data in each band must

be summed in order to reconstruct the signal. The summation is complicated because of

the different sampling rates at which the individual highpass filters operate. In addition

there is an unique delay due to the differing number of convolution operations performed

in the various branches of the equalizer. Immediately proceeding equalization, the signal


figure 2.9 Operation of the lower branch of the tree structure.
a) User defines desired response; b) frequency response; c)
inverse Fourier transform for time domain representation; d)
for filter coefficients, multiply the time domain representation by
a Kaiser window to lessen degradation induced by Gibbs phenomenon.

is interpolated to bring it to the sampling frequency of the preceding band. Interpolation
performs the opposite operation of decimation by inserting M-1 evenly spaced samples of
value zero between every pair of data points.

    x_int[nT] = x[kT],  if k = n/M is an integer        (2-4)
              = 0,      otherwise

Interpolation acts to recompress the digital frequencies of the equalized time series in a man-
ner precisely opposite to the decimation process. These effects are demonstrated in fig-
ure 2.10.
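Eq. (2-4) amounts to zero-stuffing the decimated series. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def zero_stuff(x, M):
    """Insert M-1 zeros between samples: x_int[n] = x[n/M] when n/M is an integer."""
    y = np.zeros(len(x) * M)
    y[::M] = x
    return y
```

The zero-stuffed signal has M-fold compressed digital frequencies plus spectral images, which is why an anti-imaging lowpass stage must follow.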

Images created by interpolation must be removed with an anti-imaging lowpass filter.
Note that the digital cutoff frequency is π/M and not π/M^2 because of the interpolation.
The lowpass filter should have a gain of M to restore the energy lost in the decimation
stage. The output from the anti-imaging filters is summed with the output of the preceding
band efficiently by using the inverse tree structure shown in figure 2.11.

figure 2.10 Interpolation by 3. (a) For interpolation by M, M-1 evenly
spaced samples of zero are placed between each pair of
samples; (b) frequency response before interpolation; (c)
frequency response after interpolation.

Inverse Tree Structure

figure 2.11 The inverse tree structure efficiently sums the bands together
after interpolation and anti-imaging, bringing the lower
frequency band up to the sampling rate of the higher
frequency band.

The delay associated with each band is calculated with respect to the highest sampling
rate. Recall that for an FIR filter of order N the delay, Td, is expressed as follows, where
Ts1 is the sampling period of band 1.

    Td = ((N-1)/2) Ts1        (2-5)

For band i the total delay as seen at the output of the highpass filter is expressed as follows.

    Td_i = ((N-1)/2) Σ_{j=0}^{i-1} M^j Ts1        (2-6)

The delay at all nodes of the inverse tree structure is dominated by the value of Td in the
upper branch. The total delay as seen at the output of the AP structure for a four band sys-
tem is dominated by Td4 and is given below.

    T_total = 2 Td4 ≈ (N-1) M^3 Ts1        (2-7)
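With the values used later in this chapter (N = 45, M = 6, CD sampling period 1/44100 s), eqs. (2-5) through (2-7) can be evaluated numerically. This sketch assumes the reconstruction of eq. (2-6) above is correct; the function name is illustrative.

```python
N, M = 45, 6                 # filter order and decimation rate of the AP design
Ts1 = 1.0 / 44100            # CD sampling period, band 1

def band_delay(i):
    """Eq. (2-6): delay at the output of band i's highpass filter, in seconds."""
    return (N - 1) / 2 * sum(M ** j for j in range(i)) * Ts1

T_total = 2 * band_delay(4)  # eq. (2-7): round trip dominated by band 4
```

Band 1 alone contributes the 22-sample delay of eq. (2-5), while the four-band total comes to roughly a quarter of a second, consistent with the M^3 approximation of eq. (2-7).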

Processing Speed

Efficient use of computing resources can be made because of the differing sampling
rates of data in various stages of the architecture. Convolutions are performed at different
rates and are staggered in time. Data is sampled such that multiply and accumulate opera-
tions are made in non-overlapping time slots. In this way major timing bottlenecks are
avoided. The CD player sampling rate determines the value of the time slots (1/44,100 =
22.676 microseconds). For the four band design 10 FIR filters are operated. Three filters
perform convolutions at the highest sampling rates (44.1 kHz), three perform convolutions
at the CD rate decimated by six (7.35 kHz), three perform convolutions at the CD rate dec-
imated by 36 (1225 Hz), and one convolution at the CD rate decimated by 216 (204 Hz).
The timing diagram is shown in figure 2.12.

As an example of the computational load of the convolutions indicated in the timing di-
agram above, consider the TI TMS320C40 25Mflop floating point digital signal processing

figure 2.12 The timing diagram indicates the efficiency of
microprocessor implementation of the AP architecture.
Decimation and selective sampling allow a staggering of the
convolutions which prevent computational bottlenecks.

chip. This processor performs a multiply and accumulate in one cycle. With one chip ded-
icated to each band of the equalizer, the number of computations which can be performed
between CD data samples is limited to the following.

    # of computations = 25 MFLOPS x 22.676 μs = 566

According to the timing diagram above, for FIR filters of order 45, 360 multiply and accu-
mulate operations are required, i.e. 63.6% of the CPU time.
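The arithmetic above can be verified directly:

```python
slot = 1.0 / 44100                 # 22.676 microseconds between CD samples
macs_per_slot = int(25e6 * slot)   # single-cycle multiply-accumulates available
utilization = 360 / macs_per_slot  # the 360 MACs quoted above, as a fraction
```

The budget of 566 multiply-accumulates per CD sample against the 360 required gives the 63.6% CPU utilization cited in the text.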

Theoretical Performance

The AP architecture is an efficient implementation of a filter bank. The audio signal is
broken into octaves using a tree structure. Along the lower branch, a bandlimited and
downsampled signal is equalized with a highpass filter. Along the upper branch the signal
is further bandlimited and downsampled according to the Nyquist limit. After equalization
the octave bands are brought back to the original sampling rate by upsampling, using an
inverse tree structure. Each band has a distinct delay which must be accounted for when
recombining bands. Because of the octave band structure and downsampling, the critical
digital frequencies are raised as much as possible, leading to a degree of bandwidth
resolution and of band ripple, across the entire audio bandpass, that is superior to the
performance advertised for existing equalizer architectures. From figure 2.5 it is seen that
frequency resolution is finer than 1/3 octave across the entire bandpass. At worst,
resolution is 1/4 octave, while at best the equalizer has 0.045 octave resolution. Computer
simulations of the AP architecture indicate that, with the equalizer set for a flat response,
there is a band ripple of less than 0.2 dB from 17 Hz to 22,500 Hz. This value is far superior
to the 2 dB values measured on commercially available equalizers. The spectral resolution
and the high degree of control of the highpass filters are precisely the characteristics
required to restore the proper pitch of the acoustic signal received at the listener location.
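The tree decomposition and reconstruction described above can be sketched structurally as follows; the two-tap filters and the simple delay handling are toy placeholders, not the dissertation's actual design:

```python
import numpy as np

# One stage of the octave-band tree: the upper octave is kept (and equalized)
# at the current rate while the lower band is bandlimited and decimated by 2
# for the next stage; reconstruction upsamples and interpolates.
def split_stage(x, h_lp, h_hp):
    low = np.convolve(x, h_lp)[::2]    # bandlimit, then downsample by 2
    high = np.convolve(x, h_hp)        # upper octave at the current rate
    return low, high

def merge_stage(low, high, h_lp):
    up = np.zeros(2 * len(low))
    up[::2] = low                      # upsample by 2 (zero insertion)
    rec = 2 * np.convolve(up, h_lp)    # interpolation filter
    n = min(len(rec), len(high))
    return rec[:n] + high[:n]          # real designs must align band delays here

h_lp = np.array([0.5, 0.5])            # placeholder halfband pair
h_hp = np.array([0.5, -0.5])
x = np.random.randn(64)
low, high = split_stage(x, h_lp, h_hp)
y = merge_stage(low, high, h_lp)
```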

Automatic Adjustment Using Adaptive Filters

Elliott and Nelson [16] have proposed replacing the highpass filters in a standard filter
bank with adaptive filters designed to minimize the mean square of an error signal,
generated by taking the difference between a pink noise source and the same noise as
collected by a microphone at the listener location. Although this approach has the
advantage of lower computational complexity than standard frequency domain techniques,
it overlooks the most compelling reason for the use of adaptive filters, namely, the use of
the music signal itself as a reference signal. This research will substitute adaptive filters
for the highpass filters in the AP architecture as shown in figure 2.13.

In each stage the CD reference signal is bandlimited and decimated in precisely the
manner discussed above. A microphone signal is filtered and decimated in a manner
analogous to the CD signal. A feedback signal (error signal), provided by the difference
between the filtered CD and microphone signals, is sent to an adaptation algorithm
which updates the adaptive filter weights in such a way that the mean square error is
minimized. As an example, the structure for band 2 is shown in figure 2.14.

figure 2.13 The adaptive equalizer architecture is a modification of the
AP architecture in which the highpass filters are replaced by
adaptive filters.

figure 2.14 The adaptive filter is outlined in grey. The filtered CD signal
is compared with the microphone signal, similarly
bandlimited and decimated. An error signal is generated and
sent to the adaptation algorithm, which updates the filter
weights in such a way that mean square error is minimized.
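A minimal sketch of the per-band signal flow just described; the bandlimiting filter, decimation factor, adaptive weights, and "room" are all toy placeholders, not values from the text:

```python
import numpy as np

def band_error(cd, mic, w, h_band, decim):
    """Error signal for one band: both branches are bandlimited and decimated
    identically; the adaptive weights w filter the decimated CD reference."""
    ref = np.convolve(cd, h_band)[::decim]   # bandlimited, decimated CD signal
    obs = np.convolve(mic, h_band)[::decim]  # microphone branch, processed identically
    y = np.convolve(ref, w)[:len(ref)]       # adaptive filter output
    n = min(len(y), len(obs))
    return y[:n] - obs[:n]                   # error sent to the adaptation algorithm

rng = np.random.default_rng(0)
cd = rng.standard_normal(256)
mic = 0.9 * cd                               # toy "room": pure attenuation
e = band_error(cd, mic, w=np.array([1.0]), h_band=np.ones(4) / 4, decim=2)
```

With unit weights, the error is simply the 10% attenuation the toy room introduced; an adaptation algorithm would drive this residual toward zero.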

This strategy will improve the state-of-the-art by ameliorating two of the limitations
discussed in chapter one. Specifically, if the listener changes location (assuming the
microphone is also moved) the filter will automatically correct for the changes in the
room's response because of the equalizer's capability for real-time self-adjustment.
Depending on the type of adaptation algorithm chosen, the updates will be performed
either every sample (44.1 kHz), or after every block of samples (44.1 kHz/N, where N is
the length of the block). More importantly, the reference signal used for equalization is
the CD signal itself instead of pink noise. The adaptive filter, when minimizing mean
square error, will automatically utilize its limited degrees of freedom to adjust the
frequency response in those portions of the spectrum in which the most energy is present,
not the entire audio bandpass.

Several adaptation algorithms have been developed for adaptive signal processing
[17-19]. The choice of algorithms depends on several considerations, namely, the station-
arity characteristics of the signal, the data rate of the input signal, the degree of accuracy
required in filtering, and the computational resources available for real-time operation.
Acoustic data is highly non-stationary. Because the adaptive filters must find their optimal
coefficients in a fraction of the time in which the audio signal is quasi-stationary, the adap-
tation process must be fast. Secondly, although the adaptation must occur rapidly, the al-
lowable misadjustment of the filter weights must be relatively small for meaningful
improvement in the current state-of-the-art. Thirdly, due to the high CD sampling rate, the
complexity of the algorithm must not be so great as to preclude carrying out the arithmetic
in real-time. This is a serious concern even with the high speed microprocessors currently
available. Lastly, the adaptation algorithm must not introduce noise components
which would lower the signal fidelity.

All available adaptation algorithms require trade-offs among the crucial factors outlined
above. Extremely rapid convergence can be achieved, but at the expense of increased com-
putational complexity. Computationally efficient algorithms exist, but convergence is
slowed, and further serious trade-offs between convergence speed and misadjustment need
to be taken into account. Other algorithms may make a good compromise between com-
plexity and convergence speed, but by their nature they contribute undesirable noise into
the system. Consider the tapped delay line architecture for an FIR filter shown below, where
y_k represents the output of the feedforward filter at time k, d_k represents the desired (or
reference) signal, and e_k represents the difference between d_k and y_k (the error signal).

figure 2.15

An adaptive tapped delay line with desired signal, dk, and
error signal, ek.

Assume that the sequence {x_k} is quasi-stationary, i.e. the signal has been observed over a
time window in which it is approximately stationary. The output of the tapped delay line
is the inner product of the n+1 most recent filter inputs with the filter weights.

$y_k = X_k^T W_k = \begin{bmatrix} x_k & x_{k-1} & x_{k-2} & \cdots & x_{k-n} \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix}$


The filter weights, W*, which minimize the mean square error, $E[(d_k - y_k)^2]$, are
given by the well-known Wiener formulation: $W^* = R^{-1}P$, where R is the matrix
$E[XX^T]$ and P is the cross correlation between d and X, $E[dX]$. The field of adaptive
signal processing is concerned with approximating the solution for W*. There are two
standard methods by which the solution is approximated: gradient search algorithms and
least squares (LS) algorithms. The gradient search methods can be generalized as follows:
$W_{k+1} = W_k - \mu \nabla \xi_k$, where $\xi = E[(d_k - y_k)^2]$ is the mean square error. This common
sense approach adjusts the weights constituting vector W in a direction opposite to that of
the gradient of the error surface. μ represents a constant which regulates the step size of
the weight update. The most common gradient search algorithm is the Least Mean Squares
(LMS) algorithm, which estimates $\nabla \xi = \nabla E[e_k^2]$ by taking the time average of
$\nabla e_k^2$. The approximation is valid over quasi-stationary periods because of the qua-
si-ergodicity of the signal.
si-ergodicity of the signal. The LS algorithms differ by estimating the value of R and P
with time averages. LS algorithms can be broken into two categories: algorithms which
calculate new estimates over a window of data (block LS algorithms), and algorithms which
recursively update the estimate after every sample (RLS algorithms). One example of an
LS algorithm is the block covariance method. Let $\hat{R}$ and $\hat{P}$ be the time averaged approxi-
mations of R and P, and let ω be a windowing function.

$\hat{R}_k = \frac{1}{M}\sum_{i=k-M+1}^{k} \omega_i X_i X_i^T$   (2-9)

$\hat{P}_k = \frac{1}{M}\sum_{i=k-M+1}^{k} \omega_i d_i X_i$   (2-10)

$\hat{W}_k = \hat{R}_k^{-1} \hat{P}_k$   (2-11)
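The block estimates (2-9)-(2-11) can be sketched directly; the uniform window, toy plant, and window length below are illustrative choices, not the dissertation's:

```python
import numpy as np

# Block covariance method: time-average R and P over a window of M input
# vectors, then solve R W = P for the block weight estimate.
def block_covariance_weights(x, d, order, M):
    X = np.array([x[i - order:i + 1][::-1] for i in range(order, order + M)])
    dd = d[order:order + M]
    R_hat = X.T @ X / M                   # (2-9), uniform window weights
    P_hat = X.T @ dd / M                  # (2-10)
    return np.linalg.solve(R_hat, P_hat)  # (2-11), solve rather than invert

# If d is x passed through a known FIR response, the block solution recovers it.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
w_true = np.array([0.5, -0.3, 0.2])
d = np.convolve(x, w_true)[:len(x)]
w = block_covariance_weights(x, d, order=2, M=400)
```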

Certain classes of algorithms can be ruled out of consideration immediately because of
their unsuitability with respect to one or more of the requirements given above. Although
the block LS algorithms converge more quickly than gradient search techniques, after every
stationary window of data the input data autocorrelation matrix, $\hat{R}$, must be inverted. Some
of the block algorithms have autocorrelation matrices that are Toeplitz and can use the
Durbin algorithm for matrix inversion, requiring O(N^2) operations, where N is the length of
the filter. Many of the LS methods (e.g. the covariance method), however, do not yield
Toeplitz matrices, and matrix inversion must be performed using the Cholesky algorithm,
requiring O(N^3) operations. Furthermore, these algorithms update the weight vector every
M samples, introducing an undesirable artifact into the spectrum of the acoustic signal.

Algorithms which minimize error in the frequency domain operate on blocks of data.

As such, they update filter coefficients at regular intervals, introducing a noise component

into the signal. In addition, they require more operations in order to transform the data. Al-

though we are most interested in minimizing the error of the magnitude response of the

transform, good time domain equalization will guarantee good frequency domain equaliza-

tion as well.

The recursive algorithms are more promising. They converge rapidly, and the weight

vector is updated every sample. These algorithms unfortunately have a heavy computation-

al load. Figure 2.16 gives the computational requirements of the most familiar versions of

the RLS algorithm [18].


Algorithm                         Multiplications    Additions    Divisions
Modified Fast Kalman (MFK)        5N + 13            5N + 13
Fast Kalman                       8N + 5             7N + 2
Growing Memory Covariance (GMC)   13N + 6            12N + 1      1
Sliding Window Covariance (SWC)   13N + 13           12N + 7
Normalized GMC                    12N + 16           7N + 2       3
Normalized SWC                    23N                11N          5N
Normalized PLS                    9N + 17            5N + 2       2

figure 2.16

The computational complexities of several adaptive
algorithms are displayed above.

All algorithms, except the Kalman types, require significantly more computations than

the LMS algorithm (2N + 1 multiplications and 2N additions). On the basis of computa-

tional considerations, the algorithms seriously considered for the equalizer are reduced to

these two alone. The choice has been made to utilize a fast LMS method as the adaptation

algorithm. Chapters three and four will investigate LMS algorithms in detail.


The LMS algorithm converges quickly to an approximately optimum solution (in the
mean square error sense) given that the step size has been properly initialized. Because of
the low computational complexity of the LMS algorithm, it has been selected as the adap-
tive algorithm which will be implemented in the AP architecture. Because the LMS algo-
rithm is a gradient descent technique, there is a trade-off between convergence speed and
misadjustment. Misadjustment measures on average how close the LMS solution is to the
optimum solution, after algorithm convergence. A smaller step size results in a smaller
misadjustment. For fast convergence and low misadjustment an adaptive step size is re-
quired. At the beginning of a quasi-stationary segment of data, the value of the step size
must be initialized in such a way that convergence speed is maximized. In this stage of the
adaptive process the lack of convergence speed provides the major contribution to mean
square error, and misadjustment is unimportant. As the filter converges, the step size must
be reduced because of the increasing dominance of misadjustment.

Chapter three will investigate the optimal value at which the step size should be set at
the beginning of a quasi-stationary segment of acoustic data. The initialization problem has
been rigorously solved for uncorrelated input signals. To efficiently utilize the LMS algo-
rithm on music signals understanding must be extended to the case of correlated input data
(see figure 4.1). Chapter three will provide the necessary background information for this
extension. Chapter three first presents the Wiener-Hopf formulation of the minimization
of mean square error. The characteristics of the error surface are highlighted as part of this
discussion. As a result it is clear that a stochastic gradient descent algorithm is a feasible
approach to the minimization problem. The properties of gradient descent algorithms are
discussed with particular emphasis on the LMS algorithm and its properties. Finally, the
convergence speed of the mean square error for the LMS algorithm is minimized as a
function of step size. The background information in this chapter is largely taken from the
work of Widrow and Stearns [19], and their notation will be used throughout.

The Wiener-Hopf Formulation

The Wiener-Hopf formulation [20] gives a method by which the filter weights of a tapped
delay line (direct form) filter can be optimized when the input to the filter is a stationary
stochastic signal. The results hold for a quasi-stationary signal to a greater or lesser degree
depending on the precise statistical nature of the signals. The measure by which the filter
weights are optimized is the mean square error (MSE).

Because the Wiener-Hopf equation is the underpinning of all adaptive signal processing
algorithms, it will be reviewed in detail. The derivation of the Wiener filter will yield
important information regarding the nature of the error surface. As a result it will be obvious
that gradient descent techniques are applicable optimization algorithms. The derivation
of the Wiener solution is presented below.

Derivation of the Wiener Solution

Figure 2.15 shows a tapped delay line. The input sequence {x_k} is a stationary stochas-
tic signal. The output of the filter, y_k, is simply the inner product of the n+1 most recent
inputs with their respective filter weights, for a filter of order n+1.

$y_k = X_k^T W_k = \begin{bmatrix} x_k & x_{k-1} & x_{k-2} & \cdots & x_{k-n} \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix}$   (3-1)


The error signal is generated by taking the difference between the desired signal, d_k, and
the filter output.

$e_k = d_k - y_k = d_k - X_k^T W_k$

Square the error.

$e_k^2 = (d_k - X_k^T W_k)^2 = d_k^2 - 2 d_k X_k^T W_k + X_k^T W_k W_k^T X_k$

Take the expectation of both sides of the above equation.

$E[e_k^2] = E[d_k^2 - 2 d_k X_k^T W_k + X_k^T W_k W_k^T X_k]$

By keeping W at a fixed value its subscript can be omitted. Furthermore note that
$X_k^T W W^T X_k = W^T X_k X_k^T W$.

$E[e_k^2] = E[d_k^2] - 2 E[d_k X_k^T] W + W^T E[X_k X_k^T] W$

Define the expectation value of the cross correlation vector as follows.

$P_k = E[d_k X_k] = E\begin{bmatrix} d_k x_k \\ d_k x_{k-1} \\ \vdots \\ d_k x_{k-n} \end{bmatrix}$

Define the expectation of the correlation matrix as follows.

$R_k = E[X_k X_k^T] = E\begin{bmatrix} x_k x_k & x_k x_{k-1} & \cdots & x_k x_{k-n} \\ x_{k-1} x_k & x_{k-1} x_{k-1} & \cdots & x_{k-1} x_{k-n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{k-n} x_k & x_{k-n} x_{k-1} & \cdots & x_{k-n} x_{k-n} \end{bmatrix}$

The mean square error can now be expressed in a simpler notation. Let $\xi = E[e_k^2]$.

$\xi_k = E[d_k^2] - 2 P_k^T W + W^T R_k W$


To minimize ξ, set $\frac{d\xi}{dW}$ to zero and solve for the optimum weights, W*.

$\frac{d\xi}{dW} = -2P + 2RW = 0$   (3-9)

$W^* = R^{-1}P$   (3-10)

The above equation is the expression of the Wiener-Hopf equations. Substituting equation
3-10 into equation 3-8 and making use of the symmetry of R results in an expression for the
minimum value of ξ, ξ*.

$\xi^* = E[d_k^2] - 2P^T R^{-1}P + (R^{-1}P)^T R (R^{-1}P)$

$= E[d_k^2] - P^T W^*$
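These relations are easy to verify numerically. The sketch below, with an arbitrary toy 3-tap plant and noise level (not anything from the text), estimates R and P from data, solves $W^* = R^{-1}P$, and evaluates $\xi^* = E[d_k^2] - P^T W^*$:

```python
import numpy as np

# Numerical check of the Wiener solution on synthetic data.
rng = np.random.default_rng(1)
N = 50_000
X = rng.standard_normal((N, 3))                   # rows are input vectors X_k
w_plant = np.array([1.0, -0.5, 0.25])             # toy unknown system
d = X @ w_plant + 0.1 * rng.standard_normal(N)    # desired signal plus noise

R = X.T @ X / N                                   # sample estimate of E[X X^T]
P = X.T @ d / N                                   # sample estimate of E[d X]
w_star = np.linalg.solve(R, P)                    # Wiener-Hopf solution W* = R^{-1} P
xi_min = np.mean(d**2) - P @ w_star               # minimum MSE, ~ the 0.01 noise power
```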



With a stationary signal in which the cross correlation vector and correlation matrix
were perfectly determined, it would be possible to solve the Wiener-Hopf equations for W*
and ξ*. Unfortunately, in the engineering applications of interest neither of these assumptions
is generally true. The field of adaptive signal processing is concerned with estimating W*
without the a priori information required by the Wiener-Hopf formulation.

Error Surfaces

Equations 3-8 and 3-10 provide important information regarding the characteristics of
the error surface, which is defined as $\xi = f(W)$. The most critical result is the hyperpar-
abolic nature of the surface. This is demonstrated by changing the coordinate system used
in the expression for ξ. Let $V = W - W^*$ be the value of the displacement of the filter
weights from the optimal value. Express MSE in terms of this translated coordinate system,
i.e. $\xi = f(V)$. The transformation is accomplished as follows. Recall equation 3-8.

$\xi = E[d_k^2] + W^T R W - 2 W^T P$   (3-8)

Equation 3-13 is inspired by assuming the proper form of f(V) and demonstrating its
equivalence to equation 3-8.

$\xi = E[d_k^2] + W^T R W - 2 W^T R R^{-1} P + P^T R^{-1} R R^{-1} P - P^T R^{-1} P$   (3-13)

Substitute W* for $R^{-1}P$ and ξ* for $E[d_k^2] - P^T R^{-1} P$.

$\xi = \xi^* + W^T R W - 2 W^T R W^* + P^T R^{-1} R W^*$   (3-14)

Because correlation matrices are Hermitian, $(P^T R^{-1})^T = R^{-1} P = W^*$.

$\xi = \xi^* + W^T R W - 2 W^T R W^* + W^{*T} R W^*$   (3-15)

Since $W^T R W^*$ is a scalar, it must be equal to its own transpose. Again use the Hermitian
property of R.

$\xi = \xi^* + W^T R W + W^{*T} R W^* - W^T R W^* - W^{*T} R W$   (3-16)

Note that transposition is a linear operator.

$\xi = \xi^* + (W - W^*)^T R (W - W^*)$   (3-17)

$\xi = \xi^* + V^T R V$   (3-18)

The error surface is clearly a quadratic function of V. As a result, the following impor-
tant characteristics are obvious: 1) the error surface is a hyperparaboloid, 2) the error sur-
face must be concave upwards in order to avoid regions of negative MSE (a logical
impossibility), 3) the optimal set of filter weights is located at the bottom of the
hyperparaboloid, and 4) there is only one minimum (no local minima). These observations
are critical in their implication that gradient search techniques are able to find the global
minimum.
An error surface for both the translated and untranslated coordinates is shown below.

The Gradient Descent Algorithm
The gradient of a surface points in the direction in which the function describing the
surface will be most increased. A common sense approach to finding a surface minimum
is to move in a direction opposite to that of its gradient. This section will show rigorously
the convergence properties for a stationary signal. The trade-off between convergence
speed and misadjustment will be demonstrated explicitly. Finally, the difficulty of using
gradient descent techniques with quasi-stationary signals is described.

figure 3.1 The minimum of the error surface is at the origin of the
translated coordinate system. The parabolic nature of the
surface makes gradient descent algorithms feasible.

Let μ represent a variable which regulates the step size of the weight adjustment.
Gradient descent (GD) methods can be generalized as follows.

$W_{k+1} = W_k - \mu \nabla \xi_k$   (3-19)


Figure 3.2 graphically demonstrates how GD methods update themselves on the basis of

past information.


figure 3.2 Gradient descent methods find the error surface minimum by moving in
a direction opposite to that of the error surface gradient.

The trajectory of the filter weights is the projection of the n+1 dimensional error surface
onto the n-dimensional weight space. For a hyperparaboloid the approach is intuitive: the
greater the distance from the surface minimum, the larger the gradient, and hence, the larger
the weight updates. At the surface minimum the gradient is zero, and no further adjustment
is made. A more rigorous justification follows.

Algorithm Convergence

Recall from equation 3-9 that the gradient is expressed as $\nabla \xi = -2P + 2RW$. Substituting
this into equation 3-19 yields the following.

$W_{k+1} = W_k + \mu (2P - 2RW_k)$   (3-20)

$= W_k + \mu R (2R^{-1}P - 2W_k)$   (3-21)

Substitute the solution of the Wiener-Hopf equation (equation 3-10) into 3-21.

$W_{k+1} = W_k + 2\mu R (W^* - W_k)$   (3-22)

$= (I - 2\mu R) W_k + 2\mu R W^*$   (3-23)

Switch to the translated coordinate system by subtracting W* from both sides of equation 3-23.

$W_{k+1} - W^* = (I - 2\mu R) W_k + 2\mu R W^* - W^*$

$= (I - 2\mu R) (W_k - W^*)$

$V_{k+1} = (I - 2\mu R) V_k$   (3-24)
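This recursion can be exercised directly; the correlation matrix, step size, and initial displacement below are arbitrary toy values:

```python
import numpy as np

# V_{k+1} = (I - 2 mu R) V_k: with perfect gradient knowledge the weight
# displacement contracts geometrically when 0 < mu < 1/lambda_max.
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # toy correlation matrix
lam_max = np.linalg.eigvalsh(R).max()
mu = 0.1                            # below 1/lambda_max ~ 0.45
V = np.array([1.0, -1.0])           # initial displacement from W*
for _ in range(200):
    V = (np.eye(2) - 2 * mu * R) @ V
# After 200 iterations the displacement is numerically zero.
```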


Perform a unitary transformation on R. Let Λ be the diagonal matrix of eigenvalues of R,
and let Q be the matrix whose columns are the corresponding eigenvectors. This
transformation assumes R is not singular.

$R = Q \Lambda Q^{-1}$, where $\Lambda = \begin{bmatrix} \lambda_0 & & & \\ & \lambda_1 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}$ and $Q = \begin{bmatrix} q_0 & q_1 & \cdots & q_n \end{bmatrix}$

$q_i$ is the eigenvector associated with $\lambda_i$.

Substitute the above expression for R into equation 3-24.

$V_{k+1} = (I - 2\mu Q \Lambda Q^{-1}) V_k$   (3-25)

Note that the following transformation has the effect of decoupling the weight vectors in
the translated coordinate space, i.e. the transformation causes the new coordinates (princi-
pal coordinate system) to be collinear with the eigenvectors of the error surface.

$V' = Q^{-1} V$   (3-26)

With the above transformation the coordinates are aligned with the principal axes of the
hyperparaboloid. This can be shown as follows.

$\xi = \xi^* + V^T R V$   (3-27)

$= \xi^* + V^T (Q \Lambda Q^{-1}) V$   (3-28)

$= \xi^* + (Q^{-1}V)^T \Lambda (Q^{-1}V)$   (3-29)

$= \xi^* + V'^T \Lambda V'$   (3-30)

The transformation is illustrated in figure 3.3.

figure 3.3 Quadratic Error Surface in the Principal Coordinate System.
The principal coordinate system has axes which lie along the
principal axes of the hyperparabolic error surface.

Returning to the convergence proof, substitute equation 3-26 into equation 3-25.

$Q V'_{k+1} = (I - 2\mu Q \Lambda Q^{-1}) Q V'_k$   (3-31)

Premultiply by $Q^{-1}$.

$V'_{k+1} = Q^{-1} (I - 2\mu Q \Lambda Q^{-1}) Q V'_k$

$= (I - 2\mu \Lambda) V'_k$   (3-33)

By induction $V'_k$ can be expressed in terms of $V'_0$.

$V'_k = (I - 2\mu \Lambda)^k V'_0$   (3-34)

The update equations show their decoupled nature more clearly in matrix notation.

$V'_k = \begin{bmatrix} (1 - 2\mu\lambda_0)^k & 0 & \cdots & 0 \\ 0 & (1 - 2\mu\lambda_1)^k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & (1 - 2\mu\lambda_n)^k \end{bmatrix} V'_0$

The convergence condition can be seen from equation 3-34. If

$0 < \mu < \frac{1}{\lambda_{max}}$   (3-35)

then with perfect knowledge of the gradient vector, the weight vector will converge to the optimal solution.
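The bound can be illustrated per mode with an arbitrary toy eigenvalue spread: each decoupled component decays as $(1 - 2\mu\lambda_j)^k$, so any μ above $1/\lambda_{max}$ lets the largest-eigenvalue mode grow:

```python
import numpy as np

# Per-mode geometric factors |1 - 2 mu lambda_j|^k for a toy eigenvalue spread.
lams = np.array([0.1, 1.0, 4.0])          # lambda_max = 4, so the bound is mu < 0.25

def mode_gains(mu, k=100):
    return np.abs(1 - 2 * mu * lams) ** k

stable = mode_gains(mu=0.2)               # mu < 1/4: every mode shrinks
unstable = mode_gains(mu=0.3)             # mu > 1/4: the lambda_max mode blows up
```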




Convergence Speed

The speed of convergence is limited by the value of the step size, and hence, the largest
eigenvalue. Consider convergence along coordinate j. A geometric convergence factor can
be defined as follows.

$r_j = (1 - 2\mu\lambda_j)$   (3-36)

Recall the power series for an exponential.

$\exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$   (3-37)

Let $\tau_j = \frac{1}{2\mu\lambda_j}$. Then $\tau_j$ is approximately the e-folding time of the adaptation process, as
can be seen by truncating the exponential power series after the first two terms.

$r_j = 1 - \frac{1}{\tau_j} \approx \exp\left(-\frac{1}{\tau_j}\right)$   (3-38)

The convergence of ξ to ξ* is represented by a learning curve which can be approximated
as the product of exponential terms.

$\xi_k \approx \xi^* + (\xi_0 - \xi^*)\, e^{-2k/\tau_0} e^{-2k/\tau_1} \cdots e^{-2k/\tau_n}$   (3-39)

Learning is limited by the largest e-folding time, which is directly related to the smallest
eigenvalue and the magnitude of the step size.
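The time-constant relation is a one-liner; the eigenvalue spread and step size below are arbitrary toy values:

```python
import numpy as np

# Per-mode e-folding times tau_j = 1/(2 mu lambda_j): the smallest eigenvalue
# produces the largest time constant and therefore governs total learning time.
mu = 0.01
lams = np.array([0.05, 0.5, 5.0])
taus = 1.0 / (2 * mu * lams)        # approximately [1000, 100, 10] iterations
slowest = taus.max()                # set by the smallest eigenvalue
```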

Excess Mean Square Error

In the discussion it has been assumed that ∇ξ is known. In fact it is not. Adjustments must
be made in the above equations to account for a noisy gradient estimate. Let $\hat{\nabla}\xi$ be the
estimate of ∇ξ.

$W_{k+1} = W_k - \mu \hat{\nabla} \xi_k$   (3-40)

Switch to the translated coordinate system.

$V_{k+1} = V_k - \mu \hat{\nabla} \xi_k$   (3-41)

The estimate of the gradient can be expressed as the sum of the gradient plus a noise term.

$\hat{\nabla} \xi_k = \nabla \xi_k + N_k$   (3-42)

From equations 3-19 and 3-42 we conclude the following.

$\hat{\nabla} \xi_k = 2RV_k + N_k$   (3-43)

Substitute equation 3-43 into equation 3-41.

$V_{k+1} = V_k - \mu (2RV_k + N_k)$

$= (I - 2\mu R) V_k - \mu N_k$   (3-45)

Switch to the principal coordinate system.

$Q V'_{k+1} = (I - 2\mu R) Q V'_k - \mu N_k$

$V'_{k+1} = Q^{-1} (I - 2\mu R) Q V'_k - \mu Q^{-1} N_k$   (3-46)

$= (I - 2\mu \Lambda) V'_k - \mu Q^{-1} N_k$

Let $N'_k = Q^{-1} N_k$ be the gradient noise projected onto the principal axes.

$V'_{k+1} = (I - 2\mu \Lambda) V'_k - \mu N'_k$   (3-47)





Note that equation 3-47 now represents a set of uncoupled difference equations. The matrix
formulation is given below.

$V'_{k+1} = \begin{bmatrix} (1 - 2\mu\lambda_0) & & 0 \\ & \ddots & \\ 0 & & (1 - 2\mu\lambda_n) \end{bmatrix} \begin{bmatrix} v'_{k,0} \\ \vdots \\ v'_{k,n} \end{bmatrix} - \mu \begin{bmatrix} n'_{k,0} \\ \vdots \\ n'_{k,n} \end{bmatrix}$

By induction $V'_k$ can be expressed in terms of $V'_0$.

$V'_k = (I - 2\mu\Lambda)^k V'_0 - \mu \sum_{j=0}^{k-1} (I - 2\mu\Lambda)^j N'_{k-j-1}$   (3-49)

Thus, with gradient noise included, $V'_k$ does not approach zero as $k \to \infty$.

$\lim_{k \to \infty} V'_k = -\mu \sum_{j=0}^{\infty} (I - 2\mu\Lambda)^j N'_{k-j-1}$   (3-50)

The degree of suboptimality is determined by the set of eigenvalues $\{\lambda_j\}$, the statistical
characteristics of N', and the step size, μ.

The difference between the minimum MSE and the average MSE is defined as the excess
mean square error. Excess MSE provides a measure of the difference between the actual
and the optimal performance, averaged over time, and is expressed as follows.

$\text{Excess MSE} = E[\xi - \xi^*] = E[V'^T_k \Lambda V'_k]$   (3-51)

Recall that after adaptation $V'_k = -\mu \sum_{j=0}^{\infty} (I - 2\mu\Lambda)^j N'_{k-j-1}$. Substitute this into equation 3-51.

$\text{Excess MSE} = \mu^2 E\left[\left(\sum_{i=0}^{\infty} (I-2\mu\Lambda)^i N'_{k-i-1}\right)^T \Lambda \left(\sum_{j=0}^{\infty} (I-2\mu\Lambda)^j N'_{k-j-1}\right)\right] = \mu^2 \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} E\left[N'^T_{k-j-1} (I-2\mu\Lambda)^j \Lambda (I-2\mu\Lambda)^i N'_{k-i-1}\right]$   (3-52)

Since $(I - 2\mu\Lambda)^j$ and Λ are both diagonal, they can be commuted.

$\text{Excess MSE} = \mu^2 \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} E\left[N'^T_{k-j-1} \Lambda (I - 2\mu\Lambda)^{i+j} N'_{k-i-1}\right]$   (3-53)

Assume that N' results from independent errors. Then for all $i \neq j$ there will be no contribution
to excess MSE.

$\text{Excess MSE} = \mu^2 \sum_{j=0}^{\infty} E\left[N'^T_{k-j-1} \Lambda (I - 2\mu\Lambda)^{2j} N'_{k-j-1}\right]$   (3-54)

$= \mu^2 \sum_{j=0}^{\infty} E\left[N'^T_k \Lambda (I - 2\mu\Lambda)^{2j} N'_k\right]$   (3-55)

Since $I - 4\mu\Lambda + 4\mu^2\Lambda^2$ is a diagonal matrix, the convergent infinite series may be simplified
as follows.

$\sum_{j=0}^{\infty} (I - 4\mu\Lambda + 4\mu^2\Lambda^2)^j = (4\mu\Lambda - 4\mu^2\Lambda^2)^{-1}$   (3-56)

Substitute this expression into equation 3-55.

$\text{Excess MSE} = \mu^2 E\left[N'^T_k \Lambda (4\mu\Lambda - 4\mu^2\Lambda^2)^{-1} N'_k\right] = \frac{\mu}{4} E\left[N'^T_k (I - \mu\Lambda)^{-1} N'_k\right]$   (3-57)

In matrix form the decoupled nature of equation 3-57 is evident.

$\text{Excess MSE} = \frac{\mu}{4} E\left[\begin{bmatrix} n'_{k,0} & \cdots & n'_{k,n} \end{bmatrix} \begin{bmatrix} \frac{1}{1-\mu\lambda_0} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{1-\mu\lambda_n} \end{bmatrix} \begin{bmatrix} n'_{k,0} \\ \vdots \\ n'_{k,n} \end{bmatrix}\right]$   (3-58)

The matrix equation can be further simplified as follows.

$\text{Excess MSE} = \frac{\mu}{4} \sum_{j=0}^{n} \frac{E[n'^2_{k,j}]}{1 - \mu\lambda_j}$   (3-59)

Equations 3-36, 3-38, 3-39, and 3-59 provide a clear indication of the trade-off between
misadjustment and convergence speed. The smaller the step size, the smaller the
misadjustment. Below a certain upper bound, however, convergence speed is slowed. Figure
3.4 illustrates the implications of equations 3-60 and 3-61.

$\lim_{\mu \to 0} \text{Excess MSE} = 0$   (3-60)

$\lim_{\mu \to 0} \tau = \infty$   (3-61)

Adaptation in a Quasi-Stationary Environment

The adaptation process for quasi-stationary input data, while conceptually simple to un-
derstand, is difficult to describe analytically. The optimal weight vector is now a function
of time, i.e. $W^* = f(k)$, as the error surface changes shape and moves through an n+1 di-
mensional space for a filter of order n. Widrow has given an analytical description of the
optimal step size for the LMS by examining the case of modelling an unknown system with
an LMS tapped delay line filter [21]. The unknown system is assumed to be a first order
Markov process with ξ* constrained to be a constant.

The more general problem for step size optimization is unsolved. Since the weight vec-
tors are adapted on the basis of past information, and the error surface is changing shape
and moving with time, the weight vector error results from two contributions. The first
contribution results from noise in the estimation of the error surface gradient. The second
contribution to error results from estimating the gradient of the error surface at iteration k

using the error surface at iteration k-1. Note the formulation for the filter weights for the
non-stationary case, in the translated coordinate system.

$V_k = W_k - W^*_k = (W_k - E[W_k]) + (E[W_k] - W^*_k)$   (3-62)

Substitution of the above into the expression for the mean square error yields the following.

$\xi_{k+1} = \xi^* + (W_k - E[W_k])^T R_k (W_k - E[W_k]) + (E[W_k] - W^*_k)^T R_k (E[W_k] - W^*_k) + 2E[(W_k - E[W_k])^T R_k (E[W_k] - W^*_k)]$   (3-63)

Expanding the third term in equation 3-63 and recognizing that $W^*_k$ is constant over an
ensemble, the third term reduces to zero, and $\xi_{k+1}$ reduces to the following.

$\xi_{k+1} = \xi^* + \text{(error from gradient noise)} + \text{(error from moving error surface)}$

$= \xi^* + (W_k - E[W_k])^T R_k (W_k - E[W_k]) + (E[W_k] - W^*_k)^T R_k (E[W_k] - W^*_k)$   (3-64)

Equation 3-64 is as far as one can go without usually unavailable a priori information
concerning the stationarity characteristics of the signal. A graphic illustration of the oper-
ation of the LMS algorithm with non-stationary input data is shown in figure 3.5.



figure 3.4 The larger step size results in faster convergence, but larger
excess MSE. For this reason the equalizer design will utilize
a varying step size.


figure 3.5

For quasi-stationary adaptation there are two contributions to
error in the gradient: 1) the error induced by the typical form
of gradient noise, and 2) the error induced by error surface
lag, i.e. the error induced by using information from a prior
error surface instead of the current error surface. In the
figure above the highlighted update, $\Delta W_{k+1}$, is based on
information concerning $\xi_k$ instead of $\xi_{k+1}$.


The LMS Algorithm with Uncorrelated Input Data

The best known and most widely used gradient method for stochastic signals in adap-
tive signal processing is the LMS algorithm[22]. The method is well-understood for sta-
tionary uncorrelated input data. The method can be extended to the quasi-stationary case,
although the analytic description loses much of its simplicity and is possible only for the
most simple models. This section will provide the theoretical background necessary for a
discussion on the LMS algorithm with correlated input data.

Stationary Model for Uncorrelated Input

Recall the general form of the gradient descent algorithms, $W_{k+1} = W_k - \mu \nabla \xi_k$. The
LMS algorithm makes a crude estimate of ∇ξ which nevertheless is quite effective. Let
$\hat{\nabla}\xi$ represent the estimate of ∇ξ.

Gradient Descent:

$\nabla \xi = \nabla E[e_k^2] = 2E\left[e_k \frac{\partial e_k}{\partial W}\right] = -2E[e_k X_k]$   (3-65)

LMS Approximation:

$\hat{\nabla} \xi = -2 e_k X_k$   (3-66)

The LMS approximation replaces the ensemble estimate of V C with a time average. As
the algorithm operates over several iterations, on average the gradient estimate will be in
the correct direction. The LMS descent is illustrated in figure 3.6.
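The full LMS loop built from equations 3-19 and 3-66 is short. The sketch below identifies a toy 3-tap FIR response from white input; it is an illustration of the algorithm, not the AP-architecture implementation:

```python
import numpy as np

# Minimal LMS loop: W_{k+1} = W_k + 2 mu e_k X_k.
def lms(x, d, order, mu):
    w = np.zeros(order + 1)
    for k in range(order, len(x)):
        X = x[k - order:k + 1][::-1]   # X_k: the n+1 most recent inputs
        e = d[k] - X @ w               # error signal e_k = d_k - y_k
        w = w + 2 * mu * e * X         # stochastic gradient step using (3-66)
    return w

rng = np.random.default_rng(2)
x = rng.standard_normal(5000)
w_true = np.array([0.8, -0.4, 0.1])    # toy unknown response
d = np.convolve(x, w_true)[:len(x)]    # noise-free desired signal
w = lms(x, d, order=2, mu=0.01)
```

With white unit-variance input the eigenvalues of R are near one, so mu = 0.01 sits safely inside the stability region and the weights converge to the toy response.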

figure 3.6 The LMS algorithm approximates the gradient by replacing
the ensemble average of the gradient with a time average.
On average the LMS gradient estimate points in the
direction of the surface gradient.

It is a simple matter to demonstrate that the LMS estimate is unbiased.

$E[\hat{\nabla}\xi] = E[-2 e_k X_k]$   (3-67)

$= -2E[X_k (d_k - X_k^T W_k)]$   (3-68)

$= 2 R_k W_k - 2 P_k$   (3-69)

$= \nabla \xi$   (3-70)

If the convergence condition expressed in equation 3-35 holds, the LMS algorithm will con-

verge to W* in the mean.

$\lim_{k \to \infty} W_k = W^* \text{ in the mean, for } 0 < \mu < \frac{1}{\lambda_{max}}$   (3-71)
Consider next the variance of the estimate. Assume that after k iterations $W_k$ has con-
verged. Then the gradient estimate is just the noise process $N_k$.

$\hat{\nabla}\xi = \nabla\xi + N_k$   (3-72)

$= N_k$   (3-73)

For $j > k$, $N_j = -2 e_j X_j$. After convergence $e_j$ and $X_j$ are approximately independent.

$\text{cov}[\hat{\nabla}\xi] = \text{cov}[N_j] = E[N_j N_j^T]$   (3-74)

$= 4E[e_j^2 X_j X_j^T] = 4E[e_j^2]\, E[X_j X_j^T]$   (3-75)

$= 4 \xi^* R$   (3-76)

The above does not imply that the variance of the estimate remains finite, or that the mean
square error remains finite. A considerably more stringent requirement on step size is nec-
essary, and is discussed below.

Conditions on μ for Finite Variance of the Square Error

Horowitz and Senne [2] have studied the convergence properties of the LMS mean
square error for the case of Gaussian, zero-mean, uncorrelated data. Let $\lambda = [\lambda_0\ \lambda_1\ \cdots\ \lambda_n]^T$
and let $\alpha_k$ be the vector of diagonal elements of $E[V'_k V'^T_k]$.

$\xi_k = E[(d_k - X_k^T W_k)^2] = E[(d_k - X_k^T W^* - X_k^T (W_k - W^*))^2]$

$= \xi^* - 2E[X_k^T (W_k - W^*)(d_k - X_k^T W^*)] + E[(W_k - W^*)^T X_k X_k^T (W_k - W^*)]$   (3-77)

Because $\{x_i\}$ is Gaussian and uncorrelated, $X_k$ and $W_k$ are statistically independent. Fur-
thermore, $E[X_k (d_k - X_k^T W^*)] = P_k - R_k W^* = 0$, so the second term of equation 3-77
vanishes altogether.

$\xi_k = \xi^* + E[(W_k - W^*)^T X_k X_k^T (W_k - W^*)]$   (3-78)

Since $X_k$ is uncorrelated, $E[X_k X_k^T]$ is a diagonal matrix and ξ can be expressed as follows.

$\xi_k = \xi^* + \text{trace}\{E[(W_k - W^*)(W_k - W^*)^T]\, E[X_k X_k^T]\}$   (3-79)

$= \xi^* + \text{trace}\{E[V_k V_k^T]\, E[X_k X_k^T]\}$   (3-80)

$\xi_k = \xi^* + \lambda^T \alpha_k$   (3-81)

Horowitz and Senne next develop a recursive relationship for $\xi_k$ from the definition of the
LMS algorithm.

$W_{k+1} = [I - 2\mu X_k X_k^T] W_k + 2\mu d_k X_k$   (3-82)

Subtract W* and perform a unitary transformation. Let $e^*_k = d_k - X_k^T W^*$.

$V'_{k+1} = [I - 2\mu X'_k X'^T_k] V'_k + 2\mu e^*_k X'_k$   (3-83)

Take the expectation of V' multiplied by its transpose. Because of the independence of
$e^*_k$ and $X'_k$, the expectation of their cross terms is zero.

$E[V'_{k+1} V'^T_{k+1}] = E[V'_k V'^T_k] - 4\mu\Lambda E[V'_k V'^T_k] + 4\mu^2 E[X'_k X'^T_k V'_k V'^T_k X'_k X'^T_k] + 4\mu^2 \xi^* \Lambda$   (3-84)

Apply the Gaussian factoring theorem to the third term of equation 3-84.

$E[X'_k X'^T_k V'_k V'^T_k X'_k X'^T_k] = 2\Lambda E[V'_k V'^T_k] \Lambda + \text{trace}\{\Lambda E[V'_k V'^T_k]\} \Lambda$   (3-85)

Substitute the decomposed moment into equation 3-84.

$E[V'_{k+1} V'^T_{k+1}] = E[V'_k V'^T_k] - 4\mu\Lambda E[V'_k V'^T_k] + 8\mu^2 \Lambda E[V'_k V'^T_k] \Lambda + 4\mu^2 \text{trace}\{\Lambda E[V'_k V'^T_k]\} \Lambda + 4\mu^2 \xi^* \Lambda$   (3-86)

Let $C_k = E[V'_k V'^T_k]$. The diagonal elements of $C_k$ can be decomposed as follows.

$C_{k+1}[i,i] = (1 - 4\mu\lambda_i + 8\mu^2\lambda_i^2)\, C_k[i,i] + 4\mu^2 \lambda_i \sum_p \lambda_p C_k[p,p] + 4\mu^2 \xi^* \lambda_i$   (3-87)

A recursive formulation is now possible for $\alpha_k$. Let $F[i,i] = 1 - 4\mu\lambda_i + 12\mu^2\lambda_i^2$ and
$F[i,j] = 4\mu^2 \lambda_i \lambda_j$ for $i \neq j$.

$\alpha_{k+1} = F \alpha_k + 4\mu^2 \xi^* \lambda$   (3-88)

With equation 3-88 a recursive expression is available for ξ.

$\xi_{k+1} = \xi^* + \lambda^T [F \alpha_k + 4\mu^2 \xi^* \lambda]$   (3-89)

Horowitz and Senne develop convergence conditions which can be seen most easily from
the matrix formulation of equation 3-89. Let $\alpha_k = [\alpha_{k,0}\ \alpha_{k,1}\ \cdots\ \alpha_{k,n}]^T$.

$\alpha_{k+1} = \begin{bmatrix} 1 - 4\mu\lambda_0 + 12\mu^2\lambda_0^2 & 4\mu^2\lambda_0\lambda_1 & \cdots & 4\mu^2\lambda_0\lambda_n \\ 4\mu^2\lambda_1\lambda_0 & 1 - 4\mu\lambda_1 + 12\mu^2\lambda_1^2 & \cdots & 4\mu^2\lambda_1\lambda_n \\ \vdots & \vdots & \ddots & \vdots \\ 4\mu^2\lambda_n\lambda_0 & 4\mu^2\lambda_n\lambda_1 & \cdots & 1 - 4\mu\lambda_n + 12\mu^2\lambda_n^2 \end{bmatrix} \alpha_k + 4\mu^2\xi^*\lambda$   (3-90)

Every row sum of F satisfies $\sum_j F[i,j] < 1$ if $0 < \mu < \frac{1}{3\sum_j \lambda_j}$. This condition on μ
will guarantee finite mean square error.
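The recursion (3-88)/(3-90) can be iterated numerically; the eigenvalues, ξ*, and step size below are toy values chosen inside the stability bound:

```python
import numpy as np

# Iterate alpha_{k+1} = F alpha_k + 4 mu^2 xi* lambda with F built as in (3-90).
lams = np.array([0.2, 1.0, 3.0])     # toy eigenvalues of R
xi_star = 0.01                       # toy minimum MSE
mu = 0.01                            # well inside 1/(3 * lams.sum()) ~ 0.079

F = 4 * mu**2 * np.outer(lams, lams)              # off-diagonal entries 4 mu^2 l_i l_j
F[np.diag_indices(3)] = 1 - 4 * mu * lams + 12 * mu**2 * lams**2

alpha = np.ones(3)                   # arbitrary initial second moments
for _ in range(5000):
    alpha = F @ alpha + 4 * mu**2 * xi_star * lams
excess_mse = lams @ alpha            # steady-state excess MSE via (3-81), finite and small
```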

Conditions on μ for Maximum Convergence Speed and Finite Variance of the Square Error

Feuer and Weinstein [3] have developed an expression for step size which will optimize
convergence speed. The expression is a function of generally unavailable a priori informa-
tion, i.e. all eigenvalues and the initial misadjustment along each of the principal coordi-
nates of the error surface. Given no information concerning $\alpha_k$ it is assumed that all of
its elements are equal. Horowitz and Senne note that the convergence rate of ξ is approx-
imately optimized (given no further a priori information) by setting μ as follows.

$\frac{d}{d\mu}\left(1 - 4\mu\lambda_{max} + 12\mu^2\lambda_{max}^2\right) = 0$   (3-91)

$\mu^* = \frac{1}{6\lambda_{max}}$   (3-92)


Music signals are not uncorrelated, as can be seen from the autocorrelation functions for
several musical instruments shown in figure 4.1. To meaningfully discuss step size initial-
ization for an adaptive acoustic equalizer, the understanding of LMS convergence proper-
ties must be extended to the case of correlated input data. Even though the adaptive step
size will be optimized for fast convergence only when a large change occurs in the error
surface, and hence the most recent data sample will be largely uncorrelated with the previ-
ous data, the tapped delay line will have an input vector which, for order n+1, still shares n
components with the previous vector.
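The overlap argument can be made concrete: for an (n+1)-tap delay line, X_{k+1} is X_k shifted by one sample, so the two vectors share n entries even when the underlying samples are white. A small sketch (the tap count and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)         # input samples (white here, for illustration)
n_taps = 8                           # n + 1 = 8 taps (assumed)

def tap_vector(x, k, n_taps):
    """Tapped-delay-line input X_k = [x_k, x_{k-1}, ..., x_{k-n}]."""
    return x[k - n_taps + 1 : k + 1][::-1]

Xk, Xk1 = tap_vector(x, 50, n_taps), tap_vector(x, 51, n_taps)

# successive input vectors share n = n_taps - 1 components,
# so E[X_k X_{k+1}^T] cannot vanish even for uncorrelated samples
assert np.array_equal(Xk1[1:], Xk[:-1])
```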

Convergence of LMS with Correlated Input

Several mathematicians and engineers have investigated a completely general conver-
gence proof for the LMS algorithm without success. To date no satisfactory proof has been
put forward guaranteeing lim_{k→∞} ξ_k < M for all μ < μ_max. The difficulty in a rigorous proof lies
in the assumption of strong correlation, i.e. ∀τ ≥ 0, E[X_t X^T_{t+τ}] > 0. The theoretical work
done in this area is mathematically sophisticated. Fortunately, Macchi and Eweda [23]
have provided a discussion of the theoretical contributions to convergence problems, which
is summarized here.

Ljung [24] has demonstrated convergence almost everywhere for W_k under the condi-
tion that the step size is a decreasing sequence tending to zero. He creates a non-divergence
criterion by erecting a suitable barrier on V_k which will always reflect V_k onto a random
compact set. However, he has not shown that lim W_k = W*. Daniell [25] has shown that
ξ can be made arbitrarily small by choosing μ sufficiently small. However, he must use the
assumption of uniformly asymptotically independent observations. In addition he
makes the restrictive assumption that the conditional moments of observations, given past
observations, are uniformly bounded. This condition is not satisfied even for the case of


[Figure: autocorrelation curves plotted against lag τ from 0.2 to 1.0 msec.]

figure 4.1 The autocorrelation functions are given above for several
different instruments [6]: a) organ music (Bach);
b) symphony orchestra (Glasunov); c) cembalo music
(Frescobaldi); d) singing with piano accompaniment
(Grieg); and e) speech (male voice).

Gaussian input. Farden [26] has found a bound on ξ by making the reasonable assumption
of a decreasing autocorrelation function. He further makes use of reflecting barriers for V_k
to keep V_k in a compact set. Macchi and Eweda [23] find a bound on ξ by assuming only
blocks of M samples are correlated and all moments are bounded. They do not make use
of reflecting boundaries.

The proofs referred to above have assumptions which are representative of acoustic sig-
nals. Acoustic signals generally do have decreasing autocorrelation functions and, after a
sufficient delay, may be assumed to be uncorrelated. In order to determine the step size
which will minimize ξ_k at the beginning of a quasi-stationary segment of data, the work
of Horowitz and Senne, and of Feuer and Weinstein, is extended without the assumption of
uncorrelated input. The mathematical expressions which result are simplified, and conditions
on μ for maximum convergence speed for ξ are found. Simulations are subsequently car-
ried out for different degrees of correlation, step size, and filter order.

Conditions on μ for Convergence of ξ

Adding correlation to the input data significantly complicates the expression of ξ for
the LMS algorithm. The approach developed below formulates a recursive expression for
V_k in terms of V_0. The expression is substituted into the equation for ξ (equation 3-79).
With the assumption that e*_k is small, cross terms including e*_k can be ignored. This re-
sults in a matrix equation for ξ in terms of the initial conditions of the algorithm. The ma-
trix norm is investigated numerically as a function of μ. These results are then compared
with large ensemble averages of ξ computed with the LMS algorithm. To begin, recall the
expression for the LMS algorithm formulated in the translated coordinate system.

V_{k+1} = (I − 2μX_kX^T_k)V_k + 2μe*_kX_k                                     (4-1)
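The large ensemble averages of ξ mentioned above can be produced by direct simulation. The sketch below (all parameters and the synthetic optimum `W_opt` are illustrative assumptions, not values from the thesis) averages the squared error of many independent LMS runs, which estimates ξ_k:

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu, runs, steps = 4, 0.02, 500, 200       # taps, step size, ensemble size (assumed)
W_opt = np.ones(n)                           # synthetic optimal weights (illustrative)
xi_min = 0.01                                # minimum MSE, i.e. measurement noise power

err2 = np.zeros(steps)
for _ in range(runs):
    W = np.zeros(n)
    x = rng.standard_normal(steps + n)       # white input for this sketch
    for k in range(steps):
        Xk = x[k:k + n][::-1]                # tapped-delay-line input vector
        d = W_opt @ Xk + np.sqrt(xi_min) * rng.standard_normal()
        e = d - W @ Xk
        err2[k] += e * e
        W = W + 2 * mu * e * Xk              # LMS update, same form as equation 4-1
err2 /= runs                                 # ensemble-average MSE, an estimate of xi_k

print(err2[0], err2[-1])                     # decays from about |W_opt|^2 toward xi_min
```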

Recursively formulate V_k in terms of V_0 using equation 4-1.

  V_k = { Π_{j=1..k} [I − 2μX_{k−j}X^T_{k−j}] } V_0                                  (term 1)

      + 2μ Σ_{i=1..k−1} { Π_{j=1..k−i} [I − 2μX_{k−j}X^T_{k−j}] } e*_{i−1}X_{i−1}    (term 2)    (4-2)

      + 2μ e*_{k−1}X_{k−1}                                                           (term 3)
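As a sanity check, the expansion in equation 4-2 should reproduce the result of iterating equation 4-1 directly. The sketch below (the dimensions, step size, and random sequences are illustrative assumptions) verifies that the three terms sum to V_k:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, mu = 4, 6, 0.05                        # filter length, steps, step size (assumed)
X = rng.standard_normal((k, n))              # input vectors X_0 ... X_{k-1}
e = rng.standard_normal(k)                   # minimum-error samples e*_0 ... e*_{k-1}
V0 = rng.standard_normal(n)

# iterate equation 4-1 directly
V = V0.copy()
for i in range(k):
    V = (np.eye(n) - 2*mu*np.outer(X[i], X[i])) @ V + 2*mu*e[i]*X[i]

# evaluate the three terms of equation 4-2
A = [np.eye(n) - 2*mu*np.outer(X[m], X[m]) for m in range(k)]

def chain(hi, lo):
    """Ordered product A_hi A_(hi-1) ... A_lo; identity if hi < lo."""
    P = np.eye(n)
    for m in range(hi, lo - 1, -1):
        P = P @ A[m]
    return P

term1 = chain(k - 1, 0) @ V0
term2 = 2*mu*sum(chain(k - 1, i) @ (e[i - 1]*X[i - 1]) for i in range(1, k))
term3 = 2*mu*e[k - 1]*X[k - 1]

assert np.allclose(V, term1 + term2 + term3)
```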
