A NEW ADAPTIVE ALGORITHM FOR THE REAL-TIME EQUALIZATION

OF ACOUSTIC FIELDS

BY

JEFFREY JAMES SPAULDING

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL

OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1992

Dedicated to

my Lord Jesus Christ

through Mary, Queen of Virgins,

in partial reparation of sins.

ACKNOWLEDGMENTS

I would like to express my gratitude to my advisor, Dr. Jose Principe, for his

encouragement and support through the course of my studies and research. I would also

like to express my gratitude to my supervisory committee members, Professor Taylor,

Professor Childers, Professor Green, and Professor Siebein.

My work would not have been possible without the help and support of the students

of the Computational Neuroengineering Laboratory. I am deeply appreciative. In

addition I would like to express my thanks to Dr. Richard Henry and Mr. Phil Brink of

the United States Air Force for their constant support.

I would like to thank my friends and family. It has been difficult not to have had the time to devote to the people I love. I appreciate their understanding. I especially wish to

thank my grandmother, Mrs. Ruth Moore, whose bequest made my studies possible.

Finally, I wish to thank my mother and father without whom I would never have had

the strength or courage to complete my program.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTERS

1 BACKGROUND

Introduction
Basic Physics of Room Acoustics
Psychoacoustics
Consequences of Room Acoustics and Psychoacoustics in the Solution of the Inverse Problem

2 AN EQUALIZATION STRATEGY FOR REAL-TIME SELF-ADJUSTMENT

Introduction
State-of-the-Art Equalizers
The Adkins-Principe Equalizer Architecture
Automatic Adjustment Using Adaptive Filters

3 LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE WITH UNCORRELATED INPUT

The Wiener-Hopf Solution
The Gradient Descent Algorithm
The LMS Algorithm with Uncorrelated Input

4 LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE WITH CORRELATED INPUT

Convergence of the LMS Algorithm with Correlated Input
Conditions on µ for Convergence
Conditions on µ for High Convergence Speed
Experimental Results
Conclusions

5 DETERMINATION OF λmax WITH A DIVERGENT LMS (DLMS) ALGORITHM

Determination of λmax with a Divergent Gradient Descent Algorithm
Parameter Selection for the Divergent Gradient Descent (DGD) Algorithm
Determination of λmax with a Divergent LMS Algorithm
Parameter Selection for the Divergent LMS Algorithm
Integration of the DLMS Algorithm into the Equalizer Architecture
Summary and Conclusions

6 VALIDATION OF CONCEPTS

Test Plan
Test Results

7 CONCLUSIONS AND RECOMMENDED EXTENSIONS OF RESEARCH

Review
Recommended Extensions to this Research

APPENDIX

ECVT Plots of Contiguous Epochs of Audio Data

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School

of the University of Florida in Partial Fulfillment of the Requirements

for the Degree of Doctor of Philosophy

A NEW ADAPTIVE ALGORITHM FOR THE REAL-TIME

EQUALIZATION OF ACOUSTIC FIELDS

By

Jeffrey James Spaulding

May 1992

Chairman: Dr. Jose C. Principe

Major Department: Electrical Engineering

This dissertation presents a solution for the problem of acoustic equalization.

Acoustic data are collected from a microphone at a listener location, and compared

with a source (CD player). An adaptive signal processing algorithm, based on the Least

Mean Squares (LMS) algorithm, warps the CD signal to account for the filtering effects of the

listening room. The algorithm to adapt the coefficients of a multirate equalizer is

computationally efficient and can be performed in real-time with current

microprocessor technology.

As music is nonstationary, the LMS algorithm will need to undergo rapid

convergence to a new set of optimal filter coefficients each time the input signal

statistics change. As the LMS is a gradient descent algorithm, fast convergence implies

a step size selection which operates at the algorithm's edge of stability. An analytic

expression exists to determine the step size for uncorrelated input data as a function of

the maximum eigenvalue of the input data. This dissertation extends this work to cover

the case of correlated signals such as music. This result provides verification of

heuristic rules that have been proposed in the literature.

To date, formulations involving the maximum input signal power have been utilized

as an estimate of the maximum eigenvalue. In this dissertation a new method of

determining the maximum eigenvalue is proposed by allowing the LMS algorithm to

diverge. For a large and consistent domain of initial conditions, iterations, and purposely

divergent step sizes, the maximum eigenvalue dominates the rate of algorithm

divergence. Simulations were pursued to determine the bounds of these variables for

robust operation. This methodology was utilized to analyze the performance of the

adaptive equalizer on selected music epochs, and to validate the theory put forward in

this dissertation.

CHAPTER 1

BACKGROUND

Introduction

For millennia architects have designed structures with the intent of minimizing distortion of

acoustic signals propagating from a speaker or a musical instrument to a listening audience. From

the Greek amphitheaters of antiquity to the modern concert halls of today, technologies developed over the centuries have resulted in sophisticated techniques that have greatly improved acoustic fidelity. Only in this century has technology been sufficiently advanced to approach the problem

of acoustic distortion from the perspective of equalization. Equalization anticipates the filtering

introduced by the enclosed space and pre-processes the acoustic signal to take these effects into

account. In this way when the acoustic signal arrives at the listener, the effects of the equalizer

filtering and the room induced filtering cancel each other.

The revolutionary development of the Compact Disk (CD) player, first introduced to the market

in 1982, has completely changed the direction of audio engineering. Research in audio engineering

is being directed towards creating all-digital audio systems. With the tremendous improvement in

microprocessor speed, audio signals can now be processed digitally in real-time. The research pre-

sented in this dissertation will provide a design for an equalizer which will be capable of real-time

self-adjustment. The design will be focused on providing a system to perform equalization in a

normal home listening room with standard audio components. It will be based on a new equalizer

architecture developed by Adkins and Principe at the University of Florida [1], which uses a multirate filter bank. The Adkins-Principe architecture is enhanced so that filter coefficients in each

band of the filter bank will be continuously updated in real-time, with coefficient updates being

provided using principles from adaptive signal processing.

In chapter two background information will be provided on the current state-of-the-art of acous-

tic equalization. Background information on room acoustics, psychoacoustics, and the state-of-the-

art of equalizers will lead to conclusions regarding the limitations of acoustic equalizers. An im-

proved strategy of adaptive equalization based on the Adkins-Principe equalizer architecture is pro-

posed. Chapter two will discuss in detail the theory and operational characteristics of the

Adkins-Principe architecture. It will be shown to be an elegant and extremely efficient implementation strategy. Chapter two will also discuss the relevant considerations in choosing an adaptive algorithm for real-time updating of filter coefficients. On the basis of

algorithmic complexity and the speed of the current generation of microprocessors the

Least Mean Squares (LMS) algorithm will be justified as an appropriate algorithm for this

application.

The LMS algorithm can be described as a stochastic gradient descent method which

converges due to its time-averaged behavior. For fast convergence speed and low misad-

justment in a nonstationary environment the algorithm must have an adaptive step size

which can be precisely controlled. When the statistics of the acoustic signal change dra-

matically, equalizer settings will be incorrectly set and the adaptation algorithm will need

to converge as quickly as possible to the new optimal equalizer settings in the least mean

squares sense. LMS step size is the key parameter to adjust in order to maximize conver-

gence speed. The theory guiding the optimal value of step size is well-understood for un-

correlated input data [2-3]. Chapter three will present the theory of LMS convergence for

the case of uncorrelated input data. The theoretical structure of the Wiener-Hopf formu-

lation, the stochastic gradient descent algorithm, and the LMS algorithm with uncorrelated

data will be discussed to provide the necessary framework for the more complicated case

of the LMS with correlated input data. Chapter four will extend the theory of the LMS al-

gorithm for the convergence properties of mean square error for the case of correlated input

data. This theory is necessary to understand adaptation performance for acoustic data

which by nature are correlated signals.
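The recursion described above can be illustrated with a small sketch. This is a generic NumPy implementation of the standard LMS update, not the dissertation's multirate equalizer code; the filter length, step size, and the 4-tap "room" response are invented for illustration.

```python
import numpy as np

def lms(x, d, n_taps=8, mu=0.01):
    """Standard LMS recursion: w(n+1) = w(n) + 2*mu*e(n)*u(n),
    where e(n) = d(n) - w(n)'u(n) and u(n) is the current input vector."""
    w = np.zeros(n_taps)
    err = np.zeros(len(x))
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]   # [x(n), x(n-1), ..., x(n-N+1)]
        e = d[n] - w @ u                    # instantaneous error
        w += 2.0 * mu * e * u               # stochastic gradient step
        err[n] = e
    return w, err

# Identify a hypothetical 4-tap "room" FIR response from white-noise input.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([0.6, -0.3, 0.2, 0.1])
d = np.convolve(x, h)[:len(x)]
w, err = lms(x, d, n_taps=8, mu=0.01)
```

For this stationary, noise-free example the weights settle onto the unknown response; a change in input statistics, as with music, would force a new convergence transient each time.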

The divergence properties of the stochastic descent algorithm and the LMS algorithm

are discussed in chapter five as a simple extension of the convergence properties examined

in chapter three. The results of this study indicate that the maximum eigenvalue dominates a diverging gradient descent algorithm. It will be further shown that the optimal value of

the step size, after a change in the statistics of the input data, can be well-approximated as

a function of the maximum eigenvalue of the vectors input into the filters performing adap-

tive equalization. The standard technique for estimating λmax is to assume that it is approximately equal to the power in the input vector times the filter order. From systems theory we know that this product is the sum of all eigenvalues, and thus is never less than λmax. This estimation is excellent under certain restricted conditions, e.g. low filter orders and high eigenvalue spreads. No such strong statements can be made more generally of this

approximation technique. A new method of approximating the maximum eigenvalue of a

system by allowing an adaptive gradient descent algorithm to diverge is introduced in chap-

ter five. The method is first developed for the stochastic gradient descent algorithm which

is based on the statistical properties of an ensemble of independent, identically distributed

systems. The gradient descent algorithm will provide a theoretical framework from which

to analyze the more complicated LMS algorithm which is based on the time-averaged prop-

erties of a single system.
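The role of the maximum eigenvalue in divergence can be illustrated numerically. The sketch below runs the deterministic weight-error recursion v ← (I − µR)v with a deliberately unstable step size; the per-iteration growth ratio g of ‖v‖ tends to µλmax − 1, so λmax ≈ (g + 1)/µ. The autocorrelation matrix and parameters are invented illustrations, not the DLMS parameter choices developed in chapter five.

```python
import numpy as np

def lambda_max_by_divergence(R, mu, n_iter=60):
    """Estimate the largest eigenvalue of R from the divergence of the
    weight-error recursion v <- (I - mu*R) v.

    With mu chosen so that mu*lambda_max > 2, the dominant mode grows in
    magnitude as (mu*lambda_max - 1)^k, so the growth ratio g of ||v||
    tends to mu*lambda_max - 1, and lambda_max ~ (g + 1)/mu."""
    rng = np.random.default_rng(1)
    v = rng.standard_normal(R.shape[0])
    A = np.eye(R.shape[0]) - mu * R
    g = 0.0
    for _ in range(n_iter):
        v_next = A @ v
        g = np.linalg.norm(v_next) / np.linalg.norm(v)
        v = v_next / np.linalg.norm(v_next)   # renormalize to avoid overflow
    return (g + 1.0) / mu

# Autocorrelation matrix of a strongly correlated input: Toeplitz 0.9^|i-j|.
R = np.array([[0.9 ** abs(i - j) for j in range(6)] for i in range(6)])
est = lambda_max_by_divergence(R, mu=1.0)
true_lmax = np.linalg.eigvalsh(R).max()
```

This deterministic version corresponds to the ensemble-averaged gradient descent framework; the LMS-based form must work from the time-averaged behavior of a single system, as chapter five develops.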

Chapter six discusses the experiments used to test the equalizer architecture and the al-

gorithms used for real-time self-adjustment. The NeXT computer with a digitizing micro-

phone is used to collect and process data. The software used to simulate real-time

equalization with the proposed algorithms is discussed. Finally the test and evaluation re-

sults are presented.

Chapter seven summarizes the results of research presented in the dissertation. Specif-

ically it reviews the success of the new architecture proposed for real-time self-adjustment

of an audio equalizer. It reviews the theory of the convergence and divergence properties

of the LMS algorithm with statistically correlated input data. It reviews the theory and per-

formance of a new variation of the LMS algorithm which provides an estimate of the max-

imum eigenvalue of the input vectors into the equalizer filters. Finally chapter seven

discusses the many areas of this research that could be extended.

Because the physics of room acoustics and the principles of psycho-acoustics motivate

many of the key design characteristics of the acoustic equalizer, the remainder of chapter

one will provide background information on the pertinent areas of these fields.

Basic Physics of Room Acoustics

Equalization is an inverse problem, i.e. the adaptive equalizer will seek a configuration

such that its response will be the inverse of the room response. This section will develop

the simplest features of room acoustics because of the basic understanding it will provide into the nature of the signals and systems on which the algorithms and methods introduced in this dissertation will be applied. As electrical engineers are more familiar with

electromagnetic radiation than acoustic, an effort will be made to develop this introduction

to the physics of room acoustics in an analogous manner to the standard method of intro-

ducing the theory of electromagnetic waveguides, with the most important similarities

pointed out where appropriate. A listening room may be thought of as an acoustic coupler,

or communications channel, between an acoustic source and a listener. A further limitation will be imposed: the room is assumed to be a minimum phase system. The description of

the acoustic channel is particularly complicated because of the importance of both its tem-

poral and spatial dependency. Because of the wave nature of acoustic fields, the sound en-

ergy in an enclosed space may be considered to be the mechanical analogy of

electromagnetic radiation in a rectangular waveguide.

A discussion of the basic equations and definitions is first presented. Specifically the

pressure equation will be discussed as the principal descriptor of acoustic phenomena. The

conservation of momentum equation will be shown to supply boundary conditions for the

wave equation. The reflection and transmission coefficients are then discussed. It is shown that many features from optics have analogous results in acoustics, e.g. Snell's

Law and the Law of Reflection. The prominent factor in determining the reflectivity of a

wall is the wall's characteristic impedance. The impedance, together with the room's geom-

etry, define the characteristics of the room. A brief discussion of acoustic impedance is

therefore presented. The wave equation for a room is developed in a method analogous to

methods used for electromagnetic waveguides. The application of this theory is discussed for a room which is a rectangular cavity with different types of impedance. Lastly the spatial and temporal response of the room is outlined using methods from ray theory. Much

of the discussion is based on introductory texts of Egan[4], Kinsler et al. [5] and Kuttruff

[6].

Basic Equations and Definitions

Sound waves are longitudinal waves of regions of greater and lesser pressure. An

acoustic field is described by a wave equation which expresses the deviation from ambient

pressure, p, as a function of spatial location r, and time t. The constant of proportionality

between the temporal and spatial derivatives is the speed of propagation of the medium, c.

∂²p/∂t² = c²∇²p (1-1)

For a time harmonic plane wave the solution of the pressure equation is of the same form

as the expression for a plane wave of electromagnetic radiation. By separation of variables

the pressure equation reduces to the following familiar form, where k is the wavevector.

p(r, t) = p0 e^{i(ωt − k·r)}  [kg/(m·s²)] (1-2)

The velocity of the vibrating particles, v, is related to the pressure field according to the

conservation of momentum. The density of the medium is represented by ρ.

∇p = −ρ ∂v/∂t (1-3)

By solving the above equations we can relate the pressure field to the velocity field by

an acoustic impedance, z = pc.

v(r, t) = v0 e^{i(ωt − k·r)} uk (1-4)

p(r, t) = ρc v(r, t) = z v(r, t) (1-5)

Sensory equipment, including the human auditory system, responds to the intensity of

the field. In the case of acoustic data, this corresponds to the average energy flowing across

a unit area normal to the direction of propagation per unit time.

I = ⟨p²⟩/(ρc)  [W/m²] (1-6)
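Equations 1-5 and 1-6 can be checked with a few lines of arithmetic. The air density and sound speed below are rough nominal values, used only for illustration.

```python
import math

rho0 = 1.21    # density of air at STP, kg/m^3 (approximate)
c0 = 343.0     # speed of sound in air, m/s (approximate)
z0 = rho0 * c0 # characteristic impedance of air z = rho*c, kg/(m^2*s)

def intensity(p_rms):
    """Time-averaged plane-wave intensity, eq 1-6: I = <p^2>/(rho*c), W/m^2."""
    return p_rms ** 2 / z0

def particle_velocity(p_rms):
    """Particle velocity from eq 1-5 (p = z*v): v = p/z, m/s."""
    return p_rms / z0

I_1pa = intensity(1.0)          # a 1 Pa RMS plane wave (roughly 94 dB SPL)
v_1pa = particle_velocity(1.0)
```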

In this discussion pressure waves will be considered to be plane waves in order to sim-

plify the mathematical treatment of the phenomena which will be discussed. Note that for

an acoustic point source the phase fronts would be spherical in an analogous manner to

electromagnetic point sources. The plane wave approximation is only valid when the radius

of curvature of the phase front is very large compared to the wavelength of the radiation.

We will assume in this discussion that the dimensions of the rooms being considered are

sufficiently large that this approximation is valid for frequencies over 1000 Hz. The room

under consideration will be a rectangular cavity. These highly idealized cases are studied

so that the important features of room responses are made clear. Before room responses

can be meaningfully discussed, the reflection of acoustic waves off a wall must be well-un-

derstood.

Reflection of a Plane Wave by a Wall

The electromagnetic equivalent of reflection off a wall is a plane electromagnetic wave

reflecting off an infinite flat plate of complex impedance. Like the electromagnetic equiv-

alent the media impedances determine the behavior of the wave for a given geometry. Con-

sider the situation depicted in figure 1.1. Let pi be the incident pressure field, let Pr be the

reflected field, and let pt be the transmitted field, with corresponding wavevectors ki, kr, and kt.

Since we are assuming plane waves we can drop the y-dependency from our equations without loss of generality. Recall that the magnitude of a wavevector K is defined as |K| = ω/c, where c is the velocity of propagation in a given medium, and ω is the radian frequency. Let c0 be the speed of propagation in air at standard temperature and pressure (STP) and let


figure 1.1 The Law of Reflection and Snell's Law are valid for acoustic

radiation as well as electromagnetic radiation. The figure

above indicates the coordinate system and nomenclature

which will be used throughout the discussion of room

acoustics.

n be the ratio of the speed of propagation in air at STP to that of the medium, i.e. n = c0/c. Let ux, uy and uz be the unit vectors of our coordinate system as depicted in figure 1.1. Then the incident, reflected, and transmitted wavevectors are expressed by equations 1-7 through 1-9, where |K| = nK0 and θ is the angle between the wall normal and the wavevector.

Ki = n1K0 [−ux cosθi + uz sinθi] (1-7)

Kr = n1K0 [ux cosθr + uz sinθr] (1-8)

Kt = n2K0 [−ux cosθt + uz sinθt] (1-9)

By matching the pressure field at the wall, the reflection and transmission coefficients can be calculated. The pressure waves propagating along the wavevectors given in equations 1-7 through 1-9 are shown in equations 1-10 through 1-12 below.

Pi = Pi,0 e^{−i n1K0 [−x cosθi + z sinθi]} (1-10)

Pr = Pr,0 e^{−i n1K0 [x cosθr + z sinθr]} (1-11)

Pt = Pt,0 e^{−i n2K0 [−x cosθt + z sinθt]} (1-12)

The pressure field must be continuous at all points along the boundary, i.e. Pi,interface + Pr,interface = Pt,interface. These equations yield the Law of Reflection and Snell's Law in precisely the same form as the electromagnetic equivalent.

n1K0 sinθi = n1K0 sinθr = n2K0 sinθt (1-13)

sinθi = sinθr    Law of Reflection (1-14)

n1 sinθi = n2 sinθt    Snell's Law (1-15)

In addition, the conservation of momentum (equation 1-3) must be satisfied at the wall. This will imply a matching of particle velocities at the interface. Note that because of the plane wave assumption only the derivative with respect to x is non-zero in ∇p. The velocity field propagating along the wavevectors given in equations 1-7 through 1-9 is described by equations 1-16 through 1-18, shown below.

Vi = (n1K0/(ωρ1)) cosθi Pi,0 e^{−i n1K0 [−x cosθi + z sinθi]} (1-16)

Vr = (−n1K0/(ωρ1)) cosθr Pr,0 e^{−i n1K0 [x cosθr + z sinθr]} (1-17)

Vt = (n2K0/(ωρ2)) cosθt Pt,0 e^{−i n2K0 [−x cosθt + z sinθt]} (1-18)

By a proper choice of the coordinate system, i.e. for the wall located at x=0, and by an

application of Snell's Law and the Law of Reflection, the phase related terms in V can be ignored, as they are common to Vi, Vr, and Vt. Matching the particle velocities at the interface yields the following expression for the reflection coefficient.

(Pi,0 cosθi − Pr,0 cosθr)/z1 = (Pt,0 cosθt)/z2 (1-19)

z2 Pi,0 cosθi − z2 Pr,0 cosθr = z1 (Pi,0 + Pr,0) cosθt (1-20)

Pr,0 (z1 cosθt + z2 cosθi) = Pi,0 (z2 cosθi − z1 cosθt) (1-21)

R(θ, ω) = Pr,0/Pi,0 = (z2 cosθi − z1 cosθt)/(z2 cosθi + z1 cosθt) (1-22)

If we assume medium one is air, i.e. z1 = ρ0c0, and medium two is a hard wall, i.e. z2 ≫ 1, the above equation reduces to the form seen in most elementary books on acoustics,

R = (z2 cosθi − ρ0c0)/(z2 cosθi + ρ0c0)

The transmission coefficients are easily derived from the reflection coefficients as follows.

T(θ, ω) = Pt/Pi = (Pi + Pr)/Pi = 1 + R = 2 z2 cosθi/(z2 cosθi + z1 cosθt) (1-23)

With the expressions for R and T it is clear that the impedances of the materials are the

key acoustic parameter of the material, and it is this parameter which determines a room's

acoustic properties.
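Equations 1-15, 1-22, and 1-23 can be sketched numerically. The impedance and index values below are invented for illustration; θt is obtained from Snell's law, and the sketch assumes incidence angles below total reflection.

```python
import math

def reflection_transmission(z1, z2, n1, n2, theta_i):
    """Pressure reflection and transmission coefficients at a flat wall
    (eqs 1-22 and 1-23), with theta_t from Snell's law (eq 1-15)."""
    theta_t = math.asin(n1 * math.sin(theta_i) / n2)  # assumes no total reflection
    ci, ct = math.cos(theta_i), math.cos(theta_t)
    R = (z2 * ci - z1 * ct) / (z2 * ci + z1 * ct)
    T = 2.0 * z2 * ci / (z2 * ci + z1 * ct)           # identical to 1 + R
    return R, T

# Air (z ~ 415 kg/(m^2*s)) meeting a much stiffer wall at normal incidence:
R, T = reflection_transmission(z1=415.0, z2=1.0e6, n1=1.0, n2=0.5, theta_i=0.0)
```

As the impedance mismatch grows, R approaches unity: a very hard wall reflects nearly all incident pressure, which is why wall impedance dominates a room's acoustic character.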

Acoustic Impedance

Acoustic impedance can be separated into a real and an imaginary part. The real part is

the acoustic resistance, the component associated with the dissipation of energy. The imag-

inary part is the acoustic reactance, the component associated with temporary storage of en-

ergy. These components are made explicit in the expression shown below.

z(θ, ω) = p/v = RA + jXA  [kg/(m²·s)] (1-24)

The acoustic reactance can be further factored into an inertance and a compliance term. The acoustic inertance is defined as the effective mass of the element under consideration divided by its surface area, M = m/S, with units of kg/m². The acoustic compliance is defined as the volume displacement induced by the application of a unit pressure, C = dV/dp, in units of m⁴·s²/kg.

in units of m4sec2/Kg. A mechanical interpretation of a locally reacting wall is shown in

figure 1.2.

z(θ, ω) = RA + j(ωM − 1/(ωC)) (1-25)

Let pressure correspond to voltage, velocity to current, resistance to resistance, compliance to capacitance, and inertance to inductance. Then an acoustic system can be replaced by an equivalent electrical system. It is precisely the behavior of these types of circuits that we wish to equalize.
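That equivalence can be made concrete by evaluating equation 1-25 as if it were a series RLC impedance. The RA, M, and C values below are invented for illustration.

```python
import math

def wall_impedance(omega, R_a, M, C):
    """Locally reacting wall impedance, eq 1-25: z = R_a + j(omega*M - 1/(omega*C))."""
    return complex(R_a, omega * M - 1.0 / (omega * C))

# The reactance vanishes at omega = 1/sqrt(M*C), exactly as in a series RLC
# circuit with inductance <-> inertance; only the resistive part remains.
R_a, M, C = 400.0, 0.5, 2.0e-6        # invented illustrative values
omega_res = 1.0 / math.sqrt(M * C)    # 1000 rad/s for these values
z_res = wall_impedance(omega_res, R_a, M, C)
```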

figure 1.2 The mechanical analogy of room impedance is illustrated in the figure by a spring for the reactive term and a damper for the dissipative term.


figure 1.3 A room may be modelled by its electrical circuit equivalent

as shown above.

General Theory of Room Responses

The solution to the wave equation in an enclosed space is a familiar eigenvalue problem. The equation reduces to a Helmholtz equation with boundary conditions as shown below. The homogenous part of the Helmholtz equation provides a complete set of orthogonal eigenfunctions and associated eigenvalues, ψn(r) and λn.

∇²P + K²P = −iωρ q(r) (1-26)

Pi,interface + Pr,interface = Pt,interface (1-27)

∇p = −ρ ∂v/∂t = −(iωρ/z) P (1-28)

As a result our source distribution, q(r), can be expanded as a sum of eigenfunctions. V is the enclosed volume of the room.

q(r) = Σn Cn ψn(r) (1-29)

Cn = ∫V q(r) ψn(r) dV (1-30)

The unknown solution to the inhomogenous Helmholtz equation can also be expressed as a summation of eigenfunctions.

Pω(r) = Σn Dn ψn(r) (1-31)

Substitute the summations for Pω(r) and q(r) into the inhomogenous Helmholtz equation and solve for the unknown {Dn} in terms of the known {Cn} as shown below.

Σn Dn [∇²ψn(r) + K²ψn(r)] = −iωρ Σn Cn ψn(r) (1-32)

The equation has a simple solution for a point source located at r0, i.e. q(r) = Q δ(r − r0).

Pω(r) = iωρQ Σn ψn(r) ψn(r0) / (K² − Kn²) (1-33)

Note that Kn is the wavevector associated with λn. Pω(r) is recognized as the Green's function for the room. The Green's function expresses the acoustic transfer function between two points in the room. Note from the Green's function that acoustic fields satisfy

the Reciprocity theorem just as in the electromagnetic analogy. The Green's function can

be extended to a source, S, emitting a continuum of frequencies as shown below.

S(t) = ∫_{−∞}^{∞} Q(ω) e^{iωt} dω (1-34)

P(r, t) = ∫_{−∞}^{∞} Pω(r) e^{iωt} dω (1-35)

Equations 1-33 through 1-35 provide a complete wave theoretical description. A discussion of the implications of this formulation for a rectangular room is now in order.

Simple Rectangular Rooms

This section will apply the general wave theory to an empty rectangular cavity charac-

terized by a single impedance. While this example does not represent the complexities of

actual rooms, it nevertheless provides insight into the features of room responses with a

minimum of mathematical complexity. The connection between the reflection properties

of walls and the general wave theory will be made explicit through wall impedances. Three

classes of impedances will be considered. Consider the room depicted below.

The wave equation for this geometry is most easily applied in the rectangular coordinate system. The room eigenvalues are found by solving the homogenous Helmholtz equation shown below.

∂²ψ/∂x² + ∂²ψ/∂y² + ∂²ψ/∂z² + K²ψ = 0 (1-36)

The equation is easily solved by the technique of separation of variables. Let ψ(x, y, z) = ψx(x) ψy(y) ψz(z). Substitution of this expression into the wave equation

figure 1.4 A room with dimensions Lx, Ly, and Lz and walls with uniform impedance will be evaluated using the wave theoretic formulation.

leads to three ordinary differential equations plus the separation equation as shown below.

d²ψx/dx² + Kx²ψx = 0 (1-37)

d²ψy/dy² + Ky²ψy = 0 (1-38)

d²ψz/dz² + Kz²ψz = 0 (1-39)

K² = Kx² + Ky² + Kz² (1-40)

The solution of these equations is trivial.

ψx(x) = A1 e^{−iKx x} + A2 e^{iKx x} (1-41)

ψy(y) = B1 e^{−iKy y} + B2 e^{iKy y} (1-42)

ψz(z) = C1 e^{−iKz z} + C2 e^{iKz z} (1-43)

The constants A1, A2, B1, B2, C1, and C2 are determined by matching boundary conditions. The solutions for three specific examples are outlined below.

Case i. Re[z] = 0 and Im[z] very large.

The method of solution of the three ordinary differential equations is identical. Consider the equation in x. As a boundary condition, guarantee that the conservation of momentum is satisfied at the walls (x = 0 and x = Lx).

∇p = −ρ ∂v/∂t  ⇒  dP/dx = −(iωρ/z) P (1-44)

As lim |z| → ∞, dP/dx → 0. The boundary conditions are satisfied as follows.

dψx/dx |x=0 = −iKx A1 + iKx A2 = 0  ⇒  A2 = A1 (1-45)

dψx/dx |x=Lx = iKx A1 (e^{iKx Lx} − e^{−iKx Lx}) = −2A1 Kx sin(Kx Lx) = 0  ⇒  Kx = απ/Lx (1-46)

α is any integer. The two boundary conditions have been used to solve for A2 and Kx. Similar solutions are found for B2, C2, Ky, and Kz. Substituting these results in the expression for ψ(x, y, z) and the separation equation leads to a solution for the room eigenmodes and eigenfrequencies as shown below.

ψ(x, y, z) = D cos(απx/Lx) cos(βπy/Ly) cos(γπz/Lz) (1-47)

f(α, β, γ) = cK/2π = (c/2) √((α/Lx)² + (β/Ly)² + (γ/Lz)²) (1-48)

y z

The constant D = A1B1C1 is determined from an initial condition.
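Equation 1-48 is easy to evaluate directly. The room dimensions below are an invented example, not one of the dissertation's test rooms.

```python
import math
from itertools import product

def mode_frequency(a, b, g, Lx, Ly, Lz, c=343.0):
    """Eigenfrequency of mode (alpha, beta, gamma) in a hard-walled rectangular
    room, eq 1-48: f = (c/2) * sqrt((a/Lx)^2 + (b/Ly)^2 + (g/Lz)^2)."""
    return (c / 2.0) * math.sqrt((a / Lx) ** 2 + (b / Ly) ** 2 + (g / Lz) ** 2)

Lx, Ly, Lz = 5.0, 4.0, 3.0                      # an illustrative room, metres
f_axial = mode_frequency(1, 0, 0, Lx, Ly, Lz)   # lowest axial mode, c/(2*Lx)

# Every eigenfrequency below 100 Hz (mode indices up to 4 suffice here):
low_modes = sorted(
    f for f in (
        mode_frequency(a, b, g, Lx, Ly, Lz)
        for a, b, g in product(range(5), repeat=3)
        if (a, b, g) != (0, 0, 0)
    )
    if f < 100.0
)
```

Listing the modes this way also shows why low frequencies are the hard part of equalization: the modes are sparse and widely spaced at the bottom of the band, while their count grows rapidly with frequency.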

Case ii. Re[z] = 0.

Satisfy the new boundary conditions at x = 0 and x = Lx. A solution for A2 and Kx can be found by solving the two transcendental equations below or by using methods from the theory of conformal mapping.

A1 (Kρc/X + iKx) + A2 (Kρc/X − iKx) = 0 (1-49)

A1 (Kρc/X + iKx) e^{−iKx Lx} + A2 (Kρc/X − iKx) e^{iKx Lx} = 0 (1-50)

If X < 0, the corresponding eigenfrequencies are lower than for the hard wall case (Im[z] ≫ 1). For X > 0 the corresponding eigenfrequencies are higher than for the hard wall case. Because the impedance is completely reactive, the room will continue to support standing waves, but with phase terms included in the sinusoidal terms of the eigenmodes.

Case iii. Arbitrary z.

The form of the boundary conditions is the same as case ii with X replaced by z. Eigenmodes cannot support standing waves because of the lossy component of z, represented by e^{−ε(x, y, z)}.

ψ(x, y, z) = D e^{−ε(x, y, z)} cos(Kx x + φx) cos(Ky y + φy) cos(Kz z + φz) (1-51)

It is clear from matching boundary conditions in the three cases outlined above that the

room impedance is key to understanding the behavior of the room. By using the eigenmode

expansion technique and the inhomogenous Helmholtz equation the pressure field can be

determined for any location in the room. An initial condition is also required in order to

evaluate D. The eigenfrequencies indicate those frequencies at which the room has reso-

nant properties. Note that the number of eigenfrequencies increases as the cube of the upper

frequency limit of the source. The wave theoretic technique provides a complete descrip-

tion of the acoustic signal in an enclosed space, and provides a solid theoretical foundation

for the field of acoustics.

Temporal Properties of a Listening Room

Each reflection of an acoustic wave must be described by its delay in arrival from the

direct acoustic wave at the listener location, the direction from which the reflected wave ar-

rives at the listener location, and the strength of the reflection. In this section we will dis-

cuss the time delay and strength of reflected acoustic energy. The rate at which reflected

acoustic energy arrives at a listener location is an important factor in evaluating the quality

of a listening room. Ray theory provides a simple and relatively accurate method of determining the room's temporal properties. This theory is completely analogous to ray theory

from geometric optics, and it will therefore be applied with a minimum of discussion. Like

geometric optics, geometric acoustics ignores interference and refraction effects. Never-

theless it is accurate for rooms of ordinary size for acoustic frequency content in excess of

1000 Hz.

If a room is constructed of a series of planes, image sources can be found by succes-

sively reflecting the original source and each subsequent image source about the planes in

a lattice constructed by using the room as a unit cell. The lattice and a cross section plane

are shown in figure 1.5. A spherical shell is chosen with inner radius at distance ct and outer

radius c(t+dt), where dt is much less than t. The number of sources in the spherical shell

corresponds to the number of mirror reflections between t and t+dt. The volume of the

spherical shell, assuming dt << t, is 4πc³t²dt, and there is one image source per room. The number of image sources can be found by dividing the volume of the shell by the volume of the room. As t increases the number of mirror sources increases as t².

Number of Reflections = 4πc³t²dt / V   (1-52)
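The growth predicted by equation (1-52) can be checked numerically. In the sketch below, the room dimensions, speed of sound, and time window are illustrative assumptions; it counts lattice image sources whose distance from the origin falls inside the shell [ct, c(t+dt)] and compares the count with the analytic estimate.

```python
import math

# Illustrative room dimensions (m) and speed of sound (m/s); assumed values.
Lx, Ly, Lz, c = 5.0, 4.0, 3.1, 343.0
V = Lx * Ly * Lz

def reflections_in_shell(t, dt):
    """Predicted number of image sources in the shell [ct, c(t+dt)] (eq. 1-52)."""
    return 4.0 * math.pi * c**3 * t**2 * dt / V

def count_image_sources(t, dt):
    """Count lattice cells (one image source per room) centered inside the shell."""
    r_in, r_out = c * t, c * (t + dt)
    n_max = int(r_out / min(Lx, Ly, Lz)) + 1
    count = 0
    for i in range(-n_max, n_max + 1):
        for j in range(-n_max, n_max + 1):
            for k in range(-n_max, n_max + 1):
                r = math.sqrt((i * Lx)**2 + (j * Ly)**2 + (k * Lz)**2)
                if r_in <= r < r_out:
                    count += 1
    return count

t, dt = 0.1, 0.005                  # 100 ms after the direct sound (assumed)
print(reflections_in_shell(t, dt))  # analytic estimate
print(count_image_sources(t, dt))   # lattice count, same order of magnitude
```

The lattice count fluctuates around the analytic value because the shell is thin relative to the cell size, but the t² growth is clearly visible.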

As time increases, the reflections have less and less energy associated with them. The higher order reflections lose energy at each reflecting plane, the amount of which depends on the characteristic impedance of the wall. An approximation of how the energy decreases can be made from geometric acoustics by multiplying the initial energy of the direct ray inside the listening room by |R|² each time the ray path from an image intersects the room walls or an image room wall.


figure 1.5 Geometric acoustics can be used to explain the temporal

properties of reflected energy in a room. Image sources can

be formed by successively reflecting the original source and

each subsequent image source about the planes that are

constructed by using the room as a unit cell.

Figure 1.6 demonstrates the concept of an echogram, a graphic method of describing the temporal response of the room. The height of each line represents the intensity of a reflection.


The temporal properties of an audio field which generally indicate a pleasing listening room will be examined during the discussion of psychoacoustics.

figure 1.6 An echogram indicates that as the time delay from the direct field increases, the number of reflected waves increases and their intensity decreases. (Horizontal axis: time; the degree of shading represents the number of reflections per unit time.)

The rate of decay in the intensity field is an important factor in creating a good quality listening room. Recall that the intensity of a wave decreases as a function of the square of the distance travelled, (ct)². The intensity of the field will also be decreased due to the attenuating influence of air (or any medium through which the wave traverses). Let a be the e-folding time constant of this attenuation. Each time the acoustic wave reflects off of a wall, the intensity will decrease by the square of the wall reflectance. The decay of the intensity of the acoustic wave is summarized in the equation below, where I₀ is the intensity at t = 0, and n is the number of reflections of the ray per second.

I(t) = (I₀/(ct)²) e^{−act} |R|^{2nt} = (I₀/(ct)²) e^{−act + nt ln|R|²}   (1-53)

It is not possible to follow the path of each ray and perform the above calculation. In-

stead the calculation is made by computing the average number of reflections of a ray (n),

assuming the acoustic field is diffuse, i.e. the rays are uniformly distributed over all possible

directions of propagation. The assumption of a completely diffuse wave is never achieved

in an ordinary listening room, and the attenuation of the reverberant field will have large or

small deviations from the simple calculations being outlined depending on the amount by

which the actual acoustic field differs from our assumption of diffuseness.

First express the number of reflections per second which occur at walls perpendicular to the x-axis for a ray propagating at an angle θ with respect to the x-axis, n_x(θ).

n_x(θ) = (c/L_x) cos θ   (1-54)

Average over all angles θ.

⟨n_x⟩ = c / (2L_x)   (1-55)

Similar results are obtained for ⟨n_y⟩ and ⟨n_z⟩. The average number of reflections along all three axes is expressed below, where S is the surface area of the room and V is the room volume.

n̄ = ⟨n_x⟩ + ⟨n_y⟩ + ⟨n_z⟩ = (c/2)(1/L_x + 1/L_y + 1/L_z) = cS/4V   (1-56)

Substituting the above into the intensity equation yields the following.

I(t) = (I₀/(ct)²) e^{−act + (cS/4V) t ln|R|²}   (1-57)

The reverberation time is taken as that time at which I(t)/I₀ = 1×10⁻⁶, and the expression for this time is known as Sabine's law.

T = 0.16V / (4aV − S ln|R|²)   (1-58)
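As a numerical illustration of equation (1-58), the sketch below evaluates the reverberation time for an assumed room; the dimensions, wall reflectance, and neglect of air attenuation are illustrative choices, not values taken from the text.

```python
import math

def reverberation_time(V, S, R, a=0.0):
    """Reverberation time (eq. 1-58): T = 0.16 V / (4 a V - S ln|R|^2).

    V: room volume [m^3], S: surface area [m^2],
    R: wall reflectance magnitude, a: air attenuation constant.
    """
    return 0.16 * V / (4 * a * V - S * math.log(R**2))

# Illustrative 5 x 4 x 3.1 m room with |R| = 0.9 walls, air loss neglected.
Lx, Ly, Lz = 5.0, 4.0, 3.1
V = Lx * Ly * Lz
S = 2 * (Lx * Ly + Ly * Lz + Lx * Lz)
print(reverberation_time(V, S, R=0.9))  # roughly half a second
```

Making the walls more reflective (|R| closer to 1) drives the denominator toward zero and the reverberation time up, as expected.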

Spatial Properties of a Listening Room

The spatial distribution of acoustic energy is a key factor in constructing a listening

room. A diffuse field can be defined as a field in which the intensity in a spherical shell

subtended by a solid angle, dΩ, is independent of the direction in which the solid angle is pointed. For the case of a room which is a rectangular cavity with hard walls, no matter how small dΩ, there exists a time at which the field is diffuse. Of course there are no such walls, and clearly, for a rectangular cavity, the field will not be diffuse. For example, images located in directions along the corners of the room experience more attenuating reflections than the other sources.

Fortunately walls are not perfectly smooth and they partially diffuse the acoustic energy

at each reflection. Consider the following simple model. Assume at each reflection the per-

centage of energy reflected specularly is s. Then 1-s represents the portion of energy which

is reradiated by the wall diffusely. This is, of course, an overly simplistic model. All inci-

dent energy is reradiated by the wall according to a directional distribution function. Nev-

ertheless this model does demonstrate approximately the diffusion process. In the steady

state the power present in the total reflected field and in the field reflected specularly can be

related to the reflection coefficient, R, and to s.

P ∝ Σ_{n=1}^{∞} |R|^{2n}   (1-59)

P_s ∝ Σ_{n=1}^{∞} s^n |R|^{2n}   (1-60)

The percentage of power which has been diffused increases as s decreases, as shown below.

d = (P − P_s)/P = [ Σ_{n=1}^{∞} (1 − s^n) |R|^{2n} ] / [ Σ_{n=1}^{∞} |R|^{2n} ]   (1-61)
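Both sums in equation (1-61) are geometric series, so the diffuse fraction has a closed form, d = 1 − s(1 − |R|²)/(1 − s|R|²). The sketch below, with illustrative values of s and R, checks the closed form against a truncated summation.

```python
def diffuse_fraction(s, R, terms=10000):
    """Diffuse fraction d (eq. 1-61) by direct summation of the two series."""
    total = sum(R**(2 * n) for n in range(1, terms + 1))
    specular = sum((s**n) * R**(2 * n) for n in range(1, terms + 1))
    return (total - specular) / total

def diffuse_fraction_closed(s, R):
    """Closed form of eq. 1-61 obtained from the geometric-series sums."""
    return 1 - s * (1 - R**2) / (1 - s * R**2)

print(diffuse_fraction(0.7, 0.9))
print(diffuse_fraction_closed(0.7, 0.9))  # agrees with the summation
```

The closed form makes the limiting cases obvious: d = 1 when s = 0 (all energy diffused) and d = 0 when s = 1 (purely specular walls).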

figure 1.7 As the reflection constant increases and the specular reflection constant decreases, the percentage of diffuse energy in the room increases.

Even more important than wall roughness are the diffusing effects of room furniture and

the deviations of most rooms from a rectangular cavity. Even the presence of people in a

room acts to diffuse the acoustic field. Furthermore, in the discussion of psychoacoustics it will be shown that a pleasant listening environment can be created with far less than a completely diffuse field.


Psychoacoustics

An equalizer will not be capable of exactly reproducing the time varying pressure field

recorded in a concert hall or recording studio. The engineering compromises which must

be made in designing the equalizer must focus on restoring the most psycho-acoustically

significant features of the original signal. The design methodology will be such that impor-

tance will be placed on maintaining only those features of the waveform which a listener

will perceive in an average home listening room. This brief introduction to psycho-acous-

tics is intended to motivate the signal processing objectives of the adaptive equalizer. Spe-

cifically, the level to which signal coloration can be accepted without a loss of listener

enjoyment is discussed. This is largely determined by the frequency response and frequen-

cy resolution of our hearing, and of the masking properties of the signal itself. A descrip-

tion of how the room-induced temporal and spatial characteristics of the signal contribute

to a pleasing sound is investigated. Finally the dynamic range of human hearing is de-

scribed. Stereophony is not discussed as this research does not attempt to process signals

to alter their stereophonic properties.

Coloration

In the time domain a system neither adds coloration to nor removes coloration from a signal if its impulse response can be characterized as shown below.

h(t) = kδ(t − T)   (1-62)

The function implies no change in the shape of the waveform. The output is a delayed and amplified (or attenuated) replica of the input. Performing a Fourier transform of the impulse response function gives the frequency domain restrictions for such a system.

H(ω) = k e^{−jωT}   (1-63)

The frequency domain requirements for an ideal room are the following:

i. The system must have a constant magnitude response, |H(ω)| = k.

ii. The phase response must be linearly proportional to frequency, φ(ω) = −ωT.

Violation of i. can lead to an unintended emphasis of some frequencies over others in ways not intended by a composer or performer. Violation of ii. gives rise to non-linear phase. Non-linear phase causes certain frequencies to pass through the system faster than others.
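Requirements i. and ii. can be verified numerically for a delay-and-scale system. The discrete-time sketch below uses illustrative values of k and T and checks that the transform of h[n] = kδ[n − T] has constant magnitude and linear phase at every frequency sampled.

```python
import cmath
import math

# Discrete-time sketch of an ideal (coloration-free) room: h[n] = k*delta[n-T].
# k and T are illustrative values.
k, T, N = 0.5, 8, 64
h = [0.0] * N
h[T] = k

def dtft(h, omega):
    """Discrete-time Fourier transform of the sequence h at frequency omega."""
    return sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(h))

for m in range(N):
    omega = 2 * math.pi * m / N
    H = dtft(h, omega)
    # Requirement i: constant magnitude |H(omega)| = k.
    assert abs(abs(H) - k) < 1e-9
    # Requirement ii: linear phase, i.e. H(omega) = k * exp(-j*omega*T).
    assert abs(H - k * cmath.exp(-1j * omega * T)) < 1e-9
print("delay-and-scale system has flat magnitude and linear phase")
```

Any deviation of h from a single scaled, delayed impulse would break one of the two assertions, which is precisely what coloration means in this framework.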

To accurately reproduce the sensation produced by the original acoustic waveform it is

essential that the sound reproduction system have a flat magnitude response. Obviously

some degradation from this standard will not alter the perception of the listener. Neverthe-

less, the fundamental difference between middle C and any other note is that the energy of

the signal is concentrated at 262 Hz and its harmonics. An attenuation of the frequency re-

sponse at middle C, if other unmasked information is present in the signal, will negatively

impact the fidelity of the signal because of the lessening of emphasis on the C-like quality

of the sound. More precisely, we are interested in maintaining the pitch and timbre of the

signal. Pitch of a musical signal is primarily determined by frequency, but is also influ-

enced by the shape of the waveform and the loudness of the signal. The frequency depen-

dency is depicted in figure 1.8, where pitch in mels is plotted against frequency, for a signal

with a loudness level of 60 phons. Note that a change in frequency has a greater effect on

pitch above 1000 Hz than below.

The loudness dependency of pitch can be quite dramatic for pure sinusoidal tones. For

music signals the loudness dependency is much less significant. As an example, to main-

tain the same pitch when increasing the loudness of a musical signal from 60 to 100 phons

requires at most a frequency shift of 2%.

If a signal is not a pure tone, but consists of harmonic tones, due to the nonlinear pro-

cessing of the ear, sum and difference signals are generated. The difference signals deter-

mine the pitch perceived by the listener. For example a signal consisting of the first four

harmonics of a 100 Hz tone, all of approximately the same intensity as the fundamental, is

perceived as a signal of approximately 160 mels. If the fundamental is removed, the difference signal is unaltered, and the listener still perceives a 160 mel pitch. These effects


figure 1.8 Pitch is plotted as a function of frequency for a signal with a

loudness of 60 phons [6].

are not quite so dramatic in music signals as the harmonics are usually considerably less

intense than the fundamental.
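On the logarithmic frequency axis of figure 1.8, an equal relative change in frequency produces a larger pitch change above 1000 Hz than below. This can be illustrated with a common analytic approximation to the mel scale; the 2595·log₁₀(1 + f/700) formula is a standard textbook approximation, not something taken from the text, and the measured curve in figure 1.8 may differ from it.

```python
import math

def hz_to_mel(f):
    """A common analytic approximation to the mel scale; the curve in
    figure 1.8 was measured psychoacoustically and may differ."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

semitone = 2.0 ** (1.0 / 12.0)
# The same relative frequency change yields a bigger pitch change at 2 kHz
# than at 200 Hz, consistent with the log-frequency plot of figure 1.8.
print(hz_to_mel(2000 * semitone) - hz_to_mel(2000))
print(hz_to_mel(200 * semitone) - hz_to_mel(200))
print(hz_to_mel(100))  # close to the ~160 mels cited for a 100 Hz complex
```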

Assume that a series of non-harmonically related frequencies do not mask each other.

If we wish to maintain pitch, the relative loudness of the frequencies must be unaltered by

the electronic equipment or the listening room. For a pure sinusoid it is possible to hear changes in loudness of only 0.5 phons [6]. Note also that the perceived loudness is a function not only of the acoustic intensity of the signal, but also the signal frequency. Figure 1.9 illustrates this effect for two equal loudness contours. For frequencies over 1000 Hz, as frequency increases, the acoustic intensity must increase to maintain the same loudness level.

Surprisingly, controlling the phase response has no appreciable effect on maintaining

pitch. The phase response, except in extreme cases, cannot be perceived by listeners hear-

ing music signals. Thus while a lack of phase distortion is required for the shape of the

wave to be unaltered by a system, the same requirement is not generally necessary for a sig-

nal to be psycho-acoustically equivalent. The results of Suzuki, Morita, and Shindo [7] indicate that the effects of nonlinear phase on the quality of transient sound are much smaller than one would expect from the degree of change in the signal waveform.

[plot: equal loudness contours at loudness levels of 0 dB and 60 dB, versus frequency in Hz]

figure 1.9 Equal loudness contours are presented as a function of

frequency and acoustic intensity [5].

Hansen and Madsen [8] found similar results. In their experiment a signal had all of its frequency components shifted by 90° in phase, producing the most severe distortion possible in the shape of the waveform while maintaining the magnitude response. Only for

fundamental frequencies below 1000 Hz was a distinctive difference perceived by the lis-

teners involved in the test. The relative insensitivity to phase will be seen to be critical to

the operation of an adaptive equalizer.

Maintaining signal timbre is also a basic requirement in a good sound system. Those features of the signal which are important to the timbre of sound are poorly understood, and the study of timbre is still an area of active research. The transient behavior of an instrument being excited affects the initial behavior of the pressure field. The transients affect the amount of energy present in each of the overtones. This is an important factor in determining the unique sound of a particular instrument. Amazingly, our hearing system is able to pick out these cues even when many instruments are orchestrated together.

Perhaps the most problematic factor for which to account is the level of masking caused

by certain frequency components of a waveform on others. Masking is the degree to which the threshold of audibility of a signal is raised for non-harmonically related frequency components in the same signal, or in a completely separate waveform. Masking occurs when

one signal has frequency components which are close to the frequency content in another

signal. It has been clearly demonstrated that lower frequency components more effectively

mask higher frequency components than vice versa. These effects can be explained by

physiological phenomena. Lower frequencies produce harmonics which stimulate the re-

gion of the basilar membrane sensitive to higher frequencies, thereby masking the sound of

higher frequencies. Figure 1.10 shows the masking effect of a 400 Hz sinusoid at 60 dB.

The threshold increases most dramatically in the region slightly above 400 Hz.

figure 1.10 The masking effects of a 400 Hz signal are demonstrated above. The region of the basilar membrane which responds to the 400 Hz waveform will raise the audibility threshold of higher frequencies which stimulate the same region of the basilar membrane [5]. (Horizontal axis: masked frequency [Hz].)

Determining exact design criteria to guarantee a true reproduction of pitch and timbre

is made difficult by the complicated dependency on signal frequency content, loudness,

masking, and the transient behavior of the energy distribution in the harmonics of instru-

ments. For example, if most of the time signals above a certain frequency are masked by

lower frequency signals, it would not be important to control the attenuation of these re-

gions of the spectrum. The simplest analysis assumes the worst case scenario in which

masking does not occur, regions of the spectra are of a loudness just at the audibility limit,

and frequencies must be controlled at a level at which any error in loudness would be below

the 0.5 phon detection limit. To maintain timbre we wish to be able to control transients in

the signal caused by the excitation of instruments. Fortunately the phase of the signal will

not have to be carefully preserved because of the relative insensitivity of our hearing to

these effects.

Practical implementation of these requirements is possible only if the frequency reso-

lution of hearing is sufficiently large that filters of reasonable order can carry out the nec-

essary inverse filtering. Frequency resolution for two sinusoids is known as the difference

limen. It has a relatively constant value of 3 Hz below 500 Hz. Above 500 Hz frequency

resolution decreases approximately as a linear function of frequency (see figure 1.11).

Δf = 3 Hz       for f < 500 Hz
Δf = 0.003f     for f > 500 Hz   (1-64)

Clearly it will be necessary to control pitch and timbre with a much finer degree of resolu-

tion at low frequencies than at high.
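Equation (1-64) translates directly into a small helper; this is a sketch of the text's piecewise approximation, nothing more.

```python
def difference_limen(f):
    """Frequency resolution of hearing (eq. 1-64): the smallest detectable
    difference between two sinusoids at frequency f [Hz]."""
    return 3.0 if f < 500.0 else 0.003 * f

# Resolution in Hz is far finer at low frequencies than at high ones.
print(difference_limen(100))    # 3 Hz
print(difference_limen(10000))  # 30 Hz
```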

The human hearing organ can be described as a parallel filter bank with characteristic

bandwidths. Fletcher and Munson [9] found that the detection threshold of a tone in broad-

band noise is not affected by the noise bandwidth until it falls beneath the "critical bandwidth"; critical bandwidths are shown in figure 1.12.

Psychoacoustic Responses to the Temporal Aspects of Reflected Energy

Reflections under certain circumstances serve to enhance the quality of sound by adding

to the direct wave in a way which is reinforcing. For musical signals the reverberation of

the room when within certain bounds gives a pleasing effect. Under other circumstances

figure 1.11

8 1 1.2 1.4 1.6 1.8 2

frequency [Hz] x104

The difference limen gives the frequency resolution of two

sinusoidals. An acoustic equalizer will need to control the

frequency response of low frequency bands with a finer

resolution than high frequency bands [5].

reflections can cause highly undesirable effects. Although many effects can be explained

partially through the physiology of hearing, perhaps the most important factors, at least in

the reproduction of music, are the types of environments with which we have been habitu-

ated and from which our aesthetic senses have been influenced.

The discussion of the temporal effects of sound fields begins by finishing our discussion

of coloration. Strong, evenly-distributed echoes can cause serious degradation in the accu-

rate reproduction of an acoustic wave's spectral distribution. Consider a room with the fol-

lowing simple impulse response.

h(t) = Σ_{n=0}^{∞} aⁿ δ(t − nT₀)   (1-65)

figure 1.12 Because our hearing can be described as a parallel filter bank, acoustic equalizers are designed to simulate the critical bandwidths. High quality equalizers are designed to have 1/3 octave bands because of their similarity to critical bandwidths, as seen above. (Horizontal axis: frequency [Hz].)
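The 1/3-octave band centers mentioned in figure 1.12 follow fₙ = 1000·2^(n/3) Hz; the 1 kHz reference and base-2 spacing used below are the common engineering convention, assumed here rather than stated in the text.

```python
# Center frequencies of 1/3-octave bands spanning the audio range,
# using the common 1 kHz reference: f_n = 1000 * 2**(n/3).
centers = [1000.0 * 2.0 ** (n / 3.0) for n in range(-17, 14)]
print(round(centers[0], 1))    # lowest band center, ~19.7 Hz
print(round(centers[-1], 1))   # highest band center, ~20158.7 Hz
print(len(centers))            # 31 bands cover 20 Hz - 20 kHz
```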

The power spectral density indicates equally spaced resonances.

|H(ω)|² = 1 / (1 − 2a cos ωT₀ + a²)   (1-66)

Figure 1.13 illustrates the frequency response of the room for different echo periods, T₀. As T₀ becomes larger the frequency components become more closely spaced. Because of masking effects, the threshold at which the spectral distribution of the audio signal is perceived to change decreases as T₀ increases.
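A quick numerical check of equation (1-66), with illustrative values for the echo gain a and period T₀: the power spectrum peaks at multiples of 2π/T₀ and dips halfway between, with a peak-to-dip ratio of ((1+a)/(1−a))².

```python
import math

def power_response(omega, a, T0):
    """Power spectral density of the echo train (eq. 1-66)."""
    return 1.0 / (1.0 - 2.0 * a * math.cos(omega * T0) + a * a)

a, T0 = 0.8, 0.01                                # illustrative echo strength and period [s]
peak = power_response(2 * math.pi / T0, a, T0)   # at a resonance
dip = power_response(math.pi / T0, a, T0)        # halfway between resonances
print(peak, dip)  # resonances are (1+a)^2/(1-a)^2 times stronger than dips
```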

Figure 1.14 gives results for the threshold level at which an average listener perceives

noticeable coloration induced by a reflected wave. These measurements are based on lis-

tening tests of six different music selections.

figure 1.13 As the echo period, T₀, increases, the frequency components are more closely separated. Due to masking, coloration becomes less audible. (Horizontal axis: frequency in units of 1/T₀ Hz.)

In addition to our sensitivity to the frequency domain, our hearing is affected by time

domain properties of reflected waves. Reflected energy which is just barely perceived acts

to increase the loudness of the signal and to change the apparent size of the source. At a

higher loudness and a small delay, the perceived source direction moves. For a large delay

the reflected energy is perceived as a highly annoying echo. Figure 1.15 shows the intensity

at which a distinct echo is perceived as a function of time delay.

If the delay is sufficiently short, the reflected wave can be 10 dB higher than the direct wave without causing echoing. This phenomenon is known as the Haas effect after its discoverer, and is an important factor in the design of auditoria [10].

Acoustic clarity is a measure of our ability to resolve fine temporal structures in acoustic

waves. The faster the tempo and the higher the degree of attack on the instruments, the

higher is the necessary degree of room clarity. Many measures have been suggested. All

of them rely on determining the amount of acoustic power over a period of time represent-

ing "early" reflections divided by a normalization. One index used for determining the clar-

ity of rooms for musical signals is given in equation 1-67. A value of C = 0 dB provides

figure 1.14 As the echo delay decreases, the reflectivity of the room surfaces must decrease in order to avoid a perceptible change in the coloration of the acoustic signal. For long delays, relatively intense echoes may exist without causing noticeable coloration [5]. This figure shows the level of 10log(a) at which listeners perceive coloration. (Horizontal axis: echo delay [msec].)

figure 1.15 As the time delay of the echo increases, its intensity must decrease in order to avoid an annoyingly distinct sound. The above figure shows the threshold intensity for hearing distinct echoing [5]. (Horizontal axis: delay of reflected wave [msec].)

sufficient clarity for fast staccato passages. A value of -3 dB is usually acceptable for sym-

phonic music.

C = 10 log₁₀ [ ∫₀^{80 msec} h²(t) dt / ∫_{80 msec}^{∞} h²(t) dt ]   (1-67)
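Equation (1-67) can be sketched for a sampled impulse response; the exponentially decaying synthetic response below is illustrative only, standing in for a measured room response.

```python
import math

def clarity_db(h, fs, early_ms=80.0):
    """Clarity index C (eq. 1-67): early-to-late energy ratio of an
    impulse response h sampled at fs Hz, with an 80 ms early window."""
    split = int(round(early_ms * 1e-3 * fs))
    early = sum(x * x for x in h[:split])
    late = sum(x * x for x in h[split:])
    return 10.0 * math.log10(early / late)

# Synthetic exponentially decaying impulse response (illustrative).
fs = 8000
decay = 0.5  # seconds for the amplitude to fall by 1/e
h = [math.exp(-n / (fs * decay)) for n in range(fs * 3)]
print(round(clarity_db(h, fs), 2))  # C for this synthetic response, in dB
```

For this slowly decaying response C comes out a little below −4 dB, i.e. below the −3 dB usually acceptable for symphonic music; a faster decay raises C.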

Reverberation time of a field is a measure of how the room damps out an acoustic

source. It is usually determined by measuring the time required for the field intensity to

decrease by 60 dB from the moment the acoustic source is turned off. The listening room's

reverberant field acts to mask the details of a music performance which listeners find un-

pleasant, e.g. a minor lack of synchrony in an orchestra. The ideal reverberation time de-

pends on the type of music being performed, and on local styles and current fads. For

chamber music a reverberation time consistent with the rooms in which the compositions were originally performed, from 1.4 to 1.6 seconds, is preferred. For symphonies

from the romantic period or pieces utilizing large choirs, a reverberation time of over 2 sec-

onds is preferred. For opera a shorter reverberation time is preferred so that the libretto may

be more easily understood. As an example, the La Scala opera house in Milan has a rever-

beration time of 1.2 seconds [11].

Psychoacoustic Response to the Spatial Aspects of Reflected Energy

For a room to have a good acoustic ambience the listener must have a sense of spaciousness. Originally researchers believed that spaciousness was caused by hearing spatially incoherent fields. It has been shown, however, that this gives rise only to a phantom source

direction, and does not provide a sensation of spaciousness. The following conditions are

required.

i. The field must be temporally incoherent.

ii. The intensity of the reflected energy must surpass the audibility threshold.

iii. The reflected energy must have components with time delays of less than 100 msecs.

iv. For early reflections, the field must be spatially incoherent, with components

arriving from the lateral directions being of primary importance.

These requirements are generally met in large rooms, but are increasingly difficult to ensure in smaller rooms. It has been shown [6] that spaciousness is independent of the style of music being performed.

The dynamic range of human hearing is 140 dB. If our hearing were more sensitive it

would not be beneficial as we would start hearing the Brownian motion of air molecules.

In processing music digitally, ideally a sufficient number of bits are available to ensure cov-

erage of the entire dynamic range of musical works at a resolution consistent with our phys-

iological capabilities. The standard CD format provides 16 bit encoding of amplitude information (approximately 96 dB). Equalizer technology should aim to provide no further degradation in dynamic range than the limit defined by the CD encoding.
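The dynamic range of a linear PCM encoding follows the standard 20·log₁₀(2^bits) ≈ 6.02·bits dB rule, a textbook relation rather than something stated in the text:

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of a linear PCM encoding of the given depth."""
    return 20.0 * math.log10(2.0 ** bits)

print(round(dynamic_range_db(16), 1))  # 16-bit CD audio: ~96.3 dB
```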

Consequences of Room Acoustics and Psychoacoustics

in the Solution of the Inverse Problem

Several important inferences can be drawn about potential solution methods to the room

inverse problem from the discussion of room acoustics and psychoacoustics. Two obvious

approaches are available as solution methods: 1) a theoretical modelling of the physics of

the listening room acoustics, and 2) experimentally measuring the characteristics of the

room.

The first approach has been briefly reviewed in the discussion of room acoustics. The

key acoustic parameter, complex acoustic impedance, must be carefully measured for each

material present in the listening room. Using these impedances and the usually complex

geometry of most home listening rooms, the Helmholtz equation must be solved. Note that

matching boundary conditions will be impossible to perform analytically, and will require

a numerically intensive calculation using a finite element algorithm. As a result of the computations, a set of room eigenvalues and eigenmodes will be determined, from which the room's Green's function is determined. Even for the case of a simple rectangular room,

the number of room eigenmodes increases as the cube of the upper frequency limit of the source. For a room of average dimensions, i.e. Lx = 5 m, Ly = 4 m, and Lz = 3.1 m, between 0 and 20,000 Hz there are over 57,000,000 [6] resonant room modes. As an additional complication, except at low frequencies, the modes are extremely dense in the frequency domain, i.e. the half-width of a mode is much greater than the separation between modes. Thus the majority of modes are mutually coupled.

The aforementioned calculations are so computationally burdensome that they can not

reasonably be performed. Even if a complete set of modes were available, from which the Green's function inverse could be calculated, the processing problem would not be feasible.

For the simple room geometry discussed above, there were 57,000,000 room modes, and

an FIR filter length of 100 million tap weights would be required. Realizing that implemen-

tation of such a large filter is impossible, a selection of the most important room modes

would need to be calculated. Because of the mutually coupling nature of the modes, and

the changing spectral distribution of the acoustic source, the "important modes" would be

constantly changing.

The rigorous eigenmode expansion of the room response, the subsequent calculation of the room's Green's function, and the implementation of the inverse in hardware can not be

performed. As a simplification in the computational complexity of describing a home lis-

tening room, a geometric acoustics approach could be attempted. The previous discussion

makes clear, however, that geometric acoustics gives insight only into the time-averaged properties of the listening room. Depending on the nature of the acoustic source, the approximations could yield poor results. In addition, the computational complexity of characterizing the complex acoustic impedances of the room materials and the complicated geometry of the listening room continues to result in a huge computational problem. Figure 1.16 summarizes the rationale for rejecting the approach of solving the inverse room problem by modelling the physics of the room.

The experimental approach must be selected, and this approach is discussed in detail in chapter two. Several implications of this method must be addressed in light of the basic principles of room acoustics and psychoacoustics in order to design an intelligent technique. The experimental approach can be summarized as follows. The listening room is excited by a broadband noise source. A measurement is made of the acoustic pressure at the listener location. This measurement is compared with the source signal band by band with an octave filter bank. An inverse of the room is approximated by ensuring the energy distribution of the source signal is maintained by the measured signal.
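The measurement strategy above can be sketched as follows. The band edges, test signals, and gain rule below are illustrative assumptions; a practical equalizer would use a multirate octave filter bank rather than the direct DFT used here for clarity.

```python
import math

def band_energies(signal, fs, edges):
    """Energy of `signal` in each frequency band delimited by `edges` [Hz],
    computed with a direct DFT (a sketch; a real system would use filters)."""
    N = len(signal)
    energies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        e = 0.0
        for k in range(N // 2):
            f = k * fs / N
            if lo <= f < hi:
                re = sum(x * math.cos(2 * math.pi * k * n / N) for n, x in enumerate(signal))
                im = sum(-x * math.sin(2 * math.pi * k * n / N) for n, x in enumerate(signal))
                e += re * re + im * im
        energies.append(e)
    return energies

def band_gains(source, measured, fs, edges):
    """Per-band corrections that restore the source's energy distribution."""
    es = band_energies(source, fs, edges)
    em = band_energies(measured, fs, edges)
    return [math.sqrt(s / m) for s, m in zip(es, em)]

fs, N = 8000, 256
edges = [250, 500, 1000, 2000]          # three illustrative octave bands
source = [math.sin(2 * math.pi * 375 * n / fs)
          + math.sin(2 * math.pi * 750 * n / fs)
          + math.sin(2 * math.pi * 1500 * n / fs) for n in range(N)]
measured = [0.5 * x for x in source]    # room attenuates everything by 6 dB
print([round(g, 2) for g in band_gains(source, measured, fs, edges)])
# -> [2.0, 2.0, 2.0]: each band needs a 2x (6 dB) boost
```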

Reason to Reject                               Wave Theoretic   Geometric
                                               Approach         Optics Approach

1. Acoustic Impedances of Materials Required.       X                X
2. Complicated Boundary Conditions.                 X                X
3. Computational Complexity.                        X                X
4. Resultant Filter Orders.                         X
5. Discernability of Important Room Modes.          X
6. Limited Accuracy.                                                 X
7. Time-Averaged Properties Only.                                    X

figure 1.16 The figure demonstrates the reasons that modelling the physics of room acoustics is not feasible for solving practical room inverse problems.

The discussion of psychoacoustics makes clear that acoustic high fidelity requires maintaining signal pitch, and the level of spatial and temporal coherency, with sufficient spectral resolution to match the psychoacoustic and physiological characteristics of human hearing. Equalization can not restore true audio fidelity of the acoustic field as perceived at the location of the signal recording. Without a significant increase in the sophistication of the recording process, coherence information will continue to not be encoded on the CD. As a result, properties such as clarity, spaciousness, apparent source size, etc. will not be restored by equalization. The coherence properties of the signal received by the listener in the home listening room will be determined by the listening room itself.

By ignoring the signal coherence properties, the inverse problem is radically simplified.

In addition we have assumed that the listening rooms are minimum phase. Limiting the in-

verse problem to minimum phase systems does not significantly reduce the robustness of

the technology. From the principles of room acoustics it is known that for a listening room

to be non-minimum phase it must have a bad acoustic design, e.g. a rectangular cavity with

one axis much longer than the others terminated by walls with very low acoustic resistance.

The assumption of minimum phase has the important characteristic that equalization will

not require poles to be placed outside the unit circle.
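The practical payoff of the minimum-phase assumption can be illustrated with a first-order example; the coefficient b below is illustrative. Inverting H(z) = 1 + bz⁻¹ gives a pole at z = −b, which lies inside the unit circle, and hence yields a stable inverse, exactly when |b| < 1, i.e. when H is minimum phase.

```python
# Inverting a first-order room response H(z) = 1 + b*z^{-1} (b illustrative).
# The inverse 1/H(z) has a pole at z = -b: stable exactly when |b| < 1.

def apply_room(x, b):
    """Apply H(z) = 1 + b*z^{-1} to the signal x."""
    return [xn + b * (x[n - 1] if n > 0 else 0.0) for n, xn in enumerate(x)]

def inverse_filter(x, b):
    """Run the inverse recursion y[n] = x[n] - b*y[n-1], i.e. 1/(1 + b z^{-1})."""
    y, prev = [], 0.0
    for xn in x:
        prev = xn - b * prev
        y.append(prev)
    return y

b = 0.6                                       # minimum phase: zero at z = -0.6
impulse = [1.0] + [0.0] * 19
recovered = inverse_filter(apply_room(impulse, b), b)
print([round(v, 6) for v in recovered[:4]])   # the impulse is recovered exactly
```

With |b| > 1 the same recursion would diverge, which is why a non-minimum-phase room cannot be inverted by a causal stable filter of this form.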

The filter design problem has been reduced in scope to removing gross coloration ef-

fects of the stereo electronics and the listening room at a sufficiently high level of spectral

resolution that it matches the characteristics of human hearing. The summary of psycho-

acoustics indicated that the phase response of a system does not affect our perception of

pitch. The filter designer is left with the simpler task of calculating the inverse filter which

inverts the magnitude response of the listening room at a resolution of approximately 1/3

octave. The following figure demonstrates the simplified set of design criteria.

Design Factor             Yes/No   Comments

1. Spatial Coherence      No       Not encoded on CD
2. Temporal Coherence     No       Not encoded on CD
3. Magnitude Response     Yes
4. Phase Response         No       Not psychoacoustically important
5. Frequency Resolution   Yes      Must match critical bandwidths
6. Amplitude Resolution   Yes      16 bits encoded on CD
7. Stability              Yes      Room responses are usually minimum phase

figure 1.17 The figure demonstrates the design criteria for a measurement based equalization strategy.

CHAPTER 2

AN EQUALIZATION STRATEGY FOR REAL-TIME SELF-ADJUSTMENT

Introduction

Chapter two will introduce the current state-of-the-art equalizer technology. It will be shown

that this technology does not make efficient use of an equalizer's limited degrees of freedom. Be-

cause of the finer frequency resolution required in the lower frequency bands, high filter orders are

required to control the small critical digital frequencies. In the high frequency bands however, the

spectral resolution will be finer than necessary. The Adkins-Principe architecture significantly im-

proved the traditional strategy by introducing a multi-rate octave filter bank. By downsampling

data in the lower bands according to the Nyquist limit, the critical digital frequency requirements

are made proportionally less severe, and the required inverse filtering can be performed by filters

of significantly lower order. As this is the basic approach of the technology introduced in this dis-

sertation, a complete description of this technique will be reviewed in chapter two.

While this strategy is a significant improvement, it does not make use of the time varying spec-

tral distribution of the source. The Adkins-Principe strategy continues to assign the limited degrees

of freedom available in the inverse filters to equalize the entire audio bandpass, even if the source

has no power in large spectral regions. It furthermore continues to equalize regions of the bandpass

where little or no coloration is being induced by the stereo electronics or the listening room. The

approach investigated in this work is to incorporate adaptive filters into the Adkins-Principe architecture.

Adaptive filters by their very nature will assign the most processing resources to regions of the au-

dio bandpass in which the most coloration exists, significantly improving the efficiency and perfor-

mance of equalization. A brief description of the major classes of adaptive algorithms is included.

Because of the requirements of real-time processing, as well as the requirement of high fidelity,

appropriate classes of adaptive algorithms are limited. Because of its low computational complexity and satisfactory performance, the Least Mean Squares (LMS) algorithm is chosen for further

investigation in this research.

The Adkins-Principe equalizer architecture is a parallel multi-rate filter bank.

figure 2.0  Block diagram of the Adkins-Principe equalizer architecture.
(LP = lowpass filter, HP = highpass filter, DEC = decimation, INT = interpolation)

State-of-the-Art Acoustic Equalizers

It is clear from the preceding section on the physics of room acoustics that a theoretical

modelling of a listening room, even with an unreasonably simplistic geometry and con-

structed with materials of uniform impedance, is such a computationally burdensome en-

deavour that it can not be performed for an individual's listening room. The current

equalizer technology seeks to excite a room with broadband noise in order to stimulate a

broad range of eigenmodes. This signal is measured at a listener location or locations, and

an inverse of the cascade of the home stereo system and the home listening room is estimat-

ed. Equalization is not a panacea which will restore a perfect reproduction of the pressure

field as propagated by the original rendition of the musical performance. In fact, acoustic

equalization can not even reconstruct a pressure field which is psycho-acoustically equiva-

lent. Current technology attempts only to remove gross coloration. The strategy employed

is depicted in figure 2.1. Limitations of this strategy will be examined, with special empha-

sis on those items which the proposed adaptive equalizer will ameliorate.

[Block diagram: noise source → equalizer → stereo electronics → speakers → listening room → microphone(s). The source signal and the microphone signal each pass through bandpass filters, and a band by band comparator adjusts the equalizer's filter coefficients.]

figure 2.1  State-of-the-art equalizers are designed as parallel filter
banks with filter coefficients that are adjusted by comparing
the power collected in each band by a microphone located in
the listening room with the noise which is used to excite the
room.

After room excitation with a broadband source, current techniques break the measured

signal into octave or 1/3 octave bands using a set of bandpass filters. The power in each

band is compared with the power initially radiated in the same band. The equalizer is ad-

justed so that its filter coefficients give sufficient gain in the attenuated bands that the re-

ceived signal has a flat frequency response, or any pre-programmed response which the

listener finds acoustically pleasing. The noise source is turned off, and the CD player is

turned on.
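The band-by-band adjustment described above can be sketched numerically. The function name and the flat-response target below are illustrative assumptions; the actual processing in commercial units is not published.

```python
import numpy as np

def band_gains_db(source_powers, measured_powers):
    """Gain (in dB) to apply in each band so that the power measured at
    the microphone matches the power radiated by the source in that band."""
    src_db = 10.0 * np.log10(np.asarray(source_powers, dtype=float))
    mic_db = 10.0 * np.log10(np.asarray(measured_powers, dtype=float))
    return src_db - mic_db

# A band the room attenuates by a factor of 4 needs roughly +6 dB of gain.
gains = band_gains_db([1.0, 1.0, 1.0], [1.0, 0.25, 1.0])
```

A non-flat target response, chosen by the listener, would simply be added to the returned gains.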

Source Excitation

The excitation source utilized in equalizers has been the subject of much debate. Sev-

eral articles [13-14] deal with the varying responses of rooms to different types of sources,

i.e. firecrackers, pistol blanks, etc. The most important characteristics of the test signal are

the signal's spectral content, the amplitude distribution of the source, and the duration of

the signal. State-of-the-art equalizers use pink, Gaussian noise for reasons which are now

discussed.

Except for small, undamped rooms, at low excitation frequencies, the sound pressure

amplitude distribution is independent of the volume, shape, or acoustical properties of the

room. By examining the room's Green's function it is seen that the pressure field is the su-

perposition of many eigenfrequencies, each with its separate damping coefficient. This su-

perposition is complicated due to the finite half-width, and small separation of the room

resonances. Since the eigenmodes are closely spaced and mutually coupled, the field may

be considered to be resultant from a single frequency with randomly distributed amplitudes

and phases. Applying the Central Limit Theorem to both real and imaginary terms of the

pressure, P, leads to the Rayleigh distribution of |P|. Figure 2.2 demonstrates the similarity

of the distribution function of pressure to Gaussian excitation.

The spectral content must be sufficiently broad that the eigenmodes excited by a music

signal are also excited by Gaussian noise. Pink noise satisfies this requirement and has

the additional advantage that it will simplify the signal processing. Pink noise has the property that equal power is radiated in each octave. In addition, unlike impulsive sources, the pink noise source is of sufficient duration to "charge" the reactive elements of the listening room, which will allow a better equalization of the room's steady state response.

figure 2.2  A Gaussian noise source is used with many state-of-the-art
equalizers because of its similarity to the pressure
distribution of acoustic data in an enclosed space.
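One common way to approximate pink noise is to shape the spectrum of white Gaussian noise so that power falls as 1/f, giving equal power per octave. This is an illustrative method, not necessarily the one used in commercial equalizers.

```python
import numpy as np

def pink_noise(n, rng=None):
    """Approximate pink noise: shape white Gaussian noise so that
    amplitude ~ 1/sqrt(f), i.e. power ~ 1/f (equal power per octave)."""
    rng = np.random.default_rng(rng)
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                  # avoid dividing by zero at DC
    spectrum /= np.sqrt(f)       # 1/f power shaping
    spectrum[0] = 0.0            # remove the DC component
    return np.fft.irfft(spectrum, n)

x = pink_noise(1 << 16, rng=0)
```

The Gaussian amplitude distribution of the source is preserved by the linear spectral shaping.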

Bandpass Filters

Bandpass filters are designed to have bandwidths that approximately match the band-

width characteristics of our human physiology. All high quality equalizers employ a digital

1/3 octave parallel filter bank, and from figure 1.12, it is seen that this resolution will closely

match the critical bandwidths as discussed in the psycho-acoustics review. The center fre-

quencies and upper and lower cut-off frequencies are given in figure 2.3.

To maintain pitch the bandpass filters must be designed with sufficient control of the sig-

nal's spectral characteristics, while realizing that too fine a resolution will not improve our

subjective sense of pitch and will add processing time and expense to the equalizer. The

filter bank is made up of linear FIR lowpass and highpass filters cascaded to form bandpass

filters. Performance is limited by the low digital cut-off frequencies associated with the

lower frequency bands, which require high order filters. Control of the band at the level of

the difference limen (fig. 1.11) is not achieved. Except in the most unusual circumstances

1/3 Octave Filter Bank

Band limits [Hz]:  22, 28, 35, 44, 57, 71, 88, 113, 141, 176, 225, 283, 353, 440, 565, 707, 880, 1130, 1414, 1760, 2250, 2825, 3530, 4400, 5650, 7070, 8800, 11300, 14140, 17600, 22500

Band center frequencies [Hz]:  25, 31.5, 40, 50, 63, 80, 100, 125, 160, 200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000, 12500, 16000, 20000

figure 2.3  The critical frequencies of a 1/3 octave bandpass filter bank are
shown above. Because of the similarities with the resolution
characteristics of human hearing, the 1/3 octave filter bank is
most often chosen for state-of-the-art equalizers.

the difference limen is an unnecessarily strict standard due to the masking effects of music

signals. The precise requirements are difficult to evaluate due to the complex nature of masking.

Signal Measurement

Any system which relies on current microphone technology for precise acoustic measurement is problematic. A situation analogous to the Heisenberg Uncertainty Principle arises when measuring the field: introduction of the microphone into the field disturbs the field itself. To minimize this effect, high quality microphones are made as small as possible while maintaining their sensitivity. In addition, measuring the pressure field intensity

is not synonymous with measuring signal loudness as was shown in the previous section.

The conversion from intensity to loudness can be accounted for in software. A more diffi-

cult matter is to completely characterize the microphone directionality, which will differ

from the directional properties of our hearing. In practice a precise understanding of the

microphone's directional response is unavailable. A high quality microphone must also

have a fast response if it is not to smear the details of the signal, affecting the perceived tim-

bre and clarity.

Signal Processing

Processing is performed by a dedicated chip which calculates the power in each band

collected from the CD player. The same procedure is performed on the signal received from

the microphone. The two signals are compared, and a calculation of the appropriate filter

coefficients is made in order to guarantee a flat frequency response, or any other preset fre-

quency response, across the audio bandpass. Unfortunately the details of the processing are

not discussed in the literature, probably because of sensitivity towards proprietary technol-

ogy on the part of manufacturers. The measurements of current state-of-the-art equalizers

indicate they are capable of maintaining a flat frequency response across the audio bandpass to within 1 dB. At 40 dB between 100 and 1000 Hz, for speech and music signals a band ripple of 1 dB is not perceptible. At the band edges ripple can be as great as 2 dB before listeners can perceive coloration. The number of bits used for all filter processing calculations must be at least 16 in order to avoid degrading the S/N of the CD player. High quality

equalizers allow separate equalization to be performed for each channel of the stereo. The

period of time required for the equalization can be as high as 15 seconds. The filter coeffi-

cients can be stored in one of several memories. A graphic interface is provided to allow

the user to manually set his listening preferences, which are taken into account in the calculation of filter coefficients. These systems are a large improvement over traditional equalization technology, which required time-consuming manual adjustment of graphic equalizers and depended on the skill of the individual making the adjustments.

Limitations

In the discussion of room acoustics the temporal characteristics of the acoustic field were introduced using the principles of geometric acoustics. This approach gave a qualitative

understanding, but it is of limited accuracy and validity. The spatial characteristics were

also presented using a simplistic approach to acoustic reflections. A rigorous theoretical

foundation of room acoustics was made from the wave equation. By solving boundary con-

ditions it was shown that the impedances of surface materials, and the room's geometry are

the key parameters in developing a Green's function. Because of the oversimplification of

geometric acoustics, and the huge computational requirements of a wave theoretical ap-

proach, filter coefficients required for an equalizer can not be calculated.

State-of-the-art equalizers determine the room response by exciting a room with a

broadband source. A measurement is made of the acoustic field at the listener location,

and an inverse of the room response is calculated for implementation in an octave filter

bank. The excitation source, signal measurement, signal processing, and equalizer filters

of state-of-the-art equalizers were reviewed. This technology is focused on removing

those effects which are psycho-acoustically most significant. In the presentation of psycho-

acoustics pitch was discussed as a key feature of the music signal. To preserve pitch the

equalizer must remove coloration introduced by the stereo electronics and the listening

room. Phase response was shown to be relatively unimportant in maintaining pitch. The

temporal and spatial characteristics of the acoustic field were discussed as important characteristics in giving a listener the sense of fullness and spaciousness of the sound.

The state-of-the-art equalizer technology described above is able to remove most color-

ation of the stereo-room combination, and thus maintain pitch. The equalizer will not be

able to correct the following effects.

1) The equalizer will not be able to correct coloration due to strong regular reflections.

These types of effects are represented by transfer functions that are non-minimum phase,

and thus would require the equalizer to have poles outside the unit circle. This should not

be a problem unless the room in which the original recording was made, or the home lis-

tening room has a terrible acoustic design, e.g. a rectangular cavity with one axis much

longer than the others terminated in a wall with very low acoustic resistance.

2) The perceived timbre will not be affected by the equalizer. Timbre is related to transient

phenomena, and as the equalizer's filters are time invariant, the equalizer will not be able

to restore timbre related features.

3) Equalization can not restore true audio fidelity of the signal because of the loss of

information regarding both temporal and spatial coherence properties of the signal. Unfor-

tunately these characteristics are psycho-acoustically important. There are many reasons

for the lack of fidelity with respect to field coherence properties. Without a significant in-

crease in the level of sophistication in the recording process, the coherence information will

not be included on the CD itself. Microphones used in the recording process, as well as the

equalization process, do not encode information regarding the spatial characteristics of the

signal. Fast temporal structures may also be lost due to the non-zero time constants in all

microphones. In addition the state-of-the-art equalizers utilize only one microphone and

cannot account for stereophonic properties of the acoustic field. As a result of the above

limitations, properties such as clarity, spaciousness, apparent source size, apparent source

location, etc. will not be restored by equalizers. These properties will be maintained by the

characteristics of the listening room itself.

4) The equalization of a music reproduction system is valid for only one listener loca-

tion. If the listener moves locations, the equalization must be performed again. The adap-

tive equalizer will eliminate this problem.

5) The best source with which to excite the room is the actual acoustic signal of interest

- not pink noise. The equalizer's filters should utilize their limited degrees of freedom to

correct frequencies at which distortion is a maximum, and not the entire spectrum. This is

a fundamental limitation which affects the degree to which relative loudness, and hence

pitch, can be maintained. This limitation will be ameliorated by the adaptive equalizer.

The Adkins-Principe Equalizer Architecture

The Adkins-Principe (AP) architecture [1] uses a multi-rate filter bank to produce an

octave band structure by repeatedly filtering and decimating a single input signal. A sepa-

rate highpass filter equalizes each band, making the processing computationally parallel

and greatly reducing the necessary microprocessor speed for real-time operation. The ad-

vantage of multi-rate designs lies in their increased efficiency. The elegance of the structure

can be seen in the block diagram shown in figure 2.0. The proposed real-time equalization

strategy will be based on this architecture. The following section will discuss its character-

istics. The advantages of this architecture over the current state-of-the-art will be made

clear. Its operation will be shown to be consistent with maintaining pitch as described in

chapter one. In addition, the features of the architecture which must be carefully controlled

to assure proper signal reconstruction will be discussed.

Octave Bands Generation and Equalization

The highest frequency band is controlled by passing the original signal through a highpass filter. All lower bands are controlled by a cascade of lowpass, decimation, and highpass filters. This is accomplished as shown in figure 2.4.

figure 2.4  The tree structure efficiently implements the octave filter bank. At each
node the signal is equalized by a highpass filter along one branch, and
further bandlimited along a second branch.

The tree structure is an efficient strategy for implementing an octave filter bank. The

input to each branching node has been bandlimited and decimated. The lower branch per-

forms equalization with a highpass filter while the upper branch further reduces the CD

bandwidth. The characteristics of the upper and lower branches are described below.

Each of the upper branches attempts to perform the following lowpass operation.

H_LP(e^{jω}) = 1,   |ω| < π/M
             = 0,   otherwise                                  (2-1)

The upper cutoff of band i is equal to the lower cutoff of band i-1. The sampling rate

can be reduced by decimating by M according to Shannon's sampling theorem.

x_dec[nT] = x[nMT]                                             (2-2)

The decimation rescales the digital frequencies of the DFT, as shown in figure 2.5.

Clearly the decimation rate must be coupled to the value of the digital cutoff frequency of

the lowpass filters in order to avoid aliasing.
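The lowpass-then-decimate operation of equations (2-1) and (2-2) can be sketched as follows. The windowed-sinc design here is a stand-in for the actual AP filter design, and the test tones are illustrative.

```python
import numpy as np

def lowpass_fir(order, cutoff):
    """Windowed-sinc linear-phase lowpass; `cutoff` is the digital cutoff
    frequency as a fraction of pi (here 1/M, as in eq. 2-1)."""
    n = np.arange(order + 1) - order / 2.0
    return cutoff * np.sinc(cutoff * n) * np.hamming(order + 1)

def decimate(x, M, order=45):
    """Bandlimit to |w| < pi/M, then keep every Mth sample (eq. 2-2)."""
    y = np.convolve(x, lowpass_fir(order, 1.0 / M), mode="same")
    return y[::M]

fs, M = 44100, 6
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 10000 * t)
xd = decimate(x, M)   # the 500 Hz tone survives; the 10 kHz tone is removed
```

Without the lowpass stage, the 10 kHz component would alias into the decimated band, which is why the cutoff must be coupled to M.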

figure 2.5  Decimation by M in the time domain expands the digital
frequencies by M, and reduces the peak value of the
spectrum by M. This figure demonstrates the effects for M
equal to 2.

Decimation reduces the number of operations required for real-time processing by reducing the data rate by M^(i-1) for band i. In addition, because the critical digital frequencies are increased by M, due to the rescaling of digital frequencies engendered by decimation, the required filter orders for all subsequent filtering operations are considerably smaller, or a finer frequency resolution will be possible with the same order filters. This is an important characteristic due to the finer resolution psycho-acoustically required in the lower frequency bands. Figure 2.6 summarizes the upper branch.

At each branching node of the tree structure the bandlimited signal is also sent to a

highpass filter (lower branch) for equalization. If the room were non-filtering the highpass

filter would be designed as follows.

H_HP(e^{jω}) = 1,   π/M ≤ |ω| ≤ π
             = 0,   otherwise                                  (2-3)

Note that the digital cutoff frequency is π/M, and not π/M², because of the decimation.

figure 2.6  Operation of the upper branch of the tree structure (original
signal → lowpass filter → decimate by M). This figure
demonstrates the processing of a signal generated as the
summation of two sine waves. The low frequency sine wave is
separated with a lowpass filter and then decimated. The
decimated part of the signal is thrown away.

In the more realistic case of a room which filters its acoustic input, the highpass filters are

designed with user supplied gains for each band. The graphic interface used on an equalizer

implemented on the NeXT computer workstation [15] is shown in figure 2.7. The interface

indicates gains for seven bands, and interpolates between the frequencies indicated below.

figure 2.7  The graphic interface for an equalizer using the AP
architecture was implemented on the NeXT computer with a
user interface as shown in this figure. The highpass filters are
adjusted according to the selections made by dragging the
sliders with a mouse.

The combination of filter length and decimation rate must be such that the requirement

of at least 1/3 octave resolution is satisfied. In addition, band ripple, transition bandwidth, and stopband attenuation must be such as to guarantee that there is less than 1 dB band ripple across the entire audio bandpass. The research documented in this dissertation will use the

filter design presented by Adkins and Principe [1]. Their extensive simulations resulted in

a recommended design of 4 bands, each utilizing symmetric finite impulse response (FIR)

filters of order 45, and a constant decimation rate of six. A summary of the frequency res-

olution of the design is given in figure 2.8.

Band     Frequency Range [Hz]   # of frequencies controlled   Bandwidth of controlled frequencies [octaves]
Band 1   3675 - 22050           19                            0.045 - 0.250
Band 2   612.5 - 3675           19                            0.045 - 0.250
Band 3   102 - 612.5            19                            0.045 - 0.250
Band 4   17 - 102               19                            0.045 - 0.250

(The band edges correspond to f_s/2M^i for f_s = 44100 Hz and M = 6: 17, 102, 612.5, 3675, and 22050 Hz.)

figure 2.8  Frequency resolution for a four band equalizer with filters of
order 45, and a constant decimation rate of 6, is shown
above.

The highpass FIR filters are designed by taking the inverse FFT of the frequency response requested by the user with the equalizer's graphical interface. The filters are smoothed by truncating the resulting time domain representation symmetrically about time = 0 with a Kaiser window with β = 3.5. The symmetric nature of the filters guarantees linear phase. The time domain representation is shifted to guarantee that the filters are causal. A summary of the process is shown in figure 2.9.
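This design path can be sketched as follows. The FFT size, the example highpass request, and the use of exactly 45 taps are assumptions made for illustration.

```python
import numpy as np

def design_fir(desired_mag, n_taps=45, beta=3.5):
    """Inverse FFT of a requested magnitude response, truncated
    symmetrically about time = 0, smoothed with a Kaiser window,
    then indexed causally (0..n_taps-1)."""
    h = np.fft.irfft(desired_mag)           # zero-phase impulse response
    half = n_taps // 2
    idx = np.arange(-half, half + 1)        # symmetric truncation about 0
    return h[idx % len(h)] * np.kaiser(n_taps, beta)

# Hypothetical user request: unity gain above one quarter of Nyquist.
mag = np.zeros(257)
mag[64:] = 1.0
taps = design_fir(mag)
```

Because the truncation is symmetric and the Kaiser window is symmetric, the resulting taps are symmetric, which is what guarantees the linear phase property.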

Signal Reconstruction

After equalization has been performed with highpass filters, the data in each band must

be summed in order to reconstruct the signal. The summation is complicated because of

the different sampling rates at which the individual highpass filters operate. In addition

there is a unique delay due to the differing number of convolution operations performed in the various branches of the equalizer.

figure 2.9  Operation of the lower branch of the tree structure.
(a) User defines the desired response; (b) frequency response; (c)
inverse Fourier transform for the time domain representation; (d)
for the filter coefficients, multiply the time domain representation by
a Kaiser window to lessen the degradation induced by the Gibbs
phenomenon.

Immediately following equalization, the signal is interpolated to bring it to the sampling frequency of the preceding band. Interpolation

performs the opposite operation of decimation by inserting M-1 evenly spaced samples of value zero between every pair of data points.

x_int[nT] = x[kT],   if k = n/M is an integer
          = 0,       otherwise                                 (2-4)

Interpolation acts to recompress the digital frequencies of the equalized time series in a manner precisely opposite to the decimation process. These effects are demonstrated in figure 2.10.

Images created by interpolation must be removed with an anti-imaging lowpass filter. Note that the digital cutoff frequency is π/M, and not π/M², because of the interpolation. The lowpass filter should have a gain of M to restore the energy lost in the decimation stage. The output from the anti-imaging filters is summed with the output of the preceding band efficiently by using the inverse tree structure shown in figure 2.11.
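The zero-insertion of equation (2-4) followed by the gain-M anti-imaging lowpass can be sketched as below; the windowed-sinc filter and test tone are illustrative assumptions.

```python
import numpy as np

def interpolate(x, M, order=45):
    """Insert M-1 zeros between samples (eq. 2-4), then apply an
    anti-imaging lowpass with cutoff pi/M and gain M."""
    up = np.zeros(len(x) * M)
    up[::M] = x                                  # zero insertion
    n = np.arange(order + 1) - order / 2.0
    h = (1.0 / M) * np.sinc(n / M) * np.hamming(order + 1)
    return np.convolve(up, M * h, mode="same")   # gain M restores amplitude

t = np.arange(256)
x = np.sin(2 * np.pi * t / 32)   # slow sine, well inside the pi/M passband
y = interpolate(x, 3)
```

Zero insertion splits the tone's amplitude among M spectral images; the gain-M lowpass removes the images and restores the original amplitude.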

figure 2.10  Interpolation by 3. (a) For interpolation by M, M-1 evenly
spaced samples of zero are placed between each pair of
samples; (b) frequency response before interpolation; (c)
frequency response after interpolation.

figure 2.11  The inverse tree structure efficiently sums the bands together
after interpolation and anti-imaging, bringing the lower
frequency band up to the sampling rate of the higher
frequency band.

The delay associated with each band is calculated with respect to the highest sampling rate. Recall that for an FIR filter of length N the delay, T_d, is expressed as follows, where T_s1 is the sampling period of band 1.

T_d = ((N-1)/2) T_s1                                           (2-5)

For band i the total delay as seen at the output of the highpass filter is expressed as follows.

T_d,i = ((N-1)/2) Σ_{j=0}^{i-1} M^j T_s1                       (2-6)

The delay at all nodes of the inverse tree structure is dominated by the value of T_d in the upper branch. The total delay as seen at the output of the AP structure for a four band system is dominated by T_d4 and is given below.

T_total = 2 T_d4 ≈ (N-1) M³ T_s1                               (2-7)
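These delays can be evaluated numerically for the four band design, assuming each band's delay accumulates an FIR delay of (N-1)/2 samples at each of the successively slower rates, with N = 45 taken as the filter length.

```python
# Band delays for the four band design: N = 45 (assumed filter length),
# M = 6, and Ts1 = 1/44100 s, the band 1 sampling period.
N, M, Ts1 = 45, 6, 1.0 / 44100

def band_delay(i):
    # Sum the (N-1)/2 sample delay over the i rates Ts1, M*Ts1, ..., M**(i-1)*Ts1.
    return (N - 1) / 2 * sum(M**j for j in range(i)) * Ts1

delays_ms = [1000 * band_delay(i) for i in range(1, 5)]
total_ms = 2 * delays_ms[-1]   # total delay dominated by twice band 4's delay
```

Band 1 contributes about half a millisecond, while band 4 dominates at roughly 129 ms, giving a total on the order of a quarter second.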

Processing Speed

Efficient use of computing resources can be made because of the differing sampling

rates of data in various stages of the architecture. Convolutions are performed at different

rates and are staggered in time. Data is sampled such that multiply and accumulate opera-

tions are made in non-overlapping time slots. In this way major timing bottlenecks are

avoided. The CD player sampling rate determines the value of the time slots (1/44,100 =

22.676 microseconds). For the four band design 10 FIR filters are operated. Three filters

perform convolutions at the highest sampling rates (44.1 kHz), three perform convolutions

at the CD rate decimated by six (7.35 kHz), three perform convolutions at the CD rate dec-

imated by 36 (1225 Hz), and one convolution at the CD rate decimated by 216 (204 Hz).

The timing diagram is shown in figure 2.12.

figure 2.12  The timing diagram indicates the efficiency of
microprocessor implementation of the AP architecture.
Decimation and selective sampling allow a staggering of the
convolutions which prevents computational bottlenecks.

As an example of the computational load of the convolutions indicated in the timing diagram above, consider the TI TMS320C40 25 Mflop floating point digital signal processing chip. This processor performs a multiply and accumulate in one cycle. With one chip dedicated to each band of the equalizer the number of computations which can be performed

between CD data samples is limited to the following.

# of computations = 25 MFLOPS / 44.1 kHz ≈ 566

According to the timing diagram above, for FIR filters of order 45, 360 multiply and accu-

mulate operations are required, i.e. 63.6% of the CPU time.

Theoretical Performance

The AP architecture is an efficient implementation of a filter bank. The audio signal is

broken into octaves using a tree structure. Along the lower branch, a bandlimited and

downsampled signal is equalized with a highpass filter. Along the upper branch the signal

is further bandlimited and downsampled according to the Nyquist limit. After equalization

the octave bands are brought to the original sampling rate by upsampling, and using an in-

verse tree structure. Each band has a distinct delay which must be accounted for when re-

combining bands. Because of the octave band structure and downsampling, the critical

digital frequencies are raised as much as possible, and lead to a degree of bandwidth reso-

lution and of band ripple, across the entire audio bandpass, that is superior to performance

advertised on existing equalizer architectures. From figure 2.8 it is seen that frequency resolution is finer than 1/3 octave across the entire bandpass. At worst, resolution is 1/4 octave, while at best the equalizer has 0.045 octave resolution. Computer simulations of the AP architecture indicate that, for equalizer settings for a flat response, there is a band ripple of less than 0.2 dB from 17 Hz to 22500 Hz. This value is far superior to the values of 2 dB

which are measured on equalizers which are commercially available. The spectral resolu-

tion and the high degree of control of the highpass filters are precisely the characteristics

required to restore the proper pitch of the acoustic signal received at the listener location.

Automatic Adjustment Using Adaptive Filters

Elliot and Nelson [16] have proposed substituting the highpass filters in a standard filter

bank with adaptive filters designed to minimize the mean square of an error signal which is

generated by taking the difference between a pink noise source, and the noise source as col-

lected by a microphone at the listener location. Although this approach has the advantage

of a lower computational complexity than the standard frequency domain techniques, the

most compelling reason for the use of adaptive filters is overlooked, namely, the use of the

music signal itself as a reference signal. This research will substitute adaptive filters for the

highpass filters in the AP architecture as shown in figure 2.13.

In each stage the CD reference signal is bandlimited and decimated in precisely the

manner discussed above. A microphone signal is filtered and decimated in an analogous

manner to the CD signal. A feedback signal (error signal), which is provided by the difference between the filtered CD and microphone signals, is sent to an adaptation algorithm

which updates the adaptive filter weights in such a way that the mean square error is mini-

mized. As an example, the structure for band 2 is shown in figure 2.14.

figure 2.13  The adaptive equalizer architecture is a modification to the
AP architecture in which highpass filters are replaced by
adaptive filters.

figure 2.14  Band 2 of the adaptive equalizer. The adaptive filter is
outlined in grey. The filtered CD signal is compared with the
microphone signal, similarly bandlimited and decimated. An
error signal is generated and sent to the adaptation algorithm
which updates the filter weights in such a way that mean
square error is minimized.

This strategy will improve the state-of-the-art by ameliorating two of the limitations discussed in chapter one. Specifically, if the listener changes location (assuming the micro-

phone is also moved) the filter will automatically correct for the changes in the room's re-

sponse because of the equalizer's capability for real-time self-adjustment. Depending on

the type of adaptation algorithm chosen, the updates will be performed either every sample (44.1 kHz), or after every block of samples (44.1 kHz/N, where N is the length of the

block). More importantly, the reference signal used for equalization is the CD signal itself

instead of pink noise. The adaptive filter, when minimizing mean square error, will auto-

matically utilize its limited degrees of freedom to adjust the frequency response in those

portions of the spectrum in which the most energy is present, not the entire audio bandpass.

Several adaptation algorithms have been developed for adaptive signal processing

[17-19]. The choice of algorithms depends on several considerations, namely, the station-

arity characteristics of the signal, the data rate of the input signal, the degree of accuracy

required in filtering, and the computational resources available for real-time operation.

Acoustic data is highly non-stationary. Because the adaptive filters must find their optimal

coefficients in a fraction of the time in which the audio signal is quasi-stationary, the adap-

tation process must be fast. Secondly, although the adaptation must occur rapidly, the al-

lowable misadjustment of the filter weights must be relatively small for meaningful

improvement in the current state-of-the-art. Thirdly, due to the high CD sampling rate, the

complexity of the algorithm must not be so great as to preclude carrying out the arithmetic

in real-time. This is a serious concern even with the high speed microprocessors currently

available. Lastly, any adaptation algorithm must not introduce any noise components

which will lower the signal fidelity.

All available adaptation algorithms require trade-offs among the crucial factors outlined

above. Extremely rapid convergence can be achieved, but at the expense of increased com-

putational complexity. Computationally efficient algorithms exist, but convergence is

slowed, and further serious trade-offs between convergence speed and misadjustment need

to be taken into account. Other algorithms may make a good compromise between com-

plexity and convergence speed, but by their nature they contribute undesirable noise into

the system. Consider the tap delay line architecture for an FIR filter shown below, where

Yk represents the output of the feedforward filter at time k, dk represents the desired (or

reference) signal, and ek represents the difference signal between dk and yk (error signal).

figure 2.15 An adaptive tapped delay line with desired signal, d_k, and error signal, e_k.

Assume that the sequence {Xk} is quasi-stationary, i.e. the signal has been chosen over a

time window in which it is approximately stationary. The output of the tapped delay line

is the inner product of the n+1 most current filter weights with the filter input.

y_k = X_k^T W_k = [x_k  x_{k-1}  x_{k-2}  ...  x_{k-n}] [w_0  w_1  ...  w_n]^T     (2-8)
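As a quick illustration of equation 2-8 (not part of the original design), the tapped delay line output can be sketched in a few lines of Python; the signal values and weights below are hypothetical.

```python
# Output of an order-(n+1) tapped delay line: the inner product of the
# n+1 most recent inputs with the filter weights (equation 2-8).
def tap_delay_output(x_recent, w):
    # x_recent = [x_k, x_{k-1}, ..., x_{k-n}], w = [w_0, ..., w_n]
    return sum(xi * wi for xi, wi in zip(x_recent, w))

# Hypothetical 3-tap example.
x_recent = [1.0, 0.5, -0.25]     # x_k, x_{k-1}, x_{k-2}
w = [0.5, 0.25, 0.125]
y_k = tap_delay_output(x_recent, w)   # 1.0*0.5 + 0.5*0.25 + (-0.25)*0.125
```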

The filter weights, W*, which minimize the mean square error, E[(d_k - y_k)^2], are given by the well-known Wiener formulation: W* = R^{-1}P, where R is the correlation matrix E[XX^T] and P is the cross correlation between d and X, E[dX]. The field of adaptive signal processing

is concerned with approximating the solution for W*. There are two standard methods by

which the solution is approximated: gradient search algorithms and least squares (LS) algorithms. The gradient search methods can be generalized as follows:

W_{k+1} = W_k - μ∇ξ_k, where ξ = E[(d_k - y_k)^2] is the mean square error. This common sense approach adjusts the weights constituting vector W in a direction opposite to that of the gradient of the error surface. μ represents a constant which regulates the step size of the weight update. The most common gradient search algorithm is the Least Mean Squares (LMS) algorithm, which estimates ∇ξ = ∇E[e^2] by taking the time average of ∇[e^2]. The approximation is valid over quasi-stationary periods because of the quasi-ergodicity of the signal. The LS algorithms differ by estimating the value of R and P

with time averages. LS algorithms can be broken into two categories: algorithms which

calculate new estimates over a window of data (block LS algorithms), and algorithms which

recursively update the estimate after every sample (RLS algorithms). One example of an

LS algorithm is the block covariance method. Let R̂_k and P̂_k be the time averaged approximations of R and P, computed over a window of M samples.

R̂_k = (1/M) Σ_{i=k-M+1}^{k} X_i X_i^T     (2-9)

P̂_k = (1/M) Σ_{i=k-M+1}^{k} d_i X_i     (2-10)

Ŵ_k = R̂_k^{-1} P̂_k     (2-11)
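A minimal sketch of the block estimate in equations 2-9 through 2-11, reduced to a one-tap filter so that the matrix inverse becomes a scalar division; the data below are hypothetical.

```python
# Block least squares estimate over a window of M samples for a
# one-tap filter: R and P reduce to scalars (equations 2-9 to 2-11).
def block_ls_one_tap(x, d, M):
    xs, ds = x[-M:], d[-M:]
    R = sum(xi * xi for xi in xs) / M                  # time-averaged E[x^2]
    P = sum(di * xi for di, xi in zip(ds, xs)) / M     # time-averaged E[d x]
    return P / R                                       # W = R^{-1} P

# Hypothetical data: d was produced by a gain of 2 on x.
x = [1.0, -2.0, 0.5, 3.0]
d = [2.0 * xi for xi in x]
w = block_ls_one_tap(x, d, 4)   # recovers the gain, w = 2.0
```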

Certain classes of algorithm can be ruled out of consideration immediately because of

their unsuitability with respect to one or more of the requirements given above. Although

the block LS algorithms converge more quickly than gradient search techniques, after every

stationary window of data the input data autocorrelation matrix, R̂, must be inverted. Some of the block algorithms have autocorrelation matrices that are Toeplitz and can use the Durbin algorithm for matrix inversion, requiring O(N^2) operations, where N is the length of the filter. Many of the LS methods (e.g., the covariance method), however, do not yield Toeplitz matrices, and matrix inversion must be performed using the Cholesky algorithm, requiring O(N^3) operations. Furthermore, these algorithms update the weight vector every M samples, introducing an undesirable artifact into the spectrum of the acoustic signal.

Algorithms which minimize error in the frequency domain operate on blocks of data.

As such, they update filter coefficients at regular intervals, introducing a noise component

into the signal. In addition, they require more operations in order to transform the data. Al-

though we are most interested in minimizing the error of the magnitude response of the

transform, good time domain equalization will guarantee good frequency domain equaliza-

tion as well.

The recursive algorithms are more promising. They converge rapidly, and the weight

vector is updated every sample. These algorithms unfortunately have a heavy computation-

al load. Figure 2.16 gives the computational requirements of the most familiar versions of

the RLS algorithm [18].

REQUIRED COMPUTATIONS PER ALGORITHM ITERATION

ALGORITHM                           MULTIPLICATIONS & DIVISIONS    ADDITIONS    SQUARE ROOTS
Modified Fast Kalman (MFK)          5N + 13                        5N + 13
Fast Kalman                         8N + 5                         7N + 2
Growing Memory Covariance (GMC)     13N + 6                        12N + 1
Sliding Window Covariance (SWC)     13N + 13                       12N + 7
Normalized GMC                      12N + 16                       7N + 2       3
Normalized SWC                      23N                            11N          5N
Normalized PLS                      9N + 17                        5N + 2       2

figure 2.16 The computational complexity for several adaptive algorithms is displayed above.

All algorithms except the Kalman types require significantly more computations than the LMS algorithm (2N + 1 multiplications and 2N additions). On the basis of computational considerations, the algorithms seriously considered for the equalizer are reduced to these two families alone. The choice has been made to utilize a fast LMS method as the adaptation algorithm. Chapters three and four will investigate LMS algorithms in detail.
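As a hedged illustration of the method just selected (the update rule W <- W + 2μe_kX_k is developed in chapter three), the sketch below identifies a hypothetical 2-tap system; the plant, input sequence, and step size are invented for the example.

```python
# LMS adaptive filter sketch: identify a 2-tap FIR system from
# input/desired pairs using W <- W + 2*mu*e*X.
def lms(x, d, n_taps, mu):
    w = [0.0] * n_taps
    for k in range(n_taps - 1, len(x)):
        X = [x[k - i] for i in range(n_taps)]        # [x_k, x_{k-1}, ...]
        y = sum(wi * xi for wi, xi in zip(w, X))     # filter output
        e = d[k] - y                                 # error signal
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, X)]
    return w

# Hypothetical plant: d_k = 0.6*x_k - 0.3*x_{k-1}.
x = [((k * 7919) % 13 - 6) / 6.0 for k in range(400)]   # crude varied input
d = [0.0] + [0.6 * x[k] - 0.3 * x[k - 1] for k in range(1, 400)]
w = lms(x, d, 2, 0.05)   # w approaches [0.6, -0.3]
```

With noiseless data and a sufficiently small step size the weights converge geometrically toward the plant coefficients, in line with the convergence analysis of chapter three.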

CHAPTER 3

LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE

WITH UNCORRELATED INPUT DATA

The LMS algorithm converges quickly to an approximately optimum solution (in the

mean square error sense) given that the step size has been properly initialized. Because of

the low computational complexity of the LMS algorithm, it has been selected as the adap-

tive algorithm which will be implemented in the AP architecture. Because the LMS algo-

rithm is a gradient descent technique, there is a trade-off between convergence speed and

misadjustment. Misadjustment measures on average how close the LMS solution is to the

optimum solution, after algorithm convergence. A smaller step size results in a smaller

misadjustment. For fast convergence and low misadjustment an adaptive step size is re-

quired. At the beginning of a quasi-stationary segment of data, the value of the step size

must be initialized in such a way that convergence speed is maximized. In this stage of the

adaptive process the lack of convergence speed provides the major contribution to mean

square error, and misadjustment is unimportant. As the filter converges, the step size must

be reduced because of the increasing dominance of misadjustment.

Chapter three will investigate the optimal value at which the step size should be set at

the beginning of a quasi-stationary segment of acoustic data. The initialization problem has

been rigorously solved for uncorrelated input signals. To efficiently utilize the LMS algo-

rithm on music signals understanding must be extended to the case of correlated input data

(see figure 4.1). Chapter three will provide the necessary background information for this

extension. Chapter three first presents the Wiener-Hopf formulation of the minimization

of mean square error. The characteristics of the error surface are highlighted as part of this

discussion. As a result it is clear that a stochastic gradient descent algorithm is a feasible

approach to the minimization problem. The properties of gradient descent algorithms are

discussed with particular emphasis on the LMS algorithm and its properties. Finally, the

convergence speed of the mean square error for the LMS algorithm is minimized as a func-

tion of step size. The background information in this chapter is largely taken from the work

of Widrow and Stearns [19], and their notation will be used throughout.

The Wiener-Hopf Formulation

The Wiener-Hopf formulation [20] gives a method by which the filter weights of a tapped

delay (direct form) filter can be optimized when the input to the filter is a stationary sto-

chastic signal. The results hold for a quasi-stationary signal to a greater or lesser degree de-

pending on the precise statistical nature of the signals. The measure by which the filter

weights are optimized is the mean square error (MSE).

Because the Wiener-Hopf equation is the underpinning of all adaptive signal processing

algorithms, it will be reviewed in detail. The derivation of the Wiener filter will yield im-

portant information regarding the nature of the error surface. As a result it will be obvious

that gradient descent techniques will be applicable optimization algorithms. The derivation

of the Wiener solution is presented below.

Derivation of the Wiener Solution

Figure 2.15 shows a tapped delay line. The input sequence { xk}, is a stationary stochas-

tic signal. The output of the filter, yk, is simply the inner product of the n+1 most recent

inputs with their respective filter weights, for a filter of order n+1.

y_k = X_k^T W_k = [x_k  x_{k-1}  x_{k-2}  ...  x_{k-n}] [w_0  w_1  ...  w_n]^T     (3-1)

The error signal is generated by taking the difference between the desired signal, dk, and

the filter output.

e_k = d_k - y_k = d_k - X_k^T W_k     (3-2)

Square the error.

e_k^2 = (d_k - X_k^T W_k)^2 = d_k^2 - 2 d_k X_k^T W_k + X_k^T W_k W_k^T X_k     (3-3)

Take the expectation of both sides of the above equation.

E[e_k^2] = E[d_k^2] - 2 E[d_k X_k^T] W_k + E[X_k^T W_k W_k^T X_k]     (3-4)

By keeping W at a fixed value its subscript can be omitted. Furthermore note that X_k^T W W^T X_k = W^T X_k X_k^T W.

E[e_k^2] = E[d_k^2] - 2 E[d_k X_k^T] W + W^T E[X_k X_k^T] W     (3-5)

Define the expectation of the cross correlation vector as follows.

P_k = E[d_k X_k] = E[ [d_k x_k  d_k x_{k-1}  ...  d_k x_{k-n}]^T ]     (3-6)

Define the expectation of the correlation matrix as follows.

R_k = E[X_k X_k^T] = E | x_k x_k        x_k x_{k-1}       ...  x_k x_{k-n}      |
                       | x_{k-1} x_k    x_{k-1} x_{k-1}   ...  x_{k-1} x_{k-n}  |     (3-7)
                       | ...            ...               ...  ...              |
                       | x_{k-n} x_k    x_{k-n} x_{k-1}   ...  x_{k-n} x_{k-n}  |

The mean square error can now be expressed in a simpler notation. Let ξ = E[e_k^2].

ξ_k = E[d_k^2] - 2 P_k^T W + W^T R_k W     (3-8)

To minimize ξ, set dξ/dW to zero and solve for the optimum weights, W*.

dξ/dW = -2P + 2RW = 0     (3-9)

W* = R^{-1} P     (3-10)

The above equation is the expression of the Wiener-Hopf equations. Substituting equation 3-10 into equation 3-8 and making use of the symmetry of R results in an expression for the minimum value of ξ, ξ*.

ξ* = E[d_k^2] - 2 P^T R^{-1} P + (R^{-1}P)^T R (R^{-1}P)     (3-11)

   = E[d_k^2] - P^T W*     (3-12)

With a stationary signal for which the cross correlation vector and correlation matrix were perfectly determined, it would be possible to solve the Wiener-Hopf equations for W* and ξ*. Unfortunately, in engineering applications of interest neither of these conditions generally holds. The field of adaptive signal processing is concerned with estimating W* without the a priori information required by the Wiener-Hopf formulation.
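The Wiener solution W* = R^{-1}P can be spot-checked numerically for a toy two-tap case; the correlation statistics below are hypothetical, and the explicit 2x2 inverse stands in for a general matrix solve.

```python
# Wiener solution for a 2-tap filter: W* = R^{-1} P (equation 3-10),
# solved with an explicit 2x2 matrix inverse.
def wiener_2tap(R, P):
    (a, b), (c, d) = R
    det = a * d - b * c                   # R must be non-singular
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return [inv[0][0] * P[0] + inv[0][1] * P[1],
            inv[1][0] * P[0] + inv[1][1] * P[1]]

# Hypothetical correlation statistics.
R = [[1.0, 0.5], [0.5, 1.0]]     # E[X X^T]
P = [0.45, 0.0]                  # E[d X]
w_star = wiener_2tap(R, P)       # [0.6, -0.3]
```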

Error Surfaces

Equations 3-8 and 3-10 provide important information regarding the characteristics of the error surface, which is defined as ξ = f(W). The most critical result is the hyperparabolic nature of the surface. This is demonstrated by changing the coordinate system used in the expression for ξ. Let V = W - W* be the displacement of the filter weights from the optimal value. Express MSE in terms of this translated coordinate system, i.e. ξ = f(V). The transformation is accomplished as follows. Recall equation 3-8.

ξ = E[d_k^2] + W^T R W - 2 W^T P     (3-8)

Equation 3-13 is inspired by assuming the proper form of f(V) and demonstrating its equivalence to equation 3-8.

ξ = E[d_k^2] + W^T R W - 2 W^T R R^{-1} P + P^T R^{-1} R R^{-1} P - P^T R^{-1} P     (3-13)

Substitute W* for R^{-1}P and ξ* for E[d_k^2] - P^T R^{-1} P.

ξ = ξ* + W^T R W - 2 W^T R W* + P^T R^{-1} R W*     (3-14)

Because correlation matrices are Hermitian, (P^T R^{-1})^T = R^{-1} P.

ξ = ξ* + W^T R W - 2 W^T R W* + W*^T R W*     (3-15)

Since W^T R W* is a scalar, it must be equal to its own transpose. Again use the Hermitian property of R.

ξ = ξ* + W^T R W + W*^T R W* - W^T R W* - W*^T R W     (3-16)

Note that transposition is a linear operator.

ξ = ξ* + (W - W*)^T R (W - W*)     (3-17)

  = ξ* + V^T R V     (3-18)
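Equation 3-18 can be checked numerically: for a positive definite R, any displacement V away from the optimum raises ξ above ξ*. The matrix and ξ* below are hypothetical.

```python
# Error surface in translated coordinates (equation 3-18):
# xi = xi_star + V^T R V, minimized at V = 0 for positive definite R.
def xi(v, R, xi_star):
    quad = sum(v[i] * R[i][j] * v[j] for i in range(2) for j in range(2))
    return xi_star + quad

R = [[2.0, 0.5], [0.5, 1.0]]    # hypothetical positive definite correlation matrix
xi_star = 0.1
at_min = xi([0.0, 0.0], R, xi_star)   # equals xi_star, the minimum MSE
away = xi([0.3, -0.4], R, xi_star)    # strictly larger than xi_star
```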

The error surface is clearly a quadratic function of V. As a result, the following impor-

tant characteristics are obvious: 1) the error surface is a hyperparaboloid, 2) the error sur-

face must be concave upwards in order to avoid regions of negative MSE (a logical

impossibility), 3) the optimal set of filter weights are located at the bottom of the hyperpa-

raboloid, and 4) there is only one minimum (no local minima). These observations are crit-

ical in their implication that gradient search techniques are able to find the global minimum.

An error surface for both the translated and untranslated coordinates is shown below.

The Gradient Descent Algorithm


The gradient of a surface points in the direction in which the function describing the

surface will be most increased. A common sense approach to finding a surface minimum

is to move in a direction opposite to that of its gradient. This section will show rigorously

the convergence properties for a stationary signal. The trade-off between convergence

speed and misadjustment will be demonstrated explicitly. Finally, a description of the dif-

ficulty in using gradient descent techniques with quasi-stationary signals is discussed.

figure 3.1 The minimum of the error surface is at the origin of the

translated coordinate system. The parabolic nature of the

surface makes gradient descent algorithms feasible.

Let μ represent a variable which regulates the step size of the weight adjustment. Gradient descent (GD) methods can be generalized as follows.

W_{k+1} = W_k - μ ∇ξ_k     (3-19)

Figure 3.2 graphically demonstrates how GD methods update themselves on the basis of

past information.


figure 3.2 Gradient descent methods find the error surface minimum by moving in

a direction opposite to that of the error surface gradient.

The trajectory of the filter weights is the projection of the descent along the error surface onto the (n+1)-dimensional weight space. For a hyperparaboloid the approach is intuitive: the greater the distance from the surface minimum, the larger the gradient, and hence, the larger the weight updates. At the surface minimum the gradient is zero, and no further adjustment is made. A more rigorous justification follows.

Algorithm Convergence

Recall from equation 3-9 that the gradient is expressed as ∇ξ = -2P + 2RW. Substituting this into equation 3-19 yields the following.

W_{k+1} = W_k + μ(2P - 2RW_k)     (3-20)

        = W_k + 2μR(R^{-1}P - W_k)     (3-21)

Substitute the solution of the Wiener-Hopf equation (equation 3-10) into 3-21.

W_{k+1} = W_k + 2μR(W* - W_k) = (I - 2μR)W_k + 2μRW*     (3-22)

Switch to the translated coordinate system by subtracting W* from both sides of equation 3-22.

W_{k+1} - W* = (I - 2μR)W_k + 2μRW* - W*     (3-23)

             = (I - 2μR)(W_k - W*)

V_{k+1} = (I - 2μR)V_k     (3-24)

Perform a unitary transformation on R. Let Λ be the diagonal matrix of eigenvalues of R, and let Q be the matrix in which the columns represent the corresponding eigenvectors. This transformation assumes R is not singular.

R = QΛQ^{-1}, for Λ = diag(λ_0, λ_1, ..., λ_n) and Q = [q_0  q_1  ...  q_n]

q_i is the eigenvector associated with λ_i.
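For a symmetric 2x2 correlation matrix the eigenvalues of the decomposition R = QΛQ^{-1} have a closed form, sketched below with hypothetical matrix entries.

```python
import math

# Eigenvalues of a symmetric 2x2 correlation matrix R = [[a, b], [b, c]]:
# these are the lambda_i on the diagonal of Lambda in R = Q Lambda Q^{-1}.
def eigenvalues_2x2(a, b, c):
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean - radius, mean + radius

lam_min, lam_max = eigenvalues_2x2(1.0, 0.5, 1.0)   # 0.5 and 1.5
```

The spread between lam_min and lam_max is what governs the convergence-speed limits discussed in the following sections.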

Substitute the above expression for R into equation 3-24.

V_{k+1} = (I - 2μQΛQ^{-1}) V_k     (3-25)

Note that the following transformation has the effect of decoupling the weight vectors in the translated coordinate space, i.e. the transformation causes the new coordinates (principal coordinate system) to be colinear with the eigenvectors of the error surface.

V' = Q^{-1} V     (3-26)

With the above transformation the coordinates are aligned with the principal axes of the hyperparaboloid. This can be shown as follows.

ξ = ξ* + V^T R V     (3-27)

  = ξ* + V^T (QΛQ^{-1}) V     (3-28)

  = ξ* + (Q^{-1}V)^T Λ (Q^{-1}V)     (3-29)

  = ξ* + V'^T Λ V'     (3-30)

The transformation is illustrated in figure 3.3. Returning to the convergence proof, substitute equation 3-26 into equation 3-25.

Q V'_{k+1} = (I - 2μQΛQ^{-1}) Q V'_k     (3-31)

Premultiply by Q^{-1}.

figure 3.3 The principal coordinate system has axes which lie along the principal axes of the hyperparabolic error surface.

V'_{k+1} = Q^{-1}(I - 2μQΛQ^{-1}) Q V'_k = (I - 2μΛ) V'_k     (3-32)

By induction V'_k can be expressed in terms of V'_0.

V'_k = (I - 2μΛ)^k V'_0     (3-33)

The update equations show their decoupled nature more clearly in matrix notation.

V'_k = diag( (1 - 2μλ_0)^k, (1 - 2μλ_1)^k, ..., (1 - 2μλ_n)^k ) V'_0     (3-34)

The convergence condition can be seen from equation 3-34. If 0 < μ < 1/λ_max, then with perfect knowledge of the gradient vector, the weight vector will converge to the optimal solution.

lim_{k→∞} V'_k = 0, for 0 < μ < 1/λ_max     (3-35)
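The condition 0 < μ < 1/λ_max can be illustrated with a single-mode (scalar) version of equation 3-34; the eigenvalue and initial error below are hypothetical.

```python
# Scalar form of equation 3-34: v_k = (1 - 2*mu*lam)**k * v_0.
# The weight error shrinks iff |1 - 2*mu*lam| < 1, i.e. 0 < mu < 1/lam.
def weight_error(v0, mu, lam, k):
    return (1.0 - 2.0 * mu * lam) ** k * v0

lam, v0 = 2.0, 1.0
small = abs(weight_error(v0, 0.2, lam, 50))   # mu < 1/lam: error decays
large = abs(weight_error(v0, 0.6, lam, 50))   # mu > 1/lam: error diverges
```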

Convergence Speed

The speed of convergence is limited by the value of the step size, and hence, the largest eigenvalue. Consider convergence along coordinate j. A geometric convergence factor can be defined as follows.

r_j = (1 - 2μλ_j)     (3-36)

Recall the power series for an exponential.

exp(x) = 1 + x + x^2/2! + x^3/3! + ...     (3-37)

Let τ_j = 1/(2μλ_j). Then τ_j is approximately the e-folding time of the adaptation process, as can be seen by truncating the exponential power series after the first two terms.

r_j = (1 - 1/τ_j) ≈ exp(-1/τ_j)     (3-38)

The convergence of ξ to ξ* is represented by a learning curve which can be approximated as the product of exponential terms.

learning curve = (ξ_0 - ξ*) e^{-2k/τ_0} e^{-2k/τ_1} ... e^{-2k/τ_n}     (3-39)

Learning is limited by the largest e-folding time, which is directly related to the smallest eigenvalue and the magnitude of the step size.

Excess Mean Square Error

In the discussion it has been assumed ∇ξ is known. In fact it is not. Adjustments must be made in the above equations to account for a noisy gradient estimate. Let ∇̂ξ be the estimate of ∇ξ.

W_{k+1} = W_k - μ∇̂ξ_k     (3-40)

Switch to the translated coordinate system.

V_{k+1} = V_k - μ∇̂ξ_k     (3-41)

The estimate of the gradient can be expressed as the sum of the gradient plus a noise term.

∇̂ξ_k = ∇ξ_k + N_k     (3-42)

From equations 3-9 and 3-42 we conclude the following.

∇̂ξ_k = 2RV_k + N_k     (3-43)

Substitute equation 3-43 into equation 3-41.

V_{k+1} = V_k - μ(2RV_k + N_k)     (3-44)

        = (I - 2μR)V_k - μN_k     (3-45)

Switch to the principal coordinate system.

Q V'_{k+1} = (I - 2μR)Q V'_k - μN_k     (3-46)

V'_{k+1} = Q^{-1}(I - 2μR)Q V'_k - μQ^{-1}N_k = (I - 2μΛ)V'_k - μQ^{-1}N_k

Let Q^{-1}N_k = N'_k be the gradient noise projected onto the principal axes.

V'_{k+1} = (I - 2μΛ)V'_k - μN'_k     (3-47)

Note that equation 3-47 now represents a set of uncoupled difference equations. The matrix formulation is given below.

V'_{k+1} = diag(1 - 2μλ_0, 1 - 2μλ_1, ..., 1 - 2μλ_n) V'_k - μ N'_k     (3-48)

By induction V'_k can be expressed in terms of V'_0.

V'_k = (I - 2μΛ)^k V'_0 - μ Σ_{j=0}^{k-1} (I - 2μΛ)^j N'_{k-j-1}     (3-49)

Thus, with gradient noise included, V'_k does not approach zero as k → ∞.

lim_{k→∞} V'_k = -μ Σ_{j=0}^{k-1} (I - 2μΛ)^j N'_{k-j-1}     (3-50)

The degree of suboptimality is determined by the set of eigenvalues, {λ_j}, the statistical characteristics of N', and the step size, μ.

The difference between the minimum MSE and the average MSE is defined as the excess mean square error. Excess MSE provides a measure of the difference between the actual and the optimal performance, averaged over time, and is expressed as follows.

Excess MSE = E[ξ - ξ*] = E[V'_k^T Λ V'_k]     (3-51)

Recall that after adaptation V'_k = -μ Σ_{j=0}^{k-1} (I - 2μΛ)^j N'_{k-j-1}. Substitute this into equation 3-51.

Excess MSE = E[ ( -μ Σ_{j} (I - 2μΛ)^j N'_{k-j-1} )^T Λ ( -μ Σ_{i} (I - 2μΛ)^i N'_{k-i-1} ) ]

           = μ^2 Σ_i Σ_j E[ N'^T_{k-j-1} (I - 2μΛ)^j Λ (I - 2μΛ)^i N'_{k-i-1} ]     (3-52)

Since (I - 2μΛ)^j and Λ are both diagonal, they can be commuted.

Excess MSE = μ^2 Σ_i Σ_j E[ N'^T_{k-j-1} Λ (I - 2μΛ)^{i+j} N'_{k-i-1} ]     (3-53)

Assume that N' results from independent errors. Then for all i ≠ j there will be no contribution to excess MSE.

Excess MSE = μ^2 Σ_j E[ N'^T_k Λ (I - 2μΛ)^{2j} N'_k ]     (3-54)

           = μ^2 E[ N'^T_k Λ Σ_j (I - 4μΛ + 4μ^2Λ^2)^j N'_k ]     (3-55)

Since I - 4μΛ + 4μ^2Λ^2 is a diagonal matrix, the convergent infinite series may be simplified as follows.

Σ_{j=0}^{∞} (I - 4μΛ + 4μ^2Λ^2)^j = (4μΛ - 4μ^2Λ^2)^{-1}     (3-56)

Substitute this expression into equation 3-55.

Excess MSE = μ^2 E[ N'^T_k Λ (4μΛ - 4μ^2Λ^2)^{-1} N'_k ]

           = (μ/4) E[ N'^T_k (I - μΛ)^{-1} N'_k ]     (3-57)

In matrix form the decoupled nature of equation 3-57 is evident.

Excess MSE = (μ/4) E[ [n'_{k,0}  n'_{k,1}  ...  n'_{k,n}] diag( 1/(1-μλ_0), 1/(1-μλ_1), ..., 1/(1-μλ_n) ) [n'_{k,0}  n'_{k,1}  ...  n'_{k,n}]^T ]     (3-58)

The matrix equation can be further simplified as follows.

Excess MSE = (μ/4) Σ_j E[ n'^2_{k,j} / (1 - μλ_j) ]     (3-59)

Equations 3-36, 3-38, 3-39, and 3-59 provide a clear indication of the trade-off between misadjustment and convergence speed. The smaller the step size, the smaller the misadjustment. Below a certain upper bound, however, convergence speed is slowed. Figure 3.4 illustrates the implications of equations 3-60 and 3-61.

lim_{μ→0} Excess MSE = 0     (3-60)

lim_{μ→0} τ = ∞     (3-61)
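The trade-off in equations 3-60 and 3-61 can be tabulated for a single mode; the eigenvalue and gradient-noise power below are hypothetical, and the excess MSE uses the scalar form of equation 3-59.

```python
# Scalar trade-off: the time constant tau = 1/(2*mu*lam) grows as mu
# shrinks, while excess MSE = (mu/4)*E[n^2]/(1 - mu*lam) shrinks with mu.
def time_constant(mu, lam):
    return 1.0 / (2.0 * mu * lam)

def excess_mse(mu, lam, noise_power):
    return (mu / 4.0) * noise_power / (1.0 - mu * lam)

lam, n2 = 1.0, 1.0
fast = (time_constant(0.1, lam), excess_mse(0.1, lam, n2))    # quick, noisy
slow = (time_constant(0.01, lam), excess_mse(0.01, lam, n2))  # slow, precise
```

The smaller step size converges ten times more slowly but leaves roughly a tenth of the excess MSE, which is why the equalizer design calls for a varying step size.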

Adaptation in a Quasi-Stationary Environment

The adaptation process for quasi-stationary input data, while conceptually simple to understand, is difficult to describe analytically. The optimal weight vector is now a function of time, i.e. W* = f(k), as the error surface changes shape and moves through an (n+1)-dimensional space for a filter of order n. Widrow has given an analytical description of the optimal step size for the LMS by examining the case of modelling an unknown system by an LMS tap delay filter [21]. The unknown system is assumed to be a first order Markov process with ξ* constrained to be a constant.

The more general problem for step size optimization is unsolved. Since the weight vec-

tors are adapted on the basis of past information, and the error surface is changing shape

and moving with time, the weight vector error results from two contributions. The first

contribution results from noise in the estimation of the error surface gradient. The second

contribution to error results from estimating the gradient of the error surface at iteration k

using the error surface at iteration k-1. Note the formulation for the filter weights for the

non-stationary case, in the translated coordinate system.

V_k = W_k - W*_k = (W_k - E[W_k]) + (E[W_k] - W*_k)     (3-62)

Substitution of the above into equation 3-18 yields the following.

ξ_{k+1} = ξ* + (W_k - E[W_k])^T R_k (W_k - E[W_k]) +
          (E[W_k] - W*_k)^T R_k (E[W_k] - W*_k) +
          2 E[ (W_k - E[W_k])^T R_k (E[W_k] - W*_k) ]     (3-63)

Expanding the third term in equation 3-63 and recognizing that W*_k is constant over an ensemble, the third term reduces to zero, and ξ_{k+1} reduces to the following.

ξ_{k+1} = ξ* + (error from gradient noise) + (error from moving error surface)

        = ξ* + (W_k - E[W_k])^T R_k (W_k - E[W_k]) + (E[W_k] - W*_k)^T R_k (E[W_k] - W*_k)     (3-64)

Equation 3-64 is as far as one can go without usually unavailable a priori information

concerning the stationarity characteristics of the signal. A graphic illustration of the operation of the LMS algorithm with non-stationary input data is shown in figure 3.5.

figure 3.4 The larger step size results in faster convergence, but larger excess MSE. For this reason the equalizer design will utilize a varying step size.

figure 3.5 For quasi-stationary adaptation there are two contributions to error in the gradient: 1) the error induced by the typical form of gradient noise, and 2) the error induced by error surface lag, i.e. the error induced by using information from a prior error surface instead of the current error surface. In the figure above the highlighted update, ΔW_{k+1}, is based on information concerning ξ_k instead of ξ_{k+1}.

The LMS Algorithm with Uncorrelated Input Data

The best known and most widely used gradient method for stochastic signals in adap-

tive signal processing is the LMS algorithm[22]. The method is well-understood for sta-

tionary uncorrelated input data. The method can be extended to the quasi-stationary case,

although the analytic description loses much of its simplicity and is possible only for the

most simple models. This section will provide the theoretical background necessary for a

discussion on the LMS algorithm with correlated input data.

Stationary Model for Uncorrelated Input

Recall the general form of the gradient descent algorithms, W_{k+1} = W_k - μ∇ξ_k. The LMS algorithm makes a crude estimate of ∇ξ which nevertheless is quite effective. Let ∇̂ξ represent the estimate of ∇ξ.

Gradient Descent:   ∇ξ = ∇E[e_k^2] = 2E[e_k ∂e_k/∂W] = -2E[e_k X_k]     (3-65)

LMS Approximation:  ∇̂ξ = -2 e_k X_k     (3-66)

The LMS approximation replaces the ensemble estimate of ∇ξ with a time average. As the algorithm operates over several iterations, on average the gradient estimate will be in the correct direction. The LMS descent is illustrated in figure 3.6.

figure 3.6 The LMS algorithm approximates the gradient by replacing the ensemble average of the gradient descent with a time average. In the time average the LMS gradient will point in the direction of the surface gradient.

It is a simple matter to demonstrate that the LMS estimate is unbiased.

E[∇̂ξ] = E[-2 e_k X_k]     (3-67)

       = -2 E[X_k (d_k - X_k^T W_k)]     (3-68)

       = 2 R_k W_k - 2 P_k     (3-69)

       = ∇ξ     (3-70)

If the convergence condition expressed in equation 3-35 holds, the LMS algorithm will converge to W* in the mean.

lim_{k→∞} W_k = W* in the mean, for 0 < μ < 1/λ_max     (3-71)

Consider next the variance of the estimate. Assume that after k iterations W_k has converged. Then the gradient estimate is just the noise process N_k.

∇̂ξ = ∇ξ + N_k = 0 + N_k     (3-72)

    = N_k     (3-73)

For j > k, N_j = -2 e_j X_j. After convergence e_j and X_j are approximately independent.

cov[∇̂ξ] = cov[N_j] = E[N_j N_j^T]     (3-74)

         = 4 E[e_j^2 X_j X_j^T] = 4 E[e_j^2] E[X_j X_j^T]     (3-75)

         = 4 ξ* R     (3-76)

The above does not imply that the variance of the estimate remains finite, or that the mean

square error remains finite. A considerably more stringent requirement on step size is nec-

essary, and is discussed below.

Conditions on μ for Finite Variance of the Square Error

Horowitz and Senne [2] have studied the convergence properties of the LMS mean square error for the case of Gaussian, zero-mean, uncorrelated data. Let λ̄ = [λ_0 λ_1 ... λ_n]^T and let α_k be the vector of diagonal elements of E[V'_k V'^T_k].

ξ_k = E[(d_k - X_k^T W_k)^2] = E[(d_k - X_k^T W* - X_k^T (W_k - W*))^2]

    = ξ* - 2 E[X_k^T (W_k - W*)(d_k - X_k^T W*)] + E[(W_k - W*)^T X_k X_k^T (W_k - W*)]     (3-77)

Because {x_i} is Gaussian and uncorrelated, X_k and W_k are statistically independent. Furthermore, E[X_k (d_k - X_k^T W*)] = P_k - R_k W* = 0. Thus X_k can be factored out of the second term of equation 3-77, and since X_k is zero mean, the term vanishes altogether.

ξ_k = ξ* + E[(W_k - W*)^T X_k X_k^T (W_k - W*)]     (3-78)

Since X_k is uncorrelated, E[X_k X_k^T] is a diagonal matrix and ξ can be expressed as follows.

ξ_k = ξ* + trace{ E[(W_k - W*)(W_k - W*)^T] E[X_k X_k^T] }     (3-79)

    = ξ* + trace{ E[V_k V_k^T] E[X_k X_k^T] }     (3-80)

    = ξ* + λ̄^T α_k     (3-81)

Horowitz and Senne next develop a recursive relationship for ξ_k from the definition of the LMS algorithm.

W_{k+1} = [I - 2μ X_k X_k^T] W_k + 2μ d_k X_k     (3-82)

Subtract W* and perform a unitary transformation. Let e*_k = d_k - X_k^T W*.

V'_{k+1} = [I - 2μ X'_k X'^T_k] V'_k + 2μ e*_k X'_k     (3-83)

Take the expectation of V' multiplied by its transpose. Because of the independence of e*_k and X'_k, the expectation of their cross terms is zero.

E[V'_{k+1} V'^T_{k+1}] = E[V'_k V'^T_k] - 4μΛ E[V'_k V'^T_k] + 4μ^2 E[X'_k X'^T_k V'_k V'^T_k X'_k X'^T_k] + 4μ^2 ξ* Λ     (3-84)

Apply the Gaussian factoring theorem to the third term of equation 3-84.

E[X'_k X'^T_k V'_k V'^T_k X'_k X'^T_k] = 2Λ E[V'_k V'^T_k] Λ + trace{ Λ E[V'_k V'^T_k] } Λ     (3-85)

Substitute the decomposed moment into equation 3-84.

E[V'_{k+1} V'^T_{k+1}] = E[V'_k V'^T_k] - 4μΛ E[V'_k V'^T_k] + 8μ^2 Λ E[V'_k V'^T_k] Λ + 4μ^2 trace{ Λ E[V'_k V'^T_k] } Λ + 4μ^2 ξ* Λ     (3-86)

Let C_k = E[V'_k V'^T_k]. The diagonal of C_k can be decomposed as follows.

C_{k+1}[i,i] = (1 - 4μλ_i + 8μ^2 λ_i^2) C_k[i,i] + 4μ^2 λ_i Σ_{p=0}^{n} λ_p C_k[p,p] + 4μ^2 ξ* λ_i     (3-87)

A recursive formulation is now possible for α_k. Let F[i,i] = 1 - 4μλ_i + 12μ^2 λ_i^2 and F[i,j] = 4μ^2 λ_i λ_j.

α_{k+1} = F α_k + 4μ^2 ξ* λ̄     (3-88)

With equation 3-88 a recursive expression is available for ξ.

ξ_{k+1} = ξ* + λ̄^T [F α_k + 4μ^2 ξ* λ̄]     (3-89)

Horowitz and Senne develop convergence conditions which can be seen most easily from the matrix formulation of equation 3-89.

ξ_{k+1} = ξ* + λ̄^T α_{k+1}, with α_{k+1} = F α_k + 4μ^2 ξ* λ̄     (3-90)

For every row of F, Σ_j F[i,j] < 1 when μ is sufficiently small (0 < μ < 1/(3λ_max) in the single-mode case). This condition on μ will guarantee finite mean square error.

Conditions on μ for Maximum Convergence Speed and Finite Variance of the Square Error

Feuer and Weinstein [3] have developed an expression for step size which will optimize convergence speed. The expression is a function of generally unavailable a priori information, i.e. all eigenvalues and the initial misadjustment along each of the principal coordinates of the error surface. Given no information concerning α_0 it is assumed that all of its elements are equal. Horowitz and Senne note that the convergence rate of ξ is approximately optimized (given no further a priori information) by setting μ as follows.

d/dμ (1 - 4μλ_max + 12μ^2 λ_max^2) = 0     (3-91)

μ* = (6 λ_max)^{-1}     (3-92)
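Under this approximation the initial step size reduces to a one-line computation; the eigenvalues below are hypothetical.

```python
# Approximately optimal initial step size (equation 3-92):
# mu* = 1/(6*lambda_max), obtained by minimizing 1 - 4*mu*lam + 12*(mu*lam)**2.
def initial_step_size(eigenvalues):
    lam_max = max(eigenvalues)
    return 1.0 / (6.0 * lam_max)

mu_star = initial_step_size([0.1, 0.5, 2.0])   # 1/(6*2.0) = 1/12
```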

CHAPTER 4

LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE

WITH CORRELATED INPUT DATA

Music signals are not uncorrelated, as can be seen from the autocorrelation functions for several musical instruments shown in figure 4.1. To meaningfully discuss step size initialization for an adaptive acoustic equalizer, the understanding of LMS convergence properties must be extended to the case of correlated input data. Even though the adaptive step size will be optimized for fast convergence only when a large change occurs in the error surface, and hence the most recent data sample will be largely uncorrelated with the previous data, the tapped delay line will have an input vector which, for order n+1, still shares n components with the previous vector.

Convergence of LMS with Correlated Input

Several mathematicians and engineers have investigated a completely general conver-

gence proof for the LMS algorithm without success. To date no satisfactory proof has been

put forward guaranteeing lim (kh < M, VA < .max. The difficulty in a rigorous proof lies

k -.o max

in the assumption of strong correlation, i.e. VT 2 0, E [XtXt +] > 0. The theoretical work

done in this area is mathematically sophisticated. Fortunately, Macchi and Eweda [23]

have provided a discussion of the theoretical contributions to convergence problems which

are here summarized.

Ljung [24] has demonstrated convergence almost everywhere for W_k under the condition that the step size is a decreasing sequence tending to zero. He creates a non-divergence criterion by erecting a suitable barrier on V_k which will always reflect V_k onto a random compact set. However, he has not shown that lim_{k→∞} W_k = W*. Daniell [25] has shown that ξ can be made arbitrarily small by choosing μ sufficiently small. However, he must use the assumption of uniformly asymptotically independent observations. In addition he makes the restrictive assumption that the conditional moments of observations, given past

observations, are uniformly bounded. This condition is not satisfied even for the case of

1.0

0.4

0.6

0.2 0.4 0.6 0.8 1.0

I (msec)

figure 4.1 The autocorrelation functions are given above for several

different instruments [6]. a) organ music (Bach);

b)symphony orchestra (Glasunov); c)cembalo music

(Frescobaldi); d) singing with piano accompaniment

(Grieg), and e)speech (male voice).

Gaussian input. Farden [26] has found a bound on $\xi$ by making the reasonable assumption of a decreasing autocorrelation function. He further makes use of reflecting barriers for $V_k$ to keep $V_k$ in a compact set. Macchi and Eweda [23] find a bound on $\xi$ by assuming that only blocks of M samples are correlated and that all moments are bounded. They do not make use of reflecting boundaries.

The proofs referred to above have assumptions which are representative of acoustic signals. Acoustic signals do generally have decreasing autocorrelation functions and, for a sufficient delay, may be assumed to be uncorrelated. In order to determine the step size which will minimize $\xi$ at the beginning of a quasi-stationary segment of data, the work of Horowitz and Senne, and of Feuer and Weinstein, is extended without the assumption of uncorrelated input. The mathematical expressions which result are simplified, and conditions on $\mu$ for maximum convergence speed are found. Simulations are subsequently carried out for different degrees of correlation, step size, and filter order.
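Simulations of this kind can be sketched in a few lines; the following is a minimal illustration in which an AR(1) input with coefficient a sets the degree of correlation of the tapped-delay-line input. All parameter values, and the function name itself, are assumptions for illustration, not the settings used in this work.

```python
import numpy as np

# Hypothetical sketch: LMS system identification with a tunable degree
# of input correlation (AR(1) coefficient a), step size mu, and filter
# order.  Returns the mean squared error averaged over the final quarter
# of the run.
def run_lms(a, mu, taps, steps=4000, seed=0):
    rng = np.random.default_rng(seed)
    # AR(1) input: x_k = a*x_{k-1} + w_k; larger |a| means stronger correlation
    w = rng.standard_normal(steps + taps)
    x = np.zeros_like(w)
    for k in range(1, len(w)):
        x[k] = a * x[k - 1] + w[k]
    W_true = rng.standard_normal(taps)      # unknown system W* (illustrative)
    W = np.zeros(taps)
    mse, tail = 0.0, steps // 4
    for k in range(taps, steps + taps):
        X = x[k - taps + 1 : k + 1][::-1]   # tapped delay line vector
        e = W_true @ X - W @ X              # error against the true system
        W = W + 2 * mu * e * X              # LMS weight update
        if k >= steps + taps - tail:
            mse += e * e / tail
    return mse

print(run_lms(a=0.0, mu=0.01, taps=4))      # uncorrelated input
print(run_lms(a=0.9, mu=0.01, taps=4))      # strongly correlated input
```

As expected from the analysis, stronger input correlation spreads the eigenvalues of the input correlation matrix and slows the slowest convergence mode for a fixed step size.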

Conditions on $\mu$ for Convergence of $\xi$

Adding correlation to the input data significantly complicates the expression of $\xi$ for the LMS algorithm. The approach developed below formulates a recursive expression for $V_k$ in terms of $V_0$. The expression is substituted into the equation for $\xi$ (equation 3-79). With the assumption that $e^*_k$ is small, cross terms including $e^*_k$ can be ignored. This results in a matrix equation for $\xi$ in terms of the initial conditions of the algorithm. The matrix norm is investigated numerically as a function of $\mu$. These results are then compared with large ensemble averages of $\xi$ computed with the LMS algorithm. To begin, recall the expression for the LMS algorithm (equation 3-86) formulated in the translated coordinate system.

$$V_{k+1} = \left(I - 2\mu X_k X_k^T\right) V_k + 2\mu e^*_k X_k \qquad (4\text{-}1)$$
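The behavior of this update can be sketched with a short simulation; the sizes, step size, and noise level below are illustrative assumptions, not values from the text.

```python
import numpy as np

# Minimal sketch of the translated-coordinate LMS update (equation 4-1):
#   V_{k+1} = (I - 2*mu*X_k X_k^T) V_k + 2*mu*e*_k X_k,
# where V_k = W_k - W* is the weight misadjustment and e*_k the small
# optimal (irreducible) error.  All constants here are illustrative.
rng = np.random.default_rng(1)
taps, mu, steps = 4, 0.01, 2000
V = np.ones(taps)                           # V_0: initial misadjustment
I = np.eye(taps)
for _ in range(steps):
    X = rng.standard_normal(taps)           # white input vector
    e_star = 0.01 * rng.standard_normal()   # small optimal error
    V = (I - 2 * mu * np.outer(X, X)) @ V + 2 * mu * e_star * X

print(np.linalg.norm(V))   # misadjustment has contracted toward the noise floor
```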

Recursively formulate $V_k$ in terms of $V_0$ using equation 4-1.

$$V_k = \prod_{j=1}^{k}\left[I - 2\mu X_{k-j} X_{k-j}^T\right] V_0 \qquad \text{(term 1)}$$
$$\phantom{V_k =}\; + \; 2\mu \sum_{i=1}^{k-1} \left\{ \prod_{j=1}^{k-i} \left[I - 2\mu X_{k-j} X_{k-j}^T\right] \right\} e^*_{i-1} X_{i-1} \qquad \text{(term 2)}$$
$$\phantom{V_k =}\; + \; 2\mu e^*_{k-1} X_{k-1} \qquad \text{(term 3)} \qquad (4\text{-}2)$$
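The unrolled expression can be checked numerically against direct iteration of the recursion; the sketch below (with illustrative sizes) verifies that the product term applied to $V_0$, the sum over past driving terms, and the final driving term together reproduce $V_k$.

```python
import numpy as np

# Numerical check (illustrative sizes) that the unrolled form of V_k
# equals k iterations of V_{k+1} = A_k V_k + 2*mu*e*_k X_k,
# with A_m = I - 2*mu*X_m X_m^T.
rng = np.random.default_rng(2)
taps, mu, k = 3, 0.05, 6
I = np.eye(taps)
X = rng.standard_normal((k, taps))          # X_0 ... X_{k-1}
e_star = rng.standard_normal(k)             # e*_0 ... e*_{k-1}
A = [I - 2 * mu * np.outer(X[m], X[m]) for m in range(k)]
V0 = rng.standard_normal(taps)

# Iterate the recursion k times.
V = V0.copy()
for m in range(k):
    V = A[m] @ V + 2 * mu * e_star[m] * X[m]

def prod(ms):
    """Ordered matrix product A[ms[0]] @ A[ms[1]] @ ..."""
    P = I.copy()
    for m in ms:
        P = P @ A[m]
    return P

term1 = prod(range(k - 1, -1, -1)) @ V0                 # product of all A_m times V_0
term2 = sum((prod(range(k - 1, i - 1, -1))              # A_{k-1}...A_i applied to
             @ (2 * mu * e_star[i - 1] * X[i - 1])      # the (i-1)th driving term
             for i in range(1, k)), np.zeros(taps))
term3 = 2 * mu * e_star[k - 1] * X[k - 1]               # final driving term
print(np.allclose(V, term1 + term2 + term3))            # True
```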