A NEW ADAPTIVE ALGORITHM FOR THE REAL-TIME EQUALIZATION
OF ACOUSTIC FIELDS
BY
JEFFREY JAMES SPAULDING
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1992
Dedicated to
my Lord Jesus Christ
through Mary, Queen of Virgins,
in partial reparation of sins.
ACKNOWLEDGMENTS
I would like to express my gratitude to my advisor, Dr. Jose Principe, for his encouragement and support through the course of my studies and research. I would also like to express my gratitude to my supervisory committee members, Professor Taylor, Professor Childers, Professor Green, and Professor Siebein.
My work would not have been possible without the help and support of the students of the Computational Neuroengineering Laboratory. I am deeply appreciative. In addition I would like to express my thanks to Dr. Richard Henry and Mr. Phil Brink of the United States Air Force for their constant support.
I would like to thank my friends and family. It has been difficult to have not had the time to devote to the people I love. I appreciate their understanding. I wish to especially thank my grandmother, Mrs. Ruth Moore, whose bequest made my studies possible.
Finally, I wish to thank my mother and father without whom I would never have had the strength or courage to complete my program.
TABLE OF CONTENTS
ACKNOWLEDGMENTS ........................................................ i
ABSTRACT .............................................................. vi

CHAPTERS

1 BACKGROUND ........................................................... 1
    Introduction ...................................................... 1
    Basic Physics of Room Acoustics ................................... 4
    Psychoacoustics .................................................. 23
    Consequences of Room Acoustics and Psychoacoustics in the
        Solution of the Inverse Problem .............................. 34

2 AN EQUALIZATION STRATEGY FOR REAL-TIME SELF-ADJUSTMENT ............. 39
    Introduction ..................................................... 39
    State-of-the-Art Equalizers ...................................... 41
    The Adkins-Principe Equalizer Architecture ....................... 48
    Automatic Adjustment Using Adaptive Filters ...................... 57

3 LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE WITH
  UNCORRELATED INPUT ................................................. 63
    The Wiener-Hopf Solution ......................................... 64
    The Gradient Descent Algorithm ................................... 68
    The LMS Algorithm with Uncorrelated Input ........................ 84

4 LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE WITH
  CORRELATED INPUT ................................................... 91
    Convergence of the LMS Algorithm with Correlated Input ........... 91
    Conditions on the Step Size for Convergence ...................... 93
    Conditions on the Step Size for High Convergence Speed .......... 101
    Experimental Results ............................................ 102
    Conclusions ..................................................... 108

5 DETERMINATION OF λmax WITH A DIVERGENT LMS (DLMS) ALGORITHM ...... 115
    Determination of λmax with a Divergent Gradient Descent
        Algorithm ................................................... 115
    Parameter Selection for the Divergent Gradient Descent (DGD)
        Algorithm ................................................... 118
    Determination of λmax with a Divergent LMS Algorithm ............ 120
    Parameter Selection for the Divergent LMS Algorithm ............. 128
    Integration of the DLMS Algorithm into the Equalizer
        Architecture ................................................ 139
    Summary and Conclusions ......................................... 143

6 VALIDATION OF CONCEPTS ............................................ 146
    Test Plan ....................................................... 146
    Test Results .................................................... 158

7 CONCLUSIONS AND RECOMMENDED EXTENSIONS OF RESEARCH ................ 199
    Review .......................................................... 199
    Recommended Extensions to this Research ......................... 205

APPENDIX
    ECVT Plots of Contiguous Epochs of Audio Data ................... 208

REFERENCES .......................................................... 218
BIOGRAPHICAL SKETCH ................................................. 220
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
A NEW ADAPTIVE ALGORITHM FOR THE REAL-TIME EQUALIZATION OF ACOUSTIC FIELDS
By
Jeffrey James Spaulding
May 1992
Chairman: Dr. Jose C. Principe
Major Department: Electrical Engineering
This dissertation presents a solution for the problem of acoustic equalization. Acoustic data are collected from a microphone at a listener location and compared with a source signal (a CD player). An adaptive signal processing algorithm, based on the Least Mean Squares (LMS) algorithm, warps the CD signal to account for the filtering effects of the listening room. The algorithm that adapts the coefficients of a multirate equalizer is computationally efficient and can be performed in real-time with current microprocessor technology.
As music is nonstationary, the LMS algorithm will need to undergo rapid convergence to a new set of optimal filter coefficients each time the input signal statistics change. As the LMS is a gradient descent algorithm, fast convergence implies a step size selection which operates at the algorithm's edge of stability. An analytic expression exists to determine the step size for uncorrelated input data as a function of the maximum eigenvalue of the input autocorrelation matrix. This dissertation extends this work to cover
the case of correlated signals such as music. This result provides verification of heuristic rules that have been proposed in the literature.
To date, formulations involving the maximum input signal power have been utilized as an estimate of the maximum eigenvalue. In this dissertation a new method of determining the maximum eigenvalue is proposed by allowing the LMS algorithm to diverge. For a large and consistent domain of initial conditions, iterations, and purposely divergent step sizes, the maximum eigenvalue dominates the rate of algorithm divergence. Simulations were pursued to determine the bounds of these variables for robust operation. This methodology was utilized to analyze the performance of the adaptive equalizer on selected music epochs, and to validate the theory put forward in this dissertation.
CHAPTER 1
BACKGROUND
Introduction
For millennia architects have designed structures with the intent of minimizing distortion of acoustic signals propagating from a speaker or a musical instrument to a listening audience. From the Greek amphitheaters of antiquity to the modern concert halls of today, technologies developed over the centuries have resulted in sophisticated techniques which have greatly improved acoustic fidelity. Only in this century has technology been sufficiently advanced to approach the problem of acoustic distortion from the perspective of equalization. Equalization anticipates the filtering introduced by the enclosed space and preprocesses the acoustic signal to take these effects into account. In this way, when the acoustic signal arrives at the listener, the effects of the equalizer filtering and the room-induced filtering cancel each other.
The revolutionary development of the Compact Disk (CD) player, first introduced to the market in 1982, has completely changed the direction of audio engineering. Research in audio engineering is being directed towards creating all-digital audio systems. With the tremendous improvement in microprocessor speed, audio signals can now be processed digitally in real-time. The research presented in this dissertation will provide a design for an equalizer which will be capable of real-time self-adjustment. The design will be focused on providing a system to perform equalization in a normal home listening room with standard audio components. It will be based on a new equalizer architecture developed by Adkins and Principe at the University of Florida [1], which uses a multirate filter bank. The Adkins-Principe architecture is enhanced so that filter coefficients in each band of the filter bank will be continuously updated in real-time, with coefficient updates being provided using principles from adaptive signal processing.
In chapter two background information will be provided on the current state-of-the-art of acoustic equalization. Background information on room acoustics, psychoacoustics, and the state-of-the-art of equalizers will lead to conclusions regarding the limitations of acoustic equalizers. An improved strategy of adaptive equalization based on the Adkins-Principe equalizer architecture is proposed. Chapter two will discuss in detail the theory and operational characteristics of the Adkins-Principe architecture. It will be shown to be an elegant and extremely efficient implementation strategy. Chapter two will also discuss the relevant considerations in choosing an adaptive algorithm for real-time updating of filter coefficients. On the basis of algorithmic complexity and the speed of the current generation of microprocessors, the Least Mean Squares (LMS) algorithm will be justified as an appropriate algorithm for this application.
The LMS algorithm can be described as a stochastic gradient descent method which converges due to its time-averaged behavior. For fast convergence speed and low misadjustment in a nonstationary environment the algorithm must have an adaptive step size which can be precisely controlled. When the statistics of the acoustic signal change dramatically, the equalizer settings will be incorrect and the adaptation algorithm will need to converge as quickly as possible to the new optimal equalizer settings in the least mean squares sense. The LMS step size is the key parameter to adjust in order to maximize convergence speed. The theory guiding the optimal value of the step size is well understood for uncorrelated input data [23]. Chapter three will present the theory of LMS convergence for the case of uncorrelated input data. The theoretical structure of the Wiener-Hopf formulation, the stochastic gradient descent algorithm, and the LMS algorithm with uncorrelated data will be discussed to provide the necessary framework for the more complicated case of the LMS with correlated input data. Chapter four will extend the theory of the LMS algorithm to the convergence properties of the mean square error for the case of correlated input data. This theory is necessary to understand adaptation performance for acoustic data, which by nature are correlated signals.
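The stochastic-gradient character of the LMS update can be made concrete with a short sketch. This is illustrative only, not the equalizer implementation developed in later chapters; the channel h, the filter order, and the step size mu are assumed values chosen inside the stability region, which is governed by the eigenvalues of the input autocorrelation matrix.

```python
import numpy as np

def lms(x, d, order, mu):
    """Least Mean Squares: w[k+1] = w[k] + 2*mu*e[k]*x[k],
    where x[k] holds the 'order' most recent input samples."""
    w = np.zeros(order)
    e = np.zeros(len(x))
    for k in range(order - 1, len(x)):
        xk = x[k - order + 1:k + 1][::-1]   # most recent sample first
        e[k] = d[k] - w @ xk                # instantaneous error
        w = w + 2.0 * mu * e[k] * xk        # stochastic gradient step
    return w, e

# identify an assumed unknown FIR channel driven by white noise
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([1.0, 0.5, -0.3, 0.1])         # hypothetical channel
d = np.convolve(x, h)[:len(x)]              # desired signal (channel output)
w, e = lms(x, d, order=4, mu=0.05)          # w converges toward h
```

For white input the autocorrelation matrix is a scaled identity and all modes converge at the same rate; correlated input such as music spreads the eigenvalues and slows the slowest mode, which is the situation analyzed in chapters three and four.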
The divergence properties of the stochastic descent algorithm and the LMS algorithm are discussed in chapter five as a simple extension of the convergence properties examined in chapter three. The results of this study indicate that the maximum eigenvalue dominates a diverging gradient descent algorithm. It will be further shown that the optimal value of the step size, after a change in the statistics of the input data, can be well approximated as a function of the maximum eigenvalue of the vectors input into the filters performing adaptive equalization. The standard technique for estimating λmax is to assume that it is approximately equal to the power in the input vector times the filter order. From systems theory we know that this quantity is the sum of all the eigenvalues, and thus λmax is less than this estimate. The estimate is excellent under certain restricted conditions, e.g. low filter orders and high eigenvalue spreads. No such strong statements can be made more generally of this approximation technique. A new method of approximating the maximum eigenvalue of a system by allowing an adaptive gradient descent algorithm to diverge is introduced in chapter five. The method is first developed for the stochastic gradient descent algorithm, which is based on the statistical properties of an ensemble of independent, identically distributed systems. The gradient descent algorithm will provide a theoretical framework from which to analyze the more complicated LMS algorithm, which is based on the time-averaged properties of a single system.
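The divergence idea can be illustrated on the deterministic gradient descent recursion, whose weight-error vector obeys v[k+1] = (I - 2*mu*R) v[k]; with a purposely divergent step size the error grows geometrically at the ratio 2*mu*lambda_max - 1, so lambda_max can be read off the growth rate. This is only a sketch of the principle under idealized assumptions (R known exactly, no gradient noise), not the DLMS algorithm developed in chapter five; the matrix R and the step size mu are assumed values.

```python
import numpy as np

def lambda_max_by_divergence(R, mu, iters=60):
    """Drive the gradient-descent error recursion v <- (I - 2*mu*R) v
    with a divergent step size; the norm growth ratio tends to
    2*mu*lambda_max - 1, from which lambda_max is recovered."""
    rng = np.random.default_rng(1)
    v = rng.standard_normal(R.shape[0])
    rho = 1.0
    for _ in range(iters):
        v_new = v - 2.0 * mu * (R @ v)
        rho = np.linalg.norm(v_new) / np.linalg.norm(v)
        v = v_new / np.linalg.norm(v_new)   # renormalize to avoid overflow
    return (rho + 1.0) / (2.0 * mu)

R = np.diag([4.0, 1.0, 0.25])               # assumed eigenvalues, for checking
est = lambda_max_by_divergence(R, mu=1.0)   # recovers the largest, 4.0
```

In the LMS setting R is not available and only the noisy instantaneous gradient can be used, which is why chapter five must characterize the domain of initial conditions, iteration counts, and step sizes over which the divergence rate is still dominated by λmax.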
Chapter six discusses the experiments used to test the equalizer architecture and the algorithms used for real-time self-adjustment. The NeXT computer with a digitizing microphone is used to collect and process data. The software used to simulate real-time equalization with the proposed algorithms is discussed. Finally the test and evaluation results are presented.
Chapter seven summarizes the results of research presented in the dissertation. Specifically it reviews the success of the new architecture proposed for real-time self-adjustment of an audio equalizer. It reviews the theory of the convergence and divergence properties of the LMS algorithm with statistically correlated input data. It reviews the theory and performance of a new variation of the LMS algorithm which provides an estimate of the maximum eigenvalue of the input vectors into the equalizer filters. Finally chapter seven discusses the many areas of this research that could be extended.
Because the physics of room acoustics and the principles of psychoacoustics motivate many of the key design characteristics of the acoustic equalizer, the remainder of chapter one will provide background information on the pertinent areas of these fields.
Basic Physics of Room Acoustics
Equalization is an inverse problem, i.e. the adaptive equalizer will seek a configuration such that its response will be the inverse of the room response. This section will develop the most simple features of room acoustics because of the basic understanding it will provide into the nature of the signals and systems on which the algorithms and methods introduced in this dissertation will be applied. As electrical engineers are more familiar with electromagnetic radiation than acoustic radiation, an effort will be made to develop this introduction to the physics of room acoustics in a manner analogous to the standard method of introducing the theory of electromagnetic waveguides, with the most important similarities pointed out where appropriate. A listening room may be thought of as an acoustic coupler, or communications channel, between an acoustic source and a listener. A further limitation that the room is described as a minimum phase system will be imposed. The description of the acoustic channel is particularly complicated because of the importance of both its temporal and spatial dependency. Because of the wave nature of acoustic fields, the sound energy in an enclosed space may be considered to be the mechanical analogy of electromagnetic radiation in a rectangular waveguide.
A discussion of the basic equations and definitions is first presented. Specifically the pressure equation will be discussed as the principal descriptor of acoustic phenomena. The conservation of momentum equation will be shown to supply boundary conditions for the wave equation. The reflection and transmission coefficients are then discussed. It is shown that many features from optics have analogous results in acoustics, e.g. Snell's Law and the Law of Reflection. The prominent factor in determining the reflectivity of a wall is the wall's characteristic impedance. The impedance, together with the room's geometry, defines the characteristics of the room. A brief discussion of acoustic impedance is therefore presented. The wave equation for a room is developed in a method analogous to methods used for electromagnetic waveguides. The application of this theory is discussed for a room which is a rectangular cavity with different types of impedance. Lastly the spatial and temporal response of the room is outlined using methods from ray theory. Much of the discussion is based on the introductory texts of Egan [4], Kinsler et al. [5], and Kuttruff [6].
Basic Equations and Definitions
Sound waves are longitudinal waves of regions of greater and lesser pressure. An
acoustic field is described by a wave equation which expresses the deviation from ambient pressure, p, as a function of spatial location r, and time t. The constant of proportionality between the temporal and spatial derivatives is the speed of propagation of the medium, c.
\frac{\partial^2 p}{\partial t^2} = c^2 \nabla^2 p \qquad (1-1)
For a time harmonic plane wave the solution of the pressure equation is of the same form as the expression for a plane wave of electromagnetic radiation. By separation of variables the pressure equation reduces to the following familiar form, where k is the wavevector.
p(\mathbf{r}, t) = p_0\, e^{i(\omega t - \mathbf{k}\cdot\mathbf{r})} \qquad \left[\frac{\text{kg}}{\text{m}\,\text{s}^2}\right] \qquad (1-2)
The velocity of the vibrating particles, v, is related to the pressure field according to the conservation of momentum. The density of the medium is represented by ρ.

\nabla p = -\rho \frac{\partial \mathbf{v}}{\partial t} \qquad (1-3)

By solving the above equations we can relate the pressure field to the velocity field through an acoustic impedance, z = ρc.

\mathbf{v}(\mathbf{r}, t) = \mathbf{v}_0\, e^{i(\omega t - \mathbf{k}\cdot\mathbf{r})} \qquad \left[\frac{\text{m}}{\text{s}}\right] \qquad (1-4)

p(\mathbf{r}, t) = \rho c\, v(\mathbf{r}, t) = z\, v(\mathbf{r}, t) \qquad (1-5)
Sensory equipment, including the human auditory system, responds to the intensity of the field. In the case of acoustic data, this corresponds to the average energy flowing across a unit area normal to the direction of propagation per unit time.
I = \langle p\, v \rangle = \frac{p_0^2}{2z} \qquad \left[\frac{\text{W}}{\text{m}^2}\right] \qquad (1-6)
In this discussion pressure waves will be considered to be plane waves in order to simplify the mathematical treatment of the phenomena which will be discussed. Note that for an acoustic point source the phase fronts would be spherical, in an analogous manner to electromagnetic point sources. The plane wave approximation is only valid when the radius of curvature of the phase front is very large compared to the wavelength of the radiation. We will assume in this discussion that the dimensions of the rooms being considered are sufficiently large that this approximation is valid for frequencies over 1000 Hz. The room under consideration will be a rectangular cavity. These highly idealized cases are studied so that the important features of room responses are made clear. Before room responses can be meaningfully discussed, the reflection of acoustic waves off a wall must be well understood.
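Equations 1-2 through 1-5 can be given some numerical feel. The sketch below uses round textbook values for air at STP (density about 1.21 kg/m³, sound speed about 343 m/s); the 1 Pa source level is an arbitrary assumed value.

```python
rho0 = 1.21                # density of air at STP, kg/m^3
c0 = 343.0                 # speed of sound in air at STP, m/s
z0 = rho0 * c0             # characteristic impedance z = rho*c, ~415 kg/(m^2 s)

# equation 1-5: p = z*v, so an rms pressure of 1 Pa corresponds to
v_rms = 1.0 / z0           # an rms particle velocity of roughly 2.4 mm/s

# plane-wave validity above 1000 Hz: the wavelength is already small
wavelength = c0 / 1000.0   # 0.343 m, small next to ordinary room dimensions
```

The smallness of the particle velocity next to the sound speed is what justifies the linear (small-signal) acoustics used throughout this chapter.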
Reflection of a Plane Wave by a Wall
The electromagnetic equivalent of reflection off a wall is a plane electromagnetic wave reflecting off an infinite flat plate of complex impedance. As in the electromagnetic equivalent, the media impedances determine the behavior of the wave for a given geometry. Consider the situation depicted in figure 1.1. Let p_i be the incident pressure field, let p_r be the reflected field, and let p_t be the transmitted field, with corresponding wavevectors K_i, K_r, and K_t.
Since we are assuming plane waves we can drop the y-dependency from our equations without loss of generality. Recall that a wavevector K is defined as K = ω/c, where c is the velocity of propagation in a given medium, and ω is the radian frequency. Let c_0 be the speed of propagation in air at standard temperature and pressure (STP) and let
figure 1.1 The Law of Reflection and Snell's Law are valid for acoustic radiation as well as electromagnetic radiation. The figure above indicates the coordinate system and nomenclature which will be used throughout the discussion of room acoustics.
n be the ratio of the speed of propagation of a medium to that of air at STP, i.e. n = c_0/c. Let u_x, u_y, and u_z be the unit vectors of our coordinate system as depicted in figure 1.1. Then the incident, reflected, and transmitted wavevectors are expressed by equations 1-7 through 1-9, where K = nK_0 and Θ is the angle between the wall normal and the wavevector.

\mathbf{K}_i = n_1 K_0 \left[ \mathbf{u}_x \cos\Theta_i + \mathbf{u}_z \sin\Theta_i \right] \qquad (1-7)

\mathbf{K}_r = n_1 K_0 \left[ -\mathbf{u}_x \cos\Theta_r + \mathbf{u}_z \sin\Theta_r \right] \qquad (1-8)

\mathbf{K}_t = n_2 K_0 \left[ \mathbf{u}_x \cos\Theta_t + \mathbf{u}_z \sin\Theta_t \right] \qquad (1-9)

By matching the pressure field at the wall, the reflection and transmission coefficients can be calculated. The pressure waves propagating along the wavevectors given in equations 1-7 through 1-9 are shown in equations 1-10 through 1-12.

p_i = p_{i,0}\, e^{-i n_1 K_0 (x\cos\Theta_i + z\sin\Theta_i)} \qquad (1-10)

p_r = p_{r,0}\, e^{-i n_1 K_0 (-x\cos\Theta_r + z\sin\Theta_r)} \qquad (1-11)

p_t = p_{t,0}\, e^{-i n_2 K_0 (x\cos\Theta_t + z\sin\Theta_t)} \qquad (1-12)
The pressure field must be continuous at all points along the boundary, i.e. p_{i,interface} + p_{r,interface} = p_{t,interface}. These equations yield the Law of Reflection and Snell's Law, precisely in the same form as the electromagnetic equivalent.

n_1 K_0 \sin\Theta_i = n_1 K_0 \sin\Theta_r = n_2 K_0 \sin\Theta_t \qquad (1-13)

\sin\Theta_i = \sin\Theta_r \qquad \text{(Law of Reflection)} \qquad (1-14)

n_1 \sin\Theta_i = n_2 \sin\Theta_t \qquad \text{(Snell's Law)} \qquad (1-15)
In addition, the conservation of momentum (equation 1-3) must be satisfied at the wall. This implies a matching of particle velocities at the interface. Note that because of the plane wave assumption only the derivative with respect to x is nonzero in ∇p. The velocity fields propagating along the wavevectors given in equations 1-7 through 1-9 are described by equations 1-16 through 1-18.

\mathbf{v}_i = \frac{p_{i,0}}{z_1}\, e^{-i n_1 K_0 (x\cos\Theta_i + z\sin\Theta_i)}\, \mathbf{u}_{k_i} \qquad (1-16)

\mathbf{v}_r = \frac{p_{r,0}}{z_1}\, e^{-i n_1 K_0 (-x\cos\Theta_r + z\sin\Theta_r)}\, \mathbf{u}_{k_r} \qquad (1-17)

\mathbf{v}_t = \frac{p_{t,0}}{z_2}\, e^{-i n_2 K_0 (x\cos\Theta_t + z\sin\Theta_t)}\, \mathbf{u}_{k_t} \qquad (1-18)
By a proper choice of the coordinate system, i.e. with the wall located at x = 0, and by an application of Snell's Law and the Law of Reflection, the phase-related terms in v can be ignored, as they are common to v_i, v_r, and v_t. Matching the x-components of the particle velocities at the interface yields the following expression for the reflection coefficient.

\frac{p_{i,0}\cos\Theta_i}{z_1} - \frac{p_{r,0}\cos\Theta_r}{z_1} = \frac{p_{t,0}\cos\Theta_t}{z_2} \qquad (1-19)

z_2\, p_{i,0}\cos\Theta_i - z_2\, p_{r,0}\cos\Theta_r = z_1 (p_{i,0} + p_{r,0})\cos\Theta_t \qquad (1-20)

p_{r,0}\, (z_2\cos\Theta_i + z_1\cos\Theta_t) = p_{i,0}\, (z_2\cos\Theta_i - z_1\cos\Theta_t) \qquad (1-21)

R(\Theta, \omega) = \frac{p_{r,0}}{p_{i,0}} = \frac{z_2\cos\Theta_i - z_1\cos\Theta_t}{z_2\cos\Theta_i + z_1\cos\Theta_t} \qquad (1-22)

If we assume medium one is air, i.e. z_1 = \rho_0 c_0, and medium two is a hard wall, i.e. z_2 \gg z_1, the above equation reduces to the form seen in most elementary books on acoustics, R = \frac{z_2\cos\Theta_i - \rho_0 c_0}{z_2\cos\Theta_i + \rho_0 c_0}. The transmission coefficient is easily derived from the reflection coefficient as follows.

T = \frac{p_{t,0}}{p_{i,0}} = \frac{p_{i,0} + p_{r,0}}{p_{i,0}} = 1 + R = \frac{2\, z_2 \cos\Theta_i}{z_2\cos\Theta_i + z_1\cos\Theta_t} \qquad (1-23)

With the expressions for R and T it is clear that the impedance of a material is its key acoustic parameter, and it is this parameter which determines a room's acoustic properties.
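Equations 1-22 and 1-23 are easy to evaluate numerically, and the identity T = 1 + R provides a built-in check. The impedance values below are assumed for illustration (air, and a wall with a much larger impedance); they are not measurements of any real material.

```python
import numpy as np

def reflection_transmission(z1, z2, theta_i, theta_t):
    """Pressure reflection and transmission coefficients of a plane
    interface, equations 1-22 and 1-23."""
    den = z2 * np.cos(theta_i) + z1 * np.cos(theta_t)
    R = (z2 * np.cos(theta_i) - z1 * np.cos(theta_t)) / den
    T = 2.0 * z2 * np.cos(theta_i) / den
    return R, T

z_air = 415.0              # characteristic impedance of air, kg/(m^2 s)
z_wall = 1.0e6             # assumed hard wall, z2 >> z1
R, T = reflection_transmission(z_air, z_wall, theta_i=0.0, theta_t=0.0)
# near-total reflection from the hard wall: R just under 1, T = 1 + R
```

As z2 grows without bound R approaches 1, the hard-wall limit used for the rectangular room later in this chapter.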
Acoustic Impedance
Acoustic impedance can be separated into a real and an imaginary part. The real part is the acoustic resistance, the component associated with the dissipation of energy. The imaginary part is the acoustic reactance, the component associated with temporary storage of energy. These components are made explicit in the expression shown below.
z(\Theta, \omega) = \frac{p}{v} = R + jX \qquad \left[\frac{\text{kg}}{\text{m}^2\,\text{s}}\right] \qquad (1-24)

The acoustic reactance can be further factored into an inertance and a compliance term. The acoustic inertance is defined as the effective mass of the element under consideration divided by its surface area, M = m/S, with units of kg/m². The acoustic compliance is defined as the volume displacement induced by the application of a unit pressure, C = dV/dp, in units of m⁴s²/kg. A mechanical interpretation of a locally reacting wall is shown in figure 1.2.

z(\Theta, \omega) = R + j\left(\omega M - \frac{1}{\omega C}\right) \qquad (1-25)

Let pressure correspond to voltage, velocity to current, resistance to resistance, compliance to capacitance, and inertance to inductance. Then an acoustic system can be replaced by an equivalent electrical system. It is precisely the behavior of these types of circuits that we wish to equalize.
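The lumped form of equation 1-25 can be sketched directly; the resistance, inertance, and compliance values below are assumed round numbers, not properties of any real wall. The reactance vanishes at ω = 1/sqrt(MC), exactly as in the series RLC circuit of the electrical analogy.

```python
import numpy as np

def wall_impedance(omega, R, M, C):
    """Locally reacting wall impedance, equation 1-25:
    z = R + j*(omega*M - 1/(omega*C))."""
    return R + 1j * (omega * M - 1.0 / (omega * C))

R, M, C = 400.0, 0.5, 1.0e-6              # assumed illustrative values
omega0 = 1.0 / np.sqrt(M * C)             # resonance: the reactance cancels
z_res = wall_impedance(omega0, R, M, C)   # purely resistive there
```

Below the resonance the compliance term dominates (spring-like wall); above it the inertance term dominates (mass-like wall), which is why a single wall can reflect low and high frequencies quite differently.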
figure 1.2 The mechanical analogy of room impedance is illustrated in the figure by a spring for the reactive term and a damper for the dissipative term.

figure 1.3 A room may be modelled by its electrical circuit equivalent as shown above, with pressure playing the role of voltage, velocity of current, compliance of capacitance, and inertance of inductance.
General Theory of Room Responses

The solution to the wave equation in an enclosed space is a familiar eigenvalue problem. The equation reduces to a Helmholtz equation with boundary conditions as shown below. The homogeneous part of the Helmholtz equation provides a complete set of orthogonal eigenfunctions and associated eigenvalues, ψ_n(r) and λ_n.

\nabla^2 p + K^2 p = i\omega\rho\, q(\mathbf{r}) \qquad (1-26)

p_{i,\text{interface}} + p_{r,\text{interface}} = p_{t,\text{interface}} \qquad (1-27)

\nabla p \cdot \mathbf{n} + \frac{i\omega\rho}{z}\, p = 0 \qquad (1-28)
As a result our source distribution, q(r), can be expanded as a sum of eigenfunctions. V is the enclosed volume of the room.

q(\mathbf{r}) = \sum_n C_n\, \psi_n(\mathbf{r}) \qquad (1-29)

C_n = \iiint_V q(\mathbf{r})\, \psi_n(\mathbf{r})\, dV \qquad (1-30)
The unknown solution to the inhomogeneous Helmholtz equation can also be expressed as a summation of eigenfunctions.

p_\omega(\mathbf{r}) = \sum_n D_n\, \psi_n(\mathbf{r}) \qquad (1-31)

Substitute the summations for p_ω(r) and q(r) into the inhomogeneous Helmholtz equation and solve for the unknown {D_n} in terms of the known {C_n} as shown below.

\sum_n D_n \left[ \nabla^2 \psi_n(\mathbf{r}) + K^2 \psi_n(\mathbf{r}) \right] = i\omega\rho \sum_n C_n\, \psi_n(\mathbf{r}) \qquad (1-32)

The equation has a simple solution for a point source located at r_0, i.e. q(r) = Qδ(r − r_0).

p_\omega(\mathbf{r}) = i\omega\rho\, Q \sum_n \frac{\psi_n(\mathbf{r})\, \psi_n(\mathbf{r}_0)}{K^2 - K_n^2} \qquad (1-33)
Note that K_n is the wavevector associated with λ_n. The quantity p_ω(r) is recognized as the Green's function for the room. The Green's function expresses the acoustic transfer function between two points in the room. Note from the Green's function that acoustic fields satisfy the reciprocity theorem, just as in the electromagnetic analogy. The Green's function can be extended to a source, S, emitting a continuum of frequencies as shown below.

S(t) = \int_{-\infty}^{\infty} Q(\omega)\, e^{i\omega t}\, d\omega \qquad (1-34)

p(\mathbf{r}, t) = \int_{-\infty}^{\infty} p_\omega(\mathbf{r})\, e^{i\omega t}\, d\omega \qquad (1-35)
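The modal sum of equation 1-33 can be sketched numerically for the hard-walled rectangular room treated later in this chapter (cosine modes). A small assumed damping term is added to K so that the sum stays finite at the resonances, and the mode sum is truncated; both are conveniences of the sketch, not part of the formulation above. The room dimensions and source and listener positions are assumed values.

```python
import numpy as np
from itertools import product

def greens_function(f, r, r0, L, c=343.0, damping=1.0, nmax=8):
    """Truncated modal sum of equation 1-33 for a hard-walled
    rectangular room, using cosine eigenmodes.  'damping' adds a
    small imaginary part to K to keep the sum finite on resonance."""
    K = 2.0 * np.pi * f / c + 1j * damping / c
    G = 0.0 + 0.0j
    for a, b, g in product(range(nmax), repeat=3):
        Kn2 = (a * np.pi / L[0])**2 + (b * np.pi / L[1])**2 + (g * np.pi / L[2])**2
        psi = lambda p: (np.cos(a * np.pi * p[0] / L[0])
                         * np.cos(b * np.pi * p[1] / L[1])
                         * np.cos(g * np.pi * p[2] / L[2]))
        G += psi(r) * psi(r0) / (K**2 - Kn2)  # one term of the modal sum
    return G

L = (5.0, 4.0, 3.0)                    # assumed room dimensions, m
G = greens_function(200.0, (1.0, 1.0, 1.0), (4.0, 3.0, 1.5), L)
```

The reciprocity property mentioned above is visible in the code: r and r0 enter symmetrically, so exchanging source and listener leaves G unchanged.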
Equations 1-33 through 1-35 provide a complete wave theoretical description. A discussion of the implications of this formulation for a rectangular room is now in order.

Simple Rectangular Rooms
This section will apply the general wave theory to an empty rectangular cavity characterized by a single impedance. While this example does not represent the complexities of actual rooms, it nevertheless provides insight into the features of room responses with a minimum of mathematical complexity. The connection between the reflection properties of walls and the general wave theory will be made explicit through wall impedances. Three classes of impedances will be considered. Consider the room depicted below.
The wave equation for this geometry is most easily applied in the rectangular coordinate system. The room eigenvalues are found by solving the homogeneous Helmholtz equation shown below.

\frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} + \frac{\partial^2 \psi}{\partial z^2} + K^2 \psi = 0 \qquad (1-36)
The equation is easily solved by the technique of separation of variables. Let ψ(x, y, z) = ψ_x(x) ψ_y(y) ψ_z(z). Substitution of this expression into the wave equation leads to three ordinary differential equations plus the separation equation, as shown below.

figure 1.4 A room with dimensions L_x, L_y, and L_z, and walls with uniform impedance will be evaluated using the wave theoretic formulation.
\frac{d^2 \psi_x}{dx^2} + K_x^2\, \psi_x = 0 \qquad (1-37)

\frac{d^2 \psi_y}{dy^2} + K_y^2\, \psi_y = 0 \qquad (1-38)

\frac{d^2 \psi_z}{dz^2} + K_z^2\, \psi_z = 0 \qquad (1-39)

K^2 = K_x^2 + K_y^2 + K_z^2 \qquad (1-40)

The solution of these equations is trivial.

\psi_x(x) = A_1 e^{iK_x x} + A_2 e^{-iK_x x} \qquad (1-41)

\psi_y(y) = B_1 e^{iK_y y} + B_2 e^{-iK_y y} \qquad (1-42)

\psi_z(z) = C_1 e^{iK_z z} + C_2 e^{-iK_z z} \qquad (1-43)
The constants A_1, A_2, B_1, B_2, C_1, and C_2 are determined by matching boundary conditions. The solutions of three specific examples are outlined.

Case i. Re[z] = 0 and Im[z] very large.

The method of solution of the three ordinary differential equations is identical. Consider the equation in x. As a boundary condition, guarantee that the conservation of momentum is satisfied at the walls (x = 0 and x = L_x).

\nabla p = -\rho \frac{\partial \mathbf{v}}{\partial t} \;\Rightarrow\; \frac{dp}{dx} + \frac{i\omega\rho}{z}\, p = 0 \qquad (1-44)

As |z| \to \infty, \frac{dp}{dx} \to 0. The boundary conditions are satisfied as follows.

\frac{d\psi_x}{dx}\bigg|_{x=0} = iK_x A_1 - iK_x A_2 = 0 \;\Rightarrow\; A_1 = A_2 \qquad (1-45)

\frac{d\psi_x}{dx}\bigg|_{x=L_x} = iK_x A_1 \left( e^{iK_x L_x} - e^{-iK_x L_x} \right) = -2 K_x A_1 \sin K_x L_x = 0 \;\Rightarrow\; K_x = \frac{\alpha\pi}{L_x} \qquad (1-46)

α is any integer. The two boundary conditions have been used to solve for A_2 and K_x. Similar solutions are found for B_2, C_2, K_y, and K_z. Substituting these results in the expression for ψ(x, y, z) and the separation equation leads to a solution for the room eigenmodes and eigenfrequencies as shown below.

\psi(x, y, z) = D \cos\frac{\alpha\pi x}{L_x} \cos\frac{\beta\pi y}{L_y} \cos\frac{\gamma\pi z}{L_z} \qquad (1-47)

f(\alpha, \beta, \gamma) = \frac{cK}{2\pi} = \frac{c}{2} \sqrt{ \left(\frac{\alpha}{L_x}\right)^2 + \left(\frac{\beta}{L_y}\right)^2 + \left(\frac{\gamma}{L_z}\right)^2 } \qquad (1-48)
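Equation 1-48 can be evaluated directly. The room dimensions below are assumed; for a 5 m long room the lowest axial mode is c/(2 L_x) = 34.3 Hz, a convenient sanity check.

```python
import numpy as np
from itertools import product

def eigenfrequencies(L, c=343.0, nmax=4):
    """Resonant frequencies of a hard-walled rectangular room,
    equation 1-48: f = (c/2)*sqrt((a/Lx)^2 + (b/Ly)^2 + (g/Lz)^2)."""
    freqs = []
    for a, b, g in product(range(nmax), repeat=3):
        if (a, b, g) != (0, 0, 0):      # skip the trivial zero mode
            freqs.append(0.5 * c * np.sqrt((a / L[0])**2
                                           + (b / L[1])**2
                                           + (g / L[2])**2))
    return sorted(freqs)

f = eigenfrequencies((5.0, 4.0, 3.0))   # assumed room; f[0] = 34.3 Hz
```

The number of modes below a given frequency grows roughly as the cube of that frequency, which is why the modal description becomes unwieldy at high frequencies and ray methods take over later in this chapter.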
The constant D = A_1 B_1 C_1 is determined from an initial condition.

Case ii. Re[z] = 0.

Satisfy the new boundary conditions at x = 0 and x = L_x, with z = jX purely reactive.

\frac{d\psi_x}{dx}\bigg|_{x=0} = A_1 \left( iK_x + \frac{K\rho c}{X} \right) + A_2 \left( -iK_x + \frac{K\rho c}{X} \right) = 0 \qquad (1-49)

\frac{d\psi_x}{dx}\bigg|_{x=L_x} = A_1 \left( iK_x + \frac{K\rho c}{X} \right) e^{iK_x L_x} + A_2 \left( -iK_x + \frac{K\rho c}{X} \right) e^{-iK_x L_x} = 0 \qquad (1-50)

A solution for A_2 and K_x can be found by solving the two transcendental equations or by using methods from the theory of conformal mapping. If X < 0, the corresponding eigenfrequencies are lower than for the hard wall case (Im[z] → ∞). For X > 0 the corresponding eigenfrequencies are higher than in the hard wall case. Because the impedance is completely reactive, the room will continue to support standing waves, but with phase terms included in the sinusoidal terms of the eigenmodes.
Case iii. Arbitrary z.

The form of the boundary conditions is the same as in case ii, with X replaced by z. The eigenmodes can no longer support pure standing waves because of the lossy component of z, which is represented by the decaying exponential factor in the eigenmodes.

\psi(x, y, z) = D\, e^{-(\delta_x x + \delta_y y + \delta_z z)} \cos\left(\frac{\alpha\pi x}{L_x} + \varphi_x\right) \cos\left(\frac{\beta\pi y}{L_y} + \varphi_y\right) \cos\left(\frac{\gamma\pi z}{L_z} + \varphi_z\right) \qquad (1-51)
It is clear from matching boundary conditions in the three cases outlined above that the room impedance is key to understanding the behavior of the room. By using the eigenmode expansion technique and the inhomogeneous Helmholtz equation, the pressure field can be determined for any location in the room. An initial condition is also required in order to evaluate D. The eigenfrequencies indicate those frequencies at which the room has resonant properties. Note that the number of eigenfrequencies increases as the cube of the upper frequency limit of the source. The wave theoretic technique provides a complete description of the acoustic signal in an enclosed space, and provides a solid theoretical foundation for the field of acoustics.
Temporal Properties of a Listening Room
Each reflection of an acoustic wave must be described by its delay in arrival from the direct acoustic wave at the listener location, the direction from which the reflected wave arrives at the listener location, and the strength of the reflection. In this section we will discuss the time delay and strength of reflected acoustic energy. The rate at which reflected acoustic energy arrives at a listener location is an important factor in evaluating the quality of a listening room. Ray theory provides a simple and relatively accurate method of determining the room's temporal properties. This theory is completely analogous to ray theory from geometric optics, and it will therefore be applied with a minimum of discussion. Like geometric optics, geometric acoustics ignores interference and refraction effects. Nevertheless it is accurate for rooms of ordinary size for acoustic frequency content in excess of 1000 Hz.
If a room is constructed of a series of planes, image sources can be found by successively reflecting the original source and each subsequent image source about the planes, forming a lattice constructed by using the room as a unit cell. The lattice and a cross section plane are shown in figure 1.5. A spherical shell is chosen with inner radius at distance ct and outer radius c(t+dt), where dt is much less than t. The number of sources in the spherical shell corresponds to the number of mirror reflections between t and t+dt. The volume of the spherical shell, assuming dt ≪ t, is 4πc³t²dt. Note from ray theory that there is only one source per room. The number of image sources can be found by dividing the volume of the shell by the volume of the room. As t increases the number of mirror sources increases as t².

\text{Number of Reflections} = \frac{4\pi c^3 t^2\, dt}{V} \qquad (1-52)
As time increases, the reflections have less and less energy associated with them. The higher order reflections lose energy at each reflecting plane, the amount of which depends on the characteristic impedance of the wall. An approximation of how the energy decreases can be made from geometric acoustics by multiplying the initial energy of the direct ray inside the listening room by R², each time the ray path from an image intersects the room walls or an image room wall.
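Equation 1-52 is simple to evaluate; the room volume below is an assumed value. The point of the sketch is the t² growth: reflections arriving 100 ms after the direct sound are a hundred times denser than those arriving at 10 ms.

```python
import math

def reflections_in_shell(t, dt, V, c=343.0):
    """Number of image sources in the spherical shell between ct and
    c*(t+dt), equation 1-52: 4*pi*c^3*t^2*dt / V."""
    return 4.0 * math.pi * c**3 * t**2 * dt / V

V = 5.0 * 4.0 * 3.0                             # assumed room volume, m^3
early = reflections_in_shell(0.010, 0.001, V)   # around 10 ms after direct sound
late = reflections_in_shell(0.100, 0.001, V)    # around 100 ms after
# late / early = (0.100 / 0.010)**2 = 100: quadratic densification
```

This densification is what turns discrete early reflections into the statistically smooth reverberant tail described by the echogram below.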
figure 1.5 Geometric acoustics can be used to explain the temporal
properties of reflected energy in a room. Image sources can be formed by successively reflecting the original source and
each subsequent image source about the planes that are
constructed by using the room as a unit cell.
Figure 1.6 demonstrates the concept of an echogram, a graphic method of describing the temporal response of the room. The height of each line represents the intensity of a reflection.
During the discussion of psychoacoustics, the temporal properties of an audio field, which generally indicate a pleasing listening room, will be discussed.
figure 1.6 An echogram indicates that as the time delay from the direct field increases, the number of reflected waves increases and the intensity decreases. The degree of shading represents the number of reflections per unit time.
The rate of decay in the intensity field is an important factor in creating a good quality listening room. Recall that the intensity of a wave decreases as a function of the square of the distance travelled, (ct)2 . The intensity of the field will also be decreased due to the attenuating influence of air (or any medium through which the wave traverses). Let a be the efolding time constant of this attenuation. Each time the acoustic wave reflects off of a wall, the intensity will decrease by the square of the wall reflectance. The decay of the intensity of the acoustic wave is summarized in the equation below, where 10 is the intensity at t=O, and n is the number of reflections of the ray.
I(t) = \frac{I_0}{(ct)^2} e^{-act} |R|^{2nt} = \frac{I_0}{(ct)^2} e^{-act + nt\ln|R|^2}    (1-53)
It is not possible to follow the path of each ray and perform the above calculation. Instead the calculation is made by computing the average number of reflections of a ray (n),
assuming the acoustic field is diffuse, i.e. the rays are uniformly distributed over all possible directions of propagation. The assumption of a completely diffuse wave is never achieved in an ordinary listening room, and the attenuation of the reverberant field will have large or small deviations from the simple calculations being outlined depending on the amount by which the actual acoustic field differs from our assumption of diffuseness.
First express the number of reflections per second which occur at walls perpendicular to the x-axis for a ray propagating at an angle θ with respect to the x-axis, n_x(θ).
n_x(\theta) = \frac{c\cos\theta}{L_x}    (1-54)
Averaging over all angles θ gives the following.
\langle n_x \rangle = \frac{c}{2L_x}    (1-55)
Similar results are obtained for (ny) and (nz). The average number of reflections along all three axes is expressed below where S is the surface area of the room and V is the room volume.
\langle n \rangle = \langle n_x \rangle + \langle n_y \rangle + \langle n_z \rangle = \frac{c}{2}\left(\frac{1}{L_x} + \frac{1}{L_y} + \frac{1}{L_z}\right) = \frac{cS}{4V}    (1-56)
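The result ⟨n⟩ = cS/4V can be checked numerically by Monte Carlo sampling of ray directions in a diffuse field. The sketch below uses illustrative room dimensions (not measured data):

```python
import math, random

# Hypothetical room dimensions (m) and sound speed (m/s); illustrative only.
Lx, Ly, Lz, c = 5.0, 4.0, 3.1, 343.0

S = 2 * (Lx*Ly + Ly*Lz + Lx*Lz)   # surface area
V = Lx * Ly * Lz                  # volume

# Sample ray directions uniformly over the sphere and average the
# per-axis reflection rates n_x = c|u_x|/L_x, etc.
random.seed(0)
total = 0.0
N = 200_000
for _ in range(N):
    # uniform direction on the sphere: z uniform in [-1,1], azimuth uniform
    z = random.uniform(-1.0, 1.0)
    phi = random.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z*z)
    ux, uy, uz = r*math.cos(phi), r*math.sin(phi), z
    total += c * (abs(ux)/Lx + abs(uy)/Ly + abs(uz)/Lz)

n_avg = total / N
n_theory = c * S / (4.0 * V)      # = (c/2)(1/Lx + 1/Ly + 1/Lz)
print(n_avg, n_theory)            # the two agree to within Monte Carlo error
```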
Substituting the above into the intensity equation yields the following.
I(t) = \frac{I_0}{(ct)^2}\, e^{-act + \frac{cS}{4V} t \ln|R|^2}    (1-57)
The reverberation time is taken as that time at which I(t)/I_0 = 1×10^-6, and the expression for this time is known as Sabine's law.
T = \frac{0.16\,V}{4aV - S\ln|R|^2}    (1-58)
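Sabine's law is straightforward to evaluate numerically. The sketch below uses illustrative values (room dimensions, wall reflectance, negligible air attenuation) rather than measured data:

```python
import math

def reverberation_time(V, S, R, a=0.0):
    """Reverberation time (s) from equation (1-58): T = 0.16 V / (4aV - S ln|R|^2).

    V: room volume (m^3), S: surface area (m^2),
    R: magnitude of the wall reflection coefficient,
    a: air attenuation constant (1/m), often negligible at audio frequencies.
    """
    return 0.16 * V / (4.0 * a * V - S * math.log(R * R))

# Illustrative values only: a 5 x 4 x 3.1 m room with walls of
# reflectance |R| = 0.95 and air attenuation ignored.
Lx, Ly, Lz = 5.0, 4.0, 3.1
V = Lx * Ly * Lz
S = 2 * (Lx*Ly + Ly*Lz + Lx*Lz)
print(round(reverberation_time(V, S, 0.95), 2))  # on the order of 1 s
```

More absorptive walls (smaller |R|) shorten the reverberation time, as the formula requires.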
Spatial Properties of a Listening Room
The spatial distribution of acoustic energy is a key factor in constructing a listening room. A diffuse field can be defined as a field in which the intensity in a spherical shell subtended by a solid angle, dΩ, is independent of the direction in which the solid angle is pointed. For the case of a room which is a rectangular cavity with perfectly hard walls, no matter how small dΩ, there exists a time at which the field is diffuse. Of course no such walls exist, and for a real rectangular cavity the field will not be diffuse. For example, images located in directions along the corners of the room experience more attenuating reflections than the other sources.
Fortunately walls are not perfectly smooth, and they partially diffuse the acoustic energy at each reflection. Consider the following simple model. Assume at each reflection the fraction of energy reflected specularly is s. Then 1 − s represents the portion of energy which is reradiated by the wall diffusely. This is, of course, an overly simplistic model. All incident energy is reradiated by the wall according to a directional distribution function. Nevertheless this model does demonstrate approximately the diffusion process. In the steady state the power present in the total reflected field and in the field reflected specularly can be related to the reflection coefficient, R, and to s.
P = C\sum_{n=1}^{\infty} |R|^{2n}    (1-59)

P_s = C\sum_{n=1}^{\infty} s^n |R|^{2n}    (1-60)
The fraction of energy which has been diffused increases as s decreases, as shown below.
d_s = \frac{P - P_s}{P} = \frac{\sum_{n=1}^{\infty}(1 - s^n)|R|^{2n}}{\sum_{n=1}^{\infty}|R|^{2n}}    (1-61)
figure 1.7 The diffuse energy fraction is plotted against the specular reflection constant s. As the specular reflection constant decreases, the percentage of diffuse energy in the room increases.
Even more important than wall roughness are the diffusing effects of room furniture and the deviations of most rooms from a rectangular cavity. Even the presence of people in a room acts to diffuse the acoustic field. Furthermore, in the discussion of psychoacoustics it will be shown that a pleasant listening environment can be created with far less than a completely diffuse field.
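The diffuse fraction of equation (1-61) is a ratio of geometric series and can be summed directly. The sketch below compares the truncated series with its closed form (the closed form is derived here, not taken from the text):

```python
def diffuse_fraction(R, s, terms=200):
    """Fraction of reflected energy diffused, equation (1-61):
    d_s = sum_{n>=1} (1 - s**n) * R**(2n) / sum_{n>=1} R**(2n),
    where R is the reflection coefficient magnitude and s the
    fraction of energy reflected specularly at each bounce."""
    num = sum((1.0 - s**n) * R**(2*n) for n in range(1, terms + 1))
    den = sum(R**(2*n) for n in range(1, terms + 1))
    return num / den

# Closed form of the same geometric series, for comparison:
# d_s = 1 - s(1 - R^2) / (1 - s R^2).
def diffuse_fraction_closed(R, s):
    return 1.0 - s * (1.0 - R*R) / (1.0 - s * R*R)

# Illustrative values: |R| = 0.9, 80% of energy reflected specularly.
print(diffuse_fraction(0.9, 0.8), diffuse_fraction_closed(0.9, 0.8))
```

Evaluating the closed form for decreasing s confirms the trend stated in the text: the diffuse fraction grows as s shrinks.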
Psychoacoustics
An equalizer will not be capable of exactly reproducing the time-varying pressure field recorded in a concert hall or recording studio. The engineering compromises which must be made in designing the equalizer must focus on restoring the most psychoacoustically significant features of the original signal. The design methodology will be such that importance will be placed on maintaining only those features of the waveform which a listener will perceive in an average home listening room. This brief introduction to psychoacoustics is intended to motivate the signal processing objectives of the adaptive equalizer. Specifically, the level to which signal coloration can be accepted without a loss of listener enjoyment is discussed. This is largely determined by the frequency response and frequency resolution of our hearing, and by the masking properties of the signal itself. A description of how the room-induced temporal and spatial characteristics of the signal contribute to a pleasing sound is investigated. Finally the dynamic range of human hearing is described. Stereophony is not discussed as this research does not attempt to process signals to alter their stereophonic properties.
Coloration
In the time domain, a system neither adds coloration to nor removes coloration from a signal if its impulse response can be characterized as shown below.
h(t) = k\,\delta(t - \tau)    (1-62)
The function implies no change in the shape of the waveform. The output is a delayed and amplified (or attenuated) replica of the input. Performing a Fourier transform of the impulse response function gives the frequency domain restrictions for such a system.
H(\omega) = k\,e^{-j\omega\tau}    (1-63)
The frequency domain requirements for an ideal room are the following:
i. The system must have a constant magnitude response, |H(ω)| = k.
ii. The phase response must be linearly proportional to frequency, φ(ω) = −ωτ.
Violation of i. can lead to an unintended emphasis of some frequencies over others in ways not intended by a composer or performer. Violation of ii. gives rise to nonlinear phase, which causes certain frequencies to pass through the system faster than others.
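Both requirements can be verified numerically for a discretized delay-and-scale impulse response. The sketch below uses an arbitrary gain and delay:

```python
import numpy as np

# A pure delay-and-scale system h(t) = k δ(t - τ), discretized as an
# impulse response with a single sample of height k at index d.
k, d, N = 0.5, 7, 64
h = np.zeros(N)
h[d] = k

H = np.fft.rfft(h)

mag = np.abs(H)                       # should be constant (= k) at all frequencies
phase = np.unwrap(np.angle(H))        # should be linear in frequency

print(np.allclose(mag, k))            # flat magnitude response
slopes = np.diff(phase)               # constant slope => linear phase
print(np.allclose(slopes, slopes[0]))
```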
To accurately reproduce the sensation produced by the original acoustic waveform, it is essential that the sound reproduction system have a flat magnitude response. Obviously some degradation from this standard will not alter the perception of the listener. Nevertheless, the fundamental difference between middle C and any other note is that the energy of the signal is concentrated at 262 Hz and its harmonics. An attenuation of the frequency response at middle C, if other unmasked information is present in the signal, will negatively impact the fidelity of the signal because of the lessening of emphasis on the C-like quality of the sound. More precisely, we are interested in maintaining the pitch and timbre of the signal. Pitch of a musical signal is primarily determined by frequency, but is also influenced by the shape of the waveform and the loudness of the signal. The frequency dependency is depicted in figure 1.8, where pitch in mels is plotted against frequency, for a signal with a loudness level of 60 phons. Note that a change in frequency has a greater effect on pitch above 1000 Hz than below.
The loudness dependency of pitch can be quite dramatic for pure sinusoidal tones. For music signals the loudness dependency is much less significant. As an example, to maintain the same pitch when increasing the loudness of a musical signal from 60 to 100 phons requires at most a frequency shift of 2%.
If a signal is not a pure tone, but consists of harmonic tones, due to the nonlinear processing of the ear, sum and difference signals are generated. The difference signals determine the pitch perceived by the listener. For example a signal consisting of the first four harmonics of a 100 Hz tone, all of approximately the same intensity as the fundamental, is perceived as a signal of approximately 160 mels. If the fundamental is removed, the difference signal is unaltered, and the listener still perceives a 160 mel pitch. These effects are not quite so dramatic in music signals as the harmonics are usually considerably less intense than the fundamental.
figure 1.8 Pitch is plotted as a function of frequency for a signal with a loudness of 60 phons [6].
Assume that a series of nonharmonically related frequencies do not mask each other. If we wish to maintain pitch, the relative loudness of the frequencies must be unaltered by the electronic equipment or the listening room. For a pure sinusoid it is possible to hear changes in loudness of only 0.5 phons [6]. Note also that the perceived loudness is a function not only of the acoustic intensity of the signal, but also of the signal frequency. Figure 1.9 illustrates this effect for two equal loudness contours. For frequencies over 1000 Hz, as frequency increases, the acoustic intensity must increase to maintain the same loudness level.
Surprisingly, controlling the phase response has no appreciable effect on maintaining pitch. The phase response, except in extreme cases, cannot be perceived by listeners hearing music signals. Thus while a lack of phase distortion is required for the shape of the
wave to be unaltered by a system, the same requirement is not generally necessary for a signal to be psychoacoustically equivalent. The results of Suzuki, Morita, and Shindo [7] indicate that the effects of nonlinear phase on the quality of transient sounds are much smaller than the effect one would expect from the degree of change in the signal waveform.
figure 1.9 Equal loudness contours (the 0 dB and 60 dB contours) are presented as a function of frequency and acoustic intensity [5].
Hansen and Madsen [8] found similar results. In their experiment a signal had all of its frequency components shifted by 90° in phase, producing the most severe distortion possible in the shape of the waveform while maintaining the magnitude response. Only for fundamental frequencies below 1000 Hz was a distinctive difference perceived by the listeners involved in the test. The relative insensitivity to phase will be seen to be critical to the operation of an adaptive equalizer.
Maintaining signal timbre is also a basic requirement in a good sound system. Those features of the signal which are important to the timbre of sound are poorly understood, and the study of timbre is still an area of active research. The transient behavior of an instrument being excited affects the initial behavior of the pressure field. The transients affect the amount of energy present in each of the overtones. This is an important factor in determining the unique sound of a particular instrument. Amazingly, our hearing system is able to pick out these cues even when many instruments are orchestrated together.
figure 1.10 The masking effects of a 400 Hz signal are demonstrated above. The region of the basilar membrane which responds to the 400 Hz waveform will raise the audibility threshold of higher frequencies which stimulate the same region of the basilar membrane [5].
Perhaps the most problematic factor for which to account is the level of masking caused by certain frequency components of a waveform on others. Masking is the degree to which the threshold of audibility of a signal is raised for nonharmonically related frequency components in the same signal, or in a completely separate waveform. Masking occurs when one signal has frequency components which are close to the frequency content in another signal. It has been clearly demonstrated that lower frequency components more effectively mask higher frequency components than vice versa. These effects can be explained by physiological phenomena. Lower frequencies produce harmonics which stimulate the region of the basilar membrane sensitive to higher frequencies, thereby masking the sound of higher frequencies. Figure 1.10 shows the masking effect of a 400 Hz sinusoid at 60 dB. The threshold increases most dramatically in the region slightly above 400 Hz.
Determining exact design criteria to guarantee a true reproduction of pitch and timbre is made difficult by the complicated dependency on signal frequency content, loudness, masking, and the transient behavior of the energy distribution in the harmonics of instruments. For example, if most of the time signals above a certain frequency are masked by lower frequency signals, it would not be important to control the attenuation of these regions of the spectrum. The simplest analysis assumes the worst case scenario in which masking does not occur, regions of the spectra are of a loudness just at the audibility limit, and frequencies must be controlled at a level at which any error in loudness would be below the 0.5 phon detection limit. To maintain timbre, we wish to be able to control transients in the signal caused by the excitation of instruments. Fortunately the phase of the signal will not have to be carefully preserved because of the relative insensitivity of our hearing to these effects.
Practical implementation of these requirements is possible only if the frequency resolution of hearing is sufficiently large that filters of reasonable order can carry out the necessary inverse filtering. Frequency resolution for two sinusoids is known as the difference limen. It has a relatively constant value of 3 Hz below 500 Hz. Above 500 Hz frequency resolution decreases approximately as a linear function of frequency (see figure 1.11).
Δf = 3 Hz for f < 500 Hz
Δf = 0.003f for f > 500 Hz    (1-64)
Clearly it will be necessary to control pitch and timbre with a much finer degree of resolution at low frequencies than at high.
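The difference limen of equation (1-64) translates directly into a small helper function (a sketch; the function name is ours):

```python
def difference_limen(f):
    """Approximate frequency resolution of hearing, equation (1-64):
    about 3 Hz below 500 Hz, then roughly 0.003*f above 500 Hz."""
    return 3.0 if f < 500.0 else 0.003 * f

# Relative to frequency, resolution is much finer at low frequencies:
# 3% of f at 100 Hz, but only 0.3% of f at any frequency above 500 Hz.
for f in (100.0, 500.0, 1000.0, 10000.0):
    print(f, difference_limen(f))
```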
The human hearing organ can be described as a parallel filter bank with characteristic bandwidths. Fletcher and Munson [9] found that the detection threshold of a tone in broadband noise is not affected by the noise bandwidth until it falls beneath the "critical bandwidth"; the critical bandwidths are shown in figure 1.12.
Psychoacoustic Responses to the Temporal Aspects of Reflected Energy
Reflections under certain circumstances serve to enhance the quality of sound by adding to the direct wave in a way which is reinforcing. For musical signals the reverberation of the room, when within certain bounds, gives a pleasing effect. Under other circumstances reflections can cause highly undesirable effects. Although many effects can be explained partially through the physiology of hearing, perhaps the most important factors, at least in the reproduction of music, are the types of environments with which we have been habituated and from which our aesthetic senses have been influenced.
figure 1.11 The difference limen gives the frequency resolution of two sinusoids. An acoustic equalizer will need to control the frequency response of low frequency bands with a finer resolution than high frequency bands [5].
The discussion of the temporal effects of sound fields begins by completing our discussion of coloration. Strong, evenly-distributed echoes can cause serious degradation in the accurate reproduction of an acoustic wave's spectral distribution. Consider a room with the following simple impulse response.
h(t) = \sum_{n=0}^{\infty} a^n \delta(t - nT_0)    (1-65)
figure 1.12 Because our hearing can be described as a parallel filter bank, acoustic equalizers are designed to simulate the critical bandwidths. High quality equalizers are designed to have 1/3 octave bands because of their similarity to critical bandwidths, as seen above.
The power spectral density indicates equally spaced resonances.
|H(f)|^2 = \frac{1}{1 - 2a\cos\omega T_0 + a^2}    (1-66)
Figure 1.13 illustrates the frequency response of the room for different echo periods, T0. As T0 becomes larger, the frequency components become more closely spaced. Because of masking effects, the threshold at which the spectral distribution of the audio signal is perceived to change decreases as T0 increases.
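The resonance structure of equation (1-66) can be confirmed numerically. The sketch below uses an illustrative echo strength and period and locates the peaks of the power spectrum, which fall every 1/T0 Hz:

```python
import numpy as np

# Power spectrum of the echo train h(t) = sum a^n δ(t - n T0),
# equation (1-66): |H(f)|^2 = 1 / (1 - 2 a cos(2 pi f T0) + a^2).
a, T0 = 0.7, 0.01              # illustrative echo strength and 10 ms period

f = np.linspace(0.0, 500.0, 100001)
P = 1.0 / (1.0 - 2.0 * a * np.cos(2.0 * np.pi * f * T0) + a**2)

# Resonances occur where cos(2 pi f T0) = 1, i.e. every 1/T0 = 100 Hz.
peaks = f[1:-1][(P[1:-1] > P[:-2]) & (P[1:-1] > P[2:])]
print(peaks[:3])               # near 100, 200, 300 Hz for T0 = 10 ms
```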
Figure 1.14 gives results for the threshold level at which an average listener perceives noticeable coloration induced by a reflected wave. These measurements are based on listening tests of six different music selections.
figure 1.13 As the echo period, T0, increases, the frequency components are more closely separated. Due to masking, coloration becomes less audible.
In addition to our sensitivity to the frequency domain, our hearing is affected by time domain properties of reflected waves. Reflected energy which is just barely perceived acts to increase the loudness of the signal and to change the apparent size of the source. At a higher loudness and a small delay, the perceived source direction moves. For a large delay the reflected energy is perceived as a highly annoying echo. Figure 1.15 shows the intensity at which a distinct echo is perceived as a function of time delay.
If the delay is sufficiently short, the reflected wave can be 10 dB higher than the direct wave without causing echoing. This phenomenon is known as the Haas effect after its discoverer, and is an important factor in the design of auditoria [10].
Acoustic clarity is a measure of our ability to resolve fine temporal structures in acoustic waves. The faster the tempo and the higher the degree of attack on the instruments, the higher is the necessary degree of room clarity. Many measures have been suggested. All of them rely on determining the amount of acoustic power over a period of time representing "early" reflections divided by a normalization. One index used for determining the clarity of rooms for musical signals is given in equation 1-67. A value of C = 0 dB provides sufficient clarity for fast staccato passages. A value of 3 dB is usually acceptable for symphonic music.
figure 1.14 As the echo delay decreases, the reflectivity of the room surfaces must decrease in order to avoid a perceptible change in the coloration of the acoustic signal. For long delays, relatively intense echoes may exist without causing noticeable coloration. This figure shows the level of 10log(a) at which listeners perceive coloration [5].
figure 1.15 As the time delay of the echo increases, its intensity must decrease in order to avoid an annoyingly distinct sound. The figure shows the threshold intensity for hearing distinct echoing [5].
C = 10\log_{10}\frac{\int_0^{80\,\mathrm{ms}} h^2(t)\,dt}{\int_{80\,\mathrm{ms}}^{\infty} h^2(t)\,dt}    (1-67)
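Equation (1-67) can be evaluated for an idealized impulse response with exponentially decaying energy. The decay constants below are hypothetical and serve only to show that a faster-decaying (less reverberant) room yields higher clarity:

```python
import math

def clarity_index(tau, t_early=0.080, dt=1e-4, t_max=5.0):
    """Clarity index C of equation (1-67) for an idealized impulse
    response with exponentially decaying energy h^2(t) = exp(-t/tau):
    C = 10 log10( integral_0^{80ms} h^2 dt / integral_{80ms}^inf h^2 dt ).
    tau is a hypothetical energy decay constant in seconds."""
    t = 0.0
    early = late = 0.0
    while t < t_max:
        e = math.exp(-t / tau) * dt    # rectangle-rule quadrature
        if t < t_early:
            early += e
        else:
            late += e
        t += dt
    return 10.0 * math.log10(early / late)

print(round(clarity_index(0.10), 1))   # shorter decay: higher (positive) C
print(round(clarity_index(0.40), 1))   # longer decay: lower C
```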
Reverberation time of a field is a measure of how the room damps out an acoustic
source. It is usually determined by measuring the time required for the field intensity to decrease by 60 dB from the moment the acoustic source is turned off. The listening room's reverberant field acts to mask the details of a music performance which listeners find unpleasant, e.g. a minor lack of synchrony in an orchestra. The ideal reverberation time depends on the type of music being performed, and on local styles and current fads. For chamber music a reverberation time consistent with the rooms in which the compositions were originally performed, from 1.4 to 1.6 seconds, is preferred. For symphonies from the romantic period or pieces utilizing large choirs, a reverberation time of over 2 seconds is preferred. For opera a shorter reverberation time is preferred so that the libretto may be more easily understood. As an example, the La Scala opera house in Milan has a reverberation time of 1.2 seconds [11].
Psychoacoustic Response to the Spatial Aspects of Reflected Energy
For a room to have a good acoustic ambience the listener must have a sense of spaciousness. Originally researchers believed that spaciousness was caused by hearing spatially incoherent fields. It has been shown, however, that this gives rise only to a phantom source direction, and does not provide a sensation of spaciousness. The following conditions are required.
i. The field must be temporally incoherent.
ii. The intensity of the reflected energy must surpass the audibility threshold.
iii. The reflected energy must have components with time delays of less than 100 msecs.
iv. For early reflections, the field must be spatially incoherent, with components
arriving from the lateral directions being of primary importance.
These requirements are generally met in large rooms, but are increasingly difficult to ensure in smaller and smaller rooms. It has been shown [6] that spaciousness is independent of the style of music being performed.
The dynamic range of human hearing is 140 dB. If our hearing were more sensitive it would not be beneficial as we would start hearing the Brownian motion of air molecules. In processing music digitally, ideally a sufficient number of bits are available to ensure coverage of the entire dynamic range of musical works at a resolution consistent with our physiological capabilities. The standard CD format provides 16 bit encoding of amplitude information (approximately 90 dB). Equalizer technology should aim to provide no further degradation in dynamic range than the limit defined by the CD encoding.
Consequences of Room Acoustics and Psychoacoustics in the Solution of the Inverse Problem
Several important inferences can be drawn about potential solution methods to the room inverse problem from the discussion of room acoustics and psychoacoustics. Two obvious approaches are available as solution methods: 1) a theoretical modelling of the physics of the listening room acoustics, and 2) experimentally measuring the characteristics of the room.
The first approach has been briefly reviewed in the discussion of room acoustics. The key acoustic parameter, complex acoustic impedance, must be carefully measured for each material present in the listening room. Using these impedances and the usually complex geometry of most home listening rooms, the Helmholtz equation must be solved. Note that matching boundary conditions will be impossible to perform analytically, and will require a numerically intensive calculation using a finite element algorithm. As a result of the computations, a set of room eigenvalues and eigenmodes will be determined, from which the room's Green's function is determined. Even for the case of a simple rectangular room,
the number of room eigenmodes increases as the cube of the upper frequency limit of the source. For a room of average dimensions, i.e. Lx = 5 m, Ly = 4 m, and Lz = 3.1 m, between 0 and 20,000 Hz there are over 57,000,000 resonant room modes [6]. As an additional complication, except at low frequencies, the modes are extremely dense in the frequency domain, i.e. the halfwidth of a mode is much greater than the separation between modes. Thus the majority of modes are mutually coupling.
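The cubic growth of the mode count can be illustrated with the standard large-room approximation N(f) ≈ (4π/3)V(f/c)³ (leading volume term only; the figure quoted in the text also includes surface and edge corrections, so the sketch below gives the right order of magnitude rather than the exact count):

```python
import math

# Leading-term estimate of the number of acoustic modes below frequency f.
def mode_count(V, f, c=343.0):
    return (4.0 * math.pi / 3.0) * V * (f / c) ** 3

# Room of the dimensions quoted in the text: 5 x 4 x 3.1 m.
V = 5.0 * 4.0 * 3.1
print(f"{mode_count(V, 20000.0):.2e}")  # tens of millions of modes below 20 kHz
```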
The aforementioned calculations are so computationally burdensome that they cannot reasonably be performed. Even if a complete set of modes were available, from which the Green's function inverse could be calculated, the processing problem would not be feasible. For the simple room geometry discussed above, there were 57,000,000 room modes, and an FIR filter length of 100 million tap weights would be required. Realizing that implementation of such a large filter is impossible, a selection of the most important room modes would need to be calculated. Because of the mutually coupling nature of the modes, and the changing spectral distribution of the acoustic source, the "important modes" would be constantly changing.
The rigorous eigenmode expansion of the room response, the subsequent calculation of the room's Green's function, and implementation of the inverse on hardware cannot be performed. As a simplification in the computational complexity of describing a home listening room, a geometric acoustics approach could be attempted. The previous discussion makes clear, however, that geometric acoustics gives insight into the time average properties of the listening room. Depending on the nature of the acoustic source, the approximations could yield poor results. In addition, the computational complexity of characterizing the complex acoustic impedances for the room materials and the complicated geometry of the listening room continues to result in a huge computational problem. Figure 1.16 summarizes the rationale for rejecting approaching the inverse room problem by modelling the physics of the room.
The experimental approach must be selected, and this approach is discussed in detail in chapter two. Several implications of this method must be addressed in light of the basic principles of room acoustics and psychoacoustics in order to design an intelligent technique. The experimental approach can be summarized as follows. The listening room is excited by a broadband noise source. A measurement is made of the acoustic pressure at the listener location. This measurement is compared with the source signal band by band with an octave filter bank. An inverse of the room is approximated by ensuring the energy distribution of the source signal is maintained by the measured signal.
figure 1.16 The figure demonstrates the reasons that modelling the physics of room acoustics is not feasible for solving practical room inverse problems.

Reason to Reject                          Wave Theoretic   Geometric
                                          Approach         Optics Approach
1. Acoustic Impedances of Materials           X                 X
   Required.
2. Complicated Boundary Conditions.           X                 X
3. Computational Complexity.                  X                 X
4. Resultant Filter Orders.
5. Discernability of Important                X
   Room Modes.
6. Limited Accuracy.                                            X
7. Time-Averaged Properties Only.                               X

The discussion of psychoacoustics makes clear that acoustic high fidelity requires maintaining signal pitch, and the level of spatial and temporal coherency, with sufficient spectral resolution to match the psychoacoustic and physiological characteristics of human hearing. Equalization cannot restore true audio fidelity of the acoustic field as perceived at the location of the signal recording. Without a significant increase in the sophistication of the recording process, coherence information will continue to not be encoded on the CD. As a result, properties such as clarity, spaciousness, apparent source size, etc. will not be restored by equalization. The coherence properties of the signal received by the listener in the home listening room will be determined by the listening room itself.
By ignoring the signal coherence properties, the inverse problem is radically simplified. In addition we have assumed that the listening rooms are minimum phase. Limiting the inverse problem to minimum phase systems does not significantly reduce the robustness of the technology. From the principles of room acoustics it is known that for a listening room to be nonminimum phase it must have a bad acoustic design, e.g. a rectangular cavity with one axis much longer than the others terminated by walls with very low acoustic resistance. The assumption of minimum phase has the important characteristic that equalization will not require poles to be placed outside the unit circle.
The filter design problem has been reduced in scope to removing gross coloration effects of the stereo electronics and the listening room at a sufficiently high level of spectral resolution that it matches the characteristics of human hearing. The summary of psychoacoustics indicated that the phase response of a system does not affect our perception of pitch. The filter designer is left with the simpler task of calculating the inverse filter which inverts the magnitude response of the listening room at a resolution of approximately 1/3 octave. The following figure demonstrates the simplified set of design criteria.
Design Factor              Yes/No   Comments
1. Spatial Coherence       No       Not encoded on CD
2. Temporal Coherence      No       Not encoded on CD
3. Magnitude Response      Yes
4. Phase Response          No       Not psychoacoustically important
5. Frequency Resolution    Yes      Must match critical bandwidths
6. Amplitude Resolution    Yes      16 bits encoded on CD
7. Stability               Yes      Room responses are usually minimum phase

figure 1.17 The figure demonstrates the design criteria for a measurement based equalization strategy.
CHAPTER 2
AN EQUALIZATION STRATEGY FOR REAL-TIME SELF-ADJUSTMENT
Introduction
Chapter two will introduce the current state-of-the-art equalizer technology. It will be shown that this technology does not make efficient use of an equalizer's limited degrees of freedom. Because of the finer frequency resolution required in the lower frequency bands, high filter orders are required to control the small critical digital frequencies. In the high frequency bands, however, the spectral resolution will be finer than necessary. The Adkins-Principe architecture significantly improved the traditional strategy by introducing a multirate octave filter bank. By downsampling data in the lower bands according to the Nyquist limit, the critical digital frequency requirements are made proportionally less severe, and the required inverse filtering can be performed by filters of significantly lower order. As this is the basic approach of the technology introduced in this dissertation, a complete description of this technique will be reviewed in chapter two.
While this strategy is a significant improvement, it does not make use of the time varying spectral distribution of the source. The Adkins-Principe strategy continues to assign the limited degrees of freedom available in the inverse filters to equalize the entire audio bandpass, even if the source has no power in large spectral regions. It furthermore continues to equalize regions of the bandpass where little or no coloration is being induced by the stereo electronics or the listening room. The approach investigated in this work is to add adaptive filters into the Adkins-Principe architecture. Adaptive filters by their very nature will assign the most processing resources to regions of the audio bandpass in which the most coloration exists, significantly improving the efficiency and performance of equalization. A brief description of the major classes of adaptive algorithms is included. Because of the requirements of realtime processing, as well as the requirement of high fidelity, appropriate classes of adaptive algorithms are limited. Because of its low computational complexity and satisfactory performance, the Least Mean Squares (LMS) algorithm is chosen for further investigation in this research.
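As an illustration of why LMS is attractive here, a minimal system-identification sketch is given below; the "room" is a hypothetical three-tap FIR system, not a measured response, and the step size is illustrative:

```python
import numpy as np

# Minimal LMS adaptive filter sketch (illustrative, not the dissertation's
# implementation): identify an unknown FIR "room" from input/output data.
rng = np.random.default_rng(1)
room = np.array([1.0, 0.5, -0.3])        # hypothetical unknown system
x = rng.standard_normal(20000)           # broadband (uncorrelated) input
d = np.convolve(x, room)[:len(x)]        # desired signal: the room's output

mu, M = 0.01, 3                          # step size and filter order
w = np.zeros(M)
for n in range(M, len(x)):
    u = x[n - M + 1:n + 1][::-1]         # most recent M input samples
    e = d[n] - w @ u                     # error between desired and model output
    w = w + mu * e * u                   # LMS weight update

print(np.round(w, 2))                    # approaches the room coefficients
```

Each update costs only O(M) operations, which is what makes the algorithm viable for realtime processing.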
figure 2.0 The Adkins-Principe equalizer architecture is a parallel multirate filter bank (LP = lowpass filter, HP = highpass filter, DEC = decimation, INT = interpolation).
StateoftheArt Acoustic Equalizers
It is clear from the preceding section on the physics of room acoustics that a theoretical modelling of a listening room, even with an unreasonably simplistic geometry and constructed with materials of uniform impedance, is such a computationally burdensome endeavour that it cannot be performed for an individual's listening room. The current equalizer technology seeks to excite a room with broadband noise in order to stimulate a broad range of eigenmodes. This signal is measured at a listener location or locations, and an inverse of the cascade of the home stereo system and the home listening room is estimated. Equalization is not a panacea which will restore a perfect reproduction of the pressure field as propagated by the original rendition of the musical performance. In fact, acoustic equalization cannot even reconstruct a pressure field which is psychoacoustically equivalent. Current technology attempts only to remove gross coloration. The strategy employed is depicted in figure 2.1. Limitations of this strategy will be examined, with special emphasis on those items which the proposed adaptive equalizer will ameliorate.
figure 2.1 State-of-the-art equalizers are parallel filter banks with filters of constant order, adjusted by comparing the power radiated in each band by the noise source used to excite the room with the power collected in the listening room.
After room excitation with a broadband source, current techniques break the measured signal into octave or 1/3 octave bands using a set of bandpass filters. The power in each band is compared with the power initially radiated in the same band. The equalizer is adjusted so that its filter coefficients give sufficient gain in the attenuated bands that the received signal has a flat frequency response, or any preprogrammed response which the listener finds acoustically pleasing. The noise source is turned off, and the CD player is turned on.
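The measure-and-adjust loop just described can be sketched in a few lines. The band edges, the naive DFT-based power estimate, and the function names below are illustrative, not taken from any particular commercial product:

```python
import math

def band_power(signal, fs, f_lo, f_hi):
    """Estimate the power of `signal` inside [f_lo, f_hi) Hz with a naive DFT.
    (Illustrative only; a real equalizer would use bandpass filters.)"""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        f = k * fs / n
        if f_lo <= f < f_hi:
            re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += (re * re + im * im) / n ** 2
    return power

def band_gains_db(radiated, received, fs, edges):
    """For each band, the gain (in dB) the equalizer must apply so that the
    received power matches the initially radiated power.  Assumes each band
    actually received some power (no division by zero)."""
    gains = []
    for f_lo, f_hi in zip(edges[:-1], edges[1:]):
        p_rad = band_power(radiated, fs, f_lo, f_hi)
        p_rec = band_power(received, fs, f_lo, f_hi)
        gains.append(10.0 * math.log10(p_rad / p_rec))
    return gains
```

For example, if a 100 Hz tone is radiated at unit amplitude but received at half amplitude, the 50-150 Hz band calls for a gain of about 6 dB.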
Source Excitation
The excitation source utilized in equalizers has been the subject of much debate. Several articles [13, 14] deal with the varying responses of rooms to different types of sources, i.e. firecrackers, pistol blanks, etc. The most important characteristics of the test signal are the signal's spectral content, the amplitude distribution of the source, and the duration of the signal. State-of-the-art equalizers use pink, Gaussian noise for reasons which are now discussed.
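A pink noise test signal of this kind can be approximated digitally. The sketch below uses the Voss-McCartney construction, which is an assumption of this illustration rather than the method used by any equalizer discussed here:

```python
import random

def pink_noise(n_samples, n_rows=16, seed=0):
    """Voss-McCartney pink noise: sum several white Gaussian rows, where
    row r is redrawn only every 2**r samples.  The result has a power
    spectrum falling off at roughly 3 dB per octave (equal power per octave)."""
    rng = random.Random(seed)
    rows = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    out = []
    for i in range(n_samples):
        for r in range(n_rows):
            if i % (2 ** r) == 0:          # row r changes every 2**r samples
                rows[r] = rng.gauss(0.0, 1.0)
        out.append(sum(rows) / n_rows ** 0.5)  # normalize to roughly unit variance
    return out
```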
Except for small, undamped rooms at low excitation frequencies, the sound pressure amplitude distribution is independent of the volume, shape, or acoustical properties of the room. By examining the room's Green's function it is seen that the pressure field is the superposition of many eigenfrequencies, each with its separate damping coefficient. This superposition is complicated due to the finite halfwidth and small separation of the room resonances. Since the eigenmodes are closely spaced and mutually coupling, the field may be considered to result from a single frequency with randomly distributed amplitudes and phases. Applying the Central Limit Theorem to both real and imaginary terms of the pressure, P, leads to the Rayleigh distribution of |P|. Figure 2.2 demonstrates the similarity of the distribution function of pressure to Gaussian excitation.
The spectral content must be sufficiently broad that the eigenmodes excited by a music signal are also excited by Gaussian noise. Pink noise satisfies this requirement and has the additional advantage that it will simplify the signal processing. Pink noise has the property that equal power is radiated in each octave. In addition, unlike impulsive sources, the
figure 2.2 A Gaussian noise source is used with many state-of-the-art equalizers because of its similarity to the pressure distribution of acoustic data in an enclosed space.
pink noise source is of sufficient duration to "charge" the reactive elements of the listening room, which will allow a better equalization of the room's steady state response.
Bandpass Filters
Bandpass filters are designed to have bandwidths that approximately match the bandwidth characteristics of our human physiology. All high quality equalizers employ a digital 1/3 octave parallel filter bank, and from figure 1.12 it is seen that this resolution will closely match the critical bandwidths as discussed in the psychoacoustics review. The center frequencies and upper and lower cutoff frequencies are given in figure 2.3.
To maintain pitch the bandpass filters must be designed with sufficient control of the signal's spectral characteristics, while realizing that too fine a resolution will not improve our subjective sense of pitch and will add processing time and expense to the equalizer. The filter bank is made up of linear phase FIR lowpass and highpass filters cascaded to form bandpass filters. Performance is limited by the low digital cutoff frequencies associated with the lower frequency bands, which require high order filters. Control of the band at the level of the difference limen (fig. 1.11) is not achieved. Except in the most unusual circumstances
1/3 Octave Filter Bank

Center frequencies [Hz]: 25, 31.5, 40, 50, 63, 80, 100, 125, 160, 200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000, 12500, 16000, 20000

Band limits [Hz]: 22, 28, 35, 44, 57, 71, 88, 113, 141, 176, 225, 283, 353, 440, 565, 707, 880, 1130, 1414, 1760, 2250, 2825, 3530, 4400, 5650, 7070, 8800, 11300, 14140, 17600, 22500

figure 2.3 The critical frequencies of a 1/3 octave bandpass filter bank are shown above. Because of the similarities with the resolution characteristics of human hearing, the 1/3 octave filter bank is most often chosen for state-of-the-art equalizers.
the difference limen is an unnecessarily strict standard due to the masking effects of music signals. The precise requirements are difficult to evaluate due to the complex nature of masking.
Signal Measurement
Any system which relies on current microphone technology for precise acoustic measurement is problematic. A situation analogous to the Heisenberg Uncertainty Principle arises when measuring the field: introduction of the microphone into the field disturbs the field itself. To minimize this effect, high quality microphones are made as small as possible while maintaining their sensitivity. In addition, measuring the pressure field intensity is not synonymous with measuring signal loudness, as was shown in the previous section. The conversion from intensity to loudness can be accounted for in software. A more difficult matter is to completely characterize the microphone directionality, which will differ from the directional properties of our hearing. In practice a precise understanding of the microphone's directional response is unavailable. A high quality microphone must also have a fast response if it is not to smear the details of the signal, affecting the perceived timbre and clarity.
Signal Processing
Processing is performed by a dedicated chip which calculates the power in each band collected from the CD player. The same procedure is performed on the signal received from the microphone. The two signals are compared, and a calculation of the appropriate filter coefficients is made in order to guarantee a flat frequency response, or any other preset frequency response, across the audio bandpass. Unfortunately the details of the processing are not discussed in the literature, probably because of sensitivity towards proprietary technology on the part of manufacturers. The measurements of current state-of-the-art equalizers indicate they are capable of maintaining a flat frequency response across the audio bandpass to within 1 dB. At 40 dB between 100 and 1000 Hz, for speech and music signals, a band ripple of 1 dB is not perceptible. At the band edges ripple can be as great as 2 dB before listeners can perceive coloration. The number of bits used for all filter processing calculations must be at least 16 in order to avoid degrading the S/N of the CD player. High quality equalizers allow separate equalization to be performed for each channel of the stereo. The period of time required for the equalization can be as high as 15 seconds. The filter coefficients can be stored in one of several memories. A graphic interface is provided to allow the user to manually set his listening preferences, which are taken into account in the calculation of filter coefficients. These systems are a large improvement over traditional equalization technology, which required time consuming manual adjustment of graphic equalizers and depended on the skill of the individual making the adjustments.
Limitations
In the discussion of room acoustics the temporal characteristics of the acoustic field were introduced using the principles of geometric acoustics. This approach gave a qualitative understanding, but it is of limited accuracy and validity. The spatial characteristics were also presented using a simplistic approach to acoustic reflections. A rigorous theoretical foundation of room acoustics was built from the wave equation. By solving the boundary conditions it was shown that the impedances of surface materials and the room's geometry are the key parameters in developing a Green's function. Because of the oversimplification of geometric acoustics, and the huge computational requirements of a wave theoretical approach, the filter coefficients required for an equalizer cannot be calculated.
State-of-the-art equalizers determine the room response by exciting a room with a broadband source. A measurement is made of the acoustic field at the listener location, and an inverse of the room response is calculated for implementation in an octave filter bank. The excitation source, signal measurement, signal processing, and equalizer filters of state-of-the-art equalizers were reviewed. This technology is focussed on removing those effects which are psychoacoustically most significant. In the presentation of psychoacoustics, pitch was discussed as a key feature of the music signal. To preserve pitch the equalizer must remove coloration introduced by the stereo electronics and the listening room. Phase response was shown to be relatively unimportant in maintaining pitch. The temporal and spatial characteristics of the acoustic field were discussed as important characteristics in giving a listener the sense of fullness and spaciousness of the sound.
The state-of-the-art equalizer technology described above is able to remove most coloration of the stereo-room combination, and thus maintain pitch. The equalizer will not, however, be able to correct the following effects.
1) The equalizer will not be able to correct coloration due to strong regular reflections. These types of effects are represented by transfer functions that are non-minimum phase, and thus would require the equalizer to have poles outside the unit circle. This should not be a problem unless the room in which the original recording was made, or the home listening room, has a terrible acoustic design, e.g. a rectangular cavity with one axis much longer than the others, terminated in a wall with very low acoustic resistance.
2) The perceived timbre will not be affected by the equalizer. Timbre is related to transient phenomena, and as the equalizer's filters are time invariant, the equalizer will not be able to restore timbre related features.
3) Equalization cannot restore true audio fidelity of the signal because of the loss of information regarding both temporal and spatial coherence properties of the signal. Unfortunately these characteristics are psychoacoustically important. There are many reasons for the lack of fidelity with respect to field coherence properties. Without a significant increase in the level of sophistication of the recording process, the coherence information will not be included on the CD itself. Microphones used in the recording process, as well as the equalization process, do not encode information regarding the spatial characteristics of the signal. Fast temporal structures may also be lost due to the nonzero time constants of all microphones. In addition the state-of-the-art equalizers utilize only one microphone and cannot account for stereophonic properties of the acoustic field. As a result of the above limitations, properties such as clarity, spaciousness, apparent source size, apparent source location, etc. will not be restored by equalizers. These properties will be maintained by the characteristics of the listening room itself.
4) The equalization of a music reproduction system is valid for only one listener location. If the listener moves locations, the equalization must be performed again. The adaptive equalizer will eliminate this problem.
5) The best source with which to excite the room is the actual acoustic signal of interest, not pink noise. The equalizer's filters should utilize their limited degrees of freedom to correct frequencies at which distortion is a maximum, and not the entire spectrum. This is a fundamental limitation which affects the degree to which relative loudness, and hence pitch, can be maintained. This limitation will be ameliorated by the adaptive equalizer.
The Adkins-Principe Equalizer Architecture
The Adkins-Principe (AP) architecture [1] uses a multirate filter bank to produce an octave band structure by repeatedly filtering and decimating a single input signal. A separate highpass filter equalizes each band, making the processing computationally parallel and greatly reducing the microprocessor speed necessary for real-time operation. The advantage of multirate designs lies in their increased efficiency. The elegance of the structure can be seen in the block diagram shown in figure 2.0. The proposed real-time equalization strategy will be based on this architecture. The following section will discuss its characteristics. The advantages of this architecture over the current state-of-the-art will be made clear. Its operation will be shown to be consistent with maintaining pitch as described in chapter one. In addition, the features of the architecture which must be carefully controlled to assure proper signal reconstruction will be discussed.
Octave Band Generation and Equalization
The highest frequency band is controlled by passing the original signal through a highpass filter. All lower bands are controlled by a cascade of lowpass, decimation, and highpass filters. This is accomplished as shown in figure 2.4.
figure 2.4 The filter cascade along the first and second branches of the tree structure.
The tree structure is an efficient strategy for implementing an octave filter bank. The input to each branching node has been bandlimited and decimated. The lower branch performs equalization with a highpass filter while the upper branch further reduces the CD bandwidth. The characteristics of the upper and lower branches are described below.
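Before the details, the analysis half of the tree can be sketched as follows. The windowed-sinc lowpass stands in for the order-45 FIR filters of the actual AP design, and all names here are illustrative:

```python
import math

def fir_lowpass(cutoff, order=45):
    """Windowed-sinc FIR lowpass; `cutoff` is the digital cutoff as a
    fraction of pi (e.g. 1/M for decimation by M).  Hamming window used
    here for simplicity."""
    taps = []
    m = order // 2
    for k in range(order):
        t = k - m
        h = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * k / (order - 1))
        taps.append(h * w)
    return taps

def convolve(x, h):
    """Plain FIR convolution, truncated to len(x) output samples."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if 0 <= n - j < len(x))
            for n in range(len(x))]

def analysis_tree(x, n_bands=4, M=6):
    """Return the signal entering each band's highpass (equalization)
    filter, highest-frequency band first.  At each node the lower branch
    goes to equalization while the upper branch is lowpassed and
    decimated by M."""
    h = fir_lowpass(1.0 / M)
    bands = []
    for _ in range(n_bands):
        bands.append(x)            # lower branch: on to the highpass filter
        x = convolve(x, h)[::M]    # upper branch: bandlimit, then decimate
    return bands
```

With four bands and M = 6, an input block of 1296 samples yields band signals of 1296, 216, 36, and 6 samples, reflecting the reduced data rates of the lower bands.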
Each of the upper branches attempts to perform the following lowpass operation.

H_LP(e^jw) = 1,  |w| < pi/M
           = 0,  otherwise        (2-1)
The upper cutoff of band i is equal to the lower cutoff of band i+1. The sampling rate can be reduced by decimating by M according to Shannon's sampling theorem.

x_dec[n] = x[nMT]        (2-2)
The decimation rescales the digital frequencies of the DFT, as shown in figure 2.5.
Clearly the decimation rate must be coupled to the value of the digital cutoff frequency of the lowpass filters in order to avoid aliasing.
figure 2.5 Decimation by M in the time domain expands the digital frequencies by M, and reduces the peak value of the spectrum by M. The figure demonstrates the effects for M equal to 2.
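The frequency rescaling shown in figure 2.5 can be verified numerically. In this illustrative sketch a cosine at digital frequency pi/8 is decimated by 2; the DFT peak stays at the same bin index, but that bin now represents a digital frequency twice as large:

```python
import math

def dft_peak_bin(x):
    """Index of the largest-magnitude DFT bin in [0, N/2] (naive DFT)."""
    n = len(x)
    best_k, best_mag = 0, -1.0
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k

n, M = 64, 2
x = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]  # digital freq pi/8
y = x[::M]                                                  # decimate by 2
k_x, k_y = dft_peak_bin(x), dft_peak_bin(y)
w_x = 2 * math.pi * k_x / len(x)   # digital frequency before decimation
w_y = 2 * math.pi * k_y / len(y)   # after decimation: expanded by M
```

Both peaks land in bin 4, but since the decimated record is half as long, the digital frequency w_y equals M times w_x.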
Decimation reduces the number of operations required for real-time processing by reducing the data rate by M^(i-1) for band i. In addition, because the critical digital frequencies are increased by M, due to the rescaling of digital frequencies engendered by decimation, the required filter orders for all subsequent filtering operations are considerably smaller, or a finer frequency resolution will be possible with the same order filters. This is an important characteristic due to the finer resolution psychoacoustically required in the lower frequency bands. Figure 2.6 summarizes the upper branch.
At each branching node of the tree structure the bandlimited signal is also sent to a highpass filter (lower branch) for equalization. If the room were non-filtering the highpass filter would be designed as follows.

H_HP(e^jw) = 1,  pi/M <= |w| <= pi
           = 0,  otherwise        (2-3)

Note that the digital cutoff frequency is pi/M and not pi/M^i because of the decimation.
figure 2.6 Operation of the upper branch of the tree structure. This figure demonstrates the processing of a signal generated as the summation of two sine waves. The low frequency sine wave is separated with a lowpass filter and then decimated. The decimated part of the signal is thrown away.
In the more realistic case of a room which filters its acoustic input, the highpass filters are designed with user-supplied gains for each band. The graphic interface used on an equalizer implemented on the NeXT computer workstation [15] is shown in figure 2.7. The interface indicates gains for seven bands, and interpolates between the frequencies indicated below.
figure 2.7 The graphic interface for an equalizer using the AP architecture was implemented on the NeXT computer with a user interface as shown in this figure. The highpass filters are adjusted according to the selections made by dragging the sliders with a mouse.
The combination of filter length and decimation rate must be such that the requirement of at least 1/3 octave resolution is satisfied. In addition, band ripple, transition bandwidth, and stopband attenuation must be such as to guarantee that there is less than 1 dB band ripple across the entire audio bandpass. The research documented in this dissertation will use the filter design presented by Adkins and Principe [1]. Their extensive simulations resulted in a recommended design of 4 bands, each utilizing symmetric finite impulse response (FIR) filters of order 45, and a constant decimation rate of six. A summary of the frequency resolution of the design is given in figure 2.8.
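As described below (figure 2.9), the highpass filters are obtained by inverse transforming the requested frequency response and smoothing with a Kaiser window (beta = 3.5). A sketch of that procedure follows; the names are illustrative, the Bessel function is evaluated with a short power series, and the desired response is assumed real and symmetric across the DFT bins:

```python
import math

def i0(x, terms=25):
    """Modified Bessel function of the first kind, order zero (power series)."""
    s, t = 1.0, 1.0
    for k in range(1, terms):
        t *= (x / 2.0) ** 2 / k ** 2
        s += t
    return s

def design_fir(desired, order=45, beta=3.5):
    """Desired magnitude response sampled on `order` DFT bins -> causal,
    linear-phase FIR taps via inverse DFT, circular shift, and Kaiser window.
    `desired` must satisfy desired[k] == desired[order-k] for a real filter."""
    n = order
    # inverse DFT of the real, symmetric desired response (cosine terms only)
    h = [sum(desired[k] * math.cos(2 * math.pi * k * t / n) for k in range(n)) / n
         for t in range(n)]
    # shift so the impulse response is centred, making the filter causal
    h = h[(n + 1) // 2:] + h[:(n + 1) // 2]
    # Kaiser window to lessen the Gibbs ripple caused by truncation
    m = (n - 1) / 2.0
    w = [i0(beta * math.sqrt(1 - ((t - m) / m) ** 2)) / i0(beta) for t in range(n)]
    return [h[t] * w[t] for t in range(n)]
```

For an all-pass request the procedure returns a pure delay of (N-1)/2 = 22 samples, as linear phase requires.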
Band 1: 3675-22050 Hz; Band 2: 612.5-3675 Hz; Band 3: 102-612.5 Hz; Band 4: 17-102 Hz. Octave resolution within each band ranges from 0.045 at best to 0.250 at worst.

figure 2.8 Frequency resolution for a four band equalizer with filters of order 45, and a constant decimation rate of 6, is shown above.
The highpass FIR filters are designed by taking the inverse FFT of the frequency response requested by the user with the equalizer's graphical interface. The filters are smoothed by symmetrically truncating the resulting time domain representation about time = 0 with a Kaiser window with beta = 3.5. The symmetric nature of the filters guarantees linear phase. The time domain representation is shifted to guarantee that the filters are causal. A summary of the process is shown in figure 2.9.
Signal Reconstruction
After equalization has been performed with highpass filters, the data in each band must be summed in order to reconstruct the signal. The summation is complicated because of the different sampling rates at which the individual highpass filters operate. In addition there is a unique delay due to the differing number of convolution operations performed in the various branches of the equalizer. Immediately following equalization, the signal
figure 2.9 Operation of the lower branch of the tree structure. (a) User defines the desired response with the graphic interface; (b) desired frequency response; (c) inverse Fourier transform for the time domain representation; (d) for the filter coefficients, multiply the time domain representation by a Kaiser window to lessen the degradation induced by the Gibbs phenomenon.
is interpolated to bring it to the sampling frequency of the preceding band. Interpolation performs the opposite operation of decimation by inserting M-1 evenly spaced samples of value zero between every pair of data points.

x_int[nT] = x[kT],  if k = n/M is an integer
          = 0,      otherwise        (2-4)
Interpolation acts to recompress the digital frequencies of the equalized time series in a manner precisely opposite to the decimation process. These effects are demonstrated in figure 2.10.
Images created by interpolation must be removed with an antialiasing lowpass filter.
Note that the digital cutoff frequency is pi/M and not pi because of the interpolation. The lowpass filter should have a gain of M to restore the energy lost in the decimation stage. The outputs from the anti-imaging filters are summed with the output of the preceding band efficiently by using the inverse tree structure shown in figure 2.11.
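The zero-stuffing of equation (2-4) amounts to a few lines; this minimal sketch omits the anti-imaging lowpass (gain M, cutoff pi/M) that must follow:

```python
def interpolate(x, M):
    """Insert M-1 zeros between samples (eq. 2-4).  The spectral images
    this creates must then be removed with a lowpass filter of gain M
    and digital cutoff pi/M."""
    y = []
    for sample in x:
        y.append(sample)
        y.extend([0.0] * (M - 1))
    return y
```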
figure 2.10 (a) For interpolation by M, M-1 evenly spaced samples of zero are placed between each pair of samples; (b) frequency response before interpolation; (c) frequency response after interpolation. The figure shows interpolation by 3.
figure 2.11 The inverse tree structure efficiently sums the bands together after interpolation and anti-imaging, bringing the lower frequency band up to the sampling rate of the higher frequency band.
The delay associated with each band is calculated with respect to the highest sampling rate. Recall that for an FIR filter the delay, T_d, is expressed as follows, where T_s1 is the sampling period of band 1.

T_d = ((N - 1)/2) T_s1        (2-5)
For band i the total delay as seen at the output of the highpass filter is expressed as follows.

T_di = ((N - 1)/2) [ sum_{j=0}^{i-1} M^j ] T_s1        (2-6)
The delay at all nodes of the inverse tree structure is dominated by the value of T_d in the upper branch. The total delay as seen at the output of the AP structure for a four band system is dominated by T_d4 and is given below.

T_d,total = 2 T_d4 - ((N - 1)/2) M^3 T_s1        (2-7)
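For the recommended design (N = 45, M = 6) the delays can be evaluated numerically. The sketch below follows the reading of the band-delay expressions adopted in this chapter, measured in samples of the highest sampling rate:

```python
def band_delay(i, N=45, M=6):
    """Delay of band i at the highest sampling rate: (N-1)/2 samples per
    filter stage, scaled by the decimation factor of the rate at which
    that stage runs (eq. 2-6)."""
    return (N - 1) // 2 * sum(M ** j for j in range(i))

def total_delay(n_bands=4, N=45, M=6):
    """Total delay through the analysis and synthesis trees (eq. 2-7):
    twice the deepest band's delay, minus one traversal of the deepest
    stage, since the highpass filter is applied only once."""
    return 2 * band_delay(n_bands, N, M) - (N - 1) // 2 * M ** (n_bands - 1)
```

Band 1 is delayed 22 samples, band 4 is delayed 5698 samples, and the total comes to 6644 samples, roughly 151 ms at 44.1 kHz.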
Processing Speed
Efficient use of computing resources can be made because of the differing sampling rates of data in the various stages of the architecture. Convolutions are performed at different rates and are staggered in time. Data is sampled such that multiply and accumulate operations are made in nonoverlapping time slots. In this way major timing bottlenecks are avoided. The CD player sampling rate determines the value of the time slots (1/44,100 = 22.676 microseconds). For the four band design 10 FIR filters are operated. Three filters perform convolutions at the highest sampling rate (44.1 kHz), three perform convolutions at the CD rate decimated by six (7.35 kHz), three perform convolutions at the CD rate decimated by 36 (1225 Hz), and one performs convolutions at the CD rate decimated by 216 (204 Hz). The timing diagram is shown in figure 2.12.
As an example of the computational load of the convolutions indicated in the timing diagram above, consider the TI TMS320C40 25Mflop floating point digital signal processing
figure 2.12 The timing diagram indicates the efficiency of microprocessor implementation of the AP architecture. Decimation and selective sampling allow a staggering of the convolutions which prevents computational bottlenecks.
chip. This processor performs a multiply and accumulate in one cycle. With one chip dedicated to each band of the equalizer the number of computations which can be performed between CD data samples is limited to the following.
# of computations = 25 MFLOPS / 44.1 kHz = 566
According to the timing diagram above, for FIR filters of order 45, 360 multiply and accumulate operations are required, i.e. 63.6% of the CPU time.
Theoretical Performance
The AP architecture is an efficient implementation of a filter bank. The audio signal is broken into octaves using a tree structure. Along the lower branch, a bandlimited and downsampled signal is equalized with a highpass filter. Along the upper branch the signal is further bandlimited and downsampled according to the Nyquist limit. After equalization the octave bands are brought to the original sampling rate by upsampling, using an inverse tree structure. Each band has a distinct delay which must be accounted for when recombining bands. Because of the octave band structure and downsampling, the critical digital frequencies are raised as much as possible, and lead to a degree of bandwidth resolution, and of band ripple across the entire audio bandpass, that is superior to the performance advertised for existing equalizer architectures. From figure 2.8 it is seen that frequency resolution is finer than 1/3 octave across the entire bandpass. At worst, resolution is 1/4 octave, while at best the equalizer has 0.045 octave resolution. Computer simulations of the AP architecture indicate that for equalizer settings for a flat response, there is a band ripple of less than 0.2 dB from 17 Hz to 22500 Hz. This value is far superior to the values of 2 dB which are measured on equalizers which are commercially available. The spectral resolution and the high degree of control of the highpass filters are precisely the characteristics required to restore the proper pitch of the acoustic signal received at the listener location.
Automatic Adjustment Using Adaptive Filters
Elliot and Nelson [16] have proposed substituting the highpass filters in a standard filter bank with adaptive filters designed to minimize the mean square of an error signal which is generated by taking the difference between a pink noise source and the noise source as collected by a microphone at the listener location. Although this approach has the advantage of a lower computational complexity than the standard frequency domain techniques, the most compelling reason for the use of adaptive filters is overlooked, namely, the use of the music signal itself as the reference signal. This research will substitute adaptive filters for the highpass filters in the AP architecture as shown in figure 2.13.
In each stage the CD reference signal is bandlimited and decimated in precisely the manner discussed above. A microphone signal is filtered and decimated in an analogous manner to the CD signal. A feedback signal (error signal), which is provided by the difference between the filtered CD and microphone signals, is sent to an adaptation algorithm which updates the adaptive filter weights in such a way that the mean square error is minimized. As an example, the structure for band 2 is shown in figure 2.14.
figure 2.13 The adaptive equalizer architecture is a modification of the AP architecture in which the highpass filters are replaced by adaptive filters.
figure 2.14 The adaptive filter is outlined in grey. The filtered CD signal is compared with the microphone signal, similarly bandlimited and decimated. An error signal is generated and sent to the adaptation algorithm, which updates the filter weights in such a way that mean square error is minimized.
This strategy will improve the state-of-the-art by ameliorating two of the limitations discussed in chapter one. Specifically, if the listener changes location (assuming the microphone is also moved) the filter will automatically correct for the changes in the room's response because of the equalizer's capability for real-time self-adjustment. Depending on the type of adaptation algorithm chosen, the updates will be performed either every sample (44.1 kHz), or after every block of samples (44.1 kHz/N, where N is the length of the block). More importantly, the reference signal used for equalization is the CD signal itself instead of pink noise. The adaptive filter, when minimizing mean square error, will automatically utilize its limited degrees of freedom to adjust the frequency response in those portions of the spectrum in which the most energy is present, not the entire audio bandpass.
Several adaptation algorithms have been developed for adaptive signal processing [17-19]. The choice of algorithm depends on several considerations, namely, the stationarity characteristics of the signal, the data rate of the input signal, the degree of accuracy required in filtering, and the computational resources available for real-time operation. Acoustic data is highly nonstationary. Because the adaptive filters must find their optimal coefficients in a fraction of the time in which the audio signal is quasistationary, the adaptation process must be fast. Secondly, although the adaptation must occur rapidly, the allowable misadjustment of the filter weights must be relatively small for meaningful improvement over the current state-of-the-art. Thirdly, due to the high CD sampling rate, the complexity of the algorithm must not be so great as to preclude carrying out the arithmetic in real time. This is a serious concern even with the high speed microprocessors currently available. Lastly, the adaptation algorithm must not introduce any noise components which will lower the signal fidelity.
All available adaptation algorithms require tradeoffs among the crucial factors outlined above. Extremely rapid convergence can be achieved, but at the expense of increased computational complexity. Computationally efficient algorithms exist, but convergence is slowed, and further serious tradeoffs between convergence speed and misadjustment need to be taken into account. Other algorithms may make a good compromise between complexity and convergence speed, but by their nature they contribute undesirable noise into the system. Consider the tap delay line architecture for an FIR filter shown below, where y_k represents the output of the feedforward filter at time k, d_k represents the desired (or reference) signal, and e_k represents the difference signal between d_k and y_k (error signal).
figure 2.15 An adaptive tapped delay line with desired signal, d_k, and error signal, e_k.
Assume that the sequence {x_k} is quasistationary, i.e. the signal has been chosen over a time window in which it is approximately stationary. The output of the tapped delay line is the inner product of the n+1 most current filter weights with the filter input.

y_k = X_k^T W_k = [x_k  x_{k-1}  x_{k-2}  ...  x_{k-n}] [w_0  w_1  ...  w_n]^T        (2-8)
The filter weights, W*, which minimize the mean square error, E[(d_k - y_k)^2], are expressed by the well-known Wiener formulation: W* = R^{-1}P, where R is the matrix E[XX^T] and P is the cross correlation between d and X, E[dX]. The field of adaptive signal processing is concerned with approximating the solution for W*. There are two standard methods by which the solution is approximated: gradient search algorithms and least squares (LS) algorithms. The gradient search methods can be generalized as follows: W_{k+1} = W_k - u grad(xi), where xi = E[(d_k - y_k)^2] = mean square error. This common sense approach adjusts the weights constituting vector W in a direction opposite to that of the gradient of the error surface. u represents a constant which regulates the step size of the weight update. The most common gradient search algorithm is the Least Mean Squares (LMS) algorithm, which estimates grad(xi) = grad E[error^2] by taking the time average of grad[error^2]. The approximation is valid over quasistationary periods because of the quasiergodicity of the signal. The LS algorithms differ by estimating the values of R and P with time averages. LS algorithms can be broken into two categories: algorithms which calculate new estimates over a window of data (block LS algorithms), and algorithms which recursively update the estimate after every sample (RLS algorithms). One example of an LS algorithm is the block covariance method. Let R' and P' be the time averaged approximations of R and P, and let w be a windowing function.
R'_k = sum_j w_j X_j X_j^T        (2-9)

P'_k = sum_j w_j d_j X_j        (2-10)

W_k = R'_k^{-1} P'_k        (2-11)
Certain classes of algorithms can be ruled out of consideration immediately because of their unsuitability with respect to one or more of the requirements given above. Although the block LS algorithms converge more quickly than gradient search techniques, after every stationary window of data the input data autocorrelation matrix, R', must be inverted. Some of the block algorithms have autocorrelation matrices that are Toeplitz and can use the Durbin algorithm for matrix inversion, requiring O(N^2) operations, where N is the length of the filter. Many of the LS methods (e.g. the covariance method), however, are not Toeplitz, and matrix inversion must be performed using the Cholesky algorithm, requiring O(N^3) operations. Furthermore, these algorithms update the weight vector every M samples, introducing an undesirable artifact into the spectrum of the acoustic signal.
Algorithms which minimize error in the frequency domain operate on blocks of data. As such, they update filter coefficients at regular intervals, introducing a noise component into the signal. In addition, they require more operations in order to transform the data. Although we are most interested in minimizing the error of the magnitude response of the transform, good time domain equalization will guarantee good frequency domain equalization as well.
The recursive algorithms are more promising. They converge rapidly, and the weight vector is updated every sample. These algorithms unfortunately have a heavy computational load. Figure 2.16 gives the computational requirements of the most familiar versions of the RLS algorithm [18].
REQUIRED COMPUTATIONS PER ALGORITHM ITERATION

ALGORITHM                          MULTIPLICATIONS & DIVISIONS   ADDITIONS   SQUARE ROOTS
Modified Fast Kalman (MFK)
Fast Kalman                        8N + 5                        7N + 2
Growing Memory Covariance (GMC)    13N + 6                       11N + 1
Sliding Window Covariance (SWC)    13N + 13                      12N + 7
Normalized GMC                     12N + 16                      7N + 2      3
Normalized SWC                     23N + 11                                  5N
Normalized PLS                     6N + 17                       5N + 2      2

figure 2.16 The computational complexity for several adaptive algorithms is displayed above.
All algorithms except the Kalman types require significantly more computations than the LMS algorithm (2N + 1 multiplications and 2N additions). On the basis of computational considerations, the algorithms seriously considered for the equalizer are reduced to these two alone. The choice has been made to utilize a fast LMS method as the adaptation algorithm. Chapters three and four will investigate LMS algorithms in detail.
CHAPTER 3
LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE WITH UNCORRELATED INPUT DATA
The LMS algorithm converges quickly to an approximately optimum solution (in the mean square error sense) given that the step size has been properly initialized. Because of the low computational complexity of the LMS algorithm, it has been selected as the adaptive algorithm which will be implemented in the AP architecture. Because the LMS algorithm is a gradient descent technique, there is a tradeoff between convergence speed and misadjustment. Misadjustment measures, on average, how close the LMS solution is to the optimum solution after algorithm convergence. A smaller step size results in a smaller misadjustment. For fast convergence and low misadjustment, an adaptive step size is required. At the beginning of a quasistationary segment of data, the value of the step size must be initialized in such a way that convergence speed is maximized. In this stage of the adaptive process the lack of convergence speed provides the major contribution to mean square error, and misadjustment is unimportant. As the filter converges, the step size must be reduced because of the increasing dominance of misadjustment.
Chapter three will investigate the optimal value at which the step size should be set at the beginning of a quasistationary segment of acoustic data. The initialization problem has been rigorously solved for uncorrelated input signals. To efficiently utilize the LMS algorithm on music signals, this understanding must be extended to the case of correlated input data (see figure 4.1). Chapter three will provide the necessary background information for this extension. Chapter three first presents the Wiener-Hopf formulation of the minimization of mean square error. The characteristics of the error surface are highlighted as part of this discussion. As a result it is clear that a stochastic gradient descent algorithm is a feasible approach to the minimization problem. The properties of gradient descent algorithms are discussed with particular emphasis on the LMS algorithm and its properties. Finally, the convergence speed of the mean square error for the LMS algorithm is minimized as a function of step size. The background information in this chapter is largely taken from the work of Widrow and Stearns [19], and their notation will be used throughout.
The Wiener-Hopf Formulation
The Wiener-Hopf formulation [20] gives a method by which the filter weights of a tapped delay (direct form) filter can be optimized when the input to the filter is a stationary stochastic signal. The results hold for a quasistationary signal to a greater or lesser degree depending on the precise statistical nature of the signal. The measure by which the filter weights are optimized is the mean square error (MSE).
Because the Wiener-Hopf equation is the underpinning of all adaptive signal processing algorithms, it will be reviewed in detail. The derivation of the Wiener filter will yield important information regarding the nature of the error surface. As a result it will be obvious that gradient descent techniques are applicable optimization algorithms. The derivation of the Wiener solution is presented below.
Derivation of the Wiener Solution
Figure 2.15 shows a tapped delay line. The input sequence {x_k} is a stationary stochastic signal. The output of the filter, y_k, is simply the inner product of the n+1 most recent inputs with their respective filter weights, for a filter of order n+1.

    y_k = X_k^T W_k = [x_k  x_{k−1}  x_{k−2}  ...  x_{k−n}] [w_0  w_1  ...  w_n]^T    (3-1)
The error signal is generated by taking the difference between the desired signal, dk, and the filter output.
    e_k = d_k − y_k = d_k − X_k^T W_k    (3-2)
Square the error.
    e_k² = (d_k − X_k^T W_k)²

         = d_k² − 2 d_k X_k^T W_k + W_k^T X_k X_k^T W_k    (3-3)
Take the expectation of both sides of the above equation.
    E[e_k²] = E[d_k² − 2 d_k X_k^T W_k + W_k^T X_k X_k^T W_k]

            = E[d_k²] − 2E[d_k X_k^T W_k] + E[W_k^T X_k X_k^T W_k]    (3-4)
By keeping W at a fixed value its subscript can be omitted. Furthermore, note that X_k^T W W^T X_k = W^T X_k X_k^T W. Thus

    E[e_k²] = E[d_k²] − 2E[d_k X_k^T] W + W^T E[X_k X_k^T] W    (3-5)
Define the expectation value of the cross correlation matrix as follows.
kE [dkX] = E[dkxk
dkxk 1 *
(36) dkxkl]
Define the expectation of the correlation matrix as follows.
(35)
(32)
    R_k = E[X_k X_k^T] = E | x_k x_k       x_k x_{k−1}       ...   x_k x_{k−n}      |
                           | x_{k−1} x_k   x_{k−1} x_{k−1}   ...   x_{k−1} x_{k−n}  |    (3-7)
                           | ...                                   ...              |
                           | x_{k−n} x_k   x_{k−n} x_{k−1}   ...   x_{k−n} x_{k−n}  |
The mean square error can now be expressed in simpler notation. Let ξ = E[e_k²].

    ξ = E[d_k²] − 2 P_k^T W + W^T R_k W    (3-8)
To minimize ξ, set dξ/dW to zero and solve for the optimum weight vector, W*.

    dξ/dW = −2P + 2RW = 0    (3-9)

    W* = R^{-1} P    (3-10)
The above equation is the expression of the Wiener-Hopf equations. Substituting equation 3-10 into equation 3-8 and making use of the symmetry of R results in an expression for the minimum value of ξ, ξ*.

    ξ* = E[d_k²] − 2 P^T R^{-1} P + (R^{-1}P)^T R (R^{-1}P)    (3-11)

       = E[d_k²] − P^T W*    (3-12)
With a stationary signal in which the cross-correlation vector and correlation matrix were perfectly determined, it would be possible to solve the Wiener-Hopf equations for W* and ξ*. Unfortunately, in engineering applications of interest neither of these assumptions is generally true. The field of adaptive signal processing is concerned with estimating W* without the a priori information required by the Wiener-Hopf formulation.

Error Surfaces
Equations 3-8 and 3-10 provide important information regarding the characteristics of the error surface, which is defined as ξ = f(W). The most critical result is the hyperparabolic nature of the surface. This is demonstrated by changing the coordinate system used in the expression for ξ. Let V = W − W* be the displacement of the filter weights from the optimal value. Express the MSE in terms of this translated coordinate system, i.e. ξ = f(V). The transformation is accomplished as follows. Recall equation 3-8.

    ξ = E[d_k²] + W^T R W − 2 W^T P    (3-8)
Equation 3-13 is inspired by assuming the proper form of f(V) and demonstrating its equivalence to equation 3-8.

    ξ = E[d_k²] + W^T R W − 2 W^T R R^{-1} P + P^T R^{-1} R R^{-1} P − P^T R^{-1} P    (3-13)

Substitute W* for R^{-1}P and ξ* for E[d_k²] − P^T W*.

    ξ = ξ* + W^T R W − 2 W^T R W* + P^T R^{-1} R W*    (3-14)

Because correlation matrices are Hermitian, (P^T R^{-1})^T = R^{-1} P = W*.

    ξ = ξ* + W^T R W − 2 W^T R W* + W*^T R W*    (3-15)

Since W^T R W* is a scalar, it must be equal to its own transpose. Again use the Hermitian property of R.

    ξ = ξ* + W^T R W − W^T R W* − W*^T R W + W*^T R W*    (3-16)

Note that transposition is a linear operator.

    ξ = ξ* + (W − W*)^T R (W − W*)    (3-17)

    ξ = ξ* + V^T R V    (3-18)
The error surface is clearly a quadratic function of V. As a result, the following important characteristics are obvious: 1) the error surface is a hyperparaboloid, 2) the error surface must be concave upwards in order to avoid regions of negative MSE (a logical impossibility), 3) the optimal set of filter weights is located at the bottom of the hyperparaboloid, and 4) there is only one minimum (no local minima). These observations are critical in their implication that gradient search techniques are able to find the global minimum. An error surface for both the translated and untranslated coordinates is shown below.
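These properties can be checked numerically. The sketch below (a hypothetical three-tap system and synthetic signals, not part of the equalizer design) computes W* = R^{-1}P from time-averaged estimates of R and P and verifies that displacing W from W* raises the MSE by exactly V^T R V, as in equation 3-18.

```python
import numpy as np

# Hypothetical example: identify a 3-tap system with the Wiener-Hopf
# solution W* = R^{-1} P, then check the quadratic error surface.
rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.2])           # hypothetical "unknown" system
n = len(h)
x = rng.standard_normal(20000)
d = np.convolve(x, h)[:len(x)]           # desired signal d_k

# Tap-input vectors X_k = [x_k, x_{k-1}, x_{k-2}]
X = np.stack([np.roll(x, i) for i in range(n)], axis=1)[n:]
dk = d[n:]

R = (X.T @ X) / len(X)                   # time average of X_k X_k^T
P = (X.T @ dk) / len(X)                  # time average of d_k X_k
w_star = np.linalg.solve(R, P)           # W* = R^{-1} P
xi_min = np.mean(dk**2) - P @ w_star     # xi* = E[d_k^2] - P^T W*  (3-12)

V = np.array([0.1, -0.2, 0.05])          # displacement from the optimum
xi = np.mean((dk - X @ (w_star + V))**2)
# xi - xi_min equals V^T R V  (3-18)
```

Because the desired signal here is an exact linear function of the taps, the recovered W* matches h and the minimum MSE is essentially zero; the quadratic-form identity holds to machine precision.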
The Gradient Descent Algorithm
The gradient of a surface points in the direction in which the function describing the surface increases most rapidly. A common-sense approach to finding a surface minimum is therefore to move in a direction opposite to that of the gradient. This section rigorously shows the convergence properties for a stationary signal. The tradeoff between convergence speed and misadjustment is demonstrated explicitly. Finally, the difficulty of using gradient descent techniques with quasistationary signals is described.
figure 3.1 The minimum of the error surface is at the origin of the
translated coordinate system. The parabolic nature of the
surface makes gradient descent algorithms feasible.
Let μ represent a variable which regulates the step size of the weight adjustment. Gradient descent (GD) methods can be generalized as follows.

    W_{k+1} = W_k − μ ∇ξ_k    (3-19)
Figure 3.2 graphically demonstrates how GD methods update themselves on the basis of
past information.
figure 3.2 Gradient descent methods find the error surface minimum by moving in
a direction opposite to that of the error surface gradient.
The trajectory of the filter weights is the projection of the descent path on the error surface onto the weight space. For a hyperparaboloid the approach is intuitive: the greater the distance from the surface minimum, the larger the gradient, and hence the larger the weight updates. At the surface minimum the gradient is zero, and no further adjustment is made. A more rigorous justification follows.

Algorithm Convergence
Recall from equation 3-9 that the gradient is expressed as −2P + 2RW. Substituting this into equation 3-19 yields the following.

    W_{k+1} = W_k + μ(2P − 2RW_k)    (3-20)

            = W_k + μR(2R^{-1}P − 2W_k)    (3-21)
Substitute the solution of the Wiener-Hopf equation (equation 3-10) into 3-21.

    W_{k+1} = W_k + 2μR(W* − W_k)

            = (I − 2μR)W_k + 2μRW*    (3-22)

Switch to the translated coordinate system by subtracting W* from both sides of equation 3-22.

    W_{k+1} − W* = (I − 2μR)W_k + 2μRW* − W*    (3-23)

                 = (I − 2μR)(W_k − W*)

    V_{k+1} = (I − 2μR)V_k    (3-24)
Perform a unitary transformation on R. Let Λ be the diagonal matrix of eigenvalues of R, and let Q be the matrix whose columns are the corresponding eigenvectors. This transformation assumes R is not singular.

    R = QΛQ^{-1},  where  Λ = diag(λ_0, λ_1, ..., λ_n)  and  Q = [q_0  q_1  ...  q_n]

q_i is the eigenvector associated with λ_i. Substitute the above expression for R into equation 3-24.

    V_{k+1} = (I − 2μQΛQ^{-1}) V_k    (3-25)
Note that the following transformation has the effect of decoupling the weights in the translated coordinate space, i.e. the transformation causes the new coordinates (the principal coordinate system) to be collinear with the eigenvectors of the error surface.

    V′ = Q^{-1} V    (3-26)

With the above transformation the coordinates are aligned with the principal axes of the hyperparaboloid. This can be shown as follows.

    ξ = ξ* + V^T R V    (3-27)

      = ξ* + V^T (QΛQ^{-1}) V    (3-28)

      = ξ* + (Q^{-1}V)^T Λ (Q^{-1}V)    (3-29)

      = ξ* + V′^T Λ V′    (3-30)

The transformation is illustrated in figure 3.3. Returning to the convergence proof, substitute equation 3-26 into equation 3-25.

    Q V′_{k+1} = (I − 2μQΛQ^{-1}) Q V′_k    (3-31)
Premultiply by Q^{-1}.
figure 3.3
The principal coordinate system has axes which lie along the principal axes of the hyperparabolic error surface.
    V′_{k+1} = Q^{-1}(I − 2μQΛQ^{-1}) Q V′_k

             = (I − 2μΛ) V′_k    (3-32)

By induction, V′_k can be expressed in terms of V′_0.

    V′_k = (I − 2μΛ)^k V′_0    (3-33)
The update equations show their decoupled nature more clearly in matrix notation.
    V′_k = diag( (1 − 2μλ_0)^k, (1 − 2μλ_1)^k, ..., (1 − 2μλ_n)^k ) V′_0    (3-34)

The convergence condition can be seen from equation 3-34. If 0 < μ < 1/λ_max, then with perfect knowledge of the gradient vector, the weight vector will converge to the optimal solution.

    For 0 < μ < 1/λ_max,   lim_{k→∞} V′_k = 0    (3-35)
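The bound of equation 3-35 can be illustrated with a small numerical sketch. The 2×2 correlation matrix below is hypothetical; the iteration is exactly equation 3-24, run once inside the bound and once outside it.

```python
import numpy as np

# Iterate V_{k+1} = (I - 2*mu*R) V_k (equation 3-24) and check that the
# weight error converges for 0 < mu < 1/lambda_max and diverges beyond.
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # hypothetical correlation matrix
lam_max = np.linalg.eigvalsh(R).max()

def iterate(mu, steps=200):
    V = np.array([1.0, -1.0])              # initial weight displacement
    A = np.eye(2) - 2.0 * mu * R
    for _ in range(steps):
        V = A @ V
    return np.linalg.norm(V)

norm_stable = iterate(0.9 / lam_max)       # inside the bound: shrinks
norm_unstable = iterate(1.1 / lam_max)     # outside the bound: grows
```

With μ just inside the bound the slowest geometric factor is 1 − 2μλ_max = −0.8, so the error decays; just outside, the factor has magnitude 1.2 and the iteration diverges.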
Convergence Speed
The speed of convergence is limited by the value of the step size, and hence by the largest eigenvalue. Consider convergence along coordinate j. A geometric convergence factor can be defined as follows.

    r_j = (1 − 2μλ_j)    (3-36)
Recall the power series for an exponential.
    exp(x) = 1 + x + x²/2! + x³/3! + ...    (3-37)
Let τ_j = 1/(2μλ_j). Then τ_j is approximately the e-folding time of the adaptation process, as can be seen by truncating the exponential power series after the first two terms.

    r_j = 1 − 2μλ_j ≈ exp(−1/τ_j)    (3-38)
The convergence of ξ to ξ* is represented by a learning curve which can be approximated as the product of exponential terms.

    learning curve = (e^{−2k/τ_0})(e^{−2k/τ_1}) ... (e^{−2k/τ_n})    (3-39)
Learning is limited by the largest e-folding time, which is directly related to the smallest eigenvalue and the magnitude of the step size.

Excess Mean Square Error
In the discussion so far it has been assumed that ∇ξ is known. In fact it is not. Adjustments must be made in the above equations to account for a noisy gradient estimate. Let ∇̂ξ be the estimate of ∇ξ.

    W_{k+1} = W_k − μ ∇̂ξ_k    (3-40)

Switch to the translated coordinate system.

    V_{k+1} = V_k − μ ∇̂ξ_k    (3-41)

The estimate of the gradient can be expressed as the sum of the true gradient plus a noise term.

    ∇̂ξ_k = ∇ξ_k + N_k    (3-42)
From equations 3-19 and 3-42 we conclude the following.

    ∇̂ξ_k = 2R V_k + N_k    (3-43)

Substitute equation 3-43 into equation 3-41.

    V_{k+1} = V_k − μ(2R V_k + N_k)

            = (I − 2μR)V_k − μN_k    (3-44)

Switch to the principal coordinate system.

    Q V′_{k+1} = (I − 2μR) Q V′_k − μN_k    (3-45)

    V′_{k+1} = Q^{-1}(I − 2μR) Q V′_k − μQ^{-1}N_k    (3-46)

             = (I − 2μΛ) V′_k − μQ^{-1}N_k

Let Q^{-1}N_k = N′_k be the gradient noise projected onto the principal axes.

    V′_{k+1} = (I − 2μΛ) V′_k − μN′_k    (3-47)
Note that equation 3-47 now represents a set of uncoupled difference equations. The matrix formulation is given below.

    V′_{k+1} = diag(1 − 2μλ_0, 1 − 2μλ_1, ..., 1 − 2μλ_n) V′_k − μ [n′_{k,0}  n′_{k,1}  ...  n′_{k,n}]^T    (3-48)

By induction, V′_k can be expressed in terms of V′_0.

    V′_k = (I − 2μΛ)^k V′_0 − μ Σ_{j=0}^{k−1} (I − 2μΛ)^j N′_{k−j−1}    (3-49)

Thus, with gradient noise included, V′_k does not approach zero as k → ∞.

    lim_{k→∞} V′_k = −μ Σ_{j=0}^{k−1} (I − 2μΛ)^j N′_{k−j−1}    (3-50)

The degree of suboptimality is determined by the set of eigenvalues {λ_i}, the statistical characteristics of N, and the step size μ.

The difference between the minimum MSE and the average MSE is defined as the excess mean square error. Excess MSE provides a measure of the difference between the actual and the optimal performance, averaged over time, and is expressed as follows.

    Excess MSE = E[ξ − ξ*] = E[V′^T Λ V′]    (3-51)
Recall that after adaptation V′_k = −μ Σ_{j=0}^{k−1} (I − 2μΛ)^j N′_{k−j−1}. Substitute this into equation 3-51.

    Excess MSE = E[ μ² Σ_{j=0}^{∞} Σ_{i=0}^{∞} N′^T_{k−j−1} (I − 2μΛ)^j Λ (I − 2μΛ)^i N′_{k−i−1} ]    (3-52)

Since (I − 2μΛ)^j and Λ are both diagonal, they can be commuted.

    Excess MSE = μ² Σ_{i,j} E[ N′^T_{k−j−1} Λ (I − 2μΛ)^{i+j} N′_{k−i−1} ]    (3-53)

Assume that N results from independent errors. Then for i ≠ j there will be no contribution to the excess MSE.

    Excess MSE = μ² Σ_{j=0}^{∞} E[ N′^T_k Λ (I − 2μΛ)^{2j} N′_k ]    (3-54)

               = μ² E[ N′^T_k { Σ_{j=0}^{∞} (I − 4μΛ + 4μ²Λ²)^j } Λ N′_k ]    (3-55)
Since I − 4μΛ + 4μ²Λ² is a diagonal matrix, the convergent infinite series may be simplified as follows.

    Σ_{j=0}^{∞} (I − 4μΛ + 4μ²Λ²)^j = (4μΛ − 4μ²Λ²)^{-1}    (3-56)

Substitute this expression into equation 3-55.

    Excess MSE = μ² E[ N′^T_k Λ (4μΛ − 4μ²Λ²)^{-1} N′_k ]

               = (μ/4) E[ N′^T_k (I − μΛ)^{-1} N′_k ]    (3-57)
In matrix form the decoupled nature of equation 3-57 is evident.

    Excess MSE = (μ/4) E[ [n′_{k,0}  ...  n′_{k,n}] diag( 1/(1 − μλ_0), ..., 1/(1 − μλ_n) ) [n′_{k,0}  ...  n′_{k,n}]^T ]    (3-58)

The matrix equation can be further simplified as follows.

    Excess MSE = (μ/4) E[ Σ_{i=0}^{n} n′²_{k,i} / (1 − μλ_i) ]    (3-59)
Equations 3-36, 3-38, 3-39, and 3-59 provide a clear indication of the tradeoff between misadjustment and convergence speed. The smaller the step size, the smaller the misadjustment; below a certain upper bound, however, reducing the step size slows convergence. Figure 3.4 illustrates the implications of equations 3-60 and 3-61.

    lim_{μ→0} Excess MSE = 0    (3-60)

    lim_{μ→0} τ_j = ∞    (3-61)
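The tradeoff can also be seen by simulation. The sketch below runs the LMS update on a hypothetical one-tap identification problem (d_k = 0.8 x_k plus measurement noise, so ξ* = 0.01) with a small and a large step size, and compares the steady-state MSE; the larger step size converges faster but settles at a larger excess MSE.

```python
import numpy as np

# Hypothetical one-tap identification, d_k = 0.8*x_k + noise.
rng = np.random.default_rng(1)
x = rng.standard_normal(50000)
d = 0.8 * x + 0.1 * rng.standard_normal(50000)   # xi* = 0.01

def steady_state_mse(mu):
    w = 0.0
    err = np.empty(len(x))
    for k in range(len(x)):
        e = d[k] - w * x[k]          # a priori error
        w += 2.0 * mu * e * x[k]     # LMS weight update
        err[k] = e * e
    return err[-10000:].mean()       # MSE averaged after convergence

mse_small = steady_state_mse(0.005)  # slow, small excess MSE
mse_large = steady_state_mse(0.1)    # fast, larger excess MSE
```

With the small step size the steady-state MSE sits just above ξ* = 0.01, while the large step size carries a visibly larger excess MSE, in line with equation 3-59.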
Adaptation in a Quasistationary Environment
The adaptation process for quasistationary input data, while conceptually simple to understand, is difficult to describe analytically. The optimal weight vector is now a function of time, i.e. W* = f(k), as the error surface changes shape and moves through an (n+1)-dimensional space for a filter of order n+1. Widrow has given an analytical description of the optimal step size for the LMS algorithm by examining the case of modelling an unknown system with an LMS tapped delay filter [21]. The unknown system is assumed to be a first-order Markov process, with R constrained to be constant.
The more general problem for step size optimization is unsolved. Since the weight vectors are adapted on the basis of past information, and the error surface is changing shape and moving with time, the weight vector error results from two contributions. The first contribution results from noise in the estimation of the error surface gradient. The second contribution to error results from estimating the gradient of the error surface at iteration k
using the error surface at iteration k−1. Note the formulation for the filter weights for the nonstationary case, in the translated coordinate system.

    V_k = W_k − W*_k = (W_k − E[W_k]) + (E[W_k] − W*_k)    (3-62)

Substitution of the above into equation 3-18 yields the following.

    ξ_{k+1} = ξ* + (W_k − E[W_k])^T R_k (W_k − E[W_k]) +
              (E[W_k] − W*_k)^T R_k (E[W_k] − W*_k) + 2E[(W_k − E[W_k])^T R_k (E[W_k] − W*_k)]    (3-63)

Expanding the third term in equation 3-63 and recognizing that E[W_k] is constant over an ensemble, the third term reduces to zero, and ξ_{k+1} reduces to the following.

    ξ_{k+1} = ξ* + (error from gradient noise) + (error from moving error surface)

            = ξ* + (W_k − E[W_k])^T R_k (W_k − E[W_k]) + (E[W_k] − W*_k)^T R_k (E[W_k] − W*_k)    (3-64)

Equation 3-64 is as far as one can go without usually unavailable a priori information concerning the stationarity characteristics of the signal. A graphic illustration of the operation of the LMS algorithm with nonstationary input data is shown in figure 3.5.
figure 3.4 The larger step size results in faster convergence, but larger
excess MSE. For this reason the equalizer design will utilize
a varying step size.
figure 3.5 For quasistationary adaptation there are two contributions to
error in the gradient: 1) the error induced by the usual form
of gradient noise, and 2) the error induced by error surface
lag, i.e. the error induced by using information from a prior
error surface instead of the current error surface. In the
figure above the highlighted update, ΔW_{k+1}, is based on
information concerning ξ_k instead of ξ_{k+1}.
The LMS Algorithm with Uncorrelated Input Data
The best known and most widely used gradient method for stochastic signals in adaptive signal processing is the LMS algorithm [22]. The method is well understood for stationary, uncorrelated input data. The method can be extended to the quasistationary case, although the analytic description loses much of its simplicity and is possible only for the simplest models. This section will provide the theoretical background necessary for a discussion of the LMS algorithm with correlated input data.

Stationary Model for Uncorrelated Input
Recall the general form of the gradient descent algorithms, W_{k+1} = W_k − μ∇ξ_k. The LMS algorithm makes a crude estimate of ∇ξ which is nevertheless quite effective. Let ∇̂ξ represent the estimate of ∇ξ.

    Gradient descent:    ∇ξ = ∂E[e_k²]/∂W = −2E[e_k X_k]    (3-65)

    LMS approximation:   ∇̂ξ = −2 e_k X_k    (3-66)
The LMS approximation replaces the ensemble estimate of V with a time average. As the algorithm operates over several iterations, on average the gradient estimate will be in the correct direction. The LMS descent is illustrated in figure 3.6.
figure 3.6 The LMS algorithm approximates the gradient by replacing
the ensemble average in the gradient expression with a time
average. Averaged over time, the LMS gradient estimate points
in the direction of the surface gradient.
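The resulting update can be sketched in a few lines. The example below uses the instantaneous gradient estimate of equation 3-66, so that W_{k+1} = W_k + 2μ e_k X_k; the three-tap "unknown" system and signals are hypothetical and noise-free.

```python
import numpy as np

# Minimal direct-form LMS sketch with the gradient estimate -2*e_k*X_k.
rng = np.random.default_rng(2)
h = np.array([0.4, 0.25, -0.1])      # hypothetical unknown system
n = len(h)
x = rng.standard_normal(5000)
d = np.convolve(x, h)[:len(x)]       # desired signal

w = np.zeros(n)
mu = 0.01
for k in range(n, len(x)):
    X = x[k:k-n:-1]                  # tap vector [x_k, x_{k-1}, x_{k-2}]
    e = d[k] - X @ w                 # a priori error e_k = d_k - X_k^T W_k
    w = w + 2.0 * mu * e * X         # LMS weight update
# w converges toward h
```

Because the desired signal here contains no measurement noise, the weight vector converges essentially exactly to the unknown system.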
It is a simple matter to demonstrate that the LMS estimate is unbiased.
    E[∇̂ξ] = E[−2 e_k X_k]    (3-67)

           = −2E[X_k (d_k − X_k^T W_k)]    (3-68)

           = 2 R_k W_k − 2 P_k    (3-69)

           = ∇ξ    (3-70)
If the convergence condition expressed in equation 3-35 holds, the LMS algorithm will converge to W* in the mean.

    For 0 < μ < 1/λ_max    (3-71)
Consider next the variance of the estimate. Assume that after k iterations Wk has converged. Then the gradient estimate is just the noise process Nk.
    ∇̂ξ = ∇ξ + N_k    (3-72)

        = 0 + N_k = N_k    (3-73)

For j > k, N_j = −2 e_j X_j. After convergence e_j and X_j are approximately independent.

    cov[∇̂ξ] = cov[N_j] = E[N_j N_j^T]    (3-74)

             = 4E[e_j² X_j X_j^T] = 4E[e_j²] E[X_j X_j^T]    (3-75)

             = 4 ξ* R_j    (3-76)
The above does not imply that the variance of the estimate remains finite, or that the mean square error remains finite. A considerably more stringent requirement on step size is necessary, and is discussed below.
Conditions on μ for Finite Variance of the Square Error
Horowitz and Senne [2] have studied the convergence properties of the LMS mean square error for the case of Gaussian, zero-mean, uncorrelated data. Let λ = [λ_0  λ_1  ...  λ_n]^T and let σ_k be the vector of diagonal elements of E[V′_k V′^T_k].

    ξ_k = E[(d_k − X_k^T W_k)²] = E[(d_k − X_k^T W* − X_k^T (W_k − W*))²]

        = ξ* − 2E[X_k^T (W_k − W*)(d_k − X_k^T W*)] + E[(W_k − W*)^T X_k X_k^T (W_k − W*)]    (3-77)

Because {x_i} is Gaussian and uncorrelated, X_k and W_k are statistically independent. Furthermore, E[X_k (d_k − X_k^T W*)] = P − R_k W* = 0. Thus X_k can be factored out of the second term of equation 3-77; since X_k is zero mean, the term vanishes altogether.

    ξ_k = ξ* + E[(W_k − W*)^T X_k X_k^T (W_k − W*)]    (3-78)

Since X_k is uncorrelated, E[X_k X_k^T] is a diagonal matrix, and ξ_k can be expressed as follows.

    ξ_k = ξ* + trace{ E[(W_k − W*)(W_k − W*)^T] E[X_k X_k^T] }    (3-79)

        = ξ* + trace{ E[V′_k V′^T_k] E[X_k X_k^T] }    (3-80)

    ξ_k = ξ* + λ^T σ_k    (3-81)
Horowitz and Senne next develop a recursive relationship for σ_k from the definition of the LMS algorithm.

    W_{k+1} = [I − 2μ X_k X_k^T] W_k + 2μ d_k X_k    (3-82)

Subtract W* and perform a unitary transformation. Let e*_k = d_k − X_k^T W*.

    V′_{k+1} = [I − 2μ X_k X_k^T] V′_k + 2μ e*_k X_k    (3-83)

Take the expectation of V′ multiplied by its transpose. Because of the independence of e*_k and X_k, the expectation of their cross terms is zero.

    E[V′_{k+1} V′^T_{k+1}] = E[V′_k V′^T_k] − 4μΛ E[V′_k V′^T_k] +
                             4μ² E[X_k X_k^T V′_k V′^T_k X_k X_k^T] + 4μ² ξ* Λ    (3-84)

Apply the Gaussian factoring theorem to the third term of equation 3-84.

    E[X_k X_k^T V′_k V′^T_k X_k X_k^T] = 2Λ² E[V′_k V′^T_k] + trace{ Λ E[V′_k V′^T_k] } Λ    (3-85)

Substitute the decomposed moment into equation 3-84.

    E[V′_{k+1} V′^T_{k+1}] = E[V′_k V′^T_k] − 4μΛ E[V′_k V′^T_k] + 8μ²Λ² E[V′_k V′^T_k] +
                             4μ² trace{ Λ E[V′_k V′^T_k] } Λ + 4μ² ξ* Λ    (3-86)

Let C_k = E[V′_k V′^T_k]. C_k can be decomposed as follows.

    C_{k+1}[i,i] = (1 − 4μλ_i + 8μ²λ_i²) C_k[i,i] + 4μ²λ_i Σ_{p=0}^{n} λ_p C_k[p,p] + 4μ² ξ* λ_i    (3-87)

A recursive formulation is now possible for σ_k. Let F[i,i] = 1 − 4μλ_i + 12μ²λ_i² and F[i,j] = 4μ²λ_iλ_j for i ≠ j.

    σ_{k+1} = F σ_k + 4μ² ξ* λ    (3-88)

With equation 3-88 a recursive expression is available for ξ.

    ξ_{k+1} = ξ* + λ^T [F σ_k + 4μ² ξ* λ]    (3-89)
Horowitz and Senne develop convergence conditions which can be seen most easily from the matrix formulation of equation 3-89. Let σ_k = [σ_{k,0}  σ_{k,1}  ...  σ_{k,n}]^T.    (3-90)

    σ_{k+1} = 4μ²ξ* [λ_0  λ_1  ...  λ_n]^T + F σ_k,  where

    F = | 1 − 4μλ_0 + 12μ²λ_0²   4μ²λ_0λ_1              ...   4μ²λ_0λ_n             |
        | 4μ²λ_1λ_0              1 − 4μλ_1 + 12μ²λ_1²   ...   4μ²λ_1λ_n             |    (3-91)
        | ...                                                 ...                   |
        | 4μ²λ_nλ_0              4μ²λ_nλ_1              ...   1 − 4μλ_n + 12μ²λ_n²  |

For every entry of F, ||F[i,j]|| < 1 and μ is real if 0 < μ < 1/(3 Σ_i λ_i). This condition on μ will guarantee finite mean square error.

Conditions on μ for Maximum Convergence Speed and Finite Variance of the Square Error
Feuer and Weinstein [3] have developed an expression for the step size which will optimize convergence speed. The expression is a function of generally unavailable a priori information, i.e. all eigenvalues and the initial misadjustment along each of the principal coordinates of the error surface. Given no information concerning σ_k, it is assumed that all of its elements are equal. Horowitz and Senne note that the convergence rate of ξ is approximately optimized (given no further a priori information) by setting μ as follows.

    (d/dμ)(1 − 4μλ_max + 12μ²λ_max²) = 0,  giving  μ = 1/(6λ_max)    (3-92)
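For a given (estimated) correlation matrix these quantities are simple to evaluate. The sketch below uses a hypothetical 2×2 matrix R and computes the mean-convergence bound of equation 3-35, the Horowitz-Senne finite-MSE bound 1/(3 Σ_i λ_i), and the approximate optimum of equation 3-92.

```python
import numpy as np

# Step-size guidance from the eigenvalues of a hypothetical 2x2 R.
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])
lam = np.linalg.eigvalsh(R)
bound_mean = 1.0 / lam.max()             # convergence in the mean (3-35)
bound_mse = 1.0 / (3.0 * lam.sum())      # finite MSE; sum(lam) = trace(R)
mu_opt = 1.0 / (6.0 * lam.max())         # minimizes 1 - 4*mu*lam + 12*mu^2*lam^2
```

Note that Σ_i λ_i equals trace(R), so the finite-MSE bound can be computed without an eigendecomposition; it is always tighter than the mean-convergence bound.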
CHAPTER 4
LEAST MEAN SQUARES (LMS) ALGORITHM CONVERGENCE WITH CORRELATED INPUT DATA
Music signals are not uncorrelated, as can be seen from the autocorrelation functions for several musical instruments shown in figure 4.1. To meaningfully discuss step size initialization for an adaptive acoustic equalizer, the understanding of LMS convergence properties must be extended to the case of correlated input data. Even though the adaptive step size will be optimized for fast convergence only when a large change occurs in the error surface, and hence the most recent data sample will be largely uncorrelated with the previous data, the tapped delay line will have an input vector which, for order n+1, still shares n components with the previous vector.
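The degree of correlation can be quantified with a normalized autocorrelation estimate. Since the recordings of figure 4.1 are not reproduced here, the sketch below uses a strongly correlated first-order autoregressive signal as a hypothetical stand-in for music, contrasted with its white innovation sequence.

```python
import numpy as np

def autocorr(x, max_lag):
    """Biased, normalized autocorrelation estimate rho[0..max_lag]."""
    x = x - x.mean()
    r = np.array([np.dot(x[:len(x) - t], x[t:]) for t in range(max_lag + 1)])
    return r / r[0]

# Stand-in for a music signal: AR(1) process x_k = 0.95 x_{k-1} + w_k.
rng = np.random.default_rng(3)
w = rng.standard_normal(20000)           # white innovation
x = np.zeros_like(w)
for k in range(1, len(w)):
    x[k] = 0.95 * x[k - 1] + w[k]

rho_music_like = autocorr(x, 10)         # decays slowly from 1.0
rho_white = autocorr(w, 10)              # near zero for nonzero lags
```

The AR(1) stand-in shows a slowly decaying autocorrelation (ρ[1] near 0.95), while the white sequence is essentially uncorrelated at all nonzero lags, mirroring the contrast between music and white test noise.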
Convergence of LMS with Correlated Input
Several mathematicians and engineers have investigated a completely general convergence proof for the LMS algorithm without success. To date no satisfactory proof has been put forward guaranteeing lim_{k→∞} ξ_k < M for all μ < μ_max. The difficulty in a rigorous proof lies in the assumption of strong correlation, i.e. ∀τ ≥ 0, E[x_k x_{k+τ}] > 0. The theoretical work done in this area is mathematically sophisticated. Fortunately, Macchi and Eweda [23] have provided a discussion of the theoretical contributions to the convergence problem, which is summarized here.
Ljung [24] has demonstrated convergence almost everywhere for W_k under the condition that the step size is a decreasing sequence tending to zero. He creates a nondivergence criterion by erecting a suitable barrier on V_k which will always reflect V_k onto a random compact set. However, he has not shown that lim_{k→∞} W_k = W*. Daniell [25] has shown that ξ_k can be made arbitrarily small by choosing μ sufficiently small. However, he must use the assumption of uniformly asymptotically independent observations. In addition he makes the restrictive assumption that the conditional moments of observations, given past observations, are uniformly bounded. This condition is not satisfied even for the case of
figure 4.1 The autocorrelation functions are given above for several
different instruments [6]: a) organ music (Bach);
b) symphony orchestra (Glasunov); c) cembalo music
(Frescobaldi); d) singing with piano accompaniment
(Grieg); and e) speech (male voice).
Gaussian input. Farden [26] has found a bound on ξ by making the reasonable assumption of a decreasing autocorrelation function. He further makes use of reflecting barriers for V_k to keep V_k in a compact set. Macchi and Eweda [23] find a bound on ξ by assuming only blocks of M samples are correlated and all moments are bounded. They do not make use of reflecting boundaries.
The proofs referred to above have assumptions which are representative of acoustic signals. Acoustic signals generally do have decreasing autocorrelation functions, and for a sufficient delay, may be assumed to be uncorrelated. In order to determine the step size which will minimize ξ at the beginning of a quasistationary segment of data, the work of Horowitz and Senne, and of Feuer and Weinstein, is extended without the assumption of uncorrelated input. The mathematical expressions which result are simplified, and conditions on μ for maximum convergence speed are found. Simulations are subsequently carried out for different degrees of correlation, step size, and filter order.
Conditions on μ for Convergence of ξ
Adding correlation to the input data significantly complicates the expression of ξ for the LMS algorithm. The approach developed below formulates a recursive expression for V_k in terms of V_0. The expression is substituted into the equation for ξ (equation 3-79). With the assumption that e*_k is small, cross terms including e*_k can be ignored. This results in a matrix equation for ξ in terms of the initial conditions of the algorithm. The matrix norm is investigated numerically as a function of μ. These results are then compared with large ensemble averages of ξ computed with the LMS algorithm. To begin, recall the expression for the LMS algorithm (equation 3-83) formulated in the translated coordinate system.
    V_{k+1} = (I − 2μ X_k X_k^T) V_k + 2μ e*_k X_k    (4-1)

Recursively formulate V_k in terms of V_0 using equation 4-1.

    V_k = { Π_{j=k−1}^{0} [I − 2μ X_j X_j^T] } V_0                                      (term 1)

          + 2μ Σ_{j=0}^{k−1} { Π_{i=k−1}^{j+1} [I − 2μ X_i X_i^T] } e*_j X_j    (terms 2 and 3)    (4-2)