AUTOMATED DETECTION AND QUANTIFICATION

OF PETIT MAL SEIZURES IN THE ELECTROENCEPHALOGRAM

BY

JOSE CARLOS SANTOS CARVALHO PRINCIPE

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL

OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT

OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1979

ACKNOWLEDGEMENTS

The author would like to thank his supervisory

committee for their aid, guidance and criticism during the

preparation of this dissertation. The author would especially like to thank Dr. Jack Smith for his willingness to

share facilities, personal contacts and responsibilities.

It was an unconventional but effective learning experience,

which went beyond the restricted field of EEG studies. The

friendly relationship created during these years is deeply

appreciated.

There are many others who contributed their knowledge,

skills and time for this dissertation to become a reality.

The author would like to thank Sonny, Yvonne, Shiv, Nate,

Dan, Alan and Salim for their large, small, direct or

indirect contributions. Special thanks go to Fred for his

constant help in the data collection phase.

The author also would like to thank the Department of

Electrical Engineering of the University of Florida,

Comissao Permanente INVOTAN (NATO--Portugal), and the

Institute of International Education for the financial

support received throughout his graduate studies.

The author sincerely thanks his family for their sup-

port and confidence, especially his wife for her constant

companionship in spite of the stress created by the long

working hours devoted to the laboratory.


TABLE OF CONTENTS

                                                         Page

ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . .   iii

ABSTRACT . . . . . . . . . . . . . . . . . . . . . . .    vi

CHAPTER

I    LITERATURE SURVEY . . . . . . . . . . . . . . . .     1
       Spectral Analysis . . . . . . . . . . . . . . .     1
       Time Domain Approach . . . . . . . . . . . . . .   18
       Selection of the Method of Seizure Detection . .   25

II   MICROCOMPUTER BASED DIGITAL FILTER DESIGN . . . .    30
       Preliminary Considerations . . . . . . . . . . .   30
       Design Criteria . . . . . . . . . . . . . . . .    35
       Filter Transfer Function . . . . . . . . . . . .   38
       Lowpass to Bandpass Transformations . . . . . .    41
       Transformations to the Z Plane . . . . . . . . .   43
       Finite Length Effects of the Implementation . .    53
       Practical Considerations . . . . . . . . . . . .   84

III  A MODEL FOR PETIT MAL SEIZURES AND ITS
       IMPLEMENTATION . . . . . . . . . . . . . . . . .   94
       Detection Problem . . . . . . . . . . . . . . .    94
       A Petit Mal Seizure Model . . . . . . . . . . .    98
       Definition of the Implementation Scheme . . . .   106
       System Implementation . . . . . . . . . . . . .   136
       Testing of the Program . . . . . . . . . . . . .  157

IV   SYSTEM EVALUATION AND PRESENTATION OF RESULTS . .   165
       Description of Data Collection . . . . . . . . .  165
       System Evaluation . . . . . . . . . . . . . . .   170
       Statistics of Seizure Data . . . . . . . . . . .  187
       Analysis of the False Detections and Proposed
         Improvement in the Detector . . . . . . . . .   224
       Proposed System Utilization and Concluding
         Remarks . . . . . . . . . . . . . . . . . . .   233

APPENDIX

I    CHEBYSHEV FILTER DESIGN . . . . . . . . . . . . .   239
       Chebyshev Polynomials . . . . . . . . . . . . .   239
       Automated Design . . . . . . . . . . . . . . . .  242
       Filter Design Program . . . . . . . . . . . . .   244

II   DIGITAL FILTER IMPLEMENTATION . . . . . . . . . .   250
       Preliminary Considerations . . . . . . . . . . .  250
       Implementation of the Filter Algorithm . . . . .  253

III  FLOW CHARTS AND PROGRAM LISTINGS . . . . . . . . .  265

REFERENCES . . . . . . . . . . . . . . . . . . . . . .   333

BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . .  346

Abstract of Dissertation Presented to the

Graduate Council of the University of Florida

in Partial Fulfillment of the Requirements

for the Degree of Doctor of Philosophy

AUTOMATED DETECTION AND QUANTIFICATION

OF PETIT MAL SEIZURES IN THE ELECTROENCEPHALOGRAM

By

Jose Carlos Santos Carvalho Principe

August 1979

Chairman: Jack R. Smith

Major Department: Electrical Engineering

This dissertation deals with the automated analysis

and quantification of Petit Mal (PM) paroxysms in the human

electroencephalogram (EEG). A petit-mal detector was built

to help neurologists evaluate drug effectiveness in the

treatment of PM epilepsy. The decision to base the detection scheme on a microcomputer provided completely new directions for the design process because of the versatility and great computational capabilities of the machine.

The problem of nonrepeatability of characteristics in

the analog filters was avoided by replacing them with digital

filters. A microcomputer based digital filter design pro-

cedure was developed, taking into consideration the compu-

tation speed, the filter internal gain and the noise

characteristics of different topologic structures. To

yield sufficient output signal to noise ratio a 16 bit

microcomputer and a 12 bit A/D converter were utilized.

With the procedure, a fourth order bandpass filtering func-

tion can be accomplished in less than 300 µs.

Time domain parameters were selected to arrive at a PM

seizure model which would be applicable to the detection of

classical PM and PM variant epilepsies. The parameters

were translated into electronic quantities and implemented

in a microcomputer.

For the first time in the automated analysis of EEG,

a microcomputer based system was built for the processing

of one channel of data, using a completely digital, real

time, detection scheme. All the programs which comprise

the detector were written in assembly language, and care

was taken to reduce the interactions between program

modules. The top-down program approach was utilized.

The system was tested and evaluated with data from

epileptics collected in the Veterans Administration Hospi-

tal, Neurology Service, from an ongoing drug study. A

telemetry link was used to record the EEG, allowing the

patient to move at will in a 3 x 5 meter room. Six

patients who showed the largest number of seizures were

selected for the evaluation. A total of 70 hours of data

was analyzed.

The agreement in the detections with the human scorer,

for seizures (Sz) greater than three seconds, was 86 per-

cent. If the sorting of seizures into the groups 3 < Sz < 10 sec and Sz > 10 sec is taken into consideration, the agreement decreased to 77 percent.

Besides outputting the information about number of

seizures, their duration and time of occurrence, the detec-

tor also presents a detailed analysis of the detection

parameters. These results, the first of their kind ever

reported, include the mean and variance for the PM recruit-

ing period, the half period of the slow waves, the delay

between the slow wave and the spike, and the amplitude of

the filtered spike and slow wave. They show the constancy

of the period measures inter and intra seizures for the

same patient and the higher variability of the amplitude

measures.

Preliminary results correlating pairs of the detection

parameters are also presented for three patients. The

automated detection and quantification of PM seizures,

using the present system, are critically brought to a

focus.


CHAPTER I

LITERATURE SURVEY

Two research methods have been applied to automated

electroencephalogram (EEG) studies: spectral analysis and

time domain analysis. It is the purpose of this chapter to

survey the different basic techniques and assumptions

involved and to present a better picture of the advantages

and limitations of each method. Our attention will be

mainly focused on techniques used or related to detection

of pathological paroxysmal events in the EEG, and the

literature referenced herein denotes this emphasis.

Spectral Analysis

Before addressing this subject, let us first summarize

the general assumptions made about the EEG that validate

the application of the technique. The theory behind the

frequency domain approach using a nonparametric model, as

in conventional power spectral analysis, handles the EEG

signal as a stochastic process. The statistical properties

of the process essentially influence the analytical

results. One major difficulty is exactly the different

opinions about the stationarity and Gaussian behavior of

the EEG process which are prerequisites to carry out spec-

tral analysis and to use the power spectrum as a sufficient

descriptor of the EEG parameters. At the present time this

is a very hot topic where one can read reports that show

the short time stationarity of the EEG for sequences below

20 seconds for a probability of error of 10 percent (Cohen

& Sances, 1977; Kawabata, 1973; Saltzberg, 1972). However,

the work of Elul (1967, 1969) and Dumermuth (1968) points

out the nonstationary behavior of EEG sequences as short as

4 seconds. Another important characteristic, albeit less

addressed, is the Gaussianity of the EEG. The work of Elul

(1967, 1969) and McEwen and Andersen (1975) suggested that

the behavior of the EEG followed a Gaussian distribution

32 to 66 percent of the time depending upon the behavioral

state of the subject (active mental task to relaxed state,

respectively). It also displayed a spatial distribution

with closer Gaussian characteristics in the occipital

leads. Some of the discrepancies that are reported in the

literature can be explained by the different sampling rates

used in the A/D conversion, since oversampling may distort

the local statistical properties of the digital waveforms

(Persson, 1974). Another reason, and maybe more plausible,

just demonstrates the variability of the EEG with subject

and behavioral state.

In practice the stationarity requirement can be loos-

ened to stationarity during the observation period, and if

the process departs slightly from Gaussianity, ancillary

parameters like skewness and kurtosis could be employed

along with spectral analysis. If stationarity holds, but

the segments are highly non-Gaussian, higher order statis-

tics such as bispectra (Dumermuth et al., 1970) could be needed.

Undoubtedly, the early work on EEG quantification of

Grass and Gibbs (1938), Walter and coworkers (1963, 1966,

1967) brought improvement to the field of electroenceph-

alography, which at the time was faced with amplitude meas-

urements of randomlike activity. Maybe more important yet,

it brought a consistent analysis technique, well established

from other areas with an enormous amount of computational

power. After the introduction of the Fast Fourier Trans-

form (FFT) algorithm in 1965 by Cooley and Tukey, this

power increased manifold. At this point, however, the

means began to obscure the basic constraints of the method.

Let us review then the methods of spectral analysis.

One of the basic results is to obtain the power spectrum,

i.e., an estimate of the mean square value or average

intensity of the EEG as a function of frequency. It dis-

plays the decomposition of the total variance into contri-

bution from the individual frequency bands. To obtain the

power spectrum one can apply Fourier transform to correlo-

grams (indirect method, Blackman and Tukey, 1958), or take

directly the Fourier transform of the data and square its

absolute value--periodograms (direct method, Dumermuth &

Keller, 1973; Matousek, 1973), or use autoregressive tech-

niques (Gersch, 1970; Mathieu, 1970). After the introduction

of the FFT the direct method is by far the most widely used.
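As a concrete illustration of the direct method, the following sketch computes a periodogram by taking the FFT of the data and squaring its absolute value. The 3 Hz rhythm, 100 Hz sampling rate, and noise level are hypothetical stand-ins for EEG activity, chosen only for demonstration:

```python
import numpy as np

fs = 100.0                                   # assumed sampling rate, Hz
t = np.arange(0, 10, 1 / fs)                 # 10 s record
rng = np.random.default_rng(0)
# hypothetical stand-in for EEG: a 3 Hz rhythm buried in noise
x = np.sin(2 * np.pi * 3 * t) + 0.5 * rng.standard_normal(t.size)

X = np.fft.rfft(x)                           # Fourier transform of the data
periodogram = np.abs(X) ** 2 / (fs * x.size) # square of the absolute value
freqs = np.fft.rfftfreq(x.size, 1 / fs)
peak_hz = freqs[np.argmax(periodogram[1:]) + 1]  # dominant frequency (skip DC)
```

The estimate `periodogram` is the raw (unaveraged) direct estimate; its shortcomings are exactly the resolution, leakage, and variance issues discussed below.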

To find the periodogram, the FFT of the finite EEG sequence

is taken. The finite time series can be thought of as the

multiplication of the infinite time series X(t) with a

rectangular window function of duration T. In the fre-

quency domain this corresponds to the convolution of the

spectrum of X(t) with a (sin f)/f function which introduces finite

resolution and leakage in the spectrum of X(t) and there-

fore affects any spectral quantity extracted from it. The

finite resolution and leakage are always coupled together

and derive from general relations between the time (dura-

tion) and frequency (bandwidth) domain transformations

(Thomas, 1969). It is only possible to improve one at the expense of the other. The family of approximation functions that yields the optimum compromise (prolate spheroidal functions) is quite difficult to implement and is seldom used (Temes et al., 1973), so more readily available functions are preferred. The best solution is to deal with each problem separately: use a window (Hamming, Kaiser, Tukey, Bartlett; see Childers & Durling, 1975) to smooth the

data which will improve side lobe rejection; and increase

the record length to improve the resolution. The last

requirement can only be accomplished at the expense of

longer computation times and is directly constrained by the

stationarity of the data. This concept deserves a further

explanation because sometimes the increase of the record

length, by appending zeros at the end of the data record,

is thought to increase the "resolution." As a matter of

fact, the procedure displays more points of the discrete

Fourier transform (DFT), and so a better approximation to

the Z transform is achieved. However, the Z transform

remains the same (as long as the record length is con-

stant), so the initial resolution is kept unchanged.
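This point can be checked numerically: zero-padding a record to twice its length yields a denser sampling of the same underlying transform, and the original DFT values reappear at every other bin of the padded DFT. A small sketch with arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)                 # arbitrary finite record

X = np.fft.fft(x)                            # DFT of the record
X_padded = np.fft.fft(np.append(x, np.zeros(256)))  # append zeros, doubling the length

# the padded DFT merely interpolates the same transform: its even-indexed
# bins coincide with the original DFT, so the resolution is unchanged
same = np.allclose(X_padded[::2], X)
```

The extra bins display more points of the transform without narrowing its mainlobe; only a longer actual record improves resolution.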

Another problem encountered is the estimation of the

power spectrum. Let us resume the estimation process. To

compute the essential statistical properties of a Gaussian

random process the more general way is to evaluate the

autocorrelation function by ensemble average. The method

is computationally expensive since various sample sequences

of the process have to be available. If the random process

is ergodic (Lee, 1960; Thomas, 1969), then it is correct to

compute the autocorrelation function by a time average only

over one sequence. There are fewer memory requirements

involved, but as the properties of the random process are

to be inferred from the ones calculated in the sample

sequence, consistent estimators have to be used (i.e.,

estimators that are unbiased and whose variance decreases

with the number of data points). It is very easy to find

consistent estimators for the autocorrelation functions

(Oppenheim and Schafer, 1975). The problem arises when the

goal is to estimate the power spectrum. It is not true in

general that the Fourier transform of a consistent esti-

mator of the autocorrelation function (periodogram) will

produce a good estimate of the power spectrum (Bartlett,

1953; Jenkins & Watts, 1969). When an infinite number of

points is used to evaluate the periodogram, very rapid

fluctuations are present in the estimate of the spectrum.

This fact shows that increasing the number of data points

of the observation sequence will not change the statistical

properties of the estimator since the individual signifi-

cance of the data points remains the same. To increase the

stability of the spectral estimates, the number of indepen-

dent observations has to be higher, or in some way the sta-

tistics of the observation have to be improved. These two

points lead to the two main methods of increasing the sta-

bility of the estimation--the averaging of periodograms

(Bartlett, 1953), and the smoothing by means of a window

(Welch, 1967).

Let us describe briefly the two methods. The Bartlett

method averages the periodogram over certain frequency

bands. To achieve this the data are divided into k segments;

the periodogram is calculated for each segment; and the

final estimate is the average of the various periodograms.

The procedure leads to a biased estimate, but the variance

decreases with k (overlapped segments can be used at

the expense of lower independence of individual segments and

smaller resolution). The number of points on the power

spectrum is decreased by k.

The Welch method windows segments of the data before

the periodogram is calculated. Generally the triangular

window is used to ensure positive estimates of the power

spectrum. The methodology is the same as for the Bartlett

method, and Welch was able to show that the variance also

decreased with k, and the reduction in resolution was also

present. Both methods give similar results, so generally

the Bartlett method is preferred to save one step (Dumer-

muth, 1968; Matousek & Petersen, 1973; Hagne et al., 1973).
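A minimal sketch of segment-averaged periodograms makes the variance reduction visible: with `window=None` this is Bartlett averaging, and passing a window function such as `np.hanning` gives the Welch variant. The white-noise record and the choice of k = 16 segments are illustrative:

```python
import numpy as np

def averaged_periodogram(x, fs, nseg, window=None):
    """Average the periodograms of nseg non-overlapping segments of x."""
    x = np.asarray(x, float)
    L = len(x) // nseg
    w = np.ones(L) if window is None else window(L)
    U = np.dot(w, w) / L                     # window power normalization
    psd = np.zeros(L // 2 + 1)
    for k in range(nseg):
        seg = x[k * L:(k + 1) * L] * w
        psd += np.abs(np.fft.rfft(seg)) ** 2
    return psd / (nseg * fs * L * U)

rng = np.random.default_rng(2)
noise = rng.standard_normal(4096)
rough = averaged_periodogram(noise, 1.0, 1)    # single periodogram: large variance
smooth = averaged_periodogram(noise, 1.0, 16)  # k = 16 segments: variance down by about k

rel_var_rough = np.std(rough[1:-1]) / np.mean(rough[1:-1])
rel_var_smooth = np.std(smooth[1:-1]) / np.mean(smooth[1:-1])
```

For a flat (white) spectrum both estimates have the same mean, but the averaged one fluctuates far less from bin to bin, at the cost of k times fewer frequency points.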

Although the variance of the estimate can be made

arbitrarily small, there is a trade-off for resolution.

Therefore, if an unknown spectrum is going to be estimated,

great care must be exercised not to miss peaks in the power

spectrum and at the same time trust the peaks displayed.

In EEG the modified periodogram as an estimator of the

power spectrum has been widely used. One of the pioneering

works was done at UCLA to investigate the EEG activity of

astronaut candidates by D. O. Walter and coworkers (Walter,

1963; Walter, Rhodes, Brown & Adey, 1966; Walter, Kado,

Rhodes & Adey, 1967; Walter, Rhodes & Adey, 1967). Also,

Dumermuth and coworkers in Switzerland (Dumermuth, Huber,

Kleiner & Gasser, 1970; Dumermuth, 1968) used this technique extensively. It has been applied in automated sleep

scoring (Walter et al., 1967; Caille, 1967; Rosadini et al.,

1968), age-dependent EEG changes (Hagne et al., 1973;

Matousek & Petersen, 1973), schizophrenia (Giannitrapini &

Kayton, 1974; Etevenon et al., 1976), studies of EEG in

twins (Dumermuth, 1968), studies of EEG background activity

(Walter et al., 1966; Zetterberg, 1969; Gevins et al.,

1975), brain lesions (Walter et al., 1967), and also drug

evaluation (Matejcek & Devos, 1976; Künkel et al., 1976).

Sometimes the power spectrum is not the only parameter

used. Dumermuth et al. (1972) uses the bispectrum, that

is, the Fourier transform of the second order autocorrela-

tion function R(t1, t2, t1+t2) to investigate coupling

between EEG frequencies. Incidentally, the bispectrum

analysis shows that, except for the α and β frequency

bands, the function is practically zero, as would be the

case for a Gaussian random process. The coherence spectrum

--ratio of the square of the cross spectrum over the spec-

tra of the individual waveforms--is also used to investi-

gate correlation between two EEG channels (Dumermuth et

al., 1972; Walter et al., 1967).

The most bothersome question in the application of the

periodogram as an estimator is the fact that there are no

"best criteria" to determine the combination, window

duration-overlap for a particular data sequence. That is

one of the reasons why other more consistent methods (in

the sense that an error criterion can be defined and leads

to a minimization), such as the autoregressive (AR) and auto-

regressive moving average (ARMA), have been recently intro-

duced (Fenwick et al., 1967; Isaksson, 1975; Gersch &

Sharp, 1973). The assumption behind their application is

that the EEG sequence selected is a sample from a station-

ary time series. Therefore, it can be modeled as the out-

put of a linear system (if transients are excluded) with a

white noise input. The output can then be interpreted as

the linear superposition of the natural modes of the system.

The eigenvalues extracted from the parametric time series

model of the EEG are the characteristic frequencies and

their associated damping factors. The damping reflects the

extent to which the EEG will exhibit an oscillatory or ran-

dom appearance. Unlike nonparametric (conventional) spectral

analysis, the relative dominance of the frequencies in the

EEG can be associated with the ordering of the magnitudes

of the eigenvalues.

In the AR model, the linear system transfer function

is approximated by an all-pole function. The two problems

are the determination of the filter pole locations and the

order of the approximation polynomial. There are three

main minimization procedures to fit the AR model to the

data: one evaluates the filter coefficients as the maximum

likelihood estimate when the input to the linear system is

white Gaussian noise and the output is the data sequence

available (maximum likelihood method). Another evaluates

the filter coefficients in such a way that the mean square

error between the next sample and the predicted one, given

the data sequence, is minimized (linear prediction). Fi-

nally, the other calculates the filter poles in such a man-

ner that the estimated values of the sequence maximize the

entropy of the autocorrelation function (maximum entropy).

The three methods have been developed under different con-

straints, but recently van den Bos (1971) and Smylie et

al. (1973) proved their equivalence. It is also inter-

esting to note that all the methods arrive at the same

matrix equation involving the autocorrelation function

(the Yule-Walker (YW) equation). To estimate the filter coef-

ficients, the YW equation can be solved. Then the autocor-

relation functions for various lags have to be computed and

a matrix inversion performed. Algorithms due to Levinson

(1947) or Durbin (1960) are generally preferred to solve it

recursively.
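This route can be sketched compactly: estimate the autocorrelation function by a time average, then solve the YW equation recursively with the Levinson-Durbin algorithm. The synthetic first order AR process, its coefficient of 0.9, and the record length are assumptions made only to exercise the recursion:

```python
import numpy as np

def levinson(r, p):
    """Solve the Yule-Walker equation for AR(p) coefficients via Levinson-Durbin."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    E = r[0]                                  # prediction error power
    for m in range(1, p + 1):
        k = -(r[m] + np.dot(a[1:m], r[1:m][::-1])) / E  # reflection coefficient
        a[1:m + 1] = a[1:m + 1] + k * a[:m][::-1]       # order update of coefficients
        E *= (1.0 - k * k)
    return a, E

# synthetic AR(1) process x[n] = 0.9 x[n-1] + e[n]
rng = np.random.default_rng(3)
x = np.zeros(4000)
for n in range(1, x.size):
    x[n] = 0.9 * x[n - 1] + rng.standard_normal()

N = x.size
r = np.array([np.dot(x[:N - l], x[l:]) / N for l in range(3)])  # autocorrelation lags
a, E = levinson(r, 1)                         # a[1] should be close to -0.9
```

The recursion avoids the explicit matrix inversion, needing only O(p²) operations once the lags are available.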

A technique due to Burg (1967) suggests another esti-

mation of the coefficients that does not require prior

estimate of the autocorrelation function. It fits succes-

sively higher order prediction error operators to the input

data by convolving the filter in both forward and backward

direction. The square of the two error series are added to

obtain the error power,and the filter coefficients are

determined by a minimization procedure. A recursion rela-

tion is also available (Burg, 1967, 1972; Anderson, 1974).

Actually the two different techniques (YW method and Burg)

use the data in two different ways. The YW assumes that

the data is zero beyond the limits of the observation win-

dow. The Burg technique extends the estimation of the

autocovariance coefficients beyond the initial number of

points in such a way that the entropy of the autocovariance

function is maximized at each step. As can be expected,

the two techniques have quite different properties. The

estimator by the YW method has the attractive property that

its mean square error is generally smaller than other esti-

mators (Jenkins & Watts, 1969). However, the estimates are

very sensitive to rounding (Box & Jenkins, 1970), mainly for random processes that display a peaked power spectrum, which means that the resolution obtained is not always sufficient.

This is to be expected since the YW technique effectively

windows the data. The Burg technique does not display this

shortcoming of lack of resolution. It has been shown

(Pusey, 1975) that it can resolve two tones arbitrarily

close if the S/N ratio is high enough. However, the var-

iance of the estimator is larger than for the YW and does

not decrease to zero monotonically.
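The Burg recursion can be sketched as follows: at each stage the reflection coefficient is chosen to minimize the summed forward and backward error power, and no prior autocorrelation estimate is formed. The synthetic first order test process and its 0.9 coefficient are illustrative assumptions:

```python
import numpy as np

def burg(x, order):
    """Burg's method: fit AR coefficients by minimizing forward + backward error power."""
    f = np.asarray(x, float).copy()          # forward prediction errors
    b = f.copy()                             # backward prediction errors
    a = np.array([1.0])
    E = np.mean(f * f)
    for _ in range(order):
        fh, bh = f[1:], b[:-1]
        # reflection coefficient minimizing the summed power of the two error series
        k = -2.0 * np.dot(fh, bh) / (np.dot(fh, fh) + np.dot(bh, bh))
        f, b = fh + k * bh, bh + k * fh      # convolve filter forward and backward
        a = np.append(a, 0.0)
        a = a + k * a[::-1]                  # Levinson-style coefficient update
        E *= (1.0 - k * k)
    return a, E

rng = np.random.default_rng(4)
x = np.zeros(4000)
for n in range(1, x.size):
    x[n] = 0.9 * x[n - 1] + rng.standard_normal()
a, E = burg(x, 1)                            # a[1] should be close to -0.9
```

Because the data are never windowed to zero outside the record, the sharp-peak resolution of this estimator is preserved, as noted above.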

The correct identification of the order of the AR

model that approximates the data is vital in the computa-

tion of the power spectrum (Ulrich & Bishop, 1975). The

criterion generally used is Akaike's Final Prediction Error

(FPE) that is defined as the mean square prediction error

(Akaike, 1969, 1970). The YW estimate of the order of the

process using this criterion tends to be conservative but

with a small variance. The Burg estimate of the order dis-

plays a large variance, so generally some upper bound must

be imposed in the search. Another approach is to use the

first minimum in the estimated error (Ulrich & Bishop,

1975).
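The FPE criterion can be sketched as: for each candidate order p, fit the AR model (here by Yule-Walker/Levinson) and score FPE(p) = E_p (N + p + 1)/(N - p - 1), then take the minimizing order. The synthetic second order process, its coefficients, and the search bound of 6 are assumptions made for the demonstration:

```python
import numpy as np

def ar_fit(x, p):
    """Yule-Walker/Levinson AR(p) fit; returns coefficients and error power E_p."""
    N = len(x)
    r = np.array([np.dot(x[:N - l], x[l:]) / N for l in range(p + 1)])
    a = np.zeros(p + 1); a[0] = 1.0
    E = r[0]
    for m in range(1, p + 1):
        k = -(r[m] + np.dot(a[1:m], r[1:m][::-1])) / E
        a[1:m + 1] = a[1:m + 1] + k * a[:m][::-1]
        E *= (1.0 - k * k)
    return a, E

def fpe_order(x, max_order):
    """Pick the AR order minimizing Akaike's Final Prediction Error."""
    N = len(x)
    fpe = [ar_fit(x, p)[1] * (N + p + 1) / (N - p - 1)
           for p in range(1, max_order + 1)]
    return int(np.argmin(fpe)) + 1

# synthetic AR(2): x[n] = 1.2 x[n-1] - 0.6 x[n-2] + e[n]
rng = np.random.default_rng(5)
x = np.zeros(6000)
for n in range(2, x.size):
    x[n] = 1.2 * x[n - 1] - 0.6 * x[n - 2] + rng.standard_normal()
p_hat = fpe_order(x, 6)
a2, _ = ar_fit(x, 2)                          # coefficients at the true order
```

The penalty term (N + p + 1)/(N - p - 1) grows with p and offsets the monotone decrease of E_p; in line with the remarks above, the minimum can still land slightly above the true order.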

While the AR model is an all-pole approximation to the

random process, the ARMA model is a pole-zero approxima-

tion. Computationally it is more expensive (in both time and memory) and more difficult to implement, but

the two techniques for AR computations could be modified

for this more general case (Treitel et al., 1977). How-

ever, as an infinite AR process can approximate any ARMA

process, the AR model is generally preferred. The

FPE criterion to determine the order of the AR model can

still be used, but there is a tendency to overestimate the

order (Ulrich & Bishop, 1975).

After briefly reviewing the three approximation models

(MA, AR, ARMA) to estimate the power spectrum, a question

must still be answered. From the available sequence of the random process, how should one decide what model to use? This question (the identification problem) has not yet been answered in general, and only in some cases has physical knowledge of the process that generated the data helped. It is surprising that we have not seen published any work in this direction by the users of these techniques in EEG. On the contrary, the approximation properties of the AR model have been exclusively used. There are reports in the literature (Treitel et al., 1977) that clearly show that this can lead to inappropriate approximations, which means that there is no single correct technique to calculate the spectrum in the absence of knowledge about the physics of generation.

In EEG the autoregressive model has been used by Fen-

wick et al. (1969) to predict evoked responses. Pfurt-

scheller and Haring (1972) used AR models to attempt data

compression. Gersch & Sharp (1973) used AR models for

multivariate EEG analysis. Wennberg & Zetterberg (1971)

used ARMA models to extract alpha, beta, delta, and theta

band intensities for automatic EEG classifications.

Mathieu (1970, 1976) used AR models in sleep scoring.

Jones (1974) compared the window lag estimation with AR

estimator to determine power spectral and coherence func-

tions in the neonate EEGs. Gersch and Yonemoto (1977) com-

pared the AR and ARMA models for the power spectrum estima-

tion of EEG. The same authors (1973) also used parametric

AR models to apply the Shannon-Gelfand-Yaglom amount of

information measure in sleep scoring.

In epilepsy AR models have been used to analyze the

ictal event. Gersch and Goddard (1970) used the partial

coherence among time series to extract information about

driving. Tharp (1972) and Tharp and Gersch (1975) use a

similar procedure to determine seizure focus. Herolf

(1973) and Lopes da Silva et al. (1975) also use an AR model

to perform seizure detections by inverse filtering.

Another concern faced in spectral analysis is the

presentation of the results. Due to the specific tech-

niques involved, the results need to be further processed

to be readily interpreted by the neurologist. The main

goals are data reduction and readability. The compressed

spectral array of Bickford et al. (1973) has been exten-

sively used and summarizes pretty well the changes of EEG

with some external factors (different behavioral states,

tracking of alpha waves with light stimuli, hemispheric

symmetry, acute slow wave abnormalities). The canonogram

method of Gotman and Gloor (1973) consists of a spatial

arrangement of polygons of different sizes related to

the ratio of (delta + theta)/(alpha + beta) which is

thought to be a good descriptor of localized brain abnor-

malities. It is considered to be a good descriptor for

detection of slow wave abnormalities and asymmetries in the

EEG if a preprocessing for gross artifacts in the raw data

is performed.

The results of optimum filtering have also been applied

to extract certain features from the EEG. One of the ear-

liest techniques was matched filtering (Smith et al., 1969;

Zetterberg, 1973; Saltzberg, 1971). A matched filter is

the filter that maximizes the signal-to-noise ratio, since

it transforms all the available information (energy) of the

input signal into a voltage at a specific time T. The pre-

requisites for its construction or modeling are the a priori

knowledge of the waveform shape and the stationarity of the

background noise. A matched filter is in fact a correla-

tion detector, and a relatively simple way of implementing

one digitally (for white Gaussian noise) is to select a

wave pattern, store it backwards in memory, and perform a

correlation with the input (template matching). If the

noise is not white, we have to prewhiten it and separate

the implementation into two steps: first, divide the

Fourier transform of the desired wave pattern by the power

spectrum of the noise and second, inverse transform the

resultant spectrum to obtain the template that can be

stored in the computer memory. For EEG, prewhitening is a

must, since the background activity (noise, if a particular

pattern has to be detected) is not white. Matched filter-

ing has been used for spike detection in EEG by Saltzberg

(1972), Zetterberg (1973), and more recently by Barlow and

Sokolov (1975) and Eftang (1975).
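Template matching as described amounts to correlating the input with the stored, time-reversed pattern. A hypothetical half-sine "wave pattern" buried in white noise illustrates the idea; the prewhitening step is omitted here only because the synthetic background is already white:

```python
import numpy as np

rng = np.random.default_rng(6)
template = np.sin(np.linspace(0.0, np.pi, 25))   # assumed wave pattern to detect

signal = 0.3 * rng.standard_normal(500)          # white background standing in for EEG
signal[200:225] += template                      # pattern inserted at sample 200

# matched filtering = correlation: convolve with the template stored backwards
output = np.convolve(signal, template[::-1], mode='valid')
detected_at = int(np.argmax(output))             # alignment of the best match
```

The correlator output peaks where the stored pattern lines up with its occurrence in the input, which is the voltage-at-time-T behavior described above.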

Another result borrowed from communication theory is

inverse filtering. The inverse filter is generally applied

to extract information about the arrival times of the indi-

vidual components of a composite waveform (Childers & Dur-

ling, 1975; Robinson, 1967). It is required to know the

shape of the waveform and to make the stationarity assump-

tion on the background noise. The filter is nonrealizable

(the output is theoretically a delta function), and great

care must be taken in the required inversion of the Fourier

transform of the waveform not to divide by zero. Inverse

filtering also deteriorates the S/N ratio, and a compromise

has to be reached between resolution and signal-to-noise

ratio.

In EEG inverse filtering is used in a different way.

Barlow and Dubinsky (1976) use inverse filtering to generate the impulse responses of bandreject filters (take the square root of the EEG power spectrum, isolate the band of interest, subtract it from a uniform spectrum whose amplitude equals the maximum, and take the IDFT to obtain the impulse response). This procedure

is highly sensitive to truncation effects, and so poor fil-

ters result. Lopes da Silva et al. (1975) use a linear

prediction scheme. They implement the inverse filter of the all-pole approximation model of the EEG and monitor the error between the predicted value and the real data points.

Deviations above a prescribed threshold are correlated with

nonstationarities of the input EEG data. Very similar

techniques are used in EEG adaptive segmentation proce-

dures (Bodenstein & Praetorius, 1977) which could eventual-

ly lead to transient event detection. The nonstationary information of the phasic events and the sharpness of the spikes were combined by Birkemeier et al. (1978) to increase the separation of the clusters

of epileptic transients and background activity. The

system requires settings for each patient and works better

in epochs with few spikes (otherwise they may bias the

estimation). It is sensitive to artifacts.
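The linear prediction detection idea can be sketched minimally: fit an all-pole model (here first order, by least squares) on a stretch of background, run its inverse filter over the record, and flag samples whose residual exceeds a threshold. The AR process, the transient location, and the 5σ threshold are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)
# synthetic background: AR(1) process, with a transient injected at sample 600
x = np.zeros(1000)
for n in range(1, x.size):
    x[n] = 0.8 * x[n - 1] + rng.standard_normal()
x[600:605] += 15.0

# fit the all-pole model on an assumed-clean stretch (least squares)
a1 = np.dot(x[1:500], x[:499]) / np.dot(x[:499], x[:499])

# inverse filter: error between the predicted value and the real data points
err = x[1:] - a1 * x[:-1]
sigma = np.std(err[:500])
alarms = np.where(np.abs(err) > 5.0 * sigma)[0] + 1  # samples exceeding threshold
```

Where the model fits, the residual stays near the background noise level; the injected transient violates the model and drives the residual far above threshold, mirroring the nonstationarity detection described above.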

Etevenon et al. (1976) use null-phase inverse filter-

ing to deconvolve the EEG and then detect fast activity.

Their method is equivalent to the one just described (Lopes

da Silva), but it is executed in the frequency domain.

From the EEG sequence the power spectrum is calculated

through the periodogram, and after smoothing and whitening

the amplitude spectrum is obtained by evaluating the square

root. Then the inverse of the spectrum is computed, multi-

plied by the Fourier transform of the incoming EEG, and the

inverse FFT taken. This signal accentuates sharp tran-

sients contained in the EEG, and after threshold detection

they can be extracted.

To cope with the EEG nonstationarities, recently there

have been reports on Kalman-filtering (Isaksson & Wennberg,

1976). If the stationarity constraints are removed from

the noise in the ARMA model, it can be shown that the best

fit in the least mean squared sense to the time series is

achieved by the Kalman filter (Kalman, 1960; Isaksson,

1975). So, this filter is a generalization of the Wiener

filter for nonstationary data and is closely related to

autoregressive techniques. It consists basically of the

ARMA fit to the time series, but due to the inclusion of a

feedback loop with variable gain (Kalman gain) the locations

of the poles of the system are allowed to vary. The key

parameter in the design is the setting of the Kalman gain

since it weighs the new value of the residual signal against

its past values and changes accordingly the coefficients of

the all-pole filter. The gain then controls the adaptability of the filter. If the gain is set at zero, the

Kalman filter is just an ARMA fit for the incoming data and

is nonadaptable. If the gain is increased, then we have

adaptable properties and the tracking is improved with the

gain. One important consequence of the adaptability of the

filter is to avoid the smoothing effect present in other

spectral methods. Incidentally, by comparing the autocor-

relation function obtained by any of the stationary spec-

tral methods with the autocorrelation function estimated

from the output of the Kalman filter, some knowledge about

the limits of the record length to preserve stationarity

can be inferred. Isaksson and Wennberg (1976) point out

that about 90 percent of the records of 20 s are stationary

which agrees with some of the previous observations. At

the present time, only Isaksson (1975) and Isaksson and

Wennberg (1976) have used this technique.
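The adaptability controlled by the Kalman gain can be made concrete with the sketch below, which tracks AR coefficients modeled as a random walk. The process/measurement variances and the AR(2) test signal are illustrative assumptions, not values from Isaksson and Wennberg.

```python
import numpy as np

def kalman_ar(x, p=2, q=1e-4, r=1.0):
    """Track (possibly time-varying) AR(p) coefficients.

    State: the coefficient vector, a random walk of variance q.
    Observation: x[n] = a . [x[n-1], ..., x[n-p]] + noise, variance r.
    The Kalman gain weighs the new residual against the past: q = 0
    gives a fixed (nonadaptive) least-squares AR fit; larger q raises
    the gain and improves the tracking of nonstationarities."""
    a = np.zeros(p)                            # coefficient estimate
    P = np.eye(p)                              # estimate covariance
    for n in range(p, len(x)):
        phi = x[n - p:n][::-1]                 # past p samples, newest first
        P = P + q * np.eye(p)                  # random-walk prediction step
        k = P @ phi / (phi @ P @ phi + r)      # Kalman gain
        a = a + k * (x[n] - phi @ a)           # update with the residual
        P = P - np.outer(k, phi) @ P
    return a

# Synthesize a stationary AR(2) resonance and recover its coefficients.
rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.9])
x = np.zeros(2000)
for n in range(2, len(x)):
    x[n] = true_a @ x[n - 2:n][::-1] + rng.standard_normal()
est = kalman_ar(x)
```

Rerunning with a larger q shows the gain/adaptability trade-off discussed above: faster tracking at the price of noisier coefficient estimates.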

Time Domain Approach

If one can relate the degree of quantification to the

maturity of a field of science, electroencephalography is a

newborn. The lack of quantification has two main conse-

quences: the nonexistence of objective criteria to charac-

terize EEG phenomena, which weakens any purely analytical

research method (like spectral analysis); the resort to a

multitude of soft criteria (i.e., not unique), each based

on empirically derived parameter values that try to trans-

late the highly nonlinear way the electroencephalographer

reads the EEG.

Visual analysis is a true time domain method of detec-

tion of structural features like diffuse and localized

change in frequency and voltage pattern, changes in the

topographic distribution and in the interhemispheric sym-

metry, sharp and rhythmic activity, paroxysms and unstable

or irregular time course. The basic patterns and clinical

EEG features are composed in the following step-by-step

procedure: the analysis of the period and amplitude

forms the wave concept that is associated with the gradient

to give the simple grapho-element. These can be integrated

in a sequence to produce the complex grapho-elements from

which, using superposition, the notion of pattern is

acquired. The variability in time and hemispheric location

gives rise to the time domain structure and topography of

the EEG. The job of the electroencephalographer is to

individualize these elements and compare them to his

acquired set of "normal activity" in order to make the

diagnosis. Of course, the boundaries are quite fuzzy,

highly subjective, and not always consistent (Woody, 1968;

Rose, 1973).

The techniques are then dependent upon the specific

detection task desired, and no simple overview is possible

besides the translation to electronic terms of the decom-

position process: analysis of the wave's amplitude in cer-

tain frequency ranges, which implies broad bandpass filter-

ing, zero crossing, and threshold analysis. Depending upon

the criteria formulated the next step can be the testing of

the grapho-elements throughout further processing (sharp-

ness or repetition period, for instance) or testing of a

pattern by coincidence logic. What makes the process some-

what erratic is that there is no methodology beyond

extensively comparing the results with those obtained by the

electroencephalographer and modifying the implementation

or the criteria accordingly until agreement is reached.

Due to the diversity of techniques, the detection of

abnormal brain activity (spikes) will be emphasized. One

of the first criteria to detect spikes was the sharpness

(Buckley et al., 1968; Ktonas, 1970; Walter et al., 1973;

Gevins et al., 1975). This was achieved by monitoring the

first and second derivatives of the EEG and comparing them

with a fixed threshold set for each patient. The parameter

was found generally unsatisfactory since differentiation

increases the energy of the incoming signal at high fre-

quencies (i.e., extends the bandwidth) and the detection

system becomes very sensitive to muscle artifacts. The

same basic idea was further improved by Carrie (1972a,

1972b). He established a moving threshold set by

background activity. Although the system was still sensi-

tive to high frequencies, since they biased the threshold,

he reported better results. Gevins et al. (1975, 1976)

also used the curvature parameter (second derivative), but

to improve the system performance the duration and fre-

quency of occurrence are also introduced in the detection

criteria. The threshold is automatically set for each

patient by an empirical algorithm.
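The derivative-and-threshold idea can be sketched as below: the second difference measures sharpness, and the threshold tracks the background level as in Carrie's refinement. The factor k and the background window are illustrative assumptions, not published settings.

```python
import numpy as np

def detect_spikes(eeg, k=6.0, bg_win=201):
    """Curvature-based spike detection with a moving threshold.

    The second difference approximates the second derivative
    (sharpness); the threshold is k times the running mean absolute
    curvature, so it adapts to the background activity."""
    curv = np.abs(np.diff(eeg, n=2))              # second-difference sharpness
    padded = np.pad(curv, bg_win // 2, mode="edge")
    background = np.convolve(padded, np.ones(bg_win) / bg_win, mode="valid")
    return np.flatnonzero(curv > k * background) + 1  # +1: diff(n=2) offset

rng = np.random.default_rng(1)
eeg = rng.standard_normal(2000) * 0.2             # low-amplitude background
eeg[1000] += 8.0                                  # artificial sharp transient
hits = detect_spikes(eeg)
```

Because differentiation extends the bandwidth, any broadband (e.g., muscle) artifact would trigger this detector as well, which is the sensitivity problem noted in the text.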

A more complex model of abnormal spikes which included

different slopes for the leading and trailing edges of the

spike, plus a parameter related to the time it takes the

wave to reach maximum slope, was introduced by Ktonas and

Smith (1974), and it seems to describe fairly well spike

activity. Smith (1974) used some of these parameters to

implement a spike detector that gave good agreement without

adjustment of detector thresholds. Basically, it consists

of a set of gated monostables with "on" times related to

the leading edge first derivative (negative excursion),

sharpness of the apex, and trailing edge first derivative

(positive excursion). The system requires fairly high

sampling rates and is somewhat sensitive to the amplitude

of the incoming signal and to muscle artifact, although to

a smaller degree than the systems that computed the second

derivative of the EEG.

Another detection scheme uses the amplitude-frequency

characteristics of the background activity and of the

spikes to define detection boundaries in a plane. It is a

modified scheme of the period-amplitude analysis of Leader

et al. (1967) and Carrie and Frost (1971). Ma et al.

(1977) used it to establish optimum decision boundaries

that turn out to be nonlinear to accommodate the dependence

of the amplitude threshold on duration. It is necessary to

use each patient's background activity and an average value

taken from a population of subjects to estimate the bound-

ary (assuming the clusters of normal and abnormal activity

are jointly Gaussian distributed with equal a priori

probabilities). The results seem promising but are depend-

ent on the availability of the sets and are computationally

complicated and expensive (in time and memory dimensions).
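The equal-prior Gaussian decision rule reduces to comparing two log-likelihoods, and the implied boundary is quadratic (hence nonlinear) when the cluster covariances differ. The cluster statistics below are hypothetical illustrations, not values estimated by Ma et al.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.inv(cov) @ d + logdet
                   + len(x) * np.log(2 * np.pi))

def classify(x, normal, abnormal):
    """Equal-prior Bayes rule between two Gaussian clusters.

    With unequal covariances the decision boundary is quadratic
    in x, i.e., nonlinear, as noted in the text."""
    if log_gauss(x, *abnormal) > log_gauss(x, *normal):
        return "spike"
    return "background"

# Hypothetical (duration ms, amplitude uV) cluster statistics.
normal = (np.array([100.0, 50.0]), np.diag([900.0, 400.0]))
abnormal = (np.array([40.0, 120.0]), np.diag([100.0, 1600.0]))
label = classify(np.array([45.0, 110.0]), normal, abnormal)
```

The cost noted in the text comes from estimating the means and covariances for each patient, not from evaluating the rule itself.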

Harner and Ostergen (1976) also used a similar approach

called sequential analysis to display the amplitude-

duration of the EEG while preserving the time information.

Their method shows the clustering of paroxysmal spike

activity and some patterns such as spike and wave. It is

possible to choose between the display of half or full

waves and have a better qualitative view of the rising and

falling phase of spike and wave activity. At this time,

however, the system is only qualitative (Harner, 1973)

since the definition of boundaries for spikes or spike and

wave complexes needs medium-sized computers and more resolving

power. One important theoretical defect of amplitude-

duration techniques, still not solved, concerns the mixing

of two frequencies having similar amplitudes.

Pattern recognition was also applied to the detection

of paroxysmal events in EEG (Serafini, 1973; Viglioni,

1974). The theoretical methods for the determination of the

relevant parameters by performing data reduction (linear

principal component analysis (Larsen, 1969) and nonlinear

homeomorphic procedures (Shepard, 1966)) have hardly been

applied to the EEG (Gevins et al., 1975). The parameters

chosen are once again few in number and related to visual

analysis: amplitude and the average value of the waves,

number of zero crossings and mean value of zero crossings.

The experimental results show that the criteria are satis-

factory, but the method requires a training set, is critically de-

pendent upon the length of the normalization interval, and

requires a fairly large number of decomposition (orthonor-

mal) functions. Following the same line of pattern recogni-

tion Matejcek and Schenk (1974), Remond and Renault (1972),

and Schenk (1974, 1976) applied a vectorial iteration tech-

nique to decompose the EEG. First, the maxima and minima

of the envelope of waveform are estimated, and the higher

order component is taken as the mean value; the second step

is an iteration procedure and uses the result of the pre-

ceding analysis. The process is repeated until a feature-

less waveform is obtained. With these wave components the

half-wave analysis is used to extract further information.

The appealing property of this method is the relatively

small computation involved (compared to the FFT) and the

strict resemblance with the visual (pattern) analysis.

Gotman and Gloor (1976) present and evaluate a set of

parameters to describe the EEG at the waveform level. The

technique can therefore recognize phasic interictal events

in the EEG. Its main application is to generate informa-

tion about epileptic focus and degree of abnormality of a

particular record. No mention is made of its use in the

detection and quantification of generalized seizures. The

other work reviewed fell into one of the previously described

methods. Vera and Blume (1978) use the derivative method

to analyze 16 channels of EEG on line. Chick et al. (1976)

use a scheme similar to Smith (1974).

As far as petit mal (PM) activity is concerned,

Jestico et al. (1976) use a 2-4 Hz bandpass filter to

detect the slow wave component and measure the duration of

the paroxysm. Kaiser (1976) used the duration of the spike

and of the slow wave monitored at a certain voltage level

to detect the PM activity. Using a PDP-12, Ehrenberg and Penry

(1976) used zero crossing information and a measure of

integrated amplitude from a combination of 4 EEG channels

to detect PM activity. The system stores the duration

and time of occurrence of the seizure. The overall agree-

ment, consensus versus machine, is 85 percent. It is, how-

ever, interesting to note that in the paper's extensive

discussion, no mention is made of artifacts that may cause

potential problems like chewing and body movements, which

suggests that the investigators had available rather clean

data. The system is reported to be sensitive to slow wave

sleep. Carrie and Frost (1977) also described a spike and

wave detector. It is composed of a spike detector, using

the sharpness criteria (Carrie, 1972a), an EMG detector, and

information about the amplitude of the background activity

(Carrie, 1972b). One channel is analyzed on line, and the

system stores the duration of the paroxysm and its time of

occurrence. The agreement is very high for seizures

greater than 3 sec (85 percent) but drops off to 25 percent

for seizures between 1 and 3 sec. The patients were reported to have

well-defined PM paroxysms and were free of medication.

From the EEG data available to us it seems that the large

amount of high energy artifacts that resemble the PM pat-

tern when the patients are in an unrestrained environment

dictates the use of more powerful pattern recognition algo-

rithms.

Johnson (1978) implemented in a microcomputer a PM

detector based on the repetition properties of the pattern,

i.e., spikes followed by slow waves. He used two analog

filters (22-45 Hz spike, and 1.5-4 Hz slow wave), followed

by threshold logic to input the basic elements to the

microcomputer memory, where the pattern recognition takes

place. The system had no false detects in selected epochs,

but was very sensitive to irregularities of the pattern in

the middle of the ictal event.

Selection of the Method of Seizure Detection

The requirements put on the analysis method related to

the specific problem of detection of paroxysmal events are

the following:

1) The detection shall be performed in real time in

one channel, using a microcomputer.

2) The detector shall have high resolution capabili-

ties and be insensitive to high energy artifacts.

3) It shall also be able to quantify in detail the

paroxysms.

The computation time constraint is probably enough to

make the selected choice of analysis obvious, since FFT

algorithms that run real time in simple microcomputers,

with workable resolutions, are not known to exist. But

even in the affirmative case, it seems that spectral analy-

sis is not tailored to event detection in the EEG because

the spectrum produces a smoothing (leakage/resolution). To

compensate for it, longer sequences and/or special tech-

niques (parametric spectral analysis) need to be used,

which increase the computation time.

After obtaining the spectrum, some type of pattern

recognition must be utilized to judge the presence of

certain frequency components coupled with the event, which

incidentally must assume a complete knowledge about the

background spectrum and the relations between time patterns

and frequency patterns. The question arises, why not work with

the time patterns to begin with?

Another problem is how can the information from fre-

quency analysis be translated to the clinician who sees

time patterns and wants information such as time of occur-

rence (1 sec), duration, amplitude, etc.?

It seems a much more natural choice to use time domain

techniques.

The other technique reviewed is matched filtering,

which is the optimum detector to extract patterns from a

noisy background, when the noise is stationary and the pat-

terns known. The problem with its application to EEG is

the variability of the patterns (around 10 percent,

Smith, 1978) and the drastic change in the power of the

noise (background activity), which deteriorates the detec-

tor performance. Yeo (1975) was able to show that a zero

crossing detector, although not optimum, performed better

when the patterns were allowed to vary. The false alarm

rate of the detector is also independent of the noise

power.

The techniques which detect nonstationarities in the

EEG have some drawbacks besides being computationally

expensive. Inverse filtering cannot differentiate

among types of nonstationarities. Therefore, a preproces-

sing or other ancillary methodology needs to be developed.

Kalman filtering is quite sensitive to the gain setting.

For high gains small changes can be exaggerated, and so one

is faced with the problem of calibrating the filter, which

is usually done with a control data set (artificially gen-

erated). The direct dependence of the gain on the type of

data, the dependence of the error signal on the power of

the input sequence, and some transient behavior of the

system are shortcomings.

It seems that the popularity of frequency domain tech-

niques in EEG detection stems from the availability of

standard software packages and a fairly well developed (but

often forgotten) mathematical methodology. Up to the pre-

sent, one of the drawbacks of the time domain approach is

the implementation medium (hardware) which requires a full

engineering development, not always accessible to the

clinicians. However, the present innovation trend in

microcomputer systems may very well bring the possibility

of standardization through the software implementation of

detectors, at low price.

To build an electronic detector where some parameters

(amplitude in a prescribed frequency range, zero crossing,

sharpness) have to be monitored and decisions made, the

omissions created by the poor quantification (definitions) of

the EEG process have to be filled. That is the reason why

soft (e.g., nonunique) criteria have to be created and the

result of the detection always compared to the EEG scoring

by the experts.

The attractiveness of the procedure resides in the

small number of assumptions needed about the data and the

simple and fast implementation of most of the parameters.

On the other hand, there is no theoretical analysis backing

it; therefore, the methodology almost exclusively used is

trial and error. The process is long and sensitive to

errors in the choice of the key parameters or misjudgements

of the criteria.

From the time domain techniques known there was some

hesitation between employing the rather well-established

techniques developed by Smith et al. (1975) to analyze

sleep EEG and the ones proposed by Gotman and Gloor (1976)

of implementing definitions in the raw EEG. Since the im-

plementation of a completely digital petit mal detector

around a microcomputer was, per se, a task which involved

quite a few unknown steps, it was decided to use the tech-

nique which was best documented. However, it is expected

that this work will give quantitative bases for comparing

the two different methodologies, with respect to the appro-

priateness of using prefiltering in the detection schemes

of short nonstationarities, like spikes.

Chapter II will be devoted to the design of digital

filters using microcomputers and to a detailed study of the

selection of the A/D converter's number of bits and the

computation wordlength needed to obtain a specified output

signal to noise ratio.

In Chapter III the model for the petit mal activity

utilized in the detection will be presented, along with its

implementation on a microcomputer. The system's testing

will also be explained there.

As this research work is primarily engineering

oriented (a new instrumentation system is designed around a

new model of the petit mal activity), the system's evalua-

tion is presented in Chapter IV. Preliminary data demon-

strating the high resolution capabilities of the detector

and its use in quantifying the petit mal seizure data will

also be presented.

CHAPTER II

MICROCOMPUTER BASED DIGITAL FILTER DESIGN

The purpose of this chapter is to analyze the trade-

offs in digital filter implementation using microcomputers

and arrive at a comprehensive design procedure. The design

approach proceeds from the study of the digital filter

noise factor to the choice of a microcomputer wordlength

and A/D converter precision, so that a specified output

signal to noise ratio could be obtained. This approach is

quite general and may be valuable even if future technolo-

gies make other implementation media practical (e.g.,

bit-slice microcomputers).

Preliminary Considerations

For quite some time the EEG research group at the

University of Florida has been developing nonlinear methods

for the detection of EEG waveforms. Basically the detec-

tors include bandpass filters followed by zero crossing and

threshold analysis of the filtered data to extract informa-

tion about period, amplitude and number of in-band waves in

the raw EEG. The superiority of this detection method for

EEG activity has been established when the variability of

the patterns is fairly high (greater than 10 percent--

Smith, 1978). The bandpass filters have up to the

present been implemented in analog form. The remainder of

the pattern recognition algorithm is implemented digitally.
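The zero-crossing and threshold stage of these detectors can be sketched as follows: each half wave between successive zero crossings of the filtered signal yields a period and a peak amplitude. The sampling rate and test sine are illustrative.

```python
import numpy as np

def half_waves(x, fs=200.0):
    """Zero-crossing analysis sketch: for each half wave between
    successive zero crossings, return (duration in s, peak amplitude),
    the raw material for period/amplitude/in-band wave counting."""
    crossings = np.flatnonzero(np.diff(np.sign(x)) != 0)
    return [((b - a) / fs, float(np.max(np.abs(x[a + 1:b + 1]))))
            for a, b in zip(crossings[:-1], crossings[1:])]

# A 10 Hz sine sampled at 200 Hz gives 50 ms half waves of amplitude
# close to 1 (the phase offset avoids samples falling exactly on zero).
t = np.arange(400) / 200.0
waves = half_waves(np.sin(2 * np.pi * 10 * t + 0.3))
```

Threshold logic then accepts a half wave only when both its duration and peak amplitude fall in the in-band ranges.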

Analog filters have the disadvantage of being very sensi-

tive to changes of the component values produced by envi-

ronment parameters like temperature, humidity, and aging.

Although there are design methods that can minimize the

sensitivity to component changes (e.g., leap frog), they

are more difficult to design since in such structures there

is a substantial number of feedback loops and the change of

one filter parameter implies the modification of a large

number of filter components. Another problem to which

analog filters are sensitive is the tolerance in component

values. To design two filters with identical character-

istics some or all of the components have to be hand

matched. However, this procedure only applies to that

particular point in time, since the matched components can

have different aging coefficients and so evolve differently

in the long run.

The identical filter characteristics are stressed here

because, at the present time, a sensitivity analysis which

can be applied to the detection of EEG waveforms is not

available. However, it is known from experience that very

small differences in the filter parameters and/or pattern

criteria produce drastic changes in the detection (Smith,

1979). Therefore, the only way similar detection charac-

teristics can be guaranteed between two systems is to match

every detection stage.

The substitution of the analog filters by digital ones

is also a natural extension of the present instrumentation

since the other functions of the pattern recognition algo-

rithm are already digitally implemented. The repeatability

and uniformity of characteristics will be therefore

ensured.

The implementation of digital filters could be accom-

plished basically in two different ways: building special

purpose machines to implement the filter algorithm (hard-

ware) or making use of general purpose computers and writing

adequate software to accomplish the filtering function.

The two techniques possess different properties that are

worth comparing. The main advantage of the hardware reali-

zation is speed since the processor's architecture is in-

tentionally adapted to the special type of processing, e.g.,

multiplications and additions (Gold et al., 1971; de Mori

et al., 1975). Generally, the processor is microprogrammed

and the arithmetic unit is implemented in fast ECL logic.

There are some minor modifications to the basic procedure,

the use of Read Only Memories to implement the multipliers

being the most interesting (Peled & Liu, 1974). However,

the development of a special processor is very expensive

and implies the availability of specialized laboratories

and the work of a diversified group of researchers.

On the other hand, the software implementation of

digital filters can be performed by anyone who masters a

computer language as long as the recursion relation is

available. This fact per se has the potential to expand

the field of application of digital filter signal proces-

sing. This approach is also less expensive since the

machine may be a general purpose computer, and so there is

no extra cost involved in the new application. The draw-

back is the computer's fixed architecture which is not

adapted to calculate recursion relations.

For a lot of applications the requirements of the

implementation can be met with mini-computers and even

microcomputers. The principal elements of the requirement

set are speed, type of arithmetic, and wordlength. Digital

algorithms work in digital representations of real world

(analog) signals. It is well known (Thomas) that to make

a digital representation of an analog signal unique (i.e.,

the analog signal can be reconstructed again from the

digital samples) a maximum frequency must be assumed in the

signal spectrum. This is the same as saying that the time

domain representation of the signal is restricted to change

at less than a prescribed rate, governed by the time/

frequency inverse relationship. The theorem that relates

the uniqueness of the digital representation and the

sampling frequency is the Nyquist theorem, and it imposes

a lower bound on the sampling frequency (sampling frequency

equal to twice the maximum signal frequency fM). This

theorem assumes two impossibilities:

first, that the signal spectra are frequency band limited,

and second, that the past and future history of the signals

are completely available for reconstructing the analog

signal. Here we will not discuss the validity of the

hypothesis but will acknowledge the fact that the signal's

digital representation is an approximation. To describe

sampled waveforms in the time domain a sampling frequency

higher than the Nyquist rate should be utilized. There-

fore, for real time processing a sampling frequency between

200 and 300 Hz seems adequate for most applications (Smith,

1979), which means, in the worst case, that the computation

time per sample must be less than 3.34 ms. An analysis of

the clock frequencies and the instruction cycles of today's

microcomputers shows that hundreds of operations can be

performed in this interval. It seems a comfortable margin

when one compares this number with the apparent simplicity

of the recursion algorithm--a few multiplications and addi-

tions. "Apparent" is stressed because we are generally

led to think in terms of floating point arithmetic. It is

a good exercise to estimate the enormous number of opera-

tions needed to perform a floating point operation when

the processor natively represents only integers.

It turns out that the computation time becomes prohibitive

if floating point is utilized. There are other types of

arithmetic, like the block floating point (Oppenheim,

1970), which do not seem to bring any advantage for our

application.
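The uniqueness argument above can be demonstrated numerically: sampled below the Nyquist rate, a high-frequency component is indistinguishable from a low-frequency alias. The 200 Hz rate and test frequencies are illustrative.

```python
import numpy as np

fs = 200.0                                    # sampling frequency (Hz)
n = np.arange(64)
x_high = np.cos(2 * np.pi * 150 * n / fs)     # 150 Hz: above fs/2 = 100 Hz
x_alias = np.cos(2 * np.pi * 50 * n / fs)     # its alias at fs - 150 = 50 Hz
# The two sampled sequences are numerically identical; the 150 Hz
# component cannot be recovered, which is why the analog input must be
# band limited before the A/D converter.
```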

The most severe limitation in the use of fixed point

arithmetic is the small dynamic range, since the maximum

representable number is 2^b, where b+1 is the number of bits

in the processor wordlength. The wordlength is also

coupled to the precision of the representation because the

numbers that represent the constants (and signals) must be

quantized to fit the wordlength. For instance, if an 8 bit

processor is used, the constants are represented only by

two hexadecimal digits.
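The wordlength/precision coupling can be made concrete. With a hypothetical 8-bit word holding one sign bit and one integer bit (coefficients in [-2, 2)), the grid step is 2^-6 and every coefficient must be rounded to it:

```python
def quantize(x, word=8, int_bits=1):
    """Round x onto a two's-complement fixed-point grid.

    word counts all bits including the sign; with int_bits integer
    bits the fractional step is 2**-(word - 1 - int_bits), and values
    outside [-2**int_bits, 2**int_bits) saturate."""
    step = 2.0 ** -(word - 1 - int_bits)
    lo, hi = -2.0 ** int_bits, 2.0 ** int_bits - step
    return min(hi, max(lo, round(x / step) * step))

a_ideal = 1.2728            # an "infinite wordlength" filter coefficient
a_used = quantize(a_ideal)  # value the 8-bit processor actually uses
error = abs(a_ideal - a_used)
```

The rounding error (at most half a step, 2^-7 here) shifts the realized pole positions, which is the coefficient-precision effect examined in the design procedure below.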

The choice of software implementation of digital fil-

ters in microcomputers imposes stringent constraints which

must be carefully examined to ensure that the analog input

signal to noise ratio is not degraded. Nevertheless, the

use of microcomputers is thought a good choice due to their

cost, size, availability, and the ease with which the fil-

tering function is actually performed, requiring primarily

software knowledge. In this specific application, as

rather sophisticated pattern recognition algorithms will

be necessary, the microcomputer will also be time shared

to accomplish these functions.

Design Criteria

The design of digital filters can be broadly divided

into two phases. The first is related to the determination

of the filter algorithm and the second with the implementa-

tion of the recursion relation. Generally they are taken

independently since the types of problems encountered are

quite different. In the design of the filter algorithm a

machine of infinite wordlength is assumed. The goal is to

arrive at a recursion relation that best fits the

filter's frequency (or time) specifications. In the second

phase the objective is to choose filter implementations

which optimize, in some sense, the finite wordlength

effects of any practical processor. The most important

problems of the finite length effects are the finite

dynamic range of the computation, which may cause over-

flows, and the finite precision of the constants, arith-

metic, and input data, which produces error that may

degrade the signal to noise ratio.

It turns out that the two design phases are not com-

pletely independent, as will be shown. The microcomputer

implementation is expected to be very sensitive to the

above-mentioned parameters since the arithmetic must be

fixed point (small dynamic range which increases the proba-

bility of overflow) and the wordlength is relatively small,

giving a heavier weight to the finite effect errors (also

called roundoff errors). For this reason it was thought

convenient not to separate the two design phases in order

to have a better perspective of the interactions between a

specific recursion relation and its implementation.

As various routes may be taken for digital filter

design, a criterion to compare different design procedures

(hence different implementations) is necessary. Here

design procedure means selection of appropriate transforma-

tion methods to arrive at the algorithm for the filter

transfer function, with maximum simplicity in terms of

number of operations, providing at the same time good

overflow and noise properties.

To design a filter with a specific output signal to

noise ratio using the minimum requirements of computation

wordlength and processing time, the following factors need

to be analyzed:

1) Filter transfer function

2) Filter internal magnification

3) Noise characteristics of filter topology

4) Processing speed.

The problem as enunciated is relatively different from

the design procedure commonly found in the literature,

since there the optimization is studied with respect to

only one parameter (most generally to sensitivity to round-

off), neglecting any other considerations. It turns out

that the optimum structures possess a much higher number of

multipliers which will mean slower computation time in our

application. Examples are as follows: The synthesis of in-

finite impulse response filters (IIR) with low roundoff

noise based on state space formulations has been presented

by Hwang (1976) and Mullis and Roberts (1976). Their struc-

tures have N^2 more multipliers than the canonical structures,

where N is the filter order. Another example will be the

structure proposed by Barnes et al. (1977), which is free

from overflow oscillations, but requires N+1 more multi-

pliers than the canonical structures. Multiplier extrac-

tion was also considered (Szczupack & Mitra, 1975),

but the structures obtained lack the optimum sensitivity

characteristics. Another interesting idea is presented in

Chang (1978), where the decrease in the roundoff noise is

accomplished by feeding back the discarded bits, properly

weighted, to the input of the adder following the place

where the product is quantized. This procedure doubles the

multiplier number. Szczupack and Mitra (1978) propose the

reduction of the roundoff noise by ensuring the presence of

zeros in the noise transfer functions. This procedure

leads to zeros that are complex, therefore requiring extra

multipliers for their realization. Many more examples could

be given, but the picture remains unchanged; decrease of

roundoff noise means more multipliers. For a microcomputer

implementation of digital filters multiplication is by far

the most time-consuming operation, and it is therefore

necessary to minimize its use. It seems appropriate to say

that for microcomputer implementation a suboptimal solution

would be the best, weighing the effect of the increase in

computation time and the decrease in the variance of the

output noise. This analysis is not known to exist, so the

procedure chosen utilized canonical structures as well as

slight variations in their topology (adders as a variable).
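The canonical structure referred to here is direct form II: two delay elements and at most five multiplies per second-order section, which is the count that matters when multiplication dominates the microcomputer's per-sample budget. A minimal floating-point sketch (the actual implementation would be fixed point):

```python
def sos_direct_form_ii(x, b, a):
    """Canonical (direct form II) second-order section:
        w[n] = x[n] - a1*w[n-1] - a2*w[n-2]
        y[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2]
    Two delay elements, at most five multiplies per sample."""
    b0, b1, b2 = b
    a1, a2 = a
    w1 = w2 = 0.0
    y = []
    for xn in x:
        w0 = xn - a1 * w1 - a2 * w2            # recursive (pole) half
        y.append(b0 * w0 + b1 * w1 + b2 * w2)  # nonrecursive (zero) half
        w2, w1 = w1, w0                        # shift the two delays
    return y

# Impulse response of y[n] = x[n] + 0.5*y[n-1]: a1 = -0.5, b = (1, 0, 0).
h = sos_direct_form_ii([1.0, 0.0, 0.0, 0.0, 0.0],
                       (1.0, 0.0, 0.0), (-0.5, 0.0))
```

The low-noise structures cited above buy their smaller roundoff variance with additional multipliers on top of this minimal count.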

Filter Transfer Function

The characteristics of the filters presently used for

sleep EEG studies have been obtained through trial and

error. Basically the filters are low Q, second order

bandpass filters (Smith, 1978). The present design uses a

slightly underdamped frequency response. It is hard to

qualify the desirable properties of the filters since there

is no general theory of EEG detection. The only possible

criteria are to understand the filter function in the

detection scheme and hopefully arrive at some rules that

will serve as guidelines for filter selection.

The purpose of using EEG filters in sleep studies is

to attenuate out-of-band activity to enable further signal

processing (e.g., zero-crossing). On the other hand, the

filters shall not mask out-of-band activity so that it will

look like in-band activity for the rest of the processing

algorithm. Every filtering function produces a certain

masking, since the filter output is the convolution of the

filter's impulse response with the input. However, the

weight of the smoothing is controlled by the Q of the fil-

ter, hence the use of low Q filters. At present no other

characteristics of the filters, like phase distortion or

group delay, have been brought into the picture of filter

selection (Smith, 1978). The in-band amplitude character-

istics have also been neglected due to the wide variability

of the amplitude of the EEG. The above-mentioned analysis

refers to EEG sleep studies. For the epilepsy application

the knowledge is still less quantitative due to less expe-

rience. From the study of the petit mal pattern it can be

concluded that the fast attenuation characteristics are of

paramount importance for the slow wave filter (activity as

high as 5 Hz) since the spike principal component can be as

low as 11 Hz, roughly one octave away. Otherwise, the

spike energy will degrade the periodic appearance of the

slow wave filter output, preventing the use of full cycle

detection, which has been considered the best (Smith,

1978).

After this quick analysis it can be said that the

overriding filter requirements are good out-of-band attenuation and low Q. The Q is a factor controlled by the filter parameters and not by the filter

type. Therefore, a "good" EEG filter shall display fast

out-of-band attenuation. With this constraint in mind, the

filter type chosen will be the Chebyshev, since it displays

the optimum out-of-band attenuation rate for a certain

order, when all the zeros are assumed at infinity (Wein-

berg, 1962). Elliptic filters have similar attenuation

characteristics but are more difficult to design due to the

proximity of the poles and zeros. Also, for a digital filter implementation this fact means complex zeros and consequently more multiplications.

To design the digital filters and use the present

experience with analog filters it was decided to accomplish

the design in the S plane and use one of the mapping rules

to the Z domain (impulse invariant or bilinear). It was

thought important to have an automated facility to design

the filters from the frequency domain filter characteris-

tics, i.e., center frequency, bandwidth, ripple in-band and

out-of-band attenuation rates. The use of Chebyshev

polynomials is suited for this goal, as shown in Appendix I.

The procedure to determine the Chebyshev lowpass goes as follows (Principe et al., 1978):

1. From a given ripple factor (ε) in the passband, the lowpass filter poles can be obtained from (1).

2. To determine n, the polynomial order, from the attenuation in dB/OCT, (3) can be used.

s_k = -sinh(φ) sin[(2k+1)π/2n] + j cosh(φ) cos[(2k+1)π/2n],  k = 0, 1, ..., n-1   (1)

sinh(φ) = (γ - 1/γ)/2,  cosh(φ) = (γ + 1/γ)/2,  γ = [1/ε + √(1/ε² + 1)]^(1/n)   (2)

n = log[10^(Att/20)/ε + √((10^(Att/20)/ε)² - 1)] / log(2 + √3)   (3)
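As an illustrative sketch of steps 1 and 2 in Python (not the Fortran IV program of Appendix I; the function names are hypothetical, and the order formula assumes the attenuation is specified one octave past cutoff):

```python
import math

def chebyshev_lowpass_poles(n, ripple_db):
    """Normalized Chebyshev lowpass poles following Eqs. (1)-(2)."""
    eps = math.sqrt(10.0 ** (ripple_db / 10.0) - 1.0)      # ripple factor
    gamma = (1.0 / eps + math.sqrt(1.0 / eps ** 2 + 1.0)) ** (1.0 / n)
    sinh_phi = (gamma - 1.0 / gamma) / 2.0
    cosh_phi = (gamma + 1.0 / gamma) / 2.0
    return [complex(-sinh_phi * math.sin((2 * k + 1) * math.pi / (2 * n)),
                    cosh_phi * math.cos((2 * k + 1) * math.pi / (2 * n)))
            for k in range(n)]

def chebyshev_order(atten_db, ripple_db):
    """Smallest order giving atten_db of attenuation one octave out, Eq. (3)."""
    eps = math.sqrt(10.0 ** (ripple_db / 10.0) - 1.0)
    x = 10.0 ** (atten_db / 20.0) / eps
    # acosh(2) = log(2 + sqrt(3)), the denominator of Eq. (3)
    return math.ceil(math.acosh(x) / math.acosh(2.0))
```

The poles returned lie on the Chebyshev ellipse with semi-axes sinh(φ) and cosh(φ), all in the left half plane.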

Lowpass to Bandpass Transformations

After designing the lowpass filter transfer function,

the bandpass can be obtained using one of the standard

transformations--the narrow-band or the wide-band

(Blinchikoff, 1976). In the narrow-band, the lowpass

filter poles are transformed by the relation

s_BP = [(w2 - w1)/2] s_LP ± jw0   (4)

where w2, w1 are the upper and lower cut-off frequencies

and w0 the center frequency. For the wide-band, writing each lowpass pole as -α + jβ (β > 0, the pole coordinates scaled by the fractional bandwidth (w2 - w1)/w0) and defining

A = 1 - (α² - β²)/4,   B = αβ/2

a = √{[√(A² + B²) + A]/2},   b = √{[√(A² + B²) - A]/2}

the pole pairs obtained from the lowpass pair -α ± jβ are given by

Pair 1:  (w0/2)[-α - 2b + j(β + 2a)]
                                                      (5)
Pair 2:  (w0/2)[-α + 2b + j(β - 2a)]

together with their complex conjugates. Here the transformation

of the poles is stressed since it will avoid polynomial

factorization to obtain the bandpass digital filter in a

cascade form.
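The narrow-band relation (4) is simple enough to sketch directly in Python (an illustrative fragment; w0 is taken here as the geometric mean of the band edges, an assumption that the narrow-band case makes nearly equal to the arithmetic mean):

```python
import math

def narrowband_lp_to_bp(lp_poles, w1, w2):
    """Narrow-band LP-to-BP pole transformation of Eq. (4):
    s_BP = ((w2 - w1)/2) * s_LP + j*w0, plus the conjugate partner."""
    half_bw = (w2 - w1) / 2.0
    w0 = math.sqrt(w1 * w2)          # center frequency (geometric mean assumed)
    upper = [half_bw * p + 1j * w0 for p in lp_poles]
    # each upper-half-plane pole is completed by its conjugate partner
    return upper + [p.conjugate() for p in upper]
```

Note that every transformed pole keeps the real part of the scaled lowpass pole, which is exactly why the bandpass poles lie on a line parallel to the jw axis.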

The two transformations possess quite different prop-

erties, the most important for this application being the

different Q of the poles. Looking closely at (4) it can be

seen that the bandpass poles lay on a parallel to the jw


axis. As the Q is proportional to the ratio w/a, the higher

frequency poles always possess greater Q. For narrow-band

filters the difference is small, but it can become quite

appreciable for wide-band filters. Hence, the narrow-band

does not transform the time characteristics (overshoot,

group delay) of the lowpass design (Blinchikoff, 1976).

However, it is widely used due to its simplicity.

With the wide-band transformation, the poles of the

bandpass filter lay in a straight line from the origin,

ensuring that the ratio w/a is constant. The impulse

response of the bandpass filter is a time scaled version of

the lowpass. The disadvantage of the wide-band is a more

tedious design procedure. It will be shown later that the

different characteristics of the transformations will be

valuable for the control of the frequency response of the

digital filter and the finite length effects of the imple-

mentation.

Transformations to the Z Plane

Now that the bandpass pole locations are known, it

will be seen how the Z plane pole pairs can be obtained.

Basically there are two different types of mapping rules

from the S to the Z plane. One, the rational transforma-

tions, map the entire jw axis onto the unit circle. There

are several ways this can be accomplished (a series expan-

sion of e t), but the lowest order approximation is the

bilinear transformation given by

s = (2/T) · (z - 1)/(z + 1)   (6)

The particular type of isomorphism (an infinite length line

mapped into a finite line) implies a nonlinear mapping, in

the sense that a Az has different correspondence to a Aw,

depending on the value of z. As a matter of fact, the

transformation follows a tangent rule as can be easily

shown (Oppenheim & Shafer, 1975). Therefore, to get the

correct frequency characteristics in the Z domain, the S

plane filter critical parameters (center frequency, band-

width) have to be properly prewarped (Childers & Durling,

1975). Nevertheless, a piecewise linear response is mapped

onto a piecewise linear response, preserving the boundaries.

It is important to point out that the prewarping does not

change the nonlinear properties of the transformation, but

compensates for it. So a distortion of the phase response

of the filter will always be present.
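The prewarping itself is a one-line computation following the tangent rule; as a sketch (the function name is illustrative, with frequencies in rad/s):

```python
import math

def prewarp(w_digital, T):
    """Analog frequency that the bilinear transform of Eq. (6) maps onto
    the digital frequency w_digital (rad/s), sampling interval T:
    w_analog = (2/T) * tan(w_digital * T / 2)."""
    return (2.0 / T) * math.tan(w_digital * T / 2.0)
```

Designing the S plane filter at the prewarped center frequency and band edges makes the Z domain response peak at the intended digital frequencies; the warping is negligible well below fs/2 and grows without bound as the frequency approaches π/T.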

One of the most interesting properties of the bilinear

transform is the absence of aliasing. The theoretical

explanation is very easily understood. As the entire jw

axis is mapped onto the unit circle, the folding frequency

(z = -1) corresponds to w = ∞, and the attenuation of any

realizable network shall also be infinitely large. How-

ever, from the Z domain point of view the reasoning does

not seem explanatory since the Z axis is the same whether

the filter is mapped using a rational or a nonrational

transformation. In the particular case of a filter, the

responses must "peak" up at the same place in frequency to

make any designs useful, and the response is repeated in-

definitely at fs intervals (no matter what jw frequency

corresponds to fs/2). The answer must be found in the ele-

ments that constitute the network in the Z domain and which

are dependent upon the transformation used. In the case of

the bilinear, there is always at least one zero mapped at

z=-l, which sets at this point the amplitude of the fre-

quency response at zero. So the zero is responsible for the

good attenuation properties of the bilinear transform near

the folding frequency. This observation is worth noting

since it will be useful in the design using nonrational

mappings.

The best known nonrational transformation is the impul-

sive invariant (Oppenheim & Shafer, 1975). There are a few

slight modifications, and the direct substitution method

will be employed here (Childers & Durling, 1975), where

(s - a_i) → (z - e^(a_i T))   (7)

where the a_i are the S plane filter poles. Due to the periodicity of

the exponential function, the transformation is not one to

one. The net effect is that each strip of width ws in the

S plane will be overlayed onto the unit circle; i.e., the

points wl+Kws (K=1,2,...) will all be mapped into the same

point as w1. Mathematically, the frequency response of the

digital filter H(ejw) is related to the frequency response

of the analog filter (Oppenheim & Shafer, 1975) as

H(e^jw) = (1/T) Σ_k H_a(jw/T + j2πk/T)   (8)

Ha(s) is the Laplace transform of the filter impulse

response and T the sampling interval. It is clear that

only if H_a(jw) is bandlimited to ws/2 will (8) be a reasonable description. This fact excludes the use of the direct

substitution to design high pass filters and in general any

function which will display appreciable energy at half the

sampling frequency. Anyway it can be expected that the

attenuation characteristics of the high frequency end of

the filters designed with the straight direct substitution

method will be worse than if the bilinear was employed.

For the case of bandpass filters, the zeros of the S plane

transfer function, if not at s = ∞, can fold back into the

bandpass, ruining the design. Therefore, it is a good

design procedure not to map the zeros of the S plane trans-

fer function using this transformation.
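Mapping the S plane poles is then a single exponential per pole; a minimal Python sketch (the function name is illustrative):

```python
import cmath, math

def direct_substitution_poles(s_poles, T):
    """Map S plane poles to Z plane poles by z_i = exp(a_i * T), Eq. (7)."""
    return [cmath.exp(p * T) for p in s_poles]
```

Any S plane pole with negative real part lands strictly inside the unit circle, so stability is preserved regardless of the sampling interval.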

It was decided to place the zeros independently in the

Z domain. As the filter is a bandpass, at least one zero

must be placed at z=l to block the d.c. gain. Taking into

consideration the preceding discussion of the effect of the

zero at z=-l, it was decided to place the other zero(s) of

the transfer function at z=-l to obtain the good attenua-

tion characteristics of the bilinear transform. The other

convenient properties of the zeros at z = ±1 are the implementation with simple adders (or subtractors), which are fast and do not contribute any roundoff errors.

It may be expected that the Z domain filter character-

istics will not display constant gain across the passband,

since a linear relation was used to map the poles and now a

zero is being placed at w=ws/2. To compensate for this

fact, one technique places a different number of zeros at

z = 1 and z = -1, according to the relation (Childers & Durling, 1975)

k = (1 + cos wcT)/(1 - cos wcT)   (9)

where wc is the filter center frequency.

Bearing in mind the discussion of the lowpass to band-

pass transformations, it can be expected that the narrow-

band will compensate in part the asymmetry since the high

frequency pole has already a higher Q than the low fre-

quency one. It turns out that the narrowband transforma-

tion used with the direct substitution requires one zero at

each of the locations. The number of zeros required for

the other combinations (i.e., direct substitution with

wideband transformation and bilinear) is shown in Table I

for second order filters. From the results obtained with

the EEG filters (Principe et al., 1979) it can also be con-

cluded that the sensitivity of the passband gain to differ-

ent location of the filter center frequency is much lower

than (9) predicts.

TABLE I

REQUIRED NUMBER OF ZEROS

                                      z = 1     z = -1
Direct substitution, narrow-band        1         1
Direct substitution, wide-band          2         1
Bilinear                                2         2

Another conclusion of the analysis which is very

important regards the filter passband gain. In digital

filter design the control of the filter midband gain is

generally accomplished by properly scaling the filter

transfer function. It is suggested that the correct

handling of the Z plane filter design may constitute an

alternate way to control the midband gain, at least in some

cases. An example given in (Principe et al., 1979) is

worth mentioning. For the delta filter (0.1-3 Hz,

fs=100 Hz), the midband gain was decreased from 878 for the

filter designed with the narrowband direct substitution to

87 just by using the wideband transformation (both fre-

quency responses met the specifications).

Let us analyze in general the effect of the zeros at z = ±1 in the transfer function. For the zero at z = 1 the

numerator takes the form

|z - 1|² = |e^(jwT) - 1|² = 4 sin²(wT/2)   (10)

and for the zero at z = -1

|z + 1|² = 4 cos²(wT/2).   (11)

It is concluded that the effect of the zero at z=-l has the

symmetric effect of the zero at z=l, with respect to fs/2,

so whatever conclusion is reached for one will apply for

the other in the symmetric region. Fig. 1a shows the attenuation characteristics of Eqs. (10) and (11). Now suppose

Fig. 1a. Frequency Response of One Zero

Fig. 1b. Frequency Response of Two Zeros

that instead of one zero there are two. The numerator

becomes (Fig. 1b)

|z - 1|⁴ = 16 sin⁴(wT/2).   (12)

The region of the unit circle where the two zeros give smaller magnification is

16 sin⁴(wT/2) < 4 sin²(wT/2)

or

sin(wT/2) < 1/2,  i.e.,  wT < 60°.   (13)

It can be said that the wide-band transformation will yield smaller magnification transfer functions when the filter passband falls in the region 0° < wT < 60°, which has been experimentally verified (Balakrishnan, 1979). The narrow-band transformation shall be used to design filters which have passband between 60° and 120°, and a design with two zeros at z = -1 shall be used for filters with 120° < wT < 180°. As the EEG filters fall in the range 0° < wT < 60°, the last case is essentially excluded from the study. Of course, these results must be taken as approximations since for each transformation the pole Q's are slightly different.
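The crossover in (13) is easy to check numerically; the following sketch evaluates the two numerator magnitudes of (10) and (12) (illustrative helper names):

```python
import math

def one_zero_gain(wT):
    """|z - 1|^2 evaluated on the unit circle, Eq. (10): 4 sin^2(wT/2)."""
    return 4.0 * math.sin(wT / 2.0) ** 2

def two_zero_gain(wT):
    """|(z - 1)^2|^2 on the unit circle, Eq. (12): 16 sin^4(wT/2)."""
    return 16.0 * math.sin(wT / 2.0) ** 4
```

Below 60° the double zero attenuates more strongly than the single zero; above 60° the situation reverses, and by the symmetry noted above the same statement holds for zeros at z = -1 with respect to fs/2.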

The design procedure just described has been imple-

mented in a Fortran IV program, presented in Appendix I.

The inputs are the frequency domain specifications of

center frequency, bandwidth, ripple in-band and out-of-

band attenuation rates. The program chooses the filter

order to meet the specifications and gives the values of

the S plane bandpass filter poles, the z domain filter

poles and the quadratic factors C and D for each second

order resonator given by (a cascade implementation is

assumed)

C = 2 Re(z_i),   D = Re²(z_i) + Im²(z_i).   (14)

The number of zeros at each of the points (z = ±1) is

printed. The frequency response of the filter is also

plotted for a quick check on the design.
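The quadratic factors of (14) follow immediately from each Z plane pole pair; as a sketch (illustrative function name):

```python
import cmath

def resonator_coefficients(z_pole):
    """Coefficients of the factor z^2 - C*z + D for a conjugate pole pair, Eq. (14)."""
    C = 2.0 * z_pole.real
    D = z_pole.real ** 2 + z_pole.imag ** 2
    return C, D
```

The pole and its conjugate are the two roots of z² - Cz + D, so C is their sum and D their product.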

The above procedures design infinite impulse response

(IIR) filters. For EEG applications, where the attenuation

characteristics seem to be more important than the linear

phase characteristics, the IIR seem to be preferable to the

finite impulse response filters (FIR). However, it is

worth mentioning that the FIR can be designed to present a

linear phase across the passband, which may be important

for applications where the phase information of the input

will be an important parameter. Another application which

may call for FIR is the situation where multi-sampling

rates will be used.

One of the great problems in digital filter design, as

mentioned earlier, is the lack of independent midband gain

control. Therefore, when a narrow-band filter needs to be

designed, the gain shall be expected to be high. To

decrease it, scaling can be utilized at the expense of dynamic

range, or for some cases an alternate transformation can

give smaller gains. The other possibility is to decrease

the sampling frequency, making the filter appear wider and

therefore lowering the Q. Although for certain areas like

speech processing this is not of great practical value, for

EEG it is a viable procedure, since the EEG rhythms can be

considered not coupled. The penalty paid is higher compu-

tation time as a digital lowpass filtering shall be per-

formed before the reduction of the sampling frequency to

avoid aliasing. Peled (1976) has a detailed analysis of

the procedure including setting of lowpass corner frequency

and new sampling frequency.

The rationale for using lowpass FIR filters is the

following: If decimation is desired (lower sampling fre-

quency at the output), only the samples which will be used

need to be computed. In the case of FIR filters, where the

output is only a function of the past values of the input,

this can be accomplished easily. For IIR filters the

present output is a function of past values of the input

plus past values of the output, which means that every point must be

computed. The threshold where the FIR filter is prefer-

able to the IIR is given by the ratio of decimation and the

difference in filter order to keep similar attenuation

characteristics.

An automated procedure to design IIR bandpass

Chebyshev filters has been presented. It was shown that

it is possible to design bandpass filters using the direct

substitution to map the filter poles and a technique

derived from the bilinear to map the filter zeros. The

relations between the design and the implementation are

apparent in the choice of integer zeros (to cut computa-

tion time and roundoff noise) and in the choice of the

transformation, which may help controlling the midband

filter magnification.

Finite Length Effects of the Implementation

After obtaining the filter transfer function in the

Z domain, the filter algorithm is the inverse Z transform

of H(z). For a second order function of the special type

which will be used here, H(z) can be written as in (15) and

y(nT) as in (16).

H(z) = (z - 1)/(z² - Cz + D)   (15)

y_n = C y_{n-1} - D y_{n-2} + x_n - x_{n-1}   (16)

The filtering will then be accomplished by computing (16)

in the microcomputer.
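Computing (16) sample by sample requires only two delayed outputs and one delayed input; a minimal Python sketch of the arithmetic alone, ignoring the finite wordlength effects discussed next (the actual implementation runs on the microcomputer):

```python
def second_order_section(x, C, D):
    """Run the recursion of Eq. (16): y_n = C*y_{n-1} - D*y_{n-2} + x_n - x_{n-1}.
    Infinite-precision arithmetic; wordlength effects are ignored here."""
    y1 = y2 = 0.0      # y_{n-1}, y_{n-2}
    x1 = 0.0           # x_{n-1}
    out = []
    for xn in x:
        yn = C * y1 - D * y2 + xn - x1
        out.append(yn)
        y2, y1, x1 = y1, yn, xn
    return out
```

Because of the zero at z = 1, a constant input is blocked and the output decays to zero for any stable pole pair.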

If an infinite wordlength machine is assumed, the

implementation problem is nonexistent. Practically finite

wordlengths must be used leading to the fact that almost

every digital filter is nonlinear. For this reason the

output of the digital filter deviates from what is actually

desired. This leads to the study of various ways of

computing the filter recursion relation. As was said in the introduction, for EEG applications fourth order analog

bandpass filters were found satisfactory. Following the

basic rules explained in the previous section the transfer

function for the class of EEG filters can be written as

H(z) = (z - 1)^p (z + 1)^q / [(z² - C1z + D1)(z² - C2z + D2)]   (17)

where

1 ≤ p, q ≤ 2.

There are three basic ways to implement (17). One is

called the direct realization and implements the fourth

order polynomial directly. The high sensitivity of this

realization to the finite length effects of the implementa-

tion is well known (Gold & Rader, 1969). The other two are

the cascade and the parallel form. For filters with the

zeros on the unit circle, the cascade form can be imple-

mented with fewer multipliers (Childers & Durling, 1975).

Therefore, it will be the only one studied. The variables

which condition the implementation are the type of arithme-

tic (fixed or floating point), the number representation

(sign magnitude, two's complement and one's complement),

and the quantization methods (truncation or rounding).

For fixed point arithmetic, the implementation is

based on the assumption that the location of the binary

point is fixed, so numbers must be aligned before

performing the additions. If two fixed point numbers are

added, overflow can occur (Gold & Rader, 1969), and as

there is a linear relation between dynamic range and word-

length, the probability of overflow is large. For the

fixed point multiplication, overflow can never occur.

However, quantization of the true value shall be generally

made, since the product of two b-bit numbers is 2b bits long and it is general practice to use a constant

wordlength throughout the filter calculations. For float-

ing point representation a positive number F is represented

by F = 2^c M, where M, the mantissa, is a fraction, and c, the

characteristic, can be either positive or negative.

Quantization of the mantissa is necessary for both the

addition and multiplication. Overflow is also theoreti-

cally possible, but as there is an exponential relation

between wordlength and dynamic range, it can be essentially

excluded. It is therefore assumed that floating point is

overflow free.

Truncation and rounding are the two forms of quantiza-

tion. The quantization error is defined by

e = x-Q(x) (18)

where Q(x) is the portion of x, the input signal, retained.

The quantization error depends upon the type of number

representation used as shown in Fig. 2. In a microcomputer

implementation, the two's complement is the natural choice

because it is employed in the processor's arithmetic unit.

Fig. 2. Quantization Error (Δ = 2^(-b), where b+1 is the register length)

Therefore, the following analysis will be restricted to

two's complement truncation and rounding. For rounding the

error is bounded by (Oppenheim & Shafer, 1975)

-2^(-b)/2 ≤ e ≤ 2^(-b)/2   (19)

while for truncation it is

-2^(-b) < e ≤ 0,   x ≥ 0
                                    (20)
0 ≤ e < 2^(-b),    x < 0

where b+l is the number of bits of the wordlength. The

truncation error is bigger than the roundoff error, but

filters realized with truncation need fewer operations, as

truncation is readily done in the microcomputer.

Filter Internal Magnification

Fixed point arithmetic must be chosen for the imple-

mentation of digital filters in today's microcomputers due

to computation time constraints. Therefore, one of the

serious problems in the implementation phase is the possi-

bility of overflow in the additions. Overflow causes gross

errors in the filtered output and even the possibility of

sustained oscillations (Ebert et al., 1969).

An important consideration in the design is to ensure

that when an overflow, produced by the input, occurs, the

filter will recover in a short time. When the recovery is

short, overflow propagation is limited and the filter is

said to have a stable forced response (Claasen et al.,

1976). It can be proved that the conditions to guarantee

stability of the forced response are equivalent to the zero

input stability of the system; i.e., the filter poles must

lay in a specific region inside the unit circle. For the

special case of second order filters there is a result that

can be easily determined and will ensure filters with no

overflow propagation. Consider the second order filter

(16) with zero input

y_n = C y_{n-1} - D y_{n-2}.   (21)

The roots of the filter must lay inside the unit circle,

which implies that the values C and D must lay inside the

triangle shown in Fig. 3b. Ignoring roundoff error and

assuming two's complement arithmetic, the output of the

filter subject to overflow will be

y_n = f(C y_{n-1} - D y_{n-2})

where f( ) is a sawtooth waveform (the characteristic of

two's complement arithmetic shown in Fig. 3a).

It is clear that if

|C y_{n-1} - D y_{n-2}| < 1   (22)

then yn will be correctly interpreted. So a sufficient

condition to have filters with no limit cycles (nor

propagation of overflow) is

Fig. 3a. Overflow Characteristics of Two's Complement

Fig. 3b. No Overflow Propagation Region

|C| + |D| < 1   (23)

which is shown in Fig. 3b as the hatched square. This condition is very restrictive for most applications; therefore,

most of the practical designs will display the undesirable

effect of overflow propagation.
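Condition (23) and the stability triangle of Fig. 3b reduce to simple inequalities on C and D; as a sketch (illustrative helper names):

```python
def stable(C, D):
    """Poles of z^2 - C*z + D strictly inside the unit circle (triangle of Fig. 3b)."""
    return abs(C) < 1.0 + D and D < 1.0

def overflow_safe(C, D):
    """Sufficient condition (23) for no overflow propagation: |C| + |D| < 1."""
    return abs(C) + abs(D) < 1.0
```

A narrow low-frequency section such as C = 1.8, D = 0.9 is stable but fails (23), illustrating why most practical designs display overflow propagation.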

Since the overflow is a nonlinearity in the system, it

can be expected that various implementations (different

ways of calculating the recursion relation) will display

different overflow properties. The parameters one can

choose to control the overflow are special type of adders,

scaling, and exploring topologic differences in the struc-

tures.

When the result of an addition is bigger than the

register length in two's complement representation, there

is an abrupt change in sign. It has been shown (Fettweis &

Meerkötter, 1972) that at least for the class of wave

digital filters, saturation arithmetic allows for the

absence of parasitic oscillations. Here saturation arith-

metic means the type that holds the signal at maximum level

when an overflow occurs. This implies that overflow must

be monitored. The relations between the type of saturation

arithmetic, distortions at the output and stability are a

present area of research (Claasen et al., 1976). No general

results are available.

Scaling the signal level is the most general way to

handle overflow. To prevent adder overflow the input, xn,

must be scaled so that the output, yn<1, for all n. There

are a number of approaches that can be considered to accom-

plish this (Oppenheim &'.Shafer, 1975).

As

y_n = Σ_{k=0}^∞ h_k x_{n-k}   (24)

where h_k is the filter impulse response, it is easy to see that

|y_n| ≤ Σ_{k=0}^∞ |h_k||x_{n-k}| ≤ (max|x_k|) Σ_{k=0}^∞ |h_k|.   (25)

Therefore, by making max|x_k| = 1/Σ|h_k|, overflow of y_n can be prevented. This means, on the other hand, that the zero forming paths of the filter must be multiplied by k = 1/Σ|h_k|. Experiments have shown that this scaling policy is too pessimistic and does not allow good use of

the full dynamic range of the wordlength.
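The worst-case policy of (25) can be evaluated numerically by accumulating the absolute impulse response of the section; a sketch (the sum is truncated, an approximation that is accurate for well-damped poles; the function name is illustrative):

```python
def l1_scale_factor(C, D, n_terms=2000):
    """Worst-case scaling constant k = 1 / sum|h_k| of Eq. (25) for the
    section y_n = C*y_{n-1} - D*y_{n-2} + x_n - x_{n-1} (sum truncated)."""
    y1 = y2 = x1 = 0.0
    total = 0.0
    for n in range(n_terms):
        xn = 1.0 if n == 0 else 0.0     # unit impulse
        yn = C * y1 - D * y2 + xn - x1
        total += abs(yn)
        y2, y1, x1 = y1, yn, xn
    return 1.0 / total
```

Since |h_0| alone is already 1 for this section, the resulting scale factor is always well below unity, which is exactly the pessimism noted above.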

A second approach would be to assume the input sinusoidal, with a frequency equal to the resonant frequency(ies) of the filter. This is equivalent to requiring (H(·) is the filter's frequency response)

|H(e^(jλ))| ≤ 1,   -π ≤ λ ≤ π   (26)

and is accomplished by multiplying the coefficients by k, the scaling constant, so that

max_λ k |(a0 + a1 e^(-jλ) + a2 e^(-2jλ)) / (1 + b1 e^(-jλ) + b2 e^(-2jλ))| ≤ 1   (27)

where a0, al, a2, bl, b2 are the coefficients of the second

order filter. This scaling policy is sometimes too opti-

mistic, that is to say, it may lead to overflow. A third

approach uses the frequency domain relation Y(z)=H(z)X(z).

The energy of the output signal is Σ_{k=0}^∞ y_k², which by Parseval's relation means

Σ_{k=0}^∞ y_k² = (1/2π) ∫_0^{2π} Y(e^(jλ)) Y(e^(-jλ)) dλ.   (28)

Using the Schwartz inequality,

Σ_{k=0}^∞ y_k² ≤ [(1/2π) ∫_0^{2π} H(e^(jλ)) H(e^(-jλ)) dλ] [(1/2π) ∫_0^{2π} X(e^(jλ)) X(e^(-jλ)) dλ]

= [Σ_{k=0}^∞ x_k²] (1/2π) ∫_0^{2π} H(e^(jλ)) H(e^(-jλ)) dλ.   (29)

Therefore, scaling constants k may be chosen satisfying

k² (1/2π) ∫_0^{2π} H(e^(jλ)) H(e^(-jλ)) dλ ≤ 1.   (30)

Again, in some applications this policy would be too opti-

mistic (Gold & Rader, 1969). There is available a general

theory, which enables signal modelling by constraining the

input norms (Jackson, 1970). For the EEG analysis a

1/f type of spectrum may be an adequate model. However,

this study was not pursued due to two main reasons. First,

the modelling is subject to some gross approximations,

mainly for abnormal EEG. Second, no matter what methodol-

ogy is used, scaling involves the basic approach of in-

creasing the number of multipliers. In our case, where the bandpass transfer functions were implemented without multipliers in the zero forming paths, scaling

means increasing tremendously the computation time, as

will be shown quantitatively later.

It was decided to allow for a controlled increase in

the computation wordlength instead of scaling the filter

transfer function. Hence, the internal magnification

analysis serves only as an indication of how many bits

shall be allocated above the input wordlength to avoid

overflow (each power of two corresponds to one bit).

Sinusoidal analysis was employed to obtain the information

of the wordlength increase, due to its simplicity. It

consisted of calculating, using a computer simulation,

the maximum value of the transfer functions from the input

to the output of each node in the structure.

It has been shown that scaling uses most efficiently

the dynamic range of the wordlength for a particular struc-

ture (Jackson, 1970). However, scaling is essentially a

method of controlling the internal magnification. If a

certain filter needs to be built in a specific machine and

with particular structure, scaling just enables its imple-

mentation with a good use of dynamic range. Sometimes this

methodology may lead to impractical filters, mainly when

microcomputers are used as the implementation medium (DeWar

et al., 1978). This motivated the search for structures

which would display reduced magnifications. Let us take as

an example the canonical structure shown in Fig. 4a and

also the structure called feedforward of Fig. 4b. They are

the only ones which will be compared in this work. Their

big advantage from the microcomputer implementation point

of view is the minimum (canonic) number of multipliers.

The internal magnification in node 1, Gl, is the one which

will constrain the choice of the wordlength. As is also

known, the zeros may counteract the magnification effect of

the poles. In the canonical structure, as the feedback signals are always taken before the zeros are implemented,

G1 is independent of the zero placement and number. On the

other hand, in the feedforward structure, one of the zero

forming paths is introduced before the feedback signals are

taken, allowing for decreased gains when the filter poles

are close to the zeros, i.e., in our application when the

filters are low frequency (zero at z=l) or close to fs/2

Fig. 4a. Canonic Structure

Fig. 4b. Feedforward Structure

(zero at z=-l). This fact has been experimentally noticed

in (Mick, 1975). It may also give a "rule of thumb" for

the pairing of zeros and poles in the feedforward struc-

ture: the zero closer to filter passband shall be imple-

mented first for reduced magnification. The goal is to

bound the maximum magnification by the input-output filter

gain. When this is possible, both structures present the

same wordlength requirements. However, they may display

quite different noise power at the output. Fig's. 5a and b

present the structures for fourth order filters designed

with the direct substitution, narrow-band, wide-band, and

the feedforward implementation. The implementation with

the canonic form is pictured in Fig. 5c.

Noise Analysis

The effect of truncation (rounding) is apparent in

three main areas: the A/D conversion, the multiplications,

and the coefficient quantization. As the phenomenon has

the same basic characteristics, its modelling will be over-

viewed first.

Quantization is a nonlinear operation on the signals,

but can be modelled, as a superposition on the signal of a

set of noise samples shown in Fig. 6 (Rabiner & Gold,

1975). The quantized samples will be expressed by

x̂(n) = Q[x(n)] = x(n) + e(n)   (31)

where x(n) is the exact sample and e(n) the quantization error.

Fig. 5a. 4th Order Filter. Feedforward Implementation.

Fig. 5b. 4th Order Filter. Canonic Implementation.

Fig. 5c. 4th Order Filter. Feedforward Implementation (Wide Design).

Fig. 5d. 4th Order Filter. Canonic Form (1D) with Scaling.

Fig. 6. Additive Model for Quantization

It is common to make the following assumptions on

the additive noise process (Gold & Rader, 1969):

1. The sequence of error samples e(n) is a sample

sequence of a stationary random process.

2. The error sequence is uncorrelated with the

sequence of exact samples x(n).

3. The random variables of the error process are

uncorrelated; i.e., the error is a white noise

process.

4. The probability distribution of the error process

is uniform over the range of quantization error.

It is worth pointing out that the above-mentioned

hypotheses are approximations that do fail for special

types of signals, the most important being the class of

constant inputs. But when the sequence x(n) is a complex

signal and the quantization step is small, it gives good

results. Therefore, the model presented in Fig. 6 will be

used for most of the quantization effects in the filter

implementation.

The assumption that the error is signal independent is

a good approximation for rounding, but for truncation it is

clearly not true, since the sign of the error is always

opposite to the sign of the signal (20). However, it is

easily shown (Oppenheim & Shafer, 1975) that this effect

can be incorporated in the mean of the noise process. For

truncation, the mean value m_e and the variance σ_e² are given by

m_e = -2^(-b)/2
                            (32)
σ_e² = 2^(-2b)/12

and for rounding by

m_e = 0
                            (33)
σ_e² = 2^(-2b)/12

The probability density function for the error process is

shown in Fig. 2. The autocovariance sequence of the error

is assumed to be

γ_ee(n) = σ_e² δ(n)   (34)

for both rounding and truncation. The ratio of signal

power to noise power is a useful measure of the relative

strengths of the signal and the noise. For rounding or

truncation the signal to noise ratio is

σ_s²/σ_e² = σ_s²/(2^(-2b)/12) = (12 · 2^(2b)) σ_s²   (35)

or when expressed with a logarithmic scale

SNR = 10 log10(σ_s²/σ_e²) = 6.02b + 10.79 + 10 log10(σ_s²).   (36)

When the signals are properly scaled to avoid clipping,

(36) can be written as (Oppenheim & Shafer, 1975)

SNR = 6b - 1.24 dB ≈ 6b dB   (37)

which shows clearly the interrelationship between dynamic

range and quantization error in any signal processing

algorithm. For each bit added to the wordlength, the sig-

nal to noise ratio improves approximately 6 dB.
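These statistics are easy to verify numerically. The sketch below (illustrative code, not part of the original design) quantizes a busy signal, checks the truncation-error moments of (32), and confirms the roughly 6 dB-per-bit rule of (37):

```python
import numpy as np

# Numerical check of eqs. (32)-(37); signal and wordlengths are illustrative.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200_000)          # a "busy" signal in [-1, 1)

def quantize(x, b, mode="round"):
    """Quantize to step q = 2**-b by rounding or two's complement truncation."""
    q = 2.0 ** -b
    return (np.round if mode == "round" else np.floor)(x / q) * q

b = 8
q = 2.0 ** -b
e = quantize(x, b, "trunc") - x               # truncation error
assert abs(e.mean() + q / 2) < q / 50         # mean = -2**-b / 2, eq. (32)
assert abs(e.var() - q**2 / 12) < q**2 / 50   # variance = 2**-2b / 12

def snr_db(x, b):
    e = quantize(x, b) - x
    return 10 * np.log10(x.var() / e.var())

gain = snr_db(x, 13) - snr_db(x, 12)     # ~6 dB per added bit, eq. (37)
```

The measured gain comes out near 6.02 dB, independent of the signal used, as long as the quantization step is small compared with the signal.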

It was stated earlier that the class of constant

inputs needs a special modelling. One case that deserves

studying is the zero input condition, since it may arise

often in real applications. Theoretically, if x(n) is zero

in (16), y(n) will then converge to zero if the filter is

assumed stable. It has been observed, however, that in the

actual system the output does not always display the theo-

retical behavior. All components of the filter can only

attain a finite number of values since they are quantized.

This makes the real filter a finite state machine. An im-

mediate consequence of this is that if the filter output

does not converge to zero with zero input, it must become

periodic after some finite time. The periodic oscillation

is referred to as limit cycle or zero input limit cycle

(Rabiner, 1972) and is an extreme case of correlation between

noise samples. The assumptions underlying Fig. 6 are then

not applicable, and the best method for investigating zero-

input stability is the second method of Lyapunov (Liu,

1971). The general analysis is very difficult to carry

out, and here only the results will be presented for the

case of two's complement truncation and the canonical

second order structure. Parker and Hess (1971) showed that

limit cycles can occur in the horizon-
tally hatched region of the triangle shown in Fig. 7. The

cross hatched region is the stability region (Claasen, et al.,

1976). Simulations on digital computers have shown that no

limit cycles occur in the remaining region inside the tri-

angle. It is probably more important to have an idea of the

bound of the amplitude of the limit cycle. The absolute

bound on the amplitude of a limit cycle of any frequency is

given by

     |y|max ≤ 1/((1 - D)√(1 - C^2/4D))            (38)

for complex roots (Peled, 1976). This bound is too pessi-

mistic, and so the RMS bound is usually preferred (Sandberg
& Kaiser, 1972). It is given by

                            1/(1 - |C| + D),           D ≤ 0, or D > 0 and |C| ≥ 4D^2/(1+D)
     ((1/p) Σ yi^2)^(1/2) ≤
                            1/((1 - D)√(1 - C^2/4D)),  D > 0 and |C| < 4D^2/(1+D)
                                                  (39)

Fig. 7. Stability Region for Second Order Filter

Fig. 8. Quantization Error Versus Wordlength, Cascade Form
(after Rabiner and Gold)
[Log-scale plot: normalized error variance versus coefficient
wordlength, 10 to 18 bits.]

where p is the period of the oscillation. A few points

about limit cycles are worth noting. Filters implemented

with rounding possess much smaller stability regions than

when implemented with truncation. Filters using floating

point arithmetic may possess large amplitude limit cycles.

In fixed point arithmetic the limit cycle amplitude is

generally small (within the noise floor of the filter).

It is a general rule that when the amplitude is high, the
oscillation is caused by overflow nonlinear-
ities (Claasen et al., 1976). Each implementation of the

same second order structure will display its own stability

region, the one presented being just a guideline.

As the study of zero input limit cycles is very easy

to accomplish with the real filter, further discussion will

be postponed to the implementation phase.
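The mechanism is easy to demonstrate in a few lines. The sketch below uses a first-order recursion with a deliberately coarse quantization step (both choices are illustrative; the text treats the second order case): the ideal filter output decays to zero, while the rounded filter locks into a sustained oscillation.

```python
import math

# Zero-input limit cycle in a rounded recursion: a first-order sketch of the
# mechanism (parameters chosen for visibility, not taken from the text).
Q = 2.0 ** -4                     # coarse quantization step

def rnd(x):
    """Round x to the nearest multiple of Q (ties round up)."""
    return math.floor(x / Q + 0.5) * Q

a = -0.9                          # stable pole: the ideal output decays to 0
y = 0.5                           # nonzero initial state; input is zero
trace = []
for _ in range(50):
    y = rnd(a * y)                # product rounded back to the register grid
    trace.append(y)

# The rounded filter never reaches zero: it locks into a period-2
# oscillation of amplitude 4*Q = 0.25 -- a zero-input limit cycle.
tail = trace[-10:]
```

Since the quantized state can take only finitely many values, the trajectory must either die out or repeat; here it repeats with period two, illustrating the finite-state-machine argument above.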

Coefficient quantization. Usually the coefficients of

a digital filter are obtained by a design procedure which

assumes very large precision. For practical realizations

the coefficients must be quantized to a fixed, small num-

ber of bits. As a consequence, the frequency response of

the actual filter deviates from the ideal frequency

response. Actually this effect is similar to the effect of

tolerances in the analog components.

There are two general approaches to the analysis. The

first treats coefficient quantization errors as intrinsi-

cally statistical quantities, and the effects of the

quantization can be modelled as a stray transfer function

in parallel with the corresponding filter (Rabiner & Gold,

1975). The second approach is to study each individual

filter separately and optimize the pole locations by com-

paring the transfer functions of the filter realized with

high coefficient precision and the one obtained with the

actual values. Rabiner and Gold present experimental

results for the error as a function of the wordlength.

The table for cascade realization is shown in Fig. 8. For

filters realized as a cascade of second order structures,

the normalized error variance is below 10^-2 for wordlengths

of 16 bits. This is considered a very good practical

agreement. The degradation is very steep towards shorter
wordlengths (8 to 10 bit coefficients).

Therefore, 16 bit coefficients are thought necessary for

practical designs. The extrapolation of their results for

any filter realized as a cascade of second order sections

is not adequate, since the grid of possible pole locations

inside the unit circle is not constant. The grid is

defined by the intersections of concentric circles corre-

sponding to the quantization of r^2 and straight lines cor-
responding to the quantization of r cos θ (standard complex

notation is used). The density of pole locations is less

in regions close to the real z axis, as can be shown in the

following way. Consider a filter with pole pair (L,k)

where

     r = √L
                                                  (40)
     θ = ωT = cos^(-1)(k/(2√L)).

Assuming the errors small,

     Δr = (∂r/∂L)ΔL + (∂r/∂k)Δk
                                                  (41)
     Δθ = (∂θ/∂L)ΔL + (∂θ/∂k)Δk

and substituting (40) yields

     Δr = ΔL/(2r)
                                                  (42)
     Δθ = ΔL/(2r^2 tan θ) - Δk/(2r sin θ)

Since Δω = Δθ/T, the error sensitivity is directly propor-
tional to the sampling rate. Furthermore, the second equa-
tion shows that the error in angle θ is greater when θ is small

(poles with small imaginary part). This point shall be

carefully noted since there is a tendency to believe that

higher sampling frequencies merely cause the digital filter

to behave more like the corresponding analog filter. This

is true only if infinite precision in the computations is

assumed. For digital filters, it is therefore convenient

to define the wide-band concept in the z domain, since a

wide-band analog filter can correspond to a narrow-band

digital filter, if the sampling frequency fs utilized is

higher than the Nyquist rate. Also, bandpass filters
with low center frequencies are the
most sensitive to quantization effects and so
should be the ones closely monitored. Practically this can

be accomplished by checking the frequency response of the

filters obtained with the truncated coefficient values to

see if they meet the design specifications.
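The practical import of (42) can be illustrated numerically. In the sketch below (pole radius, angles, and the perturbation size are assumed values, not taken from the text), the same coefficient error is applied to a pole near the real axis and to one well away from it:

```python
import numpy as np

def pole_angle(k, L):
    """Angle of the upper-half-plane pole of z**2 - k*z + L."""
    poles = np.roots([1.0, -k, L])
    return np.angle(poles[np.argmax(poles.imag)])

r = 0.98                    # pole radius, typical of a narrow-band filter
dk = 2.0 ** -9              # half a quantization step for b = 8 bits
shift = {}
for theta in (0.05, 1.0):   # pole near the real axis vs well away from it
    k, L = 2 * r * np.cos(theta), r * r
    shift[theta] = pole_angle(k + dk, L) - theta   # actual angle error

# Linearized prediction of eq. (42) with dL = 0: dtheta = -dk/(2 r sin th)
predicted = -dk / (2 * r * np.sin(1.0))
# The same coefficient error moves the near-axis pole ~20x farther:
ratio = abs(shift[0.05]) / abs(shift[1.0])
```

The prediction of (42) matches the mid-band case closely, while the near-axis pole shows the blow-up in angle sensitivity the text warns about.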

A/D conversion noise. The A/D conversion noise is the

uncertainty derived from the use of finite precision to

represent the analog signals. The assumption of independ-

ent signal and noise allows one to proceed with the noise

computations while ignoring the signal. Let h(nT) be the

system impulse response and H(z) the corresponding transfer

function. Then the output autocorrelation function Ry(k)
for the noise e(nT) is

     Ry(k) = E[y(n-k)y(n)]

           = E[(Σ h(p)e(n-k-p))(Σ h(m)e(n-m))]
                p                 m

           = Σ Σ h(p)h(m) Re(k+p-m).              (43)
             p m

The variance is obtained for k=0; therefore

     σy^2 = Ry(0) = Σ Σ h(p)h(m) Re(m-p).         (44)
                    p m

Using the property of the noise autocovariance function

expressed by (34)

     σy^2 = σe^2 Σ h^2(m)                         (45)
                 m

where σe^2 is given by (33). Using Parseval's relation, (45)
becomes

     σy^2 = σe^2 Σ h^2(m) = (σe^2/2πj) ∮ H(z)H(z^-1) z^-1 dz    (46)
                 m                      c

where c is taken as the unit circle.

This expression is often easier to compute than the

infinite summation and will be extensively used.
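As an illustration of (45) and (46), the noise gain Σ h^2(m) of a simple resonator (coefficients chosen for the example, not taken from the text) can be computed both from the impulse response and from the average of |H|^2 on the unit circle; the two agree to machine precision:

```python
import numpy as np

# Noise gain sigma_y^2 / sigma_e^2 for an illustrative second-order resonator
# y(n) = x(n) + C*y(n-1) - D*y(n-2)  (coefficients assumed for this sketch).
C, D = 1.6, 0.81
N = 4096

h = np.zeros(N)                       # impulse response by direct recursion
for n in range(N):
    xn = 1.0 if n == 0 else 0.0
    h[n] = xn + (C * h[n-1] if n >= 1 else 0.0) - (D * h[n-2] if n >= 2 else 0.0)

time_domain = np.sum(h**2)            # eq. (45)

# Contour form of eq. (46): average |H|^2 over the unit circle
w = np.exp(2j * np.pi * np.arange(N) / N)
H = 1.0 / (1 - C / w + D / w**2)
freq_domain = np.mean(np.abs(H) ** 2)
```

With pole radius 0.9, truncating the impulse response at 4096 samples introduces a negligible error, so the two evaluations of the same quantity coincide.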

The transfer function for the signal and A/D conver-

sion noise are the same. Since bandpass filters will be

implemented, and the frequency spectrum of the input noise

is white, the bandwidth of the output noise is narrower than
that of the input, yielding effectively σout^2 < σin^2.

A/D conversion noise is not the only source of noise intro-

duced in the computations. As the goal is to determine the

signal to noise ratio at the output, the total noise at the

output must be evaluated including the contribution of the

A/D noise.

The result of (46) assumes that the system transfer

function is realized without any error. In practice this

is not the case, but assuming the A/D conversion noise

uncorrelated with the other noise sources, the total output

noise can be evaluated by simple addition.

Roundoff noise in the multiplications. It is general

practice to quantize the output of the multipliers to keep

the same wordlength throughout the filter calculations.

Essentially the multiplication roundoff noise is the same

process as the A/D noise. However, the placement of the

noise sources depends upon the structure used and how the

arithmetic is actually performed. This means that, in

general, the transfer function for the multiplication

roundoff noise does not coincide with the system's transfer
function. Figs. 5a and 5c show the placement of the noise

sources (el, e2, e3, e4) for the feedforward and canonical

structure. If the additions were performed in double preci-
sion, there would be only one noise source at the output of
the adders, but the computation time would be

increased. Truncation of the result is accomplished by

just dropping the 16 least significant bits of the product.

To compute equation (46) the transfer function from

the noise sources to the output must be determined, along

with the weight σin^2 of the noise sources. When the compu-
tation wordlength is the same as the input number of bits,

there is no question what the weight shall be. However, in

this case the computation wordlength is bigger than the

input number of bits; hence an analysis was undertaken to

see which is the limiting factor. Actually this was a very

misleading point in the noise analysis, and so two explana-

tions based on different arguments will be presented. From

the point of view of the fixed point arithmetic, before an

addition is performed, the addends must be aligned. If the

filter computations are correct, the part of the product

retained must be aligned with the input; otherwise, the

additions which involve the input will not yield the cor-

rect result. Hence, the least significant bit of the part

of the product retained has the same weight as the input,

even when the computation wordlength is bigger than the

input number of bits. The alternate argument follows the

"binary point" across the various shifting operations to

yield the correct alignment. This will be explained in

Appendix II.
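The alignment argument can be made concrete with a small fixed-point sketch (the number format here is assumed for illustration and is not the text's exact register layout): the input sample is an integer in input-LSB units, a coefficient c is stored as round(c * 2**16), and the 32-bit product therefore carries 16 extra fractional bits. Dropping the 16 least significant bits restores the input's LSB weight.

```python
# Fixed-point alignment sketch (assumed format, for illustration only).
def fixmul(x_int, c, frac_bits=16):
    c_int = round(c * (1 << frac_bits))   # coefficient quantized to 16 bits
    return (x_int * c_int) >> frac_bits   # drop the 16 least significant bits

x = 1000                                  # input sample, in input-LSB units
y = fixmul(x, 0.75)                       # -> 750, same LSB weight as x
acc = x + y                               # aligned addition is now valid
```

Python's right shift floors toward negative infinity, which is exactly the two's complement truncation the text describes, so the same sketch behaves correctly for negative samples.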

To evaluate the total noise at the filter output, the

roundoff noise power produced by each source will be added

to the input quantization noise, since it is assumed that

the noise sources are uncorrelated. There are three quan-

tities that are important noise parameters: the peak in-

band noise referred to 1 volt, the A/D converter precision,

and a dimensionless number σout^2/σin^2 which is a measure of
the noise gain of the system and may be translated into the
number of bits of the output that are "noisy" (one bit cor-
responds to a power ratio σout^2/σin^2 = 3.98). From this ratio

and the square of the filter midband gain G, the noise fac-
tor K

     K = (σout^2/σin^2)/G^2                       (47)

can be defined and enables direct comparisons between

structures and designs, since it is independent of the in-

put noise variance. The value K = 1 sets the boundary for
the case where the noise created by the digitization and

the calculations will degrade, in a r.m.s. sense, the in-

put (analog) signal to noise ratio. The output signal to

noise ratio (peak signal power to r.m.s. noise) can be

obtained from (47) through a multiplication by A^2/σin^2 where
A is the input signal amplitude. This definition of S/N

ratio seems appropriate for EEG signal processing since the

various signals of interest are narrow-band and fall in the

filter passband.
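A minimal sketch of (47), again for an assumed resonator and considering only the A/D noise contribution (so the noise transfer function coincides with the system transfer function):

```python
import numpy as np

# Noise factor of eq. (47) for an illustrative resonator
# H(z) = 1/(1 - C z^-1 + D z^-2); coefficients are assumed, not from the text.
C, D, N = 1.6, 0.81, 8192
w = np.exp(2j * np.pi * np.arange(N) / N)
H2 = np.abs(1.0 / (1 - C / w + D / w**2)) ** 2

noise_gain = H2.mean()            # sigma_out^2 / sigma_in^2, via eq. (46)
G2 = H2.max()                     # squared midband gain G^2
K = noise_gain / G2               # eq. (47); here K < 1 since |H|^2 <= G^2
```

In a real implementation the multiplier roundoff sources add their own, differently shaped, transfer functions to σout^2, which is how K can exceed 1 in practice.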

The filters being of order four, an exhaustive search

to find the best cascading of sections and best pole zero

pairing was possible. The Dinap program (Bass et al.,

1978) was extensively applied to assess the noise (and mag-

nification) properties of different structures. Fig. 9

shows a typical output. In (Balakrishnan, 1979) a detailed

explanation of the program and its use is presented.

Practical Considerations

Processing Speed

In the internal magnification section it was said that

the filters were going to be implemented with a computation

wordlength bigger than the input number of bits. This is

not the general way of filter implementation, which scales

Fig. 9a. Filter Magnification Analysis Using DINAP

Fig. 9b. Filter Noise Analysis Using DINAP

Fig. 9c. Filter Frequency Response (Slow Wave)

[DINAP printouts for the Chebyshev 4th order impulse invariant
bandpass (slow wave) filter: maximum magnification, frequency
analysis, and output noise versus frequency.]

the signals within the filter structure. It has been shown

(Jackson, 1970) that scaling uses the com-
putation wordlength more efficiently. The values for the scaling factors

G1, G2, and G3 are obtained from the internal magnification

analysis. In the scaling design the zeros can no longer be

realized as simple adders (subtracters). To use the full

dynamic range of the processor the zeros shall be realized

as multipliers, increasing the required number from 4

(Fig. 5c) to 9 (Fig. 5d). In a microcomputer implementa-
tion this increase drastically slows down the computation,

because the multiplication is by far the most time consuming

operation. To save computation time, the multiplications

can be approximated by shift operations (Oppenheim &
Schafer, 1975), but this defeats the purpose of the design

procedure. Even in this case, the network complexity in-

creases appreciably (5 more operations per cycle). It can

be concluded that scaling makes better use of the dynamic range at
the expense of longer computation times. In (Principe et al.,

1979) the results were presented for the class of EEG fil-

ters which show that with the nonscaling design only one

extra bit is needed to obtain the same output signal to

noise ratio. Therefore, for the microcomputer implementa-

tion the increase in computation time was thought more

demanding than the nonoptimum use of dynamic range.

The nonscaling design coupled with a careful filter

design enables the implementation of the class of EEG

filters with only 4 multiplications per cycle, allowing

sampling frequencies up to 3 kHz.
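A sketch of such a nonscaling second-order section (coefficients are illustrative, and the zeros-as-adders form assumes bandpass zeros at z = ±1, which is not stated explicitly in the text):

```python
# Nonscaling canonical second-order section: only the two denominator
# coefficients are multiplied; the numerator (1 - z^-2) costs additions only.
def make_section(C, D):
    w1 = w2 = 0.0                         # state: w(n-1), w(n-2)
    def step(x):
        nonlocal w1, w2
        w = x + C * w1 - D * w2           # the only 2 multiplications
        y = w - w2                        # zeros realized as an addition
        w2, w1 = w1, w
        return y
    return step

# A fourth order filter cascades two sections: 4 multiplies per sample.
sec1, sec2 = make_section(1.6, 0.81), make_section(1.5, 0.9)
y = [sec2(sec1(x)) for x in [1.0] + [0.0] * 63]   # cascade impulse response
```

Counting operations makes the speed argument concrete: two multiplies per section and per sample, against nine for the fully scaled form of Fig. 5d.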

Signal to Noise Ratio Specification

Now that the output noise can be calculated and the

filter magnification is known, the important step of choos-

ing the hardware to obtain a given output signal to noise

ratio can be addressed. From the specification involving

the output signal to noise ratio and the degradation pro-

duced in the filtering, the input signal to noise ratio can

be calculated, which sets the input wordlength. If the

analog dynamic range is smaller, an A/D converter that

spans the analog dynamic range shall be chosen and a shift

left operation up to the requirement of the input word-

length shall be performed. From the internal magnification

analysis the increase in wordlength can be determined and

added to the input wordlength to set the computation number

of bits.

In the present form a 12 bit A/D converter and a 16

bit microcomputer are being utilized, allowing for the

implementation of filters with gain of 16 or less. There

may be cases where a longer computation register length may

be desirable, but on the other hand a 12 bit A/D converter

may be excessive for EEG processing, especially when the data

are prerecorded on analog tape. The dynamic range of a

Sangamo 3500 is 46 dB at 60"/sec. (manufacturer specifica-

tions), which suggests that an 8 bit A/D is sufficient.

However, the usable signal level really depends on the

detection scheme used. For example, a sine wave of 30 Hz,

-46 dB below maximum signal, was recognized with a zero

crossing detector at the tape slowest speed (15/16"). It

also depends on which digital parameters the algorithm

uses. In speech processing, figures of 20 dB between the

smallest signal that is going to be processed and the A/D

noise are frequently used (Gold & Rader, 1969).

The deterioration of the input signal to noise ratio

due to the filtering must also be taken into consideration.

It was shown (Principe et al., 1979) that for the case of

sleep EEG filters, implemented with the nonscaling design,

the noise factor of the implementation ranged from 0.45 to

2.14, i.e., yielding in the worst case a degradation of

1 bit. For all these reasons an 8 bit A/D converter is not

recommended to represent the EEG data. A 10 bit A/D

(dynamic range of 54 dB) is probably a good choice for EEG

applications. If the input of 12 bits obtained with the

present A/D converter is scaled to 10 bits, filters with

internal gains up to 64 could be directly implemented in a

16 bit microcomputer.
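The wordlength budget described above reduces to simple arithmetic: the internal magnification must fit in the register bits left over above the input wordlength.

```python
# Headroom budget: register bits minus input bits sets the largest
# internal gain that can be accommodated without overflow.
def max_internal_gain(register_bits, input_bits):
    return 2 ** (register_bits - input_bits)

print(max_internal_gain(16, 12))   # 16: the present 12-bit A/D setup
print(max_internal_gain(16, 10))   # 64: input scaled down to 10 bits
```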

Structures which will introduce the lowest noise and

at the same time present the smallest magnification (i.e.,

for which the input-output gain is the limiting factor) are

desired. These specifications are in fact contradictory,

and the solution is generally a trade-off between increase

in wordlength and large signal to noise ratio. There is no

present theory to arrive at the general rule to pair the

poles and zeros and to cascade the second order sections to

obtain the "best" solution. The following rule is sug-

gested to set the pairing of the poles and zeros and the

order of the cascade. From the general analysis (every

possible combination of poles and zeros and ordering) the

first thing is to check if the greatest magnification is

implementable in the choice of the microcomputer and A/D

converter. If it is, then just choose the ordering that

displays the lowest σout^2. If the maximum increase in
wordlength required is too big, then choose the pairing
which is accommodated by the combination and then pick
the ordering which has the lowest noise. As the general

analysis may not be available, it is thought important to

extend the discussion (although heuristically) of the pair-

ing of the poles and zeros and section ordering. For the

case of fourth order filters the rules are simple: If the

higher Q pole is realized first, then the output signal to

noise ratio can be expected to be higher. To realize the highest

possible signal to noise ratio, the higher Q pole shall be

paired with the zero further away from its resonant fre-

quency. However, this methodology can lead to very large

magnifications (or the possibility of overflow). To obtain

a compromise between the magnification and the signal to

noise ratio, the higher Q pole shall be paired with the

zero closer to its resonant frequency and realized first.

There is no advantage in realizing the higher Q pole last,