
Title: Analytical models for diagnostic classification and treatment planning for craniofacial pain.
Permanent Link: http://ufdc.ufl.edu/UF00089748/00001
 Material Information
Physical Description: Book
Language: English
Creator: Leonard, Michael Steven
Publisher: Michael Steven Leonard
Publication Date: 1973
 Record Information
Bibliographic ID: UF00089748
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: alephbibnum - 000580622
oclc - 14057039




Michael Steven Leonard



To my wife,



Without the considerable contributions of time and effort by the members of his committee, it would have been impossible for the author to have completed this dissertation. In particular, the author expresses gratitude to his Chairman, Dr. Kerry Kilpatrick, for his encouragement and direction during the course of this research effort. The author also thanks Dr. Kilpatrick for his editorial assistance during the development and organization of this manuscript. The author thanks Dr. Richard Mackenzie and Dr. Stephen Roberts for providing the initial direction for this research. Additionally, the author is grateful to Dr. Thom Hodgson and Dr. Donald Ratliff for their assistance in evaluating and refining the author's ideas throughout this project. The author expresses his gratitude to Dr. Thomas Fast and Dr. Parker Mahan for the contribution of their extensive knowledge about craniofacial pain to the author's research. The author is deeply appreciative of Dr. Fast's and Dr. Mahan's willingness to spend many hours examining dental records and their endurance of the nomenclature and idiosyncrasies of this mathematical-modeling effort.

Financial support for this research was provided by the Health Systems Research Division, J. Hillis Miller Health Center. The division's support, in conjunction with a traineeship granted by the National Science Foundation, made it possible for the author to undertake this research. The author is also grateful to the Industrial and Systems Engineering Department for the contribution of computer funds. Additionally, the author thanks Dr. William Solberg, University of California at Los Angeles; Dr. Daniel Laskin, University of Illinois; and Dr. David Mitchell, University of Indiana, for providing access to the patient records employed in this modeling effort.

The author would like to express his thanks to the secretarial staff of the Health Systems Research Division for their translation of the author's 'first-order' approximation to handwriting into a draft of this manuscript. Their tolerance of a multitude of last-minute changes made by the author has been appreciated.

Finally, the author thanks his wife, Mary, and his parents, Dorothy and Charles Leonard, for their encouragement and support throughout the course of this research.


August, 1973


ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

1. Introduction
   1.1 Craniofacial Pain
   1.2 Research Objective
   1.3 Dissertation Overview

2. Previous Research
   2.1 Bayesian Classification Models
   2.2 Non-Parametric Classification Models
   2.3 Finite-Horizon Treatment Planning
   2.4 Uncertain-Duration Treatment Planning

3. Diagnostic Classification
   3.1 Model Components
   3.2 Alternative Interpretations of Linear Separability
   3.3 Model Validation
   3.4 Minimum-Cost Symptom-Selection Algorithm
       3.4.1 Algorithm Development
       3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm
       3.4.3 Computational Considerations
   3.5 Model Applications

4. Treatment Planning
   4.1 Model Components
       4.1.1 Patient States
       4.1.2 Transition Probabilities
       4.1.3 Cost Structure
   4.2 Selection of Optimal Treatments
   4.3 Model Validation
   4.4 Model Applications

5. Conclusions and Future Research

A Craniofacial-Pain Patient Data Vector
B Modified Fixed-Increment Training Algorithm
C Application of the Minimum-Cost Symptom-Selection Algorithm
D Treatment Alternatives for Craniofacial-Pain Patients
E Stability of Transition-Probability Estimates
F Flow Charts of Patient-State Transitions
G Patient-State Treatment Selections
H Application of the Patient-State-Labeling and Optimal-Treatment-Selection Procedure

BIOGRAPHICAL SKETCH

























LIST OF TABLES

1. Survey of Diagnostic-Classification Models
2. Correlation Between Significant Symptoms and Discriminant-Function Weights
3. Tests of Diagnostic Classifier Accuracy
4. Classification Variability Among Dental Practitioners
5. Mean Transit Times Through the Craniofacial-Pain Care System




LIST OF FIGURES

1. Temporomandibular Joint
2. Diagnostic-Classification and Treatment-Planning Process for Craniofacial Pain
3. Craniofacial-Pain Diagnostic Alternatives
4. Procedure 2
5. Diagnostic-Classification Transitions
6. Patient-Visit Inconvenience Cost
7. Application of the Modified Fixed-Increment Algorithm
8. Multiple-State History-Augmented Process


Abstract of Dissertation Presented to the
Graduate Council of the University of Florida in Partial
Fulfillment of the Requirements for the Degree of Doctor of Philosophy



Michael Steven Leonard

December, 1973

Chairman: Dr. Kerry E. Kilpatrick
Major Department: Industrial and Systems Engineering

This dissertation presents a systematic approach to craniofacial-pain diagnosis and treatment planning using analytic models of the underlying decision-making processes. Patient diagnoses are generated by a linear pattern-recognition classifier trained with a sample of preclassified craniofacial-pain patient data. For this classifier, an algorithm is developed that minimizes the total cost of the set of features employed in the classifying process. Diagnostic classifications, augmented by a history of prior treatment applications, provide the state descriptions for a Markovian decision model of the treatment-planning process. Craniofacial-pain patient records from four university dental clinics serve as a data base for model construction and validation.

The analytic models provide a means of duplicating the diagnostic classifications and treatment plans of experts. Approximately 90% of the diagnostic classifier's classifications and 93% of the treatment-planning model's treatment selections concurred with the decisions made by experts in the field of care for craniofacial-pain patients. Moreover, the models permit an examination of the critical considerations associated with both decision-making processes. These capabilities are discussed in terms of applications of the models in teaching, research, and in the practice of dentistry.



The rapid pace of developments in medical and dental research prevents the practicing physician and dentist from fully utilizing each new diagnostic and treatment-planning aid as it is published. In each of the last four years an average of 215,000 new publications have been written to supplement the knowledge of the health-care practitioner [1]. Concurrently, the pressures of an ever-increasing patient load force practitioners to select the most expeditious means for diagnosing disorders and selecting treatments. For example, the medical general practitioner (1970) saw an average of 173 patients a week [2], and the median dental practitioner (1971) saw two patients an hour [3]. Given these circumstances, practitioners may overlook possible diagnostic and treatment alternatives or they may apply inappropriate treatments. If meaningful analytic descriptions of the diagnostic and treatment-planning processes can be developed, these models can assist educators in training new practitioners, researchers in evaluating and disseminating new developments, and practitioners in improving the quality of patient care [4].

Developing models of the diagnostic-classification and treatment-planning process requires an understanding of the underlying physiological processes of diseases and the mechanisms of their cures. Obviously, the effects of disease and the means of cure vary from one health-care problem to another. Thus, modeling efforts in diagnosis and treatment planning must be integrally related to the facet of health care that is under study. This reality prohibits the model builder from making broad statements about the applicability of his models to other health-care environments. Accordingly, the models developed in this dissertation are specifically oriented toward the health-care problem presented in Section 1.1, with the understanding that the results of this modeling effort may not be applicable to the whole of health-care diagnosis and treatment planning.


1.1 Craniofacial Pain

    The head and face are subject to chronic, persistent,
    or recurrent pain more often than any other portion of the
    body. Pain in the head or face has a greater significance
    to patients than any other pain. It may arouse fears that
    the patient is in danger of losing his mind or that he has
    a tumor of the brain. In addition, the emotional state of
    the patient is adversely influenced because it is generally
    known by the layman that the profession's knowledge of the
    causes of these pains is meager and that methods of treat-
    ment are inadequate [5, p. v].

    H. Houston Merritt, M.D., Dean
    Columbia University College of
    Physicians and Surgeons

One source of the pain Dr. Merritt describes is dysfunction of the temporomandibular joint. The temporomandibular joint, see Figure 1, provides the articulation between the mandible and the cranium. This joint is unique both in its structure and its function. Within the plane of the temporomandibular joint, lateral, vertical, and pivoting motion is permitted. In addition, the joint is the point of articulation for the only articulated complex that contains teeth. With this joint, "motion is directed more by the musculature and less by the shape of the articulating bones and ligaments than is the case for other joints" [5, p. 34].

Figure 1. Right temporomandibular articulation. Inset: anatomical features of the temporomandibular joint (mandibular fossa, articular eminence, meniscus, mandibular condyle).

The fact that joint motion is highly dependent on musculature implies that when mandibular dysfunction occurs there is some disturbance



of the intricate neuromuscular mechanisms controlling mandibular movement [5]. Emotional tension may also lead to hypertonicity of the striated masticatory muscles, resulting in facial pain or altered sensation without evidence of peripheral dysfunction. In addition, abnormal occlusal contacts of the teeth may affect muscle tonicity, resulting in mandibular dysfunction [5]. Moreover, the temporomandibular joint is prone to disorders common to all joints: rheumatoid arthritis, osteoarthritis, traumatic injuries, neoplasms, and nonarticular disorders. Although the term 'craniofacial pain' is a broad classification for pain in the head and face, the term is used in this dissertation to describe pathological, congenital, hereditary-based, or emotional causes of pain in and around the temporomandibular joint.

Though the degree of severity may vary, one or more of the following four 'cardinal symptoms' are exhibited by the craniofacial-pain patient: pain, joint sounds, limitation of motion, and tenderness in the masticatory muscles [6]. Accompanying these symptoms, the patient may complain of, or the practitioner may find, hearing loss, burning sensations, migraine-like headaches, vertigo, tinnitus, subluxation, luxation, dental pulpitis, sinus disease, glandular disorders, occlusal disharmony, and radiographic evidence of joint abnormality. The degree of association of these additional symptoms and findings with the etiology of the joint disorders is subject to considerable variation.

Paralleling these areas of anatomic dysfunction is the possibility that the craniofacial-pain patient may be suffering from psychic disorders. In no other type of patient seen by the dentist does psychic condition play a larger role [7]. Most craniofacial-pain patients have symptoms or signs of anxiety, and a sensory preoccupation with the occlusion of their teeth [8]. Many of these patients can be characterized by a heavy reliance on denial, repression, and projection of their psychic disorders in order to maintain their self-concept of emotional stability [6]. Often the complaints these patients relate to the practitioner are not compatible with any objective signs.

The practitioner who manages the care of craniofacial-pain patients assumes a difficult task. For some of these patients, diagnosis is obvious. Generally, however, the craniofacial-pain patient presents a complex combination of signs and symptoms [7]. More than one disease entity normally accounts for the patient's symptoms, and most craniofacial-pain patients suffer from a pain-dysfunction complex involving a combination of masticatory muscle disorders, occlusal disharmony, emotional tension, and anxiety [5]. Nevertheless, the possibility of multiple, almost sub-clinical, etiologic factors combining to produce the dysfunction and pain must be considered. The close relationship of organic and emotional disorders as they appear in craniofacial-pain patients presents the examining dentist with the problem of discriminating which factor is primary in the etiology of the patient's dysfunction [7]. Unfortunately, the temporomandibular joint is one of the most difficult areas of the body to examine radiographically [8]. Hence, with these patients, the dentist relies to a large degree on tests of emotional stability and physical examination by visualization, palpation, and auscultation [7].

Therapeutic measures for the care of craniofacial-pain patients are as varied as the factors contributing to the disorder. "A small percentage of patients with symptoms referrable to the temporomandibular joint will portray such a confusing picture that consultation with other dental or medical specialists is indicated" [7, p. 129]. The majority of these patients will exhibit symptoms that lead to any one of several alternative courses of patient care. Altering the occlusion of the natural teeth is one means of treating craniofacial-pain patients. Although in many cases minor occlusal abnormalities are only contributing factors to a patient's pain, attention by the dentist to occlusion is at least partially successful for a majority of craniofacial-pain patients [8]. However, it is important in early therapy not to alter the occlusion irreversibly. Treatment by means of tooth extraction or endodontics, jaw fixation, prosthetic devices, or by topical treatments may also be suggested by the patient's symptoms. The articular surface of the mandibular condyle has an excellent reparative capacity [6]. Thus, the use of sedatives, antibiotics, and muscle relaxants, along with physical therapy, often leads to patient 'cures' as these treatments ease the patient's pain and increase jaw mobility while natural restoration of the joint is in progress. If, after a reasonable length of time (3 to 6 months), the patient's symptoms are not relieved, the dentist may consider referral to another source of care or therapy such as surgery [7].

Typically, the health-care process for craniofacial-pain patients may be viewed as following the format of Figure 2 [9]. When a patient is admitted into the care system, he undergoes a data-collection process. This involves taking a 'full and pertinent' patient history and a physical examination of the areas of discomfort. The data gathered consist of symptoms, signs, medical and/or dental history, physical examination findings, psychosocial information, and so forth. Once these elements have been elicited, a diagnosis is attempted. If this is not yet possible, the severe symptoms are treated and the patient's health state is monitored.


When initial treatment does not result in a 'cure' for the craniofacial-pain patient, treatment effects are evaluated and new data collected. When a patient's diagnostic classification leads to a course of treatment that is not within the realm of the practitioner's specialty, he is referred to a more appropriate care source. Monitoring is continued on those patients not rejected from the system at this point, and the patient is discharged when he is symptom-free. However, when other disorders have been isolated during the course of treatment, the patient is recycled through the classification-treatment process.

The diagnosis-treatment sequence is not fixed. Treatment can begin prior to a diagnostic classification or treatment can follow a diagnosis. Moreover, there may be many diagnostic-treatment data-acquisition cycles before the patient is considered 'well.'

1.2 Research Objective

The introductory discussion of the need for diagnostic and treatment-planning models, and the brief description of the craniofacial-pain care system, provide the setting for a statement of the research objective underlying this dissertation. This objective is to derive analytic representations of the decision processes involved in selecting diagnostic classifications and planning treatments for craniofacial-pain patients. A diagnostic-classification model that duplicates the classifications of expert practitioners is sought. For treatment planning, the modeling goal is to provide a structure for interaction of the critical considerations associated with the treatment-selection process. These analytic representations will be structured to permit their application as teaching devices in the training of dental practitioners, as methods of testing the effects of new diagnostic tools and treatment applications, and as aids to the practice of dentistry.

This research objective will be met by developing:

1. A diagnostic-classification model based on the theory of non-parametric pattern classification, with
   a. criteria for applicability of the modeling technique to diagnostic classification
   b. model validation for craniofacial-pain patients
   c. development of a minimum-cost symptom-selection algorithm

2. A Markovian representation of the treatment-selection process, with
   a. justification for utilizing a Markovian model of the underlying care system
   b. model validation for craniofacial-pain patients

3. A description of potential model applications in teaching, research, and practice.

1.3 Dissertation Overview

In Chapter 1 the motivation and scope of this dissertation were presented. Chapter 2 provides a review of literature relevant to the diagnostic and treatment-selection processes. A model of the diagnostic-classification process is developed in Chapter 3. Chapter 4 follows with an analytic representation of the treatment-planning process. Conclusions derived from this model-building effort, and suggestions for future research, are presented in Chapter 5.




Over three hundred publications have been addressed to the problem of modeling the diagnostic and treatment-planning process. Spanning fourteen years, this research has considered such diverse problems as the classification of liver biopsies [10] and the optimal plan for treating mid-shaft fractures of the femur [11]. At least ninety-one disorders have been utilized as environments for developing diagnostic and treatment-planning models. The magnitude of this research effort emphasizes the need for analytic representations of these complex decision-making processes.

Fortunately, the significant contributions in this voluminous literature can be neatly partitioned into four distinct categories. Research in diagnostic classification has been based either on the application of Bayesian statistics or on the use of non-parametric pattern classifiers. Treatment planning has been presented either as a finite-horizon decision problem or as an application of decision analysis to a Markov process of uncertain duration. This section presents a brief discussion of each of these categories and evaluates their suitability as analytic representations of the process of providing health care for craniofacial-pain patients.

2.1 Bayesian Classification Models

Bayesian diagnostic-classification models, such as [12, 13, 14, 15, 16], make a diagnosis on the basis of selecting a patient's 'most probable' disease state. The Bayesian classifier is an elementary type of parametric pattern-classification model. In general, parametric classifiers make use of one or more of the statistical characteristics of the dispersion of the data being classified to establish rules for data classification. With the Bayesian models, only the conditional probabilities for exhibiting sets of symptoms, given a particular disease, are tabulated from past medical data. Then, utilizing Bayes' theorem, the probabilities for the presence of alternate diseases d1, d2, ..., dn can be calculated as a function of the symptom-complex S the practitioner observes in the patient. Bayes' theorem provides that for each of the di,

    P(di|S) = C(S) P(S|di) P(di),

where

    C(S) = 1 / [Σk P(S|dk) P(dk)];

hence, a patient with symptom-complex S is classified in disease-group i if

    P(di|S) = max_j P(dj|S).
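The classification rule above can be sketched in a few lines. This is an illustrative example only, not code from the dissertation; the disease labels, priors P(d), and conditionals P(S|d) are invented numbers.

```python
# Illustrative sketch of Bayesian diagnostic classification: pick the
# disease d maximizing P(d | S) for the observed symptom-complex S.
def bayes_classify(priors, conditionals):
    """priors:       {disease: P(d)}
    conditionals: {disease: P(S | d)} for the one observed complex S.

    The normalizing constant C(S) is common to every disease, so it can
    be dropped when only the arg-max is required.
    """
    posterior = {d: conditionals[d] * priors[d] for d in priors}
    return max(posterior, key=posterior.get)

# Hypothetical two-disease example: P(d1)P(S|d1) = 0.14 < P(d2)P(S|d2) = 0.18
print(bayes_classify({"d1": 0.7, "d2": 0.3}, {"d1": 0.2, "d2": 0.6}))  # d2
```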

A survey of the results of application of Bayesian models is given in

Table 1.

Although the percentage of correct diagnoses in most of these test applications is high, there are several reasons why a Bayesian diagnostic model is not used as the means of generating diagnostic classifications in this dissertation. The first reason is the difficulty of acquiring the proportional presence of alternate diseases, P(di), i = 1, 2, ..., n, in the population of patients that are to be classified by the model. These 'prior' probabilities of having a particular disease are a function

Table 1. Survey of Diagnostic-Classification Models

Bayesian Classifiers

    Disease Group        Number of Patients    % Correct
    Nontoxic Goiter              88
    Bone Tumor                   77
    Thyroid                     268
    Congenital Heart            202
    Gastric Ulcer                14

Non-Parametric Classifiers

    Disease Group        Number of Patients    % Correct

of seasonal variation, geographic location, population demography, and many other factors. Secondly, valid Bayesian analysis requires the analyst to determine the dependence among exhibited symptoms for each disease considered by the diagnostic model. In this respect, the probabilities for the presence of groups of symptoms are independent for some diagnostic alternatives and strongly correlated for others [4]. The third reason for not selecting a Bayesian model is the massive storage requirement dictated by the necessity of keeping the set of conditional probabilities. These conditionals, P(S|di) for every observable symptom-complex S and every disease i considered, must be at hand each time the model is used. For example, given ten alternate diseases and ten symptoms for which no assumptions of between-symptom independence can be made, storage is required for 10(2^10 - 1), or 10,230, conditional probabilities.
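The storage count above follows directly: with n diseases and m binary symptoms whose interdependence cannot be ignored, one conditional probability is needed per disease for each of the 2^m - 1 non-empty symptom complexes. A quick sketch (illustrative, not from the dissertation):

```python
# Storage needed for the conditionals P(S | d): n diseases times one value
# for every non-empty complex of m binary symptoms, i.e. n * (2**m - 1).
def conditional_storage(n_diseases, n_symptoms):
    return n_diseases * (2 ** n_symptoms - 1)

print(conditional_storage(10, 10))  # 10 * (2**10 - 1) = 10230
```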

2.2 Non-Parametric Classification Models

Non-parametric diagnostic models, like [17, 18, 19, 20], utilize non-parametric pattern classifiers, a form of pattern-recognition modeling. In the literature on pattern recognition, the term 'non-parametric' implies that no form of probability distribution is assumed for the dispersion of symptom data in establishing the rules for pattern classification. These models do assume, however, that classes of symptom data are distinct entities and, hence, that a patient with a particular set of symptoms S cannot simultaneously occupy more than one diagnostic state. That is, the models assume a deterministic classification for each pattern viewed by the pattern classifier, where every observable pattern has one, and only one, correct classification.

Non-parametric modeling permits the analyst to bypass the difficult problems of explicitly determining the conditional probabilities for, and the dependence among, symptoms that are required for Bayesian analysis. With the non-parametric classifier, a diagnosis is generated for the practitioner by evaluating a discriminant function associated with each diagnostic classification, gi(.), i = 1, 2, ..., n. As was the case with the Bayesian models, the values of these discriminants are a function of the symptom-complex S exhibited by the patient. The patient's diagnostic classification corresponds to that disease whose associated discriminant-function value is maximum. That is, a patient with symptoms S is classified in disease-group i if

    gi(S) > gk(S) for all k ≠ i.
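The discriminant rule above, for the linear case, can be sketched as follows. The weight vectors and the symptom vector are invented for illustration; the dissertation's classifier learns its weights from preclassified patient data rather than taking them as given.

```python
# Illustrative sketch: assign the diagnosis i whose linear discriminant
# g_i(S) = w_i . S is largest over all diagnostic alternatives.
def classify(symptoms, weights):
    """Return the index i maximizing g_i(S) = sum_j w[i][j] * S[j]."""
    scores = [sum(w * s for w, s in zip(w_i, symptoms)) for w_i in weights]
    return max(range(len(scores)), key=scores.__getitem__)

symptoms = [1, 0, 1, 1]                # hypothetical binary symptom vector S
weights = [[0.5, 1.0, -0.2, 0.1],      # weight vector for g_1
           [1.2, -0.3, 0.4, 0.6]]      # weight vector for g_2
print(classify(symptoms, weights))     # g_1(S) = 0.4, g_2(S) = 2.2: index 1
```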

Results from some of the applications of pattern-recognition classifiers are presented in Table 1. In these test applications diagnostic accuracy was consistently high. Because of these models' ease of implementation and small storage requirements, a non-parametric pattern classifier is preferable as a vehicle for generating diagnostic classifications. The use of a non-parametric classifier is further motivated by features of the care process for craniofacial-pain patients discussed in Chapter 3.

2.3 Finite-Horizon Treatment Planning

In the realm of research on modeling the treatment-planning process, several authors [9, 21, 22] have presented schemes for analysis that utilize methods for making decisions under risk and uncertainty. The treatment-selection process has alternately been defined as a two-person zero-sum game, structured as a decision tree, and modeled as a Markov process of limited duration. Treatment costs, and the 'costs' of occupying 'non-well' or terminal patient states, provide the basis for selecting an 'optimal' treatment plan. Finiteness of the planning horizon is assured either by establishing a maximum permissible number of treatment applications, or by considering at any stage of analysis the effects of a fixed number of future treatments. Validation of the decisions generated by these models has thus far been limited to checks on the feasibility of the treatment regimens selected. Unfortunately, the finite-horizon models either do not consider the possibility of a patient's prolonged stay in the health-care system, as is the case of the models with a maximum number of possible treatments, or, where only a fixed number of future treatments is considered, they provide no more than a heuristic treatment-selection procedure.

2.4 Uncertain-Duration Treatment Planning

Bunch and Andrew [11] have considered the possibility of prolonged

occupation of the same diagnostic state during the course of a patient's

progression through the care system. In their Markovian representation

of the care system for mid-shaft fractures of the femur, they provide

this modeling refinement. As a consequence of this modification, the

number of treatment decisions made for each patient is a random variable

with no fixed upper bound. Howard's iterative scheme for policy selec-

tion [25] provides the means for choosing the optimal treatment regimen

by selecting treatment alternatives that maximize the relative 'value'

of occupying each disease state. Although the Bunch and Andrew model did

not consider return visits to the same disease state, a more generalized

Markovian representation could incorporate that possibility. Neverthe-

less, the proximity to reality that this category of transient Markovian

models provides requires considerable effort as holding-time distribu-

tions, treatment 'costs,' and transition probabilities must be supplied

by the analyst for all treatment alternatives at each of the disease

states in the care system.
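Howard's policy-iteration scheme can be sketched in modern terms as follows. This Python sketch chooses, in each transient state, the treatment that maximizes the expected total value accumulated before the patient reaches an absorbing 'well' state. The two-state care system, rewards, and transition probabilities are invented for illustration; they are not taken from Bunch and Andrew [11] or from Howard [25].

```python
# states: 0 = 'sick', 1 = 'convalescing'; a third, absorbing 'well' state
# has value 0 and ends treatment. actions[s] lists (reward, transition
# probabilities over [sick, convalescing, well]) for each alternative.
actions = {
    0: [(-4.0, [0.5, 0.4, 0.1]),    # conservative treatment
        (-9.0, [0.1, 0.5, 0.4])],   # aggressive treatment
    1: [(-2.0, [0.1, 0.6, 0.3]),
        (-5.0, [0.0, 0.3, 0.7])],
}

def evaluate(policy, sweeps=500):
    """Expected total value of each transient state under a fixed policy,
    by successive approximation (the transition matrix restricted to the
    transient states is substochastic, so the iteration converges)."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        r0, p0 = actions[0][policy[0]]
        r1, p1 = actions[1][policy[1]]
        v = [r0 + p0[0] * v[0] + p0[1] * v[1],
             r1 + p1[0] * v[0] + p1[1] * v[1]]
    return v

def policy_iteration():
    """Alternate value determination and policy improvement until the
    policy is stable, in the spirit of Howard's iterative scheme."""
    policy = [0, 0]
    while True:
        v = evaluate(policy)
        improved = [max(range(len(actions[s])),
                        key=lambda k: actions[s][k][0]
                        + actions[s][k][1][0] * v[0]
                        + actions[s][k][1][1] * v[1])
                    for s in (0, 1)]
        if improved == policy:
            return policy, v
        policy = improved

print(policy_iteration()[0])   # → [0, 1]
```

On this toy model the scheme settles on the conservative treatment for the 'sick' state and the aggressive one for 'convalescing'; the point is only the mechanics of value determination followed by improvement.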

The data collected on craniofacial-pain patient progressions

through the care system reveal that both prolonged occupation of a

single diagnostic state and return visits to the same state occur fre-

quently. Moreover, as will be discussed in Chapter 4, there are several

characteristics of the craniofacial-pain care system that permit reduc-

tions in the number of input parameters required for a transient Markovian

model of this system. Therefore, an uncertain-duration transient

Markovian representation of the health-care process has been selected as

the means of evaluating the effectiveness of alternative treatment regi-

mens on patients with craniofacial pain.



CHAPTER 3

DIAGNOSTIC CLASSIFICATION

The analytic model developed to provide diagnostic classifications

for craniofacial-pain patients is based on the principles employed in

non-parametric pattern classification. The patterns classified by this

diagnostic model are vector representations (see Section 3.1 and Appen-

dix A) of the craniofacial-pain patient's physical and emotional status.

In the first sections of this chapter the theoretical background for the

diagnostic model is established. This discussion is followed by a pre-

sentation of the validation procedures used to evaluate model perfor-

mance. Next, an algorithm is developed to reduce the 'costs' associated

with model utilization. The chapter closes with a discussion of poten-

tial applications of the craniofacial-pain diagnostic classifier in

teaching, in research, and in the health-care process.

3.1 Model Components

In the initial phase of the development of the diagnostic-classi-

fication model a set of possible alternative diagnostic classifications

was established for craniofacial-pain patients. Figure 3 provides a

list of these possible classifications. Note that the alternative classi-

fications in Figure 3 are not mutually exclusive, as a craniofacial-pain

patient classified in some diagnostic alternative 'A' could also have

the disorder specified by some other diagnostic alternative 'B.'

However, for the purposes of this dissertation, each patient's diagnostic


1. Temporomandibular Joint Arthritis--Developmental

2. Temporomandibular Joint Arthritis--Infectious

3. Temporomandibular Joint Arthritis--Osteo (Degenerative)

4. Temporomandibular Joint Arthritis--Traumatic (Acute)

5. Temporomandibular Joint Arthritis--Traumatic (Chronic)

6. Myopathy--Acute Trauma

7. Myopathy--Myositis

8. Oral Pathology--Dental Pathology

9. Vascular Changes--Migrainous Vascular Changes

10. Myofacial Pain-Dysfunction Malocclusion--Balancing Interferences

11. Myofacial Pain-Dysfunction Malocclusion--Lateral Deviation of Slide

12. Myofacial Pain-Dysfunction Malocclusion--Uneven Centric Stops

13. Myofacial Pain-Dysfunction Psychoneurosis--Anxiety/Depression

14. Myofacial Pain-Dysfunction Bruxism

15. Myofacial Pain-Dysfunction Reflex Protective Muscular Contracture

16. Myofacial Pain-Dysfunction Loss of Posterior Occlusion

17. Neuropathy




classification is made on the basis of specifying that etiological fac-

tor that requires most immediate action on the part of the attending

practitioner. Thus, diagnostic classification of a patient into diag-

nostic alternative 'A' signals that the etiology specified by that al-

ternative should determine the course of the patient's care.

The next step in model development isolated relevant data which

measured the physiological and psychological status of craniofacial-pain

patients. In particular, this step of model development sought those

elements of patient status that practitioners employ in their own classi-

fication of craniofacial-pain patients. Appendix A presents a list of

these data elements. Wherever it was feasible, measures of patient

status were segmented to amplify the significance of particular readings

of each measure. Thus, for example, while the duration of a patient's

pain is a continuous measure of his status, it is important for the pur-

poses of classification to know whether a craniofacial-pain patient's

duration of pain is less than 3 weeks, from 3 to 6 weeks, or longer than

6 weeks. For this measure of patient status, a short history of pain

indicates a strong possibility of a recent traumatic injury while pain

over a long period is more likely associated with long-standing arthritic

or psychic disorders.
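The segmentation just described replaces one continuous measure with mutually exclusive binary indicators. A minimal Python sketch (the item positions here are illustrative, not the Appendix A numbering):

```python
def duration_items(weeks):
    """Map a continuous pain duration to three mutually exclusive binary
    items: under 3 weeks, 3 to 6 weeks, over 6 weeks."""
    return [int(weeks < 3), int(3 <= weeks <= 6), int(weeks > 6)]

print(duration_items(2), duration_items(4), duration_items(10))
# → [1, 0, 0] [0, 1, 0] [0, 0, 1]
```

Exactly one of the three items is set for any duration, which is what lets each reading carry its own discriminating weight in the classifier.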

To facilitate the development of an analytic model of the diagnostic-

classification process, a vector representation of the relevant elements

of patient data has been developed. The vector permits the notation of

any of the data elements shown in the listing in Appendix A. The pre-

sence of any of the items found in Appendix A is recorded in a patient's

data vector by an entry of '1' in the vector-dimension corresponding to

the item number, while the absence of a vector item is noted by a '0'

data-vector entry. For example, referring to the listing in Appendix A,

a male patient would have one set of values in the fifth, sixth, and

seventh elements of his data vector, while a pre-menopausal female

would have a different set of values in those elements.

This vector notation of a patient's status serves as the input data for

a non-parametric pattern classifier that assigns a diagnostic classifica-

tion to the patient's dysfunction.
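The encoding just described can be sketched as follows; the 295-item length follows the text, while the Python function and the example item numbers are illustrative, not taken from Appendix A.

```python
def make_data_vector(present_items, n_items=295):
    """Return a 0/1 vector whose (k-1)th position is 1 exactly when
    item k of the patient-data listing was observed in the patient."""
    vector = [0] * n_items
    for item in present_items:
        vector[item - 1] = 1
    return vector

v = make_data_vector({5, 12, 40})    # hypothetical observed item numbers
print(v[4], v[5], v[11])   # → 1 0 1  (items 5 and 12 present, item 6 absent)
```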

Non-parametric pattern classification, as described in Meisel [23]

and Nilsson [24], is the process of creating decision surfaces that

separate patterns into homogeneous classes Ci, i=1,2,...,p, specified

by the analyst. In the craniofacial-pain diagnostic model, the Ci are

the diagnostic alternatives shown in Figure 3. Classification of a pat-

tern (a patient's data vector) into one of the classes is performed by

a pattern classifier composed of a maximum detector and a set of dis-

criminant functions. These discriminants, gj(a), j=1,2,...,p, are single-

valued functions of each patient's data vector a. If ai represents a

data vector for a patient whose correct diagnostic classification is the

ith diagnostic alternative, then the gj(a) are chosen so that

gi(ai) > gj(ai), i,j=1,2,...,p, j≠i.

The craniofacial-pain classifier uses linear discriminant functions.

These discriminants are linear in the sense that they provide mappings

from E^n to E^1 that exhibit the form

gj(a) = a1 wj1 + a2 wj2 + ... + an wjn + wj,n+1

where in the patient data vector a, the value of ar denotes the presence

(ar = 1) or absence (ar = 0) of patient-data-vector item r, and the

wjk, k=1,2,...,n+1, are constants, called 'weights,' associated with the

jth discriminant function. These discriminant-function weights,

wjk, j=1,2,...,p, k=1,2,...,n+1, provide an analytic means of duplicating

the correct classification of each pattern observed by the non-parametric

classifier. They provide a link between a pattern's correct classifica-

tion and the individual components of the pattern's vector representa-

tion. In essence, each discriminant's weights are additive elements

whose component sums have significance in terms of isolating a pattern's

correct classification. These weights are a mathematical means of stor-

ing information already known about the correct classification of observed

pattern vectors. Moreover, the weights can be interpreted from the

point of view of the significance that the practitioner places on each

data-vector component. A discussion of this interpretation of the dis-

criminant-function weights appears in Section 3.2.

Central to the use of linear discriminant functions is the assump-

tion that the space of observable patient data vectors is linearly

separable, for by definition [24],

a pattern space A and its subsets of patterns

A1, A2, ..., Ap are linearly separable if and only if linear

discriminant functions g1, g2, ..., gp exist such that

gi(a) > gj(a) for all a in Ai,

for all i=1,2,...,p, j=1,2,...,p, j≠i.

In the context of diagnostic classification, the assumption of linear

separability implies that there exists a set of hyperplanes that parti-

tion the space of observable patient data vectors into convex homogeneous

regions, each region representing a unique diagnostic classification.

Rosen [26] has provided a restatement of this assumption in the require-

ment that the sets of data vectors corresponding to each diagnostic al-

ternative have non-intersecting convex hulls. In either form, this is

a fairly restrictive assumption on the dispersion of patient data vec-

tors (see Section 3.2).

Selecting the 'weights' for each of the discriminant functions is

a process known as 'training.' For the linear non-parametric classifier,

training generates each discriminant function's wjk's by applying a sys-

tematic algorithm to the members of a set of representative patterns with

pre-established classifications. Nilsson [24] discusses several algorithms

suitable for training the craniofacial-pain diagnostic classifier. In

the course of using these algorithms for model development, a new 'mod-

ified fixed-increment' training algorithm was constructed (see Appendix

B). Employing the new algorithm has resulted in a reduction of approx-

imately 35% in the amount of training time required to derive the weights

for the craniofacial-pain classifier.
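The standard fixed-increment rule that such training algorithms build on can be sketched as follows. This is not the modified algorithm of Appendix B; the rule shown is the familiar error-correction scheme discussed by Nilsson, and the two-class toy data are invented for illustration.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def train_fixed_increment(patterns, labels, n_classes, max_passes=100):
    """Fixed-increment training: whenever the correct class fails to win
    strictly, add the pattern to its weight vector and subtract it from
    each wrongly winning vector. Returns feasible weights, or None if no
    separation is found within max_passes."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n_classes)]
    for _ in range(max_passes):
        corrections = 0
        for a, i in zip(patterns, labels):
            scores = [dot(W[j], a) for j in range(n_classes)]
            bad = [j for j in range(n_classes)
                   if j != i and scores[j] >= scores[i]]
            if bad:
                corrections += 1
                W[i] = [w + x for w, x in zip(W[i], a)]
                for j in bad:
                    W[j] = [w - x for w, x in zip(W[j], a)]
        if corrections == 0:
            return W            # the pattern classes are linearly separable
    return None

# Two toy classes, separable on the first feature; each pattern is
# augmented with a trailing 1 as in the text.
patterns = [[1, 0, 1], [1, 1, 1], [0, 1, 1], [0, 0, 1]]
labels = [0, 0, 1, 1]
print(train_fixed_increment(patterns, labels, 2))
# → [[3, -1, -1], [-3, 1, 1]]
```

Because the algorithm terminates only when a full pass makes no corrections, termination itself is the proof that a feasible set of weights exists, which is how the separability test of the following pages works.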

Symbolically, the craniofacial-pain diagnostic classifier, with its

set of trained weights, can be represented in the following format:

let ai = the 296-dimension data vector describing patient 'i'

    aik = the kth element in the data vector describing patient
          'i', whose value is either zero or one, k=1,2,...,295
          (by definition ai,296 = 1)

    Cj = diagnostic alternative 'j', j=1,2,...,17

    dij = the value of the discriminant function for diagnostic
          alternative 'j' generated by the data vector of patient 'i'

    Wj = the 296-dimension vector of weights associated with
         diagnostic alternative 'j'

    wjk = the kth element in the weight vector Wj,

that is

    ai = [ai1, ai2, ..., ai,295, 1]

    Wj = [wj1, wj2, ..., wj,295, wj,296]

    dij = ai WjT = sum from k=1 to 296 of aik wjk

where T denotes vector transposition. Patient 'i' is classified in

diagnostic alternative Cj when dij > dis for every s≠j. If max over j of dij is

not unique, then it is not yet possible to classify patient 'i' into

one of the diagnostic alternatives. Treatment is prescribed for severe

symptoms and classification is attempted at a later date.
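This maximum-detector decision rule, including the deferral on ties, can be sketched as follows; the weights, the two-alternative setting, and the data are illustrative, not the trained craniofacial-pain classifier.

```python
def classify(a, W):
    """Maximum detector: return the index of the unique largest score
    d_ij = a . W_j, or None when the maximum is tied (classification
    deferred, as in the text)."""
    scores = [sum(ak * wk for ak, wk in zip(a, w)) for w in W]
    best = max(scores)
    if scores.count(best) > 1:
        return None      # defer: treat severe symptoms, re-examine later
    return scores.index(best)

# Toy 2-alternative classifier over 2 symptoms plus the trailing 1.
W = [[2, -1, 0], [-2, 1, 1]]
print(classify([1, 0, 1], W))   # → 0
print(classify([0, 0, 1], W))   # → 1
```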

Data from four sources were used to construct and verify the diag-

nostic-classification model, as well as the treatment-planning model

presented in Chapter 4. Contributions of clinical records came from

the dental schools at the universities of California at Los Angeles,

Florida, Illinois, and Indiana. In all, the records of 250 patients,

involving a total of 480 patient-practitioner interactions, form the

data base for model building and validation. The relevant information

from each of these patient visits has been recorded in the data-vector

format of Appendix A. A diagnostic classification from Figure 3 was

assigned to each of these patient data vectors by either Dr. Thomas B.

Fast, Chairman of the Division of Oral Diagnosis, or by Dr. Parker E.

Mahan, Chairman of the Department of Basic Dental Sciences, at the

College of Dentistry, University of Florida.

With this basic structure for the diagnostic-classification model,

the classified patient data vectors, and the training algorithm presented

in Appendix B, an initial test was performed to verify that the space of

observed patient data vectors was separable by linear discriminant func-

tions. Application of the modified fixed-increment training algorithm

to the set of 480 data vectors verified this requirement, as the algo-

rithm terminated in a set of feasible discriminant-function weights.

Using the discriminant functions these constants determine, it is possi-

ble to duplicate the pre-established diagnostic classifications for each

of the patient data vectors.

This first test of the diagnostic classifier established that a non-

parametric classifier could be employed to reproduce the original clas-

sifications for each data vector used in model construction. However,

this test does not reveal how well the classification model will perform

on patient data not employed in developing the discriminant-function

weights. The remainder of this section, and Section 3.3, address the

question of how the diagnostic classifier performs on 'new' patient data

vectors, that is, vectors that have no duplicate in the training sample.

Model training has created a set of weights that, by the definition

of the training procedure, correctly classify every patient data vector

that lies within the bounds of the training-sample pattern-class convex

hulls. Since every data vector is a binary vector, new patient data

vectors must fall outside the convex hulls established by the training-

sample vectors. Yet, if new data vectors have a number of data-vector

elements that are identical to those of the training-sample vectors

with the same diagnostic classification, then this relationship will be

reflected in a 'close proximity,' as measured by a Euclidean-distance

function, between each new vector and its associated training-sample

convex hull. Given this close proximity, the classifier's discriminant

functions should correctly classify most new data vectors as these vec-

tors will lie within or near the boundaries of the appropriate discrim-

inating hyperplanes. Hence, the key to providing adequate classifier

performance for new data vectors lies in devising data-vector-represen-

tations of patient data for which the data vectors of a common diagnostic

classification exhibit strong similarity.

In the introductory discussion of the elements of patient data used

in the patient data vector, it was pointed out that an effort was made to

select components of patient status that assist the practitioner in his

selection of diagnostic classifications for a craniofacial-pain patient.

Then these elements were partitioned to generate as much discriminating

information as possible from each data element. In terms of the alter-

nate diagnostic classifications, these elements of patient data were

chosen so that all patients in any one diagnostic classification would

have a unique combination of exhibited or non-exhibited data-vector ele-

ments. Employing these carefully constructed qualitative data elements

resulted in a set of 'natural' gaps in the vector representations of

patient data from alternate diagnostic classifications. The fact that

there are portions of the pattern space that cannot be occupied by any

data vector, and partitions of the space where the vectors of each clas-

sification must lie, assists the classifier in making correct classifica-

tions of data not used in model construction.

As Section 3.3 shows, this discussion is not meant to imply that the

craniofacial-pain diagnostic classifier can, in its present state of

development, correctly classify every new data vector. What has been

stated is that a knowledge of the underlying classifying process can

be employed in constructing the data vector examined by the classifier,

and that fully utilizing this information will lead to a classifier that

can be expected to be capable of performing well on new patient data.

Of course, this discussion has been predicated on the separability of

the underlying pattern space of data vectors. If this requirement is

not met by some form of patient-data-vector representation, classifica-

tion of patients by a linear classifier is not possible.

The next section of this chapter provides relationships between

linear separability and the data that may be observed in a health-care

system for which diagnostic classification by linear discriminants is

being considered. This section has a dual purpose. First, linear sep-

arability is couched in 'non-geometric' terms. Second, and more impor-

tantly, using the craniofacial-pain health-care system as an example

of the section's developments provides information about the suitability

of the non-parametric classifier as a model of the decision-making pro-

cess associated with diagnostic classification in this care system.

3.2 Alternative Interpretations of Linear Separability

The criteria for pattern space separability are mathematically

concise. Unfortunately, these separability criteria are not readily

expressible in non-geometric terms. The discussion developed in this

section provides the reader with some non-geometric criteria that indi-

cate when the use of a non-parametric pattern classifier should be con-

sidered as a means of generating diagnoses for a medical or dental disorder.

The first criterion is associated with a probabilistic measure of

symptom exhibition. Given a patient who exhibits some set of symptoms

S, non-parametric pattern classification requires that P[S|Cj] = 1 for

the diagnostic alternative 'Cj' that describes the patient's current
diagnostic status, and P[S|Ck] = 0 for all other diagnostic alternatives
'Ck.' However, assume that for the disorder in question the probability

of exhibiting any relevant symptom has been calculated from historical
data, that is, estimates of P[si|Cj] are available for all relevant

symptoms si and all diagnostic alternatives Cj. Then, if the following
decision rule leads to the correct classification of a majority of the

patients with the disorder in question, utilization of a non-parametric
classification model should be investigated:

    classify a patient who exhibits the set of symptoms S in the
    jth diagnostic alternative if

        prod{si in S} P[si|Cj] > prod{si in S} P[si|Ck] for all k≠j.   (1)

Since (1) holds if and only if

    log [prod{si in S} P[si|Cj]] > log [prod{si in S} P[si|Ck]] for all k≠j,

decision rule (1) can be expressed in terms of logarithms. Let the set

of symptoms S be represented as a row vector a with the elements of a
assigned values as follows:

    ai = 1 if symptom si is an element of S

    ai = 0 if symptom si is not an element of S,

where n is the total number of relevant symptoms. Form the column vectors

    Wj = [log P[s1|Cj], log P[s2|Cj], ..., log P[sn|Cj]]T.

Then log [prod{si in S} P[si|Cj]] = a Wj, and decision rule (1) can be restated as:

    classify a patient who is characterized by the vector a in the
    jth diagnostic alternative if

        a Wj > a Wk for all k≠j.   (2)

Note that decision rule (2) is identical to the decision rule employed

in non-parametric pattern classification.

This equivalence implies that if (1) holds for every preclassified

patient examined, the values log P[si|Cj] form a set of feasible discrim-

inant-function weights. If (1) leads to the correct classification of

a majority of the patients examined, it is logical to assume that there

may be a set of feasible discriminant-function weights. This assumption

was examined using the craniofacial-pain patient data. From the data

vectors classified in Diagnostic Alternatives 13, 14, and 15, a total of

189 patient visits, the P[si|Cj] were calculated. Each data vector was

then classified with decision rule (1), and 164 of the data vectors

(86.7%) were assigned to their pre-established diagnostic alternative.
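Decision rules (1) and (2) can be sketched as follows: symptom-exhibition probabilities estimated from preclassified binary vectors become log-probability weights. The small floor placed on zero estimates is an implementation assumption (log 0 is undefined), not part of the text, and the toy data are invented rather than drawn from the 189 patient visits.

```python
import math

def log_prob_weights(vectors_by_class, floor=1e-6):
    """Estimate P[si|Cj] as symptom frequencies within each class and
    take logs as discriminant weights (decision rule (2)). The floor on
    zero estimates is an implementation assumption, not from the text."""
    W = []
    for vectors in vectors_by_class:
        m = len(vectors)
        probs = [max(sum(v[k] for v in vectors) / m, floor)
                 for k in range(len(vectors[0]))]
        W.append([math.log(p) for p in probs])
    return W

def classify_by_rule_2(a, W):
    # a Wj = sum over exhibited symptoms of log P[si|Cj]
    scores = [sum(ak * wk for ak, wk in zip(a, w)) for w in W]
    return scores.index(max(scores))

class_a = [[1, 1, 0], [1, 0, 0], [1, 1, 0]]   # symptom 1 always, 3 never
class_b = [[0, 1, 1], [0, 1, 1], [0, 0, 1]]   # symptom 3 always, 1 never
W = log_prob_weights([class_a, class_b])
print(classify_by_rule_2([1, 0, 0], W),
      classify_by_rule_2([0, 0, 1], W))   # → 0 1
```

The point of the equivalence is visible in the code: once the logs are taken, rule (1) is an ordinary linear discriminant comparison.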

The second criterion provides a subjective measure of the feasibil-

ity of using a non-parametric pattern classifier. If symptoms for most

of the diagnostic alternatives associated with the disorder of interest,

can be isolated such that

1. a patient's exhibition of a subset of these symptoms leads

the practitioner to a selection of one of the diagnostic

alternatives, or

2. a patient's exhibition of a subset of these symptoms leads

the practitioner to eliminate from further consideration

one of the diagnostic alternatives,

then the use of a non-parametric classifier as a means of generating

classifications should be investigated.

The linear non-parametric classifier employs a weighted sum of

the symptoms exhibited by each patient in its discriminating functions.

If symptoms can be isolated that are significant to the classification

of patients with the disorder under investigation, then there is a

'natural' weight for each of these symptoms in the decision-making pro-

cess used by the practitioner. The existence of these natural weights

increases the probability that a training algorithm will be able to find

a feasible set of discriminant-function weights. Indeed, the relative

importance of the significant symptoms may be reflected in the magnitude

of the discriminant-function weights generated by the application of a

training algorithm.

As an example, the significant symptoms associated with two cranio-

facial-pain diagnostic alternatives, Alternatives 4 and 14, were isolated

by Dr. Fast. A comparison of these symptoms and their associated dis-

criminant-function weights revealed a high degree of correlation between

symptom significance and discriminant-function weights (see Table 2).

The reader should note that both of the criteria discussed in this

section are heuristic approximations to the geometric requirement for

pattern space separability. However, if the disorder under investigation

meets one or both of these criteria, it may be possible to employ a non-

parametric classifier to diagnose the disorder since the requirement for

pattern space separability is most likely met.



Diagnostic Alternative 4: Temporomandibular Joint Arthritis--Traumatic

Significant Symptoms                                      Weights

(+) Duration of Pain (less than 3 weeks)                    + 3
(+) History of Trauma (accidental)                          +30
(+) Preauricular Pain                                       +11
(-) Salivary Gland Disease                                  -12
(-) Otitis                                                  - 1

(discriminant-function weights for Diagnostic Alternative 4 range
from -19 to +37)

Diagnostic Alternative 14: Myofacial Pain-Dysfunction Bruxism

Significant Symptoms                                      Weights

(+) Duration of Pain (more than 6 weeks)                    +15
(+) Facets                                                  + 2
(+) Bruxism and/or Clenching                                +56
(-) History of Trauma (accidental)                          -16
(-) Salivary Gland Disease                                  - 5

(discriminant-function weights for Diagnostic Alternative 14 range
from -23 to +56)

Note: For both Diagnostic Alternatives

(+) indicates a symptom that leads the practitioner to classify
a patient in that diagnostic alternative

(-) indicates a symptom that leads the practitioner to classify
a patient in some other diagnostic alternative


3.3 Model Validation

Validation of the craniofacial-pain diagnostic-classification

model presented in Section 3.1 has been accomplished by three types of

validating procedures. The discussion presented in the preceding sec-

tions, and in particular the relationship between significant symptoms

and their associated weights shown in Table 2, reveals a close proximity

between the decision-making process the practitioner utilizes and the

non-parametric classifier's symptom-weighting scheme. This section pre-

sents two other procedures employed in evaluating the diagnostic clas-

sification model's performance.

The first procedure involved testing the diagnostic accuracy of

the classification model on patient data that were not employed in model

construction. Six classification tests were run in sequential order.

In the first five of these tests random samples of 50 patient-data-vec-

tors were drawn from the data base of 480 vectors discussed in Section

3.1. Then, as each of the tests was performed, the training algorithm

in Appendix B was applied to the remaining 430 data vectors. With the

weights derived from the training algorithm, the sample of 50 patients

was classified. The model-generated classifications for each of the

data vectors were compared to the classifications assigned to the vectors

when they were created. As each test classification of a sample was

completed, the diagnostic classifier's discriminant-function weights were

set equal to zero, the sample of data vectors was returned to the data

base, and the next test's random sample was drawn. A summary of the re-

sults of these tests of diagnostic accuracy is presented in Table 3.

[Table 3: for each of the six classification tests, the number of data
vectors tested, the number correctly classified, and the resulting
classifier accuracy; the individual test rows were not recoverable.]

Mean Classifier Accuracy                      89.7%

Standard Deviation of Classifier Accuracy      3.5%

In each of the first five tests it was possible for a patient who

has had multiple practitioner-visits to have some of the vectors repre-

senting these visits in a test's random sample and some vectors used

in model construction. Such occurrences lead to test results that over-

estimate classifier accuracy. Hence, in Test Six, a random sample of

all of the patient data associated with 40 patients (a total of 51

patient data vectors) was selected. This sample was classified by the

diagnostic-classification model using the remaining 429 data vectors as

a data base. The results of this test are included in the data shown

in Table 3. There is one other possible factor affecting the classifier's

accuracy as measured by these tests. It is conceivable that there were

duplicate data vectors in the data base of 480 patient-data-vectors.

If duplicates do exist and were included in both the test samples and

the samples' training bases, measures of classifier accuracy will be

overly optimistic. However, since 'noise' is introduced by the variabil-

ity among craniofacial-pain patients and generated in the practitioner's

transcribing of the elements of patient data into the data-vector format,
and since there are 2^295 possible data vectors, the probability that two

or more of the data-based patient vectors include an identical specifica-

tion of data-vector elements is small enough to justify neglecting this

possibility and its effects.

The results summarized in Table 3 reveal that the diagnostic-clas-

sification model performs well in duplicating the diagnostic classifica-

tions originally assigned by the reviewing practitioners, Dr. Fast and

Dr. Mahan. Moreover, the size of the test samples was quite large in

relation to the data base employed in developing each test's diagnostic

model. As new data became available and are incorporated in the para-

meters of the model, the accuracy of the craniofacial-pain diagnostic

classifier can be expected to increase slightly.

The second validating procedure established a measure of variability

on the diagnostic classifications that might be given by different dental

practitioners. The discussion presented in Section 1.1 related the dif-

ficulties associated with diagnosing craniofacial-pain disorders. Prac-

titioners with varying kinds of professional experience can be expected

to reflect their dissimilar backgrounds in differing diagnostic classi-

fications for these patients. To measure the variability associated with

dissimilar backgrounds, five craniofacial-pain data vectors were selected

from the data base employed in constructing the craniofacial-pain diag-

nostic classifier. Four dentists from the staff of the College of Den-

tistry at the University of Florida were asked to review these patient

data vectors and assign to each of them a diagnostic classification.

Table 4 summarizes their assignments and also includes the diagnostic

classification originally given by the reviewing practitioners.

The variability in diagnostic assignments reflected in Table 4 re-

affirms the justification for the research objectives set forth in

Section 1.2. Some of the differences in the practitioners' choices of

diagnostic classifications can be explained by the limited amount of

data contained in each of the data vectors, and the less-than-full med-

ical statement of each of the diagnostic alternatives. Nevertheless, a

diagnostic-classification model that generates classifications that are

in 90% agreement with those of experts in the field provides a sizeable

improvement over the variability in classification assignments exhibited

in Table 4 in which only half the respondents agreed on a single diag-

nosis in four out of five cases.



Diagnostic Classification for

                 Patient 1  Patient 2  Patient 3  Patient 4  Patient 5+

Classification        4         13         15         15          9
Practitioner 1        1          7         15         15          3
Practitioner 2        6         12         15          8          3
Practitioner 3        4         15         15         15         13
Practitioner 4        4         15         15         14          *

* No classification given

+ Patient 5 exhibited a minimal amount of input data (only 17
non-zero data-vector entries)

These four dental practitioners exhibited 100.0% agreement on the
diagnosis of one of the five patients, and 50.0% agreement on the
diagnostic classification of the remaining four patients.

3.4 Minimum-Cost Symptom-Selection Algorithm

The craniofacial-pain diagnostic-classification model detailed in

the previous sections of this chapter has been structured upon the data

vector of the 295 relevant signs, symptoms, and items of patient history

shown in Appendix A. To utilize this model, the practitioner must ex-

amine a patient for the presence or absence of each of these data vector

elements. Although the cost in time and fees varies from item to item,

there is an expense to the practitioner, and to the patient, associated

with checking each element in the data vector. Hence, it is logical to

investigate the possibility of finding a reduced data vector that 'costs'

less for the patient and practitioner to use and yet still permits cor-

rect classification of all craniofacial-pain patients.

A review of the literature (see Meisel [23], Chapter 9, for a survey)

reveals that many authors have considered the task of selecting a set

of features to be used in a pattern-classification scheme. Traditional

methods of viewing this problem are based on a search for a transforma-

tion that takes a given set of patterns into some 'new' pattern space

where separation by discriminant functions is possible. Measures of

pattern class separability are employed to evaluate the effects of

transforming the set of patterns from one space to another. In general,

these transformations take a pattern representation in 'n' features and

create a set of 'r' (r < n) 'new' features, where the 'new' features

are linear combinations of the original features. How-

ever, to reduce the 'costs' associated with using the craniofacial-pain

diagnostic classifier, a transformation must be found that decreases

the size of the data-vector pattern space by eliminating features rather

than combining them. For example, assume patients were diagnosed on

the basis of body-temperature and blood-pressure readings. Traditional

techniques for feature selection might employ a linear combination of

body temperatures and blood pressure measurements as one 'new' feature.

The transformation sought in this investigation would lead to the clas-

sification of patients by either body temperature or blood pressure

alone if this were possible. This example will be used again in Section

3.4.1 to illustrate the algebraic and geometric structure of the problem.

Nelson and Levy [27] have attacked the problem of selecting a re-

duced set of unaltered features for use in a classification scheme.

These authors attach a cost to the use of each available feature, and

employ a ranking scheme to measure each feature's discriminating power.

Then, under a restriction on the total cost of features employed, they

develop an algorithm that selects the set of features that maximizes the

classifier's discriminating power. Unfortunately, their scheme does not

guarantee the selection of a subset of the original features that contains

enough 'information' to permit pattern class separation by discriminant

function. Therefore, a new algorithm is presented in this section that

minimizes the cost of the set of features used by the pattern classifier

yet insures that all patterns can be correctly classified by a set of

linear discriminant functions. In the remainder of this section the

more general terms 'feature,' 'pattern,' and 'pattern class' will be

used respectively to represent a data vector item, a patient's data vec-

tor, and a diagnostic classification.

The problem of finding a minimum-cost collection of features would

not be considered if there did not already exist a set of 'n' features

by which the patterns under examination could be correctly classified

by linear discriminants. That is, given an 'n+1' dimensional representa-
tion of each of the 'm_i' patterns in each of the 'p' pattern classes,

    a_i^m = [a_i1^m, a_i2^m, ..., a_in^m, 1],  m = 1,2,...,m_i,  i = 1,2,...,p,

where each a_ik^m, k = 1,2,...,n, equals either zero or one, there must
exist a set of 'n+1' dimensional vectors W_j, j = 1,2,...,p, such that

    a_i^m · (W_i - W_j) > 0  for all m = 1,2,...,m_i and all j ≠ i.      (3)




Letting A_i be the m_i × (n+1) dimensional matrix of patterns in pattern-
class i, the requirement of (3) can be written in the following form:

    A_i (W_i - W_j) > 0,   i = 1,2,...,p,  j = 1,2,...,p,  j ≠ i.


If such pattern representations and W_j's exist, then a solution to the
following problem yields a minimum-cost collection of pattern-classifying
features:

    P1: minimize   C X

        subject to A_i [X ⊙ (W_i - W_j)] > 0,  i = 1,2,...,p,
                                               j = 1,2,...,p,  j ≠ i

where

          | a_i1^1    a_i2^1    ...  a_in^1    1 |
          | a_i1^2    a_i2^2    ...  a_in^2    1 |
    A_i = |   .         .             .        . |
          | a_i1^m_i  a_i2^m_i  ...  a_in^m_i  1 |

    W_i = [w_i1, w_i2, ..., w_in, w_i,n+1]

    C = [c_1, c_2, ..., c_n, 0]

    X = [x_1, x_2, ..., x_n, 1]^T

and w_ik is an unrestricted variable,

    c_j is the cost of using feature j,

    x_i = 0 if feature i is not used,
          1 if feature i is used.

Note: The ⊙ notation is to be read as element-by-element multiplication,
i.e., Q ⊙ R = S, [s_ij] = [q_ij · r_ij].

3.4.1 Algorithm Development

The algorithm developed to solve problem P1 is an enumerative

algorithm similar in structure to that of Balas [28]. Unfortunately,

the non-linear nature of problem P1's constraints prohibits full imple-

mentation of the more powerful techniques used in implicit enumeration

on linear integer problems. The structure of these constraints and

their effect on the optimization of P1 will be discussed in a step-by-

step development.

The minimum-cost feature-selection algorithm does not solve P1 to

the extent of finding the values of the vectors W_i, i = 1,2,...,p. This

algorithm does find the minimum-cost collection of features X* and the

total cost associated with using these features, and guarantees the

existence of W_i vectors associated with this optimal feature set. Given

this guarantee, the modified fixed-increment algorithm from Appendix B
can be employed to find the vectors W_i, i = 1,2,...,p.
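The modified fixed-increment algorithm itself appears in Appendix B and is not reproduced here. As a rough illustration of the kind of correction rule involved, the sketch below trains a single weight vector for two pattern classes; the function name, the unit increment, and the pass limit are choices made for this sketch, not details taken from Appendix B.

```python
def fixed_increment(patterns_x, patterns_y, max_passes=100):
    """Basic fixed-increment correction rule for two pattern classes.
    Each pattern is augmented with a trailing 1, and class-Y patterns are
    negated, so one weight vector w separates the classes when w . a > 0
    for every adjusted pattern a."""
    adjusted = [list(a) + [1.0] for a in patterns_x] + \
               [[-c for c in list(b) + [1.0]] for b in patterns_y]
    w = [0.0] * len(adjusted[0])
    for _ in range(max_passes):
        errors = 0
        for a in adjusted:
            if sum(wi * ai for wi, ai in zip(w, a)) <= 0:
                w = [wi + ai for wi, ai in zip(w, a)]  # increment of one
                errors += 1
        if errors == 0:
            return w          # every pattern now satisfies w . a > 0
    return None               # no separator found within max_passes

# The separable body-temperature/blood-pressure example of this section:
w = fixed_increment([[1, 0], [1, 1]], [[0, 0], [0, 1]])
```

On the separable example from this section the rule converges in a few passes; for a non-separable pattern space it simply gives up after `max_passes`.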

Choose some solution to P1. By hypothesis there exists at least
one solution (X, W_1, W_2, ..., W_p) to P1 where X = [1,1,...,1,1]. Suppose
there is some other solution (X, W_1, W_2, ..., W_p) where one or more
elements x_i in the X vector are equal to zero. For the constraint
matrices in P1,

    A_i [X ⊙ (W_i - W_j)] > 0,  i = 1,2,...,p,  j = 1,2,...,p,  j ≠ i.

If the matrix products Â_i = A_i ⊙ X, i = 1,2,...,p, are constructed (each
row of A_i multiplied element by element by X), then each set of
constraints in P1 can be written in the form

    Â_i (W_i - W_j) > 0,  i = 1,2,...,p,  j = 1,2,...,p,  j ≠ i.        (4)

The creation of the Â_i is called the zeroing process. Of the col-
umns of A_i, Â_i retains all columns j of A_i where x_j = 1, and substitutes
a column of zeros for each of those columns k in A_i where x_k = 0. Using
the zeroing process, the feasibility of any possible solution vector X
to P1 can be examined in terms of the Â_i = A_i ⊙ X this vector X creates.

As an example of the zeroing process for a particular set of patterns,
let a^i be a two-dimensional patient data vector a^i = [a_1^i, a_2^i], where

    a_1^i = 0 if patient i has normal body temperature,
            1 if patient i has abnormal body temperature;

    a_2^i = 0 if patient i has normal blood pressure,
            1 if patient i has abnormal blood pressure.

Assume two diagnostic categories, X and Y, where data vectors a_X^1 and
a_X^2 are preclassified in category X and data vectors a_Y^1 and a_Y^2 are
preclassified in category Y. If a_X^1 = [1,0], a_X^2 = [1,1], a_Y^1 = [0,0],
and a_Y^2 = [0,1], then

    A_X = | 1 0 |      A_Y = | 0 0 |
          | 1 1 |            | 0 1 |

Graphically, the pattern space is the set of these four points in the
feature-1/feature-2 plane (figure omitted). Consider the vector
X = [1,0]^T; then

    A_X ⊙ X = | 1 0 |      A_Y ⊙ X = | 0 0 |
              | 1 0 |                | 0 0 |

Graphically, the pattern space as transformed by X collapses onto the
feature-1 axis (figure omitted). The vector X effectively creates a
representation of each patient data vector in terms of the patient's
body temperature alone.
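In code, the zeroing process is a single element-by-element masking step; the following sketch (the function name is ours) reproduces the example above.

```python
def zero_out(A, X):
    """The zeroing process: retain column j of the pattern matrix A where
    X[j] = 1 and substitute zeros where X[j] = 0 (element-by-element
    masking of every row by X)."""
    return [[a * x for a, x in zip(row, X)] for row in A]

# The example above: X = [1, 0] keeps only feature 1 (body temperature).
masked_X = zero_out([[1, 0], [1, 1]], [1, 0])   # rows become [1, 0]
masked_Y = zero_out([[0, 0], [0, 1]], [1, 0])   # rows become [0, 0]
```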

Note that relation (2) is the requirement for pattern separability
by linear discriminants. Hence, a vector X is a component in a feasible
solution (X, W_1, W_2, ..., W_p) to P1 if and only if there exist W_i,
i = 1,2,...,p, such that (2) holds for all i ≠ j. As discussed in
Section 3.1, a pattern

space is linearly separable, and hence, feasible W_i exist, if and only if

the individual pattern classes have non-intersecting convex hulls. For

the pattern vectors considered in this section, the individual components

of each of the patterns in each pattern class are either zero or one. As

there is a one-to-one correspondence between the individual patterns in

a pattern class and the vertices of the pattern class's convex hull, the

convex hull of a pattern-class A_i can be expressed as all convex combina-
tions of the individual pattern-class vectors a_i^m, m = 1,2,...,m_i.
Consider the following examples of the convex-hull representation of
linear separability.

Assume a_X^1 = [1,0], a_X^2 = [1,1], a_Y^1 = [0,0], and a_Y^2 = [0,1].

Graphically, this pattern space can be represented by plotting the four
patterns in the feature-1/feature-2 plane (figure omitted), where the
line X from a_X^1 to a_X^2 represents the convex hull of pattern-class X
and the line Y from a_Y^1 to a_Y^2 represents the convex hull of
pattern-class Y. Since X and Y do not intersect, implying that the
space is linearly separable, it is possible to draw an infinite number
of lines θ that serve as discriminating hyperplanes.

Assume a_X^1 = [1,0], a_X^2 = [0,1], a_Y^1 = [0,0], and a_Y^2 = [1,1].
Graphically, this pattern space can again be represented in the
feature-1/feature-2 plane (figure omitted), where the line X from a_X^1
to a_X^2 represents the convex hull of pattern-class X and the line Y
from a_Y^1 to a_Y^2 represents the convex hull of pattern-class Y. Since
the lines X and Y intersect, the pattern space is not linearly separable,
and hence, it is impossible to draw a discriminating hyperplane θ.

Therefore, the following condition is equivalent to condition (4):
a vector X is feasible to P1 if and only if there do not exist U^s and U^t
such that

    U^s Â_s = U^t Â_t   for any s,t = 1,2,...,p,  s ≠ t,               (5)

where

    U^i = [u_1^i, u_2^i, ..., u_{m_i}^i],

    u_k^i ≥ 0 for all k = 1,2,...,m_i,

and

    Σ_{k=1}^{m_i} u_k^i = 1 for all i = 1,2,...,p.

Checking the feasibility of some vector X by condition (5) yields
p(p-1)/2 distinct subproblems. Each of these subproblems may be
characterized as follows: for any pair Â_s and Â_t, let A and B denote
the matrices whose columns a_i, i = 1,2,...,m_s, and b_j, j = 1,2,...,m_t,
are the (zeroed) patterns of the two classes.

    P2: Find u_i ≥ 0, Σ_{i=1}^{m_s} u_i = 1, and v_j ≥ 0, Σ_{j=1}^{m_t} v_j = 1,

        such that

        Σ_{i=1}^{m_s} u_i a_i = Σ_{j=1}^{m_t} v_j b_j.

If such u_i and v_j exist for any one of the subproblems, then X is not
feasible to P1. Because the number of subproblems is large even for a
relatively small number p of pattern classes, there is justification for
seeking methods to expedite the solution of each subproblem P2.

To achieve this goal, a series of conditions will be presented that

characterize some of the criteria necessary to the existence of a solu-

tion to subproblem P2. In addition to establishing criteria for exis-

tence, these conditions provide a means for reducing the size of the

matrices A and B. This reduction will be discussed after the conditions

are established.
Condition 1: If the kth row of A has all elements a_i^k, i = 1,2,...,m_s,
equal to zero (one) and the kth row of B has all elements b_j^k,
j = 1,2,...,m_t, equal to one (zero), then no u_i ≥ 0, Σ_{i=1}^{m_s} u_i = 1
and v_j ≥ 0, Σ_{j=1}^{m_t} v_j = 1 exist such that

    Σ_{i=1}^{m_s} u_i a_i = Σ_{j=1}^{m_t} v_j b_j.

Justification 1: Under Condition 1 there is no set of convex combinations
of the kth-row elements of A and of the kth-row elements of B such that
the combinations are equal. Hence, there can be no set of convex
combinations of the columns of A and of B such that the combinations are
equal; that is, no u_i and v_j satisfying the convexity requirements
exist such that

    Σ_{i=1}^{m_s} u_i a_i = Σ_{j=1}^{m_t} v_j b_j.

Condition 2: If the kth row of A has all elements a_i^k, i = 1,2,...,m_s,
equal to zero (one) and the kth row of B has all elements b_j^k,
j = 1,2,...,m_t, equal to zero (one), the kth row of matrices A and B can
be eliminated without loss of possible solutions to subproblem P2.

Justification 2: Under Condition 2 every convex combination of the kth-row
elements of A and every convex combination of the kth-row elements of B
are equal. Hence, a set of convex combinations of the columns of A and of
the columns of B are equal if and only if the convex combinations of the
remaining rows (all rows except the kth row) are equal. Symbolically, let
a_i' denote the pattern a_i whose kth component has been eliminated, and
similarly let b_j' denote the elimination of component k from pattern b_j.
Then, since

    Σ_{i=1}^{m_s} u_i a_i^k = Σ_{j=1}^{m_t} v_j b_j^k

for any choice of u_i ≥ 0, Σ_{i=1}^{m_s} u_i = 1 and v_j ≥ 0,
Σ_{j=1}^{m_t} v_j = 1, it follows that

    Σ_{i=1}^{m_s} u_i a_i = Σ_{j=1}^{m_t} v_j b_j

if and only if

    Σ_{i=1}^{m_s} u_i a_i' = Σ_{j=1}^{m_t} v_j b_j'.

Condition 3: If the kth row of A has all elements a_i^k, i = 1,2,...,m_s,
equal to zero, and some b_r^k equals one, then no u_i ≥ 0,
Σ_{i=1}^{m_s} u_i = 1, and v_j ≥ 0 with v_r > 0, Σ_{j=1}^{m_t} v_j = 1,
exist such that

    Σ_{i=1}^{m_s} u_i a_i = Σ_{j=1}^{m_t} v_j b_j.

Justification 3: Under Condition 3 any convex combination of the columns
of B that includes a non-zero product of the column b_r results in a
kth-row term greater than zero. The value of the kth-row term for any
convex combination of the columns of A is equal to zero. Hence, no set
of convex combinations of the columns of A and B can be equal if the
combination for B includes a specification that v_r > 0. Symbolically,
if v_r > 0, then for any choice of v_j, j = 1,2,...,m_t, j ≠ r, where
Σ_{j=1}^{m_t} v_j = 1,

    Σ_{j=1}^{m_t} v_j b_j^k > Σ_{i=1}^{m_s} u_i a_i^k = 0

for any choice of u_i such that u_i ≥ 0 and Σ_{i=1}^{m_s} u_i = 1.
Hence, if v_r > 0, there exist no u_i ≥ 0, Σ_{i=1}^{m_s} u_i = 1 and
v_j ≥ 0, Σ_{j=1}^{m_t} v_j = 1 such that

    Σ_{i=1}^{m_s} u_i a_i = Σ_{j=1}^{m_t} v_j b_j.

Condition 4: If the kth row of A has all elements a_i^k, i = 1,2,...,m_s,
equal to one, and some b_r^k equals zero, then no u_i ≥ 0,
Σ_{i=1}^{m_s} u_i = 1 and v_j ≥ 0 with v_r > 0, Σ_{j=1}^{m_t} v_j = 1
exist such that

    Σ_{i=1}^{m_s} u_i a_i = Σ_{j=1}^{m_t} v_j b_j.

Justification 4: Condition 4 is similar to Condition 3 in that any
convex combination of the columns of B that includes a non-zero product
of the column b_r yields a kth-row term whose value cannot equal any
convex combination of the kth-row elements of A. Symbolically, for any
choice of u_i and v_j where v_r > 0,

    Σ_{j=1}^{m_t} v_j b_j^k < Σ_{i=1}^{m_s} u_i a_i^k = 1.

Note that Conditions 3 and 4 can also be stated, and justified, with

the roles of the A and B matrices reversed.

Given this set of four conditions, consider the following row par-
tition of the A and B matrices:

        | A* |          | B* |
        | A1 |          | B̄1 |
    A = | Ā1 |      B = | B1 |
        | A0 |          | B̄0 |
        | Ā0 |          | B0 |

where, by an appropriate interchange of rows in A and B,

    1. every element in each row of A1 is a one

    2. every element in each row of B1 is a one

    3. every element in each row of A0 is a zero

    4. every element in each row of B0 is a zero.

The partitions Ā1, B̄1, Ā0, and B̄0 are the rows of A and B corresponding
to B1, A1, B0, and A0, respectively, and A* and B* are the remaining rows
of A and B. With this partitioning and the four previously established
conditions, the size of the data vectors associated with many of the
p(p-1)/2 subproblems P2 can be significantly reduced. The reduction
process, Procedure 1, can be stated in this manner:

Step 1: If for some row k in A1 (B1) each element in the corre-

sponding row of B̄1 (Ā1) is equal to one, then row k

of A and B can be eliminated by Condition 2.

Step 2: If for some row k in A0 (B0) each element in the corre-
sponding row of B̄0 (Ā0) is equal to zero, then row k of

A and B can be eliminated by Condition 2.

Step 3: If for some row k in A0 (B0) the corresponding row in

B̄0 (Ā0) has all elements equal to one, or if for some row

k in A1 (B1) the corresponding row in B̄1 (Ā1) has all

elements equal to zero, then this particular subproblem

P2 has no feasible solution by Condition 1. Procedure 1

and the search for a solution to P2 are terminated at

this point because the convex hulls of pattern-classes

A and B do not intersect.

Step 4: If for some row k in A1 (B1) the corresponding row in

B̄1 (Ā1) has one or more elements equal to zero, i.e.,

b_r^k = b_s^k = ... = b_t^k = 0 (a_r^k = a_s^k = ... = a_t^k = 0), then

columns b_r, b_s, ..., b_t (a_r, a_s, ..., a_t) can be eliminated by

Condition 4.

Step 5: If for some row k in A0 (B0) the corresponding row in

B̄0 (Ā0) has one or more elements equal to one, i.e.,

b_r^k = b_s^k = ... = b_t^k = 1 (a_r^k = a_s^k = ... = a_t^k = 1), then

columns b_r, b_s, ..., b_t (a_r, a_s, ..., a_t) can be eliminated by

Condition 3.

Step 6: If the use of Steps 1, 2, 4, and 5 has eliminated all

elements of both matrices, then this particular subproblem

has an infinite number of feasible solutions by Condition

2. Procedure 1 and the search for a solution to P2 are

terminated at this point because the convex hulls of the

pattern-classes A and B intersect.

Step 7: If the use of Steps 1, 2, 4, and 5 has eliminated one or

more rows or columns from either matrix then repartition

the matrices and return to Step 1, otherwise terminate

Procedure 1.

In coding Procedure 1 for computer processing, there is no need to

physically partition the rows of the A and B matrices. Summing the

elements in any row of A or B reveals whether the individual elements in

the row are all equal to zero or are all equal to one. Given this infor-

mation, the steps from Procedure 1 determine whether a pattern is re-

moved from A or B, whether a row in A and B is removed, or whether the

procedure should be terminated because no feasible set of convex combina-

tions for P2 exists.

As an example of the use of Procedure 1, consider the set of matrices
A and B in subproblem P2 where

        | 0 1 1 0 |          | 1 1 1 1 |
    A = | 1 0 0 0 |      B = | 0 0 0 0 |
        | 1 0 0 0 |          | 1 1 1 0 |
        | 0 1 1 1 |          | 1 1 1 0 |

In the first application of the steps of Procedure 1:

1. Column 4 can be eliminated from matrix A by Step 4, and

2. Column 1 can be eliminated from matrix A by Step 5.

After the first application of the steps of the procedure,

        | 1 1 |          | 1 1 1 1 |
    A = | 0 0 |      B = | 0 0 0 0 |
        | 0 0 |          | 1 1 1 0 |
        | 1 1 |          | 1 1 1 0 |

In the second application of the steps of Procedure 1:

1. Row 1 can be eliminated from both matrices by Step 1,

2. Row 2 can be eliminated from both matrices by Step 2, and

3. Column 4 can be eliminated from matrix B by Step 4.

After the second application of the steps of the procedure,

    A = | 0 0 |      B = | 1 1 1 |
        | 1 1 |          | 1 1 1 |

In the third application of the steps of Procedure 1:

1. Row 2 can be eliminated from both matrices by Step 1, and

2. Procedure 1 can be terminated by Step 3.

Hence, for this set of A and B matrices, subproblem P2 has no feasible
solution.
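The steps of Procedure 1 can be coded compactly by scanning rows instead of physically partitioning, as noted below in this section. The sketch that follows is an illustration, not the dissertation's program; it may batch several eliminations into one pass, so the intermediate matrices can differ from the hand trace above, but it reaches the same verdict on the example.

```python
def procedure_1(A, B):
    """Sketch of Procedure 1. Patterns are the COLUMNS of the 0/1 matrices
    A and B; rows correspond to features. Returns 'no solution' when the
    reduction proves the convex hulls cannot intersect, 'solutions exist'
    when every element is eliminated (Step 6), and the reduced pair (A, B)
    when no step applies."""
    A = [row[:] for row in A]
    B = [row[:] for row in B]
    changed = True
    while changed:
        changed = False
        if not A:
            return "solutions exist"      # Step 6: all elements eliminated
        if not A[0] or not B[0]:
            return "no solution"          # one class has lost every pattern
        keep = []
        for k in range(len(A)):
            a1 = all(e == 1 for e in A[k]); a0 = all(e == 0 for e in A[k])
            b1 = all(e == 1 for e in B[k]); b0 = all(e == 0 for e in B[k])
            if (a0 and b1) or (a1 and b0):
                return "no solution"      # Step 3 (Condition 1)
            if (a1 and b1) or (a0 and b0):
                changed = True            # Steps 1 and 2 (Condition 2)
                continue                  # drop feature row k
            keep.append(k)
        A = [A[k] for k in keep]
        B = [B[k] for k in keep]
        for M, N in ((A, B), (B, A)):     # Steps 4 and 5, roles both ways
            if not M or not M[0] or not N[0]:
                break
            drop = set()
            for k in range(len(M)):
                if all(e == 1 for e in M[k]):      # Step 4
                    drop |= {j for j, e in enumerate(N[k]) if e == 0}
                elif all(e == 0 for e in M[k]):    # Step 5
                    drop |= {j for j, e in enumerate(N[k]) if e == 1}
            if drop:
                changed = True
                for k in range(len(N)):
                    N[k] = [e for j, e in enumerate(N[k]) if j not in drop]
    return A, B

# The worked example above; the procedure reaches the same verdict.
A = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1]]
B = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
result = procedure_1(A, B)   # "no solution"
```

When neither termination case fires, the returned reduced pair must still be passed to the linear-programming test of the next subsection.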


Although the use of Procedure 1 may lead to a reduction in the size
of most subproblems, the pattern vectors (a_i and b_j) for each of these
problems may still be quite large. Restating subproblem P2 as a linear
program yields

    P3: minimize   [0 0] [U V]^T

        subject to |   A      -B   | | U |   | 0 |
                   | 1...1   0...0 | | V | = | 1 |
                   | 0...0   1...1 |         | 1 |

        and U ≥ 0, V ≥ 0,

where the existence of any solution vectors U* and V* signals the inter-
section of the convex hulls of pattern-classes A and B.

Consider the dual of P3, written in the following form:

    P4: maximize   λ1 + λ2

        subject to |  A^T   1  0 | | π  |
                   | -B^T   0  1 | | λ1 |  ≤  0
                                   | λ2 |

        π, λ1, λ2 unrestricted in sign.

Note that P4 may have many associated π variables, but has only as many
constraints as the number of patterns in A and B (as reduced by Procedure
1). P4 always has at least one solution to its constraint set. Thus, if
an application of a linear-programming algorithm to P4 reveals the exis-
tence of an unbounded solution, then P2 has no solution. Therefore, if
and only if P4 has a bounded solution do u_i and v_j exist such that

    Σ_{i=1}^{m_s} u_i a_i = Σ_{j=1}^{m_t} v_j b_j,

    u_i ≥ 0, Σ_{i=1}^{m_s} u_i = 1,

and

    v_j ≥ 0, Σ_{j=1}^{m_t} v_j = 1.

The preceding discussion with its development of a reduction proce-

dure and dual formulation provides the structure for a second procedure.

Procedure 2 establishes a mechanism to verify the feasibility of any
assignment of zeros and ones to the X vector of problem P1 (see Figure 4).
That is, given some vector X and a set of patterns a_i^m, m = 1,2,...,m_i,
i = 1,2,...,p, the p(p-1)/2 subproblems P2 are formed by zeroing out
the appropriate pattern-vector elements. Then Procedure 1 is applied
to each subproblem. Finally, for each pair of pattern classes the
boundedness of the dual formulation P4 is examined. Vector X represents
a feasible set of pattern-classifying features for P1 if and only if
each of the p(p-1)/2 subproblem formulations P4 is unbounded.

Before a statement of the algorithm to solve problem P1 is presented

several terms must be defined. The assignment vector is defined as a

listing of variables xi, elements of the vector X in P1, whose values have

been determined by the steps of the algorithm. The elements in this vec-

tor are recorded with the value of their assignment, either zero or one.

These elements are entered in the vector in the order they were assigned,

with the first algorithm assignment in the first (left) position. For

example, consider the assignment vector

[x4 = 0, x10 = 1, x2 = 0].

This vector records that the algorithm first assigned x4 equal to zero,

then assigned x10 equal to one, and its last assignment was x2 equal to

zero. Feasibility of a solution X, as determined by the assignment-vector

component values, is checked by Procedure 2 with the value of those vari-

ables not included in the assignment vector temporarily set equal to one.

The value V of an assignment vector is defined as minus one times the

sum of the costs associated with each of the variables in the assignment

vector, multiplied by the value assigned to the respective variable.

For the example assignment vector, [x4 = 0, x10 = 1, x2 = 0], where

c4 = 5, c10 = 2, and c2 = 7, the assignment vector has the value

    V = (-1)·[5(0) + 2(1) + 7(0)] = -2.
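The value computation is a one-line sum; a sketch (the function name is ours):

```python
def assignment_value(assignments, costs):
    """V = (-1) times the sum, over the assignment vector, of each
    variable's cost multiplied by its assigned value. `assignments` is
    the ordered list of (variable index, 0-or-1) pairs."""
    return -sum(costs[i] * value for i, value in assignments)

costs = {4: 5, 10: 2, 2: 7}                              # c4, c10, c2
V = assignment_value([(4, 0), (10, 1), (2, 0)], costs)   # V = -2
```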

3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm

Step 0: Create the assignment vector (at this point the vector is

null as there is no variable assignment in the vector).

Set V* = -∞ and go to Step 4.

Step 1: Start at the right side of the assignment vector and move

to left, stopping at the first variable assigned a zero

value. If no variable in the assignment vector has a

zero assignment, go to Step 2. Otherwise go to Step 3.

Step 2: Calculate V for the assignment vector. If V is greater

than V*, record the values of the variables in the assign-

ment vector as the optimal solution X* to P1. Otherwise,

record (as the optimal solution X* to P1) the values of the

variables in the best current solution X. Terminate the

algorithm.
Step 3: Change the value of the variable isolated in Step 1 to an

assigned value of one, and eliminate from the assignment

vector all variable assignments to the right of this new

assignment. If the assignment vector includes the assign-

ment xi = 1 for every xi in X, return to Step 2. Otherwise go

to Step 4.

Step 4: Select a variable xk that is not an element of the assign-

ment vector. Assign this variable the value xk = 0 in the

assignment vector. Use Procedure 2 to check the feasibility

of this assignment. If the assignment vector is not fea-

sible, go to Step 6. Otherwise go to Step 5.

Step 5: If the assignment vector with the new assignment xk = 0 does

not include an assignment for every xi in X, return to

Step 4. Otherwise go to Step 7.

Step 6: If the assignment vector with the assignment xk = 1 (xk is the

variable selected in Step 4) does not include an assignment

for every xi in X, return to Step 4. Otherwise go to Step 7.

Step 7: Calculate V for the assignment vector. If V* is greater

than V, go to Step 1. Otherwise go to Step 8.

Step 8: Record as the best current solution X the values of the

variables in this assignment vector. Set V* = V, and return

to Step 1.

Note that in the course of applying this algorithm all solutions are

considered and the best current solution is replaced only when another

solution has a larger associated value. As the number of possible solutions

is finite, the algorithm must terminate, and at this termination the value

of the optimal solution and its assignments are known. An application of

the minimum-cost symptom-selection algorithm is presented in Appendix C.
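Steps 0 through 8 amount to a depth-first enumeration with backtracking. The sketch below illustrates that flow; the `feasible` argument stands in for Procedure 2, the three-feature oracle in the usage example is deliberately hypothetical, and nothing here is the dissertation's own implementation.

```python
def min_cost_features(costs, feasible):
    """Enumeration sketch of Steps 0-8. costs[i] is c_i; feasible(x)
    plays the role of Procedure 2, receiving a full 0/1 vector (variables
    not yet in the assignment vector default to one) and reporting whether
    the masked patterns remain linearly separable. Returns (V*, X*);
    since V is -1 times the cost of the features used, maximizing V
    minimizes total feature cost."""
    n = len(costs)

    def full(av):                       # unassigned variables default to 1
        x = [1] * n
        for i, v in av:
            x[i] = v
        return x

    best_v, best_x = float("-inf"), None          # Step 0: V* = -infinity
    av = []                                       # the assignment vector
    while True:
        while len(av) < n:                        # Steps 4-6: extend
            assigned = {i for i, _ in av}
            k = next(i for i in range(n) if i not in assigned)
            av.append((k, 0))                     # Step 4: try xk = 0
            if not feasible(full(av)):            # Procedure 2 check
                av[-1] = (k, 1)                   # Step 6: force xk = 1
        v = -sum(costs[i] * val for i, val in av) # Steps 7-8: value V
        if v > best_v:
            best_v, best_x = v, full(av)
        while av and av[-1][1] == 1:              # Step 1: rightmost zero
            av.pop()
        if not av:                                # Step 2: terminate
            return best_v, best_x
        av[-1] = (av[-1][0], 1)                   # Step 3: flip it to one

# Hypothetical 3-feature instance: separability holds whenever feature 0
# or feature 2 is retained; the cheapest such choice keeps feature 0 only.
V_star, X_star = min_cost_features([5, 2, 7],
                                   lambda x: x[0] == 1 or x[2] == 1)
```

Because every zero placed in the assignment vector was verified feasible with all deeper variables set to one, the one-assignments made in Steps 3 and 6 never require a fresh feasibility check, just as in the statement above.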

3.4.3 Computational Considerations

Returning to the setting of diagnostic classification of craniofacial-

pain patients, application of the minimum-cost symptom-selection algorithm
would require an enumeration (explicit or implicit) over 2^22 possible

solutions in order to find the optimal collection of data-vector elements.

As the number of possible solutions is prohibitively large, heuristic

modifications to the symptom-selection algorithm are required for this

application. One possible modification could employ the fact that only

a few of the elements in the patient data vector have large associated

'costs' for their utilization. In particular, the eight elements of

radiographic data and the two measures of emotional trauma are significant-

ly more 'costly' to examine than the other items in the data vector.

With this modification, the algorithm would only consider eliminating

these ten high-cost features. Another heuristic approximation to the

optimal collection of features might rank the data-vector elements in

order of descending cost of utilization. Procedure 2 would then be used

to eliminate these components one by one, starting with the item of high-

est cost, until the procedure signaled an infeasible solution to P1. Cer-

tainly, other heuristics might also be developed to exploit the structure

of this algorithm.

3.5 Model Applications

The structure of the craniofacial-pain diagnostic-classification

model permits model utilization for a variety of purposes. Since the

model is developed in terms of general data-vector and diagnostic-alterna-

tive parameters, these model components can be altered to suit the appli-

cation in question. This section presents a brief discussion of some of

the possible applications of the diagnostic classifier.

In a teaching environment, the diagnostic-classification model with

its set of discriminant weights can be stored for computer-terminal ac-

cess. Then, on a set of tutorial example patients, students can compare

their diagnoses with those of the diagnostic model. Moreover, the student

can interact with the classifier in constructing his own 'sample' patients

for the classifier to diagnose. Finally, the student can request the

classifier to relate the discriminant-function weights that the model

employs in assessing the 'significance' (Section 3.2) of any one symptom

or group of symptoms.

The effectiveness of new diagnostic tests can be evaluated using the

minimum-cost symptom-selection algorithm. This algorithm provides an

immediate measure of the 'worth' of new research developments. Given a

cost for employing a new test, the algorithm returns an evaluation of

the test's classifying capability. The algorithm reveals whether the

test is included in the minimum-cost collection of features and whether

the use of the new test permits the practitioner to discontinue other

examination procedures. Additionally, the algorithm can be employed to

point out new areas for research, as it isolates diagnostic alternatives

where correct classification of patients is difficult using existing tests

and procedures.

As employed in the practitioner's office, the diagnostic classifier

will provide a direct link between the practicing dentist and the know-

ledge of experts in the field of craniofacial pain. Information will

flow over the link in both directions. As new patients are seen by the

practitioner, the record of each visit will be reviewed by experts and

then used to supplement the data base employed in model construction.

Then, when developments dictate, new sets of discriminant-function weights

can be transmitted to the dental practitioners. This kind of interaction

results in a more accurate and representative diagnostic classifier as

the patient-sample data base becomes larger.



The selection of treatment regimens for craniofacial-pain patients

is modeled as a Markovian decision process. The states in this Marko-

vian model are descriptions of a patient's health-care status and the

decision alternatives are feasible treatments for the patient's dys-

function (see Section 4.1). In the first two sections of this chapter,

motivation for the model structure is provided and the components of

the decision model are developed. The third section provides a descrip-

tion of the validating procedures used to determine the appropriateness

of the model and the model-generated treatment decisions. This chapter

closes with a discussion of potential teaching, research, and private

practice applications of the treatment-planning model.

4.1 Model Components

Several model-building components from the craniofacial-pain care

system are isolated to permit the construction of a Markovian represen-

tation of this system. A set of state descriptions that characterize,

for decision-making purposes, the status of craniofacial-pain patients

is presented in Section 4.1.1. Then transition probabilities measuring

the effects of treatment applications are discussed in Section 4.1.2.

Section 4.1.3 overlays the model's state descriptions and transition

probabilities with costs accrued during the patient's progression through

the care system. These components are integrated and verified in the

discussions of Sections 4.2 and 4.3.

Values for many of the treatment-planning model's parameters were

gathered from the set of patient records discussed in Section 3.1. As

the patient histories from the contributing university dental clinics

were reviewed, notations of treatment applications and time between suc-

cessive visits were made for each patient-practitioner interaction. The

values of the remaining model parameters were either estimated by the

reviewing practitioners, Dr. Fast and Dr. Mahan, or were gathered from

responses to questionnaires completed by patients who visited the

University of Florida's Dental Clinic. In modeling the complicated pro-

cess of care for craniofacial-pain patients, several simplifying assump-

tions were made. This section provides the motivation for these assump-

tions and presents the notation employed in the analytic description of

the treatment-planning process.

4.1.1 Patient States

In general, a Markovian system structure requires that the current

state of the system completely characterizes the probabilities associated

with future state occupancies of the system. To fully satisfy this

Markovian condition for state structure in the craniofacial-pain treat-

ment-planning model would require that the model include as distinct mod-

el states every possible combination of diagnostic classifications a pa-

tient might have occupied, in conjunction with every combination of treat-

ment applications he might have undergone, during his stay in the care

system. Unfortunately, such a model would have an infinite number of

'patient states.'

However, for a majority of craniofacial-pain patients the know-

ledge of a patient's prior treatment record, coupled with his current

diagnostic classification, is adequate to determine his prior diagnostic

classifications. Even in the cases where the current classification

and prior treatment record do not provide a total description of a pa-

tient's condition, these elements of patient status do provide signifi-

cant information about the probabilities associated with a patient's

future status in the care system. For example, in the data employed in

model construction, 47 craniofacial-pain patients occupied Diagnostic

Alternative 15 and were treated with an application of drugs at least

once. Eight of these patients were 'well' after a first treatment with

drugs, while 39 required multiple applications of drugs or other treat-

ments during their stay in the system. Yet of the 12 patients who were

given two applications of drugs, 9 were 'well' following the second

repetition of drug therapy. Thus, while the overall data-based transi-

tion-probability estimate for a transition from Diagnostic Alternative

15 into the well state following any one application of drugs is .36,

the transition-probability estimate for a transition into the well state

following two successive applications of drugs is .75. Hence, for this

diagnostic classification, information on the prior application of drugs

is important in determining a patient's future status in the care system.
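Data-based estimates of this kind are simple ratios of counts: the estimate for each (state, treatment) to next-state transition is its observed count divided by the number of times the treatment was applied in that state. The sketch below uses invented visit records; the counts and state labels are hypothetical, not the dissertation's data.

```python
from collections import Counter

def transition_estimates(visits):
    """Empirical transition-probability estimates from visit records of
    the form (patient state, treatment, next state)."""
    applied, moved = Counter(), Counter()
    for state, treatment, nxt in visits:
        applied[(state, treatment)] += 1
        moved[(state, treatment, nxt)] += 1
    return {key: moved[key] / applied[key[:2]] for key in moved}

# Hypothetical records: 3 of 8 first drug applications lead to 'well';
# both second applications lead to 'well'.
records = ([("15|", "drugs", "well")] * 3
           + [("15|", "drugs", "15|drugs")] * 5
           + [("15|drugs", "drugs", "well")] * 2)
p = transition_estimates(records)
# p[("15|", "drugs", "well")] == 0.375
# p[("15|drugs", "drugs", "well")] == 1.0
```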

This form of 'current diagnostic classification augmented by treat-

ment record' patient-state description is employed in the craniofacial-

pain treatment-planning model as an approximation to a 'true' Markovian

state structure. Each of the diagnostic alternatives shown in Figure 3

forms the basis for a collection of patient states. The diagnostic al-

ternative is augmented with a record of treatments that have been applied

since the patient entered the care system. Appendix D provides a list

of the treatment alternatives that may be prescribed for craniofacial-

pain patients. The record of each treatment given to the patient is noted

in the patient-state descriptions without regard to its chronological

order. For example, a patient's occupation of the state 'J|1,2,2'

denotes that he is currently classified in diagnostic alternative J,

and that since he entered the care system he has been treated with one

application of treatment 1 and two applications of treatment 2.
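Because the treatment record is order-free, a multiset of treatment codes captures a patient state exactly; a small sketch (illustrative only; no such routine appears in the dissertation):

```python
from collections import Counter

def patient_state(diagnosis, treatments):
    """Render the 'diagnosis | treatment record' state description.
    The record ignores chronological order, so a multiset (Counter) of
    treatment codes captures it; sorting gives one canonical label."""
    history = Counter(treatments)
    record = ",".join(str(t) for t in sorted(history.elements()))
    return "%s|%s" % (diagnosis, record)

# One application of treatment 1 and two of treatment 2, applied in any
# chronological order, yield the same state:
state = patient_state("J", [2, 1, 2])   # 'J|1,2,2'
```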

Augmenting the patient-state descriptions with treatment history

expands the dimensionality of the state space, yet the number of history-

augmented states remains finite for two reasons. The treatment records

used in model construction reveal that, for some combinations of diag-

nostic alternatives and treatment applications, there is a feasible

limit to the number of treatment repetitions that can be given to any

one patient. Thus, the first reason for a finite state space is that no

patient state in the treatment-planning model includes more repetitions

of a particular treatment than the clinical data have established as a

feasible limit. As an example, the records of patient visits used in

model construction establish a feasible limit of only one application

of treatment 18 for patients classified in any of the diagnostic alter-

natives. Therefore, the treatment-planning model includes patient states

that exclude treatment 18 as a portion of their treatment history or

exhibit the form

'J|...,18,...'

for each diagnostic classification 'J' where 18 is a feasible treatment.

The second reason for a finite state space is that there is a 'boundary

application' of many treatments such that neither the treatment-record

data nor the reviewing practitioners established differences between the

transition probabilities for the boundary application and those for

further repetitions of the treatments (see Section 4.1.2 and Appendix E).

In Diagnostic Alternative 13, for example, the first application of treat-

ment 24 is the boundary repetition of that treatment. Hence, multiple

repetitions of treatment 24 are not added to the state description of

patient states based on Diagnostic Alternative 13, as the additional

information on multiple applications does not influence transition pro-

babilities associated with this treatment's effectiveness. Thus, a

second application of treatment 24 for a patient who continues to be

classified in Diagnostic Alternative 13 places the patient in a state

of the form

'13|...,24,...'.

The craniofacial-pain treatment-planning model includes two terminal

patient states in addition to the patient states that are based on diag-

nostic alternatives. One or the other of these two terminal states,

'well' or 'referred,' represents the patient's status when he exits the

care system. A patient exits the system in the 'well' state when the

effects of treatment applications result in sufficient improvement so

that no further treatment is required. The patient moves into the 're-

ferred' state in lieu of further treatment. This alternative to treat-

ment is selected when the 'expected costs' of remaining in the care sys-

tem exceed the costs of referring the patient to another source of care

(see Section 4.1.3).

4.1.2 Transition Probabilities

Patient-state transitions that involve a change of diagnostic classification follow one of two basic formats (see Figure 5). For the initial

diagnostic classifications in Format I, with each treatment application,

the patient either remains in his original diagnostic classification or

he transits into the well state. For Format II, the six diagnostic alternatives shown in the lower illustration form a different structure.

Format I

Patients whose first-visit diagnostic classification is Diagnostic

Alternative 1, 2, 3, 4, 5, 6, 10, 11, 14, 16, or 17, make transitions out

of their original classification 'I' according to the following figure:

Format II

For patients originally classified in Diagnostic Alternative 7, 8, 9, 12,

13, or 15, the following kinds of diagnostic-classification transitions

are possible:



Here it is possible for the patient to alternate between any one of

several diagnostic classifications during the course of his stay in the

care system. Note that in both formats for diagnostic-classification

transitions a patient moves into the referred state not as a result of

a treatment application, but rather as an alternative to further treatment.


To these underlying diagnostic-classification transitions the cranio-

facial-pain treatment-planning model adds a record of the changes in

treatment history. Appendix F displays complete charts of all of the

diagnostic-alternative-based patient states included in the treatment-

selection model. In these charts the patient states are connected by

arcs that represent feasible transitions from one state to another. Not

shown in the charts are the well and referred patient states and the arcs

that connect every diagnostic-alternative-based state with these terminal states.


Howard [25] establishes that in terms of the policy decisions gen-

erated by a Markovian decision model, holding-time distributions are im-

portant only insofar as they affect the mean waiting time in each system state and the expected costs of each state occupancy. The records

of the patient visits employed in model construction revealed that, in

the care of the patients described by the data, one or more treatments

were prescribed at each visit, and a series of return visits was scheduled

for the patient following his initial interaction with the practitioner

if return visits were warranted. Under these conditions, specifying

holding-time distributions for the time between successive patient-state

transitions does not refine the model. Therefore, the treatment-planning

model employs a Markovian rather than semi-Markovian representation of

the care system, since an 'n'-visit holding time in a particular patient

state can be modeled with no loss of information as 'n' repetitions of

the 'virtual' transition from the state in question to itself. Care for

craniofacial-pain patients is modeled as a discrete-stage Markovian system with the beginning of visits to the practitioner serving as stage points.
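The equivalence invoked here can be illustrated with a short simulation (all numbers illustrative, not drawn from the clinical data): a 'virtual' self-transition taken with probability q makes the number of visits spent in a state geometrically distributed, with mean 1/(1-q), so no separate holding-time distribution is required.

```python
# Illustrative sketch: a 'virtual' transition from a state to itself,
# taken with probability q at each visit, reproduces a geometric
# holding time without an explicit holding-time distribution.
import random

q = 0.4  # hypothetical probability of the virtual self-transition
random.seed(1)

def visits_in_state(q):
    """Count the visits a patient spends in one state before leaving."""
    n = 1
    while random.random() < q:
        n += 1
    return n

samples = [visits_in_state(q) for _ in range(100_000)]
mean_visits = sum(samples) / len(samples)
print(round(mean_visits, 2))     # close to the theoretical mean 1/(1-q)
print(round(1 / (1 - q), 2))     # 1.67
```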


Using the history-augmented patient states, transition probabilities

are specified in terms of the treatment that generated the transformation.

In making a state-transition following a treatment, a patient must move

to a state that includes that treatment as a portion of its state descrip-

tion. For example, following application of treatment 'k,' a patient

must progress from patient-state 'I|m,n' to 'J|k,m,n' where 'I' may be equivalent to 'J.' The only exception to this rule is in the application of a treatment beyond its boundary number of repetitions. Here, if treatment 'k' has a boundary number of two, then following an application of treatment 'k' three or more times a patient progresses from patient state 'I|k,k,m,n' to 'J|k,k,m,n' where again 'I' may be equivalent to 'J.'

This structure is indicated because inclusion of more than the boundary

number of applications (two in this case) in the state description does

not affect the transition probabilities.

Estimates of the values of the transition probabilities were ob-

tained from the patient records discussed previously. A discussion of

the stability of these probability estimates under variations in patient

data is presented in Appendix E. Where the data on the effects of treat-

ment alternatives were limited, the data-generated probability estimates

were refined by estimates from the reviewing practitioners. Notationally,

transition probabilities are represented in the analytic model in the

following form:

p^k_IJ = the probability of making a transition from patient-state 'I' to patient-state 'J' following the application of treatment-alternative 'k.'
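As a minimal illustration of this notation (states, treatments, and probabilities below are hypothetical, with each state written as a diagnostic classification, a separator, and the treatment history), the probabilities can be held in a table keyed by the current state and the treatment applied:

```python
# Hypothetical sketch of p^k_IJ: for each (current state I, treatment k),
# a probability distribution over the successor states J. Every successor
# of a treatment-k transition carries k in its history (or is terminal),
# mirroring the transition rule stated in the text.
p = {
    ("13|", 24):   {"13|24": 0.60, "W": 0.40},  # first (boundary) application of 24
    ("13|24", 24): {"13|24": 0.75, "W": 0.25},  # later applications: history unchanged
}

for (state, k), row in p.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9  # each row is a distribution

print(p[("13|", 24)]["W"])  # probability of transiting to the well state
```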

4.1.3 Cost Structure

A patient's progression through the craniofacial-pain system generates a multitude of implicit and explicit costs. The explicit costs can

be measured in terms of the dollar charges paid by the patient or the

practitioner during the patient's stay in the system. Other costs are

implicit in nature and can be quantified only as they relate to the

'opportunities' lost by the patient and the practitioner while the patient remains in the care system. For modeling purposes four major

system costs have been isolated. These costs are:

(a) Cost of treatment applications

(b) Cost of the practitioner and his staff's services

(c) Cost to the patient of occupying a non-well patient state

(d) Patient-referral cost.

Although these costs do not encompass all of the system costs, they mea-

sure significant explicit and implicit charges associated with a patient's

stay in this system. In the treatment-planning model, each of these costs

is charged on a per-patient-visit basis.

Costs of the various treatment applications and the costs associated

with the practitioner and his staff's services were estimated by the re-

viewing practitioners. Estimates of treatment and care-system service

costs were partitioned by diagnostic classification as well as treatment

category. The cost estimates reflect typical charges in a dental clinic.


The inconvenience experienced by a patient in making a visit to the

practitioner was used as a measure of the cost of occupying a 'non-well'

patient state. Estimates of this inconvenience cost were gathered from

responses to a questionnaire completed by patients at the University of

Florida's Dental Clinic. These were general dental patients not neces-

sarily suffering from craniofacial pain. Figure 6 shows the distribution

of these patient estimates.

Values for patient-referral costs were composed of the sum of three

distinct estimates. The first component was an estimate of the total

fee charged by the practitioner receiving the referred craniofacial-pain

patient. Record transferral and duplication costs, as well as the fees

lost by the referring practitioner, formed the second component. The

third component of the patient-referral cost is a measure of the incon-

venience experienced by the referred patient, a value estimated by using

a multiple of the value of the inconvenience cost discussed in the pre-

ceding paragraph. Appendix G provides a justification for using this

particular combination of components in the referred-cost estimates.

Symbolically, the patient-state transition costs (negative constants)

are represented in the analytical model as
c^k_IJ = the sum of the costs generated by the transition

from patient-state 'I' to patient-state 'J'

following the application of treatment 'k.'

This sum includes the type (a), (b), (c), and (d) costs appropriate to

each patient-state transition.
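A minimal sketch of how the four cost categories combine into a single per-visit transition cost; all dollar figures here are hypothetical except the $30.72 mean inconvenience estimate reported below for the clinic questionnaire:

```python
# Hypothetical sketch: c^k_IJ sums the per-visit costs (a)-(d) that apply
# to a given patient-state transition, recorded as a negative constant.
def transition_cost(treatment, staff, inconvenience, referral=0.0):
    """Return the (negative) sum of cost categories (a)-(d)."""
    return -(treatment + staff + inconvenience + referral)

# A treatment visit that leaves the patient in a non-well state:
c_treat = transition_cost(treatment=25.0, staff=15.0, inconvenience=30.72)
# A referral visit, where the type (d) cost applies instead of a treatment charge:
c_refer = transition_cost(treatment=0.0, staff=15.0, inconvenience=30.72,
                          referral=120.0)

print(c_treat)   # treatment-visit transition cost
print(c_refer)   # referral transition cost
```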

Fifty-eight patients at the University of Florida's Dental Clinic responded

to the following questions:

How much would you estimate that this trip to the
Dental Clinic cost you in terms of lost wages, baby-
sitting fees, transportation costs, and other costs
that you may have had to pay so that you could
be here for your appointment?

The distribution of these estimates is shown in the accompanying histogram, with the estimates grouped into intervals of $0-.99, $1-9, $10-19, $20-29, $30-39, $40-49, $50-59, $60-69, $70-79, and $80-300.

The mean value for these 58 estimates of patient-visit inconvenience costs

was $30.72.



4.2 Selection of Optimal Treatments

The craniofacial-pain treatment-planning model is transient in the

sense that only two of the model's patient states, well and referred, can

represent the patient's status when he exits the health-care system. In

a stochastic sense, only the terminal states are recurrent as they alone

possess non-zero long-run probabilities of state occupancy. Hence, the

choice of treatment alternatives at each patient state is made with the

goal of minimizing the costs accrued by the patient as he passes through

the diagnostic-alternative-based patient states into one of the recurrent states.


For notational convenience, in the analytic model the well patient

state is denoted as state 'W' and the referred state as state 'R.' In

modeling the care system for craniofacial-pain patients there is no

justification for providing costs for the transitions from states 'R'

and 'W' to themselves; hence, c_R,R and c_W,W are set equal to zero.

Analytically, the treatment-planning model is made monodesmic, i.e., having only one recurrent state, by defining p_R,W = 1 and p_W,R = 0. The

total number of states, not including states 'W' and 'R,' is denoted by

'S.' With these definitions and the notation introduced in the previous

section, a procedure for selecting the set of optimal treatment decisions

is developed.

Howard [25] has shown that for a monodesmic, transient Markovian

decision model, a set of optimal decisions is defined as those decisions

that maximize the expected value v_I of occupying each system-state 'I.'

Since the treatment-planning model for craniofacial-pain patients fits

into this category of decision model, a modification of Howard's algorithm

is employed to select optimal treatment regimes. The process of select-

ing an optimal set of treatments is accomplished by finding the set of
treatment alternatives k_1, k_2, ..., k_S that maximize each of the v_I (the expected value of occupying patient-state 'I' given treatment alternative 'k_I') where

v_I^(k_I) = r_I^(k_I) + Σ (over all patient states J) p_IJ^(k_I) v_J,   I = 1,2,...,S

and

r_I^(k_I) = Σ (over all patient states J) p_IJ^(k_I) c_IJ^(k_I).
With treatment-augmented patient states, maximizing the v_I can be

carried out in the following manner:

1. Group for simultaneous analysis all patient states possessing

a common treatment history, where one or more of the treatments in this

history are at their boundary level. Each of the 'T' sets of states

complying with this description forms an analysis set B_j, j=1,2,...,T.

2. Label sequentially the patient states, starting with state W

as 1, state R as 2, and then selecting numbers for the remaining unlabeled

patient states on the basis that the one with the most treatments in its

history receives the next number-label. For example, state 'J|1,2,2,4' would be labeled with a smaller number than state 'J|2,6,6.' When the

numbering scheme reaches the members of one of the analysis sets isolated

in Step 1 (above), numbers for the members of that set may be arbitrarily

assigned. Given this state numbering scheme, the selection of optimal

treatments can proceed dynamically since for each state I that is not a

member of an analysis set, I = 1,2,...,S, I ∉ B_j, j = 1,2,...,T,

v_I^(k_I) = r_I^(k_I) + Σ (over J = 1,2,...,I-1) p_IJ^(k_I) v_J

and for the states of set B_j, j = 1,2,...,T,

v_I = r_I + Σ (over J ∈ B_j) p_IJ v_J + Σ (over J = 1,2,...,t) p_IJ v_J,   I ∈ B_j

where t = the number of the last non-B_j state immediately preceding the smallest number-labeled state in B_j.

Thus, the process of selecting optimal treatments proceeds recur-

sively from the state of smallest number-label to the one of largest

number-label, stopping to consider simultaneously the values of a number

of states only when an analysis set is encountered.
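The labeling-and-recursion procedure above can be sketched as follows. All states, treatments, rewards, and probabilities are hypothetical; states 1 and 2 stand for 'W' and 'R,' referral is represented as a pseudo-treatment (-1), members of an analysis set are assumed to carry consecutive labels, and a plain fixed-point iteration stands in for Howard's value-iteration and policy-improvement step:

```python
# Sketch of the recursive treatment selection: outside the analysis sets,
# every transition leads to a lower-numbered state, so v_I depends only on
# values already computed; an analysis set is solved simultaneously.
def optimal_values(S, actions, analysis_sets):
    v = {1: 0.0, 2: 0.0}          # terminal states 'W' and 'R'
    policy = {}
    set_of = {}                   # state -> its analysis set, if any
    for B in analysis_sets:
        for i in B:
            set_of[i] = tuple(sorted(B))

    def backup(I):                # best (treatment, value) pair at state I
        return max(((k, r + sum(pr * v[J] for J, pr in probs.items()))
                    for k, (r, probs) in actions[I].items()),
                   key=lambda kv: kv[1])

    I = 3
    while I <= S:
        if I not in set_of:       # depends only on lower-labeled states
            policy[I], v[I] = backup(I)
            I += 1
        else:                     # solve the whole analysis set together
            B = set_of[I]
            for i in B:
                v[i] = 0.0
            for _ in range(500):  # fixed-point iteration (sketch)
                for i in B:
                    policy[i], v[i] = backup(i)
            I = max(B) + 1
    return v, policy

# actions[state][treatment] = (expected immediate reward, successor probabilities)
actions = {
    3: {24: (-70.0, {3: 0.6, 1: 0.4}),   # repeat treatment 24 at its boundary level
        -1: (-166.0, {2: 1.0})},         # refer the patient instead
    4: {18: (-50.0, {3: 0.5, 1: 0.5})},
}
v, policy = optimal_values(4, actions, analysis_sets=[[3]])
print(v[3], policy[3])   # referral is chosen once repeating costs more
print(v[4])
```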

Howard's value iteration and policy improvement algorithm [25] is

employed only in the case of selecting treatments for the analysis-set

patient states. An example of this section's labeling and optimization

procedure is presented in Appendix H.

This optimization procedure was applied to the states of the cranio-

facial-pain treatment-planning model. Appendix G presents a list of the

optimal treatment selections for each of the model's patient states.

4.3 Model Validation

Validation of the craniofacial-pain treatment-planning model was

accomplished in two phases. In the first phase of validation, the indi-

vidual components of the Markovian representation were examined by the
reviewing practitioners. The second phase of model validation compared

model-generated treatment decisions with those made by the reviewing ex-

perts. In addition, statistics generated by the model were compared to

the care-system description provided by the patient records from the

university dental clinics. This section discusses the results of these

validating efforts.

The review of model components was accomplished as values for the

model parameters were collected. Some of the data-based estimates of

transition probabilities and boundary-level application numbers did not

conform to expert judgment about the effects and effectiveness of vari-

ous treatment applications. When these disparities occurred, the esti-

mates were modified to reflect expert judgment.

The general structure of the patient states was reviewed to insure

that the representation shown in Appendix F did in fact portray a set of

logical progressions through the care system. Although this examination

established the validity of the patient progressions, the review did

point out one deficiency in the model's structure. The number and types

of treatment alternatives available for use at each patient state were

determined by records of actual applications of these treatments in the

data used for model construction. It was the judgment of the reviewing

practitioners that in several cases the selection of treatment alterna-

tives for a patient state did not include the 'most appropriate' treat-

ment alternative. Nevertheless, model deficiency can readily be correct-

ed. With the collection of data on the effects of these 'most appropriate'

treatments, these additional treatment alternatives can be incorporated

as decision alternatives for the patient states in question.

The reviewing practitioners made selections of treatments for each

of the model's patient states. In those cases where the model's treat-

ment alternatives did not include the practitioners' 'most appropriate'

choice of treatments, the practitioners made a selection from the same

list of alternatives used by the model. Appendix G lists their choices

of treatment along with each model-generated selection. The two sets of

treatment plans include the same treatment selection for 87 out of 94

patient states, or 92.6% of the patient states. The 7 differences in

treatment selections arise in part from the approximations the treatment-

planning model employs in its representation of the care system and in

part from slight inconsistencies in the practitioners' treatment selections.

One last test was performed to verify the suitability of the Mark-

ovian representation of the craniofacial-pain care system. Mean transit

times through the care system to one of the terminal states were calculated using the model-generated treatment decisions for each of six

first-visit patient states. These model-generated transit times were

compared to estimates of the same statistics gathered from the patient

records contributed by the university dental clinics. Table 5 presents

the values of both sets of statistics. The close correlation of these

values reveals that the treatment-planning model not only duplicates the

decisions of experts, but also provides a structure for gathering other

relevant information about the underlying care system.
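The mean transit times of the kind reported in Table 5 can be reproduced from the transition probabilities under a fixed treatment policy: the mean number of visits m_I before absorption satisfies m_I = 1 + Σ_J p_IJ m_J, the sum taken over transient states. A sketch with hypothetical probabilities:

```python
# Hypothetical sketch: expected visits before entering 'W' or 'R',
# from m_I = 1 + sum over transient J of p_IJ * m_J.
P = {                                    # rows for the transient states only
    "A": {"A": 0.25, "W": 0.75},
    "B": {"A": 0.30, "B": 0.20, "W": 0.50},
}
m = {s: 0.0 for s in P}
for _ in range(200):                     # fixed-point iteration
    m = {s: 1.0 + sum(pr * m.get(J, 0.0) for J, pr in row.items())
         for s, row in P.items()}
print(round(m["A"], 2))   # 1.33, i.e. 1/(1 - 0.25)
print(round(m["B"], 2))   # 1.75
```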

4.4 Model Applications

Like the diagnostic-classification model presented in Chapter 3, the

craniofacial-pain treatment-planning model has been structured to permit

its utilization in a variety of applications. Markovian modeling provides

an analytic representation of the craniofacial-pain care system as well

as establishing a means of making treatment selections. This section dis-

cusses applications of the model's analytic representation and treatment

selections in teaching, in research, and in practice.

The model-generated treatment decisions reveal which treatments are

most frequently used in the care of craniofacial-pain patients. In a

teaching environment, this information can be used to specify treatment-




Table 5. Mean transit times through the care system.

For a Patient Whose First          Model-       Truncated     Patient-
Diagnostic Classification Was      Generated    Model         Record
                                   Estimate*    Estimate+     EstimateV

Myopathy-Myositis 1.50 1.34 1.35

Oral Pathology-Dental Pathology 1.11 1.04 1.08

Vascular Changes-
Migrainous Vascular Changes 3.89 3.42 3.06

Myofacial Pain Dysfunction-
Uneven Centric Stops 1.86 1.43 1.50

Myofacial Pain Dysfunction-
Anxiety/Depression 3.87 3.47 3.18

Myofacial Pain Dysfunction-
Reflex Protective Muscular
Contracture 1.90 1.79 1.87

* The values in these sets of estimates are specified in terms of the

number of patient visits in which the patient occupies a non-well or

non-referred patient state.

Note: The treatment-planning model considers the possibility of

'infinite duration' occupancy of non-well or non-referred patient states.


+ These truncated estimates were generated from the treatment-

planning model on the conditional basis that a patient must

transit into either the well or the referred state by his

fifth patient visit.

V The maximum number of visits for any patient described by

the clinical data was five patient visits.

application techniques that should be emphasized in training dental stu-

dents in craniofacial-pain care. Moreover, the parameters employed in

model development, in particular the transition probabilities and refer-

ral costs, are themselves valuable instructional materials in developing

the dental student's treatment-selection skills.

The treatment-planning model provides a method for evaluating new

developments in treatment for craniofacial-pain patients. With estimates

of the effectiveness of his new treatment, the researcher can use the

craniofacial-pain treatment-planning model to get two immediate responses.

First, the optimization technique of Section 4.2 will determine if this

new treatment provides 'better care' for the patient than any of the

other treatment alternatives the model has to choose from. Second, if

optimal treatment selections for the model include the new treatment, the

model's statistics will show improvement in length of stay, and other

relevant measures of treatment effectiveness, introduced by using this

new treatment.

In the office of the practicing dentist, the treatment-planning mod-

el's decisions could provide a concise reference of the treatment selec-

tions suggested by experts in the field of craniofacial pain. Moreover,

the practitioner would have a chance to contribute to the refinement of

the listing as the treatment records of his patients could supplement

the data used in model construction. In addition, the practitioner could

employ the statistics associated with the treatment-planning model in

scheduling the length, and number, of his appointments for craniofacial-

pain patients.





Conclusions and Future Research

This dissertation has presented analytic models of the decision processes associated with diagnosing and selecting treatments for a particular health-care problem. The selection, construction, and testing of these models have been discussed in some detail. Meanwhile, the model

building effort itself has been the source of a number of insights into

decision-making in a health-care environment. These insights will be reflected in this chapter's discussion of the dissertation's central research conclusion and suggestions of topics for future investigation.

The similarity between the decision-making processes employed by

the practitioner and the analytic structure of this dissertation's models

is quite revealing. In both diagnosis and treatment planning for cranio-

facial-pain patients it appears that the practitioner, like the analytic

models, makes 'first-order' decisions. The linearity of symptom signifi-

cance (a first-order polynomial of symptom weights), and the present-

patient-state dependency of transition probabilities measuring treatment

effectiveness (a first-order stochastic dependence) provide a means of

generating decisions that closely approximate the decisions made by dental

practitioners. This general conclusion on the applicability of first-

order decision techniques to craniofacial-pain diagnostic classification

and treatment planning characterizes the central development of this dissertation.


Given this summary statement, there are several logical extensions

to this dissertation's research that should be examined in future inves-

tigations. The following suggestions identify some of the more fruitful

areas for further research efforts. These suggestions are ordered in

the author's view of their significance.

1. This dissertation's research found that first-order decision-

making models are valid descriptions of the underlying thought processes

employed by the craniofacial-pain practitioner. It is possible that these

first-order descriptive decisions are 'suboptimal' and that higher-order

decision-making tools might yield prescriptive, or 'optimal,' diagnostic

classifications and treatment plans for craniofacial-pain patients. That

is, considering the interaction between significant symptoms and multiple-

state dependency for patient-state transitions may lead to optimal diagnostic and treatment-selection decisions. As the models themselves can

readily be increased in their decision-making 'order,' an investigation

into this possibility would be hampered only by the necessity of collect-

ing an elaborate data base. Nevertheless, such an investigation should

be undertaken in this, the most significant, of future research areas.

2. As this dissertation's analytic models can be applied directly

to any health-care problem where there is verification that practitioners

make first-order decisions, one potential avenue of future research would

be to isolate those health-care environments where these kinds of decisions

are made. However, a word of caution is interjected at this point. Math-

ematical modeling demands an underlying structure for the process being

modeled. Yet, in a process dealing with a product that is subject to

considerable variation, such as the care of a patient in a health-care

system, isolating an underlying process structure is difficult. Moreover,

the problem of finding process structure is compounded in the health-care

field by a lack of unifying and consistent nomenclature. In the health-

care field, scholarly literature and historical precedent can serve as

the justification for two or more contradicting sets of terminology for

the same anatomical structure or physiological process. Thus, in re-

searching the generality of first-order decision-making techniques, the

investigator must consider process variability and nomenclature inconsistency before he makes any statement about the applicability of this

dissertation's decision-making tools to other health-care environments.

3. A non-geometric discussion of the criteria for pattern space

separability was presented to provide a means of characterizing health-

care disorders for which diagnostic classification by a linear pattern

classifier might be feasible. Unfortunately, this dissertation's tech-

niques are heuristic and do not provide an exact reproduction of the

underlying mathematical specifications. Future research in this area

could lead to a precise statement of non-geometric criteria for linear

separability, and thus provide an indirect means for evaluating potential

applications of linear non-parametric classifiers.

4. This dissertation's minimum-cost symptom-selection algorithm

represents a clear departure from previous research in feature selection.

The algorithm's utilization of the convex-hull representation of pattern

space separability makes this development unique in the literature of

feature selection. However, the algorithm's method of checking the fea-

sibility of potential feature collections is extremely tedious. A more

efficient method to check feature-collection feasibility may be revealed

through future investigations in this area.



5. From a mathematical-programming point of view, the symptom-selection algorithm represents one of a limited number of techniques

capable of solving a problem with non-linear constraints. The algorithm

seeks an optimal assignment of components, where the feasibility of any

assignment is determined by the existence of a set of discriminating com-

ponent multipliers. In this more general context, the structure of the

algorithm may be applicable in a variety of problem areas not directly

related to the feature-selection problem. The possibility of employing

the algorithm in this general setting should be investigated.

6. In modeling the treatment-planning process for craniofacial-pain

patients the concept of boundary-level treatment applications was intro-

duced. Boundary numbers on the effects of repeated treatment applications

are likely to occur in data derived from the care of patients with a va-

riety of physiological disorders. Further investigations of this phenom-

enon may result in more effective methods of predicting which treatments

will have boundary-level application numbers, and more efficient statis-

tical techniques to determine values for these numbers.

7. The training algorithm developed in the construction of the

craniofacial-pain diagnostic classifier generates a feasible integer so-

lution to a large number of linear constraints. This algorithm is both

efficient and easily coded for computer applications. An investigation

of the uses of this algorithm in a mathematical-programming setting may

reveal applications in solution techniques for more general integer-programming problems.


8. Potential applications have been suggested for the diagnostic-

classification and treatment-planning models in teaching, in research,

and in practice. The models and their applications have been presented so



that they might readily be employed by some future investigator. Actual

applications of the models should yield significant contributions to

the effectiveness of the teacher, researcher, and practitioner.



Patient-Examination Checklist
(The two-column checklist did not reproduce cleanly; the legible items are listed below, with truncated entries left as they appear.)

Referral Through: 001 Medical GP; 003 Dental GP; Medical Specialist; Dental Specialist

Sex: 005 Male; 006 Female; Female, menopausal or post-menopausal

Age Group: 008 0-19; 009 20-39; 010 40-55; 011 56-up

Duration of Pain: Less than 3 weeks; From 3 to 6 weeks; More than 6 weeks

Character of Pain (items 016-027): Aching; Burning; Cutting; Discomfort; Dull; Pressure; Pricking; Sharp; Soreness; Stinging; Tenderness; Throbbing

Change in Character of Pain: 028 Constantly getting worse; 029 Got worse, then plateaued; 030 Got worse, plateaued, then better; 031 Getting better; 032 Intermittent periods without pain; 033 No change since beginning

List of Drugs Taken: Mild Analgesics (Aspirin, APC, etc.); Moderate Analgesics (non-narcotic); Strong Analgesics (Narcotics and Synthetic Narcotics); Anti-anxiety Agents (Mellaril, etc.); Anti-arthritic Agents (Steroids, etc.); Anti-depressives (Tofranil, etc.); Birth Control Pills; Hormone Preparations; Anti-inflammatory Agents; Muscle Relaxants (Valium); Muscle Relaxants (Meprobamate); Muscle Relaxants (Others); Sedatives (Barbiturates, etc.); Other Drugs

History of Trauma

Location of Swelling; Location of Tenderness; Location of Pain (each recorded on a numbered diagram of the head and face; item numbers 97-114 are partially legible)

Limited Jaw Opening

Joint Sounds: 243 Yes; Pain accompanying joint sound

Frequent headaches; Headache associated with joint pain

Changes in: 249 Taste; 250 Hearing; 251 Visual acuity; 252 Perception of light touch on face

Upper Respiratory Infection: 253 In conjunction with beginning of TMJ pain

Evidence of: 254 Ar[...]; 255 Eve[...]; 256 Neu[...]; 257 Otitis; 258 Salivary gland disease; 259 Sin[...]; 260 Str[...]; 261 Vascular disease

Facets: 262 1-3; 263 4-up

Lateral Slide Prematurities: 264 On working side; 265 On balancing side

Tooth Ache: 266 Yes

Biting Stress Tooth Mobility: 267 Yes

Recent Restorative or Dental Prosthesis

Jaw Deviates on Opening: 269

Impingement of Coronoid Process on Zygomatic Arch

Meniscus-Condyle Dyscoordination

Radiographic Examination: 273 Left; 274 Right; 275 [...]; Mandibular condyle apposition (such as spur formation); Mandibular condyle resorption (such as flattening of anterior-superior surface or irregular surface); fossa apposition; fossa resorption; articular eminence apposition; articular eminence resorption; evidence of fracture; clinical or radiographic evidence of pathoses

Emotional Trauma: 283 Anxiety; 284 Depression

Bruxism or Clenching: 285 Yes

Uneven Centric Stops: 286 Yes

History of Lengthy Dental Procedures: 287 Yes

History of General Anesthesia: 288 Yes

Tinnitus: 289 Yes

Extraction of Teeth: 290 Less than 6 weeks prior to TMJ pain; 291 Leaving a space that permits extrusion

Preauricular Pain: 292 Yes

Alteration of Inter-Occlusal or Inter-Arch Space: 293 Yes

Paresthesia: 294 Yes

Luxation or Subluxation: 295 Yes

In presenting the modified fixed-increment training algorithm the following notation is employed:

p = the number of classification categories

t = the number of training-sample row vectors

a_j^(k) = training-sample row vector number 'k' preclassified in category 'j', j=1,2,...,p, k=1,2,...,t, and k = i[mod t] where 'i' is the index of the training-algorithm iteration

W_j^(i) = the 'j'th column of weights (the constants in the 'j'th discriminant function) used in the 'i'th iteration of the training algorithm, j=1,2,...,p

α = non-negative constant specified by the analyst to adjust the size of the 'dead zone' [23] in discriminant-function values, i.e., α ≥ 0

β = positive constant specified by the analyst to adjust the scale of the weight vectors, i.e., β > 0.

Using this notation, let a_j^(k) be the 'i'th pattern examined by the algorithm; then

Case 1: if a_j^(k) W_j^(i) > a_j^(k) W_c^(i) + α for all c ≠ j,

let W_c^(i+1) = W_c^(i) for all c.

Case 2: if a_j^(k) W_j^(i) ≤ a_j^(k) W_z^(i) + α for a subset B of the p discriminants, z ∈ B,

let W_z^(i+1) = W_z^(i) - β[a_j^(k)], z ∈ B,

W_c^(i+1) = W_c^(i) for all c ∉ {B ∪ j},

and W_j^(i+1) = W_j^(i) + n_B β[a_j^(k)], where n_B = the number of discriminants in the subset B.

The algorithm is terminated when the values of the W_j, j=1,2,...,p, have not changed during a complete cycle of the t training patterns, i.e., when W_j^(θ) = W_j^(θ+1) = ... = W_j^(θ+t) for all j, where θ is the index of the last case-2 pattern examined by the algorithm.
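A runnable sketch of the procedure just described; the training patterns below are tiny hypothetical vectors (with a constant 1.0 component standing in for a threshold term), not the dissertation's symptom data, and the normal first application α = 0, β = 1 is used. A "no corrections in a full cycle" flag replaces the weight-equality test, which is equivalent:

```python
# Sketch of the modified fixed-increment training algorithm. Case 2
# subtracts beta*a from every offending discriminant in the subset B and
# adds n_B*beta*a to the correct one; termination occurs after a full
# cycle with no corrections.
def train(samples, p, alpha=0.0, beta=1.0, max_cycles=None):
    """samples: list of (category j, pattern a). Returns W[0..p-1],
    or None if no separation is found within the cycle limit."""
    n = len(samples[0][1])
    W = [[0.0] * n for _ in range(p)]
    if max_cycles is None:
        max_cycles = 3 * p            # the rough bound quoted in the text
    for _ in range(max_cycles):
        changed = False
        for j, a in samples:
            dot = [sum(w * x for w, x in zip(W[c], a)) for c in range(p)]
            B = [z for z in range(p) if z != j and dot[j] <= dot[z] + alpha]
            if B:                     # case 2: correct the weights
                changed = True
                for z in B:
                    W[z] = [w - beta * x for w, x in zip(W[z], a)]
                W[j] = [w + len(B) * beta * x for w, x in zip(W[j], a)]
            # case 1 (classified correctly with margin): no change
        if not changed:
            return W
    return None

samples = [(0, (1.0, 0.0, 1.0)),      # category 0 patterns (hypothetical)
           (0, (1.0, 1.0, 1.0)),
           (1, (0.0, 1.0, 1.0))]      # category 1 pattern
W = train(samples, p=2)
assert W is not None
for j, a in samples:                  # every pattern is now classified correctly
    scores = [sum(w * x for w, x in zip(W[c], a)) for c in range(2)]
    assert scores[j] == max(scores)
```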

This algorithm is guaranteed to terminate in a set of feasible W_j, j=1,2,...,p, if the training sample is linearly separable and α and β have been appropriately selected. If the training sample is linearly separable, the algorithm will converge for any fixed value of α ≥ 0, where β is selected appropriately large. Hence, the algorithm is normally applied to a training sample with α=0 and β=1. If the algorithm converges, these constants can be adjusted and the training algorithm reapplied.


The justification for specifying a non-zero α (α = size of the
dead zone) is that as α is increased, the accuracy of the classifier
increases when classifying data not used in developing the
discriminant-function weights. For example, with the craniofacial-pain
diagnostic classifier and the test samples discussed in Section 3.3,
the diagnostic model correctly classified approximately 5% more of the
test samples' data vectors when the model was trained with α = 30, β = 3
(versus an original training with α = 0, β = 1).

Proof that the algorithm converges if feasible weight vectors
W_j*, j=1,2,...,p, exist (that is, the sample space is linearly separable)
is developed in Nilsson [22]. Nilsson's proof can be directly applied
since for any set of feasible W_j*,

a_j^(k) · W_j* > a_j^(k) · W_z* + α

for all k=1,2,...,t and z=1,2,...,p, z ≠ j, while for any infeasible
W_j^(i), j=1,2,...,p,

a_j^(k) · W_j^(i) ≤ a_j^(k) · W_z^(i) + α

for some k and some z.

Typically, a training algorithm is applied to the members of a

training sample without prior knowledge of whether the sample pattern

space is linearly separable. The algorithm is allowed to process sample

patterns until it either converges on a set of discriminating hyperplanes

or it has run for a 'reasonable' amount of time without termination. Ex-

perience with medical data and the modified fixed-increment algorithm

has shown that if there is a set of discriminating hyperplanes, the

algorithm will find it in no more than 3 complete cycles for each of the

pattern classes. For example, if there are 5 pattern classes and the

pattern space can be linearly partitioned, the algorithm should terminate

in no more than 15 full cycles through the training data. This rough

measure of training time provides an index for establishing a limit on

computer processing time.

An application of the modified fixed-increment training algorithm

is presented in Figure 7.

Given the training sample of the form a = [a_1, a_2, 1], where

a_1 = [0,0,1]

a_2 = [1,0,1]

a_3 = [0,1,1]

(pattern a_k preclassified in category k), the training sample patterns
can be represented in 3-dimensional space by

[figure: the patterns a_1, a_2, and a_3 plotted in 3-dimensional space]

The modified fixed-increment algorithm with α = 0 and β = 1

proceeds as follows:


(* indicates correct sample classification; each row lists the pattern
examined, the weight vectors in effect when it is examined, and the
resulting discriminant-function values a·W_j)

  Pattern     W_1           W_2           W_3         a·W_1  a·W_2  a·W_3
  [0,0,1]   [ 0, 0, 0]   [ 0, 0, 0]   [ 0, 0, 0]       0      0      0
  [1,0,1]   [ 0, 0, 2]   [ 0, 0,-1]   [ 0, 0,-1]       2     -1     -1
  [0,1,1]   [-1, 0, 1]   [ 2, 0, 1]   [-1, 0,-2]       1      1     -2
  [0,0,1]   [-1,-1, 0]   [ 2,-1, 0]   [-1, 2, 0]       0      0      0
  [1,0,1]   [-1,-1, 2]   [ 2,-1,-1]   [-1, 2,-1]       1      1     -2
 *[0,1,1]   [-2,-1, 1]   [ 3,-1, 0]   [-1, 2,-1]       0     -1      1
 *[0,0,1]   [-2,-1, 1]   [ 3,-1, 0]   [-1, 2,-1]       1      0     -1
 *[1,0,1]   [-2,-1, 1]   [ 3,-1, 0]   [-1, 2,-1]      -1      3     -2

Hence, the set of weights generated by this training sample is

W_1 = [-2,-1, 1]

W_2 = [ 3,-1, 0]

W_3 = [-1, 2,-1].
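As a quick check, the final weights do assign each training pattern to its own category. The short Python snippet below is illustrative only; the variable names and the argmax-style decision rule spelled out here are my rendering of selecting the largest discriminant-function value.

```python
W = [[-2, -1, 1], [3, -1, 0], [-1, 2, -1]]    # final W_1, W_2, W_3
patterns = [[0, 0, 1], [1, 0, 1], [0, 1, 1]]  # a_1, a_2, a_3 (categories 1, 2, 3)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# For each pattern, the winning discriminant is the one with the largest a . W_j
winners = [max(range(3), key=lambda j, a=a: dot(a, W[j])) for a in patterns]
print(winners)  # -> [0, 1, 2]: pattern k falls in category k (0-indexed)
```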







