ANALYTICAL MODELS FOR DIAGNOSTIC CLASSIFICATION AND
TREATMENT PLANNING FOR CRANIOFACIAL PAIN
By
Michael Steven Leonard
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA IN PARTIAL
FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1973
To my wife,
Mary
ACKNOWLEDGEMENTS
Without the considerable contributions of time and effort by the
members of his committee, it would have been impossible for the author
to have completed this dissertation. In particular, the author expresses
gratitude to his Chairman, Dr. Kerry Kilpatrick, for his encouragement
and direction during the course of this research effort. The author also
thanks Dr. Kilpatrick for his editorial assistance during the development
and organization of this manuscript. The author thanks Dr. Richard
Mackenzie and Dr. Stephen Roberts for providing the initial direction for
this research. Additionally, the author is grateful to Dr. Thom Hodgson
and Dr. Donald Ratliff for their assistance in evaluating and refining
the author's ideas throughout this project. The author expresses his
gratitude to Dr. Thomas Fast and Dr. Parker Mahan for the contribution of
their extensive knowledge about craniofacial pain to the author's research.
The author is deeply appreciative of Dr. Fast's and Dr. Mahan's willingness
to spend many hours examining dental records and their endurance of
the nomenclature and idiosyncrasies of this mathematical-modeling effort.
Financial support for this research was provided by the Health Systems
Research Division, J. Hillis Miller Health Center. The division's support,
in conjunction with a traineeship granted by the National Science
Foundation made it possible for the author to undertake this research.
The author is also grateful to the Industrial and Systems Engineering
Department for the contribution of computer funds. Additionally, the author
thanks Dr. William Solberg, University of California at Los Angeles;
Dr. Daniel Laskin, University of Illinois; and Dr. David Mitchell,
University of Indiana, for providing access to the patient records
employed in this modeling effort.
The author would like to express his thanks to the secretarial staff
of the Health Systems Research Division for their translation of the
author's 'first-order' approximation to handwriting into a draft of this
manuscript. Their tolerance of a multitude of last minute changes made
by the author has been appreciated.
Finally, the author thanks his wife, Mary, and his parents, Dorothy
and Charles Leonard, for their encouragement and support throughout the
course of this research.
M.S.L.
August, 1973
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter
1. Introduction
   1.1 Craniofacial Pain
   1.2 Research Objective
   1.3 Dissertation Overview
2. Previous Research
   2.1 Bayesian Classification Models
   2.2 Non-Parametric Classification Models
   2.3 Finite-Horizon Treatment Planning
   2.4 Uncertain-Duration Treatment Planning
3. Diagnostic Classification
   3.1 Model Components
   3.2 Alternative Interpretations of Linear Separability
   3.3 Model Validation
   3.4 Minimum-Cost Symptom-Selection Algorithm
       3.4.1 Algorithm Development
       3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm
       3.4.3 Computational Considerations
   3.5 Model Applications
4. Treatment Planning
   4.1 Model Components
       4.1.1 Patient States
       4.1.2 Transition Probabilities
       4.1.3 Cost Structure
   4.2 Selection of Optimal Treatments
   4.3 Model Validation
   4.4 Model Applications
5. Conclusions and Future Research

Appendices
A. Craniofacial-Pain Patient Data Vector
B. Modified Fixed-Increment Training Algorithm
C. Application of the Minimum-Cost Symptom-Selection Algorithm
D. Treatment Alternatives for Craniofacial-Pain Patients
E. Stability of Transition-Probability Estimates
F. Flow Charts of Patient-State Transitions
G. Patient-State Treatment Selections
H. Application of the Patient-State-Labeling and Optimal Treatment-Selection Procedure

BIBLIOGRAPHY
BIOGRAPHICAL SKETCH
LIST OF TABLES

Tables
1. Survey of Diagnostic-Classification Models
2. Correlation Between Significant Symptoms and Discriminant-Function Weights
3. Tests of Diagnostic-Classifier Accuracy
4. Classification Variability Among Dental Practitioners
5. Mean Transit Times Through the Craniofacial-Pain Care System
LIST OF FIGURES

Figures
1. Temporomandibular Joint
2. Diagnostic-Classification and Treatment-Planning Process for Craniofacial Pain
3. Craniofacial-Pain Diagnostic Alternatives
4. Procedure 2
5. Diagnostic-Classification Transitions
6. Patient-Visit Inconvenience Cost
7. Application of the Modified Fixed-Increment Algorithm
8. Multiple-State History-Augmented Process
Abstract of Dissertation Presented to the
Graduate Council of the University of Florida in Partial
Fulfillment of the Requirements for the Degree of Doctor of Philosophy
ANALYTICAL MODELS FOR DIAGNOSTIC CLASSIFICATION AND
TREATMENT PLANNING FOR CRANIOFACIAL PAIN
By
Michael Steven Leonard
December, 1973
Chairman: Dr. Kerry E. Kilpatrick
Major Department: Industrial and Systems Engineering
This dissertation presents a systematic approach to craniofacial-pain
diagnosis and treatment planning using analytic models of the underlying
decision-making processes. Patient diagnoses are generated by a linear
pattern-recognition classifier trained with a sample of preclassified
craniofacial-pain patient data. For this classifier, an algorithm is
developed that minimizes the total cost of the set of features employed
in the classifying process. Diagnostic classifications, augmented by a
history of prior treatment applications, provide the state descriptions
for a Markovian decision model of the treatment-planning process.
Craniofacial-pain patient records from four university dental clinics
serve as a data base for model construction and validation.

The analytic models provide a means of duplicating the diagnostic
classifications and treatment plans of experts. Approximately 90% of
the diagnostic classifier's classifications and 93% of the treatment-planning
model's treatment selections concurred with the decisions made by experts
in the field of care for craniofacial-pain patients. Moreover, the models
permit an examination of the critical considerations associated with both
decision-making processes. These capabilities are discussed in terms of
applications of the models in teaching, research, and in the practice of
dentistry.
CHAPTER 1
INTRODUCTION
The rapid pace of developments in medical and dental research prevents
the practicing physician and dentist from fully utilizing each new
diagnostic and treatment-planning aid as it is published. In each of the
last four years an average of 215,000 new publications have been written
to supplement the knowledge of the health-care practitioner [1].
Concurrently, the pressures of an ever-increasing patient load force
practitioners to select the most expeditious means for diagnosing
disorders and selecting treatments. For example, the medical general
practitioner (1970) saw an average of 173 patients a week [2], and the
median dental practitioner (1971) saw two patients an hour [3]. Given
these circumstances, practitioners may overlook possible diagnostic and
treatment alternatives or they may apply inappropriate treatments. If
meaningful analytic descriptions of the diagnostic and treatment-planning
processes can be developed, these models can assist educators in training
new practitioners, researchers in evaluating and disseminating new
developments, and practitioners in improving the quality of patient
care [4].

Developing models of the diagnostic-classification and treatment-planning
process requires an understanding of the underlying physiological
processes of diseases and the mechanisms of their cures. Obviously, the
effects of disease and the means of cure vary from one health-care
problem to another. Thus, modeling efforts in diagnosis and treatment
planning must be integrally related to the facet of health care that is
under study. This reality prohibits the model builder from making broad
statements about the applicability of his models to other health-care
environments. Accordingly, the models developed in this dissertation are
specifically oriented toward the health-care problem presented in Section
1.1, with the understanding that the results of this modeling effort may
not be applicable to the whole of health-care diagnosis and treatment
planning.
1.1 Craniofacial Pain
The head and face are subject to chronic, persistent, or recurrent
pain more often than any other portion of the body. Pain in the head
or face has a greater significance to patients than any other pain.
It may arouse fears that the patient is in danger of losing his mind
or that he has a tumor of the brain. In addition, the emotional state
of the patient is adversely influenced because it is generally known
by the layman that the profession's knowledge of the causes of these
pains is meager and that methods of treatment are inadequate [5, p. v].
H. Houston Merritt, M.D., Dean
Columbia University College of
Physicians and Surgeons
One source of the pain Dr. Merritt describes is dysfunction of the
temporomandibular joint. The temporomandibular joint (see Figure 1)
provides the articulation between the mandible and the cranium. This
joint is unique both in its structure and its function. Within the plane
of the temporomandibular joint, lateral, vertical, and pivoting motion is
permitted. In addition, the joint is the point of articulation for the
only articulated complex that contains teeth. With this joint, "motion
is directed more by the musculature and less by the shape of the
articulating bones and ligaments than is the fact for other joints"
[5, p. 34].

FIGURE 1
TEMPOROMANDIBULAR JOINT
(Right temporomandibular articulation. Inset: anatomical features of the
temporomandibular joint, labeling the mandibular fossa, articular
eminence, meniscus, and mandibular condyle.)

The fact that joint motion is highly dependent on musculature implies
that when mandibular dysfunction occurs there is some disturbance of the
intricate neuromuscular mechanisms controlling mandibular movement [5].
Emotional tension may also lead to hypertonicity of the striated
masticatory muscles, resulting in facial pain or altered sensation
without evidence of peripheral dysfunction. In addition, abnormal
occlusal contacts of the teeth may affect muscle tonicity, resulting in
mandibular dysfunction [5]. Moreover, the temporomandibular joint is
prone to disorders common to all joints: rheumatoid arthritis,
osteoarthritis, traumatic injuries, neoplasms, and nonarticular
disorders. Although the term 'craniofacial pain' is a broad
classification for pain in the head and face, the term is used in this
dissertation to describe pathological, congenital, hereditary-based, or
emotional causes of pain in and around the temporomandibular joint.
Though the degree of severity may vary, one or more of the following
four 'cardinal symptoms' are exhibited by the craniofacial-pain patient:
pain, joint sounds, limitation of motion, and tenderness in the
masticatory muscles [6]. Accompanying these symptoms, the patient may
complain of, or the practitioner may find, hearing loss, burning
sensations, migraine-like headaches, vertigo, tinnitus, subluxation,
luxation, dental pulpitis, sinus disease, glandular disorders, occlusal
disharmony, and radiographic evidence of joint abnormality. The degree
of association of these additional symptoms and findings with the
etiology of the joint disorders is subject to considerable variation.
Paralleling these areas of anatomic dysfunction is the possibility
that the craniofacial-pain patient may be suffering from psychic
disorders. In no other type of patient seen by the dentist does psychic
condition play a larger role [7]. Most craniofacial-pain patients have
symptoms or signs of anxiety, and a sensory preoccupation with the
occlusion of their teeth [8]. Many of these patients can be characterized
by a heavy reliance on denial, repression, and projection of their
psychic disorders in order to maintain their self-concept of emotional
stability [6]. Often the complaints these patients relate to the
practitioner are not compatible with any objective signs.
The practitioner who manages the care of craniofacial-pain patients
assumes a difficult task. For some of these patients, diagnosis is
obvious. Generally, however, the craniofacial-pain patient presents a
complex combination of signs and symptoms [7]. More than one disease
entity normally accounts for the patient's symptoms, and most
craniofacial-pain patients suffer from a pain-dysfunction complex
involving a combination of masticatory muscle disorders, occlusal
disharmony, emotional tension, and anxiety [5]. Nevertheless, the
possibility of multiple, almost subclinical, etiologic factors combining
to produce the dysfunction and pain must be considered. The close
relationship of organic and emotional disorders as they appear in
craniofacial-pain patients provides the examining dentist with the
problem of discriminating which factor is primary in the etiology of the
patient's dysfunction [7]. Unfortunately, the temporomandibular joint is
one of the most difficult areas of the body to examine radiographically
[8]. Hence, with these patients, the dentist relies to a large degree on
tests of emotional stability and physical examination by visualization,
palpation, and auscultation [7].
Therapeutic measures for the care of craniofacial-pain patients are
as varied as the factors contributing to the disorder. "A small
percentage of patients with symptoms referrable to the temporomandibular
joint will portray such a confusing picture that consultation with other
dental or medical specialists is indicated" [7, p. 129]. The majority of
these patients will exhibit symptoms that lead to any one of several
alternative courses of patient care. Altering the occlusion of the
natural teeth is one means of treating craniofacial-pain patients.
Although in many cases minor occlusal abnormalities are only contributing
factors to a patient's pain, attention by the dentist to occlusion is at
least partially successful for a majority of craniofacial-pain patients
[8]. However, it is important in early therapy not to alter the occlusion
irreversibly. Treatment by means of tooth extraction or endodontics, jaw
fixation, prosthetic devices, or topical treatments may also be suggested
by the patient's symptoms. The articular surface of the mandibular
condyle has an excellent reparative capacity [6]. Thus, the use of
sedatives, antibiotics, and muscle relaxants, along with physical
therapy, often leads to patient 'cures,' as these treatments ease the
patient's pain and increase jaw mobility while natural restoration of the
joint is in progress. If, after a reasonable length of time (3 to 6
months), the patient's symptoms are not relieved, the dentist may
consider referral to another source of care or therapy such as surgery [7].
Typically, the health-care process for craniofacial-pain patients
may be viewed as following the format of Figure 2 [9]. When a patient
is admitted into the care system, he undergoes a data-collection process.
This involves taking a 'full and pertinent' patient history and a
physical examination of the areas of discomfort. The data gathered
consist of symptoms, signs, medical and/or dental history, physical-examination
findings, psychosocial information, and so forth. Once these elements
have been elicited, a diagnosis is attempted. If this is not yet
possible, the severe symptoms are treated and the patient's health state
is monitored.

FIGURE 2
DIAGNOSTIC-CLASSIFICATION AND TREATMENT-PLANNING
PROCESS FOR CRANIOFACIAL PAIN
When initial treatment does not result in a 'cure' for the
craniofacial-pain patient, treatment effects are evaluated and new data
collected. When a patient's diagnostic classification leads to a course
of treatment that is not within the realm of the practitioner's
specialty, he is referred to a more appropriate care source. Monitoring
is continued on those patients not rejected from the system at this
point, and the patient is discharged when he is symptom-free. However,
when other disorders have been isolated during the course of treatment,
the patient is recycled through the classification-treatment process.

The diagnosis-treatment sequence is not fixed. Treatment can begin
prior to a diagnostic classification, or treatment can follow a
diagnosis. Moreover, there may be many diagnostic-treatment
data-acquisition cycles before the patient is considered 'well.'
1.2 Research Objective

The introductory discussion of the need for diagnostic and
treatment-planning models, and the brief description of the
craniofacial-pain care system, provide the setting for a statement of the
research objective underlying this dissertation. This objective is to
derive analytic representations of the decision processes involved in
selecting diagnostic classifications and planning treatments for
craniofacial-pain patients. A diagnostic-classification model that
duplicates the classifications of expert practitioners is sought. For
treatment planning, the modeling goal is to provide a structure for
interaction of the critical considerations associated with the
treatment-selection process. These analytic representations will be
structured to permit their application as teaching devices in the
training of dental practitioners, as methods of testing the effects of
new diagnostic tools and treatment applications, and as aids to the
practice of dentistry.
This research objective will be met by developing:
1. A diagnostic-classification model based on the theory of
   non-parametric pattern classification, with
   a. criteria for applicability of the modeling technique to
      diagnostic classification
   b. model validation for craniofacial-pain patients
   c. development of a minimum-cost symptom-selection algorithm
2. A Markovian representation of the treatment-selection process, with
   a. justification for utilizing a Markovian model of the underlying
      care system
   b. model validation for craniofacial-pain patients
3. A description of potential model applications in teaching, research,
   and practice.
1.3 Dissertation Overview

In Chapter 1 the motivation and scope of this dissertation have been
presented. Chapter 2 provides a review of literature relevant to the
diagnostic and treatment-selection processes. A model of the
diagnostic-classification process is developed in Chapter 3. Chapter 4
follows with an analytic representation of the treatment-planning
process. Conclusions derived from this model-building effort, and
suggestions for future research, are presented in Chapter 5.
CHAPTER 2
PREVIOUS RESEARCH
Over three hundred publications have been addressed to the problem
of modeling the diagnostic and treatment-planning process. Spanning
fourteen years, this research has considered such diverse problems as
the classification of liver biopsies [10] and the optimal plan for
treating midshaft fractures of the femur [11]. At least ninety-one
disorders have been utilized as environments for developing diagnostic
and treatment-planning models. The magnitude of this research effort
emphasizes the need for analytic representations of these complex
decision-making processes.

Fortunately, the significant contributions in this voluminous
literature can be neatly partitioned into four distinct categories.
Research in diagnostic classification has been based either on the
application of Bayesian statistics or on the use of non-parametric
pattern classifiers. Treatment planning has been presented either as a
finite-horizon decision problem or as an application of decision analysis
to a Markov process of uncertain duration. This section presents a brief
discussion of each of these categories and evaluates their suitability as
analytic representations of the process of providing health care for
craniofacial-pain patients.
2.1 Bayesian Classification Models
Bayesian diagnostic-classification models, such as [12, 13, 14,
15, 16], make a diagnosis on the basis of selecting a patient's 'most
probable' disease state. The Bayesian classifier is an elementary type
of parametric pattern-classification model. In general, parametric
classifiers make use of one or more of the statistical characteristics
of the dispersion of the data being classified to establish rules for
data classification. With the Bayesian models, only the conditional
probabilities for exhibiting sets of symptoms, given a particular
disease, are tabulated from past medical data. Then, utilizing Bayes'
theorem, the probabilities for the presence of alternate diseases
d_1, d_2, ..., d_n can be calculated as a function of the symptom-complex
S the practitioner observes in the patient. Bayes' theorem provides that
for each of the d_i,

    P(d_i | S) = C(S) P(S | d_i) P(d_i),

where

    C(S) = 1 / [ Σ_{k=1}^{n} P(S | d_k) P(d_k) ];

hence, a patient with symptom-complex S is classified in disease-group i
if

    P(d_i | S) = max_k P(d_k | S).

A survey of the results of application of Bayesian models is given in
Table 1.
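The Bayes rule above reduces to an argmax once the normalizer C(S) is dropped, since C(S) is the same for every disease. The following sketch illustrates this; the two diseases, priors, and conditional probabilities are hypothetical values invented for the example, not data from the dissertation.

```python
# Illustrative sketch of the Bayesian classification rule described above.
# Because C(S) is common to all diseases, classification only requires
# maximizing P(S | d_i) * P(d_i) over i.

def bayes_classify(symptom_complex, priors, conditionals):
    """Return the disease index i maximizing P(d_i | S).

    priors[i]          -- P(d_i), prior probability of disease i
    conditionals[i][S] -- P(S | d_i) for each observable symptom complex S
    """
    scores = [conditionals[i][symptom_complex] * priors[i]
              for i in range(len(priors))]
    return max(range(len(scores)), key=scores.__getitem__)

# Two hypothetical diseases and two observable symptom complexes.
priors = [0.7, 0.3]
conditionals = [{'S1': 0.2, 'S2': 0.8},   # P(S | d_0)
                {'S1': 0.9, 'S2': 0.1}]   # P(S | d_1)

print(bayes_classify('S1', priors, conditionals))  # 1: 0.9*0.3 > 0.2*0.7
print(bayes_classify('S2', priors, conditionals))  # 0: 0.8*0.7 > 0.1*0.3
```

Note that the priors P(d_i) enter the decision directly, which is precisely why their sensitivity to season, geography, and demography (discussed below) is a practical obstacle.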
Although the percentage of correct diagnoses in most of these test
applications is high, there are several reasons why a Bayesian diagnostic
model is not used as the means of generating diagnostic classifications
in this dissertation. The first reason is the difficulty of acquiring the
proportional presence of alternate diseases P(d_i), i = 1, 2, ..., n,
in the population of patients that are to be classified by the model.
These 'prior' probabilities of having a particular disease are a function
TABLE 1
SURVEY OF DIAGNOSTIC-CLASSIFICATION MODELS

Bayesian Classifiers

Reference   Disease Group      Number of Patients   % Correct Patient
Number                         in Study             Diagnoses
[12]        Nontoxic Goiter     88                   85.3
[13]        Bone Tumor          77                   77.9
[14]        Thyroid            268                   96.3
[15]        Congenital Heart   202                   90.0
[16]        Gastric Ulcer       14                  100.0

Non-Parametric Classifiers

Reference   Disease Group      Number of Patients   % Correct Patient
Number                         in Study             Diagnoses
[17]        Liver               52                   98.1
[18]        Asthma             230                   90.0
[19]        Hematologic         49                   93.9
[20]        Thyroid            225                   96.0
of seasonal variation, geographic location, population demography, and
many other factors. Secondly, valid Bayesian analysis requires the
analyst to determine the dependence among exhibited symptoms for each
disease considered by the diagnostic model. In this respect, the
probabilities for the presence of groups of symptoms are independent for
some diagnostic alternatives and strongly correlated for others [4]. The
third reason for not selecting a Bayesian model is the massive storage
requirement dictated by the necessity of keeping the set of conditional
probabilities. These conditionals, P(S | d_i) for every observable
symptom-complex S and every disease i considered, must be at hand each
time the model is used. For example, given ten alternate diseases and
ten symptoms for which no assumptions of between-symptom independence can
be made, storage is required for 10(2^10 - 1), or 10,230, conditional
probabilities.
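The storage count follows directly from the combinatorics: with n candidate diseases and m binary symptoms, and no independence assumptions, each disease requires one conditional probability per non-empty symptom complex. A one-line check of the figure cited above:

```python
# Storage needed for the conditionals P(S | d_i): n diseases times
# (2**m - 1) non-empty symptom complexes over m binary symptoms.

def conditional_storage(n_diseases, n_symptoms):
    return n_diseases * (2 ** n_symptoms - 1)

print(conditional_storage(10, 10))  # 10230, the figure cited above
```

The exponential growth in m is the real objection: adding a single symptom to the ten above roughly doubles the table.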
2.2 Non-Parametric Classification Models

Non-parametric diagnostic models, like [17, 18, 19, 20], utilize
non-parametric pattern classifiers, a form of pattern-recognition
modeling. In the literature on pattern recognition, the term
'non-parametric' implies that no form of probability distribution is
assumed for the dispersion of symptom data in establishing the rules for
pattern classification. These models do assume, however, that classes of
symptom data are distinct entities and, hence, a patient with a
particular set of symptoms S cannot simultaneously occupy more than one
diagnostic state. That is, the models assume a deterministic
classification for each pattern viewed by the pattern classifier, where
every observable pattern has one, and only one, correct classification.
Non-parametric modeling permits the analyst to bypass the difficult
problems of explicitly determining the conditional probabilities for,
and the dependence among, symptoms that are required for Bayesian
analysis. With the non-parametric classifier, a diagnosis is generated
for the practitioner by evaluating a discriminant function associated
with each diagnostic classification, g_i(.), i = 1, 2, ..., n. As was the
case with the Bayesian models, the values of these discriminants are a
function of the symptom-complex S exhibited by the patient. The patient's
diagnostic classification corresponds to that disease whose associated
discriminant-function value is maximum. That is, a patient with symptoms
S is classified in disease-group i if

    g_i(S) > g_k(S) for all k ≠ i.
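The discriminant rule can be sketched in a few lines, here assuming linear discriminants of the form g_i(S) = w_i . S (the form used by the linear classifier developed in Chapter 3). The binary symptom vector and the weight vectors below are invented for illustration; in the dissertation's model the weights are learned from preclassified patient data.

```python
# Sketch of non-parametric classification by discriminant functions:
# assign the patient to the disease group whose discriminant is maximum.

def classify(symptoms, weights):
    """Return the group i maximizing the linear discriminant
    g_i(S) = w_i . S over the 0/1 symptom vector S."""
    scores = [sum(w * s for w, s in zip(w_i, symptoms)) for w_i in weights]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical 4-element symptom vector and two weight vectors.
S = [1, 0, 1, 1]
weights = [[0.5, 1.0, -0.2, 0.1],   # g_0
           [0.3, 0.2,  0.9, 0.4]]   # g_1
print(classify(S, weights))  # 1, since g_1(S) = 1.6 > g_0(S) = 0.4
```

No probabilities appear anywhere in the rule, which is the sense in which the method bypasses the conditional-probability and dependence estimation that Bayesian analysis requires.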
Results from some of the applications of pattern-recognition
classifiers are presented in Table 1. In these test applications
diagnostic accuracy was consistently high. Because of these models' ease
of implementation and small storage requirements, a non-parametric
pattern classifier is preferable as a vehicle for generating diagnostic
classifications. The use of a non-parametric classifier is further
motivated by features of the care process for craniofacial-pain patients
discussed in Chapter 3.
2.3 Finite-Horizon Treatment Planning

In the realm of research on modeling the treatment-planning process,
several authors [9, 21, 22] have presented schemes for analysis that
utilize methods for making decisions under risk and uncertainty. The
treatment-selection process has alternately been defined as a two-person
zero-sum game, structured as a decision tree, and modeled as a Markov
process of limited duration. Treatment costs, and the 'costs' of
occupying 'non-well' or terminal patient states, provide the basis for
selecting an 'optimal' treatment plan. Finiteness of the planning horizon
is assured either by establishing a maximum permissible number of
treatment applications, or by considering at any stage of analysis the
effects of a fixed number of future treatments. Validation of the
decisions generated by these models has thus far been limited to checks
on the feasibility of the treatment regimens selected. Unfortunately, the
finite-horizon models either do not consider the possibility of a
patient's prolonged stay in the health-care system, as is the case for
the models with a maximum number of possible treatments, or, where only a
fixed number of future treatments is considered, they provide no more
than a heuristic treatment-selection procedure.
2.4 Uncertain-Duration Treatment Planning

Bunch and Andrew [11] have considered the possibility of prolonged
occupation of the same diagnostic state during the course of a patient's
progression through the care system. In their Markovian representation
of the care system for midshaft fractures of the femur, they provide
this modeling refinement. As a consequence of this modification, the
number of treatment decisions made for each patient is a random variable
with no fixed upper bound. Howard's iterative scheme for policy selection
[25] provides the means for choosing the optimal treatment regimen by
selecting treatment alternatives that maximize the relative 'value' of
occupying each disease state. Although the Bunch and Andrew model did
not consider return visits to the same disease state, a more generalized
Markovian representation could incorporate that possibility.
Nevertheless, the proximity to reality that this category of transient
Markovian models provides requires considerable effort, as holding-time
distributions, treatment 'costs,' and transition probabilities must be
supplied by the analyst for all treatment alternatives at each of the
disease states in the care system.
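Howard's iterative scheme alternates two steps: evaluating the current treatment policy (value determination) and then improving it state by state (policy improvement), stopping when the policy no longer changes. A minimal sketch of the cost-minimizing, discounted variant is given below; the two patient states, two treatments, costs, and transition probabilities are entirely hypothetical, and the dissertation's own model (Chapter 4) is not reproduced here.

```python
# Hypothetical sketch of Howard's policy-iteration scheme for a Markov
# decision model of treatment selection. All numbers are invented.

def policy_iteration(P, c, gamma=0.9):
    """P[a][s] -- transition distribution over states when treatment a
    is applied in state s; c[a][s] -- expected cost of doing so.
    Returns a minimum-expected-discounted-cost treatment policy."""
    n = len(P[0])
    policy = [0] * n
    while True:
        # Value determination: successive approximation of
        # v(s) = c(policy(s), s) + gamma * sum_t P(t | s) v(t).
        v = [0.0] * n
        for _ in range(500):
            v = [c[policy[s]][s] +
                 gamma * sum(P[policy[s]][s][t] * v[t] for t in range(n))
                 for s in range(n)]
        # Policy improvement: pick the cost-minimizing treatment per state.
        new = [min(range(len(P)),
                   key=lambda a: c[a][s] +
                   gamma * sum(P[a][s][t] * v[t] for t in range(n)))
               for s in range(n)]
        if new == policy:
            return policy
        policy = new

# Two patient states ('symptomatic', 'improved') and two treatments.
P = [[[0.8, 0.2], [0.4, 0.6]],    # conservative therapy
     [[0.5, 0.5], [0.1, 0.9]]]    # aggressive therapy
c = [[1.0, 0.5], [3.0, 2.0]]      # per-visit treatment 'costs'
print(policy_iteration(P, c))     # [0, 0]: conservative wins in both states
```

The effort the text describes is visible even in this toy: every treatment alternative at every state needs its own cost and its own row of transition probabilities before the iteration can run.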
The data collected on craniofacial-pain patient progressions
through the care system reveal that both prolonged occupation of a
single diagnostic state and return visits to the same state occur
frequently. Moreover, as will be discussed in Chapter 4, there are
several characteristics of the craniofacial-pain care system that permit
reductions in the number of input parameters required for a transient
Markovian model of this system. Therefore, an uncertain-duration
transient Markovian representation of the health-care process has been
selected as the means of evaluating the effectiveness of alternative
treatment regimens on patients with craniofacial pain.
CHAPTER 3
DIAGNOSTIC CLASSIFICATION
The analytic model developed to provide diagnostic classifications
for craniofacial-pain patients is based on the principles employed in
non-parametric pattern classification. The patterns classified by this
diagnostic model are vector representations (see Section 3.1 and Appendix
A) of the craniofacial-pain patient's physical and emotional status. In
the first sections of this chapter the theoretical background for the
diagnostic model is established. This discussion is followed by a
presentation of the validation procedures used to evaluate model
performance. Next, an algorithm is developed to reduce the 'costs'
associated with model utilization. The chapter closes with a discussion
of potential applications of the craniofacial-pain diagnostic classifier
in teaching, in research, and in the health-care process.
3.1 Model Components
In the initial phase of the development of the diagnostic-classification
model, a set of possible alternative diagnostic classifications was
established for craniofacial-pain patients. Figure 3 provides a list of
these possible classifications. Note that the alternative classifications
in Figure 3 are not mutually exclusive, as a craniofacial-pain patient
classified in some diagnostic alternative 'A' could also have the
disorder specified by some other diagnostic alternative 'B.' However,
for the purposes of this dissertation, each patient's diagnostic
1. Temporomandibular Joint Arthritis - Developmental
2. Temporomandibular Joint Arthritis - Infectious
3. Temporomandibular Joint Arthritis - Osteo (Degenerative)
4. Temporomandibular Joint Arthritis - Traumatic (Acute)
5. Temporomandibular Joint Arthritis - Traumatic (Chronic)
6. Myopathy - Acute Trauma
7. Myopathy - Myositis
8. Oral Pathology - Dental Pathology
9. Vascular Changes - Migrainous Vascular Changes
10. Myofacial Pain-Dysfunction: Malocclusion - Balancing Interferences
11. Myofacial Pain-Dysfunction: Malocclusion - Lateral Deviation of Slide
12. Myofacial Pain-Dysfunction: Malocclusion - Uneven Centric Stops
13. Myofacial Pain-Dysfunction: Psychoneurosis - Anxiety/Depression
14. Myofacial Pain-Dysfunction: Bruxism
15. Myofacial Pain-Dysfunction: Reflex Protective Muscular Contracture
16. Myofacial Pain-Dysfunction: Loss of Posterior Occlusion
17. Neuropathy
FIGURE 3
CRANIOFACIAL-PAIN DIAGNOSTIC ALTERNATIVES
classification is made on the basis of specifying the etiological factor that requires the most immediate action on the part of the attending practitioner. Thus, diagnostic classification of a patient into diagnostic alternative 'A' signals that the etiology specified by that alternative should determine the course of the patient's care.
The next step in model development isolated relevant data which measured the physiological and psychological status of craniofacial-pain patients. In particular, this step of model development sought those elements of patient status that practitioners employ in their own classification of craniofacial-pain patients. Appendix A presents a list of these data elements. Wherever it was feasible, measures of patient status were segmented to amplify the significance of particular readings of each measure. Thus, for example, while the duration of a patient's pain is a continuous measure of his status, it is important for the purposes of classification to know whether a craniofacial-pain patient's duration of pain is less than 3 weeks, from 3 to 6 weeks, or longer than 6 weeks. For this measure of patient status, a short history of pain indicates a strong possibility of a recent traumatic injury, while pain over a long period is more likely associated with long-standing arthritic or psychic disorders.
To facilitate the development of an analytic model of the diagnostic-classification process, a vector representation of the relevant elements of patient data has been developed. The vector permits the notation of any of the data elements shown in the listing in Appendix A. The presence of any of the items found in Appendix A is recorded in a patient's data vector by an entry of '1' in the vector dimension corresponding to the item number, while the absence of a vector item is noted by a '0' data-vector entry. For example, referring to the listing in Appendix A, a male patient would have the following fifth, sixth, and seventh elements in his data vector
(...,1,0,0,...),
while a premenopausal female would have the series of elements
(...,0,1,0,...).
This vector notation of a patient's status serves as the input data for a nonparametric pattern classifier that assigns a diagnostic classification to the patient's dysfunction.
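The encoding just described can be sketched in a few lines of code. The item numbers below follow the male/female example above, but the mapping itself is a hypothetical stand-in for the full Appendix A listing:

```python
# Sketch of the binary data-vector encoding described above.  The item
# numbers (5 = male, 6 = premenopausal female, 7 = postmenopausal
# female) follow the text's example; the rest are hypothetical.
def encode(present_items, n_items=7):
    """Return a 0/1 data vector; position k is 1 iff item k is present."""
    return [1 if k in present_items else 0 for k in range(1, n_items + 1)]

male_vector = encode({5})        # a male patient
female_vector = encode({6})      # a premenopausal female
print(male_vector[4:7])          # elements 5-7 -> [1, 0, 0]
print(female_vector[4:7])        # elements 5-7 -> [0, 1, 0]
```

In the full model the vector has 296 positions (295 data items plus a constant final element), but the encoding principle is identical.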
Nonparametric pattern classification, as described in Meisel [23] and Nilsson [24], is the process of creating decision surfaces that separate patterns into homogeneous classes, C_i, i=1,2,...,p, specified by the analyst. In the craniofacial-pain diagnostic model, the C_i are the diagnostic alternatives shown in Figure 3. Classification of a pattern (a patient's data vector) into one of the classes is performed by a pattern classifier composed of a maximum detector and a set of discriminant functions. These discriminants, g_j(a), j=1,2,...,p, are single-valued functions of each patient's data vector a. If a_i represents a data vector for a patient whose correct diagnostic classification is the ith diagnostic alternative, then the g_j(a) are chosen so that
g_i(a_i) > g_j(a_i), i,j = 1,2,...,p, j ≠ i.
The craniofacial-pain classifier uses linear discriminant functions. These discriminants are linear in the sense that they provide mappings from E^n to E^1 that exhibit the form
g_j(a) = a_1 w_j1 + a_2 w_j2 + ... + a_n w_jn + w_j,n+1
where in the patient data vector a, the value of a_r denotes the presence (a_r = 1) or absence (a_r = 0) of patient-data-vector item r; and the w_jk, k=1,2,...,n+1, are constants associated with the jth discriminant function called 'weights.' These discriminant-function weights, w_jk, j=1,2,...,p, k=1,2,...,n+1, provide an analytic means of duplicating the correct classification of each pattern observed by the nonparametric classifier. They provide a link between a pattern's correct classification and the individual components of the pattern's vector representation. In essence, each discriminant's weights are additive elements whose component sums have significance in terms of isolating a pattern's correct classification. These weights are a mathematical means of storing information already known about the correct classification of observed pattern vectors. Moreover, the weights can be interpreted from the point of view of the significance that the practitioner places on each data-vector component. A discussion of this interpretation of the discriminant-function weights appears in Section 3.2.
Central to the use of linear discriminant functions is the assumption that the space of observable patient data vectors is linearly separable, for by definition [24],
a pattern space A is linear and its subsets of patterns A_1, A_2, ..., A_p are linearly separable if and only if linear discriminant functions g_1, g_2, ..., g_p exist such that
g_i(a) > g_j(a) for all a in A_i,
for all i=1,2,...,p, j=1,2,...,p, j ≠ i.
In the context of diagnostic classification, the assumption of linear separability implies that there exists a set of hyperplanes that partition the space of observable patient data vectors into convex homogeneous regions, each region representing a unique diagnostic classification. Rosen [26] has provided a restatement of this assumption in the requirement that the sets of data vectors corresponding to each diagnostic alternative have nonintersecting convex hulls. In either form, this is a fairly restrictive assumption on the dispersion of patient data vectors (see Section 3.2).
Selecting the 'weights' for each of the discriminant functions is a process known as 'training.' For the linear nonparametric classifier, training generates each discriminant function's w_jk's by applying a systematic algorithm to the members of a set of representative patterns with preestablished classifications. Nilsson [24] discusses several algorithms suitable for training the craniofacial-pain diagnostic classifier. In the course of using these algorithms for model development, a new 'modified fixed-increment' training algorithm was constructed (see Appendix B). Employing the new algorithm has resulted in a reduction of approximately 35% in the amount of training time required to derive the weights for the craniofacial-pain classifier.
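The author's modified algorithm appears in Appendix B and is not reproduced here; the sketch below shows only the classical fixed-increment (error-correction) multiclass training rule from Nilsson that such algorithms build upon, run on a toy separable data set:

```python
# Classical fixed-increment multiclass training rule (after Nilsson);
# the author's modified version (Appendix B) is not reproduced here.
def train(patterns, labels, n_classes, max_epochs=100):
    n = len(patterns[0])                      # augmented pattern length
    W = [[0] * n for _ in range(n_classes)]   # one weight vector per class
    for _ in range(max_epochs):
        errors = 0
        for a, i in zip(patterns, labels):
            scores = [sum(ak * wk for ak, wk in zip(a, Wj)) for Wj in W]
            # class i must strictly beat every rival discriminant
            rivals = [c for c in range(n_classes)
                      if c != i and scores[c] >= scores[i]]
            if rivals:
                j = max(rivals, key=lambda c: scores[c])
                W[i] = [wk + ak for wk, ak in zip(W[i], a)]  # reward class i
                W[j] = [wk - ak for wk, ak in zip(W[j], a)]  # punish class j
                errors += 1
        if errors == 0:     # a full error-free pass: the weights are feasible
            return W
    return None             # separability not demonstrated within max_epochs

# Toy patterns, augmented with a trailing 1 (the a_{i,n+1} = 1 convention).
patterns = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
labels = [0, 0, 1, 1]
W = train(patterns, labels, 2)
```

Termination with a feasible weight set is exactly the behavior the text relies on: if the training patterns are linearly separable, the fixed-increment corrections eventually stop.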
Symbolically, the craniofacial-pain diagnostic classifier, with its set of trained weights, can be represented in the following format:
let a_i = the 296-dimension data vector describing patient 'i'
a_ik = the kth element in the data vector describing patient 'i', whose value is either zero or one, k=1,2,...,295 (by definition a_i,296 = 1)
C_j = diagnostic alternative 'j', j=1,2,...,17
d_ij = the value of the discriminant function for diagnostic alternative 'j' generated by the data vector of patient 'i'
W_j = the 296-dimension vector of weights associated with diagnostic alternative 'j'
w_jk = the kth element in the weight vector W_j,
that is
a_i = [a_i1, a_i2, ..., a_i,295, a_i,296]
W_j = [w_j1, w_j2, ..., w_j,295, w_j,296]
and
d_ij = a_i W_j^T = the sum over k=1,2,...,296 of a_ik w_jk
where T denotes vector transposition. Patient 'i' is classified in diagnostic alternative C_j when d_ij > d_is for every s ≠ j. If max over j of d_ij is not unique, then it is not yet possible to classify patient 'i' into one of the diagnostic alternatives. Treatment is prescribed for severe symptoms and classification is attempted at a later date.
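This decision rule, including the uniqueness check that defers classification on ties, can be sketched as follows (a small illustrative dimension stands in for 296, and the weights are made up):

```python
# Sketch of the decision rule d_ij = a_i . W_j with a uniqueness check;
# the dimension and weight values here are illustrative only.
def classify(a, weights):
    """Return the index of the winning diagnostic alternative,
    or None when the maximum discriminant value is not unique."""
    d = [sum(ak * wk for ak, wk in zip(a, Wj)) for Wj in weights]
    best = max(d)
    winners = [j for j, dj in enumerate(d) if dj == best]
    return winners[0] if len(winners) == 1 else None

weights = [[3, 0, -1], [-3, 0, 1]]      # two alternatives, toy weights
print(classify([1, 0, 1], weights))     # -> 0
print(classify([0, 1, 1], weights))     # -> 1
print(classify([0, 1, 0], weights))     # tie (0 vs. 0) -> None
```

Returning `None` on a tie mirrors the text's handling: treat severe symptoms and re-attempt classification later.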
Data from four sources were used to construct and verify the diagnostic-classification model, as well as the treatment-planning model presented in Chapter 4. Contributions of clinical records came from the dental schools at the universities of California at Los Angeles, Florida, Illinois, and Indiana. In all, the records of 250 patients, involving a total of 480 patient-practitioner interactions, form the data base for model building and validation. The relevant information from each of these patient visits has been recorded in the data-vector format of Appendix A. A diagnostic classification from Figure 3 was assigned to each of these patient data vectors by either Dr. Thomas B. Fast, Chairman of the Division of Oral Diagnosis, or by Dr. Parker E. Mahan, Chairman of the Department of Basic Dental Sciences, at the College of Dentistry, University of Florida.
With this basic structure for the diagnostic-classification model, the classified patient data vectors, and the training algorithm presented in Appendix B, an initial test was performed to verify that the space of observed patient data vectors was separable by linear discriminant functions. Application of the modified fixed-increment training algorithm to the set of 480 data vectors verified this requirement, as the algorithm terminated in a set of feasible discriminant-function weights. Using the discriminant functions these constants determine, it is possible to duplicate the preestablished diagnostic classifications for each of the patient data vectors.
This first test of the diagnostic classifier established that a nonparametric classifier could be employed to reproduce the original classifications for each data vector used in model construction. However, this test does not reveal how well the classification model will perform on patient data not employed in developing the discriminant-function weights. The remainder of this section, and Section 3.3, address the question of how the diagnostic classifier performs on 'new' patient data vectors, that is, vectors that have no duplicate in the training sample.
Model training has created a set of weights that, by the definition of the training procedure, correctly classify every patient data vector that lies within the bounds of the training-sample pattern-class convex hulls. Since every data vector is a binary vector, new patient data vectors must fall outside the convex hulls established by the training-sample vectors. Yet, if new data vectors have a number of data-vector elements that are identical to those of the training-sample vectors with the same diagnostic classification, then this relationship will be reflected in a 'close proximity,' as measured by a Euclidean-distance function, between each new vector and its associated training-sample convex hull. Given this close proximity, the classifier's discriminant functions should correctly classify most new data vectors, as these vectors will lie within or near the boundaries of the appropriate discriminating hyperplanes. Hence, the key to providing adequate classifier performance for new data vectors lies in devising data-vector representations of patient data for which the data vectors of a common diagnostic classification exhibit strong similarity.
In the introductory discussion of the elements of patient data used in the patient data vector, it was pointed out that an effort was made to select components of patient status that assist the practitioner in his selection of diagnostic classifications for a craniofacial-pain patient. Then these elements were partitioned to generate as much discriminating information as possible from each data element. In terms of the alternate diagnostic classifications, these elements of patient data were chosen so that all patients in any one diagnostic classification would have a unique combination of exhibited or nonexhibited data-vector elements. Employing these carefully constructed qualitative data elements resulted in a set of 'natural' gaps in the vector representations of patient data from alternate diagnostic classifications. The fact that there are portions of the pattern space that cannot be occupied by any data vector, and partitions of the space where the vectors of each classification must lie, assists the classifier in making correct classifications of data not used in model construction.
As Section 3.3 shows, this discussion is not meant to imply that the craniofacial-pain diagnostic classifier can, in its present state of development, correctly classify every new data vector. What has been stated is that a knowledge of the underlying classifying process can be employed in constructing the data vector examined by the classifier, and that fully utilizing this information will lead to a classifier that can be expected to be capable of performing well on new patient data. Of course, this discussion has been predicated on the separability of the underlying pattern space of data vectors. If this requirement is not met by some form of patient-data-vector representation, classification of patients by a linear classifier is not possible.
The next section of this chapter provides relationships between linear separability and the data that may be observed in a health-care system for which diagnostic classification by linear discriminants is being considered. This section has a dual purpose. First, linear separability is couched in 'nongeometric' terms. Second, and more importantly, using the craniofacial-pain health-care system as an example of the section's developments provides information about the suitability of the nonparametric classifier as a model of the decision-making process associated with diagnostic classification in this care system.
3.2 Alternative Interpretations of Linear Separability
The criteria for pattern space separability are mathematically concise. Unfortunately, these separability criteria are not readily expressible in nongeometric terms. The discussion developed in this section provides the reader with some nongeometric criteria that indicate when the use of a nonparametric pattern classifier should be considered as a means of generating diagnoses for a medical or dental disorder.
The first criterion is associated with a probabilistic measure of symptom exhibition. Given a patient who exhibits some set of symptoms S, nonparametric pattern classification requires that P[S|C_j] = 1 for the diagnostic alternative 'C_j' that describes the patient's current diagnostic status, and P[S|C_k] = 0 for all other diagnostic alternatives 'C_k.' However, assume that for the disorder in question the probability of exhibiting any relevant symptom has been calculated from historical data, that is, estimates of P[s_i|C_j] are available for all relevant symptoms s_i and all diagnostic alternatives C_j. Then, if the following decision rule leads to the correct classification of a majority of the patients with the disorder in question, utilization of a nonparametric classification model should be investigated:
classify a patient who exhibits the set of symptoms S in the jth diagnostic alternative if
the product over s_i in S of P[s_i|C_j] > the product over s_i in S of P[s_i|C_k] for all k ≠ j. (1)
Since (1) holds if and only if
log [product over s_i in S of P[s_i|C_j]] > log [product over s_i in S of P[s_i|C_k]] for all k ≠ j,
decision rule (1) can be expressed in terms of logarithms. Let the set of symptoms S be represented as a row vector a with the elements of a assigned values as follows:
a_i = 1 if symptom s_i is an element of S
and a_i = 0 if symptom s_i is not an element of S,
where n is the total number of relevant symptoms. Form the column vectors
W_j = [log P[s_1|C_j], log P[s_2|C_j], ..., log P[s_n|C_j]]^T.
Then log [product over s_i in S of P[s_i|C_j]] = a W_j, and decision rule (1) can be restated as
classify a patient who is characterized by the vector a in the jth diagnostic alternative if
a W_j > a W_k for all k ≠ j. (2)
Note that decision rule (2) is identical to the decision rule employed in nonparametric pattern classification.
This equivalence implies that if (1) holds for every preclassified patient examined, the values log P[s_i|C_j] form a set of feasible discriminant-function weights. If (1) leads to the correct classification of a majority of the patients examined, it is logical to assume that there may be a set of feasible discriminant-function weights. This assumption was examined using the craniofacial-pain patient data. From the data vectors classified in Diagnostic Alternatives 13, 14, and 15, a total of 189 patient visits, the P[s_i|C_j] were calculated. Each data vector was then classified with decision rule (1), and 164 of the data vectors (86.7%) were assigned to their preestablished diagnostic alternative.
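The log-probability construction behind decision rules (1) and (2) can be sketched as follows, on toy data in place of the 189 patient visits (the smoothing constant is an added assumption to keep log 0 finite, not part of the original derivation):

```python
import math

# Sketch of decision rule (2): the weights are log P[s_i|C_j] estimated
# from toy preclassified binary symptom vectors.  The `smooth` floor is
# an added assumption to avoid log(0); the text's derivation omits it.
def log_weights(vectors, labels, n_classes, smooth=1e-6):
    """Estimate log P[s_i|C_j] per class from binary symptom vectors."""
    n = len(vectors[0])
    W = []
    for c in range(n_classes):
        members = [v for v, l in zip(vectors, labels) if l == c]
        W.append([math.log(max(sum(v[i] for v in members) / len(members),
                               smooth)) for i in range(n)])
    return W

def classify(a, W):
    # a . W_j sums log-probabilities only over exhibited symptoms (a_i = 1)
    scores = [sum(ai * w for ai, w in zip(a, Wj)) for Wj in W]
    return max(range(len(W)), key=lambda j: scores[j])

vectors = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
labels = [0, 0, 1, 1]
W = log_weights(vectors, labels, 2)
print([classify(v, W) for v in vectors])   # -> [0, 0, 1, 1]
```

The point of the exercise is the one the text makes: these log-probabilities are themselves a candidate set of linear discriminant-function weights.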
The second criterion provides a subjective measure of the feasibility of using a nonparametric pattern classifier. If symptoms for most of the diagnostic alternatives associated with the disorder of interest can be isolated such that
1. a patient's exhibition of a subset of these symptoms leads the practitioner to a selection of one of the diagnostic alternatives, or
2. a patient's exhibition of a subset of these symptoms leads the practitioner to eliminate from further consideration one of the diagnostic alternatives,
then the use of a nonparametric classifier as a means of generating classifications should be investigated.
The linear nonparametric classifier employs a weighted sum of the symptoms exhibited by each patient in its discriminating functions. If symptoms can be isolated that are significant to the classification of patients with the disorder under investigation, then there is a 'natural' weight for each of these symptoms in the decision-making process used by the practitioner. The existence of these natural weights increases the probability that a training algorithm will be able to find a feasible set of discriminant-function weights. Indeed, the relative importance of the significant symptoms may be reflected in the magnitude of the discriminant-function weights generated by the application of a training algorithm.
As an example, the significant symptoms associated with two craniofacial-pain diagnostic alternatives, Alternatives 4 and 14, were isolated by Dr. Fast. A comparison of these symptoms and their associated discriminant-function weights revealed a high degree of correlation between symptom significance and discriminant-function weights (see Table 2).
The reader should note that both of the criteria discussed in this section are heuristic approximations to the geometric requirement for pattern space separability. However, if the disorder under investigation meets one or both of these criteria, it may be possible to employ a nonparametric classifier to diagnose the disorder, since the requirement for pattern space separability is most likely met.
TABLE 2
CORRELATION BETWEEN SIGNIFICANT SYMPTOMS
AND DISCRIMINANT-FUNCTION WEIGHTS

Diagnostic Alternative 4: Temporomandibular Joint Arthritis - Traumatic (Acute)

    Significant Symptoms                        Discriminant-Function Weights
    (+) Duration of Pain (less than 3 weeks)       +3
    (+) History of Trauma (accidental)            +30
    (+) Preauricular Pain                         +11
    (-) Salivary Gland Disease                    -12
    (-) Otitis                                     -1
    (discriminant-function weights for Diagnostic Alternative 4 range
    from -19 to +37)

Diagnostic Alternative 14: Myofacial Pain-Dysfunction: Bruxism

    Significant Symptoms                        Discriminant-Function Weights
    (+) Duration of Pain (more than 6 weeks)      +15
    (+) Facets                                     +2
    (+) Bruxism and/or Clenching                  +56
    (-) History of Trauma (accidental)            -16
    (-) Salivary Gland Disease                     -5
    (discriminant-function weights for Diagnostic Alternative 14 range
    from -23 to +56)

Note: For both Diagnostic Alternatives,
    (+) indicates a symptom that leads the practitioner to classify a patient in that diagnostic alternative
    (-) indicates a symptom that leads the practitioner to classify a patient in some other diagnostic alternative
3.3 Model Validation
Validation of the craniofacial-pain diagnostic-classification model presented in Section 3.1 has been accomplished by three types of validating procedures. The discussion presented in the preceding sections, and in particular the relationship between significant symptoms and their associated weights shown in Table 2, reveals a close proximity between the decision-making process the practitioner utilizes and the nonparametric classifier's symptom-weighting scheme. This section presents two other procedures employed in evaluating the diagnostic-classification model's performance.
The first procedure involved testing the diagnostic accuracy of the classification model on patient data that were not employed in model construction. Six classification tests were run in sequential order. In the first five of these tests random samples of 50 patient data vectors were drawn from the data base of 480 vectors discussed in Section 3.1. Then, as each of the tests was performed, the training algorithm in Appendix B was applied to the remaining 430 data vectors. With the weights derived from the training algorithm, the sample of 50 patients was classified. The model-generated classifications for each of the data vectors were compared to the classifications assigned to the vectors when they were created. As each test classification of a sample was completed, the diagnostic classifier's discriminant-function weights were set equal to zero, the sample of data vectors was returned to the data base, and the next test's random sample was drawn. A summary of the results of these tests of diagnostic accuracy is presented in Table 3.
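The hold-out procedure just described can be sketched as follows, with generic `train` and `classify` placeholders standing in for the Appendix B training algorithm and the Section 3.1 decision rule, which are not reproduced here:

```python
import random

# Sketch of one hold-out test: draw a random sample, retrain on the
# remaining vectors, and score the sample.  `train` and `classify` are
# placeholders for the Appendix B algorithm and the decision rule.
def holdout_accuracy(data, train, classify, sample_size=50, seed=0):
    rng = random.Random(seed)
    held_out = set(rng.sample(range(len(data)), sample_size))
    sample = [data[i] for i in held_out]
    training_base = [d for i, d in enumerate(data) if i not in held_out]
    weights = train(training_base)       # weights rebuilt for each test
    correct = sum(1 for vec, label in sample
                  if classify(vec, weights) == label)
    return 100.0 * correct / sample_size
```

Repeating this with fresh samples (and zeroed weights) yields the per-test accuracies of the kind summarized in Table 3.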
TABLE 3
TESTS OF DIAGNOSTIC CLASSIFIER ACCURACY

    Test    Number of Patient    Number of Data Vectors    Classifier
            Data Vectors         Correctly Classified      Accuracy
    ONE         50                      46                   92.0%
    TWO         50                      45                   90.0%
    THREE       50                      44                   88.0%
    FOUR        50                      47                   94.0%
    FIVE        50                      45                   90.0%
    SIX         51                      43                   84.3%

    Mean Classifier Accuracy: 89.7%
    Standard Deviation of Classifier Accuracy: 3.5%

In each of the first five tests it was possible for a patient who has had multiple practitioner visits to have some of the vectors representing these visits in a test's random sample and some vectors used
in model construction. Such occurrences lead to test results that overestimate classifier accuracy. Hence, in Test Six, a random sample of all of the patient data associated with 40 patients (a total of 51 patient data vectors) was selected. This sample was classified by the diagnostic-classification model using the remaining 429 data vectors as a data base. The results of this test are included in the data shown in Table 3. There is one other possible factor affecting the classifier's accuracy as measured by these tests. It is conceivable that there were duplicate data vectors in the data base of 480 patient data vectors. If duplicates do exist and were included in both the test samples and the samples' training bases, measures of classifier accuracy will be overly optimistic. However, since 'noise' is introduced by the variability among craniofacial-pain patients and generated in the practitioner's transcribing of the elements of patient data into the data-vector format, and since there are 2^295 possible data vectors, the probability that two or more of the data-based patient vectors include an identical specification of data-vector elements is small enough to justify neglecting this possibility and its effects.
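The smallness of this duplicate probability can be illustrated with a birthday-style bound. The uniformity assumption below is an added simplification (the text argues from noise, not uniformity), but it conveys the scale of 2^295:

```python
from math import comb

# Birthday-style upper bound on the chance that any two of the 480
# data-base vectors coincide, under the added (simplifying) assumption
# that vectors are spread over the 2^295 possible binary specifications.
pairs = comb(480, 2)       # number of distinct vector pairs: 114,960
bound = pairs / 2**295     # astronomically small (well below 10^-80)
```

Even this crude over-count leaves the duplicate probability negligible, supporting the text's decision to ignore it.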
The results summarized in Table 3 reveal that the diagnostic-classification model performs well in duplicating the diagnostic classifications originally assigned by the reviewing practitioners, Dr. Fast and Dr. Mahan. Moreover, the size of the test samples was quite large in relation to the data base employed in developing each test's diagnostic model. As new data become available and are incorporated in the parameters of the model, the accuracy of the craniofacial-pain diagnostic classifier can be expected to increase slightly.
The second validating procedure established a measure of variability on the diagnostic classifications that might be given by different dental practitioners. The discussion presented in Section 1.1 related the difficulties associated with diagnosing craniofacial-pain disorders. Practitioners with varying kinds of professional experience can be expected to reflect their dissimilar backgrounds in differing diagnostic classifications for these patients. To measure the variability associated with dissimilar backgrounds, five craniofacial-pain data vectors were selected from the data base employed in constructing the craniofacial-pain diagnostic classifier. Four dentists from the staff of the College of Dentistry at the University of Florida were asked to review these patient data vectors and assign to each of them a diagnostic classification. Table 4 summarizes their assignments and also includes the diagnostic classification originally given by the reviewing practitioners.
The variability in diagnostic assignments reflected in Table 4 reaffirms the justification for the research objectives set forth in Section 1.2. Some of the differences in the practitioners' choices of diagnostic classifications can be explained by the limited amount of data contained in each of the data vectors, and the less-than-full medical statement of each of the diagnostic alternatives. Nevertheless, a diagnostic-classification model that generates classifications that are in 90% agreement with those of experts in the field provides a sizeable improvement over the variability in classification assignments exhibited in Table 4, in which only half the respondents agreed on a single diagnosis in four out of five cases.
TABLE 4
CLASSIFICATION VARIABILITY AMONG DENTAL PRACTITIONERS

                              Diagnostic Classification for
                    Patient 1  Patient 2  Patient 3  Patient 4  Patient 5+
    Original
    Classification      4         13         15         15          9
    Practitioner 1      1          7         15         15          3
    Practitioner 2      6         12         15          8          3
    Practitioner 3      4         15         15         15         13
    Practitioner 4      4         15         15         14          *

    * No classification given
    + Patient 5 exhibited a minimal amount of input data (only 17 nonzero data-vector entries)

These four dental practitioners exhibited 100.0% agreement on the diagnosis of one of the five patients, and 50.0% agreement on the diagnostic classification of the remaining four patients.
3.4 Minimum-Cost Symptom-Selection Algorithm
The craniofacial-pain diagnostic-classification model detailed in the previous sections of this chapter has been structured upon the data vector of the 295 relevant signs, symptoms, and items of patient history shown in Appendix A. To utilize this model, the practitioner must examine a patient for the presence or absence of each of these data-vector elements. Although the cost in time and fees varies from item to item, there is an expense to the practitioner, and to the patient, associated with checking each element in the data vector. Hence, it is logical to investigate the possibility of finding a reduced data vector that 'costs' less for the patient and practitioner to use and yet still permits correct classification of all craniofacial-pain patients.
A review of the literature (see Meisel [23], Chapter 9, for a survey) reveals that many authors have considered the task of selecting a set of features to be used in a pattern-classification scheme. Traditional methods of viewing this problem are based on a search for a transformation that takes a given set of patterns into some 'new' pattern space where separation by discriminant functions is possible. Measures of pattern-class separability are employed to evaluate the effects of transforming the set of patterns from one space to another. In general, these transformations take a pattern representation in 'n' features and create a set of 'r' (r < n) 'new' features that are linear combinations of the original features. However, to reduce the 'costs' associated with using the craniofacial-pain diagnostic classifier, a transformation must be found that decreases the size of the data-vector pattern space by eliminating features rather than combining them. For example, assume patients were diagnosed on the basis of body-temperature and blood-pressure readings. Traditional techniques for feature selection might employ a linear combination of body-temperature and blood-pressure measurements as one 'new' feature. The transformation sought in this investigation would lead to the classification of patients by either body temperature or blood pressure alone, if this were possible. This example will be used again in Section 3.4.1 to illustrate the algebraic and geometric structure of the problem.
Nelson and Levy [27] have attacked the problem of selecting a reduced set of unaltered features for use in a classification scheme. These authors attach a cost to the use of each available feature, and employ a ranking scheme to measure each feature's discriminating power. Then, under a restriction on the total cost of features employed, they develop an algorithm that selects the set of features that maximizes the classifier's discriminating power. Unfortunately, their scheme does not guarantee the selection of a subset of original features that contains enough 'information' to permit pattern-class separation by discriminant function. Therefore, a new algorithm is presented in this section that minimizes the cost of the set of features used by the pattern classifier yet ensures that all patterns can be correctly classified by a set of linear discriminant functions. In the remainder of this section the more general terms 'feature,' 'pattern,' and 'pattern class' will be used respectively to represent a data-vector item, a patient's data vector, and a diagnostic classification.
The problem of finding a minimum-cost collection of features would not be considered if there did not already exist a set of 'n' features by which the patterns under examination could be correctly classified by linear discriminants. That is, given an 'n'-dimensional representation of each of the 'm_i' patterns in each of the 'p' pattern classes

a_i^m = [a_i1^m, a_i2^m, ..., a_in^m, 1], m=1,2,...,m_i, i=1,2,...,p,

where a_ik^m, k=1,2,...,n, equals either zero or one, there must exist a set of '(n+1)'-dimensional W_j's, j=1,2,...,p, such that

a_i^m (W_i - W_j)^T > 0 for all m=1,2,...,m_i, i=1,2,...,p, j=1,2,...,p, j ≠ i. (3)

Letting A_i be the m_i by (n+1) dimensional matrix of patterns in pattern class i, the requirement of (3) can be written in the following form:

A_i (W_i - W_j)^T > 0, i=1,2,...,p, j=1,2,...,p, j ≠ i.

If such pattern representations and W_j's exist, then a solution to the following problem yields a minimum-cost collection of pattern-classifying features:

P1: minimize C X
    subject to A_i [X ⊙ (W_i - W_j)^T] > 0, i=1,2,...,p, j=1,2,...,p, j ≠ i

where

          | a_i1^1     a_i2^1     ...  a_in^1     1 |
    A_i = | a_i1^2     a_i2^2     ...  a_in^2     1 |
          |   .          .                .       . |
          | a_i1^m_i   a_i2^m_i   ...  a_in^m_i   1 |

    W_i = [w_i1, w_i2, ..., w_in, w_i,n+1]
    C = [c_1, c_2, ..., c_n, 0]
    X = [x_1, x_2, ..., x_n, 1]^T

and w_ik is an unrestricted variable,
c_j is the cost of using feature j,
x_i = 0 if feature i is not used,
x_i = 1 if feature i is used.

Note: The ⊙ notation is to be read as element-by-element multiplication, i.e., Q ⊙ R = S = [s_ij] = [q_ij r_ij].
3.4.1 Algorithm Development
The algorithm developed to solve problem P1 is an enumerative algorithm similar in structure to that of Balas [28]. Unfortunately, the nonlinear nature of problem P1's constraints prohibits full implementation of the more powerful techniques used in implicit enumeration on linear integer problems. The structure of these constraints and their effect on the optimization of P1 will be discussed in a step-by-step development.
The minimum-cost feature-selection algorithm does not solve P1 to the extent of finding the values of the vectors W_i, i=1,2,...,p. This algorithm does find the minimum-cost collection of features X* and the total cost associated with using these features, and guarantees the existence of W_i vectors associated with this optimal feature set. Given this guarantee, the modified fixed-increment algorithm from Appendix B can be employed to find the vectors W_i*, i=1,2,...,p.
Choose some solution to P1. By hypothesis there exists at least one solution (X, W_1, W_2, ..., W_p) to P1 where X = [1,1,...,1,1]^T. Suppose there is some other solution (X', W_1', W_2', ..., W_p') where one or more elements x_i in the X' vector are equal to zero. For the constraint matrices in P1,

A_i [X ⊙ (W_i - W_j)^T] > 0, i=1,2,...,p, j=1,2,...,p, j ≠ i.

If the matrix products Â_i = [A_i ⊙ X], i=1,2,...,p, are constructed, then each set of constraints in P1 can be written in the form

Â_i (W_i - W_j)^T > 0, i=1,2,...,p, j=1,2,...,p, j ≠ i. (4)

The creation of the Â_i is called the zeroing process. Of the columns of A_i, Â_i retains all columns j of A_i where x_j = 1, and substitutes a column of zeros for each of those columns k in A_i where x_k = 0. Using the zeroing process, the feasibility of any possible solution vector X to P1 can be examined in terms of the Â_i = A_i ⊙ X this vector X creates.
As an example of the zeroing process for a particular set of patterns, let a^i be a two-dimensional patient-data vector a^i = [a^i_1, a^i_2] where

    a^i_1 = 0 if patient i has normal body temperature
            1 if patient i has abnormal body temperature
and
    a^i_2 = 0 if patient i has normal blood pressure
            1 if patient i has abnormal blood pressure.

Assume two diagnostic categories, X and Y, where data vectors a^1_X and a^2_X are preclassified in category X and data vectors a^1_Y and a^2_Y are preclassified in category Y. If a^1_X = [1,0], a^2_X = [1,1], a^1_Y = [0,0], and a^2_Y = [0,1], then

    A_X = | 1 0 |     and     A_Y = | 0 0 |
          | 1 1 |                   | 0 1 | .
Graphically the pattern space can be represented as a plot of these four patterns on a temperature (horizontal) axis and a blood-pressure (vertical) axis. Consider the vector X = [1,0]; then

    [A_X ⊙ X] = | 1 0 |     and     [A_Y ⊙ X] = | 0 0 |
                | 1 0 |                         | 0 0 | .

Graphically, the pattern space as transformed by X collapses onto the temperature axis.
The vector X effectively creates a representation of each patient data
vector in terms of the patient's body temperature alone.
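In code, the zeroing process is simply an element-by-element mask of each pattern by the feature-selection vector; a minimal sketch in modern Python (an illustration only, as the original work predates such tooling):

```python
def zero_features(patterns, x):
    """Zeroing process: element-by-element product of each pattern
    with the feature-selection vector X, i.e. A (.) X."""
    return [[p_j * x_j for p_j, x_j in zip(p, x)] for p in patterns]

# Patterns from the temperature/blood-pressure example above.
A_X = [[1, 0], [1, 1]]   # category X
A_Y = [[0, 0], [0, 1]]   # category Y
X = [1, 0]               # keep temperature, drop blood pressure

print(zero_features(A_X, X))   # [[1, 0], [1, 0]]
print(zero_features(A_Y, X))   # [[0, 0], [0, 0]]
```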
Note that relation (2) is the requirement for pattern separability by linear discriminants. Hence, a vector X̂ is a component in a feasible solution (X̂, Ŵ_1, Ŵ_2, ..., Ŵ_p) to P1 if and only if there exist Ŵ_i, i=1,2,...,p, such that (2) holds for all i≠j. As discussed in Section 3.1, a pattern space is linearly separable, and hence feasible Ŵ_i exist, if and only if the individual pattern classes have nonintersecting convex hulls. For the pattern vectors considered in this section, the individual components of each of the patterns in each pattern class are either zero or one. As there is a one-to-one correspondence between the individual patterns in a pattern class and the vertices of the pattern class's convex hull, the convex hull of a pattern class A_i can be expressed as all convex combinations of the individual pattern-class vectors a^m_i, m=1,2,...,m_i. Consider the following examples of the convex-hull representation of linear separability.
Assume a^1_X = [1,0], a^2_X = [1,1], a^1_Y = [0,0], and a^2_Y = [0,1]. Graphically this pattern space can be represented as a plot on Feature 1 (horizontal) and Feature 2 (vertical) axes, where the line X from a^1_X to a^2_X represents the convex hull of pattern class X and the line Y from a^1_Y to a^2_Y represents the convex hull of pattern class Y. Since X and Y do not intersect, implying that the space is linearly separable, it is possible to draw an infinite number of lines θ that serve as discriminating hyperplanes.
Assume a^1_X = [1,0], a^2_X = [0,1], a^1_Y = [0,0], and a^2_Y = [1,1]. Graphically this pattern space can be represented as a plot on the same axes, where the line X from a^1_X to a^2_X represents the convex hull of pattern class X and the line Y from a^1_Y to a^2_Y represents the convex hull of pattern class Y. Since the lines X and Y intersect, the pattern space is not linearly separable, and hence it is impossible to draw a discriminating hyperplane θ.
Therefore, the following condition is equivalent to condition (4): a vector X̂ is feasible to P1 if and only if there do not exist U^s and U^t such that

    U^s Â_s = U^t Â_t   for any s=1,2,...,p; t=1,2,...,p; s≠t   (5)

where

    U^i = [u^i_1, u^i_2, ..., u^i_{m_i}],

    u^i_k >= 0 for all k=1,2,...,m_i,
and
    Σ_{k=1}^{m_i} u^i_k = 1 for all i=1,2,...,p.
Checking the feasibility of some vector X̂ by condition (5) yields [p(p-1)]/2 distinct subproblems. Each of these subproblems may be characterized as follows: let Â_s^T = A and Â_t^T = B, with A and B having columns a_i and b_j respectively, for any Â_s and Â_t.

P2: Find u_i >= 0, Σ_{i=1}^{m_A} u_i = 1, and v_j >= 0, Σ_{j=1}^{m_B} v_j = 1, such that

    Σ_{i=1}^{m_A} u_i a_i = Σ_{j=1}^{m_B} v_j b_j.

If such u_i and v_j exist for any one of the subproblems, then X̂ is not feasible to P1. Because the number of subproblems is large even for a relatively small number p of pattern classes, there is justification for seeking methods to expedite the solution of each subproblem P2.
To achieve this goal, a series of conditions will be presented that characterize some of the criteria necessary to the existence of a solution to subproblem P2. In addition to establishing criteria for existence, these conditions provide a means for reducing the size of the matrices A and B. This reduction will be discussed after the conditions are established.
Condition 1: If the kth row of A has all elements a^k_i, i=1,2,...,m_A, equal to zero (one) and the kth row of B has all elements b^k_j, j=1,2,...,m_B, equal to one (zero), then no u_i >= 0, Σ_{i=1}^{m_A} u_i = 1 and v_j >= 0, Σ_{j=1}^{m_B} v_j = 1 exist such that

    Σ_{i=1}^{m_A} u_i a_i = Σ_{j=1}^{m_B} v_j b_j.

Justification 1: Under Condition 1 there is no set of convex combinations of the kth-row elements of A and of the kth-row elements of B such that the combinations are equal. Hence, there can be no set of convex combinations of the columns of A and of B such that the combinations are equal. Symbolically, since no u_i >= 0, Σ_{i=1}^{m_A} u_i = 1 and v_j >= 0, Σ_{j=1}^{m_B} v_j = 1 exist such that

    Σ_{i=1}^{m_A} u_i a^k_i = Σ_{j=1}^{m_B} v_j b^k_j,

no u_i >= 0, Σ_{i=1}^{m_A} u_i = 1, and v_j >= 0, Σ_{j=1}^{m_B} v_j = 1 exist such that

    Σ_{i=1}^{m_A} u_i a_i = Σ_{j=1}^{m_B} v_j b_j.

Condition 2: If the kth row of A has all elements a^k_i, i=1,2,...,m_A, equal to zero (one) and the kth row of B has all elements b^k_j, j=1,2,...,m_B, equal to zero (one), the kth row of matrices A and B can be eliminated without loss of possible solutions to subproblem P2.

Justification 2: Under Condition 2 every convex combination of the kth-row elements of A and of the kth-row elements of B are equal. Hence, a set of convex combinations of the columns of A and of the columns of B are equal if and only if the convex combinations of the remaining rows (all rows except the kth row) are equal. Symbolically, let ã_i denote the pattern a_i whose kth component has been eliminated, and similarly let b̃_j denote the elimination of component k from pattern b_j; then, as

    Σ_{i=1}^{m_A} u_i a^k_i = Σ_{j=1}^{m_B} v_j b^k_j

for any choice of u_i >= 0, Σ_{i=1}^{m_A} u_i = 1 and v_j >= 0, Σ_{j=1}^{m_B} v_j = 1,

    Σ_{i=1}^{m_A} u_i a_i = Σ_{j=1}^{m_B} v_j b_j

if and only if

    Σ_{i=1}^{m_A} u_i ã_i = Σ_{j=1}^{m_B} v_j b̃_j.
Condition 3: If the kth row of A has all elements a^k_i, i=1,2,...,m_A, equal to zero, and some b^k_r equals one, then no u_i >= 0, Σ_{i=1}^{m_A} u_i = 1, and v_j >= 0 with v_r > 0, Σ_{j=1}^{m_B} v_j = 1, exist such that

    Σ_{i=1}^{m_A} u_i a_i = Σ_{j=1}^{m_B} v_j b_j.

Justification 3: Under Condition 3 any convex combination of the columns of B that includes a nonzero product of the column b_r results in a kth-row term greater than zero. The value of the kth-row term for any convex combination of the columns of A is equal to zero. Hence, no set of convex combinations of the columns of A and B can be equal if the combination for B includes a specification that v_r > 0. Symbolically, if v_r > 0, then for any choice of v_j, j=1,2,...,m_B, j≠r, where v_r > 0 and Σ_{j=1}^{m_B} v_j = 1,

    Σ_{j=1}^{m_B} v_j b^k_j > Σ_{i=1}^{m_A} u_i a^k_i = 0

for any choice of u_i such that u_i >= 0 and Σ_{i=1}^{m_A} u_i = 1. Hence, if v_r > 0, there exist no u_i >= 0, Σ_{i=1}^{m_A} u_i = 1 and v_j >= 0, j≠r, Σ_{j=1}^{m_B} v_j = 1 such that

    Σ_{i=1}^{m_A} u_i a_i = Σ_{j=1}^{m_B} v_j b_j.

Condition 4: If the kth row of A has all elements a^k_i, i=1,2,...,m_A, equal to one, and some b^k_r equals zero, then no u_i >= 0, Σ_{i=1}^{m_A} u_i = 1 and v_j >= 0 with v_r > 0, Σ_{j=1}^{m_B} v_j = 1, exist such that

    Σ_{i=1}^{m_A} u_i a_i = Σ_{j=1}^{m_B} v_j b_j.

Justification 4: Condition 4 is similar to Condition 3 in that any convex combination of the columns of B that includes a nonzero product of the rth column yields a kth-row term whose value cannot equal any convex combination of the kth-row elements of A. Symbolically, for any choice of u_i and v_j, where v_r > 0,

    Σ_{j=1}^{m_B} v_j b^k_j < Σ_{i=1}^{m_A} u_i a^k_i = 1.
Note that Conditions 3 and 4 can also be stated, and justified, with
the role of the elements of the A and B matrices reversed.
Given this set of four conditions, consider the following row partition of the A and B matrices:

        | A*   |         | B*   |
        | A_1  |         | B'_1 |
    A = | A'_1 |     B = | B_1  |
        | A_0  |         | B'_0 |
        | A'_0 |         | B_0  |

where, by appropriate interchange of rows in A and B,
1. every element in each row of A_1 is a one,
2. every element in each row of B_1 is a one,
3. every element in each row of A_0 is a zero, and
4. every element in each row of B_0 is a zero.
The partitions A'_1, B'_1, A'_0, and B'_0 are the rows of A and B corresponding to B_1, A_1, B_0, and A_0, respectively, and A* and B* are the remaining rows of A and B. With this partitioning and the four previously established conditions, the size of the data vectors associated with many of the [p(p-1)]/2 subproblems P2 can be significantly reduced. The reduction process, Procedure 1, can be stated in this manner:
Step 1: If for some row k in A_1 (B_1) each element in the corresponding row of B'_1 (A'_1) is equal to one, then row k of A and B can be eliminated by Condition 2.
Step 2: If for some row k in A_0 (B_0) each element in the corresponding row of B'_0 (A'_0) is equal to zero, then row k of A and B can be eliminated by Condition 2.
Step 3: If for some row k in A_0 (B_0) the corresponding row in B'_0 (A'_0) has all elements equal to one, or if for some row k in A_1 (B_1) the corresponding row in B'_1 (A'_1) has all elements equal to zero, then this particular subproblem P2 has no feasible solution by Condition 1. Procedure 1 and the search for a solution to P2 are terminated at this point because the convex hulls of pattern classes A and B do not intersect.
Step 4: If for some row k in A_1 (B_1) the corresponding row in B'_1 (A'_1) has one or more elements equal to zero, i.e., b^k_r = b^k_s = ... = b^k_t = 0 (a^k_r = a^k_s = ... = a^k_t = 0), then columns b_r, b_s, ..., b_t (a_r, a_s, ..., a_t) can be eliminated by Condition 4.
Step 5: If for some row k in A_0 (B_0) the corresponding row in B'_0 (A'_0) has one or more elements equal to one, i.e., b^k_r = b^k_s = ... = b^k_t = 1 (a^k_r = a^k_s = ... = a^k_t = 1), then columns b_r, b_s, ..., b_t (a_r, a_s, ..., a_t) can be eliminated by Condition 3.
Step 6: If the use of Steps 1, 2, 4, and 5 has eliminated all elements of both matrices, then this particular subproblem has an infinite number of feasible solutions by Condition 2. Procedure 1 and the search for a solution to P2 are terminated at this point because the convex hulls of the pattern classes A and B intersect.
Step 7: If the use of Steps 1, 2, 4, and 5 has eliminated one or more rows or columns from either matrix, then repartition the matrices and return to Step 1; otherwise terminate Procedure 1.
In coding Procedure 1 for computer processing, there is no need to physically partition the rows of the A and B matrices. Summing the elements in any row of A or B reveals whether the individual elements in the row are all equal to zero or are all equal to one. Given this information, the steps from Procedure 1 determine whether a pattern is removed from A or B, whether a row in A and B is removed, or whether the procedure should be terminated because no feasible set of convex combinations for P2 exists.
As an example of the use of Procedure 1, consider the set of matrices A and B in subproblem P2 where

        | 0 1 1 0 |         | 1 1 1 1 |
    A = | 1 0 0 0 |     B = | 0 0 0 0 |
        | 1 0 0 0 |         | 1 1 1 0 |
        | 0 1 1 1 |         | 1 1 1 0 | .

In the first application of the steps of Procedure 1:
1. Column 4 can be eliminated from matrix A by Step 4, and
2. Column 1 can be eliminated from matrix A by Step 5.
After the first application of the steps of the procedure,

        | 1 1 |         | 1 1 1 1 |
    A = | 0 0 |     B = | 0 0 0 0 |
        | 0 0 |         | 1 1 1 0 |
        | 1 1 |         | 1 1 1 0 | .

In the second application of the steps of Procedure 1:
1. Row 1 can be eliminated from both matrices by Step 1,
2. Row 2 can be eliminated from both matrices by Step 2, and
3. Column 4 can be eliminated from matrix B by Step 4.
After the second application of the steps of the procedure,

    A = | 0 0 |     B = | 1 1 1 |
        | 1 1 |         | 1 1 1 | .

In the third application of the steps of Procedure 1:
1. Row 2 can be eliminated from both matrices by Step 1, and
2. Procedure 1 can be terminated by Step 3.
Hence, for this set of A and B matrices, subproblem P2 has no feasible solution.
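Procedure 1 can be coded directly from the row sums, as noted above. The following Python sketch applies one elimination at a time and re-examines the matrices after each change; the function names are illustrative, and because several steps may apply at once, the order of eliminations may differ from the hand-worked example while reaching the same conclusion:

```python
def all_eq(row, val):
    return len(row) > 0 and all(e == val for e in row)

def drop_cols(M, cols):
    return [[e for j, e in enumerate(row) if j not in cols] for row in M]

def procedure_1(A, B):
    """Reduce pattern matrices A and B (rows = features, columns = patterns).
    Returns 'no solution' (Condition 1 fired, Step 3), 'intersect'
    (everything eliminated, Step 6), or the reduced pair of matrices."""
    A = [row[:] for row in A]
    B = [row[:] for row in B]
    while True:
        action = None
        for k in range(len(A)):
            a, b = A[k], B[k]
            if (all_eq(a, 0) and all_eq(b, 1)) or (all_eq(a, 1) and all_eq(b, 0)):
                return 'no solution'                       # Step 3 / Condition 1
            if (all_eq(a, 1) and all_eq(b, 1)) or (all_eq(a, 0) and all_eq(b, 0)):
                action = ('row', k)                        # Steps 1-2 / Condition 2
            elif all_eq(a, 1) and 0 in b:                  # Step 4
                action = ('B', {j for j, e in enumerate(b) if e == 0})
            elif all_eq(b, 1) and 0 in a:                  # Step 4, roles reversed
                action = ('A', {j for j, e in enumerate(a) if e == 0})
            elif all_eq(a, 0) and 1 in b:                  # Step 5
                action = ('B', {j for j, e in enumerate(b) if e == 1})
            elif all_eq(b, 0) and 1 in a:                  # Step 5, roles reversed
                action = ('A', {j for j, e in enumerate(a) if e == 1})
            if action:
                break
        if action is None:
            return A, B                                    # Step 7: nothing applies
        kind, arg = action
        if kind == 'row':
            del A[arg]; del B[arg]
        elif kind == 'A':
            A = drop_cols(A, arg)
        else:
            B = drop_cols(B, arg)
        if not A or not A[0] or not B[0]:
            return 'intersect'                             # Step 6

# The worked example above: Procedure 1 finds P2 infeasible.
A_ex = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1]]
B_ex = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
print(procedure_1(A_ex, B_ex))   # 'no solution'
```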
Although the use of Procedure 1 may lead to a reduction in the size of most subproblems, the pattern vectors (a_i and b_j) for each of these problems may still be quite large. Restating subproblem P2 as a linear program yields

    P3: minimize    [0 ... 0] [U V]^T

        subject to  | A        -B       | | U |   | 0 |
                    | 1 1...1  0 0...0  | | V | = | 1 |
                    | 0 0...0  1 1...1  |         | 1 |

        and U >= 0, V >= 0,

where the existence of any solution vectors U* and V* signals the intersection of the convex hulls of pattern classes A and B.
Consider the dual of P3, written in the following form:

    P4: maximize    λ_1 + λ_2

        subject to    A^T π + 1 λ_1 <= 0
                     -B^T π + 1 λ_2 <= 0

        with π, λ_1, λ_2 unrestricted in sign,

where 1 denotes a column of ones. Note that P4 may have many associated π variables, but has only as many constraints as the number of patterns in A and B (as reduced by Procedure 1). P4 always has at least one solution to its constraint set. Thus, if an application of a linear-programming algorithm to P4 reveals the existence of an unbounded solution, then P2 has no solution. Therefore, if and only if P4 has a bounded solution do u_i and v_j exist such that

    Σ_{i=1}^{m_A} u_i a_i = Σ_{j=1}^{m_B} v_j b_j

where

    u_i >= 0,  Σ_{i=1}^{m_A} u_i = 1
and
    v_j >= 0,  Σ_{j=1}^{m_B} v_j = 1.
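For subproblems that survive Procedure 1, the existence of the required convex combinations can be settled by any linear-programming code. The sketch below tests feasibility of the primal P3 directly (equivalent, by duality, to checking boundedness of P4) with a small phase-one simplex in exact rational arithmetic; this is a modern illustration, not the dissertation's original implementation:

```python
from fractions import Fraction

def phase_one_feasible(M, b):
    """True iff {x >= 0 : M x = b} is nonempty (assumes b >= 0):
    minimize the sum of artificial variables, pivoting by Bland's rule."""
    m, n = len(M), len(M[0])
    T = [[Fraction(M[i][j]) for j in range(n)]
         + [Fraction(1 if i == k else 0) for k in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    basis = list(range(n, n + m))                  # artificial starting basis
    cost = [Fraction(0)] * n + [Fraction(1)] * m + [Fraction(0)]
    # Reduced-cost row priced out against the artificial basis.
    obj = [cost[j] - sum(T[i][j] for i in range(m)) for j in range(n + m + 1)]
    while True:
        e = next((j for j in range(n + m) if obj[j] < 0), None)
        if e is None:
            return obj[-1] == 0                    # phase-one optimum is -obj[-1]
        rows = [i for i in range(m) if T[i][e] > 0]
        l = min(rows, key=lambda i: (T[i][-1] / T[i][e], basis[i]))
        piv = T[l][e]
        T[l] = [x / piv for x in T[l]]
        for i in range(m):
            if i != l and T[i][e] != 0:
                f = T[i][e]
                T[i] = [T[i][j] - f * T[l][j] for j in range(n + m + 1)]
        f = obj[e]
        obj = [obj[j] - f * T[l][j] for j in range(n + m + 1)]
        basis[l] = e

def hulls_intersect(A_pats, B_pats):
    """True iff conv(A) and conv(B) meet, i.e. subproblem P2 has a
    solution: A u - B v = 0, sum(u) = 1, sum(v) = 1, u, v >= 0 (P3)."""
    n = len(A_pats[0])
    mA, mB = len(A_pats), len(B_pats)
    M = [[p[k] for p in A_pats] + [-q[k] for q in B_pats] for k in range(n)]
    M.append([1] * mA + [0] * mB)
    M.append([0] * mA + [1] * mB)
    b = [0] * n + [1, 1]
    return phase_one_feasible(M, b)

# The two convex-hull examples from earlier in this section:
print(hulls_intersect([[1, 0], [1, 1]], [[0, 0], [0, 1]]))   # False: separable
print(hulls_intersect([[1, 0], [0, 1]], [[0, 0], [1, 1]]))   # True: not separable
```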
The preceding discussion, with its development of a reduction procedure and dual formulation, provides the structure for a second procedure. Procedure 2 establishes a mechanism to verify the feasibility of any assignment of zeros and ones to the X vector of problem P1; see Figure 4. That is, given some vector X and a set of patterns a^m_i, m=1,2,...,m_i, and i=1,2,...,p, the [p(p-1)]/2 subproblems P2 are formed by zeroing out the appropriate pattern-vector elements. Then Procedure 1 is applied to each subproblem. Finally, for each pair of pattern classes the boundedness of the dual formulation P4 is examined. Vector X represents a feasible set of pattern-classifying features for P1 if and only if each of the [p(p-1)]/2 subproblem formulations P4 is unbounded.

FIGURE 4
PROCEDURE 2
Before a statement of the algorithm to solve problem P1 is presented, several terms must be defined. The assignment vector is defined as a listing of variables x_i, elements of the vector X in P1, whose values have been determined by the steps of the algorithm. The elements in this vector are recorded with the value of their assignment, either zero or one. These elements are entered in the vector in the order they were assigned, with the first algorithm assignment in the first (left) position. For example, consider the assignment vector

    [x_4 = 0, x_10 = 1, x_2 = 0].

This vector records that the algorithm first assigned x_4 equal to zero, then assigned x_10 equal to one, and its last assignment was x_2 equal to zero. Feasibility of a solution X, as determined by the assignment-vector component values, is checked by Procedure 2 with the value of those variables not included in the assignment vector temporarily set equal to one. The value V of an assignment vector is defined as minus one times the sum of the costs associated with each of the variables in the assignment vector, multiplied by the value assigned to the respective variable. For the example assignment vector, [x_4 = 0, x_10 = 1, x_2 = 0], where c_4 = 5, c_10 = 2, and c_2 = 7, the assignment vector has the value

    V = (-1)[5(0) + 2(1) + 7(0)] = -2.
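The value computation is direct; a one-line sketch (the helper name is hypothetical):

```python
def assignment_value(assignment, cost):
    """V = (-1) * sum of c_i times the value assigned to x_i,
    taken over the (index, value) pairs in the assignment vector."""
    return -sum(cost[i] * val for i, val in assignment)

# The example above: [x4 = 0, x10 = 1, x2 = 0] with c4 = 5, c10 = 2, c2 = 7.
cost = {4: 5, 10: 2, 2: 7}
print(assignment_value([(4, 0), (10, 1), (2, 0)], cost))   # -2
```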
3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm
Step 0: Create the assignment vector (at this point the vector is null as there is no variable assignment in the vector). Set V* = -∞ and go to Step 4.
Step 1: Start at the right side of the assignment vector and move to the left, stopping at the first variable assigned a zero value. If no variable in the assignment vector has a zero assignment, go to Step 2. Otherwise go to Step 3.
Step 2: Calculate V for the assignment vector. If V is greater than V*, record the values of the variables in the assignment vector as the optimal solution X* to P1. Otherwise, record (as the optimal solution X* to P1) the values of the variables in the best current solution X̂. Terminate the algorithm.
Step 3: Change the value of the variable isolated in Step 1 to an assigned value of one, and eliminate from the assignment vector all variable assignments to the right of this new assignment. If the assignment vector includes the assignment x_i = 1 for every x_i in X, return to Step 2. Otherwise go to Step 4.
Step 4: Select a variable x_k that is not an element of the assignment vector. Assign this variable the value x_k = 0 in the assignment vector. Use Procedure 2 to check the feasibility of this assignment. If the assignment vector is not feasible, go to Step 6. Otherwise go to Step 5.
Step 5: If the assignment vector with the new assignment x_k = 0 does not include an assignment for every x_i in X, return to Step 4. Otherwise go to Step 7.
Step 6: If the assignment vector with the assignment x_k = 1 (x_k is the variable selected in Step 4) does not include an assignment for every x_i in X, return to Step 4. Otherwise go to Step 7.
Step 7: Calculate V for the assignment vector. If V* is greater than V, go to Step 1. Otherwise go to Step 8.
Step 8: Record as the best current solution X̂ the values of the variables in this assignment vector. Set V* = V, and return to Step 1.
Note that in the course of applying this algorithm all solutions are considered, and the best current solution is replaced only when another solution has a larger associated value. As the number of possible solutions is finite, the algorithm must terminate, and at this termination the value of the optimal solution and its assignments are known. An application of the minimum-cost symptom-selection algorithm is presented in Appendix C.
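The eight steps above amount to a depth-first enumeration in which free variables are tried at zero first, infeasible zeros are flipped to one, and backtracking flips the rightmost zero. A compact Python sketch (the `feasible` argument stands in for Procedure 2, and the toy cost data below are illustrative, not from the dissertation):

```python
def min_cost_features(n, cost, feasible):
    """Minimum-cost feature selection by the enumerative algorithm.
    cost[i] is the utilization cost of feature i; feasible(x) plays the
    role of Procedure 2, answering whether the 0/1 feature vector x still
    permits pattern separation (unassigned variables are treated as 1).
    Returns the optimal feature vector X* and its total cost."""
    best_V, best_X = float('-inf'), None
    assign = []                                        # Step 0: null vector

    def to_x(assign):
        x = [1] * n                                    # free variables -> 1
        for i, v in assign:
            x[i] = v
        return x

    while True:
        while len(assign) < n:                         # Steps 4-6: extend
            used = {i for i, _ in assign}
            k = min(i for i in range(n) if i not in used)
            assign.append((k, 0))                      # Step 4: try x_k = 0
            if not feasible(to_x(assign)):
                assign[-1] = (k, 1)                    # Step 6: fall back to 1
        V = -sum(cost[i] * v for i, v in assign)       # Step 7: value V
        if V > best_V:                                 # Step 8: new best
            best_V, best_X = V, to_x(assign)
        while assign and assign[-1][1] == 1:           # Step 1: rightmost zero
            assign.pop()
        if not assign:                                 # Step 2: enumeration done
            return best_X, -best_V
        assign[-1] = (assign[-1][0], 1)                # Step 3: flip 0 -> 1

# Toy instance: three features costing 5, 2, 7; any two features suffice.
print(min_cost_features(3, [5, 2, 7], lambda x: sum(x) >= 2))  # ([1, 1, 0], 7)
```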
3.4.3 Computational Considerations
Returning to the setting of diagnostic classification of craniofacial
pain patients, application of the minimumcost symptomselection algorithm
295
would require an enumeration (explicit or implicit) over 22 possible
solutions in order to find the optimal collection of datavector elements.
As the number of possible solutions is prohibitively large, heuristic modifications to the symptom-selection algorithm are required for this application. One possible modification could employ the fact that only a few of the elements in the patient data vector have large associated 'costs' for their utilization. In particular, the eight elements of radiographic data and the two measures of emotional trauma are significantly more 'costly' to examine than the other items in the data vector. With this modification, the algorithm would only consider eliminating these ten high-cost features. Another heuristic approximation to the optimal collection of features might rank the data-vector elements in order of descending cost of utilization. Procedure 2 would then be used to eliminate these components one by one, starting with the item of highest cost, until the procedure signaled an infeasible solution to P1. Certainly, other heuristics might also be developed to exploit the structure of this algorithm.
3.5 Model Applications
The structure of the craniofacialpain diagnosticclassification
model permits model utilization for a variety of purposes. Since the
model is developed in terms of general datavector and diagnosticalterna
tive parameters, these model components can be altered to suit the appli
cation in question. This section presents a brief discussion of sane of
the possible applications of the diagnostic classifier.
In a teaching environment, the diagnostic-classification model with its set of discriminant weights can be stored for computer-terminal access. Then, on a set of tutorial example patients, students can compare their diagnoses with those of the diagnostic model. Moreover, the student can interact with the classifier in constructing his own 'sample' patients for the classifier to diagnose. Finally, the student can request the classifier to relate those discriminant-function weights that the model employs in considering the 'significance' (Section 3.2) of any one or group of symptoms.
The effectiveness of new diagnostic tests can be evaluated using the minimum-cost symptom-selection algorithm. This algorithm provides an immediate measure of the 'worth' of new research developments. Given a cost for employing a new test, the algorithm returns an evaluation of the test's classifying capability. The algorithm reveals whether the test is included in the minimum-cost collection of features and whether the use of the new test permits the practitioner to discontinue other examination procedures. Additionally, the algorithm can be employed to point out new areas for research, as it isolates diagnostic alternatives where correct classification of patients is difficult using existing tests and procedures.
As employed in the practitioner's office, the diagnostic classifier will provide a direct link between the practicing dentist and the knowledge of experts in the field of craniofacial pain. Information will flow over the link in both directions. As new patients are seen by the practitioner, the record of each visit will be reviewed by experts and then used to supplement the data base employed in model construction. Then, when developments dictate, new sets of discriminant-function weights can be transmitted to the dental practitioners. This kind of interaction results in a more accurate and representative diagnostic classifier as the patient-sample data base becomes larger.
CHAPTER 4
TREATMENT PLANNING
The selection of treatment regimens for craniofacial-pain patients is modeled as a Markovian decision process. The states in this Markovian model are descriptions of a patient's health-care status, and the decision alternatives are feasible treatments for the patient's dysfunction (see Section 4.1). In the first two sections of this chapter, motivation for the model structure is provided and the components of the decision model are developed. The third section provides a description of the validating procedures used to determine the appropriateness of the model and the model-generated treatment decisions. This chapter closes with a discussion of potential teaching, research, and private-practice applications of the treatment-planning model.
4.1 Model Components
Several model-building components from the craniofacial-pain care system are isolated to permit the construction of a Markovian representation of this system. A set of state descriptions that characterize, for decision-making purposes, the status of craniofacial-pain patients is presented in Section 4.1.1. Then transition probabilities measuring the effects of treatment applications are discussed in Section 4.1.2. Section 4.1.3 overlays the model's state descriptions and transition probabilities with costs accrued during the patient's progression through the care system. These components are integrated and verified in the discussions of Sections 4.2 and 4.3.
Values for many of the treatment-planning model's parameters were gathered from the set of patient records discussed in Section 3.1. As the patient histories from the contributing university dental clinics were reviewed, notations of treatment applications and time between successive visits were made for each patient-practitioner interaction. The values of the remaining model parameters were either estimated by the reviewing practitioners, Dr. Fast and Dr. Mahan, or were gathered from responses to questionnaires completed by patients who visited the University of Florida's Dental Clinic. In modeling the complicated process of care for craniofacial-pain patients, several simplifying assumptions were made. This section provides the motivation for these assumptions and presents the notation employed in the analytic description of the treatment-planning process.
4.1.1 Patient States
In general, a Markovian system structure requires that the current state of the system completely characterize the probabilities associated with future state occupancies of the system. To fully satisfy this Markovian condition for state structure in the craniofacial-pain treatment-planning model would require that the model include as distinct model states every possible combination of diagnostic classifications a patient might have occupied, in conjunction with every combination of treatment applications he might have undergone, during his stay in the care system. Unfortunately, such a model would have an infinite number of 'patient states.'
However, for a majority of craniofacial-pain patients the knowledge of a patient's prior treatment record, coupled with his current diagnostic classification, is adequate to determine his prior diagnostic classifications. Even in the cases where the current classification and prior treatment record do not provide a total description of a patient's condition, these elements of patient status do provide significant information about the probabilities associated with a patient's future status in the care system. For example, in the data employed in model construction, 47 craniofacial-pain patients occupied Diagnostic Alternative 15 and were treated with an application of drugs at least once. Eight of these patients were 'well' after a first treatment with drugs, while 39 required multiple applications of drugs or other treatments during their stay in the system. Yet of the 12 patients who were given two applications of drugs, 9 were 'well' following the second repetition of drug therapy. Thus, while the overall data-based transition-probability estimate for a transition from Diagnostic Alternative 15 into the well state following any one application of drugs is .36, the transition-probability estimate for a transition into the well state following two successive applications of drugs is .75. Hence, for this diagnostic classification, information on the prior application of drugs is important in determining a patient's future status in the care system.
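The two estimates quoted above can be reproduced from the counts in this paragraph. One consistent reading of the .36 figure is the proportion of the 47 patients who became well after either the first or the second application (8 + 9 = 17); a small sketch (that grouping is an interpretive assumption, not stated explicitly in the records):

```python
# Counts reported for Diagnostic Alternative 15 and drug therapy.
patients = 47              # patients given drugs at least once
well_first = 8             # 'well' after the first application
two_applications = 12      # patients given a second application
well_second = 9            # 'well' after the second application

# One reading of the overall estimate quoted as .36:
p_well_any = (well_first + well_second) / patients
# Estimate following two successive applications, quoted as .75:
p_well_two = well_second / two_applications

print(round(p_well_any, 2), round(p_well_two, 2))   # 0.36 0.75
```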
This form of 'current diagnostic classification augmented by treatment record' patient-state description is employed in the craniofacial-pain treatment-planning model as an approximation to a 'true' Markovian state structure. Each of the diagnostic alternatives shown in Figure 3 forms the basis for a collection of patient states. The diagnostic alternative is augmented with a record of treatments that have been applied since the patient entered the care system. Appendix D provides a list of the treatment alternatives that may be prescribed for craniofacial-pain patients. The record of each treatment given to the patient is noted in the patient-state descriptions without regard to its chronological order. For example, a patient's occupation of the state 'J|1,2,2' denotes that he is currently classified in diagnostic alternative J, and that since he entered the care system he has been treated with one application of treatment 1 and two applications of treatment 2.
Augmenting the patient-state descriptions with treatment history expands the dimensionality of the state space, yet the number of history-augmented states remains finite for two reasons. The treatment records used in model construction reveal that, for some combinations of diagnostic alternatives and treatment applications, there is a feasible limit to the number of treatment repetitions that can be given to any one patient. Thus, the first reason for a finite state space is that no patient state in the treatment-planning model includes more repetitions of a particular treatment than the clinical data have established as a feasible limit. As an example, the records of patient visits used in model construction establish a feasible limit of only one application of treatment 18 for patients classified in any of the diagnostic alternatives. Therefore, the treatment-planning model includes patient states that exclude treatment 18 as a portion of their treatment history or exhibit the form

    'J|...,18,...'

for each diagnostic classification 'J' where 18 is a feasible treatment. The second reason for a finite state space is that there is a 'boundary application' of many treatments such that neither the treatment-record data nor the reviewing practitioners established differences between the transition probabilities for the boundary application and those for further repetitions of the treatments (see Section 4.1.2 and Appendix E). In Diagnostic Alternative 13, for example, the first application of treatment 24 is the boundary repetition of that treatment. Hence, multiple repetitions of treatment 24 are not added to the state description of patient states based on Diagnostic Alternative 13, as the additional information on multiple applications does not influence transition probabilities associated with this treatment's effectiveness. Thus, a second application of treatment 24 for a patient who continues to be classified in Diagnostic Alternative 13 places the patient in a state of the form

    '13|...,24,...'.
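The history-augmented state descriptions lend themselves to a simple encoding; a sketch (the function and argument names are illustrative, not part of the original model, and the string format mirrors the 'J|1,2,2' and treatment-24 examples):

```python
from collections import Counter

def patient_state(diagnosis, treatments, boundary):
    """Encode a history-augmented patient state: the current diagnostic
    alternative plus an order-free record of treatment applications,
    capping each treatment at its boundary number of repetitions
    (repetitions beyond the boundary carry no extra information)."""
    counts = Counter(treatments)
    capped = []
    for t in sorted(counts):
        limit = boundary.get(t, counts[t])     # no boundary -> keep all
        capped += [t] * min(counts[t], limit)
    return str(diagnosis) + '|' + ','.join(str(t) for t in capped)

# One application of treatment 1 and two of treatment 2:
print(patient_state('J', [2, 1, 2], {}))         # 'J|1,2,2'
# Treatment 24 has a boundary of one repetition in Diagnostic Alternative 13:
print(patient_state(13, [24, 24], {24: 1}))      # '13|24'
```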
The craniofacial-pain treatment-planning model includes two terminal patient states in addition to the patient states that are based on diagnostic alternatives. One or the other of these two terminal states, 'well' or 'referred,' represents the patient's status when he exits the care system. A patient exits the system in the 'well' state when the effects of treatment applications result in sufficient improvement so that no further treatment is required. The patient moves into the 'referred' state in lieu of further treatment. This alternative to treatment is selected when the 'expected costs' of remaining in the care system exceed the costs of referring the patient to another source of care (see Section 4.1.3).
4.1.2 Transition Probabilities
Patient-state transitions that involve a change of diagnostic classification follow one of two basic formats; see Figure 5. For the initial diagnostic classifications in Format I, with each treatment application, the patient either remains in his original diagnostic classification or he transits into the well state. For Format II, the six diagnostic alternatives shown in the lower illustration form a different structure.

Format I

Patients whose first-visit diagnostic classification is Diagnostic Alternative 1, 2, 3, 4, 5, 6, 10, 11, 14, 16, or 17 make transitions out of their original classification 'I' according to the following figure:

Format II

For patients originally classified in Diagnostic Alternative 7, 8, 9, 12, 13, or 15, the following kinds of diagnostic-classification transitions are possible:

FIGURE 5
DIAGNOSTIC-CLASSIFICATION TRANSITIONS

Here it is possible for the patient to alternate between any one of several diagnostic classifications during the course of his stay in the care system. Note that in both formats for diagnostic-classification transitions a patient moves into the referred state not as a result of a treatment application, but rather as an alternative to further treatment.
To these underlying diagnostic-classification transitions the craniofacial-pain treatment-planning model adds a record of the changes in treatment history. Appendix F displays complete charts of all of the diagnostic-alternative-based patient states included in the treatment-selection model. In these charts the patient states are connected by arcs that represent feasible transitions from one state to another. Not shown in the charts are the well and referred patient states and the arcs that connect every diagnostic-alternative-based state with these terminal states.
Howard [25] establishes that, in terms of the policy decisions generated by a Markovian decision model, holding-time distributions are important only insofar as they affect the mean waiting time in each system state and the expected costs of each state occupancy. The records of the patient visits employed in model construction revealed that, in the care of the patients described by the data, one or more treatments were prescribed at each visit, and a series of return visits was scheduled for the patient following his initial interaction with the practitioner if return visits were warranted. Under these conditions, specifying holding-time distributions for the time between successive patient-state transitions does not refine the model. Therefore, the treatment-planning model employs a Markovian rather than semi-Markovian representation of the care system, since an 'n'-visit holding time in a particular patient state can be modeled with no loss of information as 'n' repetitions of the 'virtual' transition from the state in question to itself. Care for craniofacial-pain patients is modeled as a discrete-stage Markovian system with the beginning of visits to the practitioner serving as stage indicators.
Using the history-augmented patient states, transition probabilities
are specified in terms of the treatment that generated the transformation.
In making a state transition following a treatment, a patient must move
to a state that includes that treatment as a portion of its state descrip-
tion. For example, following application of treatment 'k,' a patient
must progress from patient-state 'I|m,n' to 'J|k,m,n' where 'I' may be
equivalent to 'J.' The only exception to this rule is in the application
of a treatment beyond its boundary number of repetitions. Here, if treat-
ment 'k' has a boundary number of two, then following an application of
treatment 'k' three or more times a patient progresses from patient state
'I|k,k,m,n' to 'J|k,k,m,n' where again 'I' may be equivalent to 'J.'
This structure is indicated because inclusion of more than the boundary
number of applications (two in this case) in the state description does
not affect the transition probabilities.
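The history-augmentation rule just described can be sketched in code. This is an illustrative reconstruction, not the dissertation's implementation; the representation of a treatment history as a sorted tuple of treatment labels, and the function name, are assumptions of this sketch.

```python
# Hypothetical sketch of the history-augmentation rule described above.
# Applying treatment k normally appends k to the treatment history, but
# once k has reached its boundary number of repetitions, further
# applications are not recorded, since additional copies of k in the
# state description do not affect the transition probabilities.
def next_history(history, k, boundary):
    """history: tuple of prior treatments; boundary[k]: boundary number of k."""
    if history.count(k) >= boundary[k]:
        return tuple(sorted(history))        # at the boundary: history unchanged
    return tuple(sorted(history + (k,)))     # below the boundary: record k
```

For example, with boundary numbers {'k': 2, 'm': 1}, applying 'k' to the history ('k', 'm') yields ('k', 'k', 'm'), while a third application leaves ('k', 'k', 'm') unchanged, mirroring the transition from 'I|k,k,m,n' to 'J|k,k,m,n' in the text.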
Estimates of the values of the transition probabilities were ob-
tained from the patient records discussed previously. A discussion of
the stability of these probability estimates under variations in patient
data is presented in Appendix E. Where the data on the effects of treat-
ment alternatives were limited, the data-generated probability estimates
were refined by estimates from the reviewing practitioners. Notationally,
transition probabilities are represented in the analytic model in the
following form:

p^k_{I,J} = the probability of making a transition from
patient-state 'I' to patient-state 'J' following
the application of treatment-alternative 'k.'
4.1.3 Cost Structure
A patient's progression through the craniofacial-pain system gener-
ates a multitude of implicit and explicit costs. The explicit costs can
be measured in terms of the dollar charges paid by the patient or the
practitioner during the patient's stay in the system. Other costs are
implicit in nature and can be quantified only as they relate to the
'opportunities' lost by the patient and the practitioner while the pa-
tient remains in the care system. For modeling purposes four major
system costs have been isolated. These costs are:
(a) Cost of treatment applications
(b) Cost of the practitioner's and his staff's
services
(c) Cost to the patient of occupying a non-well
patient-state
(d) Patient-referral cost.
Although these costs do not encompass all of the system costs, they mea-
sure significant explicit and implicit charges associated with a patient's
stay in this system. In the treatment-planning model, each of these costs
is charged on a per-patient-visit basis.
Costs of the various treatment applications and the costs associated
with the practitioner's and his staff's services were estimated by the re-
viewing practitioners. Estimates of treatment and care-system service
costs were partitioned by diagnostic classification as well as treatment
category. The cost estimates reflect typical charges in a dental clinic
environment.
The inconvenience experienced by a patient in making a visit to the
practitioner was used as a measure of the cost of occupying a 'non-well'
patient state. Estimates of this inconvenience cost were gathered from
responses to a questionnaire completed by patients at the University of
Florida's Dental Clinic. These were general dental patients not neces-
sarily suffering from craniofacial pain. Figure 6 shows the distribution
of these patient estimates.
Values for patient-referral costs were composed of the sum of three
distinct estimates. The first component was an estimate of the total
fee charged by the practitioner receiving the referred craniofacial-pain
patient. Record transferral and duplication costs, as well as the fees
lost by the referring practitioner, formed the second component. The
third component of the patient-referral cost is a measure of the incon-
venience experienced by the referred patient, a value estimated by using
a multiple of the value of the inconvenience cost discussed in the pre-
ceding paragraph. Appendix G provides a justification for using this
particular combination of components in the referred-cost estimates.
Symbolically, the patient-state transition costs (negative constants)
are represented in the analytical model as

c^k_{I,J} = the sum of the costs generated by the transition
from patient-state 'I' to patient-state 'J'
following the application of treatment 'k.'

This sum includes the type (a), (b), (c), and (d) costs appropriate to
each patient-state transition.
Fifty-eight patients at the University of Florida's Dental Clinic responded
to the following question:

How much would you estimate that this trip to the
Dental Clinic cost you in terms of lost wages, baby-
sitting fees, transportation costs, and other costs
that you may have had to pay so that you could
be here for your appointment?

The distribution of these estimates is shown in this histogram.

[Histogram: number of respondents versus estimated cost, in dollar
ranges 0-0.99, 1-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79,
and 80-300.]

The mean value for these 58 estimates of patient-visit inconvenience costs
was $30.72.

FIGURE 6
PATIENT-VISIT INCONVENIENCE COST
4.2 Selection of Optimal Treatments
The craniofacial-pain treatment-planning model is transient in the
sense that only two of the model's patient states, well and referred, can
represent the patient's status when he exits the health-care system. In
a stochastic sense, only the terminal states are recurrent, as they alone
possess nonzero long-run probabilities of state occupancy. Hence, the
choice of treatment alternatives at each patient state is made with the
goal of minimizing the costs accrued by the patient as he passes through
the diagnostic-alternative-based patient states into one of the recurrent
states.
For notational convenience, in the analytic model the well patient
state is denoted as state 'W' and the referred state as state 'R.' In
modeling the care system for craniofacial-pain patients there is no
justification for providing costs for the transitions from states 'R'
and 'W' to themselves; hence, 'c_{R,R}' and 'c_{W,W}' are set equal to zero.
Analytically, the treatment-planning model is made monodesmic, i.e.,
having only one recurrent state, by defining p_{R,W} = 1 and p_{W,R} = 0.
The total number of states, not including states 'W' and 'R,' is denoted by
'S.' With these definitions and the notation introduced in the previous
section, a procedure for selecting the set of optimal treatment decisions
is developed.
Howard [25] has shown that for a monodesmic, transient Markovian
decision model, a set of optimal decisions is defined as those decisions
that maximize the expected value 'v_I' of occupying each system state 'I.'
Since the treatment-planning model for craniofacial-pain patients fits
into this category of decision model, a modification of Howard's algorithm
is employed to select optimal treatment regimes. The process of select-
ing an optimal set of treatments is accomplished by finding the set of
treatment alternatives k_1, k_2, ..., k_S that maximize each of the
v_I^{k_I} (the expected value of occupying patient-state 'I' given
treatment alternative 'k_I') where

v_I^{k_I} = r_I^{k_I} + SUM (over all patient states J) p^{k_I}_{I,J} v_J,
I = 1, 2, ..., S

and

r_I^{k_I} = SUM (over all patient states J) p^{k_I}_{I,J} c^{k_I}_{I,J}.

With treatment-augmented patient states, maximizing the v_I^{k_I} can be
carried out in the following manner:
1. Group for simultaneous analysis all patient states possessing
a common treatment history, where one or more of the treatments in this
history are at their boundary level. Each of the 'T' sets of states
complying with this description forms an analysis set B_j, j = 1, 2, ..., T.
2. Label the patient states sequentially, starting with state W
as 1 and state R as 2, and then selecting numbers for the remaining unlabeled
patient states on the basis that the one with the most treatments in its
history receives the next number-label. For example, state 'J|1,2,2,4'
would be labeled with a smaller number than state 'J|2,6,6.' When the
numbering scheme reaches the members of one of the analysis sets isolated
in Step 1 (above), numbers for the members of that set may be arbitrarily
assigned. Given this state-numbering scheme, the selection of optimal
treatments can proceed dynamically since for each state I that is not a
member of an analysis set, I = 1, 2, ..., S, I not in B_j, j = 1, 2, ..., T,

v_I = r_I + SUM (J = 1 to I) p_{I,J} v_J

and for the states of set B_j, j = 1, 2, ..., T,

v_I = r_I + SUM (J = 1 to t) p_{I,J} v_J + SUM (J in B_j) p_{I,J} v_J,
I in B_j

where t = the number of the last non-group-B_j state imme-
diately preceding the smallest number-labeled
state in B_j.
Thus, the process of selecting optimal treatments proceeds recur-
sively from the state of smallest number-label to the one of largest
number-label, stopping to consider simultaneously the values of a number
of states only when an analysis set is encountered.
Howard's value-iteration and policy-improvement algorithm [25] is
employed only in the case of selecting treatments for the analysis-set
patient states. An example of this section's labeling and optimization
procedure is presented in Appendix H.
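For states outside the analysis sets, this recursive selection admits a direct dynamic-programming sketch. The code below is a simplified illustration under stated assumptions: it handles the 'virtual' self-transition by solving v = r + p_{I,I} v + (external terms) in closed form, it assumes p_{I,I} < 1, and it omits the analysis-set case that requires Howard's value-iteration and policy-improvement algorithm. The function name and data layout are hypothetical, not the dissertation's.

```python
# Simplified sketch of recursive treatment selection for states that are
# not members of any analysis set. States are number-labeled so that
# transitions go only to equal- or lower-numbered states; states 1 (well)
# and 2 (referred) are terminal with value zero.
def optimal_values(S, alternatives, r, p):
    """
    alternatives[I]: treatment alternatives available in state I
    r[(I, k)]: expected one-visit reward (a negative cost) of k in state I
    p[(I, k)]: dict mapping successor state J to transition probability
    Returns (v, policy): state values and maximizing treatment choices.
    """
    v, policy = {1: 0.0, 2: 0.0}, {}
    for I in range(3, S + 1):
        best = None
        for k in alternatives[I]:
            trans = p[(I, k)]
            self_p = trans.get(I, 0.0)                  # 'virtual' self-transition
            ext = sum(q * v[J] for J, q in trans.items() if J != I)
            val = (r[(I, k)] + ext) / (1.0 - self_p)    # solve v = r + p_II*v + ext
            if best is None or val > best:
                best, policy[I] = val, k
        v[I] = best
    return v, policy
```

Because each state depends only on lower-numbered states, a single pass from the smallest to the largest number-label suffices, exactly as in the labeling scheme of Step 2.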
This optimization procedure was applied to the states of the cranio-
facial-pain treatment-planning model. Appendix G presents a list of the
optimal treatment selections for each of the model's patient states.
4.3 Model Validation
Validation of the craniofacial-pain treatment-planning model was
accomplished in two phases. In the first phase of validation, the indi-
vidual components of the Markovian representation were examined by the
reviewing practitioners. The second phase of model validation compared
model-generated treatment decisions with those made by the reviewing ex-
perts. In addition, statistics generated by the model were compared to
the care-system description provided by the patient records from the
university dental clinics. This section discusses the results of these
validating efforts.
The review of model components was accomplished as values for the
model parameters were collected. Some of the data-based estimates of
transition probabilities and boundary-level application numbers did not
conform to expert judgment about the effects and effectiveness of vari-
ous treatment applications. When these disparities occurred, the esti-
mates were modified to reflect expert judgment.
The general structure of the patient states was reviewed to ensure
that the representation shown in Appendix F did in fact portray a set of
logical progressions through the care system. Although this examination
established the validity of the patient progressions, the review did
point out one deficiency in the model's structure. The number and types
of treatment alternatives available for use at each patient state were
determined by records of actual applications of these treatments in the
data used for model construction. It was the judgment of the reviewing
practitioners that in several cases the selection of treatment alterna-
tives for a patient state did not include the 'most appropriate' treat-
ment alternative. Nevertheless, this model deficiency can readily be
corrected. With the collection of data on the effects of these 'most
appropriate' treatments, these additional treatment alternatives can be
incorporated as decision alternatives for the patient states in question.
The reviewing practitioners made selections of treatments for each
of the model's patient states. In those cases where the model's treat-
ment alternatives did not include the practitioners' 'most appropriate'
choice of treatments, the practitioners made a selection from the same
list of alternatives used by the model. Appendix G lists their choices
of treatment along with each model-generated selection. The two sets of
treatment plans include the same treatment selection for 87 out of 94
patient states, or 92.6% of the patient states. The 7 differences in
treatment selections arise in part from the approximations the treatment-
planning model employs in its representation of the care system and in
part from slight inconsistencies in the practitioners' treatment selections.
One last test was performed to verify the suitability of the Mark-
ovian representation of the craniofacial-pain care system. Mean transit
times through the care system to one of the terminal states were calcu-
lated using the model-generated treatment decisions and each of six
first-visit patient states. These model-generated transit times were
compared to estimates of the same statistics gathered from the patient
records contributed by the university dental clinics. Table 5 presents
the values of both sets of statistics. The close correlation of these
values reveals that the treatment-planning model not only duplicates the
decisions of experts, but also provides a structure for gathering other
relevant information about the underlying care system.
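Mean transit times of the kind reported in Table 5 can be obtained from the transition structure by first-step analysis. The sketch below is an assumed formulation, not the dissertation's code: m[I] counts the expected number of visits spent in non-well, non-referred states starting from state I, again solving out the virtual self-transition and assuming p_{I,I} < 1.

```python
# Hedged sketch: mean transit time to a terminal state under a fixed
# treatment policy. m[I] = expected number of visits the patient spends
# in non-well, non-referred states when starting in state I; terminal
# states 1 (well) and 2 (referred) contribute no visits. States are
# numbered so transitions go only to equal- or lower-numbered states.
def mean_transit_times(S, policy, p):
    """policy[I]: chosen treatment; p[(I, k)]: dict J -> probability."""
    m = {1: 0.0, 2: 0.0}
    for I in range(3, S + 1):
        trans = p[(I, policy[I])]
        self_p = trans.get(I, 0.0)
        ext = sum(q * m[J] for J, q in trans.items() if J != I)
        m[I] = (1.0 + ext) / (1.0 - self_p)   # count this visit, then continue
    return m
```

Truncated estimates of the kind shown in the table's second column would instead condition on absorption by a fixed visit number, which this unconditioned sketch does not attempt.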
4.4 Model Applications
Like the diagnostic-classification model presented in Chapter 3, the
craniofacial-pain treatment-planning model has been structured to permit
its utilization in a variety of applications. Markovian modeling provides
an analytic representation of the craniofacial-pain care system as well
as establishing a means of making treatment selections. This section dis-
cusses applications of the model's analytic representation and treatment
selections in teaching, in research, and in practice.
The model-generated treatment decisions reveal which treatments are
most frequently used in the care of craniofacial-pain patients. In a
teaching environment, this information can be used to specify treatment
TABLE 5
MEAN TRANSIT TIMES THROUGH THE CRANIOFACIAL-PAIN CARE SYSTEM

For a Patient Whose First             Model-      Truncated   Patient-
Diagnostic Classification Was         Generated   Model       Record
                                      Estimate*   Estimate+   Estimate^

Myopathy-Myositis                       1.50        1.34        1.35
Oral Pathology-Dental Pathology         1.11        1.04        1.08
Vascular Changes-
  Migrainous Vascular Changes           3.89        3.42        3.06
Myofacial Pain Dysfunction-
  Uneven Centric Stops                  1.86        1.43        1.50
Myofacial Pain Dysfunction-
  Anxiety/Depression                    3.87        3.47        3.18
Myofacial Pain Dysfunction-
  Reflex Protective Muscular
  Contracture                           1.90        1.79        1.87

* The values in these sets of estimates are specified in terms of the
number of patient visits in which the patient occupies a non-well or
non-referred patient state.
Note: The treatment-planning model considers the possibility of
'infinite-duration' occupancy of non-well or non-referred states.
+ These truncated estimates were generated from the treatment-
planning model on the conditional basis that a patient must
transit into either the well or the referred state by his
fifth patient visit.
^ The maximum number of visits for any patient described by
the clinical data was five patient visits.
application techniques that should be emphasized in training dental stu-
dents in craniofacial-pain care. Moreover, the parameters employed in
model development, in particular the transition probabilities and refer-
ral costs, are themselves valuable instructional materials in developing
the dental student's treatment-selection skills.
The treatment-planning model provides a method for evaluating new
developments in treatment for craniofacial-pain patients. With estimates
of the effectiveness of his new treatment, the researcher can use the
craniofacial-pain treatment-planning model to get two immediate responses.
First, the optimization technique of Section 4.2 will determine whether this
new treatment provides 'better care' for the patient than any of the
other treatment alternatives the model has to choose from. Second, if
optimal treatment selections for the model include the new treatment, the
model's statistics will show the improvement in length of stay, and in other
relevant measures of treatment effectiveness, introduced by using this
new treatment.
In the office of the practicing dentist, the treatment-planning mod-
el's decisions could provide a concise reference of the treatment selec-
tions suggested by experts in the field of craniofacial pain. Moreover,
the practitioner would have a chance to contribute to the refinement of
the listing, as the treatment records of his patients could supplement
the data used in model construction. In addition, the practitioner could
employ the statistics associated with the treatment-planning model in
scheduling the length, and number, of his appointments for craniofacial-
pain patients.
CHAPTER 5
CONCLUSIONS AND FUTURE RESEARCH
This dissertation has presented analytic models of the decision pro-
cesses associated with diagnosing and selecting treatments for a partic-
ular health-care problem. The selection, construction, and testing of
these models have been discussed in some detail. Meanwhile, the model-
building effort itself has been the source of a number of insights into
decision-making in a health-care environment. These insights are
reflected in this chapter's discussion of the dissertation's central re-
search conclusion and suggestions of topics for future investigation.
The similarity between the decision-making processes employed by
the practitioner and the analytic structure of this dissertation's models
is quite revealing. In both diagnosis and treatment planning for cranio-
facial-pain patients it appears that the practitioner, like the analytic
models, makes 'first-order' decisions. The linearity of symptom signifi-
cance (a first-order polynomial of symptom weights) and the present-
patient-state dependency of transition probabilities measuring treatment
effectiveness (a first-order stochastic dependence) provide a means of
generating decisions that closely approximate the decisions made by dental
practitioners. This general conclusion on the applicability of first-
order decision techniques to craniofacial-pain diagnostic classification
and treatment planning characterizes the central development of this
dissertation.
Given this summary statement, there are several logical extensions
to this dissertation's research that should be examined in future inves-
tigations. The following suggestions identify some of the more fruitful
areas for further research efforts. These suggestions are ordered
according to the author's view of their significance.
1. This dissertation's research found that first-order decision-
making models are valid descriptions of the underlying thought processes
employed by the craniofacial-pain practitioner. It is possible that these
first-order descriptive decisions are 'suboptimal' and that higher-order
decision-making tools might yield prescriptive, or 'optimal,' diagnostic
classifications and treatment plans for craniofacial-pain patients. That
is, considering the interaction between significant symptoms and multiple-
state dependency for patient-state transitions may lead to optimal diag-
nostic and treatment-selection decisions. As the models themselves can
readily be increased in their decision-making 'order,' an investigation
into this possibility would be hampered only by the necessity of collect-
ing an elaborate data base. Nevertheless, such an investigation should
be undertaken in this, the most significant, of future research areas.
2. As this dissertation's analytic models can be applied directly
to any health-care problem where there is verification that practitioners
make first-order decisions, one potential avenue of future research would
be to isolate those health environments where these kinds of decisions
are made. However, a word of caution is interjected at this point. Math-
ematical modeling demands an underlying structure for the process being
modeled. Yet, in a process dealing with a product that is subject to
considerable variation, such as the care of a patient in a health-care
system, isolating an underlying process structure is difficult. Moreover,
the problem of finding process structure is compounded in the health-care
field by a lack of unifying and consistent nomenclature. In the health-
care field, scholarly literature and historical precedent can serve as
the justification for two or more contradictory sets of terminology for
the same anatomical structure or physiological process. Thus, in re-
searching the generality of first-order decision-making techniques, the
investigator must consider process variability and nomenclature incon-
sistency before he makes any statement about the applicability of this
dissertation's decision-making tools to other health-care environments.
3. A nongeometric discussion of the criteria for pattern-space
separability was presented to provide a means of characterizing health-
care disorders for which diagnostic classification by a linear pattern
classifier might be feasible. Unfortunately, this dissertation's tech-
niques are heuristic and do not provide an exact reproduction of the
underlying mathematical specifications. Future research in this area
could lead to a precise statement of nongeometric criteria for linear
separability, and thus provide an indirect means for evaluating potential
applications of linear nonparametric classifiers.
4. This dissertation's minimum-cost symptom-selection algorithm
represents a clear departure from previous research in feature selection.
The algorithm's utilization of the convex-hull representation of pattern-
space separability makes this development unique in the literature of
feature selection. However, the algorithm's method of checking the fea-
sibility of potential feature collections is extremely tedious. A more
efficient method to check feature-collection feasibility may be revealed
through future investigations in this area.
5. From a mathematical-programming point of view, the symptom-
selection algorithm represents one of a limited number of techniques
capable of solving a problem with nonlinear constraints. The algorithm
seeks an optimal assignment of components, where the feasibility of any
assignment is determined by the existence of a set of discriminating com-
ponent multipliers. In this more general context, the structure of the
algorithm may be applicable in a variety of problem areas not directly
related to the feature-selection problem. The possibility of employing
the algorithm in this general setting should be investigated.
6. In modeling the treatment-planning process for craniofacial-pain
patients the concept of boundary-level treatment applications was intro-
duced. Boundary numbers on the effects of repeated treatment applications
are likely to occur in data derived from the care of patients with a va-
riety of physiological disorders. Further investigations of this phenom-
enon may result in more effective methods of predicting which treatments
will have boundary-level application numbers, and more efficient statis-
tical techniques to determine values for these numbers.
7. The training algorithm developed in the construction of the
craniofacial-pain diagnostic classifier generates a feasible integer so-
lution to a large number of linear constraints. This algorithm is both
efficient and easily coded for computer applications. An investigation
of the uses of this algorithm in a mathematical-programming setting may
reveal applications in solution techniques for more general integer pro-
grams.
8. Potential applications have been suggested for the diagnostic-
classification and treatment-planning models in teaching, in research,
and in practice. The models and their applications have been presented so
that they might readily be employed by some future investigator. Actual
applications of the models should yield significant contributions to
the effectiveness of the teacher, researcher, and practitioner.
APPENDIX A
CRANIOFACIAL-PAIN PATIENT DATA VECTOR
Referral Through:
  001 Medical GP          002 Medical Specialist
  003 Dental GP           004 Dental Specialist
Sex:
  005 Male                006 Female
  007 Female, menopausal or post-menopausal
Age Group:
  008 0-19    009 20-39    010 40-55    011 56-up
Duration of Pain:
  012 Less than 3 weeks    013 From 3 to 6 weeks
  014 More than 6 weeks    015 Episodic
Character of Pain:
  016 Aching      017 Burning      018 Cutting       019 Discomfort
  020 Dull        021 Pressure     022 Pricking      023 Sharp
  024 Soreness    025 Stinging     026 Tenderness    027 Throbbing
Change in Character of Pain:
  028 Constantly getting worse
  029 Got worse, then plateaued
  030 Got worse, plateaued, then better
  031 Getting better
  032 Intermittent periods without pain
  033 No change since beginning
List of Drugs Taken:
  034 Mild analgesics: aspirin, APC, etc.
  035 Moderate analgesics (non-narcotic)
  036 Strong analgesics: narcotics and synthetic narcotics
  037 Antianxiety agents: Mellaril, etc.
  038 Antiarthritic agents: steroids, etc.
  039 Antidepressives: Tofranil, etc.
  040 Birth control pills
  041 Hormone preparations
  042 Anti-inflammatory agents
  043 Muscle relaxants: Valium
  044 Muscle relaxants: meprobamate
  045 Muscle relaxants: others
  046 Sedatives: barbiturates, etc.
  047 Other drugs
History of Trauma:
  048 Accidental    049 Factitial    050 Surgical
Location of Swelling, Location of Tenderness, and Location of Pain:
  [Recorded separately for the left side and right side of the face on
  numbered facial-region diagrams (item numbers in the range 097-242);
  the diagrams are not reproducible in this transcription.]
Limited Jaw Opening: 243 Yes
Joint Sounds:
  244 Clicking    245 Crepitation    246 Pain accompanying joint sound
Headaches:
  247 Frequent headaches
  248 Headache associated with joint pain
Changes in:
  249 Taste      250 Hearing
  251 Visual acuity    252 Perception of light touch on face
Upper Respiratory Infection:
  253 In conjunction with beginning of TMJ pain
Evidence of:
  254 Arthritis                 255 [name illegible] syndrome
  256 Neuropathy                257 Otitis
  258 Salivary gland disease    259 Sinusitis
  260 Strokes                   261 Vascular disease
Facets: 262 1-3    263 4-up
Lateral Slide Prematurities:
  264 On working side    265 On balancing side
Tooth Ache: 266 Yes
Biting Stress Tooth Mobility: 267 Yes
Recent Restorative or Dental Prosthesis: 268 Yes
Jaw Deviates on Opening: 269 Left    270 Right
Impingement of Coronoid Process on Zygomatic Arch: 271 Left    272 Right
Meniscus-Condyle Dyscoordination: 273 Left    274 Right
Radiographic Examination:
  275 Mandibular condyle apposition (such as spur formation)
  276 Mandibular condyle resorption (such as flattening of anterior
      superior surface or irregular surface)
  277 Fossa apposition              278 Fossa resorption
  279 Articular eminence apposition
  280 Articular eminence resorption
  281 Evidence of fracture
  282 Clinical or radiographic evidence of pathoses
Emotional Trauma: 283 Anxiety    284 Depression
Bruxism or Clenching: 285 Yes
Uneven Centric Stops: 286 Yes
History of Lengthy Dental Procedures: 287 Yes
History of General Anesthesia: 288 Yes
Tinnitus: 289 Yes
Extraction of Teeth:
  290 Less than 6 weeks prior to TMJ pain
  291 Leaving a space that permits extrusion
Preauricular Pain: 292 Yes
Alteration of Inter-Occlusal or Inter-Arch Space: 293 Yes
Paresthesia: 294 Yes
Luxation or Subluxation: 295 Yes
APPENDIX B
MODIFIED FIXED-INCREMENT TRAINING ALGORITHM
In presenting the modified fixed-increment training algorithm the
following notation is employed:

p = the number of classification categories
t = the number of training-sample row vectors
a_j^(k) = training-sample row vector number 'k,' preclassified
in category 'j,' j = 1, 2, ..., p, k = 1, 2, ..., t, and
k = i[mod t] where 'i' is the index of the training-
algorithm iteration
W_j^(i) = the 'j'th column of weights (the constants in the
'j'th discriminant function) used in the 'i'th
iteration of the training algorithm, j = 1, 2, ..., p
alpha = nonnegative constant specified by the analyst
to adjust the size of the 'dead zone' [23] in dis-
criminant-function values, i.e., alpha >= 0
beta = positive constant specified by the analyst to adjust
the scale of the weight vectors, i.e., beta > 0.
Using this notation, let a_j^(k) be the 'i'th pattern examined by the
algorithm; then

Case 1: if a_j^(k) . W_j^(i) > a_j^(k) . W_c^(i) + alpha for all c != j,
  let W_c^(i+1) = W_c^(i) for all c.

Case 2: if a_j^(k) . W_j^(i) <= a_j^(k) . W_z^(i) + alpha for a subset B
of the p discriminants, z in B, j not in B,
  let W_z^(i+1) = W_z^(i) - beta[a_j^(k)], z in B,
  W_c^(i+1) = W_c^(i) for all c not in {B union j},
  and W_j^(i+1) = W_j^(i) + n_B . beta[a_j^(k)], where n_B = the number
  of discriminants in the subset B.

The algorithm is terminated when the values of the W_j, j = 1, 2, ..., p,
have not changed during a complete cycle of the t training patterns, i.e.,
when W_j^(l) = W_j^(l+1) = ... = W_j^(l+t) for all j, where 'l' is the
index of the last case-2 pattern examined by the algorithm.
This algorithm is guaranteed to terminate in a set of feasible
W_j, j = 1, 2, ..., p, if the training sample is linearly separable and
alpha and beta have been appropriately selected. If the training sample
is linearly separable, the algorithm will converge for any fixed value of
alpha >= 0, where beta is selected appropriately large. Hence, the
algorithm is normally applied to a training sample with alpha = 0 and
beta = 1. If the algorithm converges, these constants can be adjusted
and the training algorithm reapplied.
The justification for specifying a nonzero alpha (alpha = size of the
dead zone) is that as alpha is increased, the accuracy of the classifier is
increased in making classifications of data not used in developing the
discriminant-function weights. For example, with the craniofacial-pain
diagnostic classifier and the test samples discussed in Section 3.3,
the diagnostic model correctly classified approximately 5% more of the
test samples' data vectors when the model was trained with alpha = 30,
beta = 3 (versus an original training with alpha = 0, beta = 1).
Proof that the algorithm converges if feasible weight vectors
W_j*, j = 1, 2, ..., p, exist (that is, the sample space is linearly
separable) is developed in Nilsson [22]. Nilsson's proof can be directly
applied since for any set of feasible W_j*,

a_j^(k) . W_j* > a_j^(k) . W_z* + alpha

for all k = 1, 2, ..., t, and z = 1, 2, ..., p, z != j, while for any
W_j^(i), j = 1, 2, ..., p, i = 1, 2, ...,

a_j^(k) . W_j^(i) <= a_j^(k) . W_z^(i)

for some k and some z.
Typically, a training algorithm is applied to the members of a
training sample without prior knowledge of whether the sample pattern
space is linearly separable. The algorithm is allowed to process sample
patterns until it either converges on a set of discriminating hyperplanes
or has run for a 'reasonable' amount of time without terminating. Ex-
perience with medical data and the modified fixed-increment algorithm
has shown that if there is a set of discriminating hyperplanes, the
algorithm will find it in no more than 3 complete cycles for each of the
pattern classes. For example, if there are 5 pattern classes and the
pattern space can be linearly partitioned, the algorithm should terminate
in no more than 15 full cycles through the training data. This rough
measure of training time provides an index for establishing a limit on
computer processing time.
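The training algorithm can be coded directly. The sketch below re-expresses it in Python under one reading of the update rules (alpha and beta fixed, categories numbered 1..p); the function name and data layout are choices of this transcription rather than the dissertation's, and convergence is declared when a full cycle of the training patterns produces no case-2 corrections.

```python
# Sketch of the modified fixed-increment training algorithm.
# samples: list of (row_vector, category) pairs with categories 1..p.
# Returns one weight column per category, or None if no convergence
# within max_cycles (e.g., a non-separable training sample).
def train(samples, p, alpha=0, beta=1, max_cycles=100):
    n = len(samples[0][0])
    W = [[0] * n for _ in range(p)]
    for _ in range(max_cycles):
        changed = False
        for a, j in samples:
            dots = [sum(x * w for x, w in zip(a, Wc)) for Wc in W]
            # Case 2 subset B: discriminants whose score ties or beats
            # category j's score within the dead zone alpha.
            B = [z for z in range(p) if z != j - 1 and dots[j - 1] <= dots[z] + alpha]
            if B:
                changed = True
                for z in B:                      # penalize each offending column
                    W[z] = [w - beta * x for w, x in zip(W[z], a)]
                # reward the true category's column, scaled by |B|
                W[j - 1] = [w + len(B) * beta * x for w, x in zip(W[j - 1], a)]
        if not changed:                          # case 1 held for a complete cycle
            return W
    return None
```

On the three-pattern sample of Figure 7 with alpha = 0 and beta = 1, this sketch reproduces the figure's final weights W_1 = [-2, -1, 1], W_2 = [3, -1, 0], W_3 = [-1, 2, -1].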
An application of the modified fixed-increment training algorithm
is presented in Figure 7.
Given the training sample of the form a = [a_1, a_2, 1], where

a^(1) = [0, 0, 1] (category 1)
a^(2) = [1, 0, 1] (category 2)
a^(3) = [0, 1, 1] (category 3),

the training-sample patterns can be represented in 3-dimensional space.
The modified fixed-increment algorithm with alpha = 0 and beta = 1
proceeds as follows (* indicates correct sample classification):

Sample      W_1           W_2           W_3           a.W_1  a.W_2  a.W_3
[0,0,1]     [ 0, 0, 0]    [ 0, 0, 0]    [ 0, 0, 0]      0      0      0
[1,0,1]     [ 0, 0, 2]    [ 0, 0,-1]    [ 0, 0,-1]      2     -1     -1
[0,1,1]     [-1, 0, 1]    [ 2, 0, 1]    [-1, 0,-2]      1      1     -2
[0,0,1]     [-1,-1, 0]    [ 2,-1, 0]    [-1, 2, 0]      0      0      0
[1,0,1]     [-1,-1, 2]    [ 2,-1,-1]    [-1, 2,-1]      1      1     -2
*[0,1,1]    [-2,-1, 1]    [ 3,-1, 0]    [-1, 2,-1]      0     -1      1
*[0,0,1]    [-2,-1, 1]    [ 3,-1, 0]    [-1, 2,-1]      1      0     -1
*[1,0,1]    [-2,-1, 1]    [ 3,-1, 0]    [-1, 2,-1]     -1      3     -2

Hence, the set of weights generated by this training sample is

W_1 = [-2, -1, 1]
W_2 = [ 3, -1, 0]
W_3 = [-1, 2, -1].

FIGURE 7
APPLICATION OF THE MODIFIED FIXED-INCREMENT ALGORITHM