Citation |

- Permanent Link:
- https://ufdc.ufl.edu/UF00089748/00001
## Material Information
- Title:
- Analytical models for diagnostic classification and treatment planning for craniofacial pain.
- Series Title:
- Analytical models for diagnostic classification and treatment planning for craniofacial pain.
- Creator:
- Leonard, Michael Steven
- Publisher:
- Michael Steven Leonard
- Publication Date:
- 1973
- Language:
- English
## Subjects
- Subjects / Keywords:
- Cost estimates ( jstor )
- Discriminants ( jstor )
- Facial pain ( jstor )
- Information classification ( jstor )
- Mathematical vectors ( jstor )
- Matrices ( jstor )
- Modeling ( jstor )
- Statistical models ( jstor )
- Symptomatology ( jstor )
- Transition probabilities ( jstor )
## Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier:
- 022771867 ( alephbibnum )
- 14057039 ( oclc )

Full Text |

ANALYTICAL MODELS FOR DIAGNOSTIC CLASSIFICATION AND TREATMENT PLANNING FOR CRANIOFACIAL PAIN

By Michael Steven Leonard

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1973

To my wife, Mary

ACKNOWLEDGEMENTS

Without the considerable contributions of time and effort by the members of his committee, it would have been impossible for the author to have completed this dissertation. In particular, the author expresses gratitude to his Chairman, Dr. Kerry Kilpatrick, for his encouragement and direction during the course of this research effort. The author also thanks Dr. Kilpatrick for his editorial assistance during the development and organization of this manuscript. The author thanks Dr. Richard Mackenzie and Dr. Stephen Roberts for providing the initial direction for this research. Additionally, the author is grateful to Dr. Than Hodgson and Dr. Donald Ratliff for their assistance in evaluating and refining the author's ideas throughout this project.

The author expresses his gratitude to Dr. Thomas Fast and Dr. Parker Mahan for the contribution of their extensive knowledge about craniofacial pain to the author's research. The author is deeply appreciative of Dr. Fast's and Dr. Mahan's willingness to spend many hours examining dental records and their endurance of the nomenclature and idiosyncrasies of this mathematical-modeling effort.

Financial support for this research was provided by the Health Systems Research Division, J. Hillis Miller Health Center. The division's support in conjunction with a traineeship granted by the National Science Foundation made it possible for the author to undertake this research. The author is also grateful to the Industrial and Systems Engineering Department for the contribution of computer funds. Additionally the author thanks Dr. William Solberg, University of California at Los Angeles; Dr.
Daniel Laskin, University of Illinois; and Dr. David Mitchell, University of Indiana, for providing access to the patient records employed in this modeling effort.

The author would like to express his thanks to the secretarial staff of the Health Systems Research Division for their translation of the author's 'first-order' approximation to handwriting into a draft of this manuscript. Their tolerance of a multitude of last minute changes made by the author has been appreciated.

Finally, the author thanks his wife, Mary, and his parents, Dorothy and Charles Leonard, for their encouragement and support throughout the course of this research.

M.S.L.
August, 1973

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter
1. Introduction
  1.1 Craniofacial Pain
  1.2 Research Objective
  1.3 Dissertation Overview
2. Previous Research
  2.1 Bayesian Classification Models
  2.2 Non-Parametric Classification Models
  2.3 Finite-Horizon Treatment Planning
  2.4 Uncertain-Duration Treatment Planning
3. Diagnostic Classification
  3.1 Model Components
  3.2 Alternative Interpretations of Linear Separability
  3.3 Model Validation
  3.4 Minimum-Cost Symptom-Selection Algorithm
    3.4.1 Algorithm Development
    3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm
    3.4.3 Computational Considerations
  3.5 Model Applications
4. Treatment Planning
  4.1 Model Components
    4.1.1 Patient States
    4.1.2 Transition Probabilities
    4.1.3 Cost Structure
  4.2 Selection of Optimal Treatments
  4.3 Model Validation
  4.4 Model Applications
5. Conclusions and Future Research

Appendices
A Craniofacial-Pain Patient Data Vector
B Modified Fixed-Increment Training Algorithm
C Application of the Minimum-Cost Symptom-Selection Algorithm
D Treatment Alternatives for Craniofacial-Pain Patients
E Stability of Transition-Probability Estimates
F Flow Charts of Patient-State Transitions
G Patient-State Treatment Selections
H Application of the Patient-State-Labeling and Optimal-Treatment-Selection Procedure

BIBLIOGRAPHY
BIOGRAPHICAL SKETCH

LIST OF TABLES

Tables
1. Survey of Diagnostic-Classification Models
2. Correlation Between Significant Symptoms and Discriminant-Function Weights
3. Tests of Diagnostic Classifier Accuracy
4. Classification Variability Among Dental Practitioners
5. Mean Transit Times Through the Craniofacial-Pain Care System

LIST OF FIGURES

Figures
1. Temporomandibular Joint
2. Diagnostic-Classification and Treatment-Planning Process for Craniofacial Pain
3.
Craniofacial-Pain Diagnostic Alternatives
4. Procedure 2
5. Diagnostic-Classification Transitions
6. Patient-Visit Inconvenience Cost
7. Application of the Modified Fixed-Increment Algorithm
8. Multiple-State History-Augmented Process

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirement for the Degree of Doctor of Philosophy

ANALYTICAL MODELS FOR DIAGNOSTIC CLASSIFICATION AND TREATMENT PLANNING FOR CRANIOFACIAL PAIN

By Michael Steven Leonard

December, 1973

Chairman: Dr. Kerry E. Kilpatrick
Major Department: Industrial and Systems Engineering

This dissertation presents a systematic approach to craniofacial-pain diagnosis and treatment planning using analytic models of the underlying decision-making processes. Patient diagnoses are generated by a linear pattern-recognition classifier trained with a sample of preclassified craniofacial-pain patient data. For this classifier, an algorithm is developed that minimizes the total cost of the set of features employed in the classifying process. Diagnostic classifications, augmented by a history of prior treatment applications, provide the state descriptions for a Markovian decision model of the treatment-planning process. Craniofacial-pain patient records from four university dental clinics serve as a data base for model construction and validation.

The analytic models provide a means of duplicating the diagnostic classifications and treatment plans of experts. Approximately 90% of the diagnostic classifier's classifications and 93% of the treatment-planning model's treatment selections concurred with the decisions made by experts in the field of care for craniofacial-pain patients. Moreover, the models permit an examination of the critical considerations associated with both decision-making processes.
These capabilities are discussed in terms of applications of the models in teaching, research, and in the practice of dentistry.

CHAPTER 1 INTRODUCTION

The rapid pace of developments in medical and dental research prevents the practicing physician and dentist from fully utilizing each new diagnostic and treatment-planning aid as it is published. In each of the last four years an average of 215,000 new publications have been written to supplement the knowledge of the health-care practitioner [1]. Concurrently, the pressures of an ever-increasing patient load force practitioners to select the most expeditious means for diagnosing disorders and selecting treatments. For example, the medical general-practitioner (1970) saw an average of 173 patients a week [2], and the median dental practitioner (1971) saw two patients an hour [3]. Given these circumstances, practitioners may overlook possible diagnostic and treatment alternatives or they may apply inappropriate treatments. If meaningful analytic descriptions of the diagnostic and treatment-planning processes can be developed, these models can assist educators in training new practitioners, researchers in evaluating and disseminating new developments, and practitioners in improving the quality of patient care [4].

Developing models of the diagnostic-classification and treatment-planning process requires an understanding of the underlying physiological processes of diseases and the mechanisms of their cures. Obviously, the effects of disease and the means of cure vary from one health-care problem to another. Thus, modeling efforts in diagnosis and treatment planning must be integrally related to the facet of health care that is under study. This reality prohibits the model builder from making broad statements about the applicability of his models to other health-care environments.
Accordingly, the models developed in this dissertation are specifically oriented toward the health-care problem presented in Section 1.1 with the understanding that the results of this modeling effort may not be applicable to the whole of health-care diagnosis and treatment planning.

1.1 Craniofacial Pain

"The head and face are subject to chronic, persistent, or recurrent pain more often than any other portion of the body. Pain in the head or face has a greater significance to patients than any other pain. It may arouse fears that the patient is in danger of losing his mind or that he has a tumor of the brain. In addition, the emotional state of the patient is adversely influenced because it is generally known by the layman that the profession's knowledge of the causes of these pains is meager and that methods of treatment are inadequate" [5, p. v].

H. Houston Merritt, M.D., Dean, Columbia University College of Physicians and Surgeons

One source of the pain Dr. Merritt describes is dysfunction of the temporomandibular joint. The temporomandibular joint, see Figure 1, provides the articulation between the mandible and the cranium. This joint is unique both in its structure and its function. Within the plane of the temporomandibular joint, lateral, vertical and pivoting motion is permitted. In addition, the joint is the point of articulation for the only articulated complex that contains teeth. With this joint, "motion is directed more by the musculature and less by the shape of the articulating bones and ligaments than is the fact for other joints" [5, p. 34].
The fact that joint motion is highly dependent on musculature implies that when mandibular dysfunction occurs there is some disturbance of the intricate neuromuscular mechanisms controlling mandibular movement [5]. Emotional tension may also lead to hypertonicity of the striated masticatory muscles resulting in facial pain or altered sensation without evidence of peripheral dysfunction. In addition, abnormal occlusal contacts of the teeth may affect muscle tonicity resulting in mandibular dysfunction [5]. Moreover, the temporomandibular joint is prone to disorders common to all joints: rheumatoid arthritis, osteoarthritis, traumatic injuries, neoplasms, and nonarticular disorders. Although the term 'craniofacial-pain' is a broad classification for pain in the head and face, the term is used in this dissertation to describe pathological, congenital, hereditary-based, or emotional causes of pain in and around the temporomandibular joint.

[FIGURE 1. TEMPOROMANDIBULAR JOINT. Right temporomandibular articulation. Inset: anatomical features of the temporomandibular joint (mandibular fossa, articular eminence, meniscus, mandibular condyle).]

Though the degree of severity may vary, one or more of the following four 'cardinal symptoms' are exhibited by the craniofacial-pain patient: pain, joint sounds, limitation of motion, and tenderness in the masticatory muscles [6]. Accompanying these symptoms the patient may complain of, or the practitioner may find, hearing loss, burning sensations, migraine-like headaches, vertigo, tinnitus, subluxation, luxation, dental pulpitis, sinus disease, glandular disorders, occlusal disharmony, and radiographic evidence of joint abnormality. The degree of association of these additional symptoms and findings with the etiology of the joint disorders is subject to considerable variation.

Paralleling these areas of anatomic dysfunction is the possibility that the craniofacial-pain patient may be suffering from psychic disorders.
In no other type of patient seen by the dentist does psychic condition play a larger role [7]. Most craniofacial-pain patients have symptoms or signs of anxiety, and a sensory preoccupation with the occlusion of their teeth [8]. Many of these patients can be characterized by a heavy reliance on denial, repression, and projection of their psychic disorders in order to maintain their self-concept of emotional stability [6]. Often the complaints these patients relate to the practitioner are not compatible with any objective signs.

The practitioner who manages the care of craniofacial-pain patients assumes a difficult task. For some of these patients, diagnosis is obvious. Generally, however, the craniofacial-pain patient presents a complex combination of signs and symptoms [7]. More than one disease entity normally accounts for the patient's symptoms and most craniofacial-pain patients suffer from a pain-dysfunction complex involving a combination of masticatory muscle disorders, occlusal disharmony, emotional tension, and anxiety [5]. Nevertheless, the possibility of multiple, almost sub-clinical etiologic factors combining to produce the dysfunction and pain must be considered. The close relationship of organic and emotional disorders as they appear in craniofacial-pain patients provides the examining dentist with the problem of discriminating which factor is primary in the etiology of the patient's dysfunction [7]. Unfortunately, the temporomandibular joint is one of the most difficult areas of the body to examine radiographically [8]. Hence, with these patients, the dentist relies to a large degree on tests of emotional stability and physical examination by visualization, palpation, and auscultation [7].

Therapeutic measures for the care of craniofacial-pain patients are as varied as the factors contributing to the disorder.
"A small percentage of patients with symptoms referrable to the temporomandibular joint will portray such a confusing picture that consultation with other dental or medical specialists is indicated" [7, p. 129]. The majority of these patients will exhibit symptoms that lead to any one of several alternative courses of patient care. Altering the occlusion of the natural teeth is one means of treating craniofacial-pain patients. Although in many cases minor occlusal abnormalities are only contributing factors to a patient's pain, attention by the dentist to occlusion is at least partially successful for a majority of craniofacial-pain patients [8]. However, it is important in early therapy not to alter the occlusion irreversibly. Treatment by means of tooth extraction or endodontics, jaw fixation, prosthetic devices, or by topical treatments may also be suggested by the patient's symptoms. The articular surface of the mandibular condyle has an excellent reparative capacity [6]. Thus, the use of sedatives, antibiotics, and muscle relaxants, along with physical therapy, often leads to patient 'cures' as these treatments ease the patient's pain and increase jaw mobility while natural restoration of the joint is in progress. If, after a reasonable length of time (3 to 6 months), the patient's symptoms are not relieved, the dentist may consider referral to another source of care or therapy such as surgery [7].

Typically, the health-care process for craniofacial-pain patients may be viewed as following the format of Figure 2 [9]. When a patient is admitted into the care system, he undergoes a data-collection process. This involves taking a 'full and pertinent' patient history and a physical examination of the areas of discomfort. The data gathered consist of symptoms, signs, medical and/or dental history, physical examination findings, psychosocial information, and so forth. Once these elements have been elicited, a diagnosis is attempted.
If this is not yet possible, the severe symptoms are treated and the patient's health state is monitored.

[FIGURE 2. DIAGNOSTIC-CLASSIFICATION AND TREATMENT-PLANNING PROCESS FOR CRANIOFACIAL PAIN]

When initial treatment does not result in a 'cure' for the craniofacial-pain patient, treatment effects are evaluated and new data collected. When a patient's diagnostic classification leads to a course of treatment that is not within the realm of the practitioner's specialty, he is referred to a more appropriate care source. Monitoring is continued on those patients not rejected from the system at this point, and the patient is discharged when he is symptom-free. However, when other disorders have been isolated during the course of treatment, the patient is recycled through the classification-treatment process.

The diagnosis-treatment sequence is not fixed. Treatment can begin prior to a diagnostic classification or treatment can follow a diagnosis. Moreover, there may be many diagnostic-treatment data-acquisition cycles before the patient is considered 'well.'

1.2 Research Objective

The introductory discussion of the need for diagnostic and treatment-planning models, and the brief description of the craniofacial-pain care system, provide the setting for a statement of the research objective underlying this dissertation. This objective is to derive analytic representations of the decision processes involved in selecting diagnostic classifications and planning treatments for craniofacial-pain patients. A diagnostic-classification model that duplicates the classification of expert practitioners is sought. For treatment planning, the modeling goal is to provide a structure for interaction of the critical considerations associated with the treatment-selection process.
These analytic representations will be structured to permit their application as teaching devices in the training of dental practitioners, as methods of testing the effects of new diagnostic tools and treatment applications, and as aids to the practice of dentistry.

This research objective will be met by developing:
1. A diagnostic-classification model based on the theory of non-parametric pattern classification, with
  a. criteria for applicability of the modeling technique to diagnostic classification
  b. model validation for craniofacial-pain patients
  c. development of a minimum-cost symptom-selection algorithm
2. A Markovian representation of the treatment-selection process, with
  a. justification for utilizing a Markovian model of the underlying care system
  b. model validation for craniofacial-pain patients
3. A description of potential model applications in teaching, research, and practice.

1.3 Dissertation Overview

In Chapter 1 the motivation and scope of this dissertation were presented. Chapter 2 provides a review of literature relevant to the diagnostic and treatment-selection processes. A model of the diagnostic-classification process is developed in Chapter 3. Chapter 4 follows with an analytic representation of the treatment-planning process. Conclusions derived from this model-building effort, and suggestions for future research, are presented in Chapter 5.

CHAPTER 2 PREVIOUS RESEARCH

Over three hundred publications have been addressed to the problem of modeling the diagnostic and treatment-planning process. Spanning fourteen years, this research has considered such diverse problems as the classification of liver biopsies [10] and the optimal plan for treating mid-shaft fractures of the femur [11]. At least ninety-one disorders have been utilized as environments for developing diagnostic and treatment-planning models.
The magnitude of this research effort emphasizes the need for analytic representations of these complex decision-making processes.

Fortunately, the significant contributions in this voluminous literature can be neatly partitioned into four distinct categories. Research in diagnostic classification has been based either on the application of Bayesian statistics or on the use of non-parametric pattern classifiers. Treatment planning has been presented as either a finite-horizon decision problem or as an application of decision analysis to a Markov process of uncertain duration. This section presents a brief discussion of each of these categories and evaluates their suitability as analytic representations of the process of providing health care for craniofacial-pain patients.

2.1 Bayesian Classification Models

Bayesian diagnostic-classification models, such as [12, 13, 14, 15, 16], make a diagnosis on the basis of selecting a patient's 'most probable' disease state. The Bayesian classifier is an elementary type of parametric pattern-classification model. In general, parametric classifiers make use of one or more of the statistical characteristics of the dispersion of the data being classified to establish rules for data classification. With the Bayesian models, only the conditional probabilities for exhibiting sets of symptoms, given a particular disease, are tabulated from past medical data. Then, utilizing Bayes' theorem, the probabilities for the presence of alternate diseases $d_1, d_2, \ldots, d_n$ can be calculated as a function of the symptom-complex $S$ the practitioner observes in the patient. Bayes' theorem provides that for each of the $d_i$

$$P(d_i \mid S) = C(S)\, P(S \mid d_i)\, P(d_i),$$

where

$$C(S) = 1 \Big/ \Bigl[\sum_{k=1}^{n} P(S \mid d_k)\, P(d_k)\Bigr];$$

hence, a patient with symptom-complex $S$ is classified in disease-group $i$ if

$$P(d_i \mid S) = \max_{k} P(d_k \mid S).$$

A survey of the results of application of Bayesian models is given in Table 1.
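The Bayesian classification rule of Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the surveyed models: the function names and the three-disease priors and likelihoods are hypothetical values chosen only to show the arithmetic.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(d_i | S) for every disease i.

    priors[i]      -- P(d_i), assumed prevalence of disease i (hypothetical)
    likelihoods[i] -- P(S | d_i), probability of the observed symptom-complex S
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # this is 1 / C(S) in the notation of the text
    if total == 0:
        raise ValueError("symptom-complex has zero probability under every disease")
    return [j / total for j in joint]


def classify(priors, likelihoods):
    """Index of the 'most probable' disease, i.e. the arg max of P(d_k | S)."""
    post = bayes_posteriors(priors, likelihoods)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical numbers for three diseases and one observed symptom-complex.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.10, 0.40, 0.30]
posteriors = bayes_posteriors(priors, likelihoods)
best = classify(priors, likelihoods)
```

With these illustrative inputs the second disease (index 1) has the largest posterior even though the first disease has the largest prior, which is the behaviour the rule is meant to capture.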
TABLE 1. SURVEY OF DIAGNOSTIC-CLASSIFICATION MODELS

Bayesian Classifiers

| Reference Number | Disease Group | Number of Patients in Study | % Correct Patient Diagnoses |
|---|---|---|---|
| [12] | Nontoxic Goiter | 88 | 85.3 |
| [13] | Bone Tumor | 77 | 77.9 |
| [14] | Thyroid | 268 | 96.3 |
| [15] | Congenital Heart | 202 | 90.0 |
| [16] | Gastric Ulcer | 14 | 100.0 |

Non-Parametric Classifiers

| Reference Number | Disease Group | Number of Patients in Study | % Correct Patient Diagnoses |
|---|---|---|---|
| [17] | Liver | 52 | 98.1 |
| [18] | Asthma | 230 | 90.0 |
| [19] | Hematologic | 49 | 93.9 |
| [20] | Thyroid | 225 | 96.0 |

Although the percentage of correct diagnoses in most of these test applications is high, there are several reasons why a Bayesian diagnostic model is not used as the means of generating diagnostic classification in this dissertation. The first reason is the difficulty in acquiring the proportional presence of alternate diseases $P(d_i)$, $i = 1, 2, \ldots, n$, in the population of patients that are to be classified by the model. These 'prior' probabilities of having a particular disease are a function of seasonal variation, geographic location, population demography, and many other factors. Secondly, valid Bayesian analysis requires the analyst to determine the dependence among exhibited symptoms for each disease considered by the diagnostic model. In this respect, the probabilities for the presence of groups of symptoms are independent for some diagnostic alternatives and strongly correlated for others [4]. The third reason for not selecting a Bayesian model is the massive storage requirement dictated by the necessity of keeping the set of conditional probabilities. These conditionals, $P(S \mid d_i)$ for every observable symptom-complex $S$ and every disease $i$ considered, must be at hand each time the model is used. For example, given ten alternate diseases and ten symptoms for which no assumptions of between-symptom independence can be made, storage is required for $10\,(2^{10} - 1)$, or 10,230, conditional probabilities.
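The storage figure quoted above is easy to reproduce. The sketch below assumes each symptom is simply present or absent; the helper names are hypothetical. The independent-symptom count is added only for contrast and is not a figure from the text.

```python
def conditional_table_size(n_diseases, n_symptoms):
    """Conditional probabilities to store when symptoms may be dependent.

    Follows the text's formula n * (2**m - 1): there are 2**m present/absent
    symptom-complexes per disease. One reading of the "- 1" (an assumption
    here, not stated in the text) is that the last probability is fixed
    because the probabilities for each disease sum to one.
    """
    return n_diseases * (2 ** n_symptoms - 1)


def conditional_table_size_independent(n_diseases, n_symptoms):
    """For contrast: one P(symptom | disease) per pair if symptoms were
    assumed conditionally independent given the disease."""
    return n_diseases * n_symptoms


full = conditional_table_size(10, 10)                      # 10 * 1023
reduced = conditional_table_size_independent(10, 10)       # 10 * 10
```

The gap between the two counts shows why the independence question raised in the second objection matters so much for the third.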
2.2 Non-Parametric Classification Models

Non-parametric diagnostic models, like [17, 18, 19, 20], utilize non-parametric pattern classifiers, a form of pattern recognition modeling. In the literature on pattern recognition, the term 'non-parametric' implies that no form of probability distribution is assumed for the dispersion of symptom data in establishing the rules for pattern classification. These models do assume, however, that classes of symptom data are distinct entities and, hence, a patient with a particular set of symptoms $S$ cannot simultaneously occupy more than one diagnostic state. That is, the models assume a deterministic classification for each pattern viewed by the pattern classifier where every observable pattern has one, and only one, correct classification. Non-parametric modeling permits the analyst to bypass the difficult problems of explicitly determining the conditional probabilities for, and the dependence among, symptoms that are required for Bayesian analysis.

With the non-parametric classifier, a diagnosis is generated for the practitioner by evaluating a discriminant function associated with each diagnostic classification, $g_i(\cdot)$, $i = 1, 2, \ldots, n$. As was the case with the Bayesian models, the values of these discriminants are a function of the symptom-complex $S$ exhibited by the patient. The patient's diagnostic classification corresponds to that disease whose associated discriminant-function value is maximum. That is, a patient with symptoms $S$ is classified in disease-group $i$ if $g_i(S) > g_k(S)$ for all $k \neq i$.

Results from some of the applications of pattern-recognition classifiers are presented in Table 1. In these test applications diagnostic accuracy was consistently high. Because of these models' ease of implementation and small storage requirements, a non-parametric pattern classifier is preferable as a vehicle for generating diagnostic classifications.
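The maximum-discriminant rule described above can be illustrated with linear discriminants, the form named in the abstract. The sketch below assumes $g_i(S) = w_i \cdot x + b_i$, where $x$ is a 0/1 symptom vector; the weights, biases, and patient vector are hypothetical and are not the trained values from this dissertation.

```python
def discriminant(weights, bias, x):
    """Linear discriminant g_i(x) = w_i . x + b_i (assumed form)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(x, weight_vectors, biases):
    """Return (index of the largest discriminant, all discriminant values)."""
    scores = [discriminant(w, b, x) for w, b in zip(weight_vectors, biases)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical classifier with three diagnostic classes and three symptoms.
weight_vectors = [
    [1.0, -1.0, 0.0],
    [-0.5, 2.0, 1.0],
    [0.0, 0.0, -1.0],
]
biases = [0.0, -1.0, 0.2]

x = [1, 1, 0]  # symptoms 1 and 2 present, symptom 3 absent
label, scores = classify(x, weight_vectors, biases)
```

Only one weight vector and one bias per class need to be stored, which is the small storage requirement the text contrasts with the Bayesian table of conditional probabilities.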
The use of a non-parametric classifier is further motivated by features of the care process for craniofacial-pain patients discussed in Chapter 3.

2.3 Finite-Horizon Treatment Planning

In the realm of research on modeling the treatment-planning process, several authors [9, 21, 22] have presented schemes for analysis that utilize methods for making decisions under risk and uncertainty. The treatment-selection process has alternately been defined as a two-person zero-sum game, structured as a decision tree, and modeled as a Markov process of limited duration. Treatment costs and the 'costs' of occupying 'non well' or terminal patient states provide the basis for selecting an 'optimal' treatment plan. Finiteness of the planning horizon is assured either by establishing a maximum permissible number of treatment applications, or by considering at any stage of analysis the effects of a fixed number of future treatments. Validation of the decisions generated by these models has thus far been limited to checks on the feasibility of the treatment regimens selected. Unfortunately, the finite-horizon models either do not consider the possibility of a patient's prolonged stay in the health-care system, as is the case for the models with a maximum number of possible treatments, or, where only a fixed number of future treatments is considered, they provide no more than a heuristic treatment-selection procedure.

2.4 Uncertain-Duration Treatment Planning

Bunch and Andrew [11] have considered the possibility of prolonged occupation of the same diagnostic state during the course of a patient's progression through the care system. In their Markovian representation of the care system for mid-shaft fractures of the femur, they provide this modeling refinement. As a consequence of this modification, the number of treatment decisions made for each patient is a random variable with no fixed upper bound.
Howard's iterative scheme for policy selection [25] provides the means for choosing the optimal treatment regimen by selecting treatment alternatives that maximize the relative 'value' of occupying each disease state. Although the Bunch and Andrew model did not consider return visits to the same disease state, a more generalized Markovian representation could incorporate that possibility. Nevertheless, the proximity to reality that this category of transient Markovian models provides requires considerable effort, as holding-time distributions, treatment 'costs,' and transition probabilities must be supplied by the analyst for all treatment alternatives at each of the disease states in the care system.

The data collected on craniofacial-pain patient progressions through the care system reveal that both prolonged occupation of a single diagnostic state and return visits to the same state occur frequently. Moreover, as will be discussed in Chapter 4, there are several characteristics of the craniofacial-pain care system that permit reductions in the number of input parameters required for a transient Markovian model of this system. Therefore, an uncertain-duration transient Markovian representation of the health-care process has been selected as the means of evaluating the effectiveness of alternative treatment regimens on patients with craniofacial pain.

CHAPTER 3
DIAGNOSTIC CLASSIFICATION

The analytic model developed to provide diagnostic classifications for craniofacial-pain patients is based on the principles employed in non-parametric pattern classification. The patterns classified by this diagnostic model are vector representations (see Section 3.1 and Appendix A) of the craniofacial-pain patient's physical and emotional status. In the first sections of this chapter the theoretical background for the diagnostic model is established. This discussion is followed by a presentation of the validation procedures used to evaluate model performance.
Next, an algorithm is developed to reduce the 'costs' associated with model utilization. The chapter closes with a discussion of potential applications of the craniofacial-pain diagnostic classifier in teaching, in research, and in the health-care process.

3.1 Model Components

In the initial phase of the development of the diagnostic-classification model a set of possible alternative diagnostic classifications was established for craniofacial-pain patients. Figure 3 provides a list of these possible classifications.

1. Temporomandibular Joint Arthritis--Developmental
2. Temporomandibular Joint Arthritis--Infectious
3. Temporomandibular Joint Arthritis--Osteo (Degenerative)
4. Temporomandibular Joint Arthritis--Traumatic (Acute)
5. Temporomandibular Joint Arthritis--Traumatic (Chronic)
6. Myopathy--Acute Trauma
7. Myopathy--Myositis
8. Oral Pathology--Dental Pathology
9. Vascular Changes--Migrainous Vascular Changes
10. Myofacial Pain-Dysfunction--Malocclusion--Balancing Interferences
11. Myofacial Pain-Dysfunction--Malocclusion--Lateral Deviation of Slide
12. Myofacial Pain-Dysfunction--Malocclusion--Uneven Centric Stops
13. Myofacial Pain-Dysfunction--Psychoneurosis--Anxiety/Depression
14. Myofacial Pain-Dysfunction--Bruxism
15. Myofacial Pain-Dysfunction--Reflex Protective Muscular Contracture
16. Myofacial Pain-Dysfunction--Loss of Posterior Occlusion
17. Neuropathy

FIGURE 3
CRANIOFACIAL-PAIN DIAGNOSTIC ALTERNATIVES

Note that the alternative classifications in Figure 3 are not mutually exclusive, as a craniofacial-pain patient classified in some diagnostic alternative 'A' could also have the disorder specified by some other diagnostic alternative 'B.' However, for the purposes of this dissertation, each patient's diagnostic classification is made on the basis of specifying that etiological factor that requires most immediate action on the part of the attending practitioner.
Thus, diagnostic classification of a patient into diagnostic alternative 'A' signals that the etiology specified by that alternative should determine the course of the patient's care.

The next step in model development isolated relevant data which measured the physiological and psychological status of craniofacial-pain patients. In particular, this step of model development sought those elements of patient status that practitioners employ in their own classification of craniofacial-pain patients. Appendix A presents a list of these data elements. Wherever it was feasible, measures of patient status were segmented to amplify the significance of particular readings of each measure. Thus, for example, while the duration of a patient's pain is a continuous measure of his status, it is important for the purposes of classification to know whether a craniofacial-pain patient's duration of pain is less than 3 weeks, from 3 to 6 weeks, or longer than 6 weeks. For this measure of patient status, a short history of pain indicates a strong possibility of a recent traumatic injury, while pain over a long period is more likely associated with long-standing arthritic or psychic disorders.

To facilitate the development of an analytic model of the diagnostic-classification process, a vector representation of the relevant elements of patient data has been developed. The vector permits the notation of any of the data elements shown in the listing in Appendix A. The presence of any of the items found in Appendix A is recorded in a patient's data vector by an entry of '1' in the vector-dimension corresponding to the item number, while the absence of a vector item is noted by a '0' data-vector entry. For example, referring to the listing in Appendix A, a male patient would have the following fifth, sixth, and seventh elements in his data vector (...,1,0,0,...), while a pre-menopausal female would have the series of elements (...,0,1,0,...).
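The segmentation of a continuous measure into 0/1 entries can be illustrated with the duration-of-pain example given above. The sketch below is hypothetical: the function name is invented, the three entries are returned on their own rather than at their Appendix A item numbers (which are not given in this section), and the handling of exactly 3 and exactly 6 weeks follows the "from 3 to 6 weeks" wording.

```python
def encode_pain_duration(weeks: float) -> list[int]:
    """Return three 0/1 entries: [less than 3 weeks, 3 to 6 weeks, more than 6 weeks]."""
    if weeks < 3:
        return [1, 0, 0]
    if weeks <= 6:
        return [0, 1, 0]
    return [0, 0, 1]


# Exactly one of the three entries is set for any duration.
print(encode_pain_duration(2))    # [1, 0, 0]
print(encode_pain_duration(4.5))  # [0, 1, 0]
print(encode_pain_duration(10))   # [0, 0, 1]
```

Encoding each segment as its own vector element lets a linear discriminant attach a separate weight to "short" and "long" pain histories, which a single continuous entry would not allow.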
This vector notation of a patient's status serves as the input data for a non-parametric pattern classifier that assigns a diagnostic classification to the patient's dysfunction.

Non-parametric pattern classification, as described in Meisel [23] and Nilsson [24], is the process of creating decision surfaces that separate patterns into homogeneous classes, $C_i$, $i = 1, 2, \ldots, p$, specified by the analyst. In the craniofacial-pain diagnostic model, the $C_i$ are the diagnostic alternatives shown in Figure 3. Classification of a pattern (a patient's data vector) into one of the classes is performed by a pattern classifier composed of a maximum detector and a set of discriminant functions. These discriminants, $g_j(\mathbf{a})$, $j = 1, 2, \ldots, p$, are single-valued functions of each patient's data vector $\mathbf{a}$. If $\mathbf{a}_i$ represents a data vector for a patient whose correct diagnostic classification is the ith diagnostic alternative, then the $g_j(\mathbf{a})$ are chosen so that

$$g_i(\mathbf{a}_i) > g_j(\mathbf{a}_i), \qquad i, j = 1, 2, \ldots, p, \quad j \neq i.$$

The craniofacial-pain classifier uses linear discriminant functions. These discriminants are linear in the sense that they provide mappings from $E^n$ to $E^1$ that exhibit the form

$$g_j(\mathbf{a}) = a_1 w_{j1} + a_2 w_{j2} + \cdots + a_n w_{jn} + w_{j(n+1)}$$

where, in the patient data vector $\mathbf{a}$, the value of $a_r$ denotes the presence ($a_r = 1$) or absence ($a_r = 0$) of patient-data-vector item $r$; and the $w_{jk}$, $k = 1, 2, \ldots, n+1$, are constants associated with the jth discriminant function called 'weights.'

These discriminant-function weights, $w_{jk}$, $j = 1, 2, \ldots, p$, $k = 1, 2, \ldots, n+1$, provide an analytic means of duplicating the correct classification of each pattern observed by the non-parametric classifier. They provide a link between a pattern's correct classification and the individual components of the pattern's vector representation. In essence, each discriminant's weights are additive elements whose component sums have significance in terms of isolating a pattern's correct classification.
These weights are a mathematical means of storing information already known about the correct classification of observed pattern vectors. Moreover, the weights can be interpreted from the point of view of the significance that the practitioner places on each data-vector component. A discussion of this interpretation of the discriminant-function weights appears in Section 3.2.

Central to the use of linear discriminant functions is the assumption that the space of observable patient data vectors is linearly separable, for by definition [24], a pattern space $A$ is linear and its subsets of patterns $A_1, A_2, \ldots, A_p$ are linearly separable if and only if linear discriminant functions $g_1, g_2, \ldots, g_p$ exist such that for all $\mathbf{a}$ in $A_i$,

$$g_i(\mathbf{a}) > g_j(\mathbf{a}) \quad \text{for all } i = 1, 2, \ldots, p, \quad j = 1, 2, \ldots, p, \quad j \neq i.$$

In the context of diagnostic classification, the assumption of linear separability implies that there exists a set of hyperplanes that partition the space of observable patient data vectors into convex homogeneous regions, each region representing a unique diagnostic classification. Rosen [26] has provided a restatement of this assumption in the requirement that the sets of data vectors corresponding to each diagnostic alternative have non-intersecting convex hulls. In either form, this is a fairly restrictive assumption on the dispersion of patient data vectors (see Section 3.2).

Selecting the 'weights' for each of the discriminant functions is a process known as 'training.' For the linear non-parametric classifier, training generates each discriminant function's $w_{jk}$'s by applying a systematic algorithm to the members of a set of representative patterns with pre-established classifications. Nilsson [24] discusses several algorithms suitable for training the craniofacial-pain diagnostic classifier. In the course of using these algorithms for model development, a new 'modified fixed-increment' training algorithm was constructed (see Appendix B).
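For readers unfamiliar with this family of training procedures, the following is a minimal sketch of the basic fixed-increment error-correction rule for a multi-class linear classifier. It is not the modified algorithm of Appendix B, whose details are not reproduced in this section. The function names, the toy three-class data, and the `max_epochs` cap are all assumptions made for illustration.

```python
def discriminants(weights, a):
    """d_j = sum_k a_k * w_jk for each class j; `a` already ends with the constant 1."""
    return [sum(ak * wk for ak, wk in zip(a, w)) for w in weights]


def classify(weights, a):
    """Index of the unique largest discriminant, or None if the maximum is tied."""
    d = discriminants(weights, a)
    best = max(d)
    winners = [j for j, dj in enumerate(d) if dj == best]
    return winners[0] if len(winners) == 1 else None


def train_fixed_increment(patterns, labels, n_classes, max_epochs=1000):
    """Cycle through the patterns, correcting weights whenever a pattern is not
    strictly won by its own class, until one full pass makes no corrections."""
    aug = [list(p) + [1] for p in patterns]  # append the constant element
    weights = [[0] * len(aug[0]) for _ in range(n_classes)]
    for _ in range(max_epochs):
        corrections = 0
        for a, i in zip(aug, labels):
            d = discriminants(weights, a)
            # Strongest competing class for this pattern.
            j = max((k for k in range(n_classes) if k != i), key=lambda k: d[k])
            if d[i] <= d[j]:
                weights[i] = [w + x for w, x in zip(weights[i], a)]  # reward correct class
                weights[j] = [w - x for w, x in zip(weights[j], a)]  # penalize the rival
                corrections += 1
        if corrections == 0:
            return weights
    raise RuntimeError("no feasible weights found within max_epochs")


# Toy, linearly separable data (hypothetical): one binary feature per class.
patterns = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = [0, 1, 2]
weights = train_fixed_increment(patterns, labels, n_classes=3)
print([classify(weights, p + [1]) for p in patterns])  # [0, 1, 2]
```

Termination with zero corrections means every training pattern is strictly won by its own discriminant, which is the sense in which a training run can confirm linear separability of the training sample. The cap on epochs is a practical guard for data that are not separable.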
Employing the new algorithm has resulted in a reduction of approximately 35% in the amount of training time required to derive the weights for the craniofacial-pain classifier.

Symbolically, the craniofacial-pain diagnostic classifier, with its set of trained weights, can be represented in the following format: let

$\mathbf{a}_i$ = the 296-dimension data vector describing patient 'i'
$a_{ik}$ = the kth element in the data vector describing patient 'i', whose value is either zero or one, $k = 1, 2, \ldots, 295$ (by definition $a_{i,296} = 1$)
$C_j$ = diagnostic alternative 'j', $j = 1, 2, \ldots, 17$
$d_{ij}$ = the value of the discriminant function for diagnostic alternative 'j' generated by the data vector of patient 'i'
$\mathbf{W}_j$ = the 296-dimension vector of weights associated with diagnostic alternative 'j'
$w_{jk}$ = the kth element in the weight vector $\mathbf{W}_j$,

that is,

$$\mathbf{a}_i = [a_{i1}, a_{i2}, \ldots, a_{i,295}, a_{i,296}], \qquad \mathbf{W}_j = [w_{j1}, w_{j2}, \ldots, w_{j,295}, w_{j,296}]$$

and

$$d_{ij} = \mathbf{a}_i \mathbf{W}_j^T = \sum_{k=1}^{296} a_{ik} w_{jk}$$

where $T$ denotes vector transposition. Patient 'i' is classified in diagnostic alternative $C_j$ when $d_{ij} > d_{is}$ for every $s \neq j$. If $\max_j d_{ij}$ is not unique, then it is not yet possible to classify patient 'i' into one of the diagnostic alternatives. Treatment is prescribed for severe symptoms and classification is attempted at a later date.

Data from four sources were used to construct and verify the diagnostic-classification model, as well as the treatment-planning model presented in Chapter 4. Contributions of clinical records came from the dental schools at the universities of California at Los Angeles, Florida, Illinois, and Indiana. In all, the records of 250 patients, involving a total of 480 patient-practitioner interactions, form the data base for model building and validation. The relevant information from each of these patient visits has been recorded in the data-vector format of Appendix A. A diagnostic classification from Figure 3 was assigned to each of these patient data vectors by either Dr. Thomas B.
Fast, Chairman of the Division of Oral Diagnosis, or by Dr. Parker E. Mahan, Chairman of the Department of Basic Dental Sciences, at the College of Dentistry, University of Florida.

With this basic structure for the diagnostic-classification model, the classified patient data vectors, and the training algorithm presented in Appendix B, an initial test was performed to verify that the space of observed patient data vectors was separable by linear discriminant functions. Application of the modified fixed-increment training algorithm to the set of 480 data vectors verified this requirement, as the algorithm terminated in a set of feasible discriminant-function weights. Using the discriminant functions these constants determine, it is possible to duplicate the pre-established diagnostic classifications for each of the patient data vectors.

This first test of the diagnostic classifier established that a non-parametric classifier could be employed to reproduce the original classifications for each data vector used in model construction. However, this test does not reveal how well the classification model will perform on patient data not employed in developing the discriminant-function weights. The remainder of this section, and Section 3.3, address the question of how the diagnostic classifier performs on 'new' patient data vectors, that is, vectors that have no duplicate in the training sample.

Model training has created a set of weights that, by the definition of the training procedure, correctly classify every patient data vector that lies within the bounds of the training-sample pattern-class convex hulls. Since every data vector is a binary vector, new patient data vectors must fall outside the convex hulls established by the training-sample vectors.
Yet, if new data vectors have a number of data-vector elements that are identical to those of the training-sample vectors with the same diagnostic classification, then this relationship will be reflected in a 'close proximity,' as measured by a Euclidean-distance function, between each new vector and its associated training-sample convex hull. Given this close proximity, the classifier's discriminant functions should correctly classify most new data vectors, as these vectors will lie within or near the boundaries of the appropriate discriminating hyperplanes. Hence, the key to providing adequate classifier performance for new data vectors lies in devising data-vector representations of patient data for which the data vectors of a common diagnostic classification exhibit strong similarity.

In the introductory discussion of the elements of patient data used in the patient data vector, it was pointed out that an effort was made to select components of patient status that assist the practitioner in his selection of diagnostic classifications for a craniofacial-pain patient. Then these elements were partitioned to generate as much discriminating information as possible from each data element. In terms of the alternate diagnostic classifications, these elements of patient data were chosen so that all patients in any one diagnostic classification would have a unique combination of exhibited or non-exhibited data-vector elements. Employing these carefully constructed qualitative data elements resulted in a set of 'natural' gaps in the vector representations of patient data from alternate diagnostic classifications. The fact that there are portions of the pattern space that cannot be occupied by any data vector, and partitions of the space where the vectors of each classification must lie, assists the classifier in making correct classifications of data not used in model construction.
As Section 3.3 shows, this discussion is not meant to imply that the craniofacial-pain diagnostic classifier can, in its present state of development, correctly classify every new data vector. What has been stated is that a knowledge of the underlying classifying process can be employed in constructing the data vector examined by the classifier, and that fully utilizing this information will lead to a classifier that can be expected to perform well on new patient data. Of course, this discussion has been predicated on the separability of the underlying pattern space of data vectors. If this requirement is not met by some form of patient-data-vector representation, classification of patients by a linear classifier is not possible.

The next section of this chapter provides relationships between linear separability and the data that may be observed in a health-care system for which diagnostic classification by linear discriminants is being considered. This section has a dual purpose. First, linear separability is couched in 'non-geometric' terms. Second, and more importantly, using the craniofacial-pain health-care system as an example of the section's developments provides information about the suitability of the non-parametric classifier as a model of the decision-making process associated with diagnostic classification in this care system.

3.2 Alternative Interpretations of Linear Separability

The criteria for pattern space separability are mathematically concise. Unfortunately, these separability criteria are not readily expressible in non-geometric terms. The discussion developed in this section provides the reader with some non-geometric criteria that indicate when the use of a non-parametric pattern classifier should be considered as a means of generating diagnoses for a medical or dental disorder.

The first criterion is associated with a probabilistic measure of symptom exhibition.
Given a patient who exhibits some set of symptoms $S$, non-parametric pattern classification requires that $P[S \mid C_j] = 1$ for the diagnostic alternative $C_j$ that describes the patient's current diagnostic status, and $P[S \mid C_k] = 0$ for all other diagnostic alternatives $C_k$. However, assume that for the disorder in question the probability of exhibiting any relevant symptom has been calculated from historical data, that is, estimates of $P[s_i \mid C_j]$ are available for all relevant symptoms $s_i$ and all diagnostic alternatives $C_j$. Then, if the following decision rule leads to the correct classification of a majority of the patients with the disorder in question, utilization of a non-parametric classification model should be investigated:

classify a patient who exhibits the set of symptoms $S$ in the jth diagnostic alternative if

$$\prod_{s_i \in S} P[s_i \mid C_j] > \prod_{s_i \in S} P[s_i \mid C_k] \quad \text{for all } k \neq j. \tag{1}$$

Since (1) holds if and only if

$$\log \Big[ \prod_{s_i \in S} P[s_i \mid C_j] \Big] > \log \Big[ \prod_{s_i \in S} P[s_i \mid C_k] \Big] \quad \text{for all } k \neq j,$$

decision rule (1) can be expressed in terms of logarithms. Let the set of symptoms $S$ be represented as a row vector $\mathbf{a}$ with the elements of $\mathbf{a}$ assigned values as follows: $a_i = 1$ if symptom $s_i$ is an element of $S$ and $a_i = 0$ if symptom $s_i$ is not an element of $S$, $i = 1, 2, \ldots, n$, where $n$ is the total number of relevant symptoms. Form the column vectors

$$\mathbf{W}_j = \big[ \log P[s_1 \mid C_j], \; \log P[s_2 \mid C_j], \; \ldots, \; \log P[s_n \mid C_j] \big]^T.$$

Then

$$\log \Big[ \prod_{s_i \in S} P[s_i \mid C_j] \Big] = \mathbf{a} \mathbf{W}_j,$$

and decision rule (1) can be restated as

classify a patient who is characterized by the vector $\mathbf{a}$ in the jth diagnostic alternative if

$$\mathbf{a} \mathbf{W}_j > \mathbf{a} \mathbf{W}_k \quad \text{for all } k \neq j. \tag{2}$$

Note that decision rule (2) is identical to the decision rule employed in non-parametric pattern classification.

This equivalence implies that if (1) holds for every preclassified patient examined, the values $\log P[s_i \mid C_j]$ form a set of feasible discriminant-function weights. If (1) leads to the correct classification of a majority of the patients examined, it is logical to assume that there may be a set of feasible discriminant-function weights.
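The equivalence between the product form (1) and the linear form (2) can be checked numerically. The sketch below uses made-up probability estimates for two diagnostic alternatives and three symptoms; the numbers and function names are hypothetical. All probabilities are assumed strictly positive, since an estimate of zero would make the logarithm undefined.

```python
import math
from itertools import product

# Hypothetical estimates P[s_i | C_j]: one row per alternative, one column per symptom.
probs = [
    [0.8, 0.3, 0.1],   # C_0
    [0.2, 0.6, 0.5],   # C_1
]
log_weights = [[math.log(p) for p in row] for row in probs]   # the vectors W_j


def rule_1(a):
    """Decision rule (1): largest product of P[s_i | C_j] over exhibited symptoms."""
    scores = [math.prod(p for ai, p in zip(a, row) if ai) for row in probs]
    return scores.index(max(scores))


def rule_2(a):
    """Decision rule (2): largest weighted sum a W_j with W_j = log probabilities."""
    scores = [sum(ai * w for ai, w in zip(a, row)) for row in log_weights]
    return scores.index(max(scores))


# Compare the two rules on every non-empty symptom set.
vectors = [a for a in product([0, 1], repeat=3) if any(a)]
print(all(rule_1(a) == rule_2(a) for a in vectors))   # True
print(rule_1((1, 1, 0)), rule_2((1, 1, 0)))           # 0 0
```

Because the logarithm is strictly increasing, the ordering of the products and of the sums is the same, so the two rules select the same alternative whenever the maximum is unique.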
This assumption was examined using the craniofacial-pain patient data. From the data vectors classified in Diagnostic Alternatives 13, 14, and 15, a total of 189 patient visits, the $P[s_i \mid C_j]$ were calculated. Each data vector was then classified with decision rule (1), and 164 of the data vectors (86.7%) were assigned to their pre-established diagnostic alternative.

The second criterion provides a subjective measure of the feasibility of using a non-parametric pattern classifier. If symptoms for most of the diagnostic alternatives associated with the disorder of interest can be isolated such that

1. a patient's exhibition of a subset of these symptoms leads the practitioner to a selection of one of the diagnostic alternatives, or
2. a patient's exhibition of a subset of these symptoms leads the practitioner to eliminate from further consideration one of the diagnostic alternatives,

then the use of a non-parametric classifier as a means of generating classifications should be investigated.

The linear non-parametric classifier employs a weighted sum of the symptoms exhibited by each patient in its discriminating functions. If symptoms can be isolated that are significant to the classification of patients with the disorder under investigation, then there is a 'natural' weight for each of these symptoms in the decision-making process used by the practitioner. The existence of these natural weights increases the probability that a training algorithm will be able to find a feasible set of discriminant-function weights. Indeed, the relative importance of the significant symptoms may be reflected in the magnitude of the discriminant-function weights generated by the application of a training algorithm.

As an example, the significant symptoms associated with two craniofacial-pain diagnostic alternatives, Alternatives 4 and 14, were isolated by Dr. Fast.
A comparison of these symptoms and their associated discriminant-function weights revealed a high degree of correlation between symptom significance and discriminant-function weights (see Table 2).

The reader should note that both of the criteria discussed in this section are heuristic approximations to the geometric requirement for pattern space separability. However, if the disorder under investigation meets one or both of these criteria, it may be possible to employ a non-parametric classifier to diagnose the disorder, since the requirement for pattern space separability is most likely met.

TABLE 2
CORRELATION BETWEEN SIGNIFICANT SYMPTOMS AND DISCRIMINANT-FUNCTION WEIGHTS

Diagnostic Alternative 4: Temporomandibular Joint Arthritis--Traumatic (Acute)

| Significant Symptoms | Discriminant-Function Weights |
|---|---|
| (+) Duration of Pain (less than 3 weeks) | +3 |
| (+) History of Trauma (accidental) | +30 |
| (+) Preauricular Pain | +11 |
| (-) Salivary Gland Disease | -12 |
| (-) Otitis | 1 |

(discriminant-function weights for Diagnostic Alternative 4 range from -19 to +37)

Diagnostic Alternative 14: Myofacial Pain-Dysfunction--Bruxism

| Significant Symptoms | Discriminant-Function Weights |
|---|---|
| (+) Duration of Pain (more than 6 weeks) | +15 |
| (+) Facets | +2 |
| (+) Bruxism and/or Clenching | +56 |
| (-) History of Trauma (accidental) | -16 |
| (-) Salivary Gland Disease | 5 |

(discriminant-function weights for Diagnostic Alternative 14 range from -23 to +56)

Note: For both Diagnostic Alternatives,
(+) indicates a symptom that leads the practitioner to classify a patient in that diagnostic alternative;
(-) indicates a symptom that leads the practitioner to classify a patient in some other diagnostic alternative.

3.3 Model Validation

Validation of the craniofacial-pain diagnostic-classification model presented in Section 3.1 has been accomplished by three types of validating procedures.
The discussion presented in the preceding sections, and in particular the relationship between significant symptoms and their associated weights shown in Table 2, reveal a close proximity between the decision-making process the practitioner utilizes and the non-parametric classifier's symptom-weighing scheme. This section presents two other procedures employed in evaluating the diagnostic-classification model's performance.

The first procedure involved testing the diagnostic accuracy of the classification model on patient data that were not employed in model construction. Six classification tests were run in sequential order. In the first five of these tests random samples of 50 patient data vectors were drawn from the data base of 480 vectors discussed in Section 3.1. Then, as each of the tests was performed, the training algorithm in Appendix B was applied to the remaining 430 data vectors. With the weights derived from the training algorithm, the sample of 50 patients was classified. The model-generated classifications for each of the data vectors were compared to the classifications assigned to the vectors when they were created. As each test classification of a sample was completed, the diagnostic classifier's discriminant-function weights were set equal to zero, the sample of data vectors was returned to the data base, and the next test's random sample was drawn. A summary of the results of these tests of diagnostic accuracy is presented in Table 3.
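The sampling scheme for the first five tests can be sketched as follows. Only the index bookkeeping is shown; the training and classification steps are left as comments because the Appendix B algorithm is not reproduced here. The function name and the fixed random seed are assumptions made for repeatability.

```python
import random


def draw_holdout(n_vectors, sample_size, rng):
    """Randomly split indices 0..n_vectors-1 into a test sample and a training base."""
    test = set(rng.sample(range(n_vectors), sample_size))   # drawn without replacement
    train = [i for i in range(n_vectors) if i not in test]
    return sorted(test), train


rng = random.Random(0)   # fixed seed: an assumption, used only for repeatability
splits = [draw_holdout(480, 50, rng) for _ in range(5)]

for test_idx, train_idx in splits:
    # For each test: reset the weights to zero, train on train_idx only,
    # classify the vectors in test_idx, and compare with the assigned classes.
    print(len(test_idx), len(train_idx))   # 50 430
```

Each call draws from the full set of 480 indices, which mirrors returning the previous sample to the data base before the next test.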
TABLE 3
TESTS OF DIAGNOSTIC CLASSIFIER ACCURACY

| Test | Number of Patient Data Vectors | Number of Data Vectors Correctly Classified | Classifier Accuracy |
|---|---|---|---|
| ONE | 50 | | 92.0% |
| TWO | 50 | | 90.0% |
| THREE | 50 | | 88.0% |
| FOUR | 50 | | 94.0% |
| FIVE | 50 | | 90.0% |
| SIX | 51 | | 84.3% |

Mean Classifier Accuracy: 89.7%
Standard Deviation of Classifier Accuracy: 3.5%

In each of the first five tests it was possible for a patient who has had multiple practitioner-visits to have some of the vectors representing these visits in a test's random sample and some vectors used in model construction. Such occurrences lead to test results that overestimate classifier accuracy. Hence, in Test Six, a random sample of all of the patient data associated with 40 patients (a total of 51 patient data vectors) was selected. This sample was classified by the diagnostic-classification model using the remaining 429 data vectors as a data base. The results of this test are included in the data shown in Table 3.

There is one other possible factor affecting the classifier's accuracy as measured by these tests. It is conceivable that there were duplicate data vectors in the data base of 480 patient data vectors. If duplicates do exist and were included in both the test samples and the samples' training bases, measures of classifier accuracy will be overly optimistic. However, since 'noise' is introduced by the variability among craniofacial-pain patients and generated in the practitioner's transcribing of the elements of patient data into the data-vector format, and since there are $2^{295}$ possible data vectors, the probability that two or more of the data-based patient vectors include an identical specification of data-vector elements is small enough to justify neglecting this possibility and its effects.

The results summarized in Table 3 reveal that the diagnostic-classification model performs well in duplicating the diagnostic classifications originally assigned by the reviewing practitioners, Dr. Fast and Dr.
Mahan. Moreover, the size of the test samples was quite large in relation to the data base employed in developing each test's diagnostic model. As new data become available and are incorporated in the parameters of the model, the accuracy of the craniofacial-pain diagnostic classifier can be expected to increase slightly.

The second validating procedure established a measure of variability on the diagnostic classifications that might be given by different dental practitioners. The discussion presented in Section 1.1 related the difficulties associated with diagnosing craniofacial-pain disorders. Practitioners with varying kinds of professional experience can be expected to reflect their dissimilar backgrounds in differing diagnostic classifications for these patients. To measure the variability associated with dissimilar backgrounds, five craniofacial-pain data vectors were selected from the data base employed in constructing the craniofacial-pain diagnostic classifier. Four dentists from the staff of the College of Dentistry at the University of Florida were asked to review these patient data vectors and assign to each of them a diagnostic classification. Table 4 summarizes their assignments and also includes the diagnostic classification originally given by the reviewing practitioners.

The variability in diagnostic assignments reflected in Table 4 reaffirms the justification for the research objectives set forth in Section 1.2. Some of the differences in the practitioners' choices of diagnostic classifications can be explained by the limited amount of data contained in each of the data vectors, and the less-than-full medical statement of each of the diagnostic alternatives.
Nevertheless, a diagnostic-classification model that generates classifications that are in 90% agreement with those of experts in the field provides a sizeable improvement over the variability in classification assignments exhibited in Table 4, in which only half the respondents agreed on a single diagnosis in four out of five cases.

TABLE 4
CLASSIFICATION VARIABILITY AMONG DENTAL PRACTITIONERS

| Diagnostic Classification for | Patient 1 | Patient 2 | Patient 3 | Patient 4 | Patient 5+ |
|---|---|---|---|---|---|
| Original Classification | 4 | 13 | 15 | 15 | 9 |
| Practitioner 1 | 1 | 7 | 15 | 15 | 3 |
| Practitioner 2 | 6 | 12 | 15 | 8 | 3 |
| Practitioner 3 | 4 | 15 | 15 | 15 | 13 |
| Practitioner 4 | 4 | 15 | 15 | 14 | * |

\* No classification given
\+ Patient 5 exhibited a minimal amount of input data (only 17 non-zero data-vector entries)

These four dental practitioners exhibited 100.0% agreement on the diagnosis of one of the five patients, and 50.0% agreement on the diagnostic classification of the remaining four patients.

3.4 Minimum-Cost Symptom-Selection Algorithm

The craniofacial-pain diagnostic-classification model detailed in the previous sections of this chapter has been structured upon the data vector of the 295 relevant signs, symptoms, and items of patient history shown in Appendix A. To utilize this model, the practitioner must examine a patient for the presence or absence of each of these data-vector elements. Although the cost in time and fees varies from item to item, there is an expense to the practitioner, and to the patient, associated with checking each element in the data vector. Hence, it is logical to investigate the possibility of finding a reduced data vector that 'costs' less for the patient and practitioner to use and yet still permits correct classification of all craniofacial-pain patients.

A review of the literature (see Meisel [23], Chapter 9, for a survey) reveals that many authors have considered the task of selecting a set of features to be used in a pattern-classification scheme.
Traditional methods of viewing this problem are based on a search for a transformation that takes a given set of patterns into some 'new' pattern space where separation by discriminant functions is possible. Measures of pattern-class separability are employed to evaluate the effects of transforming the set of patterns from one space to another. In general, these transformations take a pattern representation in 'n' features and create a set of 'r' (r < n) 'new' features, where the 'new' features are linear combinations of the original features. However, to reduce the 'costs' associated with using the craniofacial-pain diagnostic classifier, a transformation must be found that decreases the size of the data-vector pattern space by eliminating features rather than combining them. For example, assume patients were diagnosed on the basis of body-temperature and blood-pressure readings. Traditional techniques for feature selection might employ a linear combination of body-temperature and blood-pressure measurements as one 'new' feature. The transformation sought in this investigation would lead to the classification of patients by either body temperature or blood pressure alone, if this were possible. This example will be used again in Section 3.4.1 to illustrate the algebraic and geometric structure of the problem.

Nelson and Levy [27] have attacked the problem of selecting a reduced set of unaltered features for use in a classification scheme. These authors attach a cost to the use of each available feature, and employ a ranking scheme to measure each feature's discriminating power. Then, under a restriction on the total cost of features employed, they develop an algorithm that selects the set of features that maximizes the classifier's discriminating power. Unfortunately, their scheme does not guarantee the selection of a subset of original features that contains enough 'information' to permit pattern-class separation by discriminant functions.
Therefore, a new algorithm is presented in this section that minimizes the cost of the set of features used by the pattern classifier yet ensures that all patterns can be correctly classified by a set of linear discriminant functions. In the remainder of this section the more general terms 'feature,' 'pattern,' and 'pattern class' will be used respectively to represent a data-vector item, a patient's data vector, and a diagnostic classification.

The problem of finding a minimum-cost collection of features would not be considered if there did not already exist a set of 'n' features by which the patterns under examination could be correctly classified by linear discriminants. That is, given an 'n' dimensional representation of each of the '$m_i$' patterns in each of the 'p' pattern classes

$$a_i^m = [a_{i1}^m, a_{i2}^m, \ldots, a_{in}^m, 1], \qquad m = 1,2,\ldots,m_i, \quad i = 1,2,\ldots,p,$$

where $a_{ik}^m$, $k = 1,2,\ldots,n$, equals either zero or one, there must exist a set of 'n+1' dimensional $W_j$'s, $j = 1,2,\ldots,p$, such that

$$a_i^m (W_i - W_j) > 0 \quad \text{for all } m = 1,2,\ldots,m_i; \;\; i = 1,2,\ldots,p; \;\; j = 1,2,\ldots,p; \;\; j \neq i. \tag{3}$$

Letting $A_i$ be the $m_i \times (n+1)$ dimensional matrix of patterns in pattern-class i, the requirement of (3) can be written in the following form:

$$A_i (W_i - W_j) > 0, \qquad i = 1,2,\ldots,p; \;\; j = 1,2,\ldots,p; \;\; j \neq i.$$

If such pattern representations and $W_i$'s exist, then a solution to the following problem yields a minimum-cost collection of pattern-classifying features:

$$\text{P1:} \quad \text{minimize } CX \quad \text{subject to} \quad A_i [X \odot (W_i - W_j)] > 0, \qquad i = 1,2,\ldots,p; \;\; j = 1,2,\ldots,p; \;\; j \neq i,$$

where

$$A_i = \begin{bmatrix} a_{i1}^1 & a_{i2}^1 & \cdots & a_{in}^1 & 1 \\ a_{i1}^2 & a_{i2}^2 & \cdots & a_{in}^2 & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{i1}^{m_i} & a_{i2}^{m_i} & \cdots & a_{in}^{m_i} & 1 \end{bmatrix},$$

$$W_i = [w_{i1}, w_{i2}, \ldots, w_{in}, w_{i,n+1}]^T, \qquad C = [c_1, c_2, \ldots, c_n, 0], \qquad X = [x_1, x_2, \ldots, x_n, 1]^T,$$

and

- $w_{ik}$ is an unrestricted variable,
- $c_j$ is the cost of using feature j,
- $x_i = 0$ if feature i is not used, and $x_i = 1$ if feature i is used.

Note: The $\odot$ notation is to be read as element-by-element multiplication, i.e., $Q \odot R = S$, where $[s_{ij}] = [q_{ij} r_{ij}]$.
3.4.1 Algorithm Development

The algorithm developed to solve problem P1 is an enumerative algorithm similar in structure to that of Balas [28]. Unfortunately, the non-linear nature of problem P1's constraints prohibits full implementation of the more powerful techniques used in implicit enumeration on linear integer problems. The structure of these constraints and their effect on the optimization of P1 will be discussed in a step-by-step development.

The minimum-cost feature-selection algorithm does not solve P1 to the extent of finding the values of the vectors $W_i$, $i = 1,2,\ldots,p$. This algorithm does find the minimum-cost collection of features $X^*$ and the total cost associated with using these features, and guarantees the existence of $W_i^*$ vectors associated with this optimal feature set. Given this guarantee, the modified fixed-increment algorithm from Appendix B can be employed to find the vectors $W_i^*$, $i = 1,2,\ldots,p$.

Choose some solution to P1. By hypothesis there exists at least one solution $(X, W_1, W_2, \ldots, W_p)$ to P1 where $X = [1,1,\ldots,1,1]^T$. Suppose there is some other solution $(\hat{X}, \hat{W}_1, \hat{W}_2, \ldots, \hat{W}_p)$ where one or more elements $x_i$ in the $\hat{X}$ vector are equal to zero. For the constraint matrices in P1,

$$A_i [\hat{X} \odot (\hat{W}_i - \hat{W}_j)] > 0, \qquad i = 1,2,\ldots,p; \;\; j = 1,2,\ldots,p; \;\; j \neq i.$$

If the matrix products $[A_i \odot \hat{X}] = \hat{A}_i$, $i = 1,2,\ldots,p$, are constructed, then each set of constraints in P1 can be written in the form

$$\hat{A}_i (\hat{W}_i - \hat{W}_j) > 0, \qquad i = 1,2,\ldots,p; \;\; j = 1,2,\ldots,p; \;\; j \neq i. \tag{4}$$

The creation of the $\hat{A}_i$ is called the zeroing process. Of the columns of $A_i$, $\hat{A}_i$ retains all columns j of $A_i$ where $x_j = 1$, and substitutes a column of zeros for each of those columns k in $A_i$ where $x_k = 0$. Using the zeroing process, the feasibility of any possible solution vector $\hat{X}$ to P1 can be examined in terms of the $A_i \odot \hat{X}$ this vector $\hat{X}$ creates.
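The zeroing process can be illustrated with a minimal Python sketch. The function name `zero_columns` and the small pattern matrix are hypothetical, chosen only for illustration; each row is one pattern, and the trailing 1 is assumed to be the augmenting entry of the (n+1)-dimensional representation.

```python
def zero_columns(patterns, x):
    """Keep column j of the pattern matrix when x[j] == 1; replace it by zeros when x[j] == 0."""
    return [[a * keep for a, keep in zip(row, x)] for row in patterns]

# Hypothetical pattern class with two features plus the augmenting 1.
A_X = [[1, 0, 1],
       [1, 1, 1]]
x_hat = [1, 0, 1]  # use feature 1, drop feature 2, keep the augmenting entry

print(zero_columns(A_X, x_hat))  # [[1, 0, 1], [1, 0, 1]]
```

With every entry of the selection vector equal to one, the matrix is returned unchanged, which corresponds to the all-features solution assumed to exist by hypothesis.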
As an example of the zeroing process for a particular set of patterns, let $a^i$ be a two-dimensional patient-data-vector $a^i = [a_1^i, a_2^i]$ where

- $a_1^i = 0$ if patient i has normal body temperature, and $a_1^i = 1$ if patient i has abnormal body temperature;
- $a_2^i = 0$ if patient i has normal blood pressure, and $a_2^i = 1$ if patient i has abnormal blood pressure.

Assume two diagnostic categories, X and Y, where data vectors $a_X^1$ and $a_X^2$ are preclassified in category X and data vectors $a_Y^1$ and $a_Y^2$ are preclassified in category Y. If $a_X^1 = [1,0]$, $a_X^2 = [1,1]$, $a_Y^1 = [0,0]$, and $a_Y^2 = [0,1]$, then

$$A_X = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad A_Y = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$

[Figure: the four patterns plotted in the pattern space, with temperature on the horizontal axis and pressure on the vertical axis.]

Consider the vector $\hat{X} = [1, 0]^T$; then

$$[A_X \odot \hat{X}] = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \quad \text{and} \quad [A_Y \odot \hat{X}] = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

[Figure: the pattern space as transformed by $\hat{X}$, with temperature on the horizontal axis and pressure on the vertical axis; all patterns now lie on the temperature axis.]

The vector $\hat{X}$ effectively creates a representation of each patient data vector in terms of the patient's body temperature alone.

Note that relation (2) is the requirement for pattern separability by linear discriminants. Hence, a vector $\hat{X}$ is a component in a feasible solution $(\hat{X}, \hat{W}_1, \hat{W}_2, \ldots, \hat{W}_p)$ to P1 if and only if there exist $\hat{W}_i$, $i = 1,2,\ldots,p$, such that (2) holds for all $i \neq j$. As discussed in Section 3.1, a pattern space is linearly separable, and hence feasible $\hat{W}_i$ exist, if and only if the individual pattern classes have non-intersecting convex hulls. For the pattern vectors considered in this section, the individual components of each of the patterns in each pattern class are either zero or one. As there is a one-to-one correspondence between the individual patterns in a pattern class and the vertices of the pattern class's convex hull, the convex hull of a pattern-class $\hat{A}_i$ can be expressed as all convex combinations of the individual pattern-class vectors $\hat{a}_i^m$, $m = 1,2,\ldots,m_i$.

Consider the following examples of the convex-hull representation of linear separability. Assume $a_X^1 = [1,0]$, $a_X^2 = [1,1]$, $a_Y^1 = [0,0]$, and $a_Y^2 = [0,1]$.
Graphically this pattern space can be represented as

[Figure: the four patterns plotted with Feature 1 on the horizontal axis and Feature 2 on the vertical axis, showing the lines X and Y and a discriminating line.]

where the line X from $a_X^1$ to $a_X^2$ represents the convex hull of pattern-class X and the line Y from $a_Y^1$ to $a_Y^2$ represents the convex hull of pattern-class Y. Since X and Y do not intersect, implying that the space is linearly separable, it is possible to draw an infinite number of lines that serve as discriminating hyperplanes.

Assume $a_X^1 = [1,0]$, $a_X^2 = [0,1]$, $a_Y^1 = [0,0]$, and $a_Y^2 = [1,1]$. Graphically this pattern space can be represented as

[Figure: the four patterns plotted with Feature 1 on the horizontal axis and Feature 2 on the vertical axis, showing the intersecting lines X and Y.]

where the line X from $a_X^1$ to $a_X^2$ represents the convex hull of pattern-class X and the line Y from $a_Y^1$ to $a_Y^2$ represents the convex hull of pattern-class Y. Since the lines X and Y intersect, the pattern space is not linearly separable, and hence it is impossible to draw a discriminating hyperplane.

Therefore, the following condition is equivalent to condition (4): a vector $\hat{X}$ is feasible to P1 if and only if there do not exist $U^s$ and $U^t$ such that

$$U^s \hat{A}_s = U^t \hat{A}_t \quad \text{for any } s = 1,2,\ldots,p; \;\; t = 1,2,\ldots,p; \;\; s \neq t, \tag{5}$$

where

$$U^i = [u_1^i, u_2^i, \ldots, u_{m_i}^i], \qquad u_k^i \geq 0 \text{ for all } k = 1,2,\ldots,m_i, \qquad \sum_{k=1}^{m_i} u_k^i = 1 \text{ for all } i = 1,2,\ldots,p.$$

Checking the feasibility of some vector $\hat{X}$ by condition (5) yields $[p(p-1)]/2$ distinct subproblems. Each of these subproblems may be characterized as follows: let $\hat{A}_s^T = A$ and $\hat{A}_t^T = B$, with A and B having columns $a_i$ and $b_j$ respectively, for any $\hat{A}_s$ and $\hat{A}_t$.

$$\text{P2:} \quad \text{Find } u_i \geq 0, \;\; \sum_{i=1}^{m_i} u_i = 1, \;\; \text{and} \;\; v_j \geq 0, \;\; \sum_{j=1}^{m_j} v_j = 1 \quad \text{such that} \quad \sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j .$$

If such $u_i$ and $v_j$ exist for any one of the subproblems, then $\hat{X}$ is not feasible to P1.

Because the number of subproblems is large even for a relatively small number p of pattern classes, there is justification for seeking methods to expedite the solution of each subproblem P2. To achieve this goal, a series of conditions will be presented that characterize some of the criteria necessary to the existence of a solution to subproblem P2.
In addition to establishing criteria for existence, these conditions provide a means for reducing the size of the matrices A and B. This reduction will be discussed after the conditions are established. Throughout, $a_i^k$ and $b_j^k$ denote the kth-row elements of columns $a_i$ and $b_j$.

Condition 1: If the kth row of A has all elements $a_i^k$, $i = 1,2,\ldots,m_i$, equal to zero (one) and the kth row of B has all elements $b_j^k$, $j = 1,2,\ldots,m_j$, equal to one (zero), then no $u_i \geq 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \geq 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j .$$

Justification 1: Under Condition 1 there is no set of convex combinations of the kth row elements of A and of the kth row elements of B such that the combinations are equal. Hence, there can be no set of convex combinations of the columns of A and of B such that the combinations are equal. Symbolically, since no $u_i \geq 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \geq 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i^k = \sum_{j=1}^{m_j} v_j b_j^k ,$$

no $u_i \geq 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \geq 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j .$$

Condition 2: If the kth row of A has all elements $a_i^k$, $i = 1,2,\ldots,m_i$, equal to zero (one) and the kth row of B has all elements $b_j^k$, $j = 1,2,\ldots,m_j$, equal to zero (one), the kth row of matrices A and B can be eliminated without loss of possible solutions to subproblem P2.

Justification 2: Under Condition 2 every convex combination of the kth row elements of A equals every convex combination of the kth row elements of B. Hence, a set of convex combinations of the columns of A and of the columns of B are equal if and only if the convex combinations of the remaining rows (all rows except the kth row) are equal. Symbolically, let $a_{i\bar{k}}$ denote the pattern $a_i$ whose kth component has been eliminated, and similarly let $b_{j\bar{k}}$ denote the elimination of component k from pattern $b_j$. Then, as

$$\sum_{i=1}^{m_i} u_i a_i^k = \sum_{j=1}^{m_j} v_j b_j^k$$

for any choice of $u_i \geq 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \geq 0$, $\sum_{j=1}^{m_j} v_j = 1$,

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j \quad \text{if and only if} \quad \sum_{i=1}^{m_i} u_i a_{i\bar{k}} = \sum_{j=1}^{m_j} v_j b_{j\bar{k}} .$$

Condition 3: If the kth row of A has all elements $a_i^k$, $i = 1,2,\ldots,m_i$, equal to zero, and some $b_r^k$ equals one, no $u_i \geq 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \geq 0$, $v_r > 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j .$$

Justification 3: Under Condition 3 any convex combination of the columns of B that includes a non-zero product of the rth column $b_r$ results in a kth row term greater than zero. The value of the kth row term for any convex combination of the columns of A is equal to zero. Hence, no set of convex combinations of the columns of A and B can be equal if the combination for B includes a specification that $v_r > 0$. Symbolically, if $v_r > 0$, then for any choice of $v_j$, $j = 1,2,\ldots,m_j$, $j \neq r$, where $v_j \geq 0$ and $\sum_{j=1}^{m_j} v_j = 1$,

$$\sum_{j=1}^{m_j} v_j b_j^k > \sum_{i=1}^{m_i} u_i a_i^k = 0$$

for any choice of $u_i$ such that $u_i \geq 0$ and $\sum_{i=1}^{m_i} u_i = 1$. Hence, if $v_r > 0$, there exist no $u_i \geq 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \geq 0$, $j \neq r$, $\sum_{j=1}^{m_j} v_j = 1$, such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j .$$

Condition 4: If the kth row of A has all elements $a_i^k$, $i = 1,2,\ldots,m_i$, equal to one, and some $b_r^k$ equals zero, no $u_i \geq 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \geq 0$, $v_r > 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j .$$

Justification 4: Condition 4 is similar to Condition 3 in that any convex combination of the columns of B that includes a non-zero product of the rth column yields a kth row term whose value cannot equal any convex combination of the kth row elements of A. Symbolically, for any choice of $u_i$ and $v_j$, where $v_r > 0$,

$$\sum_{j=1}^{m_j} v_j b_j^k < \sum_{i=1}^{m_i} u_i a_i^k = 1 .$$

Note that Conditions 3 and 4 can also be stated, and justified, with the role of the elements of the A and B matrices reversed.

Given this set of four conditions, consider the following row partition of the A and B matrices:

$$A = \begin{bmatrix} A^* \\ A_1 \\ A_1^c \\ A_0 \\ A_0^c \end{bmatrix}, \qquad B = \begin{bmatrix} B^* \\ B_1^c \\ B_1 \\ B_0^c \\ B_0 \end{bmatrix},$$

where by appropriate change of rows in A and B

1. every element in each row of $A_1$ is a one
2. every element in each row of $B_1$ is a one
3. every element in each row of $A_0$ is a zero
4. every element in each row of $B_0$ is a zero.

The partitions $A_1^c$, $B_1^c$, $A_0^c$, and $B_0^c$ are the rows of A and B corresponding to $B_1$, $A_1$, $B_0$, and $A_0$, respectively, and $A^*$ and $B^*$ are the remaining rows of A and B. With this partitioning and the four previously established conditions, the size of the data vectors associated with many of the $[p(p-1)]/2$ subproblems P2 can be significantly reduced. The reduction process, Procedure 1, can be stated in this manner:

Step 1: If for some row k in $A_1$ ($B_1$) each element in the corresponding row of $B_1^c$ ($A_1^c$) is equal to one, then row k of A and B can be eliminated by Condition 2.

Step 2: If for some row k in $A_0$ ($B_0$) each element in the corresponding row of $B_0^c$ ($A_0^c$) is equal to zero, then row k of A and B can be eliminated by Condition 2.

Step 3: If for some row k in $A_0$ ($B_0$) the corresponding row in $B_0^c$ ($A_0^c$) has all elements equal to one, or if for some row k in $A_1$ ($B_1$) the corresponding row in $B_1^c$ ($A_1^c$) has all elements equal to zero, then this particular subproblem P2 has no feasible solution by Condition 1. Procedure 1 and the search for a solution to P2 are terminated at this point because the convex hulls of pattern-classes A and B do not intersect.

Step 4: If for some row k in $A_1$ ($B_1$) the corresponding row in $B_1^c$ ($A_1^c$) has one or more elements equal to zero, i.e., $b_r^k = b_s^k = \cdots = b_t^k = 0$ ($a_r^k = a_s^k = \cdots = a_t^k = 0$), then columns $b_r, b_s, \ldots, b_t$ ($a_r, a_s, \ldots, a_t$) can be eliminated by Condition 4.

Step 5: If for some row k in $A_0$ ($B_0$) the corresponding row in $B_0^c$ ($A_0^c$) has one or more elements equal to one, i.e., $b_r^k = b_s^k = \cdots = b_t^k = 1$ ($a_r^k = a_s^k = \cdots = a_t^k = 1$), then columns $b_r, b_s, \ldots, b_t$ ($a_r, a_s, \ldots, a_t$) can be eliminated by Condition 3.

Step 6: If the use of Steps 1, 2, 4, and 5 has eliminated all elements of both matrices, then this particular subproblem has an infinite number of feasible solutions by Condition 2. Procedure 1 and the search for a solution to P2 are terminated at this point because the convex hulls of the pattern-classes A and B intersect.

Step 7: If the use of Steps 1, 2, 4, and 5 has eliminated one or more rows or columns from either matrix, then repartition the matrices and return to Step 1; otherwise terminate Procedure 1.

In coding Procedure 1 for computer processing, there is no need to physically partition the rows of the A and B matrices. Summing the elements in any row of A or B reveals whether the individual elements in the row are all equal to zero or are all equal to one. Given this information, the steps from Procedure 1 determine whether a pattern is removed from A or B, whether a row in A and B is removed, or whether the procedure should be terminated because no feasible set of convex combinations for P2 exists.

As an example of the use of Procedure 1, consider a subproblem P2 in which the matrices A and B are

$$A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \end{bmatrix}.$$

In the first application of the steps of Procedure 1:

1. Column 4 can be eliminated from matrix A by Step 4 and
2. Column 1 can be eliminated from matrix A by Step 5.

After the first application of the steps of the procedure

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \end{bmatrix}.$$

In the second application of the steps of Procedure 1:

1. Row 1 can be eliminated from both matrices by Step 1,
2. Row 2 can be eliminated from both matrices by Step 2, and
3. Column 4 can be eliminated from matrix B by Step 4.

After the second application of the steps of the procedure

$$A = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$

In the third application of the steps of Procedure 1:

1. Row 2 can be eliminated from both matrices by Step 1 and
2. Procedure 1 can be terminated by Step 3.

Hence, for this set of A and B matrices, subproblem P2 has no feasible solution.
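The remark on coding Procedure 1 without physically partitioning the matrices can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the original program: the function name `procedure_1` and the status strings are hypothetical, all eliminations found in one pass are applied together (so the order of eliminations may differ from the worked example), and a class left with no patterns is reported as non-intersecting, since its weights can no longer sum to one.

```python
def procedure_1(A, B):
    """Sketch of the reduction for one subproblem P2.

    A and B are lists of rows (one row per feature); each column is one
    0/1 pattern. Returns (status, A, B) with status 'disjoint' (P2 has no
    solution), 'intersect' (P2 has a solution) or 'undecided'.
    """
    A = [list(row) for row in A]
    B = [list(row) for row in B]
    n_a, n_b = len(A[0]), len(B[0])          # patterns remaining in each class
    while True:
        drop_rows, drop_a, drop_b = set(), set(), set()
        for k, (row_a, row_b) in enumerate(zip(A, B)):
            for row, other, drop_other in ((row_a, row_b, drop_b),
                                           (row_b, row_a, drop_a)):
                if len(set(row)) != 1:
                    continue                 # row is neither all zeros nor all ones
                c = row[0]
                if all(v == c for v in other):
                    drop_rows.add(k)         # Steps 1 and 2: redundant row
                elif all(v != c for v in other):
                    return "disjoint", A, B  # Step 3: hulls cannot intersect
                else:                        # Steps 4 and 5: patterns forced to weight zero
                    drop_other.update(j for j, v in enumerate(other) if v != c)
        if not (drop_rows or drop_a or drop_b):
            return "undecided", A, B         # Step 7: nothing more to eliminate
        A = [[v for j, v in enumerate(r) if j not in drop_a]
             for k, r in enumerate(A) if k not in drop_rows]
        B = [[v for j, v in enumerate(r) if j not in drop_b]
             for k, r in enumerate(B) if k not in drop_rows]
        n_a -= len(drop_a)
        n_b -= len(drop_b)
        if n_a == 0 or n_b == 0:
            return "disjoint", A, B          # assumption: no patterns left to weight
        if not A:
            return "intersect", A, B         # Step 6: every row was redundant


# The example matrices from the text (rows are features, columns are patterns).
A_example = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1]]
B_example = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
status, _, _ = procedure_1(A_example, B_example)
print(status)
```

For the example matrices the sketch reports that the convex hulls do not intersect, in agreement with the conclusion of the worked example. An 'undecided' result means the reduced subproblem still has to be settled by other means.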
Although the use of Procedure 1 may lead to a reduction in the size of most subproblems, the pattern vectors ($a_i$ and $b_j$) for each of these problems may still be quite large. Restating subproblem P2 as a linear program yields

$$\text{P3:} \quad \text{minimize } [\,0 \;\; 0\,] \begin{bmatrix} U \\ V \end{bmatrix} \quad \text{subject to} \quad \begin{bmatrix} A & -B \\ 1\,1 \cdots 1 & 0\,0 \cdots 0 \\ 0\,0 \cdots 0 & 1\,1 \cdots 1 \end{bmatrix} \begin{bmatrix} U \\ V \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad U \geq 0, \;\; V \geq 0,$$

where the existence of any solution vectors $U^*$ and $V^*$ signals the intersection of the convex hulls of pattern-classes A and B. Consider the dual of P3, written in the following form:

$$\text{P4:} \quad \text{maximize } [\,0 \;\; 1 \;\; 1\,] \begin{bmatrix} \Pi \\ \lambda_1 \\ \lambda_2 \end{bmatrix} \quad \text{subject to} \quad \begin{bmatrix} A^T & \mathbf{1} & \mathbf{0} \\ -B^T & \mathbf{0} & \mathbf{1} \end{bmatrix} \begin{bmatrix} \Pi \\ \lambda_1 \\ \lambda_2 \end{bmatrix} \leq \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \qquad \Pi, \lambda_1, \lambda_2 \text{ unrestricted in sign.}$$

Note that P4 may have many associated $\pi$ variables, but has only as many constraints as the number of patterns in A and B (as reduced by Procedure 1). P4 always has at least one solution to its constraint set (for example, $\Pi = 0$, $\lambda_1 = \lambda_2 = 0$). Thus, if an application of a linear-programming algorithm to P4 reveals the existence of an unbounded solution, then P2 has no solution. Therefore, if and only if P4 has a bounded solution do $u_i$ and $v_j$ exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j , \quad \text{where } u_i \geq 0, \;\; \sum_{i=1}^{m_i} u_i = 1, \;\; \text{and} \;\; v_j \geq 0, \;\; \sum_{j=1}^{m_j} v_j = 1 .$$

The preceding discussion, with its development of a reduction procedure and dual formulation, provides the structure for a second procedure. Procedure 2 establishes a mechanism to verify the feasibility of any assignment of zeros and ones to the X vector of problem P1 (see Figure 4). That is, given some vector $\hat{X}$ and a set of patterns $a_i^m$, $m = 1,2,\ldots,m_i$, and $i = 1,2,\ldots,p$, the $[p(p-1)]/2$ subproblems P2 are formed by zeroing out the appropriate pattern-vector elements. Then Procedure 1 is applied to each subproblem. Finally, for each pair of pattern classes the boundedness of the dual formulation P4 is examined. Vector $\hat{X}$ represents a feasible set of pattern-classifying features for P1 if and only if each of the $[p(p-1)]/2$ subproblem formulations P4 is unbounded.

[FIGURE 4: PROCEDURE 2]

Before a statement of the algorithm to solve problem P1 is presented, several terms must be defined.
The assignment vector is defined as a listing of variables $x_i$, elements of the vector X in P1, whose values have been determined by the steps of the algorithm. The elements in this vector are recorded with the value of their assignment, either zero or one. These elements are entered in the vector in the order they were assigned, with the first algorithm assignment in the first (left) position. For example, consider the assignment vector $[x_4 = 0, x_{10} = 1, x_2 = 0]$. This vector records that the algorithm first assigned $x_4$ equal to zero, then assigned $x_{10}$ equal to one, and its last assignment was $x_2$ equal to zero. Feasibility of a solution X, as determined by the assignment-vector component values, is checked by Procedure 2 with the value of those variables not included in the assignment vector temporarily set equal to one.

The value V of an assignment vector is defined as minus one times the sum of the costs associated with each of the variables in the assignment vector, multiplied by the value assigned to the respective variable. For the example assignment vector, $[x_4 = 0, x_{10} = 1, x_2 = 0]$, where $c_4 = 5$, $c_{10} = 2$, and $c_2 = 7$, the assignment vector has the value

$$V = (-1) \cdot [5(0) + 2(1) + 7(0)] = -2 .$$

3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm

Step 0: Create the assignment vector (at this point the vector is null as there is no variable assignment in the vector). Set $V^* = -\infty$ and go to Step 4.

Step 1: Start at the right side of the assignment vector and move to the left, stopping at the first variable assigned a zero value. If no variable in the assignment vector has a zero assignment, go to Step 2. Otherwise go to Step 3.

Step 2: Calculate V for the assignment vector. If V is greater than $V^*$, record the values of the variables in the assignment vector as the optimal solution $X^*$ to P1. Otherwise, record (as the optimal solution $X^*$ to P1) the values of the variables in the best current solution X. Terminate the algorithm.
Step 3: Change the value of the variable isolated in Step 1 to an assigned value of one, and eliminate from the assignment vector all variable assignments to the right of this new assignment. If the assignment vector includes the assignment $x_i = 1$ for every $x_i$ in X, return to Step 2. Otherwise go to Step 4.

Step 4: Select a variable $x_k$ that is not an element of the assignment vector. Assign this variable the value $x_k = 0$ in the assignment vector. Use Procedure 2 to check the feasibility of this assignment. If the assignment vector is not feasible, go to Step 6. Otherwise go to Step 5.

Step 5: If the assignment vector with the new assignment $x_k = 0$ does not include an assignment for every $x_i$ in X, return to Step 4. Otherwise go to Step 7.

Step 6: If the assignment vector with the assignment $x_k = 1$ ($x_k$ is the variable selected in Step 4) does not include an assignment for every $x_i$ in X, return to Step 4. Otherwise go to Step 7.

Step 7: Calculate V for the assignment vector. If $V^*$ is greater than V, go to Step 1. Otherwise go to Step 8.

Step 8: Record as the best current solution X the values of the variables in this assignment vector. Set $V^* = V$, and return to Step 1.

Note that in the course of applying this algorithm all solutions are considered and the best current solution is replaced only when another solution has a larger associated value. As the number of possible solutions is finite, the algorithm must terminate, and at this termination the value of the optimal solution and its assignments are known. An application of the minimum-cost symptom-selection algorithm is presented in Appendix C.

3.4.3 Computational Considerations

Returning to the setting of diagnostic classification of craniofacial-pain patients, application of the minimum-cost symptom-selection algorithm would require an enumeration (explicit or implicit) over $2^{295}$ possible solutions in order to find the optimal collection of data-vector elements.
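For a small feature set, the enumeration stated in Section 3.4.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original program: the name `min_cost_features` is hypothetical, the caller-supplied `is_feasible` function stands in for Procedure 2, unassigned variables are selected in index order (the statement of the algorithm leaves the selection rule open), and the all-ones vector is assumed feasible, as in the hypothesis of P1.

```python
def min_cost_features(costs, is_feasible):
    """Enumerate 0/1 feature selections; return (best selection, its total cost)."""
    n = len(costs)
    assign = []                              # assignment vector: (index, value) in order assigned
    best_x, best_v = None, float("-inf")     # Step 0

    def full_x():
        x = [1] * n                          # unassigned variables are temporarily one
        for i, value in assign:
            x[i] = value
        return x

    def extend():
        # Steps 4-6: try zero for each unassigned variable, fall back to one.
        assigned = {i for i, _ in assign}
        for k in range(n):
            if k in assigned:
                continue
            assign.append((k, 0))
            if not is_feasible(full_x()):
                assign[-1] = (k, 1)

    extend()
    while True:
        v = -sum(costs[i] * value for i, value in assign)   # Step 7
        if v > best_v:                                       # Step 8
            best_x, best_v = full_x(), v
        while assign and assign[-1][1] == 1:                 # Step 1: find rightmost zero
            assign.pop()
        if not assign:
            return best_x, -best_v                           # Step 2: enumeration complete
        k, _ = assign.pop()
        assign.append((k, 1))                                # Step 3: flip it to one
        extend()


# Hypothetical two-feature case: the classes separate only if feature 0 is kept.
selection, cost = min_cost_features([3, 5], lambda x: x[0] == 1)
print(selection, cost)  # [1, 0] 3
```

The number of feasibility checks grows exponentially with the number of features, which is the difficulty taken up in the remainder of this section.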
As the number of possible solutions is prohibitively large, heuristic modifications to the symptom-selection algorithm are required for this application. One possible modification could employ the fact that only a few of the elements in the patient data vector have large associated 'costs' for their utilization. In particular, the eight elements of radiographic data and the two measures of emotional trauma are significantly more 'costly' to examine than the other items in the data vector. With this modification, the algorithm would only consider eliminating these ten high-cost features. Another heuristic approximation to the optimal collection of features might rank the data-vector elements in order of descending cost of utilization. Procedure 2 would then be used to eliminate these components one by one, starting with the item of highest cost, until the procedure signaled an infeasible solution to P1. Certainly, other heuristics might also be developed to exploit the structure of this algorithm.

3.5 Model Applications

The structure of the craniofacial-pain diagnostic-classification model permits model utilization for a variety of purposes. Since the model is developed in terms of general data-vector and diagnostic-alternative parameters, these model components can be altered to suit the application in question. This section presents a brief discussion of some of the possible applications of the diagnostic classifier.

In a teaching environment, the diagnostic-classification model with its set of discriminant weights can be stored for computer-terminal access. Then, on a set of tutorial example patients, students can compare their diagnoses with those of the diagnostic model. Moreover, the student can interact with the classifier in constructing his own 'sample' patients for the classifier to diagnose.
Finally, the student can request the classifier to relate those discriminant-function weights that the model employs in considering the 'significance' (Section 3.2) of any one or group of symptoms.

The effectiveness of new diagnostic tests can be evaluated using the minimum-cost symptom-selection algorithm. This algorithm provides an immediate measure of the 'worth' of new research developments. Given a cost for employing a new test, the algorithm returns an evaluation of the test's classifying capability. The algorithm reveals whether the test is included in the minimum-cost collection of features and whether the use of the new test permits the practitioner to discontinue other examination procedures. Additionally, the algorithm can be employed to point out new areas for research, as it isolates diagnostic alternatives where correct classification of patients is difficult using existing tests and procedures.

As employed in the practitioner's office, the diagnostic classifier will provide a direct link between the practicing dentist and the knowledge of experts in the field of craniofacial pain. Information will flow over the link in both directions. As new patients are seen by the practitioner, the record of each visit will be reviewed by experts and then used to supplement the data base employed in model construction. Then, when developments dictate, new sets of discriminant-function weights can be transmitted to the dental practitioners. This kind of interaction results in a more accurate and representative diagnostic classifier as the patient-sample data base becomes larger.

CHAPTER 4
TREATMENT PLANNING

The selection of treatment regimens for craniofacial-pain patients is modeled as a Markovian decision process. The states in this Markovian model are descriptions of a patient's health-care status and the decision alternatives are feasible treatments for the patient's dysfunction (see Section 4.1).
In the first two sections of this chapter, motivation for the model structure is provided and the components of the decision model are developed. The third section provides a description of the validating procedures used to determine the appropriateness of the model and the model-generated treatment decisions. This chapter closes with a discussion of potential teaching, research, and private-practice applications of the treatment-planning model.

4.1 Model Components

Several model-building components from the craniofacial-pain care system are isolated to permit the construction of a Markovian representation of this system. A set of state descriptions that characterize, for decision-making purposes, the status of craniofacial-pain patients is presented in Section 4.1.1. Then transition probabilities measuring the effects of treatment applications are discussed in Section 4.1.2. Section 4.1.3 overlays the model's state descriptions and transition probabilities with costs accrued during the patient's progression through the care system. These components are integrated and verified in the discussions of Sections 4.2 and 4.3.

Values for many of the treatment-planning model's parameters were gathered from the set of patient records discussed in Section 3.1. As the patient histories from the contributing university dental clinics were reviewed, notations of treatment applications and time between successive visits were made for each patient-practitioner interaction. The values of the remaining model parameters were either estimated by the reviewing practitioners, Dr. Fast and Dr. Mahan, or were gathered from responses to questionnaires completed by patients who visited the University of Florida's Dental Clinic. In modeling the complicated process of care for craniofacial-pain patients, several simplifying assumptions were made.
This section provides the motivation for these assumptions and presents the notation employed in the analytic description of the treatment-planning process.

4.1.1 Patient States

In general, a Markovian system structure requires that the current state of the system completely characterize the probabilities associated with future state occupancies of the system. To fully satisfy this Markovian condition for state structure in the craniofacial-pain treatment-planning model would require that the model include as distinct model states every possible combination of diagnostic classifications a patient might have occupied, in conjunction with every combination of treatment applications he might have undergone, during his stay in the care system. Unfortunately, such a model would have an infinite number of 'patient states.'

However, for a majority of craniofacial-pain patients the knowledge of a patient's prior treatment record, coupled with his current diagnostic classification, is adequate to determine his prior diagnostic classifications. Even in the cases where the current classification and prior treatment record do not provide a total description of a patient's condition, these elements of patient status do provide significant information about the probabilities associated with a patient's future status in the care system. For example, in the data employed in model construction, 47 craniofacial-pain patients occupied Diagnostic Alternative 15 and were treated with an application of drugs at least once. Eight of these patients were 'well' after a first treatment with drugs, while 39 required multiple applications of drugs or other treatments during their stay in the system. Yet of the 12 patients who were given two applications of drugs, 9 were 'well' following the second repetition of drug therapy.
Thus, while the overall data-based transition-probability estimate for a transition from Diagnostic Alternative 15 into the well state following any one application of drugs is .36, the transition-probability estimate for a transition into the well state following two successive applications of drugs is .75. Hence, for this diagnostic classification, information on the prior application of drugs is important in determining a patient's future status in the care system.

This form of 'current diagnostic classification augmented by treatment record' patient-state description is employed in the craniofacial-pain treatment-planning model as an approximation to a 'true' Markovian state structure. Each of the diagnostic alternatives shown in Figure 3 forms the basis for a collection of patient states. The diagnostic alternative is augmented with a record of treatments that have been applied since the patient entered the care system. Appendix D provides a list of the treatment alternatives that may be prescribed for craniofacial-pain patients. The record of each treatment given to the patient is noted in the patient-state descriptions without regard to its chronological order. For example, a patient's occupation of the state 'J|1,2,2' denotes that he is currently classified in diagnostic alternative J, and that since he entered the care system he has been treated with one application of treatment 1 and two applications of treatment 2.

Augmenting the patient-state descriptions with treatment history expands the dimensionality of the state space, yet the number of history-augmented states remains finite for two reasons. The treatment records used in model construction reveal that, for some combinations of diagnostic alternatives and treatment applications, there is a feasible limit to the number of treatment repetitions that can be given to any one patient.
Thus, the first reason for a finite state space is that no patient state in the treatment-planning model includes more repetitions of a particular treatment than the clinical data have established as a feasible limit. As an example, the records of patient visits used in model construction establish a feasible limit of only one application of treatment 18 for patients classified in any of the diagnostic alternatives. Therefore, the treatment-planning model includes patient states that exclude treatment 18 as a portion of their treatment history or exhibit the form 'J|...,18,...' for each diagnostic classification 'J' where 18 is a feasible treatment.

The second reason for a finite state space is that there is a 'boundary application' of many treatments such that neither the treatment-record data nor the reviewing practitioners established differences between the transition probabilities for the boundary application and those for further repetitions of the treatments (see Section 4.1.2 and Appendix E). In Diagnostic Alternative 13, for example, the first application of treatment 24 is the boundary repetition of that treatment. Hence, multiple repetitions of treatment 24 are not added to the state description of patient states based on Diagnostic Alternative 13, as the additional information on multiple applications does not influence transition probabilities associated with this treatment's effectiveness. Thus, a second application of treatment 24 for a patient who continues to be classified in Diagnostic Alternative 13 places the patient in a state of the form '13|...,24,...'.

The craniofacial-pain treatment-planning model includes two terminal patient states in addition to the patient states that are based on diagnostic alternatives. One or the other of these two terminal states, 'well' or 'referred,' represents the patient's status when he exits the care system.
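The history-augmented state description can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `state_label`, the `BOUNDARY` table, and the labels "W" and "R" for the two terminal states are hypothetical, and the only boundary number taken from the text is the single boundary application of treatment 24 in Diagnostic Alternative 13. A state is written as the diagnostic alternative, a vertical bar, and the treatments applied so far, with chronological order ignored and repetitions beyond a boundary number dropped.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical table of boundary numbers keyed by
# (diagnostic alternative, treatment). Only the entry (13, 24) -> 1
# comes from the example in the text; a real table would be built
# from the clinical data and practitioner review.
BOUNDARY: Dict[Tuple[int, int], int] = {(13, 24): 1}

# Labels assumed here for the two terminal states.
WELL, REFERRED = "W", "R"


def state_label(diagnosis: int, treatments: Iterable[int]) -> str:
    """Build a label such as '7|1,2,2' for a history-augmented state.

    Treatments are sorted because chronological order is not recorded,
    and each treatment's count is capped at its boundary number when
    one is listed in BOUNDARY.
    """
    counts = Counter(treatments)
    recorded = []
    for treatment in sorted(counts):
        limit = BOUNDARY.get((diagnosis, treatment))
        n = counts[treatment] if limit is None else min(counts[treatment], limit)
        recorded.extend([treatment] * n)
    return f"{diagnosis}|" + ",".join(str(t) for t in recorded)
```

For instance, under these assumptions `state_label(13, [24, 24])` gives the same label as `state_label(13, [24])`, which mirrors the statement that a second application of treatment 24 leaves the patient in a state of the same form.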
A patient exits the system in the 'well' state when the effects of treatment applications result in sufficient improvement so that no further treatment is required. The patient moves into the 'referred' state in lieu of further treatment. This alternative to treatment is selected when the 'expected costs' of remaining in the care system exceed the costs of referring the patient to another source of care (see Section 4.1.3).

4.1.2 Transition Probabilities

Patient-state transitions that involve a change of diagnostic classification follow one of two basic formats (see Figure 5). For the initial diagnostic classifications in Format I, with each treatment application, the patient either remains in his original diagnostic classification or transits into the well state. For Format II, the six diagnostic alternatives shown in the lower illustration form a different structure. Here it is possible for the patient to alternate among several diagnostic classifications during the course of his stay in the care system. Note that in both formats for diagnostic-classification transitions a patient moves into the referred state not as a result of a treatment application, but rather as an alternative to further treatment.

FIGURE 5
DIAGNOSTIC-CLASSIFICATION TRANSITIONS
Format I: Patients whose first-visit diagnostic classification is Diagnostic Alternative 1, 2, 3, 4, 5, 6, 10, 11, 14, 16, or 17 make transitions out of their original classification 'I.' [Upper illustration not reproduced in this text.]
Format II: Patients originally classified in Diagnostic Alternative 7, 8, 9, 12, 13, or 15 may make transitions among several diagnostic classifications. [Lower illustration not reproduced in this text.]

To these underlying diagnostic-classification transitions the craniofacial-pain treatment-planning model adds a record of the changes in treatment history. Appendix F displays complete charts of all of the diagnostic-alternative-based patient states included in the treatment-selection model.
In these charts the patient states are connected by arcs that represent feasible transitions from one state to another. Not shown in the charts are the well and referred patient states and the arcs that connect every diagnostic-alternative-based state with these terminal states.

Howard [25] establishes that in terms of the policy decisions generated by a Markovian decision model, holding-time distributions are important only insofar as they affect the mean waiting time in each system state and the expected costs of each state occupancy. The records of the patient visits employed in model construction revealed that, in the care of the patients described by the data, one or more treatments were prescribed at each visit, and a series of return visits was scheduled for the patient following his initial interaction with the practitioner if return visits were warranted. Under these conditions, specifying holding-time distributions for the time between successive patient-state transitions does not refine the model. Therefore, the treatment-planning model employs a Markovian rather than semi-Markovian representation of the care system, since an 'n'-visit holding time in a particular patient state can be modeled with no loss of information as 'n' repetitions of the 'virtual' transition from the state in question to itself. Care for craniofacial-pain patients is modeled as a discrete-stage Markovian system with the beginning of visits to the practitioner serving as stage indicators.

Using the history-augmented patient states, transition probabilities are specified in terms of the treatment that generated the transformation. In making a state transition following a treatment, a patient must move to a state that includes that treatment as a portion of its state description. For example, following application of treatment 'k,' a patient must progress from patient-state 'I|m,n' to 'J|k,m,n' where 'I' may be equivalent to 'J.'
The only exception to this rule is in the application of a treatment beyond its boundary number of repetitions. Here, if treatment 'k' has a boundary number of two, then following an application of treatment 'k' three or more times a patient progresses from patient state 'I|k,k,m,n' to 'J|k,k,m,n' where again 'I' may be equivalent to 'J.' This structure is indicated because inclusion of more than the boundary number of applications (two in this case) in the state description does not affect the transition probabilities.

Estimates of the values of the transition probabilities were obtained from the patient records discussed previously. A discussion of the stability of these probability estimates under variations in patient data is presented in Appendix E. Where the data on the effects of treatment alternatives were limited, the data-generated probability estimates were refined by estimates from the reviewing practitioners. Notationally, transition probabilities are represented in the analytic model in the following form:

$p^{k}_{IJ}$ = the probability of making a transition from patient-state 'I' to patient-state 'J' following the application of treatment-alternative 'k.'

4.1.3 Cost Structure

A patient's progression through the craniofacial-pain system generates a multitude of implicit and explicit costs. The explicit costs can be measured in terms of the dollar charges paid by the patient or the practitioner during the patient's stay in the system. Other costs are implicit in nature and can be quantified only as they relate to the 'opportunities' lost by the patient and the practitioner while the patient remains in the care system. For modeling purposes four major system costs have been isolated. These costs are:

(a) Cost of treatment applications
(b) Cost of the practitioner and his staff's services
(c) Cost to the patient of occupying a non-well patient-state
(d) Patient-referral cost.
Although these costs do not encompass all of the system costs, they measure significant explicit and implicit charges associated with a patient's stay in this system. In the treatment-planning model, each of these costs is charged on a per-patient-visit basis.

Costs of the various treatment applications and the costs associated with the practitioner and his staff's services were estimated by the reviewing practitioners. Estimates of treatment and care-system service costs were partitioned by diagnostic classification as well as treatment category. The cost estimates reflect typical charges in a dental clinic environment.

The inconvenience experienced by a patient in making a visit to the practitioner was used as a measure of the cost of occupying a 'non-well' patient state. Estimates of this inconvenience cost were gathered from responses to a questionnaire completed by patients at the University of Florida's Dental Clinic. These were general dental patients not necessarily suffering from craniofacial pain. Figure 6 shows the distribution of these patient estimates.

Values for patient-referral costs were composed of the sum of three distinct estimates. The first component was an estimate of the total fee charged by the practitioner receiving the referred craniofacial-pain patient. Record transferral and duplication costs, as well as the fees lost by the referring practitioner, formed the second component. The third component of the patient-referral cost is a measure of the inconvenience experienced by the referred patient, a value estimated by using a multiple of the value of the inconvenience cost discussed in the preceding paragraph. Appendix G provides a justification for using this particular combination of components in the referred-cost estimates.
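The way these cost components combine can be sketched in a few lines of Python. This is a hedged illustration only: the class and function names are hypothetical, the dollar figures in the example are invented, and the size of the inconvenience multiple is not given in this section, so it is left as a parameter. The sketch returns cost magnitudes and leaves any sign convention to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferralCost:
    """Three-component patient-referral cost, as described in the text."""
    receiving_fee: float           # total fee charged by the receiving practitioner
    transfer_and_lost_fees: float  # record transferral/duplication plus fees lost by the referrer
    visit_inconvenience: float     # per-visit inconvenience cost to the patient
    inconvenience_multiple: float  # assumed multiple; its value is not stated in this section

    def total(self) -> float:
        # Sum of the three distinct estimates.
        return (self.receiving_fee
                + self.transfer_and_lost_fees
                + self.inconvenience_multiple * self.visit_inconvenience)


def visit_cost(treatment: float, service: float, inconvenience: float) -> float:
    """Per-visit charge for a visit spent in a non-well state: costs (a) + (b) + (c)."""
    return treatment + service + inconvenience


# Invented numbers, for illustration only.
example = ReferralCost(receiving_fee=100.0, transfer_and_lost_fees=25.0,
                       visit_inconvenience=30.0, inconvenience_multiple=2.0)
```

With these invented inputs, `example.total()` is 100 + 25 + 2 x 30 = 185, and `visit_cost(15.0, 20.0, 30.0)` is 65.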
Symbolically, the patient-state transition costs (negative constants) are represented in the analytical model as

$c^{k}_{IJ}$ = the sum of the costs generated by the transition from patient-state 'I' to patient-state 'J' following the application of treatment 'k.'

This sum includes the type (a), (b), (c), and (d) costs appropriate to each patient-state transition.

FIGURE 6
PATIENT-VISIT INCONVENIENCE COST
Fifty-eight patients at the University of Florida's Dental Clinic responded to the following question: How much would you estimate that this trip to the Dental Clinic cost you in terms of lost wages, baby-sitting fees, transportation costs, and other costs that you may have had to pay so that you could be here for your appointment?
[Histogram not reproduced in this text. Vertical axis: Number of Respondents. Horizontal axis: Dollars, in the intervals 0-.99, 1-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80-300.]
The mean value for these 58 estimates of patient-visit inconvenience costs was $30.72.

4.2 Selection of Optimal Treatments

The craniofacial-pain treatment-planning model is transient in the sense that only two of the model's patient states, well and referred, can represent the patient's status when he exits the health-care system. In a stochastic sense, only the terminal states are recurrent, as they alone possess non-zero long-run probabilities of state occupancy. Hence, the choice of treatment alternatives at each patient state is made with the goal of minimizing the costs accrued by the patient as he passes through the diagnostic-alternative-based patient states into one of the recurrent states.

For notational convenience, in the analytic model the well patient state is denoted as state 'W' and the referred state as state 'R.' In modeling the care system for craniofacial-pain patients there is no justification for providing costs for the transitions from states 'R' and 'W' to themselves; hence, $c_{RR}$ and $c_{WW}$ are set equal to zero.
Analytically, the treatment-planning model is made monodesmic, i.e., having only one recurring state, by defining $p_{RW} = 1$ and $p_{WR} = 0$. The total number of states, not including states 'W' and 'R,' is denoted by 'S.' With these definitions and the notation introduced in the previous section, a procedure for selecting the set of optimal treatment decisions is developed.

Howard [25] has shown that for a monodesmic, transient Markovian decision model, a set of optimal decisions is defined as those decisions that maximize the expected value $v_I$ of occupying each system-state 'I.' Since the treatment-planning model for craniofacial-pain patients fits into this category of decision model, a modification of Howard's algorithm is employed to select optimal treatment regimes. The process of selecting an optimal set of treatments is accomplished by finding the set of treatment alternatives $k_1, k_2, \ldots, k_S$ that maximize each of the $v_I^{k_I}$ (the expected value of occupying patient-state 'I' given treatment alternative '$k_I$') where

$$v_I^{k_I} = r_I^{k_I} + \sum_{\text{all patient states } J} p_{IJ}^{k_I}\, v_J^{k_J}, \qquad I = 1, 2, \ldots, S$$

and

$$r_I^{k_I} = \sum_{\text{all patient states } J} p_{IJ}^{k_I}\, c_{IJ}^{k_I}.$$

With treatment-augmented patient states, maximizing the $v_I^{k_I}$ can be carried out in the following manner:

1. Group for simultaneous analysis all patient states possessing a common treatment history, where one or more of the treatments in this history are at their boundary level. Each of the 'T' sets of states complying with this description forms an analysis set $B_j$, $j = 1, 2, \ldots, T$.

2. Label sequentially the patient states, starting with state W as 1, state R as 2, and then selecting numbers for the remaining unlabeled patient states on the basis that the one with the most treatments in its history receives the next number-label. For example, state 'J|1,2,2,4' would be labeled with a smaller number than state 'J|2,6,6.'
When the numbering scheme reaches the members of one of the analysis sets isolated in Step 1 (above), numbers for the members of that set may be arbitrarily assigned.

Given this state numbering scheme, the selection of optimal treatments can proceed dynamically since for each state I that is not a member of an analysis set, $I = 1, 2, \ldots, S$, $I \notin B_j$, $j = 1, 2, \ldots, T$,

$$v_I = r_I + \sum_{J=1}^{I-1} p_{IJ}\, v_J$$

and for the states of set $B_j$, $j = 1, 2, \ldots, T$,

$$v_I = r_I + \sum_{J=1}^{t} p_{IJ}\, v_J + \sum_{J \in B_j} p_{IJ}\, v_J, \qquad I \in B_j$$

where t = the number of the last non-group-$B_j$ state immediately preceding the smallest number-labeled state in $B_j$.

Thus, the process of selecting optimal treatments proceeds recursively from the state of smallest number-label to the one of largest number-label, stopping to consider simultaneously the values of a number of states only when an analysis set is encountered. Howard's value iteration and policy improvement algorithm [25] is employed only in the case of selecting treatments for the analysis-set patient states. An example of this section's labeling and optimization procedure is presented in Appendix H.

This optimization procedure was applied to the states of the craniofacial-pain treatment-planning model. Appendix G presents a list of the optimal treatment selections for each of the model's patient states.

4.3 Model Validation

Validation of the craniofacial-pain treatment-planning model was accomplished in two phases. In the first phase of validation, the individual components of the Markovian representation were examined by the reviewing practitioners. The second phase of model validation compared model-generated treatment decisions with those made by the reviewing experts. In addition, statistics generated by the model were compared to the care-system description provided by the patient records from the university dental clinics. This section discusses the results of these validating efforts.

The review of model components was accomplished as values for the model parameters were collected.
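As a rough illustration of the recursive evaluation described in Section 4.2, the following Python sketch handles only the states that do not belong to an analysis set; the simultaneous treatment of analysis-set states by Howard's algorithm is not shown. The names `select_treatments`, `Transition`, and `Options`, the pseudo-treatment code `REFER` used to represent referral as a decision alternative, and every number in the example are assumptions made for illustration and are not taken from the dissertation's data.

```python
from typing import Dict, List, Tuple

# (next-state label J, transition probability, transition cost as a negative constant)
Transition = Tuple[int, float, float]
# treatment code k -> transitions out of one state under that treatment
Options = Dict[int, List[Transition]]


def select_treatments(options_by_state: Dict[int, Options],
                      terminal_values: Dict[int, float]
                      ) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Evaluate states in increasing label order and keep the best option.

    Assumes every transition leads to a state with a smaller label,
    so the value of each successor is already known (a KeyError is
    raised otherwise).
    """
    values = dict(terminal_values)
    policy: Dict[int, int] = {}
    for state in sorted(options_by_state):
        best_k, best_v = None, float("-inf")
        for k, transitions in options_by_state[state].items():
            # Expected immediate cost plus expected value of the successor.
            v = sum(p * (c + values[j]) for j, p, c in transitions)
            if v > best_v:
                best_k, best_v = k, v
        values[state] = best_v
        policy[state] = best_k
    return values, policy


W, R = 1, 2   # labels for the well and referred states
REFER = 0     # hypothetical code for "refer the patient"

# Invented two-state example: state 3 has the longer treatment history.
example = {
    3: {2: [(W, 1.0, -90.0)], REFER: [(R, 1.0, -150.0)]},
    4: {1: [(W, 0.4, -40.0), (3, 0.6, -40.0)], REFER: [(R, 1.0, -150.0)]},
}
values, policy = select_treatments(example, {W: 0.0, R: 0.0})
```

In this invented example treatment 2 is selected at state 3 (value -90) and treatment 1 at state 4 (value 0.4 x (-40) + 0.6 x (-40 - 90) = -94), since both exceed the value of -150 for referral.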
Some of the data-based estimates of transition probabilities and boundary-level application numbers did not conform to expert judgment about the effects and effectiveness of various treatment applications. When these disparities occurred, the estimates were modified to reflect expert judgment.

The general structure of the patient states was reviewed to ensure that the representation shown in Appendix F did in fact portray a set of logical progressions through the care system. Although this examination established the validity of the patient progressions, the review did point out one deficiency in the model's structure. The number and types of treatment alternatives available for use at each patient state were determined by records of actual applications of these treatments in the data used for model construction. It was the judgment of the reviewing practitioners that in several cases the selection of treatment alternatives for a patient state did not include the 'most appropriate' treatment alternative. Nevertheless, this model deficiency can readily be corrected. With the collection of data on the effects of these 'most appropriate' treatments, these additional treatment alternatives can be incorporated as decision alternatives for the patient states in question.

The reviewing practitioners made selections of treatments for each of the model's patient states. In those cases where the model's treatment alternatives did not include the practitioners' 'most appropriate' choice of treatments, the practitioners made a selection from the same list of alternatives used by the model. Appendix G lists their choices of treatment along with each model-generated selection. The two sets of treatment plans include the same treatment selection for 87 out of 94 patient states, or 92.6% of the patient states.
The 7 differences in treatment selections arise in part from the approximations the treatment-planning model employs in its representation of the care system and in part from slight inconsistencies in the practitioners' treatment selections.

One last test was performed to verify the suitability of the Markovian representation of the craniofacial-pain care system. Mean transit times through the care system to one of the terminal states were calculated using the model-generated treatment decisions and each of six first-visit patient states. These model-generated transit times were compared to estimates of the same statistics gathered from the patient records contributed by the university dental clinics. Table 5 presents the values of both sets of statistics. The close correlation of these values reveals that the treatment-planning model not only duplicates the decisions of experts, but also provides a structure for gathering other relevant information about the underlying care system.

4.4 Model Applications

Like the diagnostic-classification model presented in Chapter 3, the craniofacial-pain treatment-planning model has been structured to permit its utilization in a variety of applications. Markovian modeling provides an analytic representation of the craniofacial-pain care system as well as establishing a means of making treatment selections. This section discusses applications of the model's analytic representation and treatment selections in teaching, in research, and in practice.

The model-generated treatment decisions reveal which treatments are most frequently used in the care of craniofacial-pain patients.
In a teaching environment, this information can be used to specify treatment-application techniques that should be emphasized in training dental students in craniofacial-pain care. Moreover, the parameters employed in model development, in particular the transition probabilities and referral costs, are themselves valuable instructional materials in developing the dental student's treatment-selection skills.

TABLE 5
MEAN TRANSIT TIMES THROUGH THE CRANIOFACIAL-PAIN CARE SYSTEM

For a Patient Whose First                      Model-       Truncated    Patient-
Diagnostic Classification Was                  Generated    Model        Record
                                               Estimate*    Estimate+    Estimate‡
Myopathy-Myositis                              1.50         1.34         1.35
Oral Pathology-Dental Pathology                1.11         1.04         1.08
Vascular Changes-
  Migrainous Vascular Changes                  3.89         3.42         3.06
Myofacial Pain Dysfunction-
  Uneven Centric Stops                         1.86         1.43         1.50
Myofacial Pain Dysfunction-
  Anxiety/Depression                           3.87         3.47         3.18
Myofacial Pain Dysfunction-
  Reflex Protective Muscular Contracture       1.90         1.79         1.87

* The values in these sets of estimates are specified in terms of the number of patient visits in which the patient occupies a non-well or non-referred patient state. Note: The treatment-planning model considers the possibility of 'infinite duration' occupancy of non-well or non-referred states.
+ These truncated estimates were generated from the treatment-planning model on the conditional basis that a patient must transit into either the well or the referred state by his fifth patient visit.
‡ The maximum number of visits for any patient described by the clinical data was five patient visits.

The treatment-planning model provides a method for evaluating new developments in treatment for craniofacial-pain patients. With estimates of the effectiveness of his new treatment, the researcher can use the craniofacial-pain treatment-planning model to get two immediate responses.
First, the optimization technique of Section 4.2 will determine if this new treatment provides 'better care' for the patient than any of the other treatment alternatives the model has to choose from. Second, if optimal treatment selections for the model include the new treatment, the model's statistics will show improvement in length of stay, and other relevant measures of treatment effectiveness, introduced by using this new treatment.

In the office of the practicing dentist, the treatment-planning model's decisions could provide a concise reference of the treatment selections suggested by experts in the field of craniofacial pain. Moreover, the practitioner would have a chance to contribute to the refinement of the listing, as the treatment records of his patients could supplement the data used in model construction. In addition, the practitioner could employ the statistics associated with the treatment-planning model in scheduling the length, and number, of his appointments for craniofacial-pain patients.

CHAPTER 5
CONCLUSIONS AND FUTURE RESEARCH

This dissertation has presented analytic models of the decision processes associated with diagnosing and selecting treatments for a particular health-care problem. The selection, construction, and testing of these models have been discussed in some detail. Meanwhile, the model-building effort itself has been the source of a number of insights into decision-making in a health-care environment. These insights will be reflected in this chapter's discussion of the dissertation's central research conclusion and suggestions of topics for future investigation.

The similarity between the decision-making processes employed by the practitioner and the analytic structure of this dissertation's models is quite revealing. In both diagnosis and treatment planning for craniofacial-pain patients it appears that the practitioner, like the analytic models, makes 'first-order' decisions.
The linearity of symptom significance (a first-order polynomial of symptom weights), and the present-patient-state dependency of transition probabilities measuring treatment effectiveness (a first-order stochastic dependence) provide a means of generating decisions that closely approximate the decisions made by dental practitioners. This general conclusion on the applicability of first-order decision techniques to craniofacial-pain diagnostic classification and treatment planning characterizes the central development of this dissertation.

Given this summary statement, there are several logical extensions to this dissertation's research that should be examined in future investigations. The following suggestions identify some of the more fruitful areas for further research efforts. These suggestions are ordered in the author's view of their significance.

1. This dissertation's research found that first-order decision-making models are valid descriptions of the underlying thought processes employed by the craniofacial-pain practitioner. It is possible that these first-order descriptive decisions are 'suboptimal' and that higher order decision-making tools might yield prescriptive, or 'optimal,' diagnostic classifications and treatment plans for craniofacial-pain patients. That is, considering the interaction between significant symptoms and multiple-state dependency for patient-state transitions may lead to optimal diagnostic and treatment-selection decisions. As the models themselves can readily be increased in their decision-making 'order,' an investigation into this possibility would be hampered only by the necessity of collecting an elaborate data base. Nevertheless, such an investigation should be undertaken in this, the most significant, of future research areas.

2.
As this dissertation's analytic models can be applied directly to any health-care problem where there is verification that practitioners make first-order decisions, one potential avenue of future research would be to isolate those health-care environments where these kinds of decisions are made. However, a word of caution is interjected at this point. Mathematical modeling demands an underlying structure for the process being modeled. Yet, in a process dealing with a product that is subject to considerable variation, such as the care of a patient in a health-care system, isolating an underlying process structure is difficult. Moreover, the problem of finding process structure is compounded in the health-care field by a lack of unifying and consistent nomenclature. In the health-care field, scholarly literature and historical precedent can serve as the justification for two or more contradicting sets of terminology for the same anatomical structure or physiological process. Thus, in researching the generality of first-order decision-making techniques, the investigator must consider process variability and nomenclature inconsistency before he makes any statement about the applicability of this dissertation's decision-making tools to other health-care environments.

3. A non-geometric discussion of the criteria for pattern space separability was presented to provide a means of characterizing health-care disorders for which diagnostic classification by a linear pattern classifier might be feasible. Unfortunately, this dissertation's techniques are heuristic and do not provide an exact reproduction of the underlying mathematical specifications. Future research in this area could lead to a precise statement of non-geometric criteria for linear separability, and thus provide an indirect means for evaluating potential applications of linear non-parametric classifiers.

4.
This dissertation's minimum-cost symptom-selection algorithm represents a clear departure from previous research in feature selection. The algorithm's utilization of the convex-hull representation of pattern space separability makes this development unique in the literature of feature selection. However, the algorithm's method of checking the feasibility of potential feature collections is extremely tedious. A more efficient method to check feature-collection feasibility may be revealed through future investigations in this area.

5. From a mathematical-programming point of view, the symptom-selection algorithm represents one of a limited number of techniques capable of solving a problem with non-linear constraints. The algorithm seeks an optimal assignment of components, where the feasibility of any assignment is determined by the existence of a set of discriminating component multipliers. In this more general context, the structure of the algorithm may be applicable in a variety of problem areas not directly related to the feature-selection problem. The possibility of employing the algorithm in this general setting should be investigated.

6. In modeling the treatment-planning process for craniofacial-pain patients the concept of boundary-level treatment applications was introduced. Boundary numbers on the effects of repeated treatment applications are likely to occur in data derived from the care of patients with a variety of physiological disorders. Further investigations of this phenomenon may result in more effective methods of predicting which treatments will have boundary-level application numbers, and more efficient statistical techniques to determine values for these numbers.

7. The training algorithm developed in the construction of the craniofacial-pain diagnostic classifier generates a feasible integer solution to a large number of linear constraints. This algorithm is both efficient and easily coded for computer applications.
An investigation of the uses of this algorithm in a mathematical-programming setting may reveal applications in solution techniques for more general integer programs.

8. Potential applications have been suggested for the diagnostic-classification and treatment-planning models in teaching, in research, and in practice. The models and their applications have been presented so that they might readily be employed by some future investigator. Actual applications of the models should yield significant contributions to the effectiveness of the teacher, researcher, and practitioner.

APPENDIX A
CRANIOFACIAL-PAIN PATIENT DATA VECTOR

Referral Through: 001 Medical GP; 002 Medical Specialist; 003 Dental GP; 004 Dental Specialist
Sex: 005 Male; 006 Female; 007 Female, menopausal or post menopausal
Age Group: 008 0 - 19; 009 20 - 39; 010 40 - 55; 011 56 - up
Duration of Pain: 012 Less than 3 weeks; 013 From 3 to 6 weeks; 014 More than 6 weeks; 015 Episodic
Character of Pain: 016 Aching; 017 Burning; 018 Cutting; 019 Discomfort; 020 Dull; 021 Pressure; 022 Pricking; 023 Sharp; 024 Soreness; 025 Stinging; 026 Tenderness; 027 Throbbing
Change in Character of Pain: 028 Constantly getting worse; 029 Got worse, then plateaued; 030 Got worse, plateaued, then better; 031 Getting better; 032 Intermittent periods without pain; 033 No change since beginning
List of Drugs Taken: 034 Mild Analgesics: Aspirin, APC, etc.; 035 Moderate Analgesics (non-narcotic); 036 Strong Analgesics: Narcotics and Synthetic Narcotics; 037 Anti-anxiety Agents: Mellaril, etc.; 038 Anti-arthritic Agents: Steroids, etc.; 039 Anti-depressives: Tofranil, etc.; 040 Birth Control Pills; 041 Hormone Preparations; 042 Anti-inflammatory Agents; 043 Muscle Relaxants: Valium; 044 Muscle Relaxants: Meprobamate; 045 Muscle Relaxants: Others; 046 Sedatives: Barbiturates, etc.; 047 Other Drugs
History of Trauma: 048 Accidental; 049 Factitial; 050 Surgical
Location of Swelling, Location of Tenderness, Location of Pain (Left Side and Right Side): [facial-region diagrams with numbered regions; region codes not recoverable from this text]
Limited Jaw Opening: 243 Yes
Joint Sounds: 244 Clicking; 245 Crepitation; 246 Pain accompanying joint sound
Headaches: 247 Frequent headaches; 248 Headache associated with joint pain
Changes in: 249 Taste; 250 Hearing; 251 Visual acuity; 252 Perception of light touch on face
Upper Respiratory Infection: 253 In conjunction with beginning of TMJ pain
Evidence of: 254 Arthritis; 255 [illegible]'s Syndrome; 256 Neuropathy; 257 Otitis; 258 Salivary gland disease; 259 Sinusitis; 260 Strokes; 261 Vascular disease
Facets: 262 1 - 3; 263 4 - up
Lateral Slide Prematurities: 264 On working side; 265 On balancing side
Tooth Ache: 266 Yes
Biting Stress Tooth Mobility: 267 Yes
Recent Restorative or Dental Prosthesis: 268 Yes
Jaw Deviates on Opening: 269 Left; 270 Right
Impingement of Coronoid Process on Zygomatic Arch: 271 Left; 272 Right
Meniscus-Condyle Dyscoordination: 273 Left; 274 Right
Radiographic Examination: 275 Mandibular condyle apposition (such as spur formation); 276 Mandibular condyle resorption (such as flattening of anterior-superior surface or irregular surface); 277 Fossa apposition; 278 Fossa resorption; 279 Articular eminence apposition; 280 Articular eminence resorption; 281 Evidence of fracture; 282 Clinical or radiographic evidence of pathoses
Emotional Trauma: 283 Anxiety; 284 Depression
Bruxism or Clenching: 285 Yes
Uneven Centric Stops: 286 Yes
History of Lengthy Dental Procedures: 287 Yes
History of General Anesthesia: 288 Yes
Tinnitus: 289 Yes
Extraction of Teeth: 290 Less than 6 weeks prior to TMJ pain; 291 Leaving a space that permits extrusion
Preauricular Pain: 292 Yes
Alteration of Inter-Occlusal or Inter-Arch Space: 293 Yes
Paresthesia: 294 Yes
Luxation or Subluxation: 295 Yes

APPENDIX B
MODIFIED FIXED-INCREMENT TRAINING ALGORITHM

In
presenting the modified fixed-increment training algorithm, the following notation is employed:

$p$ = the number of classification categories
$t$ = the number of training-sample row vectors
$a_j^{(k)}$ = training-sample row vector number 'k', preclassified in category 'j', $j = 1,2,\ldots,p$, $k = 1,2,\ldots,t$, and $k = i\,[\mathrm{mod}\ t]$, where 'i' is the index of the training-algorithm iteration
$W_j^{(i)}$ = the 'j-th' column of weights (the constraints in the 'j-th' discriminant function) used in the 'i-th' iteration of the training algorithm, $j = 1,2,\ldots,p$
$\alpha$ = non-negative constant specified by the analyst to adjust the size of the 'dead zone' [23] in discriminant-function values, i.e., $\alpha \ge 0$
$\beta$ = positive constant specified by the analyst to adjust the scale of the weight vectors, i.e., $\beta > 0$.

Using this notation, let $a_j^{(k)}$ be the i-th pattern examined by the algorithm; then

case 1: if $a_j^{(k)} W_j^{(i)} > a_j^{(k)} W_c^{(i)} + \alpha$ for all $c \ne j$,
  let $W_c^{(i+1)} = W_c^{(i)}$ for all $c$.

case 2: if $a_j^{(k)} W_j^{(i)} \le a_j^{(k)} W_z^{(i)} + \alpha$ for a subset $B$ of the $p$ discriminants, $z \in B$, $j \notin B$,
  let $W_z^{(i+1)} = W_z^{(i)} - \beta\,[a_j^{(k)}]^T$ for $z \in B$,
  $W_c^{(i+1)} = W_c^{(i)}$ for all $c \notin \{B \cup j\}$,
  and $W_j^{(i+1)} = W_j^{(i)} + n_B\,\beta\,[a_j^{(k)}]^T$,
  where $n_B$ = the number of discriminants in the subset $B$.

The algorithm is terminated when the values of the $W_j$, $j = 1,2,\ldots,p$, have not changed during a complete cycle of the $t$ training patterns, i.e., when $W_j^{(\theta+1)} = W_j^{(\theta+2)} = \cdots = W_j$ for all $j$, where $\theta$ is the last case 2 pattern examined by the algorithm.

This algorithm is guaranteed to terminate in a set of feasible $W_j$, $j = 1,2,\ldots,p$, if the training sample is linearly separable and $\alpha$ and $\beta$ have been appropriately selected. If the training sample is linearly separable, the algorithm will converge for any fixed value of $\alpha \ge 0$, where $\beta$ is selected appropriately large. Hence, the algorithm is normally applied to a training sample with $\alpha = 0$ and $\beta = 1$. If the algorithm converges, these constants can be adjusted and the training algorithm reapplied.
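The two update cases above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's program: the function name `train_fixed_increment`, the 0-based category labels, and the `max_cycles` cutoff are introduced here, and the zero starting weights follow the worked example of Figure 7.

```python
def train_fixed_increment(samples, labels, p, alpha=0, beta=1, max_cycles=100):
    """Modified fixed-increment training (sketch).

    samples    : list of t pattern vectors (each already carries its constant 1)
    labels     : category index (0-based, an assumption) for each pattern
    p          : number of classification categories
    alpha      : size of the dead zone, alpha >= 0
    beta       : scale of the weight corrections, beta > 0
    max_cycles : hypothetical cutoff on full passes through the sample
    Returns (weights, converged).
    """
    t = len(samples)
    n = len(samples[0])
    weights = [[0] * n for _ in range(p)]  # zero start, as in Figure 7
    unchanged = 0  # consecutive patterns that fell under case 1
    i = 0
    while unchanged < t and i < max_cycles * t:
        a = samples[i % t]
        j = labels[i % t]
        scores = [sum(x * w for x, w in zip(a, weights[c])) for c in range(p)]
        # Subset B: discriminants that tie or beat the correct one within the dead zone.
        B = [z for z in range(p) if z != j and scores[j] <= scores[z] + alpha]
        if B:  # case 2: penalize each member of B, reward category j n_B times
            for z in B:
                weights[z] = [w - beta * x for w, x in zip(weights[z], a)]
            weights[j] = [w + len(B) * beta * x for w, x in zip(weights[j], a)]
            unchanged = 0
        else:  # case 1: every weight vector is carried forward unchanged
            unchanged += 1
        i += 1
    return weights, unchanged >= t


# The three-pattern training sample of Figure 7, with alpha = 0 and beta = 1.
figure7_weights, figure7_converged = train_fixed_increment(
    [[0, 0, 1], [1, 0, 1], [0, 1, 1]], [0, 1, 2], p=3
)
```

The run stops once a full cycle of `t` patterns passes without a case 2 correction, which mirrors the termination rule stated above; the `max_cycles` limit only guards against samples that are not linearly separable.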
The justification for specifying a non-zero $\alpha$ ($\alpha$ = size of the dead zone) is that as $\alpha$ is increased, the classifier becomes more accurate in classifying data not used in developing the discriminant-function weights. For example, with the craniofacial-pain diagnostic classifier and the test samples discussed in Section 3.3, the diagnostic model correctly classified approximately 5% more of the test samples' data vectors when the model was trained with $\alpha = 30$, $\beta = 3$ (versus an original training with $\alpha = 0$, $\beta = 1$).

Proof that the algorithm converges if feasible weight vectors $W_j^*$, $j = 1,2,\ldots,p$, exist (that is, the sample space is linearly separable) is developed in Nilsson [22]. Nilsson's proof can be directly applied since for any set of feasible $W_j^*$

$a_j^{(k)} W_j^* > a_j^{(k)} W_z^* + \alpha$

for all $k = 1,2,\ldots,t$, and $z = 1,2,\ldots,p$, $z \ne j$, while for any $W_j^{(i)}$, $j = 1,2,\ldots,p$, $i = 1,2,\ldots$,

$a_j^{(k)} W_j^{(i)} \le a_j^{(k)} W_z^{(i)} + \alpha$

for some $k$ and some $z$.

Typically, a training algorithm is applied to the members of a training sample without prior knowledge of whether the sample pattern space is linearly separable. The algorithm is allowed to process sample patterns until it either converges on a set of discriminating hyperplanes or it has run for a 'reasonable' amount of time without termination. Experience with medical data and the modified fixed-increment algorithm has shown that if there is a set of discriminating hyperplanes, the algorithm will find it in no more than 3 complete cycles for each of the pattern classes. For example, if there are 5 pattern classes and the pattern space can be linearly partitioned, the algorithm should terminate in no more than 15 full cycles through the training data. This rough measure of training time provides an index for establishing a limit on computer processing time.

An application of the modified fixed-increment training algorithm is presented in Figure 7.
Given the training sample of the form $a_i = [a_{i1}, a_{i2}, 1]$ where

$a_1^{(1)} = [0,0,1]$
$a_2^{(2)} = [1,0,1]$
$a_3^{(3)} = [0,1,1]$

the training sample patterns can be represented in 3-dimensional space by

[Plot of the three training patterns; not recoverable from the scan.]

The modified fixed-increment algorithm with $\alpha = 0$ and $\beta = 1$ proceeds as follows (* indicates correct sample classification):

Sample       W1            W2            W3            aW1   aW2   aW3
 [0,0,1]     [ 0, 0, 0]    [ 0, 0, 0]    [ 0, 0, 0]      0     0     0
 [1,0,1]     [ 0, 0, 2]    [ 0, 0,-1]    [ 0, 0,-1]      2    -1    -1
 [0,1,1]     [-1, 0, 1]    [ 2, 0, 1]    [-1, 0,-2]      1     1    -2
 [0,0,1]     [-1,-1, 0]    [ 2,-1, 0]    [-1, 2, 0]      0     0     0
 [1,0,1]     [-1,-1, 2]    [ 2,-1,-1]    [-1, 2,-1]      1     1    -2
*[0,1,1]     [-2,-1, 1]    [ 3,-1, 0]    [-1, 2,-1]      0    -1     1
*[0,0,1]     [-2,-1, 1]    [ 3,-1, 0]    [-1, 2,-1]      1     0    -1
*[1,0,1]     [-2,-1, 1]    [ 3,-1, 0]    [-1, 2,-1]     -1     3    -2

Hence, the set of weights generated by this training sample is

W1 = [-2,-1, 1]
W2 = [ 3,-1, 0]
W3 = [-1, 2,-1].

FIGURE 7
APPLICATION OF THE MODIFIED FIXED-INCREMENT ALGORITHM
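Once weights are available, classification reduces to a maximum detector over the discriminant values. The sketch below applies the final weights of Figure 7 to its three training patterns. The function name `classify`, the 0-based category indices, and the use of `None` to signal a non-unique maximum are assumptions made for this illustration.

```python
def classify(pattern, weights):
    """Return the index of the largest linear discriminant, or None on a tie."""
    scores = [sum(x * w for x, w in zip(pattern, wj)) for wj in weights]
    best = max(scores)
    winners = [j for j, s in enumerate(scores) if s == best]
    # A non-unique maximum means the pattern cannot yet be classified.
    return winners[0] if len(winners) == 1 else None


# Final weight vectors W1, W2, W3 from Figure 7.
final_weights = [[-2, -1, 1], [3, -1, 0], [-1, 2, -1]]
assigned = [classify(a, final_weights) for a in ([0, 0, 1], [1, 0, 1], [0, 1, 1])]
```

Each training pattern is assigned to its own category, which agrees with the three starred rows at the bottom of Figure 7.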

Full Text |

stated is that a knowledge of the underlying classifying process can be employed in constructing the data vector examined by the classifier, and that fully utilizing this information will lead to a classifier that can be expected to perform well on new patient data.

Of course, this discussion has been predicated on the separability of the underlying pattern space of data vectors. If this requirement is not met by some form of patient-data-vector representation, classification of patients by linear classifier is not possible. The next section of this chapter provides relationships between linear separability and the data that may be observed in a health-care system for which diagnostic classification by linear discriminants is being considered. This section has a dual purpose. First, linear separability is couched in 'non-geometric' terms. Second, and more importantly, using the craniofacial-pain health-care system as an example of the section's developments provides information about the suitability of the non-parametric classifier as a model of the decision-making process associated with diagnostic classification in this care system.

3.2 Alternative Interpretations of Linear Separability

The criteria for pattern space separability are mathematically concise. Unfortunately, these separability criteria are not readily expressible in non-geometric terms. The discussion developed in this section provides the reader with some non-geometric criteria that indicate when the use of a non-parametric pattern classifier should be considered as a means of generating diagnoses for a medical or dental disorder.

The first criterion is associated with a probabilistic measure of symptom exhibition.
Given a patient who exhibits some set of symptoms

APPENDIX B
MODIFIED FIXED-INCREMENT TRAINING ALGORITHM

In presenting the modified fixed-increment training algorithm, the following notation is employed:

$p$ = the number of classification categories
$t$ = the number of training-sample row vectors
$a_j^{(k)}$ = training-sample row vector number 'k', preclassified in category 'j', $j = 1,2,\ldots,p$, $k = 1,2,\ldots,t$, and $k = i\,[\mathrm{mod}\ t]$, where 'i' is the index of the training-algorithm iteration
$W_j^{(i)}$ = the 'j-th' column of weights (the constraints in the 'j-th' discriminant function) used in the 'i-th' iteration of the training algorithm, $j = 1,2,\ldots,p$
$\alpha$ = non-negative constant specified by the analyst to adjust the size of the 'dead zone' [23] in discriminant-function values, i.e., $\alpha \ge 0$
$\beta$ = positive constant specified by the analyst to adjust the scale of the weight vectors, i.e., $\beta > 0$.

Using this notation, let $a_j^{(k)}$ be the i-th pattern examined by the algorithm; then

case 1: if $a_j^{(k)} W_j^{(i)} > a_j^{(k)} W_c^{(i)} + \alpha$ for all $c \ne j$,
  let $W_c^{(i+1)} = W_c^{(i)}$ for all $c$.

FIGURE 2
DIAGNOSTIC-CLASSIFICATION AND TREATMENT-PLANNING PROCESS FOR CRANIOFACIAL PAIN

Dr. Daniel Laskin, University of Illinois; and Dr. David Mitchell, University of Indiana, for providing access to the patient records employed in this modeling effort.

The author would like to express his thanks to the secretarial staff of the Health Systems Research Division for their translation of the author's 'first-order' approximation to handwriting into a draft of this manuscript. Their tolerance of a multitude of last-minute changes made by the author has been appreciated.

Finally, the author thanks his wife, Mary, and his parents, Dorothy and Charles Leonard, for their encouragement and support throughout the course of this research.

M.S.L.
August, 1973

Format I
Patients whose first-visit diagnostic classification is Diagnostic Alternative 1, 2, 3, 4, 5, 6, 10, 11, 14, 16, or 17 make transitions out of their original classification 'I' according to the following figure:

[Transition diagram not recoverable from the scan.]

Format II
For patients originally classified in Diagnostic Alternative 7, 8, 9, 12, 13, or 15, the following kinds of diagnostic-classification transitions are possible:

[Transition diagram not recoverable from the scan.]

Format 1:
Referral Cost = [record transferral cost] + [practitioner's lost fee] + 2*[inconvenience cost associated with a dental visit]

Format 2:
Referral Cost = [fee paid to referral care system] + 2*[inconvenience cost associated with a dental visit]

Format 3:
Referral Cost = [fee paid to referral care system] + [practitioner's lost fee] + [record transferral cost] + 2*[inconvenience cost associated with a dental visit]

where in all three formats the multiple of the inconvenience cost was suggested by the fact that in the clinical records (Section 3.1) the median number of visits to the referral care system was two visits. The treatment-planning model was optimized with referral costs based on each of these formats. Use of the Format 3 referral costs led to model-generated treatment selections that most closely duplicated the selections of the reviewing practitioners. Hence, this format for patient-referral costs has been selected for utilization in the treatment-planning model.
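The three referral-cost formats can be written out as a small Python helper. This is a sketch under stated assumptions: the function name `referral_cost` and its parameter names are hypothetical, and the record-transferral cost, lost fee, and referral fee used in the example are invented placeholder amounts. Only the $30.72 inconvenience cost comes from the source (the mean reported with Figure 6).

```python
def referral_cost(fmt, inconvenience, record_transfer=0.0, lost_fee=0.0, referral_fee=0.0):
    """Referral cost under Format 1, 2, or 3 (all amounts in dollars)."""
    # Two visits to the referral care system, the median in the clinical records.
    visits = 2 * inconvenience
    if fmt == 1:
        return record_transfer + lost_fee + visits
    if fmt == 2:
        return referral_fee + visits
    if fmt == 3:
        return referral_fee + lost_fee + record_transfer + visits
    raise ValueError("fmt must be 1, 2, or 3")


# Placeholder amounts (hypothetical) combined with the reported mean inconvenience cost.
example_costs = {
    fmt: referral_cost(fmt, 30.72, record_transfer=5.0, lost_fee=20.0, referral_fee=40.0)
    for fmt in (1, 2, 3)
}
```

Format 3 is the sum of every cost component, so for any non-negative inputs it is at least as large as either of the other two formats.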
APPENDIX D
TREATMENT ALTERNATIVES FOR CRANIOFACIAL-PAIN PATIENTS

Treatment Application Number    Treatments
  11    Chill Therapy
  12    Drug Therapy
  13    Fixation
  14    Heat Therapy
  15    Occlusal Adjustment
  16    Physical Therapy
  17    Prosthetics
  18    Tooth Extraction or Endodontics
  21    Drug Therapy and Fixation
  22    Drug Therapy and Heat Therapy
  23    Drug Therapy and Occlusal Adjustment
  24    Drug Therapy and Physical Therapy
  25    Drug Therapy and Prosthetics
  26    Heat Therapy and Physical Therapy
  27    Occlusal Adjustment and Physical Therapy
  28    Physical Therapy and Prosthetics
  31    Chill Therapy, Drug Therapy, and Physical Therapy
  32    Drug Therapy, Fixation, and Heat Therapy
  33    Drug Therapy, Fixation, and Physical Therapy

treatment 24 is the boundary repetition of that treatment. Hence, multiple repetitions of treatment 24 are not added to the state description of patient states based on Diagnostic Alternative 13, as the additional information on multiple applications does not influence transition probabilities associated with this treatment's effectiveness. Thus, a second application of treatment 24 for a patient who continues to be classified in Diagnostic Alternative 13 places the patient in a state of the form

[State symbol not recoverable from the scan.]

The craniofacial-pain treatment-planning model includes two terminal patient states in addition to the patient states that are based on diagnostic alternatives. One or the other of these two terminal states, 'well' or 'referred,' represents the patient's status when he exits the care system. A patient exits the system in the 'well' state when the effects of treatment applications result in sufficient improvement so that no further treatment is required. The patient moves into the 'referred' state in lieu of further treatment. This alternative to treatment is selected when the 'expected costs' of remaining in the care system exceed the costs of referring the patient to another source of care (see Section 4.1.3).
4.1.2 Transition Probabilities

Patient-state transitions that involve a change of diagnostic classification follow one of two basic formats; see Figure 5. For the initial diagnostic classifications in Format I, with each treatment application, the patient either remains in his original diagnostic classification or he transits into the well state. For Format II, the six diagnostic alternatives shown in the lower illustration form a different structure.

Changes in
  249 Taste
  250 Hearing
  251 Visual acuity
  252 Perception of light touch on face
Upper Respiratory Infection
  253 In conjunction with beginning of TMJ pain
Evidence of
  254 Arthritis
  255 Every's Syndrome
  256 Neuropathy
  257 Otitis
  258 Salivary gland disease
  259 Sinusitis
  260 Strokes
  261 Vascular disease
Facets
  262 1 - 3
  263 4 - up
Lateral Slide Prematurities
  264 On working side
  265 On balancing side
Tooth Ache
  266 Yes
Biting Stress Tooth Mobility
  267 Yes
Recent Restorative or Dental Prosthesis
  268 Yes
Jaw Deviates on Opening
  269 Left
  270 Right
Impingement of Coronoid Process on Zygomatic Arch
  271 Left
  272 Right
Meniscus-Condyle Dyscoordination
  273 Left
  274 Right
Radiographic Examination
  275 Mandibular condyle apposition (such as spur formation)
  276 Mandibular condyle resorption (such as flattening of anterior-superior surface or irregular surface)

[Worked value-determination equations for the numbered patient states, beginning $v_1 = 0$, $v_2 = 0$; the remaining expressions are not recoverable from the scan.]

classification is made on the basis of specifying that etiological factor that requires most immediate action on the part of the attending practitioner. Thus, diagnostic classification of a patient into diagnostic alternative 'A' signals that the etiology specified by that alternative should determine the course of the patient's care.
The next step in model development isolated relevant data which measured the physiological and psychological status of craniofacial-pain patients. In particular, this step of model development sought those elements of patient status that practitioners employ in their own classification of craniofacial-pain patients. Appendix A presents a list of these data elements. Wherever it was feasible, measures of patient status were segmented to amplify the significance of particular readings of each measure. Thus, for example, while the duration of a patient's pain is a continuous measure of his status, it is important for the purposes of classification to know whether a craniofacial-pain patient's duration of pain is less than 3 weeks, from 3 to 6 weeks, or longer than 6 weeks. For this measure of patient status, a short history of pain indicates a strong possibility of a recent traumatic injury, while pain over a long period is more likely associated with long-standing arthritic or psychic disorders.

To facilitate the development of an analytic model of the diagnostic-classification process, a vector representation of the relevant elements of patient data has been developed. The vector permits the notation of any of the data elements shown in the listing in Appendix A. The presence of any of the items found in Appendix A is recorded in a patient's data vector by an entry of '1' in the vector-dimension corresponding to the item number, while the absence of a vector item is noted by a '0'

ing an optimal set of treatments is accomplished by finding the set of treatment alternatives $k_1, k_2, \ldots, k_S$ that maximize each of the $v_I^{k_I}$ (the expected value of occupying patient-state 'I' given treatment alternative '$k_I$'), where

$v_I^{k_I} = r_I^{k_I} + \sum_{\text{all patient states } J} p_{IJ}^{k_I}\, v_J^{k_J}, \quad I = 1,2,\ldots,S$

and

$r_I^{k_I} = \sum_{\text{all patient states } J} p_{IJ}^{k_I}\,[\ldots]$ (the remaining factor is illegible in the scan).

With treatment-augmented patient states, maximizing the $v_I^{k_I}$ can be carried out in the following manner:

1. Group for simultaneous analysis all patient states possessing a common treatment history, where one or more of the treatments in this history are at their boundary level. Each of the 'T' sets of states complying with this description forms an analysis set $B_j$, $j = 1,2,\ldots,T$.

2. Label sequentially the patient states, starting with state W as 1, state R as 2, and then selecting numbers for the remaining unlabeled patient states on the basis that the one with the most treatments in its history receives the next number-label. For example, state '111,2,2,4' would be labeled with a smaller number than state '112,6,6.' When the numbering scheme reaches the members of one of the analysis sets isolated in Step 1 (above), numbers for the members of that set may be arbitrarily assigned.

Given this state numbering scheme, the selection of optimal treatments can proceed dynamically since for each state I that is not a member of an analysis set, $I = 1,2,\ldots,S$, $I \notin B_j$, $j = 1,2,\ldots,T$,

$v_I = r_I + \sum_J p_{IJ}\, v_J$

BIOGRAPHICAL SKETCH

Michael Steven Leonard was born February 2, 1947, in Salisbury, North Carolina. In June, 1965, he was graduated cum laude from Cocoa High School in Rockledge, Florida. He received the degree of Bachelor of Industrial Engineering with High Honors from the University of Florida in June, 1970. In September, 1970, he began graduate work in engineering at the University of Florida. He received the degree of Master of Engineering in March, 1972. In June, 1972, he was designated a Distinguished Military Graduate of the Air Force Reserve Officer Training Corps. From September, 1970, until the present, his graduate training has been supported by a National Science Foundation traineeship.

Michael Leonard is married to the former Mary Elizabeth Stewart of Cocoa, Florida. He holds the reserve commission of Second Lieutenant in the United States Air Force.
He is a member of Lambda Chi Alpha fraternity; Alpha Pi Mu, Sigma Tau, and Tau Beta Pi honorary fraternities; and the Operations Research Society of America.

APPENDIX C
APPLICATION OF THE MINIMUM-COST SYMPTOM-SELECTION ALGORITHM

Given three pattern classes X, Y, and Z, with patterns of the form $a_j = [a_{j1}, a_{j2}, a_{j3}, 1]$, where

$a_X^1 = [0\ 1\ 0\ 1]$    $a_Y^1 = [0\ 0\ 1\ 1]$    $a_Z^1 = [0\ 0\ 0\ 1]$
$a_X^2 = [0\ 1\ 1\ 1]$    $a_Y^2 = [1\ 0\ 1\ 1]$    $a_Z^2 = [1\ 0\ 0\ 1]$

these patterns can be represented in three-dimensional space (without their constant = 1 components) by

[Plot of the six patterns against feature 1, feature 2, and feature 3; not recoverable from the scan.]

One set of feasible linear-classifier discriminant-function weights for these patterns is

$W_X = [-2\ \ 3\ \ 0\ -1]^T$
$W_Y = [\ 1\ -2\ \ 2\ \ 0]^T$
$W_Z = [\ 1\ -1\ -2\ \ 1]^T$.

Suppose feature 1 costs 2 units to employ in the classifier, feature 2 costs 6 units to employ in the classifier, and feature 3 costs 3 units to employ in the classifier. Then, for the minimum-cost symptom-selection algorithm (Section 3.4),

$A_X = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}$, $A_Y = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}$, $A_Z = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}$,

and $C = [2\ 6\ 3\ 0]$.

Note that Conditions 3 and 4 can also be stated, and justified, with the role of the elements of the A and B matrices reversed.

Given this set of four conditions, consider the following row partition of the A and B matrices:

[Partitioned display of A and B; the layout is not recoverable from the scan.]

where by appropriate change of rows in A and B

1. every element in each row of $A_1$ is a one
2. every element in each row of $B_1$ is a one
3. every element in each row of $A_0$ is a zero
4. every element in each row of $B_0$ is a zero.

The partitions $\hat{A}_1$, $\hat{B}_1$, $\hat{A}_0$, and $\hat{B}_0$ are the rows of A and B corresponding to $B_1$, $A_1$, $B_0$, and $A_0$, respectively, and $A^*$ and $B^*$ are the remaining rows of A and B. With this partitioning and the four previously established conditions, the size of the data vectors associated with many of the $[p(p-1)]/2$ subproblems P2 can be significantly reduced.
The reduction process, Procedure 1, can be stated in this manner:

Step 1: If for some row k in $A_1$ ($B_1$) each element in the corresponding row of $\hat{B}_1$ ($\hat{A}_1$) is equal to one, then row k of A and B can be eliminated by Condition 2.

one of the diagnostic alternatives, then the use of a non-parametric classifier as a means of generating classifications should be investigated.

The linear non-parametric classifier employs a weighted sum of the symptoms exhibited by each patient in its discriminating functions. If symptoms can be isolated that are significant to the classification of patients with the disorder under investigation, then there is a 'natural' weight for each of these symptoms in the decision-making process used by the practitioner. The existence of these natural weights increases the probability that a training algorithm will be able to find a feasible set of discriminant-function weights. Indeed, the relative importance of the significant symptoms may be reflected in the magnitude of the discriminant-function weights generated by the application of a training algorithm.

As an example, the significant symptoms associated with two craniofacial-pain diagnostic alternatives, Alternatives 4 and 14, were isolated by Dr. Fast. A comparison of these symptoms and their associated discriminant-function weights revealed a high degree of correlation between symptom significance and discriminant-function weights; see Table 2.

The reader should note that both of the criteria discussed in this section are heuristic approximations to the geometric requirement for pattern space separability. However, if the disorder under investigation meets one or both of these criteria, it may be possible to employ a non-parametric classifier to diagnose the disorder, since the requirement for pattern space separability is most likely met.

Values for many of the treatment-planning model's parameters were gathered from the set of patient records discussed in Section 3.1.
As the patient histories from the contributing university dental clinics were reviewed, notations of treatment applications and time between successive visits were made for each patient-practitioner interaction. The values of the remaining model parameters were either estimated by the reviewing practitioners, Dr. Fast and Dr. Mahan, or were gathered from responses to questionnaires completed by patients who visited the University of Florida's Dental Clinic. In modeling the complicated process of care for craniofacial-pain patients, several simplifying assumptions were made. This section provides the motivation for these assumptions and presents the notation employed in the analytic description of the treatment-planning process.

4.1.1 Patient States

In general, a Markovian system structure requires that the current state of the system completely characterizes the probabilities associated with future state occupancies of the system. To fully satisfy this Markovian condition for state structure in the craniofacial-pain treatment-planning model would require that the model include as distinct model states every possible combination of diagnostic classifications a patient might have occupied, in conjunction with every combination of treatment applications he might have undergone, during his stay in the care system. Unfortunately, such a model would have an infinite number of patient states.
However, for a majority of craniofacial-pain patients the knowledge of a patient's prior treatment record, coupled with his current diagnostic classification, is adequate to determine his prior diagnostic

TABLE 1
SURVEY OF DIAGNOSTIC-CLASSIFICATION MODELS

Bayesian Classifiers
Reference Number    Disease Group       Number of Patients in Study    % Correct Patient Diagnoses
  [12]              Nontoxic Goiter              88                          85.3
  [13]              Bone Tumor                   77                          77.9
  [14]              Thyroid                     268                          96.3
  [15]              Congenital Heart            202                          90.0
  [16]              Gastric Ulcer                14                         100.0

Non-Parametric Classifiers
Reference Number    Disease Group       Number of Patients in Study    % Correct Patient Diagnoses
  [17]              Liver                        52                          98.1
  [18]              Asthma                      230                          90.0
  [19]              Hematologic                  49                          93.9
  [20]              Thyroid                     225                          96.0

$W_j$ = the 296-dimension vector of weights associated with diagnostic alternative 'j'
$w_{jk}$ = the k-th element in the weight vector $W_j$, that is $W_j = [w_{j1}, w_{j2}, \ldots, w_{j295}, w_{j296}]^T$

and

$d_{ij} = \sum_{k=1}^{296} a_{ik}\, w_{jk}$

where T denotes vector transposition. Patient 'i' is classified in diagnostic alternative 'h' when $d_{ih} > d_{ij}$ for every $j \ne h$. If $\max_j d_{ij}$ is not unique, then it is not yet possible to classify patient 'i' into one of the diagnostic alternatives. Treatment is prescribed for severe symptoms and classification is attempted at a later date.

Data from four sources were used to construct and verify the diagnostic-classification model, as well as the treatment-planning model presented in Chapter 4. Contributions of clinical records came from the dental schools at the universities of California at Los Angeles, Florida, Illinois, and Indiana. In all, the records of 250 patients, involving a total of 480 patient-practitioner interactions, form the data base for model building and validation. The relevant information from each of these patient visits has been recorded in the data-vector format of Appendix A. A diagnostic classification from Figure 3 was assigned to each of these patient data vectors by either Dr. Thomas B. Fast, Chairman of the Division of Oral Diagnosis, or by Dr. Parker E.
Mahan, Chairman of the Department of Basic Dental Sciences, at the College of Dentistry, University of Florida.

data-vector entry. For example, referring to the listing in Appendix A, a male patient would have the following fifth, sixth, and seventh elements in his data vector (...,1,0,0,...), while a pre-menopausal female would have the series of elements (...,0,1,0,...).

This vector notation of a patient's status serves as the input data for a non-parametric pattern classifier that assigns a diagnostic classification to the patient's dysfunction. Non-parametric pattern classification, as described in Meisel [23] and Nilsson [24], is the process of creating decision surfaces that separate patterns into homogeneous classes $C_i$, $i = 1,2,\ldots,p$, specified by the analyst. In the craniofacial-pain diagnostic model, the $C_i$ are the diagnostic alternatives shown in Figure 3. Classification of a pattern (a patient's data vector) into one of the classes is performed by a pattern classifier composed of a maximum detector and a set of discriminant functions. These discriminants $g_j(a)$, $j = 1,2,\ldots,p$, are single-valued functions of each patient's data vector $a$. If $a_i$ represents a data vector for a patient whose correct diagnostic classification is the i-th diagnostic alternative, then the $g_j(a)$ are chosen so that

$g_i(a_i) > g_j(a_i), \quad i, j = 1,2,\ldots,p, \ j \ne i.$

The craniofacial-pain classifier uses linear discriminant functions. These discriminants are linear in the sense that they provide mappings from $E^n$ to $E^1$ that exhibit the form

$g_j(a) = a_1 w_{j1} + a_2 w_{j2} + \cdots + a_n w_{jn} + w_{j(n+1)}$

where in the patient-data-vector $a$, the value of $a_k$ denotes the presence

the basis of body-temperature and blood-pressure readings. Traditional techniques for feature selection might employ a linear combination of body temperatures and blood pressure measurements as one 'new' feature.
The transformation sought in this investigation would lead to the classification of patients by either body temperature or blood pressure alone if this were possible. This example will be used again in Section 3.4.1 to illustrate the algebraic and geometric structure of the problem.

Nelson and Levy [27] have attacked the problem of selecting a reduced set of unaltered features for use in a classification scheme. These authors attach a cost to the use of each available feature, and employ a ranking scheme to measure each feature's discriminating power. Then, under a restriction on the total cost of features employed, they develop an algorithm that selects the set of features that maximizes the classifier's discriminating power. Unfortunately, their scheme does not guarantee the selection of a subset of original features that contains enough 'information' to permit pattern class separation by discriminant function. Therefore, a new algorithm is presented in this section that minimizes the cost of the set of features used by the pattern classifier yet ensures that all patterns can be correctly classified by a set of linear discriminant functions. In the remainder of this section the more general terms 'feature,' 'pattern,' and 'pattern class' will be used respectively to represent a data vector item, a patient's data vector, and a diagnostic classification.

The problem of finding a minimum-cost collection of features would not be considered if there did not already exist a set of 'n' features by which the patterns under examination could be correctly classified by linear discriminants. That is, given an 'n'-dimensional representation

Fifty-eight patients at the University of Florida's Dental Clinic responded to the following question:

How much would you estimate that this trip to the Dental Clinic cost you in terms of lost wages, baby-sitting fees, transportation costs, and other costs that you may have had to pay so that you could be here for your appointment?
The distribution of these estimates is shown in this histogram.

[Histogram not recoverable from the scan; horizontal axis: Dollars.]

The mean value for these 58 estimates of patient-visit inconvenience costs was $30.72.

FIGURE 6
PATIENT-VISIT INCONVENIENCE COST

15, 16], make a diagnosis on the basis of selecting a patient's 'most probable' disease state. The Bayesian classifier is an elementary type of parametric pattern-classification model. In general, parametric classifiers make use of one or more of the statistical characteristics of the dispersion of the data being classified to establish rules for data classification. With the Bayesian models, only the conditional probabilities for exhibiting sets of symptoms, given a particular disease, are tabulated from past medical data. Then, utilizing Bayes' theorem, the probabilities for the presence of alternate diseases $d_1, \ldots, d_n$ can be calculated as a function of the symptom-complex S the practitioner observes in the patient. Bayes' theorem provides that for each of the $d_i$

$P(d_i \mid S) = C(S)\, P(S \mid d_i)\, P(d_i)$

where $C(S) = 1 \Big/ \Big[ \sum_{k=1}^{n} P(S \mid d_k)\, P(d_k) \Big]$;

hence, a patient with symptom-complex S is classified in disease-group i if

$P(d_i \mid S) = \max_k P(d_k \mid S).$

A survey of the results of application of Bayesian models is given in Table 1.

Although the percentage of correct diagnoses in most of these test applications is high, there are several reasons why a Bayesian diagnostic model is not used as the means of generating diagnostic classification in this dissertation. The first reason is the difficulty in acquiring the proportional presence of alternate diseases $P(d_i)$, $i = 1,2,\ldots,n$, in the population of patients that are to be classified by the model. These 'prior' probabilities of having a particular disease are a function

Step 7: If the use of Steps 1, 2, 4, and 5 has eliminated one or more rows or columns from either matrix, then repartition the matrices and return to Step 1; otherwise terminate Procedure 1.
In coding Procedure 1 for computer processing, there is no need to physically partition the rows of the A and B matrices. Summing the elements in any row of A or B reveals whether the individual elements in the row are all equal to zero or are all equal to one. Given this information, the steps from Procedure 1 determine whether a pattern is removed from A or B, whether a row in A and B is removed, or whether the procedure should be terminated because no feasible set of convex combinations for P2 exists.

As an example of the use of Procedure 1, consider the set of matrices A and B in subproblem P2 where

$A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \end{bmatrix}$

In the first application of the steps of Procedure 1:

1. Column 4 can be eliminated from matrix A by Step 4, and
2. Column 1 can be eliminated from matrix A by Step 5.

After the first application of the steps of the procedure

$A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \end{bmatrix}$

function, between each new vector and its associated training-sample convex hull. Given this close proximity, the classifier's discriminant functions should correctly classify most new data vectors, as these vectors will lie within or near the boundaries of the appropriate discriminating hyperplanes. Hence, the key to providing adequate classifier performance for new data vectors lies in devising data-vector representations of patient data for which the data vectors of a common diagnostic classification exhibit strong similarity.

In the introductory discussion of the elements of patient data used in the patient data vector, it was pointed out that an effort was made to select components of patient status that assist the practitioner in his selection of diagnostic classifications for a craniofacial-pain patient. Then these elements were partitioned to generate as much discriminating information as possible from each data element.
In terms of the alter nate diagnostic classifications, these elements of patient data were chosen so that all patients in any one diagnostic classification would have a unique combination of exhibited or non-exhibited data-vector ele ments. Employing these carefully constructed qualitative data elements resulted in a set of 'natural' gaps in the vector representations of patient data iron alternate diagnostic classifications. The fact that there are portions of the pattern space that cannot be occupied by any data vector, and partitions of the space where the vectors of each clas sification must lie, assists the classifer in making correct classifica tions of data not used in model construction. As Section 3.3 shows, this discussion is not meant to imply that the craniofacial-pain diagnostic classifier can, in its present state of development, correctly classify every new data vector. What has been 94 By Step 3 of Procedure,1, no feasible convex combination of these matrices exists. *rp Arp Application of Procedure 1 to and yields the following reduced matrices: aÂ£ = [1 1] [0] . By Step 3 of Procedure 1, no feasible convex combination of these matrices exists. AT at Application of Procedure 1 to and A^ yields the following reduced matrices: =[0 1] A^ = [0 1] . Ajp Arp For these reduced matrices, A^ and Ay P4 has the following form: maximize A^ + ^ subject to A^ 0 it + A^ <_ 0 X2 i 0 it + ^2 r t, A^, A2 unrestricted. P4 has the bounded optimal solution A^ = A 2 = 0. Hence the assignment vector ^ = 1, x^ = 0] is infeasible by the rules of Procedure 2. Go to Step 6 of the algorithm. Step 6: The assignment vector is now ^ = 1, x^ = 1]. As this vector does not include an assignment for every variable, return to Step 4 of the algorithm. The tree of possible solutions to PI new has the form APPENDIX F FLOW CHARTS OF PATIENT-STATE TRANSITIONS C 1 *) 1116 } (2 *-(^ 2124 ^ 2124,2.4) v 20 ^ C^) 2 20 C3 C324 104 61 classifications. 
Even in the cases where the current classification and prior treatment record do not provide a total description of a patient's condition, these elements of patient status do provide significant information about the probabilities associated with a patient's future status in the care system. For example, in the data employed in model construction, 47 craniofacial-pain patients occupied Diagnostic Alternative 15 and were treated with an application of drugs at least once. Eight of these patients were 'well' after a first treatment with drugs, while 39 required multiple applications of drugs or other treatments during their stay in the system. Yet of the 12 patients who were given two applications of drugs, 9 were 'well' following the second repetition of drug therapy. Thus, while the overall data-based transition-probability estimate for a transition from Diagnostic Alternative 15 into the well state following any one application of drugs is .36, the transition-probability estimate for a transition into the well state following two successive applications of drugs is .75. Hence, for this diagnostic classification, information on the prior application of drugs is important in determining a patient's future status in the care system.

This form of 'current diagnostic classification augmented by treatment record' patient-state description is employed in the craniofacial-pain treatment-planning model as an approximation to a 'true' Markovian state structure. Each of the diagnostic alternatives shown in Figure 3 forms the basis for a collection of patient states. The diagnostic alternative is augmented with a record of treatments that have been applied since the patient entered the care system. Appendix D provides a list of the treatment alternatives that may be prescribed for craniofacial-pain patients.
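The Diagnostic Alternative 15 figures quoted in this passage can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `transition_estimate` is hypothetical, and the pooled reading of the .36 figure (17 'well' outcomes over 47 patients) is an assumption that happens to reproduce the reported value, not a statement of how the estimate was actually formed.

```python
from fractions import Fraction


def transition_estimate(n_transitions, n_observed):
    """Relative-frequency estimate of a transition probability.

    n_transitions: observed moves into the target state (here, 'well').
    n_observed:    patients observed in the starting state under the treatment.
    """
    if n_observed <= 0:
        raise ValueError("need at least one observed patient")
    return Fraction(n_transitions, n_observed)


# Counts quoted in the text for Diagnostic Alternative 15 treated with drugs.
after_second_application = transition_estimate(9, 12)

# ASSUMPTION: pooling the 8 first-application and 9 second-application
# recoveries over all 47 treated patients; this is one reading that is
# consistent with the reported .36, nothing more.
pooled = transition_estimate(8 + 9, 47)

print(float(after_second_application))
print(round(float(pooled), 2))
```

The gap between the two estimates is the point of the example: a state description that ignores treatment history would average over patients whose prospects differ considerably.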
The record of each treatment given to the patient is noted in the patient-state descriptions without regard to its chronological

of seasonal variation, geographic location, population demography, and many other factors. Secondly, valid Bayesian analysis requires the analyst to determine the dependence among exhibited symptoms for each disease considered by the diagnostic model. In this respect, the probabilities for the presence of groups of symptoms are independent for some diagnostic alternatives and strongly correlated for others [4]. The third reason for not selecting a Bayesian model is the massive storage requirement dictated by the necessity of keeping the set of conditional probabilities. These conditionals, $P(S \mid d_i)$ for every observable symptom-complex S and every disease i considered, must be at hand each time the model is used.
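A minimal sketch of the bookkeeping behind a Bayesian classifier of this kind may make the storage burden concrete. The function names are hypothetical, and the storage count assumes binary (present/absent) symptoms with no independence assumptions, so that each disease needs one stored conditional for each of the 2^m - 1 symptom-complexes counted in the text.

```python
def posterior(priors, likelihoods):
    """Posterior P(d_i | S) for each disease via Bayes' theorem.

    priors:      P(d_i) for each disease i.
    likelihoods: P(S | d_i) for the observed symptom-complex S.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # this is 1 / C(S)
    if total == 0:
        raise ValueError("symptom-complex has zero probability under every disease")
    return [j / total for j in joint]


def classify(priors, likelihoods):
    """Index of the 'most probable' disease for the observed symptom-complex."""
    post = posterior(priors, likelihoods)
    return max(range(len(post)), key=post.__getitem__)


def conditional_storage(n_diseases, n_symptoms):
    """Number of stored conditionals, assuming binary symptoms and
    2**n_symptoms - 1 symptom-complexes per disease."""
    return n_diseases * (2 ** n_symptoms - 1)


print(classify([0.5, 0.5], [0.2, 0.6]))
print(conditional_storage(10, 10))
```

The count grows exponentially in the number of symptoms, which is why the storage objection applies even to modest symptom lists.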
For example, given ten alternate diseases and ten symptoms for which no assumptions of between-symptom independence can be made, storage is required for $10 \cdot (2^{10} - 1)$, or 10,230, conditional probabilities.

2.2 Non-Parametric Classification Models

Non-parametric diagnostic models, like [17, 18, 19, 20], utilize non-parametric pattern classifiers, a form of pattern-recognition modeling. In the literature on pattern recognition, the term 'non-parametric' implies that no form of probability distribution is assumed for the dispersion of symptom data in establishing the rules for pattern classification. These models do assume, however, that classes of symptom data are distinct entities and, hence, a patient with a particular set of symptoms S cannot simultaneously occupy more than one diagnostic state. That is, the models assume a deterministic classification for each pattern viewed by the pattern classifier where every observable pattern has one, and only one, correct classification. Non-parametric modeling permits the analyst to bypass the difficult problems of explicitly determining the conditional probabilities for,

When initial treatment does not result in a 'cure' for the craniofacial-pain patient, treatment effects are evaluated and new data collected. When a patient's diagnostic classification leads to a course of treatment that is not within the realm of the practitioner's specialty he is referred to a more appropriate care source. Monitoring is continued on those patients not rejected from the system at this point, and the patient is discharged when he is symptom-free. However, when other disorders have been isolated during the course of treatment, the patient is recycled through the classification-treatment process. The diagnosis-treatment sequence is not fixed. Treatment can begin prior to a diagnostic classification or treatment can follow a diagnosis.
Moreover, there may be many diagnostic-treatment data-acquisition cycles before the patient is considered 'well.'

1.2 Research Objective

The introductory discussion of the need for diagnostic and treatment-planning models, and the brief description of the craniofacial-pain care system, provide the setting for a statement of the research objective underlying this dissertation. This objective is to derive analytic representations of the decision processes involved in selecting diagnostic classifications and planning treatments for craniofacial-pain patients. A diagnostic-classification model that duplicates the classification of expert practitioners is sought. For treatment planning, the modeling goal is to provide a structure for interaction of the critical considerations associated with the treatment-selection process. These analytic representations will be structured to permit their application as teaching devices in the training of dental practitioners, as methods of testing the effects of new diagnostic tools and treatment applications, and as aids to the practice of dentistry.

characterized as follows: let $A = A_s$ and $B = A_t$, with A and B having columns $a_i$ and $b_j$ respectively, for any $A_s$ and $A_t$.

P2: Find $u_i \ge 0$, $\sum_{i=1}^{m_1} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_2} v_j = 1$ such that

$$\sum_{i=1}^{m_1} u_i a_i = \sum_{j=1}^{m_2} v_j b_j .$$

If such $u_i$ and $v_j$ exist for any one of the subproblems then X is not feasible to P1. Because the number of subproblems is large even for a relatively small number p of pattern classes, there is justification for seeking methods to expedite the solution of each subproblem P2.

To achieve this goal, a series of conditions will be presented that characterize some of the criteria necessary to the existence of a solution to subproblem P2. In addition to establishing criteria for existence, these conditions provide a means for reducing the size of the matrices A and B. This reduction will be discussed after the conditions are established.
Condition 1: If the $k$th row of A has all elements $a_i^k$, $i = 1, 2, \ldots, m_1$, equal to zero (one) and the $k$th row of B has all elements $b_j^k$, $j = 1, 2, \ldots, m_2$, equal to one (zero), then no $u_i \ge 0$, $\sum_{i=1}^{m_1} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_2} v_j = 1$ exist such that

$$\sum_{i=1}^{m_1} u_i a_i = \sum_{j=1}^{m_2} v_j b_j .$$

Justification 1: Under Condition 1 there is no set of convex combinations of the $k$th row elements of A and of the $k$th row elements of B such that the combinations are equal.

following form: $p_{IJ}^{k}$, the probability of making a transition from patient-state 'I' to patient-state 'J' following the application of treatment-alternative 'k.'

4.1.3 Cost Structure

A patient's progression through the craniofacial-pain system generates a multitude of implicit and explicit costs. The explicit costs can be measured in terms of the dollar charges paid by the patient or the practitioner during the patient's stay in the system. Other costs are implicit in nature and can be quantified only as they relate to the opportunities lost by the patient and the practitioner while the patient remains in the care system. For modeling purposes four major system costs have been isolated. These costs are:
(a) Cost of treatment applications
(b) Cost of the practitioner and his staff's services
(c) Cost to the patient of occupying a non-well patient-state
(d) Patient-referral cost.
Although these costs do not encompass all of the system costs, they measure significant explicit and implicit charges associated with a patient's stay in this system. In the treatment-planning model, each of these costs is charged on a per-patient-visit basis.

Costs of the various treatment applications and the costs associated with the practitioner and his staff's services were estimated by the reviewing practitioners. Estimates of treatment and care-system service costs were partitioned by diagnostic classification as well as treatment

study.
This reality prohibits the model builder from making broad statements about the applicability of his models to other health-care environments. Accordingly, the models developed in this dissertation are specifically oriented toward the health-care problem presented in Section 1.1 with the understanding that the results of this modeling effort may not be applicable to the whole of health-care diagnosis and treatment planning.

1.1 Craniofacial Pain

The head and face are subject to chronic, persistent, or recurrent pain more often than any other portion of the body. Pain in the head or face has a greater significance to patients than any other pain. It may arouse fears that the patient is in danger of losing his mind or that he has a tumor of the brain. In addition, the emotional state of the patient is adversely influenced because it is generally known by the layman that the profession's knowledge of the causes of these pains is meager and that methods of treatment are inadequate [5, p. v].
H. Houston Merritt, M.D., Dean
Columbia University College of Physicians and Surgeons

One source of the pain Dr. Merritt describes is dysfunction of the temporomandibular joint. The temporomandibular joint, see Figure 1, provides the articulation between the mandible and the cranium. This joint is unique both in its structure and its function. Within the plane of the temporomandibular joint, lateral, vertical and pivoting motion is permitted. In addition, the joint is the point of articulation for the only articulated complex that contains teeth. With this joint, "motion is directed more by the musculature and less by the shape of the articulating bones and ligaments than is the fact for other joints" [5, p. 34].
The fact that joint motion is highly dependent on musculature implies that when mandibular dysfunction occurs there is some disturbance

TABLE 5
MEAN TRANSIT TIMES THROUGH THE CRANIOFACIAL-PAIN CARE SYSTEM

For a Patient Whose First                       Model-Generated   Truncated Model   Patient-Record
Diagnostic Classification Was                   Estimate*         Estimate+         Estimate
Myopathy-Myositis                               1.50              1.34              1.35
Oral Pathology-Dental Pathology                 1.11              1.04              1.08
Vascular Changes-Migrainous Vascular Changes    3.89              3.42              3.06
Myofacial Pain Dysfunction-
  Uneven Centric Stops                          1.86              1.43              1.50
Myofacial Pain Dysfunction-
  Anxiety/Depression                            3.87              3.47              3.18
Myofacial Pain Dysfunction-
  Reflex Protective Muscular Contracture        1.90              1.79              1.87

* The values in these sets of estimates are specified in terms of the number of patient visits in which the patient occupies a non-well or non-referred patient state.
Note: The treatment-planning model considers the possibility of 'infinite duration' occupancy of non-well or non-referred states.
+ These truncated estimates were generated from the treatment-planning model on the conditional basis that a patient must transit into either the well or the referred state by his fifth patient visit. The maximum number of visits for any patient described by the clinical data was five patient visits.

S, non-parametric pattern classification requires that $P[S \mid C_j] = 1$ for the diagnostic alternative '$C_j$' that describes the patient's current diagnostic status, and $P[S \mid C_k] = 0$ for all other diagnostic alternatives '$C_k$.' However, assume that for the disorder in question the probability of exhibiting any relevant symptom has been calculated from historical data, that is, estimates of $P[s_i \mid C_j]$ are available for all relevant symptoms $s_i$ and all diagnostic alternatives $C_j$.
Then, if the following decision rule leads to the correct classification of a majority of the patients with the disorder in question, utilization of a non-parametric classification model should be investigated: classify a patient who exhibits the set of symptoms S in the $j$th diagnostic alternative if

$$\prod_{s_i \in S} P[s_i \mid C_j] > \prod_{s_i \in S} P[s_i \mid C_k] \quad \text{for all } k \ne j. \qquad (1)$$

Since (1) holds if and only if

$$\log\Big[\prod_{s_i \in S} P[s_i \mid C_j]\Big] > \log\Big[\prod_{s_i \in S} P[s_i \mid C_k]\Big] \quad \text{for all } k \ne j,$$

decision rule (1) can be expressed in terms of logarithms. Let the set of symptoms S be represented as a row vector a with the elements of a assigned values as follows: $a_i = 1$ if symptom $s_i$ is an element of S and $a_i = 0$ if symptom $s_i$ is not an element of S, for $i = 1, 2, \ldots, n$, where n is the total number of relevant symptoms. Form the column vectors

$$W_j = \big[\log P[s_1 \mid C_j],\ \log P[s_2 \mid C_j],\ \ldots,\ \log P[s_n \mid C_j]\big]^T$$

Consider the dual of P3, written in the following form:

P4: maximize $[\,0 \;\; 1 \;\; 1\,]\,[\pi,\ \lambda_1,\ \lambda_2]^T$ subject to ..., with $\lambda_1, \lambda_2$ unrestricted in sign.

Note that P4 may have many associated $\pi$ variables, but has only as many constraints as the number of patterns in A and B (as reduced by Procedure 1). P4 always has at least one solution to its constraint set. Thus, if an application of a linear-programming algorithm to P4 reveals the existence of an unbounded solution, then P2 has no solution. Therefore, if and only if P4 has a bounded solution do $u_i$ and $v_j$ exist such that

$$\sum_{i=1}^{m_1} u_i a_i = \sum_{j=1}^{m_2} v_j b_j$$

where $u_i \ge 0$, $\sum_{i=1}^{m_1} u_i = 1$ and $v_j \ge 0$, $\sum_{j=1}^{m_2} v_j = 1$.

The preceding discussion with its development of a reduction procedure and dual formulation provides the structure for a second procedure. Procedure 2 establishes a mechanism to verify the feasibility of any assignment of zeros and ones to the X vector of problem P1, see Figure 4. That is, given some vector X and a set of patterns a, m = 1, 2, ...
, $m_i$, and i = 1, 2, ..., p, the [p(p-1)]/2 subproblems P2 are formed by zeroing out

Two treatments, T1 and T2, are available for patients classified in Alternative I and Alternative J. T1 has a boundary-level application number of one application and T2 may be given only once during the patient's stay in the care system. Note that this figure omits the transition arcs between the diagnostic-classification-based patient states and the terminal states well and referred.

FIGURE 8 MULTIPLE-STATE HISTORY-AUGMENTED PROCESS

ANALYTICAL MODELS FOR DIAGNOSTIC CLASSIFICATION AND TREATMENT PLANNING FOR CRANIOFACIAL PAIN
By Michael Steven Leonard
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA 1973

Hence, there can be no set of convex combinations of the columns of A and of B such that the combinations are equal. Symbolically, since no $u_i \ge 0$, $\sum_{i=1}^{m_1} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_2} v_j = 1$ exist such that

$$\sum_{i=1}^{m_1} u_i a_i^k = \sum_{j=1}^{m_2} v_j b_j^k ,$$

no $u_i \ge 0$, $\sum_{i=1}^{m_1} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_2} v_j = 1$ exist such that

$$\sum_{i=1}^{m_1} u_i a_i = \sum_{j=1}^{m_2} v_j b_j .$$

Condition 2: If the $k$th row of A has all elements $a_i^k$, $i = 1, 2, \ldots, m_1$, equal to zero (one) and the $k$th row of B has all elements $b_j^k$, $j = 1, 2, \ldots, m_2$, equal to zero (one), the $k$th row of matrices A and B can be eliminated without loss of possible solutions to subproblem P2.

Justification 2: Under Condition 2 every convex combination of the $k$th row elements of A and of the $k$th row elements of B are equal. Hence, a set of convex combinations of the columns of A and of the columns of B are equal if and only if the convex combinations of the remaining rows (all rows except the $k$th row) are equal.
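The two row conditions lend themselves to the row-sum test mentioned in the discussion of coding Procedure 1. The following is a minimal sketch of that idea only, not the dissertation's procedure: it covers Conditions 1 and 2 and none of the pattern-elimination steps, the function name `apply_row_conditions` is hypothetical, and the matrices are assumed to be lists of 0/1 rows with at least one column each.

```python
def apply_row_conditions(A, B):
    """Screen subproblem P2 with the two row conditions.

    A, B: lists of rows of 0/1 entries; row k of A and row k of B describe the
    same pattern component. Columns are patterns.
    Returns (may_be_feasible, A_reduced, B_reduced).
    """
    kept = []
    for k, (row_a, row_b) in enumerate(zip(A, B)):
        sum_a, sum_b = sum(row_a), sum(row_b)
        a_zero, a_one = sum_a == 0, sum_a == len(row_a)
        b_zero, b_one = sum_b == 0, sum_b == len(row_b)
        # Condition 1: one row is all zeros and the other all ones, so no
        # pair of convex combinations can agree in component k.
        if (a_zero and b_one) or (a_one and b_zero):
            return False, A, B
        # Condition 2: both rows are all zeros or both all ones, so the
        # component is always equal and the row can be dropped.
        if (a_zero and b_zero) or (a_one and b_one):
            continue
        kept.append(k)
    return True, [A[k] for k in kept], [B[k] for k in kept]


# Hypothetical example: the first row satisfies Condition 2 and is removed.
ok, A_red, B_red = apply_row_conditions([[1, 1], [1, 0]], [[1, 1, 1], [0, 1, 0]])
print(ok, A_red, B_red)
```

A `True` result here only means the row conditions did not rule out a solution; the remaining rows would still have to be examined by the other steps of the procedure or by the linear program.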
Symbolically, let $a_i^{*k}$ denote the pattern $a_i$ whose $k$th component has been eliminated and similarly let $b_j^{*k}$ denote the elimination of component k from pattern $b_j$, then as

Patient State    Model Selection    Practitioner Selection
9                12                 12
9|12             12                 Refer+
10               23                 23
10|23            12                 12
10|12,23         12                 12
11               12                 12*
11|12            12                 12*
12               15                 15
15|15            15                 15
12|23            Refer              Refer
12|35            24                 24
12|15,31         Refer              17*+
12|24,35         24                 24
13               12                 12
13|12            12                 12
13|18            24                 24
13|22            34                 34*
13|23            12                 12
13|24            24                 24
13|12,12         23                 23
14               23                 23
14|12            12                 12*
14|23            12                 12
14|24            24                 24
14|26            26                 26
14|35            35                 35
14|41            24                 24
14|12,12         Refer              Refer*
14|12,23         12                 12
14|24,24         22                 Refer+
14|24,35         24                 24
15               12                 12
15|12            15                 15
15|20            20                 20
15|22            34                 24
15|23            23                 23
15|24            12                 12
15|27            16                 16
15|34            34                 34
15|35            24                 24
15|41            34                 34
15|12,12         12                 12
15|12,23         12                 12
15|16,27         16                 16
15|20,20         20                 Refer*+
15|22,22         12                 12
15|22,34         34                 34*
15|23,23         23                 23
15|24,24         24                 24
15|34,34         34                 34
15|12,22,22      12                 12
15|22,34,34      34                 34*

[25] Howard, R.A., Dynamic Probabilistic Systems, Vol. II: Semi-Markov and Decision Processes, New York: Wiley (1971).
[26] Rosen, J.B., "Pattern Separation by Convex Programming," Journal of Mathematics and Application, Vol. 10 (1965) 123-134.
[27] Nelson, G.E., and Levy, D.M., "Selection of Pattern Features by Mathematical Programming Algorithms," IEEE Transactions on Systems Science and Cybernetics, Vol. SSC-6 (1970) 20-25.
[28] Balas, E., "An Additive Algorithm for Solving Linear Programs with Zero-One Variables," Operations Research, Vol. 13 (1965) 517-547.
[29] Freund, J.E., Mathematical Statistics, Englewood Cliffs, N.J.: Prentice-Hall (1962).

CHAPTER 2
PREVIOUS RESEARCH

Over three-hundred publications have been addressed to the problem of modeling the diagnostic and treatment-planning process.
Spanning fourteen years, this research has considered such diverse problems as the classification of liver biopsies [10] and the optimal plan for treating mid-shaft fractures of the femur [11]. At least ninety-one disorders have been utilized as environments for developing diagnostic and treatment-planning models. The magnitude of this research effort emphasizes the need for analytic representations of these complex decision-making processes.

Fortunately, the significant contributions in this voluminous literature can be neatly partitioned into four distinct categories. Research in diagnostic classification has been based either on the application of Bayesian statistics or on the use of non-parametric pattern classifiers. Treatment planning has been presented as either a finite-horizon decision problem or as an application of decision analysis to a Markov process of uncertain duration. This section presents a brief discussion of each of these categories and evaluates their suitability as analytic representations of the process of providing health care for craniofacial-pain patients.

2.1 Bayesian Classification Models

Bayesian diagnostic-classification models, such as [12, 13, 14,

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirement for the Degree of Doctor of Philosophy

ANALYTICAL MODELS FOR DIAGNOSTIC CLASSIFICATION AND TREATMENT PLANNING FOR CRANIOFACIAL PAIN
By Michael Steven Leonard
December, 1973
Chairman: Dr. Kerry E. Kilpatrick
Major Department: Industrial and Systems Engineering

This dissertation presents a systematic approach to craniofacial-pain diagnosis and treatment planning using analytic models of the underlying decision-making processes. Patient diagnoses are generated by a linear pattern-recognition classifier trained with a sample of preclassified craniofacial-pain patient data.
For this classifier, an algorithm is developed that minimizes the total cost of the set of features employed in the classifying process. Diagnostic classifications, augmented by a history of prior treatment applications, provide the state descriptions for a Markovian decision model of the treatment-planning process. Craniofacial-pain patient records from four university dental clinics serve as a data base for model construction and validation.

The analytic models provide a means of duplicating the diagnostic classifications and treatment plans of experts. Approximately 90% of the diagnostic classifier's classifications and 93% of the treatment-planning model's treatment selections concurred with the decisions made by experts in the field of care for craniofacial-pain patients. Moreover, the models permit an examination of the critical considerations associated

[12] Boyle, J.A., Greig, W.R., Franklin, D.A., Harden, R.McG., Buchanan, W.W., and McGirr, E.M., "Construction of a Model for Computer-Assisted Diagnosis: Application to the Problem of Nontoxic Goiter," Quarterly Journal of Medicine, Vol. 35 (1965) 565-588.
[13] Lodwick, G.S., Haun, C.L., Smith, W.E., Keller, R.F., and Robertson, E.B., "Computer Diagnosis of Primary Bone Tumors: A Preliminary Report," Radiology, Vol. 80 (1963) 273-275.
[14] Overall, J.E., and Williams, C.M., "Conditional Probability Program for Diagnosis of Thyroid Function," Journal of the American Medical Association, Vol. 183, No. 5 (1963) 307-313.
[15] Toronto, A.F., Veasy, L.G., and Warner, H.R., "Evaluation of a Computer Program for Diagnosis of Congenital Heart Disease," Progress in Cardiovascular Diseases, Vol. 5, No. 4 (1963) 362-377.
[16] Wilson, W.J., Templeton, A.W., Turner, W.H., and Lodwick, G.S., "The Computer Analysis and Diagnosis of Gastric Ulcers," Radiology, Vol. 85 (1965) 1064-1073.
[17] Burbank, F., "A Computer Diagnostic System for the Diagnosis of Prolonged Undifferentiated Liver Disease," American Journal of Medicine, Vol. 46 (1969) 401-413.
[18] Collen, M.F., Rubin, L., Neyman, J., Dantzig, G.B., and Siegelaub, A.B., "Automated Multiphasic Screening and Diagnosis," American Journal of Public Health, Vol. 54 (1964) 641-750.
[19] Lipkin, M., Engle, R.L., David, B.J., Zworykin, V.K., Ebald, R., Sendrow, M., and Berkley, C., "Digital Computers as an Aid to Differential Diagnosis," Archives of Internal Medicine, Vol. 108 (1961) 56-72.
[20] Overall, J.E., and Williams, C.M., "Comparison of Alternative Computer Models for Thyroid Diagnosis," San Diego Symposium on Biomedical Engineering, Vol. 3 (1963).
[21] Betaque, N.E., and Gorry, A., "Automated Judgemental Decision-Making for a Serious Medical Problem," Management Science, Vol. 17, No. 1 (1971) B421-B434.
[22] Ledley, R.S., "Computer Aids to Clinical Treatment Evaluation," Operations Research, Vol. 15 (1967) 694-705.
[23] Meisel, W.S., Computer-Oriented Approaches to Pattern Recognition, New York: Academic Press (1972).
[24] Nilsson, N.J., Learning Machines, New York: McGraw-Hill (1965).

and established a boundary application number for the treatment that reflected their knowledge about the treatment's effectiveness as well as the information supplied by the data.

order. For example, a patient's occupation of the state 'J|1,2,2' denotes that he is currently classified in diagnostic alternative J, and that since he entered the care system he has been treated with one application of treatment 1 and two applications of treatment 2.

Augmenting the patient-state descriptions with treatment history expands the dimensionality of the state space, yet the number of history-augmented states remains finite for two reasons.
The treatment records used in model construction reveal that, for some combinations of diagnostic alternatives and treatment applications, there is a feasible limit to the number of treatment repetitions that can be given to any one patient. Thus, the first reason for a finite state space is that no patient state in the treatment-planning model includes more repetitions of a particular treatment than the clinical data have established as a feasible limit. As an example, the records of patient visits used in model construction establish a feasible limit of only one application of treatment 18 for patients classified in any of the diagnostic alternatives. Therefore, the treatment-planning model includes patient states that exclude treatment 18 as a portion of their treatment history or exhibit the form 'J|...,18,...' for each diagnostic classification 'J' where 18 is a feasible treatment.

The second reason for a finite state space is that there is a 'boundary application' of many treatments such that neither the treatment-record data nor the reviewing practitioners established differences between the transition probabilities for the boundary application and those for further repetitions of the treatments (see Section 4.1.2 and Appendix E). In Diagnostic Alternative 13, for example, the first application of treat-

senting these visits in a test's random sample and some vectors used in model construction. Such occurrences lead to test results that overestimate classifier accuracy. Hence, in Test Six, a random sample of all of the patient data associated with 40 patients (a total of 51 patient data vectors) was selected. This sample was classified by the diagnostic-classification model using the remaining 429 data vectors as a data base. The results of this test are included in the data shown in Table 3.

There is one other possible factor affecting the classifier's accuracy as measured by these tests.
It is conceivable that there were duplicate data vectors in the data base of 480 patient-data-vectors. If duplicates do exist and were included in both the test samples and the samples' training bases, measures of classifier accuracy will be overly optimistic. However, since 'noise' is introduced by the variability among craniofacial-pain patients and generated in the practitioner's transcribing of the elements of patient data into the data-vector format, and since there are $2^{295}$ possible data vectors, the probability that two or more of the data-based patient vectors include an identical specification of data-vector elements is small enough to justify neglecting this possibility and its effects.

The results summarized in Table 3 reveal that the diagnostic-classification model performs well in duplicating the diagnostic classifications originally assigned by the reviewing practitioners, Dr. Fast and Dr. Mahan. Moreover, the size of the test samples was quite large in relation to the data base employed in developing each test's diagnostic model. As new data become available and are incorporated in the parameters of the model, the accuracy of the craniofacial-pain diagnostic classifier can be expected to increase slightly.

the appropriate pattern-vector elements. Then Procedure 1 is applied to each subproblem. Finally, for each pair of pattern classes the boundedness of the dual formulation P4 is examined. Vector X represents a feasible set of pattern-classifying features for P1 if and only if each of the [p(p-1)]/2 subproblem formulations P4 is unbounded.

Before a statement of the algorithm to solve problem P1 is presented several terms must be defined. The assignment vector is defined as a listing of variables $x_i$, elements of the vector X in P1, whose values have been determined by the steps of the algorithm. The elements in this vector are recorded with the value of their assignment, either zero or one.
These elements are entered in the vector in the order they were assigned, with the first algorithm assignment in the first (left) position. For example, consider the assignment vector $[x_4 = 0,\ x_{10} = 1,\ x_2 = 0]$. This vector records that the algorithm first assigned $x_4$ equal to zero, then assigned $x_{10}$ equal to one, and its last assignment was $x_2$ equal to zero.

Feasibility of a solution X, as determined by the assignment-vector component values, is checked by Procedure 2 with the value of those variables not included in the assignment vector temporarily set equal to one.

The value V of an assignment vector is defined as minus one times the sum of the costs associated with each of the variables in the assignment vector, multiplied by the value assigned to the respective variable. For the example assignment vector, $[x_4 = 0,\ x_{10} = 1,\ x_2 = 0]$, where $c_4 = 5$, $c_{10} = 2$, and $c_2 = 7$, the assignment vector has the value

$$V = (-1)\,[5(0) + 2(1) + 7(0)] = -2.$$

FIGURE 4 PROCEDURE 2

of the intricate neuromuscular mechanisms controlling mandibular movement [5]. Emotional tension may also lead to hypertonicity of the striated masticatory muscles resulting in facial pain or altered sensation without evidence of peripheral dysfunction. In addition, abnormal occlusal contacts of the teeth may affect muscle tonicity resulting in mandibular dysfunction [5]. Moreover, the temporomandibular joint is prone to disorders common to all joints: rheumatoid arthritis, osteoarthritis, traumatic injuries, neoplasms, and nonarticular disorders. Although the term 'craniofacial-pain' is a broad classification for pain in the head and face, the term is used in this dissertation to describe pathological, congenital, hereditary-based, or emotional causes of pain in and around the temporomandibular joint.
Though the degree of severity may vary, one or more of the following four 'cardinal symptoms' are exhibited by the craniofacial-pain patient: pain, joint sounds, limitation of motion, and tenderness in the masticatory muscles [6]. Accompanying these symptoms the patient may complain of, or the practitioner may find, hearing loss, burning sensations, migraine-like headaches, vertigo, tinnitus, subluxation, luxation, dental pulpitis, sinus disease, glandular disorders, occlusal disharmony, and radiographic evidence of joint abnormality. The degree of association of these additional symptoms and findings with the etiology of the joint disorders is subject to considerable variation.

Paralleling these areas of anatomic dysfunction is the possibility that the craniofacial-pain patient may be suffering from psychic disorders. In no other type of patient seen by the dentist does psychic condition play a larger role [7]. Most craniofacial-pain patients have symptoms or signs of anxiety, and a sensory preoccupation with the oc-

where [ 0 ] is the null matrix. By Step 6 of Procedure 1, feasible convex combinations of these matrices exist. Hence the assignment vector $[x_2 = 0]$ is infeasible by the rules of Procedure 2. Go to Step 6 of the algorithm.

Step 6: The assignment vector is now $[x_2 = 1]$. As this vector does not include an assignment for every variable, return to Step 4 of the algorithm. The tree of possible solutions to P1 now has the form

[Tree diagram of possible solutions to P1]

Step 4: Select the variable $x_3$ for assignment. The assignment vector is now $[x_2 = 1,\ x_3 = 0]$. Apply Procedure 2.
In Procedure 2, zeroing out the column k=3 from $A_1$, $A_2$, and $A_3$ yields three reduced-feature matrices [matrix entries not recoverable from the source text]. Application of Procedure 1 to a pair of these matrices yields the following reduced matrices: [matrix entries not recoverable from the source text].

CHAPTER 1 INTRODUCTION

The rapid pace of developments in medical and dental research prevents the practicing physician and dentist from fully utilizing each new diagnostic and treatment-planning aid as it is published. In each of the last four years an average of 215,000 new publications have been written to supplement the knowledge of the health-care practitioner [1]. Concurrently, the pressures of an ever-increasing patient load force practitioners to select the most expeditious means for diagnosing disorders and selecting treatments. For example, the medical general-practitioner (1970) saw an average of 173 patients a week [2], and the median dental practitioner (1971) saw two patients an hour [3]. Given these circumstances, practitioners may overlook possible diagnostic and treatment alternatives or they may apply inappropriate treatments. If meaningful analytic descriptions of the diagnostic and treatment-planning processes can be developed, these models can assist educators in training new practitioners, researchers in evaluating and disseminating new developments, and practitioners in improving the quality of patient care [4].

Developing models of the diagnostic-classification and treatment-planning process requires an understanding of the underlying physiological processes of diseases and the mechanisms of their cures. Obviously, the effects of disease and the means of cure vary from one health-care problem to another.
Thus, modeling efforts in diagnosis and treatment planning must be integrally related to the facet of health care that is under

Location of Tenderness: Left Side, Right Side
Location of Pain: Side
Limited Jaw Opening
243 Yes
Joint Sounds
244 Clicking
245 Crepitation
246 Pain accompanying joint sound
Headaches
247 Frequent headaches
248 Headache associated with joint pain

assignment for every variable; go to Step 7 of the algorithm.

Step 7: The value V is calculated for this assignment vector, where $V = -1[1(6) + 1(3) + 0(2)] = -9$. As $V^* = -\infty$, go to Step 8 of the algorithm.

Step 8: $V^*$ is set equal to -9, and the values of the variables $x_1 = 0$, $x_2 = 1$, and $x_3 = 1$ in this assignment vector are stored for future reference. Go to Step 1 of the algorithm.

Steps 2 and 3 of the algorithm dictate that the algorithm is terminated at this point, since these steps generate the assignment vector $[x_2 = 1, x_3 = 1, x_1 = 1]$, which is known to be feasible to P1. V for this assignment vector is -11, which is smaller than $V^*$. Hence the minimum-cost collection of classifying features is feature 2 and feature 3, with a cost of 9 units associated with utilizing these features in a linear pattern classifier.

Analysis by Duration of Pain
Columns: Less than 3 weeks; From 3 to 6 weeks; More than 6 weeks
Rows (transition into): 8; 13; 15; Well
$\chi^2 = 5.047$ with 6 degrees of freedom

Hence, the analysis reveals that duration of pain is not significant in determining estimates of transition probabilities out of Diagnostic Alternative 13 following application of treatment 24, as $\chi^2_{.05,\ d.f.=6} = 12.592$.

Analysis by Nature of Pain
Columns: Continuous; Episodic
Rows (transition into): 8; 13; 15; Well
$\chi^2 = 3.964$ with 3 degrees of freedom

Hence, the analysis reveals that nature of pain is not significant in determining estimates of transition probabilities out of Diagnostic Alternative 13 following application of treatment 24, as $\chi^2_{.05,\ d.f.=3} = 7.815$.

Cell counts as they appear in the source text (row and column alignment not recoverable): 0 1 8 6 1 0 11 3

CHAPTER 5 CONCLUSIONS AND FUTURE RESEARCH
This dissertation has presented analytic models of the decision processes associated with diagnosing and selecting treatments for a particular health-care problem. The selection, construction, and testing of these models have been discussed in some detail. Meanwhile, the model-building effort itself has been the source of a number of insights into decision-making in a health-care environment. These insights will be reflected in this chapter's discussion of the dissertation's central research conclusion and suggestions of topics for future investigation.

The similarity between the decision-making processes employed by the practitioner and the analytic structure of this dissertation's models is quite revealing. In both diagnosis and treatment planning for craniofacial-pain patients it appears that the practitioner, like the analytic models, makes 'first-order' decisions. The linearity of symptom significance (a first-order polynomial of symptom weights) and the present-patient-state dependency of transition probabilities measuring treatment effectiveness (a first-order stochastic dependence) provide a means of generating decisions that closely approximate the decisions made by dental practitioners. This general conclusion on the applicability of first-order decision techniques to craniofacial-pain diagnostic classification and treatment planning characterizes the central development of this dissertation.
TABLE 2 CORRELATION BETWEEN SIGNIFICANT SYMPTOMS AND DISCRIMINANT-FUNCTION WEIGHTS

Diagnostic Alternative 4: Temporomandibular Joint Arthritis-Traumatic (Acute)

| Significant Symptoms | Discriminant-Function Weights |
| --- | --- |
| (+) Duration of Pain (less than 3 weeks) | +3 |
| (+) History of Trauma (accidental) | +30 |
| (+) Preauricular Pain | +11 |
| (-) Salivary Gland Disease | -12 |
| (-) Otitis | -1 |

(discriminant-function weights for Diagnostic Alternative 4 range from -19 to +37)

Diagnostic Alternative 14: Myofacial Pain-Dysfunction Bruxism

| Significant Symptoms | Discriminant-Function Weights |
| --- | --- |
| (+) Duration of Pain (more than 6 weeks) | +15 |
| (+) Facets | +2 |
| (+) Bruxism and/or Clenching | +56 |
| (-) History of Trauma (accidental) | -16 |
| (-) Salivary Gland Disease | -5 |

(discriminant-function weights for Diagnostic Alternative 14 range from -23 to +56)

Note: For both Diagnostic Alternatives, (+) indicates a symptom that leads the practitioner to classify a patient in that diagnostic alternative, and (-) indicates a symptom that leads the practitioner to classify a patient in some other diagnostic alternative.

In the second application of the steps of Procedure 1:
1. Row 1 can be eliminated from both matrices by Step 1,
2. Row 2 can be eliminated from both matrices by Step 2, and
3. Column 4 can be eliminated from matrix B by Step 4.

After the second application of the steps of the procedure, the matrices A and B are reduced further [matrix entries not recoverable from the source text].

In the third application of the steps of Procedure 1:
1. Row 2 can be eliminated from both matrices by Step 1, and
2. Procedure 1 can be terminated by Step 3.

Hence, for this set of A and B matrices, subproblem P2 has no feasible solution.

Although the use of Procedure 1 may lead to a reduction in the size of most subproblems, the pattern vectors ($a_k$ and $b_k$) for each of these problems may still be quite large.
Restating subproblem P2 as a linear program yields

$$\text{P3:}\quad \text{minimize } [\,0 \;\; 0\,]\begin{bmatrix} U \\ V \end{bmatrix} \quad \text{subject to} \quad \begin{bmatrix} A & -B \\ 1\,1 \cdots 1 & 0\,0 \cdots 0 \\ 0\,0 \cdots 0 & 1\,1 \cdots 1 \end{bmatrix}\begin{bmatrix} U \\ V \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad U \ge 0,\; V \ge 0,$$

where the existence of any solution vectors $U^*$ and $V^*$ signals the intersection of the convex hulls of pattern-classes A and B.

[Flow chart of patient-state transitions; node labels not recoverable from the source text]

Proof that the algorithm converges if feasible weight vectors $W_j^*$, $j = 1, 2, \ldots, p$, exist (that is, the sample space is linearly separable) is developed in Nilsson [22]. Nilsson's proof can be directly applied since for any set of feasible $W_j^*$

$$a_j^{(k)} W_j^* > a_j^{(k)} W_z^* + \alpha \quad \text{for all } k = 1, 2, \ldots, m_j \text{ and } z = 1, 2, \ldots, p,\; z \ne j,$$

while for any $j = 1, 2, \ldots, p$ and $i = 1, 2, \ldots$

$$a_j^{(k)} W_j^{(i)} \le a_j^{(k)} W_z^{(i)} + \alpha \quad \text{for some } k \text{ and some } z.$$

Typically, a training algorithm is applied to the members of a training sample without prior knowledge of whether the sample pattern space is linearly separable. The algorithm is allowed to process sample patterns until it either converges on a set of discriminating hyperplanes or has run for a 'reasonable' amount of time without termination. Experience with medical data and the modified fixed-increment algorithm has shown that if there is a set of discriminating hyperplanes, the algorithm will find it in no more than 3 complete cycles for each of the pattern classes. For example, if there are 5 pattern classes and the pattern space can be linearly partitioned, the algorithm should terminate in no more than 15 full cycles through the training data. This rough measure of training time provides an index for establishing a limit on computer processing time. An application of the modified fixed-increment training algorithm is presented in Figure 7.

To my wife, Mary

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Richard S. Mackenzie, Professor and Director, Office of Dental Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Associate Professor of Industrial and Systems Engineering

This dissertation was submitted to the Dean of the College of Engineering and to the Graduate Council, and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Dean, Graduate School

clusion of their teeth [8]. Many of these patients can be characterized by a heavy reliance on denial, repression, and projection of their psychic disorders in order to maintain their self-concept of emotional stability [6]. Often the complaints these patients relate to the practitioner are not compatible with any objective signs.

The practitioner who manages the care of craniofacial-pain patients assumes a difficult task. For some of these patients, diagnosis is obvious. Generally, however, the craniofacial-pain patient presents a complex combination of signs and symptoms [7]. More than one disease entity normally accounts for the patient's symptoms, and most craniofacial-pain patients suffer from a pain-dysfunction complex involving a combination of masticatory muscle disorders, occlusal disharmony, emotional tension, and anxiety [5]. Nevertheless, the possibility of multiple, almost sub-clinical etiologic factors combining to produce the dysfunction and pain must be considered. The close relationship of organic and emotional disorders as they appear in craniofacial-pain patients provides the examining dentist with the problem of discriminating which factor is primary in the etiology of the patient's dysfunction [7]. Unfortunately, the temporomandibular joint is one of the most difficult areas of the body to examine radiographically [8].
Hence, with these patients, the dentist relies to a large degree on tests of emotional stability and physical examination by visualization, palpation, and auscultation [7].

Therapeutic measures for the care of craniofacial-pain patients are as varied as the factors contributing to the disorder. "A small percentage of patients with symptoms referrable to the temporomandibular joint will portray such a confusing picture that consultation with other, dental or medical specialists is indicated" [7, p. 129]. The majority of these

Step 4: Select the variable $x_1$ for assignment. The assignment vector is now $[x_2 = 1, x_3 = 1, x_1 = 0]$. Apply Procedure 2. In Procedure 2, zeroing out the column k=1 from $A_1$, $A_2$, and $A_3$ yields three reduced-feature matrices [matrix entries not recoverable from the source text].

Application of Procedure 1 to $A_1$ and $A_2$ yields the following reduced matrices: $A_1 = [\,1\,]$, $A_2 = [\,0 \;\; 0\,]$. By Step 3 of Procedure 1, no feasible convex combination of these matrices exists.

Application of Procedure 1 to $A_1$ and $A_3$ yields the following reduced matrices: $A_1 = [\,1\,]$, $A_3 = [\,0 \;\; 0\,]$. By Step 3 of Procedure 1, no feasible convex combination of these matrices exists.

Application of Procedure 1 to $A_2$ and $A_3$ yields the following reduced matrices: $A_2 = [\,1 \;\; 1\,]$, $A_3 = [\,0 \;\; 0\,]$. By Step 3 of Procedure 1, no feasible convex combination of these matrices exists.

Hence the assignment vector $[x_2 = 1, x_3 = 1, x_1 = 0]$ is feasible by the rules of Procedure 2. Go to Step 5 of the algorithm.

Step 5: The assignment vector $[x_2 = 1, x_3 = 1, x_1 = 0]$ includes an

the care system, since an 'n' visit holding time in a particular patient state can be modeled with no loss of information as 'n' repetitions of the 'virtual' transition from the state in question to itself. Care for craniofacial-pain patients is modeled as a discrete-stage Markovian system with the beginning of visits to the practitioner serving as stage indicators.
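The 'virtual' self-transition idea can be illustrated with a short sketch. It assumes, for illustration only, that a patient record is a list of (state, holding time) pairs in which the holding time counts the number of virtual transitions from the state to itself before the patient leaves it; the function names and the record below are hypothetical and are not taken from the dissertation's data.

```python
from collections import Counter


def expand_holding_times(record):
    """Expand (state, holding_time) pairs into one state per visit (stage).

    A holding time of n is represented as n repetitions of the virtual
    transition from the state to itself, so the state appears n + 1 times.
    """
    stages = []
    for state, holding_time in record:
        stages.extend([state] * (holding_time + 1))
    return stages


def count_transitions(stages):
    """Count stage-to-stage transitions, including virtual self-transitions."""
    return Counter(zip(stages, stages[1:]))


# Hypothetical record: two extra visits in state "13", then "15", then "Well".
record = [("13", 2), ("15", 0), ("Well", 0)]
stages = expand_holding_times(record)
counts = count_transitions(stages)
```

Counts of this kind, accumulated over many records, are the raw material from which transition-probability estimates of a discrete-stage Markov chain are usually formed.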
Using the history-augmented patient states, transition probabilities are specified in terms of the treatment that generated the transformation. In making a state transition following a treatment, a patient must move to a state that includes that treatment as a portion of its state description. For example, following application of treatment 'k,' a patient must progress from patient-state 'I|m,n' to 'J|k,m,n,' where 'I' may be equivalent to 'J.' The only exception to this rule is in the application of a treatment beyond its boundary number of repetitions. Here, if treatment 'k' has a boundary number of two, then following an application of treatment 'k' three or more times a patient progresses from patient-state 'I|k,k,m,n' to 'J|k,k,m,n,' where again 'I' may be equivalent to 'J.' This structure is indicated because inclusion of more than the boundary number of applications (two in this case) in the state description does not affect the transition probabilities.

Estimates of the values of the transition probabilities were obtained from the patient records discussed previously. A discussion of the stability of these probability estimates under variations in patient data is presented in Appendix E. Where the data on the effects of treatment alternatives were limited, the data-generated probability estimates were refined by estimates from the reviewing practitioners. Notationally, transition probabilities are represented in the analytic model in the

This feature-selection algorithm will be employed to find the minimum-cost collection of classifying features. For the purposes of illustration, the feature variables $x_i$, $i = 1, 2, 3$, are selected in the order $x_2$, $x_3$, $x_1$ in Step 4 of the algorithm (Section 3.4.2).
Note that this is a logical ordering of features in descending order of feature-utilization 'costs.' This prior specification of the order of variable assignments permits the construction of a tree that represents the possible solutions remaining to be considered at each step of the algorithm. This tree of possible solutions to P1 has the form [tree diagram not present in the source text].

Step 0: The algorithm is initialized with $V^* = -\infty$. Go to Step 4 of the algorithm.

Step 4: Select the variable $x_2$ for assignment. The assignment vector is now $[x_2 = 0]$. Apply Procedure 2 (Figure 4). In Procedure 2, zeroing out the column k=2 from $A_1$, $A_2$, and $A_3$ yields three reduced-feature matrices [matrix entries not recoverable from the source text]. Application of Procedure 1 (Section 3.4.1) to $A_1$ and $A_2$ yields the following reduced matrices: $A_1 = [\,0\,]$, $A_2 = [\,0\,]$,

tion of each of the '$m_i$' patterns in each of the 'p' pattern classes

$$a_i^m = [a_{i1}^m, a_{i2}^m, \ldots, a_{in}^m, 1], \quad i = 1, 2, \ldots, p,$$

where $a_{ik}^m$, $k = 1, 2, \ldots, n$, equals either zero or one, there must exist a set of '$n+1$' dimensional $W_j$'s, $j = 1, 2, \ldots, p$, such that

$$a_i^m (W_i - W_j) > 0 \quad \text{for all } m = 1, 2, \ldots, m_i;\; i = 1, 2, \ldots, p;\; j = 1, 2, \ldots, p;\; j \ne i. \tag{3}$$

Letting $A_i$ be the $m_i \times (n+1)$ dimensional matrix of patterns in pattern-class i, the requirement of (3) can be written in the following form:

$$A_i (W_i - W_j) > 0, \quad i = 1, 2, \ldots, p;\; j = 1, 2, \ldots, p;\; j \ne i.$$

If such pattern representations and $W_j$'s exist, then a solution to the following problem yields a minimum-cost collection of pattern-classifying features:

$$\text{P1:}\quad \text{minimize } CX \quad \text{subject to} \quad A_i [X (W_i - W_j)] > 0, \quad i = 1, 2, \ldots, p;\; j = 1, 2, \ldots, p.$$

BIBLIOGRAPHY

[1] U.S. Department of Health, Education, and Welfare, Cumulated Index Medicus, Washington, D.C.: U.S. Government Printing Office (1970-1973).
[2] Ahevne, P., Ryan, G.A., and Walsh, R.J., 1972 Reference Data on the Profile of Medical Practice, Chicago: Center for Health Services Research and Development, American Medical Association (1972).
[3] Bureau of Economic Research and Statistics, "1971 Survey of Dental Practice," Journal of the American Dental Association, Vol. 85 (1972) 154-158.
[4] Bruce, R.A., and Yarnall, S.R., "Computer-Aided Diagnosis of Cardiovascular Disorders," Journal of Chronic Diseases, Vol. 19 (1966) 473-484.
[5] Schwartz, L., and Chayes, C.M., Facial Pain and Mandibular Dysfunction, Philadelphia: W. B. Saunders (1968).
[6] TMJ Research Center, Conference on Function and Dysfunction of the Temporomandibular Joint Complex, Chicago: University of Illinois (1969).
[7] Mitchell, D.F., The Dental Clinics of North America, Symposium on Oral Medicine, Philadelphia: W. B. Saunders (1968).
[8] Mitchell, D.F., Standish, S.M., and Fast, T.B., Oral Diagnosis/Oral Medicine, 2nd Edition, Philadelphia: Lea & Febiger (1971).
[9] Ledley, R.S., "Practical Problems in the Use of Computers in Medical Diagnosis," Proceedings of the IEEE, Vol. 57, No. 11 (1969) 1900-1918.
[10] Lincoln, T.L., and Parker, R.D., "Medical Diagnosis Using Bayes Theorem," Health Services Research, Vol. 2, No. 1 (1967) 34-35.
[11] Bunch, W.H., and Andrew, G.M., "Use of Decision Theory in Treatment Selection," Clinical Orthopaedics and Related Research, No. 80 (1971) 39-52.

The second validating procedure established a measure of variability on the diagnostic classifications that might be given by different dental practitioners. The discussion presented in Section 1.1 related the difficulties associated with diagnosing craniofacial-pain disorders. Practitioners with varying kinds of professional experience can be expected to reflect their dissimilar backgrounds in differing diagnostic classifications for these patients. To measure the variability associated with dissimilar backgrounds, five craniofacial-pain data vectors were selected from the data base employed in constructing the craniofacial-pain diagnostic classifier.
Four dentists from the staff of the College of Dentistry at the University of Florida were asked to review these patient data vectors and assign to each of them a diagnostic classification. Table 4 summarizes their assignments and also includes the diagnostic classification originally given by the reviewing practitioners.

The variability in diagnostic assignments reflected in Table 4 reaffirms the justification for the research objectives set forth in Section 1.2. Some of the differences in the practitioners' choices of diagnostic classifications can be explained by the limited amount of data contained in each of the data vectors, and the less-than-full medical statement of each of the diagnostic alternatives. Nevertheless, a diagnostic-classification model that generates classifications that are in 90% agreement with those of experts in the field provides a sizeable improvement over the variability in classification assignments exhibited in Table 4, in which only half the respondents agreed on a single diagnosis in four out of five cases.

FIGURE 1 TEMPOROMANDIBULAR JOINT (panels: right temporomandibular articulation; inset: anatomical features of the temporomandibular joint)

3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm
3.4.3 Computational Considerations
3.5 Model Applications
4. Treatment Planning
4.1 Model Components
4.1.1 Patient States
4.1.2 Transition Probabilities
4.1.3 Cost Structure
4.2 Selection of Optimal Treatments
4.3 Model Validation
4.4 Model Applications
5. Conclusions and Future Research
Appendices
A Craniofacial-Pain Patient Data Vector
B Modified Fixed-Increment Training Algorithm
C Application of the Minimum-Cost Symptom-Selection Algorithm
D Treatment Alternatives for Craniofacial-Pain Patients
E Stability of Transition-Probability Estimates
F Flow Charts of Patient-State Transitions
G Patient-State Treatment Selections
H Application of the Patient-State-Labeling and Optimal-Treatment-Selection Procedure
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH

patient states, or 92.6% of the patient states. The 7 differences in treatment selections arise in part from the approximations the treatment-planning model employs in its representation of the care system and in part from slight inconsistencies in the practitioner's treatment selections.

One last test was performed to verify the suitability of the Markovian representation of the craniofacial-pain care system. Mean transit times through the care system to one of the terminal states were calculated using the model-generated treatment decisions and each of six first-visit patient states. These model-generated transit times were compared to estimates of the same statistics gathered from the patient records contributed by the university dental clinics. Table 5 presents the values of both sets of statistics. The close correlation of these values reveals that the treatment-planning model not only duplicates the decisions of experts, but also provides a structure for gathering other relevant information about the underlying care system.

4.4 Model Applications

Like the diagnostic-classification model presented in Chapter 3, the craniofacial-pain treatment-planning model has been structured to permit its utilization in a variety of applications. Markovian modeling provides an analytic representation of the craniofacial-pain care system as well as establishing a means of making treatment selections. This section discusses applications of the model's analytic representation and treatment selections in teaching, in research, and in practice.

The model-generated treatment decisions reveal which treatments are most frequently used in the care of craniofacial-pain patients.
In a teaching environment, this information can be used to specify treatment-

and the dependence among symptoms that are required for Bayesian analysis. With the non-parametric classifier, a diagnosis is generated for the practitioner by evaluating a discriminant function associated with each diagnostic classification, $g_i(S)$, $i = 1, 2, \ldots, n$. As was the case with the Bayesian models, the values of these discriminants are a function of the symptom-complex S exhibited by the patient. The patient's diagnostic classification corresponds to that disease whose associated discriminant-function value is maximum. That is, a patient with symptoms S is classified in disease-group i if $g_i(S) > g_k(S)$ for all $k \ne i$.

Results from some of the applications of pattern-recognition classifiers are presented in Table 1. In these test applications diagnostic accuracy was consistently high. Because of these models' ease of implementation and small storage requirements, a non-parametric pattern classifier is preferable as a vehicle for generating diagnostic classifications. The use of a non-parametric classifier is further motivated by features of the care process for craniofacial-pain patients discussed in Chapter 3.

2.3 Finite-Horizon Treatment Planning

In the realm of research on modeling the treatment-planning process, several authors [9, 21, 22] have presented schemes for analysis that utilize methods for making decisions under risk and uncertainty. The treatment-selection process has alternately been defined as a two-person zero-sum game, structured as a decision tree, and modeled as a Markov process of limited duration. Treatment costs and the 'costs' of occupying 'non well' or terminal patient states provide the basis for selecting an 'optimal' treatment plan.
Finiteness of the planning horizon is assured either by establishing a maximum permissible number of treatment

TABLE 3 TESTS OF DIAGNOSTIC CLASSIFIER ACCURACY

| Test | Number of Patient Data Vectors | Number of Data Vectors Correctly Classified | Classifier Accuracy |
| --- | --- | --- | --- |
| TEST ONE | 50 | 46 | 92.0% |
| TEST TWO | 50 | 45 | 90.0% |
| TEST THREE | 50 | 44 | 88.0% |
| TEST FOUR | 50 | 47 | 94.0% |
| TEST FIVE | 50 | 45 | 90.0% |
| TEST SIX | 51 | 43 | 84.3% |

Mean Classifier Accuracy: 89.7%
Standard Deviation of Classifier Accuracy: 3.5%

cost for employing a new test, the algorithm returns an evaluation of the test's classifying capability. The algorithm reveals whether the test is included in the minimum-cost collection of features and whether the use of the new test permits the practitioner to discontinue other examination procedures. Additionally, the algorithm can be employed to point out new areas for research, as it isolates diagnostic alternatives where correct classification of patients is difficult using existing tests and procedures.

As employed in the practitioner's office, the diagnostic classifier will provide a direct link between the practicing dentist and the knowledge of experts in the field of craniofacial pain. Information will flow over the link in both directions. As new patients are seen by the practitioner, the record of each visit will be reviewed by experts and then used to supplement the data base employed in model construction. Then, when developments dictate, new sets of discriminant-function weights can be transmitted to the dental practitioners. This kind of interaction results in a more accurate and representative diagnostic classifier as the patient-sample data base becomes larger.
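The per-test and mean accuracies reported in Table 3 can be recomputed from the counts in the table. The sketch below is only an arithmetic check; the variable names are hypothetical, and the reported standard deviation is not recomputed because the text does not say which estimator was used.

```python
# (number of patient data vectors, number correctly classified) from Table 3
table_3 = {
    "ONE": (50, 46),
    "TWO": (50, 45),
    "THREE": (50, 44),
    "FOUR": (50, 47),
    "FIVE": (50, 45),
    "SIX": (51, 43),
}

# Accuracy of each test, in percent
accuracy = {name: 100.0 * correct / total
            for name, (total, correct) in table_3.items()}

# Unweighted mean of the six test accuracies, in percent
mean_accuracy = sum(accuracy.values()) / len(accuracy)

for name, value in accuracy.items():
    print(f"TEST {name}: {value:.1f}%")
print(f"Mean classifier accuracy: {mean_accuracy:.1f}%")
```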
ACKNOWLEDGEMENTS

Without the considerable contributions of time and effort by the members of his committee, it would have been impossible for the author to have completed this dissertation. In particular, the author expresses gratitude to his Chairman, Dr. Kerry Kilpatrick, for his encouragement and direction during the course of this research effort. The author also thanks Dr. Kilpatrick for his editorial assistance during the development and organization of this manuscript. The author thanks Dr. Richard Mackenzie and Dr. Stephen Roberts for providing the initial direction for this research. Additionally, the author is grateful to Dr. Thom Hodgson and Dr. Donald Ratliff for their assistance in evaluating and refining the author's ideas throughout this project.

The author expresses his gratitude to Dr. Thomas Fast and Dr. Parker Mahan for the contribution of their extensive knowledge about craniofacial pain to the author's research. The author is deeply appreciative of Dr. Fast's and Dr. Mahan's willingness to spend many hours examining dental records and their endurance of the nomenclature and idiosyncrasies of this mathematical-modeling effort.

Financial support for this research was provided by the Health Systems Research Division, J. Hillis Miller Health Center.
The division's support, in conjunction with a traineeship granted by the National Science Foundation, made it possible for the author to undertake this research. The author is also grateful to the Industrial and Systems Engineering Department for the contribution of computer funds. Additionally the author thanks Dr. William Solberg, University of California at Los Angeles;

1. Temporomandibular Joint Arthritis - Developmental
2. Temporomandibular Joint Arthritis - Infectious
3. Temporomandibular Joint Arthritis - Osteo (Degenerative)
4. Temporomandibular Joint Arthritis - Traumatic (Acute)
5. Temporomandibular Joint Arthritis - Traumatic (Chronic)
6. Myopathy - Acute Trauma
7. Myopathy - Myositis
8. Oral Pathology - Dental Pathology
9. Vascular Changes - Migrainous Vascular Changes
10. Myofacial Pain-Dysfunction - Malocclusion - Balancing Interferences
11. Myofacial Pain-Dysfunction - Malocclusion - Lateral Deviation of Slide
12. Myofacial Pain-Dysfunction - Malocclusion - Uneven Centric Stops
13. Myofacial Pain-Dysfunction - Psychoneurosis - Anxiety/Depression
14. Myofacial Pain-Dysfunction - Bruxism
15. Myofacial Pain-Dysfunction - Reflex Protective Muscular Contracture
16. Myofacial Pain-Dysfunction - Loss of Posterior Occlusion
17. Neuropathy

FIGURE 3 CRANIOFACIAL-PAIN DIAGNOSTIC ALTERNATIVES

Assume $a_1^X = [1,0]$, $a_2^X = [0,1]$, $a_1^Y = [0,0]$, and $a_2^Y = [1,1]$. Graphically this pattern space can be represented as [plot of the four pattern points not present in the source text], where the line X from $a_1^X$ to $a_2^X$ represents the convex hull of pattern-class X and the line Y from $a_1^Y$ to $a_2^Y$ represents the convex hull of pattern-class Y. Since the lines X and Y intersect, the pattern space is not linearly separable, and hence it is impossible to draw a discriminating hyperplane.

Therefore, the following condition is equivalent to condition (4): a vector X is feasible to P1 if and only if there do not exist $U^s$ and $U^t$ such that

$$A_s U^s = A_t U^t \quad \text{for any } s = 1, 2, \ldots, p;\; t = 1, 2, \ldots, p;\; s \ne t, \tag{5}$$

where $U^i = [u_1^i, u_2^i, \ldots, u_{m_i}^i]$, $u_k^i \ge 0$ for all $k = 1, 2, \ldots, m_i$, and $\sum_{k=1}^{m_i} u_k^i = 1$ for all $i = 1, 2, \ldots, p$.

Checking the feasibility of some vector X by condition (5) yields $[p(p-1)]/2$ distinct subproblems. Each of these subproblems may be

LIST OF FIGURES

1. Temporomandibular Joint
2. Diagnostic-Classification and Treatment-Planning Process for Craniofacial Pain
3. Craniofacial-Pain Diagnostic Alternatives
4. Procedure 2
5. Diagnostic-Classification Transitions
6. Patient-Visit Inconvenience Cost
7. Application of the Modified Fixed-Increment Algorithm
8. Multiple-State History-Augmented Process

Given the training sample of the form $a = [a_1, a_2, 1]$, where $a^1 = [0,0,1]$, $a^2 = [1,0,1]$, and $a^3 = [0,1,1]$, the training sample patterns can be represented in 3-dimensional space by [plot not present in the source text]. The modified fixed-increment algorithm with $\alpha = 0$ and $\beta = 1$ proceeds as follows (* indicates correct sample classification):

| Sample | $W_1$ | $W_2$ | $W_3$ | $aW_1$ | $aW_2$ | $aW_3$ |
| --- | --- | --- | --- | --- | --- | --- |
| [0,0,1] | [0,0,0] | [0,0,0] | [0,0,0] | 0 | 0 | 0 |
| [1,0,1] | [0,0,2] | [0,0,-1] | [0,0,-1] | 2 | -1 | -1 |
| [0,1,1] | [-1,0,1] | [2,0,1] | [-1,0,-2] | 1 | 1 | -2 |
| [0,0,1] | [-1,-1,0] | [2,-1,0] | [-1,2,0] | 0 | 0 | 0 |
| [1,0,1] | [-1,-1,2] | [2,-1,-1] | [-1,2,-1] | 1 | 1 | -2 |
| *[0,1,1] | [-2,-1,1] | [3,-1,0] | [-1,2,-1] | 0 | -1 | 1 |
| *[0,0,1] | [-2,-1,1] | [3,-1,0] | [-1,2,-1] | 1 | 0 | -1 |
| *[1,0,1] | [-2,-1,1] | [3,-1,0] | [-1,2,-1] | -1 | 3 | -2 |

Hence, the set of weights generated by this training sample is $W_1 = [-2,-1,1]$, $W_2 = [3,-1,0]$, $W_3 = [-1,2,-1]$.

FIGURE 7 APPLICATION OF THE MODIFIED FIXED-INCREMENT ALGORITHM

CHAPTER 4 TREATMENT PLANNING

The selection of treatment regimens for craniofacial-pain patients is modeled as a Markovian decision process. The states in this Markovian model are descriptions of a patient's health-care status, and the decision alternatives are feasible treatments for the patient's dysfunction (see Section 4.1). In the first two sections of this chapter, motivation for the model structure is provided and the components of the decision model are developed.
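The trace in Figure 7 can be reproduced with a short sketch. The update rule used here is an assumption inferred from the trace itself rather than from the algorithm statement in Appendix B, which is not reproduced in this text: whenever the correct class i fails to beat some class j by more than alpha on a pattern a, a is subtracted from the weights of class j and added to the weights of class i, scaled by beta. The function names, the zero-based class labels, and the stopping test (one full clean pass) are hypothetical choices.

```python
def dot(w, a):
    """Discriminant value: inner product of a weight vector and a pattern."""
    return sum(wi * ai for wi, ai in zip(w, a))


def train_fixed_increment(samples, n_classes, alpha=0, beta=1, max_cycles=15):
    """Fixed-increment style training; update rule inferred from Figure 7.

    samples: list of (augmented pattern, zero-based class label).
    Returns the weight vectors after one full pass with no corrections,
    or None if max_cycles passes are used up first.
    """
    dim = len(samples[0][0])
    weights = [[0] * dim for _ in range(n_classes)]
    clean_run = 0  # consecutive samples classified correctly
    for _ in range(max_cycles):
        for pattern, label in samples:
            scores = [dot(w, pattern) for w in weights]
            violators = [j for j in range(n_classes)
                         if j != label and scores[label] <= scores[j] + alpha]
            if not violators:
                clean_run += 1
                if clean_run == len(samples):
                    return weights
                continue
            clean_run = 0
            for j in violators:
                weights[j] = [w - beta * a for w, a in zip(weights[j], pattern)]
                weights[label] = [w + beta * a
                                  for w, a in zip(weights[label], pattern)]
    return None


def classify(weights, pattern):
    """Assign the class whose discriminant value is largest."""
    scores = [dot(w, pattern) for w in weights]
    return scores.index(max(scores))


# Training sample of Figure 7: one pattern per class, classes numbered from 0.
figure_7_sample = [([0, 0, 1], 0), ([1, 0, 1], 1), ([0, 1, 1], 2)]
trained = train_fixed_increment(figure_7_sample, n_classes=3)
```

Under these assumptions the sketch ends with the same three weight vectors listed in Figure 7, and `classify` then assigns each training pattern to its own class.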
The third section provides a descrip tion of the validating procedures used to determine the appropriateness of the model and the model-generated treatment decisions. This chapter closes with a discussion of potential teaching, research, and private practice applications of the treatment-planning model. 4.1 Model Components Several model-building components from the craniofacial-pain care system are isolated to permit the construction of a Markovian represen tation of this system. A set of state descriptions that characterize, for decision-making purposes, the status of craniofacial-pain patients is presented in Section 4.1.1. Then transition probabilities measuring the effects of treatment applications are discussed in Section 4.1.2. Section 4.1.3 overlays the model's state descriptions and transition probabilities with costs accrued during the patient's progression -through the care system. These components are integrated and verified in the discussions of Sections 4.2 and 4.3. 59 49 Step 2: If for sane row k in AQ (Bq) each element in the corre sponding rcw of Bq (Aq) is equal to zero, then row k of A and B can be eliminated by Condition 2. Step 3: Step 4: Step 5; Step 6: If for sane row k in Aq (Bq) the corresponding row in Bq (Aq) has all elements equal to one or if for sane row (B1) elerrents equal to zero, then this particular subproblem P2 has no feasible solution by Condition 1. Procedure 1 and the search for a solution to P2 are terminated at this point because the convex hulls of pattern-classes A and B do not intersect. If for sane row k in (B^) the corresponding row in (a) has one or more elements equal to zero, i.e., , k k _,k n ,k_ k_ k r s t r s t columns b ,b ,...,b. (a ,a ,...,a ) can be eliminated by XT S l XT S t Condition 3. If for sane row k in AQ (BQ) the corresponding row in Bq (Aq) has one or more elements equal to one, i.e., = b* =.. = 1 (aÂ¡W=.. .=a^=l) then X S u XT S u columns b ,b ,...,b. 
($a_r, a_s, \ldots, a_t$) can be eliminated by Condition 4.

Step 6: If the use of Steps 1, 2, 4, and 5 has eliminated all elements of both matrices, then this particular subproblem has an infinite number of feasible solutions by Condition 2. Procedure 1 and the search for a solution to P2 are terminated at this point because the convex hulls of the pattern-classes A and B intersect.

CHAPTER 3
DIAGNOSTIC CLASSIFICATION

The analytic model developed to provide diagnostic classifications for craniofacial-pain patients is based on the principles employed in non-parametric pattern classification. The patterns classified by this diagnostic model are vector representations (see Section 3.1 and Appendix A) of the craniofacial-pain patient's physical and emotional status. In the first sections of this chapter the theoretical background for the diagnostic model is established. This discussion is followed by a presentation of the validation procedures used to evaluate model performance. Next, an algorithm is developed to reduce the 'costs' associated with model utilization. The chapter closes with a discussion of potential applications of the craniofacial-pain diagnostic classifier in teaching, in research, and in the health-care process.

3.1 Model Components

In the initial phase of the development of the diagnostic-classification model a set of possible alternative diagnostic classifications was established for craniofacial-pain patients. Figure 3 provides a list of these possible classifications. Note that the alternative classifications in Figure 3 are not mutually exclusive, as a craniofacial-pain patient classified in some diagnostic alternative 'A' could also have the disorder specified by some other diagnostic alternative 'B.' However, for the purposes of this dissertation, each patient's diagnostic
category. The cost estimates reflect typical charges in a dental clinic environment.

The inconvenience experienced by a patient in making a visit to the practitioner was used as a measure of the cost of occupying a 'non-well' patient state. Estimates of this inconvenience cost were gathered from responses to a questionnaire completed by patients at the University of Florida's Dental Clinic. These were general dental patients not necessarily suffering from craniofacial pain. Figure 6 shows the distribution of these patient estimates.

Values for patient-referral costs were composed of the sum of three distinct estimates. The first component was an estimate of the total fee charged by the practitioner receiving the referred craniofacial-pain patient. Record transferral and duplication costs, as well as the fees lost by the referring practitioner, formed the second component. The third component of the patient-referral cost is a measure of the inconvenience experienced by the referred patient, a value estimated by using a multiple of the value of the inconvenience cost discussed in the preceding paragraph. Appendix G provides a justification for using this particular combination of components in the referred-cost estimates.
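As a rough illustration of how a referral cost built from these three components could enter a treat-or-refer decision, the sketch below uses a generic value iteration over a tiny hypothetical model. It is not the dissertation's labeling and optimization procedure: every number, state name, and function name is invented for the example, costs are written as positive quantities to be minimized (the text represents them as negative constants), and any probability mass not listed for a treatment is assumed to lead to the 'well' state at no further cost.

```python
def referral_cost(receiving_fee, transfer_and_lost_fees, visit_inconvenience, multiple):
    """Sum of the three referral-cost components described in the text.

    The third component is a multiple of the per-visit inconvenience cost;
    the value of that multiple is a placeholder here.
    """
    return receiving_fee + transfer_and_lost_fees + multiple * visit_inconvenience


def select_treatments(transitions, treatment_cost, refer_cost, sweeps=200):
    """Generic value iteration: in each state, treat or refer, whichever is cheaper.

    transitions[state][treatment] maps next transient states to probabilities.
    """
    value = {s: 0.0 for s in transitions}
    policy = {s: "refer" for s in transitions}
    for _ in range(sweeps):
        for s, options in transitions.items():
            best_action, best_cost = "refer", refer_cost
            for k, probs in options.items():
                cost = treatment_cost[k] + sum(p * value[nxt] for nxt, p in probs.items())
                if cost < best_cost:
                    best_action, best_cost = k, cost
            value[s], policy[s] = best_cost, best_action
    return policy, value


# Hypothetical one-state example: treatment "24" costs 10 per application and
# leaves the patient in state "A" half the time (otherwise the patient is well).
transitions = {"A": {"24": {"A": 0.5}}}
treatment_cost = {"24": 10.0}

high = referral_cost(20.0, 5.0, 2.5, multiple=2)   # 20 + 5 + 2 * 2.5
policy_high, value_high = select_treatments(transitions, treatment_cost, high)

low = referral_cost(5.0, 2.0, 2.5, multiple=2)     # 5 + 2 + 2 * 2.5
policy_low, value_low = select_treatments(transitions, treatment_cost, low)
```

With the higher referral cost the sketch keeps treating the patient; with the lower one it refers, which is the sensitivity to referral cost that the treatment-planning discussion emphasizes.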
Symbolically, the patient-state transition costs (negative constants) are represented in the analytical model as

$c_{IJ}^{k}$ = the sum of the costs generated by the transition from patient-state 'I' to patient-state 'J' following the application of treatment 'k.'

This sum includes the type (a), (b), (c), and (d) costs appropriate to each patient-state transition.

APPENDIX G
PATIENT-STATE TREATMENT SELECTIONS

Patient State    Model Selection    Practitioner Selection
1                16                 16
1|16             16                 16
2                24                 24
2|24             24                 24
2|24,24          24                 Refer*+
3                23                 23*
3|12             Refer              12+
3|23             41                 41
3|24             24                 24
3|32             32                 32
3|23,41          Refer              Refer
3|32,32          Refer              Refer
4                35                 35*
4|20             Refer              Refer
4|24             24                 24
4|34             34                 34
4|35             24                 24
4|20,20          Refer              Refer
4|24,24          24                 24
4|24,35          24                 24
4|34,34          34                 34
5                35                 35
5|16             Refer              17+
5|24             17                 17
5|35             24                 24
5|36             24                 24
5|24,35          24                 24
6                24                 24
7                33                 33
7|12             24                 24
7|18             12                 12
7|33             24                 24
7|12,24          16                 16
8                12                 12
8|12             12                 12*
8|18             12                 12
8|24             18                 18
8|12,12          16                 16
8|18,24          24                 24
8|34,41          12                 12

Analysis by Sex

Transitions into    Male    Pre-menopausal Female    Menopausal or Post-menopausal Female
8                   0       1                        0
13                  1       11                       2
15                  0       1                        0
Well                2       11                       1

$\chi^2$ = 1.250 with 6 degrees of freedom

Hence, the analysis reveals that the sex of the patient is not significant in determining estimates of transition probabilities out of Diagnostic Alternative 13 following application of treatment 24, as $\chi^2_{6}$ = 12.592.

Analysis by Age Group

Transitions into: 8, 13, 15, Well

$\chi^2$ = 2.286 with 3 degrees of freedom

Hence, the analysis reveals that age group of the patient is not significant in determining estimates of transition probabilities out of Diagnostic Alternative 13 following application of treatment 24, as $\chi^2_{3}$ = 7.815.
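The chi-square statistic reported for the Analysis by Sex table can be checked with a few lines of code. The sketch below assumes the cell counts as read from that table (rows: transitions into 8, 13, 15, Well; columns: male, pre-menopausal female, menopausal or post-menopausal female) and the usual Pearson test of independence. The function name is hypothetical, and reading 12.592 as the 5% critical value for 6 degrees of freedom is an assumption, since the significance level is not stated in this section.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Counts assumed from the Analysis by Sex table.
sex_table = [
    [0, 1, 0],    # transitions into 8
    [1, 11, 2],   # transitions into 13
    [0, 1, 0],    # transitions into 15
    [2, 11, 1],   # transitions into Well
]
statistic, df = chi_square_independence(sex_table)

CRITICAL_6_DF = 12.592  # value quoted in the text for 6 degrees of freedom
is_significant = statistic > CRITICAL_6_DF
```

The statistic falls well below the quoted critical value, consistent with the conclusion that patient sex is not a significant factor for these transitions.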
Analysis by Age Group (cell counts)

Transitions into    20-39 Years    40-55 Years
8                   0              1
13                  8              6
15                  0              1
Well                7              7

Patient State    Model Selection    Practitioner Selection
16               25                 25
17               Refer              Refer

Note:
* indicates that the reviewing practitioners made their selection of treatment from a set of alternatives that did not include the 'most appropriate' treatment alternative (Section 4.3)
+ indicates a difference between the treatment selections made by the treatment-planning model and the reviewing practitioners.

The model-generated treatment selections agree with the reviewing practitioners' selections in 87 out of 94 patient states. This represents a 92.6% agreement between the two sources of treatment selections.

In terms of the craniofacial-pain treatment-planning model's treatment selections, the patient-referral costs are the most significant of the model's components. For each of the model's patient states, the cost of referral out of that state is used in making the decision whether to continue treatment for a patient, or whether to suggest that he go to another source of care. If this cost is set too low, then patients who should be treated in the craniofacial-pain care system are inappropriately referred out of the system. On the other hand, too high a referral cost leads the model to suggest that patients remain in this care system when it would be to their advantage to seek care elsewhere. For these reasons, this cost was the subject of considerable examination in the building of the treatment-planning model.

The reviewing practitioners suggested three possible alternative formats for the cost of referring a patient out of the craniofacial-pain care system. These were:

and for the states of set $B_j$, $j = 1, 2, \ldots$
, T, where

t = the number of the last non-group $B_j$ state immediately preceding the smallest number-labeled state in $B_j$.

Thus, the process of selecting optimal treatments proceeds recursively from the state of smallest number-label to the one of largest number-label, stopping to consider simultaneously the values of a number of states only when an analysis set is encountered. Howard's value iteration and policy improvement algorithm [25] is employed only in the case of selecting treatments for the analysis-set patient states. An example of this section's labeling and optimization procedure is presented in Appendix H.

This optimization procedure was applied to the states of the craniofacial-pain treatment-planning model. Appendix G presents a list of the optimal treatment selections for each of the model's patient states.

4.3 Model Validation

Validation of the craniofacial-pain treatment-planning model was accomplished in two phases. In the first phase of validation, the individual components of the Markovian representation were examined by the reviewing practitioners. The second phase of model validation compared model-generated treatment decisions with those made by the reviewing experts. In addition, statistics generated by the model were compared to the care-system description provided by the patient records from the university dental clinics. This section discusses the results of these validating efforts.

3.4 Minimum-Cost Symptom-Selection Algorithm

The craniofacial-pain diagnostic-classification model detailed in the previous sections of this chapter has been structured upon the data vector of the 295 relevant signs, symptoms, and items of patient history shown in Appendix A. To utilize this model, the practitioner must examine a patient for the presence or absence of each of these data vector elements.
Although the cost in time and fees varies from item to item, there is an expense to the practitioner, and to the patient, associated with checking each element in the data vector. Hence, it is logical to investigate the possibility of finding a reduced data vector that 'costs' less for the patient and practitioner to use and yet still permits correct classification of all craniofacial-pain patients.

A review of the literature (see Meisel [23] Chapter 9 for a survey) reveals that many authors have considered the task of selecting a set of features to be used in a pattern-classification scheme. Traditional methods of viewing this problem are based on a search for a transformation that takes a given set of patterns into some 'new' pattern space where separation by discriminant functions is possible. Measures of pattern class separability are employed to evaluate the effects of transforming the set of patterns from one space to another. In general, these transformations take a pattern representation in 'n' features and create a set of 'r' (r < n) ... However, to reduce the 'costs' associated with using the craniofacial-pain diagnostic classifier, a transformation must be found that decreases the size of the data-vector pattern space by eliminating features rather than combining them. For example, assume patients were diagnosed on

Given this summary statement, there are several logical extensions to this dissertation's research that should be examined in future investigations. The following suggestions identify some of the more fruitful areas for further research efforts. These suggestions are ordered in the author's view of their significance.

1. This dissertation's research found that first-order decision-making models are valid descriptions of the underlying thought processes employed by the craniofacial-pain practitioner.
It is possible that these first-order descriptive decisions are 'suboptimal' and that higher order decision-making tools might yield prescriptive, or 'optimal,' diagnostic classifications and treatment plans for craniofacial-pain patients. That is, considering the interaction between significant symptoms and multiple-state dependency for patient-state transitions may lead to optimal diagnostic and treatment-selection decisions. As the models themselves can readily be increased in their decision-making 'order,' an investigation into this possibility would be hampered only by the necessity of collecting an elaborate data base. Nevertheless, such an investigation should be undertaken in this, the most significant, of future research areas.

2. As this dissertation's analytic models can be applied directly to any health-care problem where there is verification that practitioners make first-order decisions, one potential avenue of future research would be to isolate those health environments where these kinds of decisions are made. However, a word of caution is interjected at this point. Mathematical modeling demands an underlying structure for the process being modeled. Yet, in a process dealing with a product that is subject to considerable variation, such as the care of a patient in a health-care system, isolating an underlying process structure is difficult. Moreover,

3.3 Model Validation

Validation of the craniofacial-pain diagnostic-classification model presented in Section 3.1 has been accomplished by three types of validating procedures. The discussion presented in the preceding sections, and in particular the relationship between significant symptoms and their associated weights shown in Table 2, reveals a close proximity between the decision-making process the practitioner utilizes and the non-parametric classifier's symptom-weighing scheme.
This section presents two other procedures employed in evaluating the diagnostic classification model's performance.

The first procedure involved testing the diagnostic accuracy of the classification model on patient data that were not employed in model construction. Six classification tests were run in sequential order. In the first five of these tests random samples of 50 patient-data-vectors were drawn from the data base of 480 vectors discussed in Section 3.1. Then, as each of the tests was performed, the training algorithm in Appendix B was applied to the remaining 430 data vectors. With the weights derived from the training algorithm, the sample of 50 patients was classified. The model-generated classifications for each of the data vectors were compared to the classifications assigned to the vectors when they were created. As each test classification of a sample was completed, the diagnostic classifier's discriminant-function weights were set equal to zero, the sample of data vectors was returned to the data base, and the next test's random sample was drawn. A summary of the results of these tests of diagnostic accuracy is presented in Table 3.

In each of the first five tests it was possible for a patient who has had multiple practitioner-visits to have some of the vectors repre-

This research objective will be met by developing:

1. A diagnostic-classification model based on the theory of non-parametric pattern classification, with
   a. criteria for applicability of the modeling technique to diagnostic classification
   b. model validation for craniofacial-pain patients
   c. development of a minimum-cost symptom-selection algorithm
2. A Markovian representation of the treatment-selection process, with
   a. justification for utilizing a Markovian model of the underlying care system
   b. model validation for craniofacial-pain patients
3. A description of potential model applications in teaching, research, and practice.
1.3 Dissertation Overview

In Chapter 1 the motivation and scope of this dissertation were presented. Chapter 2 provides a review of literature relevant to the diagnostic and treatment-selection processes. A model of the diagnostic-classification process is developed in Chapter 3. Chapter 4 follows with an analytic representation of the treatment-planning process. Conclusions derived from this model-building effort, and suggestions for future research, are presented in Chapter 5.

Analysis by Number of Replications of Treatment 24

Transitions into    1st Application    2nd Application    3rd Application
8                   1                  0                  0
13                  9                  5                  0
15                  1                  0                  0
Well                6                  3                  5

$\chi^2$ = 8.099 with 6 degrees of freedom

Hence, the analysis reveals that the number of replications is not significant in determining estimates of transition probabilities out of Diagnostic Alternative 13 following application of treatment 24, as $\chi^2_{6}$ = 12.592.

Thus, for Diagnostic Alternative 13, the five factors of patient variation do not affect transition-probability estimates of the effectiveness of treatment 24.

The last type of analysis performed, analysis by number of treatment replications, established treatment-application boundary numbers. If the analysis revealed no significant effect for differences in treatment repetitions, then the boundary number for the treatment was set at zero or one by the reviewing practitioner. Note that a zero boundary-application number for a treatment alternative implies that a record of that treatment provides no additional information about the patient's progression through the care system, and, therefore, the treatment-planning model does not add a record of the treatments to its patient-state descriptions.
If the analysis revealed a significant effect for treatment repetitions, the reviewing practitioners examined the data-based estimates of transition probabilities associated with multiple repetitions of the treatment

Treatment Application Number    Treatments
34                              Drug Therapy, Heat Therapy, and Physical Therapy
35                              Drug Therapy, Occlusal Adjustment, and Physical Therapy
36                              Fixation, Heat Therapy, and Physical Therapy
41                              Drug Therapy, Fixation, Heat Therapy, and Physical Therapy

The review of model components was accomplished as values for the model parameters were collected. Some of the data-based estimates of transition probabilities and boundary-level application numbers did not conform to expert judgment about the effects and effectiveness of various treatment applications. When these disparities occurred, the estimates were modified to reflect expert judgment.

The general structure of the patient states was reviewed to ensure that the representation shown in Appendix F did in fact portray a set of logical progressions through the care system. Although this examination established the validity of the patient progressions, the review did point out one deficiency in the model's structure. The number and types of treatment alternatives available for use at each patient state were determined by records of actual applications of these treatments in the data used for model construction. It was the judgment of the reviewing practitioners that in several cases the selection of treatment alternatives for a patient state did not include the 'most appropriate' treatment alternative. Nevertheless, this model deficiency can readily be corrected. With the collection of data on the effects of these 'most appropriate' treatments, these additional treatment alternatives can be incorporated as decision alternatives for the patient states in question.

The reviewing practitioners made selections of treatments for each of the model's patient states.
In those cases where the model's treatment alternatives did not include the practitioners' 'most appropriate' choice of treatments, the practitioners made a selection from the same list of alternatives used by the model. Appendix G lists their choices of treatment along with each model-generated selection. The two sets of treatment plans include the same treatment selection for 87 out of 94

application techniques that should be emphasized in training dental students in craniofacial-pain care. Moreover, the parameters employed in model development, in particular the transition probabilities and referral costs, are themselves valuable instructional materials in developing the dental student's treatment-selection skills.

The treatment-planning model provides a method for evaluating new developments in treatment for craniofacial-pain patients. With estimates of the effectiveness of his new treatment, the researcher can use the craniofacial-pain treatment-planning model to get two immediate responses. First, the optimization technique of Section 4.2 will determine if this new treatment provides 'better care' for the patient than any of the other treatment alternatives the model has to choose from. Second, if optimal treatment selections for the model include the new treatment, the model's statistics will show improvement in length of stay, and other relevant measures of treatment effectiveness, introduced by using this new treatment.

In the office of the practicing dentist, the treatment-planning model's decisions could provide a concise reference of the treatment selections suggested by experts in the field of craniofacial pain. Moreover, the practitioner would have a chance to contribute to the refinement of the listing as the treatment records of his patients could supplement the data used in model construction.
In addition, the practitioner could employ the statistics associated with the treatment-planning model in scheduling the length, and number, of his appointments for craniofacial-pain patients.

where $v_r > 0$ and $\sum_{j=1}^{m_j} v_j = 1$,

$$\sum_{j=1}^{m_j} v_j b_j^k > \sum_{i=1}^{m_i} u_i a_i^k = 0$$

for any choice of $u_i$ such that $u_i \ge 0$ and $\sum_{i=1}^{m_i} u_i = 1$. Hence, if $v_r > 0$, there exist no $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$ and $v_j \ge 0$, $\sum_{j=1}^{m_j} v_j = 1$ such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j .$$

Condition 4: If the $k$th row of A has all elements $a_i^k$, $i = 1, 2, \ldots, m_i$, equal to one, and some $b_r^k$ equals zero, no $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$ and $v_j \ge 0$, $v_r > 0$, $\sum_{j=1}^{m_j} v_j = 1$ exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j .$$

Justification 4: Condition 4 is similar to Condition 3 in that any convex combination of the columns of B that includes a non-zero product of the $r$th column yields a $k$th row term whose value cannot equal any convex combination of the $k$th row elements of A. Symbolically, for any choice of $u_i$ and $v_j$, where $v_r > 0$,

$$\sum_{j=1}^{m_j} v_j b_j^k < \sum_{i=1}^{m_i} u_i a_i^k = 1 .$$

that they might readily be employed by some future investigator. Actual applications of the models should yield significant contributions to the effectiveness of the teacher, researcher, and practitioner.

($a_r = 1$) or absence ($a_r = 0$) of patient-data-vector item r; and the $w_{jk}$, $k = 1, 2, \ldots, n+1$, are constants associated with the $j$th discriminant function called 'weights.'

These discriminant-function weights, $w_{jk}$, $j = 1, 2, \ldots, p$, $k = 1, 2, \ldots, n+1$, provide an analytic means of duplicating the correct classification of each pattern observed by the non-parametric classifier. They provide a link between a pattern's correct classification and the individual components of the pattern's vector representation. In essence, each discriminant's weights are additive elements whose component sums have significance in terms of isolating a pattern's correct classification.
These weights are a mathematical means of storing information already known about the correct classification of observed pattern vectors. Moreover, the weights can be interpreted from the point of view of the significance that the practitioner places on each data-vector component. A discussion of this interpretation of the discriminant-function weights appears in Section 3.2.

Central to the use of linear discriminant functions is the assumption that the space of observable patient data vectors is linearly separable, for by definition [24], a pattern space A is linear and its subsets of patterns $A_1, A_2, \ldots, A_p$ are linearly separable if and only if linear discriminant functions $g_1, g_2, \ldots, g_p$ exist such that for all a in $A_i$

$$g_i(a) > g_j(a) \quad \text{for all } i = 1, 2, \ldots, p, \; j = 1, 2, \ldots, p, \; j \ne i .$$

In the context of diagnostic classification, the assumption of linear separability implies that there exists a set of hyperplanes that partition the space of observable patient data vectors into convex homogeneous regions, each region representing a unique diagnostic classification.

algorithm does find the minimum-cost collection of features X* and the total cost associated with using these features, and guarantees the existence of $W_i^*$ vectors associated with this optimal feature set. Given this guarantee, the modified fixed-increment algorithm from Appendix B can be employed to find the vectors $W_i^*$, $i = 1, 2, \ldots, p$.

Choose some solution to P1. By hypothesis there exists at least one solution $(X, W_1, W_2, \ldots, W_p)$ to P1 where X = [1,1,...,1,1]. Suppose there is some other solution $(\hat{X}, \hat{W}_1, \hat{W}_2, \ldots, \hat{W}_p)$ where one or more elements $\hat{x}_k$ in the $\hat{X}$ vector are equal to zero. For the constraint matrices in P1,

$$A_i [D_{\hat{X}} (\hat{W}_i - \hat{W}_j)] > 0 \quad i = 1, 2, \ldots, p, \; j = 1, 2, \ldots, p, \; j \ne i .$$

If the matrix products $[A_i D_{\hat{X}}] = \hat{A}_i$, $i = 1, 2, \ldots, p$, are constructed, then each set of constraints in P1 can be written in the form

$$\hat{A}_i (\hat{W}_i - \hat{W}_j) > 0 \quad i = 1, 2, \ldots, p, \; j = 1, 2, \ldots, p, \; j \ne i . \qquad (4)$$

The creation of the $\hat{A}_i$ is called the zeroing process.
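A minimal sketch of the zeroing process follows, assuming that the matrix D in the product above is the diagonal matrix formed from the 0/1 feature-selection vector X, so that the product keeps column k of a pattern matrix where x_k = 1 and replaces it with zeros where x_k = 0. The function name and the small example matrix are hypothetical; rows are taken to be pattern vectors and columns to be features.

```python
def zero_columns(patterns, x):
    """Return the zeroed matrix A * diag(x) for a 0/1 selection vector x.

    Column k is kept where x[k] == 1 and replaced by zeros where x[k] == 0.
    """
    return [[value * keep for value, keep in zip(row, x)] for row in patterns]


# Hypothetical pattern matrix: two augmented patient-data-vectors as rows.
A = [
    [0, 1, 1],
    [1, 1, 1],
]
x = [1, 0, 1]  # drop the second feature, keep the first and the augmenting 1
A_hat = zero_columns(A, x)
```

Because the zeroed matrix has the same shape as the original, the same weight vectors and constraint form can be reused for any candidate X.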
Of the columns of $A_i$, $\hat{A}_i$ retains all columns j of $A_i$ where $\hat{x}_j = 1$, and substitutes a column of zeros for each of those columns k in $A_i$ where $\hat{x}_k = 0$. Using the zeroing process, the feasibility of any possible solution vector X to P1 can be examined in terms of the $A_i D_X$ this vector X creates.

As an example of the zeroing process for a particular set of patterns, let $a^i$ be a two-dimensional patient-data-vector $a^i = [a_1^i, a_2^i]$ where

$a_1^i$ = 0 if patient i has normal body temperature
$a_1^i$ = 1 if patient i has abnormal body temperature

and

Here it is possible for the patient to alternate between any one of several diagnostic classifications during the course of his stay in the care system. Note that in both formats for diagnostic-classification transitions a patient moves into the referred state not as a result of a treatment application, but rather as an alternative to further treatment.

To these underlying diagnostic-classification transitions the craniofacial-pain treatment-planning model adds a record of the changes in treatment history. Appendix F displays complete charts of all of the diagnostic-alternative-based patient states included in the treatment-selection model. In these charts the patient states are connected by arcs that represent feasible transitions from one state to another. Not shown in the charts are the well and referred patient states and the arcs that connect every diagnostic-alternative-based state with these terminal states.

Howard [25] establishes that in terms of the policy decisions generated by a Markovian decision model, holding-time distributions are important only insofar as they affect the mean waiting time in each system state and the expected costs of each state occupancy.
The records of the patient visits employed in model construction revealed that, in the care of the patients described by the data, one or more treatments were prescribed at each visit, and a series of return visits was scheduled for the patient following his initial interaction with the practitioner if return visits were warranted. Under these conditions, specifying holding-time distributions for the time between successive patient-state transitions does not refine the model. Therefore, the treatment-planning model employs a Markovian rather than semi-Markovian representation of

List of Drugs Taken
034 Mild Analgesics: Aspirin, APC, etc.
035 Moderate Analgesics (non-narcotic)
036 Strong Analgesics: Narcotics and Synthetic Narcotics
037 Anti-anxiety Agents: Mellaril, etc.
038 Anti-arthritic Agents: Steroids, etc.
039 Anti-depressives: Tofranil, etc.
040 Birth Control Pills
041 Hormone Preparations
042 Anti-inflammatory Agents
043 Muscle Relaxants: Valium
044 Muscle Relaxants: Meprobamate
045 Muscle Relaxants: Others
046 Sedatives: Barbiturates, etc.
047 Other Drugs

History of Trauma
048 Accidental
049 Factitial
050 Surgical

Location of Swelling
Side

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j$$

for any choice of $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_j} v_j = 1$, if and only if

$$\sum_{i=1}^{m_i} u_i a_i^l = \sum_{j=1}^{m_j} v_j b_j^l \quad \text{for all rows } l \ne k .$$

Condition 3: If the $k$th row of A has all elements $a_i^k$, $i = 1, 2, \ldots, m_i$, equal to zero, and some $b_r^k$ equals one, no $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \ge 0$, $v_r > 0$, $\sum_{j=1}^{m_j} v_j = 1$ exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j .$$

Justification 3: Under Condition 3 any convex combination of the columns of B that includes a non-zero product of the column $b_r$ results in a $k$th row term greater than zero. The value of the $k$th row term for any convex combination of the columns of A is equal to zero. Hence, no set of convex combinations of the columns of A and B can be equal if the combination for B includes a specification that $v_r > 0$.
Symbolically, if $v_r > 0$, then for any choice of $v_j$, $j = 1, 2, \ldots, m_j$, $j \ne r$,

APPENDIX A
CRANIOFACIAL-PAIN PATIENT DATA VECTOR

Referral Through
001 Medical GP
002 Medical Specialist
003 Dental GP
004 Dental Specialist

Sex
005 Male
006 Female
007 Female, menopausal or post-menopausal

Age Group
008 0 - 19
009 20 - 39
010 40 - 55
011 56 up

Duration of Pain
012 Less than 3 weeks
013 From 3 to 6 weeks
014 More than 6 weeks
015 Episodic

Character of Pain
016 Aching
017 Burning
018 Cutting
019 Discomfort
020 Dull
021 Pressure
022 Pricking
023 Sharp
024 Soreness
025 Stinging
026 Tenderness
027 Throbbing

Change in Character of Pain
028 Constantly getting worse
029 Got worse, then plateaued
030 Got worse, plateaued, then better
031 Getting better
032 Intermittent periods without pain
033 No change since beginning

The data collected on craniofacial-pain patient progressions through the care system reveal that both prolonged occupation of a single diagnostic state and return visits to the same state occur frequently. Moreover, as will be discussed in Chapter 4, there are several characteristics of the craniofacial-pain care system that permit reductions in the number of input parameters required for a transient Markovian model of this system. Therefore, an uncertain-duration transient Markovian representation of the health-care process has been selected as the means of evaluating the effectiveness of alternative treatment regimens on patients with craniofacial pain.
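To make the idea of an uncertain-duration transient Markovian representation concrete, the sketch below computes the expected number of visits before a patient leaves the transient states, allowing both repeated occupation of one state and returns to an earlier state. It uses the standard absorbing-chain relation t = 1 + Qt, solved by simple iteration. The two-state matrix is hypothetical and is not taken from the dissertation's data, and the function name is an assumption; the probability mass missing from each row is taken to be the chance of moving to a terminal (well or referred) state.

```python
def expected_visits_to_absorption(Q, tol=1e-12, max_iter=100000):
    """Expected number of visits spent in transient states before absorption.

    Q[i][j] is the probability of moving from transient state i to transient
    state j on one visit; rows may sum to less than 1, the remainder being
    the probability of absorption.
    """
    n = len(Q)
    t = [0.0] * n
    for _ in range(max_iter):
        new_t = [1.0 + sum(Q[i][j] * t[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_t, t)) < tol:
            return new_t
        t = new_t
    raise RuntimeError("iteration did not converge; check that the chain is transient")


# Hypothetical transient part of a transition matrix for two diagnostic states.
Q = [
    [0.5, 0.2],  # stay in state 0, move to state 1 (0.3 leaves the system)
    [0.1, 0.6],  # return to state 0, stay in state 1 (0.3 leaves the system)
]
mean_visits = expected_visits_to_absorption(Q)
```

In this example each row leaves the system with probability 0.3 per visit, so the expected stay from either state is 1/0.3 visits.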
ANALYTICAL MODELS FOR DIAGNOSTIC CLASSIFICATION AND TREATMENT PLANNING FOR CRANIOFACIAL PAIN

By
Michael Steven Leonard

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
1973

To my wife, Mary

ACKNOWLEDGEMENTS

Without the considerable contributions of time and effort by the members of his committee, it would have been impossible for the author to have completed this dissertation. In particular, the author expresses gratitude to his Chairman, Dr. Kerry Kilpatrick, for his encouragement and direction during the course of this research effort. The author also thanks Dr. Kilpatrick for his editorial assistance during the development and organization of this manuscript. The author thanks Dr. Richard Mackenzie and Dr. Stephen Roberts for providing the initial direction for this research. Additionally, the author is grateful to Dr. Thom Hodgson and Dr. Donald Ratliff for their assistance in evaluating and refining the author's ideas throughout this project.

The author expresses his gratitude to Dr. Thomas Fast and Dr. Parker Mahan for the contribution of their extensive knowledge about craniofacial pain to the author's research. The author is deeply appreciative of Dr. Fast's and Dr. Mahan's willingness to spend many hours examining dental records and their endurance of the nomenclature and idiosyncrasies of this mathematical-modeling effort.

Financial support for this research was provided by the Health Systems Research Division, J. Hillis Miller Health Center. The division's support in conjunction with a traineeship granted by the National Science Foundation made it possible for the author to undertake this research. The author is also grateful to the Industrial and Systems Engineering Department for the contribution of computer funds. Additionally the author thanks Dr. William Solberg, University of California at Los Angeles; Dr.
Daniel Laskin, University of Illinois; and Dr. David Mitchell, University of Indiana, for providing access to the patient records employed in this modeling effort.

The author would like to express his thanks to the secretarial staff of the Health Systems Research Division for their translation of the author's 'first-order' approximation to handwriting into a draft of this manuscript. Their tolerance of a multitude of last minute changes made by the author has been appreciated.

Finally, the author thanks his wife, Mary, and his parents, Dorothy and Charles Leonard, for their encouragement and support throughout the course of this research.

M.S.L.
August, 1973

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter
1. Introduction
   1.1 Craniofacial Pain
   1.2 Research Objective
   1.3 Dissertation Overview
2. Previous Research
   2.1 Bayesian Classification Models
   2.2 Non-Parametric Classification Models
   2.3 Finite-Horizon Treatment Planning
   2.4 Uncertain-Duration Treatment Planning
3. Diagnostic Classification
   3.1 Model Components
   3.2 Alternative Interpretations of Linear Separability
   3.3 Model Validation
   3.4 Minimum-Cost Symptom-Selection Algorithm
       3.4.1 Algorithm Development
       3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm
       3.4.3 Computational Considerations
   3.5 Model Applications
4. Treatment Planning
   4.1 Model Components
       4.1.1 Patient States
       4.1.2 Transition Probabilities
       4.1.3 Cost Structure
   4.2 Selection of Optimal Treatments
   4.3 Model Validation
   4.4 Model Applications
5. Conclusions and Future Research

Appendices
A Craniofacial-Pain Patient Data Vector
B Modified Fixed-Increment Training Algorithm
C Application of the Minimum-Cost Symptom-Selection Algorithm
D Treatment Alternatives for Craniofacial-Pain Patients
97 E Stability of Transition-Probability Estimates 99 F Flow Charts of Patient-State Transitions 104 G Patient-State Treatment Selections 110 H Application of the Patient-State-Labeling and Optimal- Treatment-Selection Procedure 114 BIBLIOGRAPHY 118 BIOGRAPHICAL SKETCH 121 Vi LIST OF TABLES Tables 1. Survey of Diagnostic-Classification Models 12 2. Correlation Between Significant Symptoms and Discriminant-Function Weights.... 30 3. Tests of Diagnostic Classifier Accuracy 32 4. Classification Variability Among Dental Practitioners... 35 5. Mean Transit Times Through the Craniofacial-Pain Care System 75 vii LIST OF FIGURES Figures 1. Temporomandibular Joint 3 2. Diagnostic-Classification and Treatment-Planning Process for Craniofacial Pain.... 7 3. Craniofacial-Pain Diagnostic Alternatives 18 4. Procedure 2 53 5. Diagnostic-Classification Transitions 64 6. Patient-Visit Inconvenience Cost 69 7. Application of the Modified Fixed-Increment Algorithm... 90 8. Multiple-State History-Augmented Process 115 viii Abstract of. Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirement for the Degree of Doctor of Philosophy ANALYTICAL MODELS FOR DIAGNOSTIC CLASSIFICATICW AND TREATMENT PLANNING FOR CRANIOFACIAL PAIN By Michael Steven Leonard December, 1973 Chairman: Dr. Kerry E. Kilpatrick Major Department: Industrial and Systems Engineering This dissertation presents a systematic approach to craniofacial- pain diagnosis and treatment planning using analytic models of the under lying decision-making processes. Patient diagnoses are generated by a linear pattern-recognition classifier trained with a sample of preclas sified craniofacial-pain patient data. For this classifier, an algorithm is developed that minimizes the total cost of the set of features employed in the classifying process. 
Diagnostic classifications, augmented by a history of prior treatment applications, provide the state descriptions for a Markovian decision model of the treatment-planning process. Craniofacial-pain patient records from four university dental clinics serve as a data base for model construction and validation.

The analytic models provide a means of duplicating the diagnostic classifications and treatment plans of experts. Approximately 90% of the diagnostic classifier's classifications and 93% of the treatment-planning model's treatment selections concurred with the decisions made by experts in the field of care for craniofacial-pain patients. Moreover, the models permit an examination of the critical considerations associated with both decision-making processes. These capabilities are discussed in terms of applications of the models in teaching, research, and in the practice of dentistry.

CHAPTER 1
INTRODUCTION

The rapid pace of developments in medical and dental research prevents the practicing physician and dentist from fully utilizing each new diagnostic and treatment-planning aid as it is published. In each of the last four years an average of 215,000 new publications have been written to supplement the knowledge of the health-care practitioner [1]. Concurrently, the pressures of an ever-increasing patient load force practitioners to select the most expeditious means for diagnosing disorders and selecting treatments. For example, the medical general-practitioner (1970) saw an average of 173 patients a week [2], and the median dental practitioner (1971) saw two patients an hour [3]. Given these circumstances, practitioners may overlook possible diagnostic and treatment alternatives or they may apply inappropriate treatments.
If meaningful analytic descriptions of the diagnostic and treatment-planning processes can be developed, these models can assist educators in training new practitioners, researchers in evaluating and disseminating new developments, and practitioners in improving the quality of patient care [4].

Developing models of the diagnostic-classification and treatment-planning process requires an understanding of the underlying physiological processes of diseases and the mechanisms of their cures. Obviously, the effects of disease and the means of cure vary from one health-care problem to another. Thus, modeling efforts in diagnosis and treatment planning must be integrally related to the facet of health care that is under study. This reality prohibits the model builder from making broad statements about the applicability of his models to other health-care environments. Accordingly, the models developed in this dissertation are specifically oriented toward the health-care problem presented in Section 1.1 with the understanding that the results of this modeling effort may not be applicable to the whole of health-care diagnosis and treatment planning.

1.1 Craniofacial Pain

    The head and face are subject to chronic, persistent, or recurrent pain more often than any other portion of the body. Pain in the head or face has a greater significance to patients than any other pain. It may arouse fears that the patient is in danger of losing his mind or that he has a tumor of the brain. In addition, the emotional state of the patient is adversely influenced because it is generally known by the layman that the profession's knowledge of the causes of these pains is meager and that methods of treatment are inadequate [5, p. v].

    H. Houston Merritt, M.D., Dean
    Columbia University College of Physicians and Surgeons

One source of the pain Dr. Merritt describes is dysfunction of the temporomandibular joint.
The temporomandibular joint, see Figure 1, provides the articulation between the mandible and the cranium. This joint is unique both in its structure and its function. Within the plane of the temporomandibular joint, lateral, vertical and pivoting motion is permitted. In addition, the joint is the point of articulation for the only articulated complex that contains teeth. With this joint, "motion is directed more by the musculature and less by the shape of the articulating bones and ligaments than is the fact for other joints" [5, p. 34].

The fact that joint motion is highly dependent on musculature implies that when mandibular dysfunction occurs there is some disturbance of the intricate neuromuscular mechanisms controlling mandibular movement [5]. Emotional tension may also lead to hypertonicity of the striated masticatory muscles resulting in facial pain or altered sensation without evidence of peripheral dysfunction. In addition, abnormal occlusal contacts of the teeth may affect muscle tonicity resulting in mandibular dysfunction [5]. Moreover, the temporomandibular joint is prone to disorders common to all joints: rheumatoid arthritis, osteoarthritis, traumatic injuries, neoplasms, and nonarticular disorders.

[FIGURE 1. TEMPOROMANDIBULAR JOINT. Right temporomandibular articulation; inset: anatomical features of the temporomandibular joint.]

Although the term 'craniofacial pain' is a broad classification for pain in the head and face, the term is used in this dissertation to describe pathological, congenital, hereditary-based, or emotional causes of pain in and around the temporomandibular joint. Though the degree of severity may vary, one or more of the following four 'cardinal symptoms' are exhibited by the craniofacial-pain patient: pain, joint sounds, limitation of motion, and tenderness in the masticatory muscles [6].
Accompanying these symptoms the patient may complain of, or the practitioner may find, hearing loss, burning sensations, migraine-like headaches, vertigo, tinnitus, subluxation, luxation, dental pulpitis, sinus disease, glandular disorders, occlusal disharmony, and radiographic evidence of joint abnormality. The degree of association of these additional symptoms and findings with the etiology of the joint disorders is subject to considerable variation.

Paralleling these areas of anatomic dysfunction is the possibility that the craniofacial-pain patient may be suffering from psychic disorders. In no other type of patient seen by the dentist does psychic condition play a larger role [7]. Most craniofacial-pain patients have symptoms or signs of anxiety, and a sensory preoccupation with the occlusion of their teeth [8]. Many of these patients can be characterized by a heavy reliance on denial, repression, and projection of their psychic disorders in order to maintain their self-concept of emotional stability [6]. Often the complaints these patients relate to the practitioner are not compatible with any objective signs.

The practitioner who manages the care of craniofacial-pain patients assumes a difficult task. For some of these patients, diagnosis is obvious. Generally, however, the craniofacial-pain patient presents a complex combination of signs and symptoms [7]. More than one disease entity normally accounts for the patient's symptoms and most craniofacial-pain patients suffer from a pain-dysfunction complex involving a combination of masticatory muscle disorders, occlusal disharmony, emotional tension, and anxiety [5]. Nevertheless, the possibility of multiple, almost sub-clinical, etiologic factors combining to produce the dysfunction and pain must be considered.
The close relationship of organic and emotional disorders as they appear in craniofacial-pain patients provides the examining dentist with the problem of discriminating which factor is primary in the etiology of the patient's dysfunction [7]. Unfortunately, the temporomandibular joint is one of the most difficult areas of the body to examine radiographically [8]. Hence, with these patients, the dentist relies to a large degree on tests of emotional stability and physical examination by visualization, palpation, and auscultation [7].

Therapeutic measures for the care of craniofacial-pain patients are as varied as the factors contributing to the disorder. "A small percentage of patients with symptoms referrable to the temporomandibular joint will portray such a confusing picture that consultation with other dental or medical specialists is indicated" [7, p. 129]. The majority of these patients will exhibit symptoms that lead to any one of several alternative courses of patient care. Altering the occlusion of the natural teeth is one means of treating craniofacial-pain patients. Although in many cases minor occlusal abnormalities are only contributing factors to a patient's pain, attention by the dentist to occlusion is at least partially successful for a majority of craniofacial-pain patients [8]. However, it is important in early therapy not to alter the occlusion irreversibly. Treatment by means of tooth extraction or endodontics, jaw fixation, prosthetic devices, or by topical treatments may also be suggested by the patient's symptoms. The articular surface of the mandibular condyle has an excellent reparative capacity [6]. Thus, the use of sedatives, antibiotics, and muscle relaxants, along with physical therapy, often leads to patient 'cures' as these treatments ease the patient's pain and increase jaw mobility while natural restoration of the joint is in progress.
If, after a reasonable length of time (3 to 6 months) the patient's symptoms are not relieved, the dentist may consider referral to another source of care or therapy such as surgery [7].

Typically, the health-care process for craniofacial-pain patients may be viewed as following the format of Figure 2 [9]. When a patient is admitted into the care system, he undergoes a data-collection process. This involves taking a 'full and pertinent' patient history and a physical examination of the areas of discomfort. The data gathered consist of symptoms, signs, medical and/or dental history, physical examination findings, psychosocial information, and so forth. Once these elements have been elicited, a diagnosis is attempted. If this is not yet possible, the severe symptoms are treated and the patient's health state is monitored.

[FIGURE 2. DIAGNOSTIC-CLASSIFICATION AND TREATMENT-PLANNING PROCESS FOR CRANIOFACIAL PAIN]

When initial treatment does not result in a 'cure' for the craniofacial-pain patient, treatment effects are evaluated and new data collected. When a patient's diagnostic classification leads to a course of treatment that is not within the realm of the practitioner's specialty, he is referred to a more appropriate care source. Monitoring is continued on those patients not rejected from the system at this point, and the patient is discharged when he is symptom-free. However, when other disorders have been isolated during the course of treatment, the patient is recycled through the classification-treatment process.

The diagnosis-treatment sequence is not fixed. Treatment can begin prior to a diagnostic classification or treatment can follow a diagnosis. Moreover, there may be many diagnostic-treatment data-acquisition cycles before the patient is considered 'well.'

1.2 Research Objective
The introductory discussion of the need for diagnostic and treatment-planning models, and the brief description of the craniofacial-pain care system, provide the setting for a statement of the research objective underlying this dissertation. This objective is to derive analytic representations of the decision processes involved in selecting diagnostic classifications and planning treatments for craniofacial-pain patients. A diagnostic-classification model that duplicates the classification of expert practitioners is sought. For treatment planning, the modeling goal is to provide a structure for interaction of the critical considerations associated with the treatment-selection process. These analytic representations will be structured to permit their application as teaching devices in the training of dental practitioners, as methods of testing the effects of new diagnostic tools and treatment applications, and as aids to the practice of dentistry.

This research objective will be met by developing:

1. A diagnostic-classification model based on the theory of non-parametric pattern classification, with
   a. criteria for applicability of the modeling technique to diagnostic classification
   b. model validation for craniofacial-pain patients
   c. development of a minimum-cost symptom-selection algorithm
2. A Markovian representation of the treatment-selection process, with
   a. justification for utilizing a Markovian model of the underlying care system
   b. model validation for craniofacial-pain patients
3. A description of potential model applications in teaching, research, and practice.

1.3 Dissertation Overview

In Chapter 1 the motivation and scope of this dissertation were presented. Chapter 2 provides a review of literature relevant to the diagnostic and treatment-selection processes. A model of the diagnostic-classification process is developed in Chapter 3. Chapter 4 follows with an analytic representation of the treatment-planning process.
Conclusions derived from this model-building effort, and suggestions for future research, are presented in Chapter 5.

CHAPTER 2
PREVIOUS RESEARCH

Over three-hundred publications have been addressed to the problem of modeling the diagnostic and treatment-planning process. Spanning fourteen years, this research has considered such diverse problems as the classification of liver biopsies [10] and the optimal plan for treating mid-shaft fractures of the femur [11]. At least ninety-one disorders have been utilized as environments for developing diagnostic and treatment-planning models. The magnitude of this research effort emphasizes the need for analytic representations of these complex decision-making processes.

Fortunately, the significant contributions in this voluminous literature can be neatly partitioned into four distinct categories. Research in diagnostic classification has been based either on the application of Bayesian statistics or on the use of non-parametric pattern classifiers. Treatment planning has been presented as either a finite-horizon decision problem or as an application of decision analysis to a Markov process of uncertain duration. This section presents a brief discussion of each of these categories and evaluates their suitability as analytic representations of the process of providing health care for craniofacial-pain patients.

2.1 Bayesian Classification Models

Bayesian diagnostic-classification models, such as [12, 13, 14, 15, 16], make a diagnosis on the basis of selecting a patient's 'most probable' disease state. The Bayesian classifier is an elementary type of parametric pattern-classification model. In general, parametric classifiers make use of one or more of the statistical characteristics of the dispersion of the data being classified to establish rules for data classification.
With the Bayesian models, only the conditional probabilities for exhibiting sets of symptoms, given a particular disease, are tabulated from past medical data. Then, utilizing Bayes' theorem, the probabilities for the presence of alternate diseases $d_1, d_2, \ldots, d_n$ can be calculated as a function of the symptom-complex $S$ the practitioner observes in the patient. Bayes' theorem provides that for each of the $d_i$

$$P(d_i \mid S) = C(S)\, P(S \mid d_i)\, P(d_i)$$

where

$$C(S) = 1 \Big/ \left[ \sum_{k=1}^{n} P(S \mid d_k)\, P(d_k) \right],$$

hence, a patient with symptom-complex $S$ is classified in disease-group $i$ if

$$P(d_i \mid S) = \max_{k} P(d_k \mid S).$$

A survey of the results of application of Bayesian models is given in Table 1.

TABLE 1
SURVEY OF DIAGNOSTIC-CLASSIFICATION MODELS

Bayesian Classifiers

Reference Number   Disease Group      Number of Patients in Study   % Correct Patient Diagnoses
[12]               Nontoxic Goiter     88                            85.3
[13]               Bone Tumor          77                            77.9
[14]               Thyroid            268                            96.3
[15]               Congenital Heart   202                            90.0
[16]               Gastric Ulcer       14                           100.0

Non-Parametric Classifiers

Reference Number   Disease Group      Number of Patients in Study   % Correct Patient Diagnoses
[17]               Liver               52                            98.1
[18]               Asthma             230                            90.0
[19]               Hematologic         49                            93.9
[20]               Thyroid            225                            96.0

Although the percentage of correct diagnoses in most of these test applications is high, there are several reasons why a Bayesian diagnostic model is not used as the means of generating diagnostic classification in this dissertation. The first reason is the difficulty in acquiring the proportional presence of alternate diseases $P(d_i)$, $i = 1, 2, \ldots, n$, in the population of patients that are to be classified by the model. These 'prior' probabilities of having a particular disease are a function of seasonal variation, geographic location, population demography, and many other factors. Secondly, valid Bayesian analysis requires the analyst to determine the dependence among exhibited symptoms for each disease considered by the diagnostic model.
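As an illustration of the Bayesian classification rule stated above, the following minimal Python sketch computes $P(d_i \mid S)$ and selects the most probable disease. The disease labels, symptom names, and all probabilities are made-up toy values, and the function names are hypothetical; nothing here is taken from the studies cited in Table 1.

```python
from typing import Dict, FrozenSet

SymptomComplex = FrozenSet[str]


def bayes_posteriors(
    prior: Dict[str, float],
    likelihood: Dict[str, Dict[SymptomComplex, float]],
    s: SymptomComplex,
) -> Dict[str, float]:
    """Return P(d_i | S) = C(S) * P(S | d_i) * P(d_i) for every disease d_i."""
    # Unnormalized terms P(S | d_k) * P(d_k); unseen complexes get probability 0.
    joint = {d: likelihood[d].get(s, 0.0) * prior[d] for d in prior}
    total = sum(joint.values())  # this is 1 / C(S)
    if total == 0.0:
        raise ValueError("symptom complex has zero probability under every disease")
    return {d: value / total for d, value in joint.items()}


def bayes_classify(prior, likelihood, s) -> str:
    """Pick the disease group whose posterior probability is largest."""
    post = bayes_posteriors(prior, likelihood, s)
    return max(post, key=post.get)


# Toy data (assumed values, for illustration only).
S = frozenset({"pain", "joint_sounds"})
prior = {"d1": 0.7, "d2": 0.3}
likelihood = {"d1": {S: 0.2}, "d2": {S: 0.6}}

post = bayes_posteriors(prior, likelihood, S)
label = bayes_classify(prior, likelihood, S)
```

With these toy numbers the less common disease d2 is selected, because its larger conditional probability for S outweighs its smaller prior. The sketch also makes visible the table of conditionals, one entry per disease and symptom complex, that the model must keep on hand.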
The probabilities for the presence of groups of symptoms are independent for some diagnostic alternatives and strongly correlated for others [4]. The third reason for not selecting a Bayesian model is the massive storage requirement dictated by the necessity of keeping the set of conditional probabilities. These conditionals, $P(S \mid d_i)$ for every observable symptom-complex $S$ and every disease $i$ considered, must be at hand each time the model is used. For example, given ten alternate diseases and ten symptoms for which no assumptions of between-symptom independence can be made, storage is required for $10 \cdot (2^{10} - 1)$, or 10,230, conditional probabilities.

2.2 Non-Parametric Classification Models

Non-parametric diagnostic models, like [17, 18, 19, 20], utilize non-parametric pattern classifiers, a form of pattern-recognition modeling. In the literature on pattern recognition, the term 'non-parametric' implies that no form of probability distribution is assumed for the dispersion of symptom data in establishing the rules for pattern classification. These models do assume, however, that classes of symptom data are distinct entities and, hence, a patient with a particular set of symptoms $S$ cannot simultaneously occupy more than one diagnostic state. That is, the models assume a deterministic classification for each pattern viewed by the pattern classifier where every observable pattern has one, and only one, correct classification.

Non-parametric modeling permits the analyst to bypass the difficult problems of explicitly determining the conditional probabilities for, and the dependence among, symptoms that are required for Bayesian analysis. With the non-parametric classifier, a diagnosis is generated for the practitioner by evaluating a discriminant function associated with each diagnostic classification, $g_i(\cdot)$, $i = 1, 2, \ldots, n$.
As was the case with the Bayesian models, the values of these discriminants are a function of the symptom-complex $S$ exhibited by the patient. The patient's diagnostic classification corresponds to that disease whose associated discriminant-function value is maximum. That is, a patient with symptoms $S$ is classified in disease-group $i$ if

$$g_i(S) > g_k(S) \quad \text{for all } k \neq i.$$

Results from some of the applications of pattern-recognition classifiers are presented in Table 1. In these test applications diagnostic accuracy was consistently high. Because of these models' ease of implementation and small storage requirements, a non-parametric pattern classifier is preferable as a vehicle for generating diagnostic classifications. The use of a non-parametric classifier is further motivated by features of the care process for craniofacial-pain patients discussed in Chapter 3.

2.3 Finite-Horizon Treatment Planning

In the realm of research on modeling the treatment-planning process, several authors [9, 21, 22] have presented schemes for analysis that utilize methods for making decisions under risk and uncertainty. The treatment-selection process has alternately been defined as a two-person zero-sum game, structured as a decision tree, and modeled as a Markov process of limited duration. Treatment costs and the 'costs' of occupying 'non-well' or terminal patient states provide the basis for selecting an 'optimal' treatment plan. Finiteness of the planning horizon is assured either by establishing a maximum permissible number of treatment applications, or by considering at any stage of analysis the effects of a fixed number of future treatments. Validation of the decisions generated by these models has thus far been limited to checks on the feasibility of the treatment regimens selected.
Unfortunately, the finite-horizon models either do not consider the possibility of a patient's prolonged stay in the health-care system, as is the case of the models with a maximum number of possible treatments, or, where only a fixed number of future treatments is considered, they provide no more than a heuristic treatment-selection procedure.

2.4 Uncertain-Duration Treatment Planning

Bunch and Andrew [11] have considered the possibility of prolonged occupation of the same diagnostic state during the course of a patient's progression through the care system. In their Markovian representation of the care system for mid-shaft fractures of the femur, they provide this modeling refinement. As a consequence of this modification, the number of treatment decisions made for each patient is a random variable with no fixed upper bound. Howard's iterative scheme for policy selection [25] provides the means for choosing the optimal treatment regimen by selecting treatment alternatives that maximize the relative 'value' of occupying each disease state. Although the Bunch and Andrew model did not consider return visits to the same disease state, a more generalized Markovian representation could incorporate that possibility. Nevertheless, the proximity to reality that this category of transient Markovian models provides requires considerable effort, as holding-time distributions, treatment 'costs,' and transition probabilities must be supplied by the analyst for all treatment alternatives at each of the disease states in the care system.

The data collected on craniofacial-pain patient progressions through the care system reveal that both prolonged occupation of a single diagnostic state and return visits to the same state occur frequently. Moreover, as will be discussed in Chapter 4, there are several characteristics of the craniofacial-pain care system that permit reductions in the number of input parameters required for a transient Markovian model of this system.
Therefore, an uncertain-duration transient Markovian representation of the health-care process has been selected as the means of evaluating the effectiveness of alternative treatment regimens on patients with craniofacial pain.

CHAPTER 3
DIAGNOSTIC CLASSIFICATION

The analytic model developed to provide diagnostic classifications for craniofacial-pain patients is based on the principles employed in non-parametric pattern classification. The patterns classified by this diagnostic model are vector representations (see Section 3.1 and Appendix A) of the craniofacial-pain patient's physical and emotional status. In the first sections of this chapter the theoretical background for the diagnostic model is established. This discussion is followed by a presentation of the validation procedures used to evaluate model performance. Next, an algorithm is developed to reduce the 'costs' associated with model utilization. The chapter closes with a discussion of potential applications of the craniofacial-pain diagnostic classifier in teaching, in research, and in the health-care process.

3.1 Model Components

In the initial phase of the development of the diagnostic-classification model a set of possible alternative diagnostic classifications was established for craniofacial-pain patients. Figure 3 provides a list of these possible classifications. Note that the alternative classifications in Figure 3 are not mutually exclusive, as a craniofacial-pain patient classified in some diagnostic alternative 'A' could also have the disorder specified by some other diagnostic alternative 'B.' However, for the purposes of this dissertation, each patient's diagnostic classification is made on the basis of specifying that etiological factor that requires most immediate action on the part of the attending practitioner. Thus, diagnostic classification of a patient into diagnostic alternative 'A' signals that the etiology specified by that alternative should determine the course of the patient's care.

 1. Temporomandibular Joint Arthritis - Developmental
 2. Temporomandibular Joint Arthritis - Infectious
 3. Temporomandibular Joint Arthritis - Osteo (Degenerative)
 4. Temporomandibular Joint Arthritis - Traumatic (Acute)
 5. Temporomandibular Joint Arthritis - Traumatic (Chronic)
 6. Myopathy - Acute Trauma
 7. Myopathy - Myositis
 8. Oral Pathology - Dental Pathology
 9. Vascular Changes - Migrainous Vascular Changes
10. Myofacial Pain-Dysfunction - Malocclusion - Balancing Interferences
11. Myofacial Pain-Dysfunction - Malocclusion - Lateral Deviation of Slide
12. Myofacial Pain-Dysfunction - Malocclusion - Uneven Centric Stops
13. Myofacial Pain-Dysfunction - Psychoneurosis - Anxiety/Depression
14. Myofacial Pain-Dysfunction - Bruxism
15. Myofacial Pain-Dysfunction - Reflex Protective Muscular Contracture
16. Myofacial Pain-Dysfunction - Loss of Posterior Occlusion
17. Neuropathy

FIGURE 3
CRANIOFACIAL-PAIN DIAGNOSTIC ALTERNATIVES

The next step in model development isolated relevant data which measured the physiological and psychological status of craniofacial-pain patients. In particular, this step of model development sought those elements of patient status that practitioners employ in their own classification of craniofacial-pain patients. Appendix A presents a list of these data elements. Wherever it was feasible, measures of patient status were segmented to amplify the significance of particular readings of each measure. Thus, for example, while the duration of a patient's pain is a continuous measure of his status, it is important for the purposes of classification to know whether a craniofacial-pain patient's duration of pain is less than 3 weeks, from 3 to 6 weeks, or longer than 6 weeks. For this measure of patient status, a short history of pain indicates a strong possibility of a recent traumatic injury, while pain over a long period is more likely associated with long-standing arthritic or psychic disorders.
To facilitate the development of an analytic model of the diagnostic-classification process, a vector representation of the relevant elements of patient data has been developed. The vector permits the notation of any of the data elements shown in the listing in Appendix A. The presence of any of the items found in Appendix A is recorded in a patient's data vector by an entry of '1' in the vector-dimension corresponding to the item number, while the absence of a vector item is noted by a '0' data-vector entry. For example, referring to the listing in Appendix A, a male patient would have the following fifth, sixth, and seventh elements in his data vector (...,1,0,0,...), while a pre-menopausal female would have the series of elements (...,0,1,0,...).

This vector notation of a patient's status serves as the input data for a non-parametric pattern classifier that assigns a diagnostic classification to the patient's dysfunction. Non-parametric pattern classification, as described in Meisel [23] and Nilsson [24], is the process of creating decision surfaces that separate patterns into homogeneous classes, $C_i$, $i = 1, 2, \ldots, p$, specified by the analyst. In the craniofacial-pain diagnostic model, the $C_i$ are the diagnostic alternatives shown in Figure 3. Classification of a pattern (a patient's data vector) into one of the classes is performed by a pattern classifier composed of a maximum detector and a set of discriminant functions. These discriminants, $g_j(\mathbf{a})$, $j = 1, 2, \ldots, p$, are single-valued functions of each patient's data vector $\mathbf{a}$. If $\mathbf{a}_i$ represents a data vector for a patient whose correct diagnostic classification is the $i$th diagnostic alternative, then the $g_j(\mathbf{a})$ are chosen so that

$$g_i(\mathbf{a}_i) > g_j(\mathbf{a}_i), \quad i, j = 1, 2, \ldots, p, \quad j \neq i.$$

The craniofacial-pain classifier uses linear discriminant functions.
These discriminants are linear in the sense that they provide mappings from $E^n$ to $E^1$ that exhibit the form

$$g_j(\mathbf{a}) = a_1 w_{j1} + a_2 w_{j2} + \cdots + a_n w_{jn} + w_{j,n+1}$$

where in the patient data vector $\mathbf{a}$, the value of $a_r$ denotes the presence ($a_r = 1$) or absence ($a_r = 0$) of patient-data-vector item $r$; and the $w_{jk}$, $k = 1, 2, \ldots, n+1$, are constants associated with the $j$th discriminant function called 'weights.'

These discriminant-function weights, $w_{jk}$, $j = 1, 2, \ldots, p$, $k = 1, 2, \ldots, n+1$, provide an analytic means of duplicating the correct classification of each pattern observed by the non-parametric classifier. They provide a link between a pattern's correct classification and the individual components of the pattern's vector representation. In essence, each discriminant's weights are additive elements whose component sums have significance in terms of isolating a pattern's correct classification. These weights are a mathematical means of storing information already known about the correct classification of observed pattern vectors. Moreover, the weights can be interpreted from the point of view of the significance that the practitioner places on each data-vector component. A discussion of this interpretation of the discriminant-function weights appears in Section 3.2.

Central to the use of linear discriminant functions is the assumption that the space of observable patient data vectors is linearly separable, for by definition [24], a pattern space $A$ is linear and its subsets of patterns $A_1, A_2, \ldots, A_p$ are linearly separable if and only if linear discriminant functions $g_1, g_2, \ldots, g_p$ exist such that

$$g_i(\mathbf{a}) > g_j(\mathbf{a}) \quad \text{for all } \mathbf{a} \text{ in } A_i,$$

for all $i = 1, 2, \ldots, p$, $j = 1, 2, \ldots, p$, $j \neq i$. In the context of diagnostic classification, the assumption of linear separability implies that there exists a set of hyperplanes that partition the space of observable patient data vectors into convex homogeneous regions, each region representing a unique diagnostic classification.
Rosen [26] has provided a restatement of this assumption in the requirement that the sets of data vectors corresponding to each diagnostic alternative have non-intersecting convex hulls. In either form, this is a fairly restrictive assumption on the dispersion of patient data vectors (see Section 3.2).

Selecting the 'weights' for each of the discriminant functions is a process known as 'training.' For the linear non-parametric classifier, training generates each discriminant function's $w_{jk}$'s by applying a systematic algorithm to the members of a set of representative patterns with pre-established classifications. Nilsson [24] discusses several algorithms suitable for training the craniofacial-pain diagnostic classifier. In the course of using these algorithms for model development, a new 'modified fixed-increment' training algorithm was constructed (see Appendix B). Employing the new algorithm has resulted in a reduction of approximately 35% in the amount of training time required to derive the weights for the craniofacial-pain classifier.

Symbolically, the craniofacial-pain diagnostic classifier, with its set of trained weights, can be represented in the following format: let

- $a_i$ = the 296-dimension data vector describing patient 'i'
- $a_{ik}$ = the $k$th element in the data vector describing patient 'i', whose value is either zero or one, $k=1,2,\dots,295$ (by definition $a_{i,296} = 1$)
- $C_j$ = diagnostic alternative 'j', $j=1,2,\dots,17$
- $d_{ij}$ = the value of the discriminant function for diagnostic alternative 'j' generated by the data vector of patient 'i'
- $W_j$ = the 296-dimension vector of weights associated with diagnostic alternative 'j'
- $w_{jk}$ = the $k$th element in the weight vector $W_j$, that is, $W_j = [w_{j1}, w_{j2}, \dots, w_{j,295}, w_{j,296}]^T$

and

$$d_{ij} = \sum_{k=1}^{296} a_{ik}\, w_{jk}$$

where $T$ denotes vector transposition.

Patient 'i' is classified in diagnostic alternative $C_j$ when $d_{ij} > d_{is}$ for every $s \ne j$. If $\max_j d_{ij}$ is not unique, then it is not yet possible to classify patient 'i' into one of the diagnostic alternatives.
Treatment is prescribed for severe symptoms and classification is attempted at a later date.

Data from four sources were used to construct and verify the diagnostic-classification model, as well as the treatment-planning model presented in Chapter 4. Contributions of clinical records came from the dental schools at the universities of California at Los Angeles, Florida, Illinois, and Indiana. In all, the records of 250 patients, involving a total of 480 patient-practitioner interactions, form the data base for model building and validation. The relevant information from each of these patient visits has been recorded in the data-vector format of Appendix A. A diagnostic classification from Figure 3 was assigned to each of these patient data vectors by either Dr. Thomas B. Fast, Chairman of the Division of Oral Diagnosis, or by Dr. Parker E. Mahan, Chairman of the Department of Basic Dental Sciences, at the College of Dentistry, University of Florida.

With this basic structure for the diagnostic-classification model, the classified patient data vectors, and the training algorithm presented in Appendix B, an initial test was performed to verify that the space of observed patient data vectors was separable by linear discriminant functions. Application of the modified fixed-increment training algorithm to the set of 480 data vectors verified this requirement, as the algorithm terminated in a set of feasible discriminant-function weights. Using the discriminant functions these constants determine, it is possible to duplicate the pre-established diagnostic classifications for each of the patient data vectors.

This first test of the diagnostic classifier established that a non-parametric classifier could be employed to reproduce the original classifications for each data vector used in model construction. However, this test does not reveal how well the classification model will perform on patient data not employed in developing the discriminant-function weights.
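The classifier and the separability test described above can be illustrated with a short sketch. It is not the modified fixed-increment algorithm of Appendix B, which is not reproduced in this section; it assumes instead the standard fixed-increment error-correction rule of the kind discussed by Nilsson [24]. The function names, the `max_epochs` cap, and the tiny three-class data set are hypothetical and chosen only for illustration. Each pattern carries a trailing 1, matching the convention $a_{i,296} = 1$.

```python
from typing import List, Optional, Sequence

Vector = Sequence[int]


def discriminants(a: Vector, weights: List[List[float]]) -> List[float]:
    """Return d_j = sum_k a_k * w_jk for every class j (a ends with the constant 1)."""
    return [sum(a_k * w_k for a_k, w_k in zip(a, w)) for w in weights]


def classify(a: Vector, weights: List[List[float]]) -> Optional[int]:
    """Maximum detector: index of the largest discriminant, or None if it is not unique."""
    d = discriminants(a, weights)
    best = max(d)
    winners = [j for j, d_j in enumerate(d) if d_j == best]
    return winners[0] if len(winners) == 1 else None


def train_fixed_increment(patterns: List[Vector], labels: List[int],
                          n_classes: int, max_epochs: int = 1000):
    """Standard fixed-increment training (an assumption, not the Appendix B variant).

    Returns a feasible weight set if a full pass produces no corrections,
    otherwise None once the (hypothetical) epoch cap is reached.
    """
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n_classes)]
    for _ in range(max_epochs):
        corrections = 0
        for a, i in zip(patterns, labels):
            d = discriminants(a, weights)
            # strongest competing class for this pattern
            rival = max((j for j in range(n_classes) if j != i), key=lambda j: d[j])
            if d[rival] >= d[i]:  # correct class does not win strictly
                for k in range(n):
                    weights[i][k] += a[k]
                    weights[rival][k] -= a[k]
                corrections += 1
        if corrections == 0:
            return weights
    return None


# Hypothetical toy data: three classes, three binary items plus the constant 1.
toy_patterns = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
toy_labels = [0, 1, 2]
toy_weights = train_fixed_increment(toy_patterns, toy_labels, n_classes=3)
```

If training stops with no corrections in a full pass, every training pattern is reproduced by the maximum detector, which is the sense in which termination verifies separability of the training set. A return value of `None` only says the cap was reached; it does not by itself prove that the patterns are inseparable.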
The remainder of this section, and Section 3.3, address the question of how the diagnostic classifier performs on 'new' patient data vectors, that is, vectors that have no duplicate in the training sample.

Model training has created a set of weights that, by the definition of the training procedure, correctly classify every patient data vector that lies within the bounds of the training-sample pattern-class convex hulls. Since every data vector is a binary vector, new patient data vectors must fall outside the convex hulls established by the training-sample vectors. Yet, if new data vectors have a number of data-vector elements that are identical to those of the training-sample vectors with the same diagnostic classification, then this relationship will be reflected in a 'close proximity,' as measured by a Euclidean-distance function, between each new vector and its associated training-sample convex hull. Given this close proximity, the classifier's discriminant functions should correctly classify most new data vectors, as these vectors will lie within or near the boundaries of the appropriate discriminating hyperplanes. Hence, the key to providing adequate classifier performance for new data vectors lies in devising data-vector representations of patient data for which the data vectors of a common diagnostic classification exhibit strong similarity.

In the introductory discussion of the elements of patient data used in the patient data vector, it was pointed out that an effort was made to select components of patient status that assist the practitioner in his selection of diagnostic classifications for a craniofacial-pain patient. Then these elements were partitioned to generate as much discriminating information as possible from each data element.
In terms of the alternate diagnostic classifications, these elements of patient data were chosen so that all patients in any one diagnostic classification would have a unique combination of exhibited or non-exhibited data-vector elements. Employing these carefully constructed qualitative data elements resulted in a set of 'natural' gaps in the vector representations of patient data from alternate diagnostic classifications. The fact that there are portions of the pattern space that cannot be occupied by any data vector, and partitions of the space where the vectors of each classification must lie, assists the classifier in making correct classifications of data not used in model construction.

As Section 3.3 shows, this discussion is not meant to imply that the craniofacial-pain diagnostic classifier can, in its present state of development, correctly classify every new data vector. What has been stated is that a knowledge of the underlying classifying process can be employed in constructing the data vector examined by the classifier, and that fully utilizing this information will lead to a classifier that can be expected to perform well on new patient data.

Of course, this discussion has been predicated on the separability of the underlying pattern space of data vectors. If this requirement is not met by some form of patient-data-vector representation, classification of patients by linear classifier is not possible.

The next section of this chapter provides relationships between linear separability and the data that may be observed in a health-care system for which diagnostic classification by linear discriminants is being considered. This section has a dual purpose. First, linear separability is couched in 'non-geometric' terms.
Second, and more importantly, using the craniofacial-pain health-care system as an example of the section's developments provides information about the suitability of the non-parametric classifier as a model of the decision-making process associated with diagnostic classification in this care system.

3.2 Alternative Interpretations of Linear Separability

The criteria for pattern space separability are mathematically concise. Unfortunately, these separability criteria are not readily expressible in non-geometric terms. The discussion developed in this section provides the reader with some non-geometric criteria that indicate when the use of a non-parametric pattern classifier should be considered as a means of generating diagnoses for a medical or dental disorder.

The first criterion is associated with a probabilistic measure of symptom exhibition. Given a patient who exhibits some set of symptoms $S$, non-parametric pattern classification requires that $P[S \mid C_j] = 1$ for the diagnostic alternative '$C_j$' that describes the patient's current diagnostic status, and $P[S \mid C_k] = 0$ for all other diagnostic alternatives '$C_k$.' However, assume that for the disorder in question the probability of exhibiting any relevant symptom has been calculated from historical data, that is, estimates of $P[s_i \mid C_j]$ are available for all relevant symptoms $s_i$ and all diagnostic alternatives $C_j$. Then, if the following decision rule leads to the correct classification of a majority of the patients with the disorder in question, utilization of a non-parametric classification model should be investigated:

classify a patient who exhibits the set of symptoms $S$ in the $j$th diagnostic alternative if

$$\prod_{s_i \in S} P[s_i \mid C_j] > \prod_{s_i \in S} P[s_i \mid C_k] \quad \text{for all } k \ne j. \tag{1}$$

Since (1) holds if and only if

$$\log\Big[\prod_{s_i \in S} P[s_i \mid C_j]\Big] > \log\Big[\prod_{s_i \in S} P[s_i \mid C_k]\Big] \quad \text{for all } k \ne j,$$

decision rule (1) can be expressed in terms of logarithms.
Let the set of symptoms $S$ be represented as a row vector $a$ with the elements of $a$ assigned values as follows: $a_i = 1$ if symptom $s_i$ is an element of $S$ and $a_i = 0$ if symptom $s_i$ is not an element of $S$, $i=1,2,\dots,n$, where $n$ is the total number of relevant symptoms. Form the column vectors

$$W_j = \big[\log P[s_1 \mid C_j],\ \log P[s_2 \mid C_j],\ \dots,\ \log P[s_n \mid C_j]\big]^T.$$

Then $\log\big[\prod_{s_i \in S} P[s_i \mid C_j]\big] = aW_j$, and decision rule (1) can be restated as

classify a patient who is characterized by the vector $a$ in the $j$th diagnostic alternative if

$$aW_j > aW_k \quad \text{for all } k \ne j. \tag{2}$$

Note that decision rule (2) is identical to the decision rule employed in non-parametric pattern classification.

This equivalence implies that if (1) holds for every preclassified patient examined, the values $\log P[s_i \mid C_j]$ form a set of feasible discriminant-function weights. If (1) leads to the correct classification of a majority of the patients examined, it is logical to assume that there may be a set of feasible discriminant-function weights. This assumption was examined using the craniofacial-pain patient data. From the data vectors classified in Diagnostic Alternatives 13, 14, and 15, a total of 189 patient visits, the $P[s_i \mid C_j]$ were calculated. Each data vector was then classified with decision rule (1), and 164 of the data vectors (86.7%) were assigned to their pre-established diagnostic alternative.

The second criterion provides a subjective measure of the feasibility of using a non-parametric pattern classifier. If symptoms for most of the diagnostic alternatives associated with the disorder of interest can be isolated such that

1. a patient's exhibition of a subset of these symptoms leads the practitioner to a selection of one of the diagnostic alternatives, or
2. a patient's exhibition of a subset of these symptoms leads the practitioner to eliminate from further consideration one of the diagnostic alternatives,

then the use of a non-parametric classifier as a means of generating classifications should be investigated.
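The equivalence between decision rules (1) and (2) under the first criterion can be checked numerically with a small sketch. The probability table below is invented for illustration and is not taken from the craniofacial-pain data; the function names are hypothetical, and the `eps` floor that guards against the logarithm of a zero estimate is an added assumption that the text does not discuss.

```python
import math
from typing import List, Sequence


def log_weights(prob: List[List[float]], eps: float = 1e-6) -> List[List[float]]:
    """Build W_j = [log P[s_1|C_j], ..., log P[s_n|C_j]] from prob[j][i] = P[s_i|C_j].

    eps is a hypothetical floor so that a zero estimate does not produce log(0).
    """
    return [[math.log(max(p, eps)) for p in row] for row in prob]


def classify_by_product(a: Sequence[int], prob: List[List[float]]) -> int:
    """Decision rule (1): compare products of P[s_i|C_j] over the exhibited symptoms."""
    scores = [math.prod(p for a_i, p in zip(a, row) if a_i) for row in prob]
    return max(range(len(scores)), key=scores.__getitem__)


def classify_by_log(a: Sequence[int], weights: List[List[float]]) -> int:
    """Decision rule (2): compare the linear forms a W_j."""
    scores = [sum(a_i * w for a_i, w in zip(a, row)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical estimates for two diagnostic alternatives and three symptoms.
example_prob = [[0.9, 0.2, 0.5],
                [0.3, 0.8, 0.5]]
example_weights = log_weights(example_prob)
```

With these made-up estimates, a patient exhibiting symptoms 1 and 3 (`a = [1, 0, 1]`) is assigned to the first alternative by both rules, since 0.45 exceeds 0.15 and the logarithm preserves the ordering. Ties are resolved here by taking the first maximum, a simplification; the classifier of Section 3.1 instead defers classification when the maximum is not unique.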
The linear non-parametric classifier employs a weighted sum of the symptoms exhibited by each patient in its discriminating functions. If symptoms can be isolated that are significant to the classification of patients with the disorder under investigation, then there is a 'natural' weight for each of these symptoms in the decision-making process used by the practitioner. The existence of these natural weights increases the probability that a training algorithm will be able to find a feasible set of discriminant-function weights. Indeed, the relative importance of the significant symptoms may be reflected in the magnitude of the discriminant-function weights generated by the application of a training algorithm.

As an example, the significant symptoms associated with two craniofacial-pain diagnostic alternatives, Alternatives 4 and 14, were isolated by Dr. Fast. A comparison of these symptoms and their associated discriminant-function weights revealed a high degree of correlation between symptom significance and discriminant-function weights (see Table 2).

The reader should note that both of the criteria discussed in this section are heuristic approximations to the geometric requirement for pattern space separability. However, if the disorder under investigation meets one or both of these criteria, it may be possible to employ a non-parametric classifier to diagnose the disorder, since the requirement for pattern space separability is most likely met.
TABLE 2
CORRELATION BETWEEN SIGNIFICANT SYMPTOMS AND DISCRIMINANT-FUNCTION WEIGHTS

Diagnostic Alternative 4: Temporomandibular Joint Arthritis-Traumatic (Acute)

| Significant Symptoms | Discriminant-Function Weights |
|---|---|
| (+) Duration of Pain (less than 3 weeks) | +3 |
| (+) History of Trauma (accidental) | +30 |
| (+) Preauricular Pain | +11 |
| (-) Salivary Gland Disease | -12 |
| (-) Otitis | 1 |

(discriminant-function weights for Diagnostic Alternative 4 range from -19 to +37)

Diagnostic Alternative 14: Myofacial Pain-Dysfunction Bruxism

| Significant Symptoms | Discriminant-Function Weights |
|---|---|
| (+) Duration of Pain (more than 6 weeks) | +15 |
| (+) Facets | +2 |
| (+) Bruxism and/or Clenching | +56 |
| (-) History of Trauma (accidental) | -16 |
| (-) Salivary Gland Disease | 5 |

(discriminant-function weights for Diagnostic Alternative 14 range from -23 to +56)

Note: For both Diagnostic Alternatives, (+) indicates a symptom that leads the practitioner to classify a patient in that diagnostic alternative; (-) indicates a symptom that leads the practitioner to classify a patient in some other diagnostic alternative.

3.3 Model Validation

Validation of the craniofacial-pain diagnostic-classification model presented in Section 3.1 has been accomplished by three types of validating procedures. The discussion presented in the preceding sections, and in particular the relationship between significant symptoms and their associated weights shown in Table 2, reveals a close proximity between the decision-making process the practitioner utilizes and the non-parametric classifier's symptom-weighing scheme. This section presents two other procedures employed in evaluating the diagnostic-classification model's performance.

The first procedure involved testing the diagnostic accuracy of the classification model on patient data that were not employed in model construction. Six classification tests were run in sequential order.
In the first five of these tests, random samples of 50 patient data vectors were drawn from the data base of 480 vectors discussed in Section 3.1. Then, as each of the tests was performed, the training algorithm in Appendix B was applied to the remaining 430 data vectors. With the weights derived from the training algorithm, the sample of 50 patients was classified. The model-generated classifications for each of the data vectors were compared to the classifications assigned to the vectors when they were created. As each test classification of a sample was completed, the diagnostic classifier's discriminant-function weights were set equal to zero, the sample of data vectors was returned to the data base, and the next test's random sample was drawn. A summary of the results of these tests of diagnostic accuracy is presented in Table 3.

TABLE 3
TESTS OF DIAGNOSTIC CLASSIFIER ACCURACY

| Test | Number of Patient Data Vectors | Number of Data Vectors Correctly Classified | Classifier Accuracy |
|---|---|---|---|
| TEST ONE | 50 | 46 | 92.0% |
| TEST TWO | 50 | 45 | 90.0% |
| TEST THREE | 50 | 44 | 88.0% |
| TEST FOUR | 50 | 47 | 94.0% |
| TEST FIVE | 50 | 45 | 90.0% |
| TEST SIX | 51 | 43 | 84.3% |

Mean Classifier Accuracy: 89.7%
Standard Deviation of Classifier Accuracy: 3.5%

In each of the first five tests it was possible for a patient who has had multiple practitioner visits to have some of the vectors representing these visits in a test's random sample and some vectors used in model construction. Such occurrences lead to test results that overestimate classifier accuracy. Hence, in Test Six, a random sample of all of the patient data associated with 40 patients (a total of 51 patient data vectors) was selected. This sample was classified by the diagnostic-classification model using the remaining 429 data vectors as a data base. The results of this test are included in the data shown in Table 3.

There is one other possible factor affecting the classifier's accuracy as measured by these tests.
It is conceivable that there were duplicate data vectors in the data base of 480 patient data vectors. If duplicates do exist and were included in both the test samples and the samples' training bases, measures of classifier accuracy will be overly optimistic. However, since 'noise' is introduced by the variability among craniofacial-pain patients and generated in the practitioner's transcribing of the elements of patient data into the data-vector format, and since there are $2^{295}$ possible data vectors, the probability that two or more of the data-based patient vectors include an identical specification of data-vector elements is small enough to justify neglecting this possibility and its effects.

The results summarized in Table 3 reveal that the diagnostic-classification model performs well in duplicating the diagnostic classifications originally assigned by the reviewing practitioners, Dr. Fast and Dr. Mahan. Moreover, the size of the test samples was quite large in relation to the data base employed in developing each test's diagnostic model. As new data become available and are incorporated in the parameters of the model, the accuracy of the craniofacial-pain diagnostic classifier can be expected to increase slightly.

The second validating procedure established a measure of variability on the diagnostic classifications that might be given by different dental practitioners. The discussion presented in Section 1.1 related the difficulties associated with diagnosing craniofacial-pain disorders. Practitioners with varying kinds of professional experience can be expected to reflect their dissimilar backgrounds in differing diagnostic classifications for these patients. To measure the variability associated with dissimilar backgrounds, five craniofacial-pain data vectors were selected from the data base employed in constructing the craniofacial-pain diagnostic classifier.
Four dentists from the staff of the College of Dentistry at the University of Florida were asked to review these patient data vectors and assign to each of them a diagnostic classification. Table 4 summarizes their assignments and also includes the diagnostic classification originally given by the reviewing practitioners.

The variability in diagnostic assignments reflected in Table 4 reaffirms the justification for the research objectives set forth in Section 1.2. Some of the differences in the practitioners' choices of diagnostic classifications can be explained by the limited amount of data contained in each of the data vectors, and the less-than-full medical statement of each of the diagnostic alternatives. Nevertheless, a diagnostic-classification model that generates classifications that are in 90% agreement with those of experts in the field provides a sizeable improvement over the variability in classification assignments exhibited in Table 4, in which only half the respondents agreed on a single diagnosis in four out of five cases.

TABLE 4
CLASSIFICATION VARIABILITY AMONG DENTAL PRACTITIONERS

Diagnostic Classification for

| | Patient 1 | Patient 2 | Patient 3 | Patient 4 | Patient 5+ |
|---|---|---|---|---|---|
| Original Classification | 4 | 13 | 15 | 15 | 9 |
| Practitioner 1 | 1 | 7 | 15 | 15 | 3 |
| Practitioner 2 | 6 | 12 | 15 | 8 | 3 |
| Practitioner 3 | 4 | 15 | 15 | 15 | 13 |
| Practitioner 4 | 4 | 15 | 15 | 14 | * |

\* No classification given
\+ Patient 5 exhibited a minimal amount of input data (only 17 non-zero data-vector entries)

These four dental practitioners exhibited 100.0% agreement on the diagnosis of one of the five patients, and 50.0% agreement on the diagnostic classification of the remaining four patients.

3.4 Minimum-Cost Symptom-Selection Algorithm

The craniofacial-pain diagnostic-classification model detailed in the previous sections of this chapter has been structured upon the data vector of the 295 relevant signs, symptoms, and items of patient history shown in Appendix A.
To utilize this model, the practitioner must examine a patient for the presence or absence of each of these data-vector elements. Although the cost in time and fees varies from item to item, there is an expense to the practitioner, and to the patient, associated with checking each element in the data vector. Hence, it is logical to investigate the possibility of finding a reduced data vector that 'costs' less for the patient and practitioner to use and yet still permits correct classification of all craniofacial-pain patients.

A review of the literature (see Meisel [23], Chapter 9, for a survey) reveals that many authors have considered the task of selecting a set of features to be used in a pattern-classification scheme. Traditional methods of viewing this problem are based on a search for a transformation that takes a given set of patterns into some 'new' pattern space where separation by discriminant functions is possible. Measures of pattern class separability are employed to evaluate the effects of transforming the set of patterns from one space to another. In general, these transformations take a pattern representation in 'n' features and create a set of 'r' ($r < n$) [...] However, to reduce the 'costs' associated with using the craniofacial-pain diagnostic classifier, a transformation must be found that decreases the size of the data-vector pattern space by eliminating features rather than combining them. For example, assume patients were diagnosed on the basis of body-temperature and blood-pressure readings. Traditional techniques for feature selection might employ a linear combination of body temperatures and blood-pressure measurements as one 'new' feature. The transformation sought in this investigation would lead to the classification of patients by either body temperature or blood pressure alone, if this were possible. This example will be used again in Section 3.4.1 to illustrate the algebraic and geometric structure of the problem.
Nelson and Levy [27] have attacked the problem of selecting a reduced set of unaltered features for use in a classification scheme. These authors attach a cost to the use of each available feature, and employ a ranking scheme to measure each feature's discriminating power. Then, under a restriction on the total cost of features employed, they develop an algorithm that selects the set of features that maximizes the classifier's discriminating power. Unfortunately, their scheme does not guarantee the selection of a subset of original features that contains enough 'information' to permit pattern class separation by discriminant function.

Therefore, a new algorithm is presented in this section that minimizes the cost of the set of features used by the pattern classifier yet ensures that all patterns can be correctly classified by a set of linear discriminant functions. In the remainder of this section the more general terms 'feature,' 'pattern,' and 'pattern class' will be used respectively to represent a data-vector item, a patient's data vector, and a diagnostic classification.

The problem of finding a minimum-cost collection of features would not be considered if there did not already exist a set of 'n' features by which the patterns under examination could be correctly classified by linear discriminants. That is, given an 'n' dimensional representation of each of the '$m_i$' patterns in each of the 'p' pattern classes

$$a_i^m = [a_{i1}^m, a_{i2}^m, \dots, a_{in}^m, 1], \qquad i=1,2,\dots,p,$$

where $a_{ik}^m$, $k=1,2,\dots,n$, equals either zero or one, there must exist a set of 'n+1' dimensional $W_j$'s, $j=1,2,\dots,p$, such that

$$a_i^m (W_i - W_j) > 0 \quad \text{for all } m=1,2,\dots,m_i;\quad i=1,2,\dots,p;\quad j=1,2,\dots,p;\quad j \ne i. \tag{3}$$

Letting $A_i$ be the $m_i \times (n+1)$ dimensional matrix of patterns in pattern-class $i$, the requirement of (3) can be written in the following form:

$$A_i (W_i - W_j) > 0, \qquad i=1,2,\dots,p,\quad j=1,2,\dots,p,\quad j \ne i.$$
If such pattern representations and $W_j$'s exist, then a solution to the following problem yields a minimum-cost collection of pattern-classifying features:

P1:

$$\text{minimize } CX$$

$$\text{subject to } A_i\,[X \odot (W_i - W_j)] > 0, \qquad i=1,2,\dots,p,\quad j=1,2,\dots,p,\quad j \ne i$$

where

$$A_i = \begin{bmatrix} a_{i1}^{1} & a_{i2}^{1} & \cdots & a_{in}^{1} & 1 \\ a_{i1}^{2} & a_{i2}^{2} & \cdots & a_{in}^{2} & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{i1}^{m_i} & a_{i2}^{m_i} & \cdots & a_{in}^{m_i} & 1 \end{bmatrix}$$

$$W_i = [w_{i1}, w_{i2}, \dots, w_{in}, w_{i,n+1}]^T, \qquad C = [c_1, c_2, \dots, c_n, 0], \qquad X = [x_1, x_2, \dots, x_n, 1]^T$$

and

- $w_{ij}$ is an unrestricted variable
- $c_j$ is the cost of using feature $j$
- $x_i = 0$ if feature $i$ is not used, and $x_i = 1$ if feature $i$ is used.

Note: The notation $\odot$ is to be read as element by element multiplication, i.e., $Q \odot R = S$ with $[s_{ij}] = [q_{ij} r_{ij}]$.

3.4.1 Algorithm Development

The algorithm developed to solve problem P1 is an enumerative algorithm similar in structure to that of Balas [28]. Unfortunately, the non-linear nature of problem P1's constraints prohibits full implementation of the more powerful techniques used in implicit enumeration on linear integer problems. The structure of these constraints and their effect on the optimization of P1 will be discussed in a step-by-step development.

The minimum-cost feature-selection algorithm does not solve P1 to the extent of finding the values of the vectors $W_i$, $i=1,2,\dots,p$. This algorithm does find the minimum-cost collection of features $X^*$ and the total cost associated with using these features, and guarantees the existence of $W_i^*$ vectors associated with this optimal feature set. Given this guarantee, the modified fixed-increment algorithm from Appendix B can be employed to find the vectors $W_i^*$, $i=1,2,\dots,p$.

Choose some solution to P1. By hypothesis there exists at least one solution $(X, W_1, W_2, \dots, W_p)$ to P1 where $X = [1,1,\dots,1,1]$. Suppose there is some other solution $(\hat{X}, \hat{W}_1, \hat{W}_2, \dots, \hat{W}_p)$ where one or more elements $\hat{x}_i$ in the $\hat{X}$ vector are equal to zero. For the constraint matrices in P1,

$$A_i\,[\hat{X} \odot (\hat{W}_i - \hat{W}_j)] > 0, \qquad i=1,2,\dots,p,\quad j=1,2,\dots,p,\quad j \ne i.$$

If the matrix products $[A_i \odot \hat{X}] = \hat{A}_i$, $i=1,2,\dots,p$, are constructed, then each set of constraints in P1 can be written in the form

$$\hat{A}_i (\hat{W}_i - \hat{W}_j) > 0, \qquad i=1,2,\dots,p,\quad j=1,2,\dots,p,\quad j \ne i. \tag{4}$$

The creation of the $\hat{A}_i$ is called the zeroing process. Of the columns of $A_i$, $\hat{A}_i$ retains all columns $j$ of $A_i$ where $\hat{x}_j = 1$, and substitutes a column of zeros for each of those columns $k$ in $A_i$ where $\hat{x}_k = 0$. Using the zeroing process, the feasibility of any possible solution vector $\hat{X}$ to P1 can be examined in terms of the $A_i \odot \hat{X}$ this vector $\hat{X}$ creates.

As an example of the zeroing process for a particular set of patterns, let $a^i$ be a two-dimensional patient data vector $a^i = [a_1^i, a_2^i]$ where $a_1^i = 0$ if patient $i$ has normal body temperature, $a_1^i = 1$ if patient $i$ has abnormal body temperature, and [...]

Note that relation (2) is the requirement for pattern separability by linear discriminants. Hence, a vector $\hat{X}$ is a component in a feasible solution $(\hat{X}, \hat{W}_1, \hat{W}_2, \dots, \hat{W}_p)$ to P1 if and only if there exist $\hat{W}_i$, $i=1,2,\dots,p$, such that (2) holds for all $i \ne j$. As discussed in Section 3.1, a pattern space is linearly separable, and hence feasible $\hat{W}_i$ exist, if and only if the individual pattern classes have non-intersecting convex hulls. For the pattern vectors considered in this section, the individual components of each of the patterns in each pattern class are either zero or one. As there is a one-to-one correspondence between the individual patterns in a pattern class and the vertices of the pattern class's convex hull, the convex hull of a pattern-class $\hat{A}_i$ can be expressed as all convex combinations of the individual pattern-class vectors $\hat{a}_i^m$, $m=1,2,\dots,m_i$. Consider the following examples of the convex-hull representation of linear separability.

Assume $a_X^1 = [1,0]$, $a_X^2 = [1,1]$, $a_Y^1 = [0,0]$, and $a_Y^2 = [0,1]$. Graphically this pattern space can be represented as

[Figure: the four patterns plotted against Feature 1 (horizontal axis) and Feature 2 (vertical axis), with segments X and Y and a discriminating line 0]

where the line X from $a_X^1$ to $a_X^2$ represents the convex hull of pattern-class X and the line Y from $a_Y^1$ to $a_Y^2$ represents the convex hull of pattern-class Y. Since X and Y do not intersect, implying that the space is linearly separable, it is possible to draw an infinite number of lines 0 that serve as discriminating hyperplanes.

Assume $a_X^1 = [1,0]$, $a_X^2 = [0,1]$, $a_Y^1 = [0,0]$, and $a_Y^2 = [1,1]$. Graphically this pattern space can be represented as

[Figure: the four patterns plotted against Feature 1 and Feature 2, with intersecting segments X and Y]

where the line X from $a_X^1$ to $a_X^2$ represents the convex hull of pattern-class X and the line Y from $a_Y^1$ to $a_Y^2$ represents the convex hull of pattern-class Y. Since the lines X and Y intersect, the pattern space is not linearly separable, and hence, it is impossible to draw a discriminating hyperplane 0.

Therefore, the following condition is equivalent to condition (4):

a vector $\hat{X}$ is feasible to P1 if and only if there do not exist $U^s$ and $U^t$ such that

$$U^s \hat{A}_s = U^t \hat{A}_t \quad \text{for any } s=1,2,\dots,p,\quad t=1,2,\dots,p,\quad s \ne t \tag{5}$$

where

$$U^i = [u_1^i, u_2^i, \dots, u_{m_i}^i], \qquad u_k^i \ge 0 \text{ for all } k=1,2,\dots,m_i, \qquad \sum_{k=1}^{m_i} u_k^i = 1 \text{ for all } i=1,2,\dots,p.$$

Checking the feasibility of some vector $\hat{X}$ by condition (5) yields $[p(p-1)]/2$ distinct subproblems. Each of these subproblems may be characterized as follows:

P2: let $\hat{A}_s^T = A$ and $\hat{A}_t^T = B$, with $A$ and $B$ having columns $a_i$ and $b_j$ respectively, for any $\hat{A}_s$ and $\hat{A}_t$. Find

$$u_i \ge 0,\ \sum_{i=1}^{m_i} u_i = 1, \quad \text{and} \quad v_j \ge 0,\ \sum_{j=1}^{m_j} v_j = 1$$

such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j.$$

If such $u_i$ and $v_j$ exist for any one of the subproblems, then $\hat{X}$ is not feasible to P1.

Because the number of subproblems is large even for a relatively small number $p$ of pattern classes, there is justification for seeking methods to expedite the solution of each subproblem P2. To achieve this goal, a series of conditions will be presented that characterize some of the criteria necessary to the existence of a solution to subproblem P2. In addition to establishing criteria for existence, these conditions provide a means for reducing the size of the matrices A and B.
This reduction will be discussed after the conditions are established.

Condition 1: If the $k$th row of A has all elements $a_i^k$, $i=1,2,\dots,m_i$, equal to zero (one) and the $k$th row of B has all elements $b_j^k$, $j=1,2,\dots,m_j$, equal to one (zero), then no $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j.$$

Justification 1: Under Condition 1 there is no set of convex combinations of the $k$th row elements of A and of the $k$th row elements of B such that the combinations are equal. Hence, there can be no set of convex combinations of the columns of A and of B such that the combinations are equal. Symbolically, since no $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i^k = \sum_{j=1}^{m_j} v_j b_j^k,$$

no $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j.$$

Condition 2: If the $k$th row of A has all elements $a_i^k$, $i=1,2,\dots,m_i$, equal to zero (one) and the $k$th row of B has all elements $b_j^k$, $j=1,2,\dots,m_j$, equal to zero (one), the $k$th row of matrices A and B can be eliminated without loss of possible solutions to subproblem P2.

Justification 2: Under Condition 2 every convex combination of the $k$th row elements of A and of the $k$th row elements of B are equal. Hence, a set of convex combinations of the columns of A and of the columns of B are equal if and only if the convex combinations of the remaining rows (all rows except the $k$th row) are equal. Symbolically, let $a_{ik}^*$ denote the pattern $a_i$ whose $k$th component has been eliminated and similarly let $b_{jk}^*$ denote the elimination of component $k$ from pattern $b_j$; then as

$$\sum_{i=1}^{m_i} u_i a_i^k = \sum_{j=1}^{m_j} v_j b_j^k$$

for any choice of $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_j} v_j = 1$,

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j \quad \text{if and only if} \quad \sum_{i=1}^{m_i} u_i a_{ik}^* = \sum_{j=1}^{m_j} v_j b_{jk}^*.$$

Condition 3: If the $k$th row of A has all elements $a_i^k$, $i=1,2,\dots,m_i$, equal to zero, and some $b_r^k$ equals one, no $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \ge 0$, $v_r > 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j.$$

Justification 3: Under Condition 3 any convex combination of the columns of B that includes a non-zero product of the column $b_r$ results in a $k$th row term greater than zero. The value of the $k$th row term for any convex combination of the columns of A is equal to zero. Hence, no set of convex combinations of the columns of A and B can be equal if the combination for B includes a specification that $v_r > 0$. Symbolically, if $\hat{v}_r > 0$, then for any choice of $\hat{v}_j$, $j=1,2,\dots,m_j$, $j \ne r$, where $\hat{v}_j \ge 0$ and $\sum_{j=1}^{m_j} \hat{v}_j = 1$,

$$\sum_{j=1}^{m_j} \hat{v}_j b_j^k > \sum_{i=1}^{m_i} u_i a_i^k = 0$$

for any choice of $u_i$ such that $u_i \ge 0$ and $\sum_{i=1}^{m_i} u_i = 1$. Hence, if $v_r > 0$, there exist no $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \ge 0$, $j \ne r$, $\sum_{j=1}^{m_j} v_j = 1$, such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j.$$

Condition 4: If the $k$th row of A has all elements $a_i^k$, $i=1,2,\dots,m_i$, equal to one, and some $b_r^k$ equals zero, no $u_i \ge 0$, $\sum_{i=1}^{m_i} u_i = 1$, and $v_j \ge 0$, $v_r > 0$, $\sum_{j=1}^{m_j} v_j = 1$, exist such that

$$\sum_{i=1}^{m_i} u_i a_i = \sum_{j=1}^{m_j} v_j b_j.$$

Justification 4: Condition 4 is similar to Condition 3 in that any convex combination of the columns of B that includes a non-zero product of the $r$th column yields a $k$th row term whose value cannot equal any convex combination of the $k$th row elements of A. Symbolically, for any choice of $u_i$ and $v_j$, where $v_r > 0$,

$$\sum_{j=1}^{m_j} v_j b_j^k < \sum_{i=1}^{m_i} u_i a_i^k = 1.$$

Note that Conditions 3 and 4 can also be stated, and justified, with the role of the elements of the A and B matrices reversed.

Given this set of four conditions, consider the following row partition of the A and B matrices:

$$A = \begin{bmatrix} A^* \\ A_1 \\ \bar{A}_1 \\ A_0 \\ \bar{A}_0 \end{bmatrix}, \qquad B = \begin{bmatrix} B^* \\ \bar{B}_1 \\ B_1 \\ \bar{B}_0 \\ B_0 \end{bmatrix}$$

where by appropriate change of rows in A and B

1. every element in each row of $A_1$ is a one
2. every element in each row of $B_1$ is a one
3. every element in each row of $A_0$ is a zero
4. every element in each row of $B_0$ is a zero.
The partitions $A_1^c$, $B_1^c$, $A_0^c$, and $B_0^c$ are the rows of A and B corresponding to $B_1$, $A_1$, $B_0$, and $A_0$, respectively, and $A^*$ and $B^*$ are the remaining rows of A and B.

With this partitioning and the four previously established conditions, the size of the data vectors associated with many of the [p(p-1)]/2 subproblems P2 can be significantly reduced. The reduction process, Procedure 1, can be stated in this manner:

Step 1: If for some row k in $A_1$ ($B_1$) each element in the corresponding row of $B_1^c$ ($A_1^c$) is equal to one, then row k of A and B can be eliminated by Condition 2.

Step 2: If for some row k in $A_0$ ($B_0$) each element in the corresponding row of $B_0^c$ ($A_0^c$) is equal to zero, then row k of A and B can be eliminated by Condition 2.

Step 3: If for some row k in $A_0$ ($B_0$) the corresponding row in $B_0^c$ ($A_0^c$) has all elements equal to one, or if for some row k in $A_1$ ($B_1$) the corresponding row in $B_1^c$ ($A_1^c$) has all elements equal to zero, then this particular subproblem P2 has no feasible solution by Condition 1. Procedure 1 and the search for a solution to P2 are terminated at this point because the convex hulls of pattern-classes A and B do not intersect.

Step 4: If for some row k in $A_1$ ($B_1$) the corresponding row in $B_1^c$ ($A_1^c$) has one or more elements equal to zero, i.e., $b_r^k = b_s^k = \cdots = b_t^k = 0$ ($a_r^k = a_s^k = \cdots = a_t^k = 0$), then columns $b_r, b_s, \ldots, b_t$ ($a_r, a_s, \ldots, a_t$) can be eliminated by Condition 4.

Step 5: If for some row k in $A_0$ ($B_0$) the corresponding row in $B_0^c$ ($A_0^c$) has one or more elements equal to one, i.e., $b_r^k = b_s^k = \cdots = b_t^k = 1$ ($a_r^k = a_s^k = \cdots = a_t^k = 1$), then columns $b_r, b_s, \ldots, b_t$ ($a_r, a_s, \ldots, a_t$) can be eliminated by Condition 3.

Step 6: If the use of Steps 1, 2, 4, and 5 has eliminated all elements of both matrices, then this particular subproblem has an infinite number of feasible solutions by Condition 2. Procedure 1 and the search for a solution to P2 are terminated at this point because the convex hulls of the pattern-classes A and B intersect.

Step 7: If the use of Steps 1, 2, 4, and 5 has eliminated one or more rows or columns from either matrix, then repartition the matrices and return to Step 1; otherwise terminate Procedure 1.

In coding Procedure 1 for computer processing, there is no need to physically partition the rows of the A and B matrices. Summing the elements in any row of A or B reveals whether the individual elements in the row are all equal to zero or are all equal to one. Given this information, the steps from Procedure 1 determine whether a pattern is removed from A or B, whether a row in A and B is removed, or whether the procedure should be terminated because no feasible set of convex combinations for P2 exists.

As an example of the use of Procedure 1 consider the set of matrices A and B in subproblem P2 where

$$A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \end{bmatrix}$$

In the first application of the steps of Procedure 1:

1. Column 4 can be eliminated from matrix A by Step 4 and
2. Column 1 can be eliminated from matrix A by Step 5.

After the first application of the steps of the procedure

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \end{bmatrix}$$

In the second application of the steps of Procedure 1:

1. Row 1 can be eliminated from both matrices by Step 1,
2. Row 2 can be eliminated from both matrices by Step 2, and
3. Column 4 can be eliminated from matrix B by Step 4.

After the second application of the steps of the procedure

$$A = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

In the third application of the steps of Procedure 1:

1. Row 2 can be eliminated from both matrices by Step 1 and
2. Procedure 1 can be terminated by Step 3.

Hence, for this set of A and B matrices, subproblem P2 has no feasible solution.

Although the use of Procedure 1 may lead to a reduction in the size of most subproblems, the pattern vectors ($a_i$ and $b_j$) for each of these problems may still be quite large.
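The row-sum idea behind Procedure 1 can be illustrated with a short Python sketch. This is not the dissertation's program: the function names, the status strings, and the row-by-row order of application are assumptions made for illustration. The sketch applies Conditions 1 through 4 directly instead of following the Step 1 to Step 7 ordering, so its intermediate matrices differ from those in the worked example, although it reaches the same conclusion for that example.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]  # rows are features, columns are patterns (0/1 entries)


def uniform_value(row_sum: int, width: int) -> Optional[int]:
    """Return 0 or 1 when a row is all zeros or all ones, otherwise None."""
    if row_sum == 0:
        return 0
    if row_sum == width:
        return 1
    return None


def reduce_subproblem(A: Matrix, B: Matrix) -> Tuple[str, Matrix, Matrix]:
    """Shrink A and B with Conditions 1-4 (hypothetical helper, not from the source).

    Returns a status and the reduced matrices:
      "disjoint"  - Condition 1 fired, so the convex hulls cannot intersect
      "intersect" - every row was removed by Condition 2
      "reduced"   - no further reduction possible; P2 must still be solved
    """
    rows = list(range(len(A)))
    cols_a = list(range(len(A[0])))
    cols_b = list(range(len(B[0])))
    changed = True
    while changed:
        changed = False
        for k in list(rows):
            # Row sums reveal all-zero and all-one rows without partitioning.
            a_val = uniform_value(sum(A[k][i] for i in cols_a), len(cols_a))
            b_val = uniform_value(sum(B[k][j] for j in cols_b), len(cols_b))
            if a_val is not None and b_val is not None:
                if a_val != b_val:
                    return "disjoint", [], []  # Condition 1
                rows.remove(k)  # Condition 2
                changed = True
            elif a_val is not None:
                # Conditions 3 and 4: drop columns of B that disagree with A's row.
                cols_b = [j for j in cols_b if B[k][j] == a_val]
                changed = True
            elif b_val is not None:
                # Conditions 3 and 4 with the roles of A and B reversed.
                cols_a = [i for i in cols_a if A[k][i] == b_val]
                changed = True
    reduced_a = [[A[k][i] for i in cols_a] for k in rows]
    reduced_b = [[B[k][j] for j in cols_b] for k in rows]
    return ("intersect" if not rows else "reduced"), reduced_a, reduced_b


# The matrices of the worked example above.
A_example = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1]]
B_example = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
example_status, _, _ = reduce_subproblem(A_example, B_example)
```

For the example matrices the sketch reports that the hulls are disjoint, in agreement with the conclusion that subproblem P2 has no feasible solution. When the status is "reduced", the remaining subproblem still has to be examined by other means.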
Restating subproblem P2 as a linear program yields

P3: minimize $[\,0 \;\; 0\,] \begin{bmatrix} U \\ V \end{bmatrix}$

subject to

$$\begin{bmatrix} A & -B \\ 1\,1 \cdots 1 & 0\,0 \cdots 0 \\ 0\,0 \cdots 0 & 1\,1 \cdots 1 \end{bmatrix} \begin{bmatrix} U \\ V \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \qquad U \ge 0, \; V \ge 0,$$

where the existence of any solution vectors $U^*$ and $V^*$ signals the intersection of the convex hulls of pattern-classes A and B.

Consider the dual of P3, written in the following form:

P4: maximize $[\,0 \;\; 1 \;\; 1\,] \begin{bmatrix} \pi \\ \lambda_1 \\ \lambda_2 \end{bmatrix}$

subject to

$$\begin{bmatrix} A^T & \mathbf{1} & \mathbf{0} \\ -B^T & \mathbf{0} & \mathbf{1} \end{bmatrix} \begin{bmatrix} \pi \\ \lambda_1 \\ \lambda_2 \end{bmatrix} \le 0, \qquad \pi, \lambda_1, \lambda_2 \text{ unrestricted in sign.}$$

Note that P4 may have many associated $\pi_k$ variables, but has only as many constraints as the number of patterns in A and B (as reduced by Procedure 1). P4 always has at least one solution to its constraint set. Thus, if an application of a linear-programming algorithm to P4 reveals the existence of an unbounded solution, then P2 has no solution. Therefore, if and only if P4 has a bounded solution do $u_i$ and $v_j$ exist such that

$$\sum_{i=1}^{m_s} u_i a_i = \sum_{j=1}^{m_t} v_j b_j$$

where $u_i \ge 0$, $\sum_{i=1}^{m_s} u_i = 1$, and $v_j \ge 0$, $\sum_{j=1}^{m_t} v_j = 1$.

The preceding discussion with its development of a reduction procedure and dual formulation provides the structure for a second procedure. Procedure 2 establishes a mechanism to verify the feasibility of any assignment of zeros and ones to the X vector of problem P1, see Figure 4. That is, given some vector X and a set of patterns $a_t^i$, $t = 1,2,\ldots,m_i$, and $i = 1,2,\ldots,p$, the [p(p-1)]/2 subproblems P2 are formed by zeroing out the appropriate pattern-vector elements. Then Procedure 1 is applied to each subproblem. Finally, for each pair of pattern classes the boundedness of the dual formulation P4 is examined. Vector X represents a feasible set of pattern-classifying features for P1 if and only if each of the [p(p-1)]/2 subproblem formulations P4 is unbounded.

[FIGURE 4. PROCEDURE 2]

Before a statement of the algorithm to solve problem P1 is presented several terms must be defined. The assignment vector is defined as a listing of variables $x_i$, elements of the vector X in P1, whose values have been determined by the steps of the algorithm.
The elements in this vector are recorded with the value of their assignment, either zero or one. These elements are entered in the vector in the order they were assigned, with the first algorithm assignment in the first (left) position. For example, consider the assignment vector $[x_4 = 0, x_{10} = 1, x_2 = 0]$. This vector records that the algorithm first assigned $x_4$ equal to zero, then assigned $x_{10}$ equal to one, and its last assignment was $x_2$ equal to zero. Feasibility of a solution X, as determined by the assignment-vector component values, is checked by Procedure 2 with the value of those variables not included in the assignment vector temporarily set equal to one.

The value V of an assignment vector is defined as minus one times the sum of the costs associated with each of the variables in the assignment vector, multiplied by the value assigned to the respective variable. For the example assignment vector, $[x_4 = 0, x_{10} = 1, x_2 = 0]$, where $c_4 = 5$, $c_{10} = 2$, and $c_2 = 7$, the assignment vector has the value

$$V = (-1)[5(0) + 2(1) + 7(0)] = -2 .$$

3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm

Step 0: Create the assignment vector (at this point the vector is null as there is no variable assignment in the vector). Set $V^* = -\infty$ and go to Step 4.

Step 1: Start at the right side of the assignment vector and move to the left, stopping at the first variable assigned a zero value. If no variable in the assignment vector has a zero assignment, go to Step 2. Otherwise go to Step 3.

Step 2: Calculate V for the assignment vector. If V is greater than $V^*$, record the values of the variables in the assignment vector as the optimal solution $X^*$ to P1. Otherwise, record (as the optimal solution $X^*$ to P1) the values of the variables in the best current solution X. Terminate the algorithm.

Step 3: Change the value of the variable isolated in Step 1 to an assigned value of one, and eliminate from the assignment vector all variable assignments to the right of this new assignment. If the assignment vector includes the assignment $x_i = 1$ for every $x_i$ in X, return to Step 2. Otherwise go to Step 4.

Step 4: Select a variable $x_i$ that is not an element of the assignment vector. Assign this variable the value $x_i = 0$ in the assignment vector. Use Procedure 2 to check the feasibility of this assignment. If the assignment vector is not feasible, go to Step 6. Otherwise go to Step 5.

Step 5: If the assignment vector with the new assignment $x_i = 0$ does not include an assignment for every $x_i$ in X, return to Step 4. Otherwise go to Step 7.

Step 6: If the assignment vector with the assignment $x_i = 1$ ($x_i$ is the variable selected in Step 4) does not include an assignment for every $x_i$ in X, return to Step 4. Otherwise go to Step 7.

Step 7: Calculate V for the assignment vector. If $V^*$ is greater than V, go to Step 1. Otherwise go to Step 8.

Step 8: Record as the best current solution X the values of the variables in this assignment vector. Set $V^* = V$, and return to Step 1.

Note that in the course of applying this algorithm all solutions are considered and the best current solution is replaced only when another solution has a larger associated value. As the number of possible solutions is finite, the algorithm must terminate, and at this termination the value of the optimal solution and its assignments are known. An application of the minimum-cost symptom-selection algorithm is presented in Appendix C.

3.4.3 Computational Considerations

Returning to the setting of diagnostic classification of craniofacial-pain patients, application of the minimum-cost symptom-selection algorithm would require an enumeration (explicit or implicit) over $2^{295}$ possible solutions in order to find the optimal collection of data-vector elements.
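The enumeration can be illustrated on a toy problem with the following Python sketch. It is a recursive restatement of the search, not a transcription of Steps 0 through 8: variables are taken in index order (the algorithm allows any unassigned variable to be selected), and `is_feasible` is a hypothetical stand-in for Procedure 2 that receives the vector X with unassigned variables set to one. The sketch assumes that a vector judged infeasible cannot become feasible by setting further variables to zero, which is what justifies abandoning a branch.

```python
from typing import Callable, List, Optional, Tuple


def select_min_cost_features(
    costs: List[float],
    is_feasible: Callable[[List[int]], bool],
) -> Tuple[Optional[List[int]], float]:
    """Return (best X, best V), where V = -(sum of costs of retained features)."""
    n = len(costs)
    start = [1] * n  # unassigned variables are temporarily set equal to one
    if not is_feasible(start):
        return None, float("-inf")
    best_x: Optional[List[int]] = None
    best_value = float("-inf")

    def search(x: List[int], index: int) -> None:
        nonlocal best_x, best_value
        if index == n:
            # Every variable is assigned: compute V and keep the larger value.
            value = -sum(c * xi for c, xi in zip(costs, x))
            if value > best_value:
                best_x, best_value = list(x), value
            return
        x[index] = 0  # first try discarding the feature
        if is_feasible(x):
            search(x, index + 1)
        x[index] = 1  # then retain it (also the forced choice when 0 is infeasible)
        search(x, index + 1)

    search(start, 0)
    return best_x, best_value


def toy_oracle(x: List[int]) -> bool:
    """Hypothetical feasibility rule: feature 1 alone, or features 0 and 2 together."""
    return x[1] == 1 or (x[0] == 1 and x[2] == 1)


best_x, best_value = select_min_cost_features([5, 2, 7], toy_oracle)
```

With the costs 5, 2, and 7 and the toy rule, the search retains only the second feature and reports the value -2. In the worst case the recursion visits every one of the $2^n$ complete assignments for n variables.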
As the number of possible solutions is prohibitively large, heuristic modifications to the symptom-selection algorithm are required for this application. One possible modification could employ the fact that only a few of the elements in the patient data vector have large associated 'costs' for their utilization. In particular, the eight elements of radiographic data and the two measures of emotional trauma are significantly more 'costly' to examine than the other items in the data vector. With this modification, the algorithm would only consider eliminating these ten high-cost features. Another heuristic approximation to the optimal collection of features might rank the data-vector elements in order of descending cost of utilization. Procedure 2 would then be used to eliminate these components one by one, starting with the item of highest cost, until the procedure signaled an infeasible solution to P1. Certainly, other heuristics might also be developed to exploit the structure of this algorithm.

3.5 Model Applications

The structure of the craniofacial-pain diagnostic-classification model permits model utilization for a variety of purposes. Since the model is developed in terms of general data-vector and diagnostic-alternative parameters, these model components can be altered to suit the application in question. This section presents a brief discussion of some of the possible applications of the diagnostic classifier.

In a teaching environment, the diagnostic-classification model with its set of discriminant weights can be stored for computer-terminal access. Then, on a set of tutorial example patients, students can compare their diagnoses with those of the diagnostic model. Moreover, the student can interact with the classifier in constructing his own 'sample' patients for the classifier to diagnose.
Finally, the student can request the classifier to relate those discriminant-function weights that the model employs in considering the 'significance' (Section 3.2) of any one or group of symptoms.

The effectiveness of new diagnostic tests can be evaluated using the minimum-cost symptom-selection algorithm. This algorithm provides an immediate measure of the 'worth' of new research developments. Given a cost for employing a new test, the algorithm returns an evaluation of the test's classifying capability. The algorithm reveals whether the test is included in the minimum-cost collection of features and whether the use of the new test permits the practitioner to discontinue other examination procedures. Additionally, the algorithm can be employed to point out new areas for research, as it isolates diagnostic alternatives where correct classification of patients is difficult using existing tests and procedures.

As employed in the practitioner's office, the diagnostic classifier will provide a direct link between the practicing dentist and the knowledge of experts in the field of craniofacial pain. Information will flow over the link in both directions. As new patients are seen by the practitioner, the record of each visit will be reviewed by experts and then used to supplement the data base employed in model construction. Then, when developments dictate, new sets of discriminant-function weights can be transmitted to the dental practitioners. This kind of interaction results in a more accurate and representative diagnostic classifier as the patient-sample data base becomes larger.

CHAPTER 4
TREATMENT PLANNING

The selection of treatment regimens for craniofacial-pain patients is modeled as a Markovian decision process. The states in this Markovian model are descriptions of a patient's health-care status and the decision alternatives are feasible treatments for the patient's dysfunction (see Section 4.1).
In the first two sections of this chapter, motivation for the model structure is provided and the components of the decision model are developed. The third section provides a description of the validating procedures used to determine the appropriateness of the model and the model-generated treatment decisions. This chapter closes with a discussion of potential teaching, research, and private practice applications of the treatment-planning model.

4.1 Model Components

Several model-building components from the craniofacial-pain care system are isolated to permit the construction of a Markovian representation of this system. A set of state descriptions that characterize, for decision-making purposes, the status of craniofacial-pain patients is presented in Section 4.1.1. Then transition probabilities measuring the effects of treatment applications are discussed in Section 4.1.2. Section 4.1.3 overlays the model's state descriptions and transition probabilities with costs accrued during the patient's progression through the care system. These components are integrated and verified in the discussions of Sections 4.2 and 4.3.

Values for many of the treatment-planning model's parameters were gathered from the set of patient records discussed in Section 3.1. As the patient histories from the contributing university dental clinics were reviewed, notations of treatment applications and time between successive visits were made for each patient-practitioner interaction. The values of the remaining model parameters were either estimated by the reviewing practitioners, Dr. Fast and Dr. Mahan, or were gathered from responses to questionnaires completed by patients who visited the University of Florida's Dental Clinic. In modeling the complicated process of care for craniofacial-pain patients, several simplifying assumptions were made.
This section provides the motivation for these assumptions and presents the notation employed in the analytic description of the treatment-planning process.

4.1.1 Patient States

In general, a Markovian system structure requires that the current state of the system completely characterizes the probabilities associated with future state occupancies of the system. To fully satisfy this Markovian condition for state structure in the craniofacial-pain treatment-planning model would require that the model include as distinct model states every possible combination of diagnostic classifications a patient might have occupied, in conjunction with every combination of treatment applications he might have undergone, during his stay in the care system. Unfortunately, such a model would have an infinite number of patient states.

However, for a majority of craniofacial-pain patients the knowledge of a patient's prior treatment record, coupled with his current diagnostic classification, is adequate to determine his prior diagnostic classifications. Even in the cases where the current classification and prior treatment record do not provide a total description of a patient's condition, these elements of patient status do provide significant information about the probabilities associated with a patient's future status in the care system. For example, in the data employed in model construction, 47 craniofacial-pain patients occupied Diagnostic Alternative 15 and were treated with an application of drugs at least once. Eight of these patients were 'well' after a first treatment with drugs, while 39 required multiple applications of drugs or other treatments during their stay in the system. Yet of the 12 patients who were given two applications of drugs, 9 were 'well' following the second repetition of drug therapy.
Thus, while the overall data-based transition-probability estimate for a transition from Diagnostic Alternative 15 into the well state following any one application of drugs is .36, the transition-probability estimate for a transition into the well state following two successive applications of drugs is .75. Hence, for this diagnostic classification, information on the prior application of drugs is important in determining a patient's future status in the care system.

This form of 'current diagnostic classification augmented by treatment record' patient-state description is employed in the craniofacial-pain treatment-planning model as an approximation to a 'true' Markovian state structure. Each of the diagnostic alternatives shown in Figure 3 forms the basis for a collection of patient states. The diagnostic alternative is augmented with a record of treatments that have been applied since the patient entered the care system. Appendix D provides a list of the treatment alternatives that may be prescribed for craniofacial-pain patients. The record of each treatment given to the patient is noted in the patient-state descriptions without regard to its chronological order. For example, a patient's occupation of the state 'J|1,2,2' denotes that he is currently classified in diagnostic alternative J, and that since he entered the care system he has been treated with one application of treatment 1 and two applications of treatment 2.

Augmenting the patient-state descriptions with treatment history expands the dimensionality of the state space, yet the number of history-augmented states remains finite for two reasons. The treatment records used in model construction reveal that, for some combinations of diagnostic alternatives and treatment applications, there is a feasible limit to the number of treatment repetitions that can be given to any one patient.
Thus, the first reason for a finite state space is that no patient state in the treatment-planning model includes more repetitions of a particular treatment than the clinical data have established as a feasible limit. As an example, the records of patient visits used in model construction establish a feasible limit of only one application of treatment 18 for patients classified in any of the diagnostic alternatives. Therefore, the treatment-planning model includes patient states that exclude treatment 18 as a portion of their treatment history or exhibit the form 'J|...,18,...' for each diagnostic classification 'J' where 18 is a feasible treatment.

The second reason for a finite state space is that there is a 'boundary application' of many treatments such that neither the treatment-record data nor the reviewing practitioners established differences between the transition probabilities for the boundary application and those for further repetitions of the treatments (see Section 4.1.2 and Appendix E). In Diagnostic Alternative 13, for example, the first application of treatment 24 is the boundary repetition of that treatment. Hence, multiple repetitions of treatment 24 are not added to the state description of patient states based on Diagnostic Alternative 13, as the additional information on multiple applications does not influence transition probabilities associated with this treatment's effectiveness. Thus, a second application of treatment 24 for a patient who continues to be classified in Diagnostic Alternative 13 places the patient in a state of the form '13|...,24,...'.

The craniofacial-pain treatment-planning model includes two terminal patient states in addition to the patient states that are based on diagnostic alternatives. One or the other of these two terminal states, 'well' or 'referred,' represents the patient's status when he exits the care system.
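The history-augmented state description and the boundary rule described above can be sketched in Python as follows. The representation (a diagnostic label paired with a sorted tuple of treatment numbers), the function names, and the `boundary` mapping are assumptions introduced for illustration. Only the two examples, one application of treatment 1 followed by two of treatment 2, and treatment 24 with a boundary of one in Diagnostic Alternative 13, come from the text.

```python
from collections import Counter
from typing import Dict, Tuple

# (diagnostic alternative, treatment history stored without chronological order)
State = Tuple[str, Tuple[int, ...]]


def apply_treatment(
    state: State,
    treatment: int,
    new_diagnosis: str,
    boundary: Dict[int, int],
) -> State:
    """Return the state reached after one application of a treatment.

    boundary maps a treatment number to its boundary number of repetitions;
    applications beyond that number are not added to the recorded history.
    """
    _, history = state
    counts = Counter(history)
    limit = boundary.get(treatment)
    if limit is None or counts[treatment] < limit:
        counts[treatment] += 1
    # Sorting discards the order in which the treatments were applied.
    return new_diagnosis, tuple(sorted(counts.elements()))


def label(state: State) -> str:
    """Format a state in the 'J|1,2,2' style used in the text."""
    diagnosis, history = state
    return f"{diagnosis}|{','.join(str(t) for t in history)}"


# One application of treatment 1 and two of treatment 2, in any order.
s = ("J", ())
for t in (2, 1, 2):
    s = apply_treatment(s, t, "J", boundary={})
first_label = label(s)

# Treatment 24 has a boundary of one application in Diagnostic Alternative 13.
b = apply_treatment(("13", (24,)), 24, "13", boundary={24: 1})
second_label = label(b)
```

Here `first_label` is 'J|1,2,2' and `second_label` is '13|24': the second application of treatment 24 leaves the recorded history unchanged. The separate feasible limit on repetitions, such as the single application allowed for treatment 18, is not modeled in this sketch.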
A patient exits the system in the 'well' state when the effects of treatment applications result in sufficient improvement so that no further treatment is required. The patient moves into the 'referred' state in lieu of further treatment. This alternative to treatment is selected when the 'expected costs' of remaining in the care system exceed the costs of referring the patient to another source of care (see Section 4.1.3).

4.1.2 Transition Probabilities

Patient-state transitions that involve a change of diagnostic classification follow one of two basic formats, see Figure 5. For the initial diagnostic classifications in Format I, with each treatment application, the patient either remains in his original diagnostic classification or he transits into the well state. For Format II, the six diagnostic alternatives shown in the lower illustration form a different structure. Here it is possible for the patient to alternate between any one of several diagnostic classifications during the course of his stay in the care system. Note that in both formats for diagnostic-classification transitions a patient moves into the referred state not as a result of a treatment application, but rather as an alternative to further treatment.

[Figure 5. Format I: patients whose first-visit diagnostic classification is Diagnostic Alternative 1, 2, 3, 4, 5, 6, 10, 11, 14, 16, or 17 make transitions out of their original classification 'I' as diagrammed. Format II: for patients originally classified in Diagnostic Alternative 7, 8, 9, 12, 13, or 15, the diagrammed kinds of diagnostic-classification transitions are possible.]

To these underlying diagnostic-classification transitions the craniofacial-pain treatment-planning model adds a record of the changes in treatment history. Appendix F displays complete charts of all of the diagnostic-alternative-based patient states included in the treatment-selection model.
In these charts the patient states are connected by arcs that represent feasible transitions from one state to another. Not shown in the charts are the well and referred patient states and the arcs that connect every diagnostic-alternative-based state with these terminal states.

Howard [25] establishes that in terms of the policy decisions generated by a Markovian decision model, holding-time distributions are important only insofar as they affect the mean waiting time in each system state and the expected costs of each state occupancy. The records of the patient visits employed in model construction revealed that, in the care of the patients described by the data, one or more treatments were prescribed at each visit, and a series of return visits was scheduled for the patient following his initial interaction with the practitioner if return visits were warranted. Under these conditions, specifying holding-time distributions for the time between successive patient-state transitions does not refine the model. Therefore, the treatment-planning model employs a Markovian rather than semi-Markovian representation of the care system, since an 'n'-visit holding time in a particular patient state can be modeled with no loss of information as 'n' repetitions of the 'virtual' transition from the state in question to itself. Care for craniofacial-pain patients is modeled as a discrete-stage Markovian system with the beginning of visits to the practitioner serving as stage indicators.

Using the history-augmented patient states, transition probabilities are specified in terms of the treatment that generated the transformation. In making a state-transition following a treatment, a patient must move to a state that includes that treatment as a portion of its state description.
For example, following application of treatment 'k,' a patient must progress from patient-state 'I|m,n' to 'J|k,m,n' where 'I' may be equivalent to 'J.' The only exception to this rule is in the application of a treatment beyond its boundary number of repetitions. Here, if treatment 'k' has a boundary number of two, then following an application of treatment 'k' three or more times a patient progresses from patient state 'I|k,k,m,n' to 'J|k,k,m,n' where again 'I' may be equivalent to 'J.' This structure is indicated because inclusion of more than the boundary number of applications (two in this case) in the state description does not affect the transition probabilities.

Estimates of the values of the transition probabilities were obtained from the patient records discussed previously. A discussion of the stability of these probability estimates under variations in patient data is presented in Appendix E. Where the data on the effects of treatment alternatives were limited, the data-generated probability estimates were refined by estimates from the reviewing practitioners. Notationally, transition probabilities are represented in the analytic model in the following form:

$p_{IJ}^k$ = the probability of making a transition from patient-state 'I' to patient-state 'J' following the application of treatment-alternative 'k.'

4.1.3 Cost Structure

A patient's progression through the craniofacial-pain system generates a multitude of implicit and explicit costs. The explicit costs can be measured in terms of the dollar charges paid by the patient or the practitioner during the patient's stay in the system. Other costs are implicit in nature and can be quantified only as they relate to the opportunities lost by the patient and the practitioner while the patient remains in the care system. For modeling purposes four major system costs have been isolated.
These costs are:

(a) Cost of treatment applications
(b) Cost of the practitioner and his staff's services
(c) Cost to the patient of occupying a non-well patient-state
(d) Patient-referral cost.

Although these costs do not encompass all of the system costs, they measure significant explicit and implicit charges associated with a patient's stay in this system. In the treatment-planning model, each of these costs is charged on a per-patient-visit basis.

Costs of the various treatment applications and the costs associated with the practitioner and his staff's services were estimated by the reviewing practitioners. Estimates of treatment and care-system service costs were partitioned by diagnostic classification as well as treatment category. The cost estimates reflect typical charges in a dental clinic environment.

The inconvenience experienced by a patient in making a visit to the practitioner was used as a measure of the cost of occupying a 'non-well' patient state. Estimates of this inconvenience cost were gathered from responses to a questionnaire completed by patients at the University of Florida's Dental Clinic. These were general dental patients not necessarily suffering from craniofacial pain. Figure 6 shows the distribution of these patient estimates.

Values for patient-referral costs were composed of the sum of three distinct estimates. The first component was an estimate of the total fee charged by the practitioner receiving the referred craniofacial-pain patient. Record transferral and duplication costs, as well as the fees lost by the referring practitioner, formed the second component. The third component of the patient-referral cost is a measure of the inconvenience experienced by the referred patient, a value estimated by using a multiple of the value of the inconvenience cost discussed in the preceding paragraph. Appendix G provides a justification for using this particular combination of components in the referred-cost estimates.
Symbolically, the patient-state transition costs (negative constants) are represented in the analytical model as

$c_{IJ}^k$ = the sum of the costs generated by the transition from patient-state 'I' to patient-state 'J' following the application of treatment 'k.' This sum includes the type (a), (b), (c), and (d) costs appropriate to each patient-state transition.

[FIGURE 6. PATIENT-VISIT INCONVENIENCE COST. Fifty-eight patients at the University of Florida's Dental Clinic responded to the following question: "How much would you estimate that this trip to the Dental Clinic cost you in terms of lost wages, baby-sitting fees, transportation costs, and other costs that you may have had to pay so that you could be here for your appointment?" The distribution of these estimates is shown in a histogram (horizontal axis: Dollars). The mean value for these 58 estimates of patient-visit inconvenience costs was 30.72 dollars.]

4.2 Selection of Optimal Treatments

The craniofacial-pain treatment-planning model is transient in the sense that only two of the model's patient states, well and referred, can represent the patient's status when he exits the health-care system. In a stochastic sense, only the terminal states are recurrent as they alone possess non-zero long-run probabilities of state occupancy. Hence, the choice of treatment alternatives at each patient state is made with the goal of minimizing the costs accrued by the patient as he passes through the diagnostic-alternative-based patient states into one of the recurrent states.

For notational convenience, in the analytic model the well patient state is denoted as state 'W' and the referred state as state 'R.' In modeling the care system for craniofacial-pain patients there is no justification for providing costs for the transitions from states 'R' and 'W' to themselves; hence, $c_{RR}$ and $c_{WW}$ are set equal to zero.
Analytically, the treatment-planning model is made monodesmic, i.e., having only one recurring state, by defining \(p_{RW} = 1\) and \(p_{WR} = 0\). The total number of states, not including states 'W' and 'R,' is denoted by 'S.' With these definitions and the notation introduced in the previous section, a procedure for selecting the set of optimal treatment decisions is developed.

Howard [25] has shown that for a monodesmic, transient Markovian decision model, a set of optimal decisions is defined as those decisions that maximize the expected value \(v_I\) of occupying each system-state 'I.' Since the treatment-planning model for craniofacial-pain patients fits into this category of decision model, a modification of Howard's algorithm is employed to select optimal treatment regimes. The process of selecting an optimal set of treatments is accomplished by finding the set of treatment alternatives \(k_1, k_2, \ldots, k_S\) that maximize each of the \(v_I^{k_I}\) (the expected value of occupying patient-state 'I' given treatment alternative \(k_I\)), where

\[ v_I^{k_I} = r_I^{k_I} + \sum_{\text{all patient states } J} p_{IJ}^{k_I} \, v_J , \qquad I = 1, 2, \ldots, S \]

and

\[ r_I^{k_I} = \sum_{\text{all patient states } J} p_{IJ}^{k_I} \, c_{IJ}^{k_I} . \]

With treatment-augmented patient states, maximizing the \(v_I\) can be carried out in the following manner:

1. Group for simultaneous analysis all patient states possessing a common treatment history, where one or more of the treatments in this history are at their boundary level. Each of the 'T' sets of states complying with this description forms an analysis set \(B_j\), \(j = 1, 2, \ldots, T\).

2. Label sequentially the patient states, starting with state 'W' as 1, state 'R' as 2, and then selecting numbers for the remaining unlabeled patient states on the basis that the one with the most treatments in its history receives the next number-label.
For example, state '1|1,2,2,4' would be labeled with a smaller number than state '1|2,6,6.' When the numbering scheme reaches the members of one of the analysis sets isolated in Step 1 (above), numbers for the members of that set may be arbitrarily assigned.

Given this state numbering scheme, the selection of optimal treatments can proceed dynamically since for each state 'I' that is not a member of an analysis set, \(I = 1, 2, \ldots, S\), \(I \notin B_j\), \(j = 1, 2, \ldots, T\),

\[ v_I^{k_I} = r_I^{k_I} + \sum_{J < I} p_{IJ}^{k_I} \, v_J \]

and for the states of set \(B_j\), \(j = 1, 2, \ldots, T\),

\[ v_I^{k_I} = r_I^{k_I} + \sum_{J \le t} p_{IJ}^{k_I} \, v_J + \sum_{J \in B_j} p_{IJ}^{k_I} \, v_J \]

where t = the number of the last non-group-\(B_j\) state immediately preceding the smallest number-labeled state in \(B_j\).

Thus, the process of selecting optimal treatments proceeds recursively from the state of smallest number-label to the one of largest number-label, stopping to consider simultaneously the values of a number of states only when an analysis set is encountered. Howard's value iteration and policy improvement algorithm [25] is employed only in the case of selecting treatments for the analysis-set patient states. An example of this section's labeling and optimization procedure is presented in Appendix H.

This optimization procedure was applied to the states of the craniofacial-pain treatment-planning model. Appendix G presents a list of the optimal treatment selections for each of the model's patient states.

4.3 Model Validation

Validation of the craniofacial-pain treatment-planning model was accomplished in two phases. In the first phase of validation, the individual components of the Markovian representation were examined by the reviewing practitioners. The second phase of model validation compared model-generated treatment decisions with those made by the reviewing experts. In addition, statistics generated by the model were compared to the care-system description provided by the patient records from the university dental clinics. This section discusses the results of these validating efforts.
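The recursive selection of Section 4.2 can be illustrated with a short Python sketch. It assumes that the states are already ordered by the labeling rule of Step 2, so that every successor of a state has been valued before the state itself, that the values of states 'W' and 'R' are zero, and that referral is offered as one more alternative at each state. It leaves out analysis sets, for which the text uses Howard's algorithm. The state names, probabilities, and costs are hypothetical and are not taken from the dissertation's data.

```python
def select_treatments(model, terminal=("W", "R")):
    """Pick, for each state, the treatment with the largest expected value.

    model maps state -> {treatment: [(next_state, probability, cost), ...]}.
    States must be listed so that every next_state is either terminal or
    appears earlier in the dict (assumed labeling order).
    Costs are negative constants, so the largest value is the least costly.
    """
    values = {s: 0.0 for s in terminal}  # assumption: no cost after exit
    policy = {}
    for state, options in model.items():
        best_treatment, best_value = None, float("-inf")
        for treatment, outcomes in options.items():
            # r + sum_J p * v_J, written as sum_J p * (c + v_J)
            value = sum(p * (c + values[nxt]) for nxt, p, c in outcomes)
            if value > best_value:
                best_treatment, best_value = treatment, value
        policy[state] = best_treatment
        values[state] = best_value
    return policy, values


# Hypothetical two-state example (not the dissertation's data).
toy_model = {
    "A|12": {  # state A after one application of treatment 12
        "24": [("W", 0.8, -50.0), ("R", 0.2, -150.0)],
        "Refer": [("R", 1.0, -100.0)],
    },
    "A": {  # first-visit state A
        "12": [("W", 0.6, -40.0), ("A|12", 0.4, -40.0)],
        "Refer": [("R", 1.0, -100.0)],
    },
}

policy, values = select_treatments(toy_model)
print(policy)
print(values)
```

Because each state's value depends only on states valued earlier, a single pass is enough in this simplified setting.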
The review of model components was accomplished as values for the model parameters were collected. Some of the data-based estimates of transition probabilities and boundary-level application numbers did not conform to expert judgment about the effects and effectiveness of various treatment applications. When these disparities occurred, the estimates were modified to reflect expert judgment.

The general structure of the patient states was reviewed to ensure that the representation shown in Appendix F did in fact portray a set of logical progressions through the care system. Although this examination established the validity of the patient progressions, the review did point out one deficiency in the model's structure. The number and types of treatment alternatives available for use at each patient state were determined by records of actual applications of these treatments in the data used for model construction. It was the judgment of the reviewing practitioners that in several cases the selection of treatment alternatives for a patient state did not include the 'most appropriate' treatment alternative. Nevertheless, this model deficiency can readily be corrected. With the collection of data on the effects of these 'most appropriate' treatments, these additional treatment alternatives can be incorporated as decision alternatives for the patient states in question.

The reviewing practitioners made selections of treatments for each of the model's patient states. In those cases where the model's treatment alternatives did not include the practitioners' 'most appropriate' choice of treatments, the practitioners made a selection from the same list of alternatives used by the model. Appendix G lists their choices of treatment along with each model-generated selection. The two sets of treatment plans include the same treatment selection for 87 out of 94 patient states, or 92.6% of the patient states.
The 7 differences in treatment selections arise in part from the approximations the treatment-planning model employs in its representation of the care system and in part from slight inconsistencies in the practitioners' treatment selections.

One last test was performed to verify the suitability of the Markovian representation of the craniofacial-pain care system. Mean transit times through the care system to one of the terminal states were calculated using the model-generated treatment decisions and each of six first-visit patient states. These model-generated transit times were compared to estimates of the same statistics gathered from the patient records contributed by the university dental clinics. Table 5 presents the values of both sets of statistics. The close correlation of these values reveals that the treatment-planning model not only duplicates the decisions of experts, but also provides a structure for gathering other relevant information about the underlying care system.

4.4 Model Applications

Like the diagnostic-classification model presented in Chapter 3, the craniofacial-pain treatment-planning model has been structured to permit its utilization in a variety of applications. Markovian modeling provides an analytic representation of the craniofacial-pain care system as well as establishing a means of making treatment selections. This section discusses applications of the model's analytic representation and treatment selections in teaching, in research, and in practice.

The model-generated treatment decisions reveal which treatments are most frequently used in the care of craniofacial-pain patients.
In a teaching environment, this information can be used to specify treatment-application techniques that should be emphasized in training dental students in craniofacial-pain care. Moreover, the parameters employed in model development, in particular the transition probabilities and referral costs, are themselves valuable instructional materials in developing the dental student's treatment-selection skills.

TABLE 5
MEAN TRANSIT TIMES THROUGH THE CRANIOFACIAL-PAIN CARE SYSTEM

For a Patient Whose First                          Model-Generated   Truncated Model   Patient Record
Diagnostic Classification Was                      Estimate*         Estimate+         Estimate
Myopathy-Myositis                                  1.50              1.34              1.35
Oral Pathology-Dental Pathology                    1.11              1.04              1.08
Vascular Changes-Migrainous Vascular Changes       3.89              3.42              3.06
Myofacial Pain Dysfunction-
  Uneven Centric Stops                             1.86              1.43              1.50
Myofacial Pain Dysfunction-
  Anxiety/Depression                               3.87              3.47              3.18
Myofacial Pain Dysfunction-
  Reflex Protective Muscular Contracture           1.90              1.79              1.87

The values in these sets of estimates are specified in terms of the number of patient visits in which the patient occupies a non-well or non-referred patient state.

Note: The treatment-planning model considers the possibility of 'infinite duration' occupancy of non-well or non-referred states.

+ These truncated estimates were generated from the treatment-planning model on the conditional basis that a patient must transit into either the well or the referred state by his fifth patient visit. The maximum number of visits for any patient described by the clinical data was five patient visits.

The treatment-planning model provides a method for evaluating new developments in treatment for craniofacial-pain patients.
With estimates of the effectiveness of his new treatment, the researcher can use the craniofacial-pain treatment-planning model to get two immediate responses. First, the optimization technique of Section 4.2 will determine if this new treatment provides 'better care' for the patient than any of the other treatment alternatives the model has to choose from. Second, if optimal treatment selections for the model include the new treatment, the model's statistics will show improvement in length of stay, and other relevant measures of treatment effectiveness, introduced by using this new treatment.

In the office of the practicing dentist, the treatment-planning model's decisions could provide a concise reference of the treatment selections suggested by experts in the field of craniofacial pain. Moreover, the practitioner would have a chance to contribute to the refinement of the listing, as the treatment records of his patients could supplement the data used in model construction. In addition, the practitioner could employ the statistics associated with the treatment-planning model in scheduling the length, and number, of his appointments for craniofacial-pain patients.

CHAPTER 5
CONCLUSIONS AND FUTURE RESEARCH

This dissertation has presented analytic models of the decision processes associated with diagnosing and selecting treatments for a particular health-care problem. The selection, construction, and testing of these models have been discussed in some detail. Meanwhile, the model-building effort itself has been the source of a number of insights into decision-making in a health-care environment. These insights will be reflected in this chapter's discussion of the dissertation's central research conclusion and suggestions of topics for future investigation.

The similarity between the decision-making processes employed by the practitioner and the analytic structure of this dissertation's models is quite revealing.
In both diagnosis and treatment planning for craniofacial-pain patients it appears that the practitioner, like the analytic models, makes 'first-order' decisions. The linearity of symptom significance (a first-order polynomial of symptom weights), and the present-patient-state dependency of transition probabilities measuring treatment effectiveness (a first-order stochastic dependence) provide a means of generating decisions that closely approximate the decisions made by dental practitioners. This general conclusion on the applicability of first-order decision techniques to craniofacial-pain diagnostic classification and treatment planning characterizes the central development of this dissertation.

Given this summary statement, there are several logical extensions to this dissertation's research that should be examined in future investigations. The following suggestions identify some of the more fruitful areas for further research efforts. These suggestions are ordered in the author's view of their significance.

1. This dissertation's research found that first-order decision-making models are valid descriptions of the underlying thought processes employed by the craniofacial-pain practitioner. It is possible that these first-order descriptive decisions are 'suboptimal' and that higher-order decision-making tools might yield prescriptive, or 'optimal,' diagnostic classifications and treatment plans for craniofacial-pain patients. That is, considering the interaction between significant symptoms and multiple-state dependency for patient-state transitions may lead to optimal diagnostic and treatment-selection decisions. As the models themselves can readily be increased in their decision-making 'order,' an investigation into this possibility would be hampered only by the necessity of collecting an elaborate data base. Nevertheless, such an investigation should be undertaken in this, the most significant, of future research areas.

2.
As this dissertation's analytic models can be applied directly to any health-care problem where there is verification that practitioners make first-order decisions, one potential avenue of future research would be to isolate those health environments where these kinds of decisions are made. However, a word of caution is interjected at this point. Mathematical modeling demands an underlying structure for the process being modeled. Yet, in a process dealing with a product that is subject to considerable variation, such as the care of a patient in a health-care system, isolating an underlying process structure is difficult. Moreover, the problem of finding process structure is compounded in the health-care field by a lack of unifying and consistent nomenclature. In the health-care field, scholarly literature and historical precedent can serve as the justification for two or more contradicting sets of terminology for the same anatomical structure or physiological process. Thus, in researching the generality of first-order decision-making techniques, the investigator must consider process variability and nomenclature inconsistency before he makes any statement about the applicability of this dissertation's decision-making tools to other health-care environments.

3. A non-geometric discussion of the criteria for pattern-space separability was presented to provide a means of characterizing health-care disorders for which diagnostic classification by a linear pattern classifier might be feasible. Unfortunately, this dissertation's techniques are heuristic and do not provide an exact reproduction of the underlying mathematical specifications. Future research in this area could lead to a precise statement of non-geometric criteria for linear separability, and thus provide an indirect means for evaluating potential applications of linear non-parametric classifiers.

4.
This dissertation's minimum-cost symptom-selection algorithm represents a clear departure from previous research in feature selection. The algorithm's utilization of the convex-hull representation of pattern-space separability makes this development unique in the literature of feature selection. However, the algorithm's method of checking the feasibility of potential feature collections is extremely tedious. A more efficient method to check feature-collection feasibility may be revealed through future investigations in this area.

5. From a mathematical-programming point of view, the symptom-selection algorithm represents one of a limited number of techniques capable of solving a problem with non-linear constraints. The algorithm seeks an optimal assignment of components, where the feasibility of any assignment is determined by the existence of a set of discriminating component multipliers. In this more general context, the structure of the algorithm may be applicable in a variety of problem areas not directly related to the feature-selection problem. The possibility of employing the algorithm in this general setting should be investigated.

6. In modeling the treatment-planning process for craniofacial-pain patients the concept of boundary-level treatment applications was introduced. Boundary numbers on the effects of repeated treatment applications are likely to occur in data derived from the care of patients with a variety of physiological disorders. Further investigations of this phenomenon may result in more effective methods of predicting which treatments will have boundary-level application numbers, and more efficient statistical techniques to determine values for these numbers.

7. The training algorithm developed in the construction of the craniofacial-pain diagnostic classifier generates a feasible integer solution to a large number of linear constraints. This algorithm is both efficient and easily coded for computer applications.
An investigation of the uses of this algorithm in a mathematical-programming setting may reveal applications in solution techniques for more general integer programs.

8. Potential applications have been suggested for the diagnostic-classification and treatment-planning models in teaching, in research, and in practice. The models and their applications have been presented so that they might readily be employed by some future investigator. Actual applications of the models should yield significant contributions to the effectiveness of the teacher, researcher, and practitioner.

APPENDIX A
CRANIOFACIAL-PAIN PATIENT DATA VECTOR

Referral Through
001 Medical GP
002 Medical Specialist
003 Dental GP
004 Dental Specialist

Sex
005 Male
006 Female
007 Female, menopausal or post-menopausal

Age Group
008 0 - 19
009 20 - 39
010 40 - 55
011 56 up

Duration of Pain
012 Less than 3 weeks
013 From 3 to 6 weeks
014 More than 6 weeks
015 Episodic

Character of Pain
016 Aching
017 Burning
018 Cutting
019 Discomfort
020 Dull
021 Pressure
022 Pricking
023 Sharp
024 Soreness
025 Stinging
026 Tenderness
027 Throbbing

Change in Character of Pain
028 Constantly getting worse
029 Got worse, then plateaued
030 Got worse, plateaued, then better
031 Getting better
032 Intermittent periods without pain
033 No change since beginning

List of Drugs Taken
034 Mild Analgesics: Aspirin, APC, etc.
035 Moderate Analgesics (non-narcotic)
036 Strong Analgesics: Narcotics and Synthetic Narcotics
037 Anti-anxiety Agents: Mellaril, etc.
038 Anti-arthritic Agents: Steroids, etc.
039 Anti-depressives: Tofranil, etc.
040 Birth Control Pills
041 Hormone Preparations
042 Anti-inflammatory Agents
043 Muscle Relaxants: Valium
044 Muscle Relaxants: Meprobamate
045 Muscle Relaxants: Others
046 Sedatives: Barbiturates, etc.
047 Other Drugs

History of Trauma
048 Accidental
049 Factitial
050 Surgical

Location of Swelling
Side

Location of Tenderness
Left Side
Right Side

Location of Pain
Side

Limited Jaw Opening
243 Yes

Joint Sounds
244 Clicking
245 Crepitation
246 Pain accompanying joint sound

Headaches
247 Frequent headaches
248 Headache associated with joint pain

Changes in
249 Taste
250 Hearing
251 Visual acuity
252 Perception of light touch on face

Upper Respiratory Infection
253 In conjunction with beginning of TMJ pain

Evidence of
254 Arthritis
255 Every's Syndrome
256 Neuropathy
257 Otitis
258 Salivary gland disease
259 Sinusitis
260 Strokes
261 Vascular disease

Facets
262 1-3
263 4 up

Lateral Slide Prematurities
264 On working side
265 On balancing side

Tooth Ache
266 Yes

Biting Stress Tooth Mobility
267 Yes

Recent Restorative or Dental Prosthesis
268 Yes

Jaw Deviates on Opening
269 Left
270 Right

Impingement of Coronoid Process on Zygomatic Arch
271 Left
272 Right

Meniscus-Condyle Dyscoordination
273 Left
274 Right

Radiographic Examination
275 Mandibular condyle apposition (such as spur formation)
276 Mandibular condyle resorption (such as flattening of anterior-superior surface or irregular surface)
277 Fossa apposition
278 Fossa resorption
279 Articular eminence apposition
280 Articular eminence resorption
281 Evidence of fracture
282 Clinical or radiographic evidence of pathoses

Emotional Trauma
283 Anxiety
284 Depression

Bruxism or Clenching
285 Yes

Uneven Centric Stops
286 Yes

History of Lengthy Dental Procedures
287 Yes

History of General Anesthesia
288 Yes

Tinnitus
289 Yes

Extraction of Teeth
290 Less than 6 weeks prior to TMJ pain
291 Leaving a space that permits extrusion

Preauricular Pain
292 Yes

Alteration of Inter-Occlusal or Inter-Arch Space
293 Yes

Paresthesia
294 Yes

Luxation or Subluxation
295 Yes

APPENDIX B
MODIFIED FIXED-INCREMENT TRAINING ALGORITHM

In presenting the modified fixed-increment training algorithm, the following notation is employed:

p = the number of
classification categories

t = the number of training-sample row vectors

\(a_j^{(k)}\) = training-sample row vector number 'k' preclassified in category 'j', \(j = 1, 2, \ldots, p\), \(k = 1, 2, \ldots, t\), and \(k = i \ [\mathrm{mod}\ t]\), where 'i' is the index of the training-algorithm iteration

\(W_j^{(i)}\) = the column of weights (the constants in the 'j-th' discriminant function) used in the 'i-th' iteration of the training algorithm, \(j = 1, 2, \ldots, p\)

\(\alpha\) = non-negative constant specified by the analyst to adjust the size of the 'dead zone' [23] in discriminant-function values, i.e., \(\alpha \ge 0\)

\(\beta\) = positive constant specified by the analyst to adjust the scale of the weight vectors, i.e., \(\beta > 0\).

Using this notation, let \(a_j^{(k)}\) be the i-th pattern examined by the algorithm; then

case 1: if \(a_j^{(k)} W_j^{(i)} > a_j^{(k)} W_c^{(i)} + \alpha\) for all \(c \ne j\), let

\[ W_c^{(i+1)} = W_c^{(i)} \quad \text{for all } c . \]

case 2: if \(a_j^{(k)} W_j^{(i)} \le a_j^{(k)} W_z^{(i)} + \alpha\) for a subset B of the p discriminants, \(z \in B\), \(j \notin B\), let

\[ W_j^{(i+1)} = W_j^{(i)} + \beta \, n_B \, [a_j^{(k)}]^T , \]
\[ W_z^{(i+1)} = W_z^{(i)} - \beta \, [a_j^{(k)}]^T , \quad z \in B , \]
\[ W_c^{(i+1)} = W_c^{(i)} \quad \text{for all } c \notin \{B \cup j\} , \]

where \(n_B\) = the number of discriminants in the subset B.

The algorithm is terminated when the values of the \(W_j\), \(j = 1, 2, \ldots, p\), have not changed during a complete cycle of the t training patterns, i.e., when the \(W_j\) remain equal, for all j, over the t iterations that follow \(\theta\), where \(\theta\) is the last case 2 pattern examined by the algorithm.

This algorithm is guaranteed to terminate in a set of feasible \(W_j^*\), \(j = 1, 2, \ldots, p\), if the training sample is linearly separable and \(\alpha\) and \(\beta\) have been appropriately selected. If the training sample is linearly separable, the algorithm will converge for any fixed value of \(\alpha > 0\), where \(\beta\) is selected appropriately large. Hence, the algorithm is normally applied to a training sample with \(\alpha = 0\) and \(\beta = 1\). If the algorithm converges, these constants can be adjusted and the training algorithm reapplied.
The justification for specifying a non-zero \(\alpha\) (\(\alpha\) = size of the dead zone) is that as \(\alpha\) is increased the accuracy of the classifier is increased in making classifications of data not used in developing the discriminant-function weights. For example, with the craniofacial-pain diagnostic classifier and the test samples discussed in Section 3.3, the diagnostic model correctly classified approximately 5% more of the test samples' data vectors when the model was trained with \(\alpha = 30\), \(\beta = 3\) (versus an original training with \(\alpha = 0\), \(\beta = 1\)).

Proof that the algorithm converges if feasible weight vectors \(W_j^*\), \(j = 1, 2, \ldots, p\), exist (that is, the sample space is linearly separable) is developed in Nilsson [22]. Nilsson's proof can be directly applied since for any set of feasible \(W_j^*\)

\[ a_j^{(k)} W_j^* > a_j^{(k)} W_z^* + \alpha \]

for all \(k = 1, 2, \ldots, t\), and \(z = 1, 2, \ldots, p\), \(z \ne j\), while for any \(j = 1, 2, \ldots, p\), \(i = 1, 2, \ldots, \theta\),

\[ a_j^{(k)} W_j^{(i)} \le a_j^{(k)} W_z^{(i)} + \alpha \]

for some k and some z.

Typically, a training algorithm is applied to the members of a training sample without prior knowledge of whether the sample pattern space is linearly separable. The algorithm is allowed to process sample patterns until it either converges on a set of discriminating hyperplanes or it has run for a 'reasonable' amount of time without termination. Experience with medical data and the modified fixed-increment algorithm has shown that if there is a set of discriminating hyperplanes, the algorithm will find it in no more than 3 complete cycles for each of the pattern classes. For example, if there are 5 pattern classes and the pattern space can be linearly partitioned, the algorithm should terminate in no more than 15 full cycles through the training data. This rough measure of training time provides an index for establishing a limit on computer processing time.

An application of the modified fixed-increment training algorithm is presented in Figure 7.
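The update rule of this appendix can be sketched in Python. The sketch assumes the following reading of case 2: every discriminant z that does not fall below the correct discriminant by more than α is decremented by β times the pattern, and the correct discriminant is incremented by β n_B times the pattern. The function name, the zero starting weights, and the cycle limit are illustrative choices. Run with α = 0 and β = 1 on the three-pattern sample of Figure 7, it ends with the weights reported there.

```python
def train_fixed_increment(samples, n_classes, alpha=0, beta=1, max_cycles=100):
    """samples: list of (augmented pattern, class index) pairs.

    Returns one weight list per class, or None if no full cycle without
    a weight change occurs within max_cycles (assumed stopping limit).
    """
    dim = len(samples[0][0])
    weights = [[0] * dim for _ in range(n_classes)]
    for _ in range(max_cycles):
        changed = False
        for pattern, j in samples:
            scores = [sum(w * a for w, a in zip(W, pattern)) for W in weights]
            # Subset B: discriminants not beaten by more than alpha.
            B = [z for z in range(n_classes)
                 if z != j and scores[j] <= scores[z] + alpha]
            if not B:
                continue  # case 1: weights unchanged
            changed = True  # case 2: adjust the correct class and subset B
            n_B = len(B)
            weights[j] = [w + beta * n_B * a
                          for w, a in zip(weights[j], pattern)]
            for z in B:
                weights[z] = [w - beta * a
                              for w, a in zip(weights[z], pattern)]
        if not changed:
            return weights  # a complete cycle with no change
    return None


# Training sample of Figure 7; classes 1, 2, 3 are indexed 0, 1, 2 here.
figure7_sample = [([0, 0, 1], 0), ([1, 0, 1], 1), ([0, 1, 1], 2)]
figure7_weights = train_fixed_increment(figure7_sample, n_classes=3)
print(figure7_weights)  # [[-2, -1, 1], [3, -1, 0], [-1, 2, -1]]
```

The cycle limit plays the role of the 'reasonable' amount of processing time mentioned above: when the sample is not linearly separable the loop would otherwise never stop.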
Given the training sample of the form \(a = [a_1, a_2, 1]\), where

\(a_1^{(1)} = [0, 0, 1]\)
\(a_2^{(2)} = [1, 0, 1]\)
\(a_3^{(3)} = [0, 1, 1]\)

the training-sample patterns can be represented in 3-dimensional space by

[diagram of the three training patterns]

The modified fixed-increment algorithm with \(\alpha = 0\) and \(\beta = 1\) proceeds as follows (* indicates correct sample classification):

Sample      W1           W2           W3           aW1   aW2   aW3
 [0,0,1]    [ 0, 0, 0]   [ 0, 0, 0]   [ 0, 0, 0]    0     0     0
 [1,0,1]    [ 0, 0, 2]   [ 0, 0,-1]   [ 0, 0,-1]    2    -1    -1
 [0,1,1]    [-1, 0, 1]   [ 2, 0, 1]   [-1, 0,-2]    1     1    -2
 [0,0,1]    [-1,-1, 0]   [ 2,-1, 0]   [-1, 2, 0]    0     0     0
 [1,0,1]    [-1,-1, 2]   [ 2,-1,-1]   [-1, 2,-1]    1     1    -2
*[0,1,1]    [-2,-1, 1]   [ 3,-1, 0]   [-1, 2,-1]    0    -1     1
*[0,0,1]    [-2,-1, 1]   [ 3,-1, 0]   [-1, 2,-1]    1     0    -1
*[1,0,1]    [-2,-1, 1]   [ 3,-1, 0]   [-1, 2,-1]   -1     3    -2

Hence, the set of weights generated by this training sample is

W1 = [-2,-1, 1]
W2 = [ 3,-1, 0]
W3 = [-1, 2,-1].

FIGURE 7
APPLICATION OF THE MODIFIED FIXED-INCREMENT ALGORITHM

APPENDIX C
APPLICATION OF THE MINIMUM-COST SYMPTOM-SELECTION ALGORITHM

Given three pattern classes X, Y, and Z, with patterns of the form \(a_j^{(k)} = [a_{j1}, a_{j2}, a_{j3}, 1]\), where

\(a_X^{(1)} = [0\ 1\ 0\ 1]\)    \(a_Y^{(1)} = [0\ 0\ 1\ 1]\)    \(a_Z^{(1)} = [0\ 0\ 0\ 1]\)
\(a_X^{(2)} = [0\ 1\ 1\ 1]\)    \(a_Y^{(2)} = [1\ 0\ 1\ 1]\)    \(a_Z^{(2)} = [1\ 0\ 0\ 1]\)

these patterns can be represented in three-dimensional space (without their constant = 1 components) by

[diagram of the six patterns; axes: feature 1, feature 2, feature 3]

One set of feasible linear-classifier discriminant-function weights for these patterns is

\(W_X = [-2\ \ 3\ \ 0\ -1]^T\)
\(W_Y = [\ 1\ -2\ \ 2\ \ 0]^T\)
\(W_Z = [\ 1\ -1\ -2\ \ 1]^T\).

Suppose feature 1 costs 2 units to employ in the classifier, feature 2 costs 6 units to employ in the classifier, and feature 3 costs 3 units to employ in the classifier. Then, for the minimum-cost symptom-selection algorithm (Section 3.4),

\[ A_1 = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}, \]

and C = [2 6 3 0].

This feature-selection algorithm will be employed to find the minimum-cost collection of classifying features.
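As a numerical check on the example data (not an implementation of the symptom-selection algorithm itself), the following Python sketch applies the stated weights to the six patterns and evaluates the cost objective V for a candidate feature selection. It assumes the class membership X = {[0 1 0 1], [0 1 1 1]}, Y = {[0 0 1 1], [1 0 1 1]}, Z = {[0 0 0 1], [1 0 0 1]}; the helper names are hypothetical. Dropping a feature is modeled by zeroing its weight, so a True result shows that a selection is feasible, whereas a False result only shows that these particular weights fail for that selection.

```python
PATTERNS = {
    "X": [[0, 1, 0, 1], [0, 1, 1, 1]],
    "Y": [[0, 0, 1, 1], [1, 0, 1, 1]],
    "Z": [[0, 0, 0, 1], [1, 0, 0, 1]],
}
WEIGHTS = {
    "X": [-2, 3, 0, -1],
    "Y": [1, -2, 2, 0],
    "Z": [1, -1, -2, 1],
}
COSTS = [2, 6, 3]  # feature-utilization costs of features 1, 2, 3


def classifies_all(selected):
    """selected: 0/1 flags for features 1, 2, 3 (constant term always kept).

    True if the stated weights, with unselected features zeroed out,
    give every pattern a strictly largest score for its own class.
    """
    mask = list(selected) + [1]
    for label, rows in PATTERNS.items():
        for row in rows:
            scores = {
                c: sum(m * w * a for m, w, a in zip(mask, W, row))
                for c, W in WEIGHTS.items()
            }
            if any(scores[label] <= s for c, s in scores.items() if c != label):
                return False
    return True


def value(selected):
    """Objective V: the negative of the total cost of the selected features."""
    return -sum(c * x for c, x in zip(COSTS, selected))


for selection in [(1, 1, 1), (0, 1, 1), (1, 0, 1)]:
    print(selection, classifies_all(selection), value(selection))
```

Note that the flags here are given in feature order (1, 2, 3), not in the assignment order used when stepping through the algorithm.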
For the purposes of illustration, the feature variables \(x_i\), \(i = 1, 2, 3\), are selected in the order \(x_2, x_3, x_1\) in Step 4 of the algorithm (Section 3.4.2). Note that this is a logical ordering of features in descending order of feature-utilization 'costs.' This prior specification of the order of variable assignments permits the construction of a tree that represents the possible solutions remaining to be considered at each step of the algorithm. This tree of possible solutions to P1 has the form

[tree diagram of possible solutions]

Step 0: The algorithm is initialized with \(V^* = -\infty\). Go to Step 4 of the algorithm.

Step 4: Select the variable \(x_2\) for assignment. The assignment vector is now \([x_2 = 0]\). Apply Procedure 2 (Figure 4). In Procedure 2, zeroing out the column k=2 from \(A_1\), \(A_2\), and \(A_3\) yields

\[ A_1^T = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \quad A_2^T = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad A_3^T = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}. \]

Application of Procedure 1 (Section 3.4.1) to \(A_1^T\) and \(A_2^T\) yields the following reduced matrices: [ 1 ] and [ 0 ], where [ 0 ] is the null matrix. By Step 6 of Procedure 1, feasible convex combinations of these matrices exist. Hence the assignment vector \([x_2 = 0]\) is infeasible by the rules of Procedure 2. Go to Step 6 of the algorithm.

Step 6: The assignment vector is now \([x_2 = 1]\). As this vector does not include an assignment for every variable, return to Step 4 of the algorithm. The tree of possible solutions to P1 now has the form

[tree diagram of possible solutions]

Step 4: Select the variable \(x_3\) for assignment. The assignment vector is now \([x_2 = 1, x_3 = 0]\). Apply Procedure 2. In Procedure 2, zeroing out the column k=3 from \(A_1\), \(A_2\), and \(A_3\) yields

\[ A_1^T = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}, \quad A_2^T = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}, \quad A_3^T = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}. \]

Application of Procedure 1 to \(A_1^T\) and \(A_2^T\) yields a pair of reduced matrices (entries not legible). By Step 3 of Procedure 1, no feasible convex combination of these matrices exists.

Application of Procedure 1 to \(A_1^T\) and \(A_3^T\) yields the following reduced matrices: [1 1] and [0].
By Step 3 of Procedure 1, no feasible convex combination of these matrices exists.

Application of Procedure 1 to \(A_2^T\) and \(A_3^T\) yields the following reduced matrices: [0 1] and [0 1]. For these reduced matrices, P4 is a maximization problem in unrestricted variables (constraints not legible). P4 has the bounded optimal solution in which both multipliers equal 0. Hence the assignment vector \([x_2 = 1, x_3 = 0]\) is infeasible by the rules of Procedure 2. Go to Step 6 of the algorithm.

Step 6: The assignment vector is now \([x_2 = 1, x_3 = 1]\). As this vector does not include an assignment for every variable, return to Step 4 of the algorithm. The tree of possible solutions to P1 now has the form

[tree diagram of possible solutions]

Step 4: Select the variable \(x_1\) for assignment. The assignment vector is now \([x_2 = 1, x_3 = 1, x_1 = 0]\). Apply Procedure 2. In Procedure 2, zeroing out the column k=1 from \(A_1\), \(A_2\), and \(A_3\) yields

\[ A_1^T = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \quad A_2^T = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad A_3^T = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}. \]

Application of Procedure 1 to \(A_1^T\) and \(A_2^T\) yields the following reduced matrices: [ 1 ] and [0 0]. By Step 3 of Procedure 1, no feasible convex combination of these matrices exists.

Application of Procedure 1 to \(A_1^T\) and \(A_3^T\) yields the following reduced matrices: [ 1 ] and [0 0]. By Step 3 of Procedure 1, no feasible convex combination of these matrices exists.

Application of Procedure 1 to \(A_2^T\) and \(A_3^T\) yields the following reduced matrices: [1 1] and [0 0]. By Step 3 of Procedure 1, no feasible convex combination of these matrices exists. Hence the assignment vector \([x_2 = 1, x_3 = 1, x_1 = 0]\) is feasible by the rules of Procedure 2. Go to Step 5 of the algorithm.

Step 5: The assignment vector \([x_2 = 1, x_3 = 1, x_1 = 0]\) includes an assignment for every variable; go to Step 7 of the algorithm.

Step 7: The value V is calculated for this assignment vector, where

V = -1[1(6) + 1(3) + 0(2)] = -9.

As \(V^* = -\infty\), go to Step 8 of the algorithm.
Step 8: \(V^*\) is set equal to -9, and the values of the variables \(x_1 = 0\), \(x_2 = 1\), and \(x_3 = 1\) in this assignment vector are stored for future reference. Go to Step 1 of the algorithm.

Steps 2 and 3 of the algorithm dictate that the algorithm is terminated at this point, since these steps generate the assignment vector \([x_2 = 1, x_3 = 1, x_1 = 1]\), which is known to be feasible to P1. V for this assignment vector is -11, which is smaller than \(V^*\). Hence the minimum-cost collection of classifying features is feature 2 and feature 3, with a cost of 9 units associated with utilizing these features in a linear pattern classifier.

APPENDIX D
TREATMENT ALTERNATIVES FOR CRANIOFACIAL-PAIN PATIENTS

Treatment
Application
Number        Treatments
11            Chill Therapy
12            Drug Therapy
13            Fixation
14            Heat Therapy
15            Occlusal Adjustment
16            Physical Therapy
17            Prosthetics
18            Tooth Extraction or Endodontics
21            Drug Therapy and Fixation
22            Drug Therapy and Heat Therapy
23            Drug Therapy and Occlusal Adjustment
24            Drug Therapy and Physical Therapy
25            Drug Therapy and Prosthetics
26            Heat Therapy and Physical Therapy
27            Occlusal Adjustment and Physical Therapy
28            Physical Therapy and Prosthetics
31            Chill Therapy, Drug Therapy, and Physical Therapy
32            Drug Therapy, Fixation, and Heat Therapy
33            Drug Therapy, Fixation, and Physical Therapy
34            Drug Therapy, Heat Therapy, and Physical Therapy
35            Drug Therapy, Occlusal Adjustment, and Physical Therapy
36            Fixation, Heat Therapy, and Physical Therapy
41            Drug Therapy, Fixation, Heat Therapy, and Physical Therapy

APPENDIX E
STABILITY OF TRANSITION-PROBABILITY ESTIMATES

This appendix presents the technique employed to determine the stability of transition-probability estimates of treatment effectiveness for patients who occupy the same patient state, but who exhibit varying combinations of relevant data-vector elements.
The analysis shown here is limited to one treatment alternative, treatment 24, and one diagnostic classification, Diagnostic Alternative 13, but a similar investigation was performed for a majority of the other diagnostic classifications and treatment alternatives. Variability of the data-based estimates (Section 4.1.2) of transition probabilities is analyzed in terms of five factors: patient's sex, patient's age, duration of patient's pain, nature of patient's pain (continuous or episodic), and number of replications of the same treatment alternative.

Statistically, '2-way' contingency tables [29] measure the effects of these relevant factors. The rows of each table specify the number of transitions out of Diagnostic Alternative 13 following an application of treatment 24, and each table's columns specify a value for the factor being analyzed. A chi-squared statistic \(\chi^2\) is employed to test for independence between the number of transitions and the factor in question.

Analysis by Sex

Transitions                Pre-menopausal   Menopausal or
into              Male     Female           Post-menopausal Female
8                 0        1                0
13                1        11               2
15                0        1                0
Well              2        11               1

\(\chi^2\) = 1.250 with 6 degrees of freedom

Hence, the analysis reveals that the sex of the patient is not significant in determining estimates of transition probabilities out of Diagnostic Alternative 13 following application of treatment 24, as \(\chi^2_{.05,6} = 12.592\).

Analysis by Age Group

Transitions       20 - 39     40 - 55
into              Years       Years
8                 0           1
13                8           6
15                0           1
Well              7           7

\(\chi^2\) = 2.286 with 3 degrees of freedom

Hence, the analysis reveals that age group of the patient is not significant in determining estimates of transition probabilities out of Diagnostic Alternative 13 following application of treatment 24, as \(\chi^2_{.05,3} = 7.815\).

Analysis by Duration of Pain

Transitions       Less than     From 3 to     More than
into              3 weeks       6 weeks       6 weeks
8
13
15
Well
X* = 5.047 with 6 degrees of freedon Hence, the analysis reveals that duration of pain is not significant in determining estimates of transition probabilities out of Diagnostic Al- o temative 13 following application of treatment 24, as x nt- =12.592. Uj f D Analysis by Nature of Pain Continuous Episodic Transition 8 into 13 15 Well 2 X = 3.964 with 3 degrees of freedom Hence, the analysis reveals that nature of pain is not significant in determining estimates of transition probabilities out of Diagnostic Al- 2 temative 13 following application of treatment 24, as x Ar -3=7.815. UD f 0 1 8 6 1 0 11 3 102 Analysis by Number of Replications of Treatment 24 Is*" Application 2n<^ implication 3^ Application Transitions 8 into 1 0 0 13 9 5 0 15 1 0 0 Well 6 3 5 2 X = 8.099 with 6 degrees of freedom Hence, the analysis reveals that the number of replications is not signif- cant in determining estimates of transition probabilities out of Diag nostic Alternative 13 following application of treatment 24, as Thus, for Diagnostic Alternative 13, the five factors of patient variation do not affect transition-probability estimates of the effective ness of treatment 24. The last type of analysis performed, analysis by number of treatment replications, established treatment-application boundary numbers. If the analysis revealed no significant effect for differences in treatment repetitions, then the boundary number for the treatment was set at zero or one by the reviewing practitioner. Note, that a zero boundary-applica tion number for a treatment alternative implies that a record of that treatment provides no additional information about the patient's progres sion through the care system, and, therefore, the treatment-planning model does not add a record of the treatments to its patient-state descriptions. 
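The independence test used in this appendix can be sketched in a few lines. The following is a minimal illustration, not the dissertation's own program: the function name `chi_squared_independence` and the dictionary `CRITICAL_05` are hypothetical, and the counts are those of the analysis by number of replications of treatment 24. The two critical values are the 5% points quoted in this appendix.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Rows: transitions into 8, 13, 15, Well.
# Columns: 1st, 2nd, 3rd application of treatment 24.
replications = [
    [1, 0, 0],
    [9, 5, 0],
    [1, 0, 0],
    [6, 3, 5],
]

# 5% critical values quoted in this appendix, keyed by degrees of freedom.
CRITICAL_05 = {3: 7.815, 6: 12.592}

stat, dof = chi_squared_independence(replications)
significant = stat > CRITICAL_05[dof]
print(f"chi-squared = {stat:.3f} with {dof} degrees of freedom; "
      f"significant at 5%: {significant}")
```

With these counts the statistic is about 8.099 on 6 degrees of freedom, which is below 12.592, so the factor is judged not significant. Many expected counts in these tables are well below five, a situation in which the chi-squared approximation is usually treated with caution.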
If the analysis revealed a significant effect for treatment repetitions, the reviewing practitioners examined the data-based estimates of transition probabilities associated with multiple repetitions of the treatment and established a boundary application number for the treatment that reflected their knowledge about the treatment's effectiveness as well as the information supplied by the data.

APPENDIX F
FLOW CHARTS OF PATIENT-STATE TRANSITIONS

[Flow charts not reproduced.]

APPENDIX G
PATIENT-STATE TREATMENT SELECTIONS

Patient State    Model Selection    Practitioner Selection
1                16                 16
1|16             16                 16
2                24                 24
2|24             24                 24
2|24,24          24                 Refer*+
3                23                 23*
3|12             Refer              12+
3|23             41                 41
3|24             24                 24
3|32             32                 32
3|23,41          Refer              Refer
3|32,32          Refer              Refer
4                35                 35*
4|20             Refer              Refer
4|24             24                 24
4|34             34                 34
4|35             24                 24
4|20,20          Refer              Refer
4|24,24          24                 24
4|24,35          24                 24
4|34,34          34                 34
5                35                 35
5|16             Refer              17+
5|24             17                 17
5|35             24                 24
5|36             24                 24
5|24,35          24                 24
6                24                 24
7                33                 33
7|12             24                 24
7|18             12                 12
7|33             24                 24
7|12,24          16                 16
8                12                 12
8|12             12                 12*
8|18             12                 12
8|24             18                 18
8|12,12          16                 16
8|18,24          24                 24
8|34,41          12                 12
9                12                 12
9|12             12                 Refer+
10               23                 23
10|23            12                 12
10|12,23         12                 12
11               12                 12*
11|12            12                 12*
12               15                 15
12|15            15                 15
12|23            Refer              Refer
12|35            24                 24
12|15,31         Refer              17*+
12|24,35         24                 24
13               12                 12
13|12            12                 12
13|18            24                 24
13|22            34                 34*
13|23            12                 12
13|24            24                 24
13|12,12         23                 23
14               23                 23
14|12            12                 12*
14|23            12                 12
14|24            24                 24
14|26            26                 26
14|35            35                 35
14|41            24                 24
14|12,12         Refer              Refer*
14|12,23         12                 12*
14|24,24         22                 Refer+
14|24,35         24                 24
15               12                 12
15|12            15                 15
15|20            20                 20
15|22            34                 24
15|23            23                 23
15|24            12                 12
15|27            16                 16
15|34            34                 34
15|35            24                 24
15|41            34                 34
15|12,12         12                 12
15|12,23         12                 12
15|16,27         16                 16
15|20,20         20                 Refer*+
15|22,22         12                 12
15|22,34         34                 34*
15|23,23         23                 23
15|24,24         24                 24
15|34,34         34                 34
15|12,22,22      12                 12
15|22,34,34      34                 34*
16               25                 25
17               Refer              Refer

Note: * indicates that the reviewing practitioners made their selection of treatment from a set of alternatives that did not include the 'most appropriate' treatment alternative (Section 4.3).
+ indicates a difference between the treatment selections made by the treatment-planning model and the reviewing practitioners.

The model-generated treatment selections agree with the reviewing practitioners' selections in 87 out of 94 patient states. This represents a 92.6% agreement between the two sources of treatment selections.

In terms of the craniofacial-pain treatment-planning model's treatment selections, the patient-referral costs are the most significant of the model's components. For each of the model's patient states, the cost of referral out of that state is used in making the decision whether to continue treatment for a patient, or whether to suggest that he go to another source of care. If this cost is set too low, then patients who should be treated in the craniofacial-pain care system are inappropriately referred out of the system. On the other hand, too high a referral cost leads the model to suggest that patients remain in this care system when it would be to their advantage to seek care elsewhere. For these reasons, this cost was the subject of considerable examination in the building of the treatment-planning model.

The reviewing practitioners suggested three possible alternative formats for the cost of referring a patient out of the craniofacial-pain care system.
These were:

Format 1: Referral Cost = [record transferral cost] + [practitioner's lost fee] + 2*[inconvenience cost associated with a dental visit]

Format 2: Referral Cost = [fee paid to referral care system] + 2*[inconvenience cost associated with a dental visit]

Format 3: Referral Cost = [fee paid to referral care system] + [practitioner's lost fee] + [record transferral cost] + 2*[inconvenience cost associated with a dental visit]

where in all three formats the multiple of the inconvenience cost was suggested by the fact that in the clinical records (Section 3.1) the median number of visits to the referral care system was two visits. The treatment-planning model was optimized with referral costs based on each of these formats. Use of the Format 3 referral costs led to model-generated treatment selections that most closely duplicated the selections of the reviewing practitioners. Hence, this format for patient-referral costs has been selected for utilization in the treatment-planning model.

APPENDIX H
APPLICATION OF THE PATIENT-STATE-LABELING AND OPTIMAL-TREATMENT-SELECTION PROCEDURE

Consider a health-care system with two diagnostic classifications, 'I' and 'J.' In this system treatment $T_1$ or treatment $T_2$ can be given to a patient in either diagnostic classification. Treatment $T_1$ has a boundary-level application number (Section 4.1.1) of one, and treatment $T_2$ may be given only once during a patient's stay in the care system. Figure 8 presents a pictorial representation of this system. Using the labeling procedure of Section 4.2, the patient states in this system are numbered as follows:

1. W
2. R
3. I|T1T2
4. J|T1T2
5. I|T2
6. J|T2
7. I|T1
8. J|T1
9. I
10. J

With this labeling of the system's patient states, analysis to determine the optimal treatment decisions '$k_i$' for each patient-state 'i' proceeds as follows:

Two treatments $T_1$ and $T_2$ are available for patients classified in Alternative I and Alternative J.
$T_1$ has a boundary-level application number of one application and $T_2$ may be given only once during the patient's stay in the care system. Note that this figure omits the transition arcs between the diagnostic-classification-based patient states and the terminal states well and referred.

FIGURE 8
MULTIPLE-STATE HISTORY-AUGMENTED PROCESS

$v_1 = 0$, $v_2 = 0$;

$v_3$ and $v_4$: find the $k_i$ that maximize
$v_i = r_i^{k_i} + \sum_{j=3}^{4} p_{ij}^{k_i} v_j$, $i = 3, 4$;

$v_5 = \max_{k_5} \left[ r_5^{k_5} + \sum_{j=3}^{4} p_{5j}^{k_5} v_j \right]$;

$v_6 = \max_{k_6} \left[ r_6^{k_6} + \sum_{j=3}^{4} p_{6j}^{k_6} v_j \right]$;

$v_7$ and $v_8$: find the $k_i$ that maximize
$v_i = r_i^{k_i} + \sum_{j=7}^{8} p_{ij}^{k_i} v_j + \sum_{j=3}^{6} p_{ij}^{k_i} v_j$, $i = 7, 8$;

$v_9 = \max_{k_9} \left[ \dfrac{r_9^{k_9} + \sum_{j=3}^{8} p_{9j}^{k_9} v_j}{1 - p_{99}^{k_9}} \right]$;

$v_{10} = \max_{k_{10}} \left[ \dfrac{r_{10}^{k_{10}} + \sum_{j=3}^{8} p_{10,j}^{k_{10}} v_j}{1 - p_{10,10}^{k_{10}}} \right]$;

where $r_i^{k_i}$ is defined as a sum over $j = 1, \ldots, 10$ [summand illegible in source].

BIBLIOGRAPHY

[1] U.S. Department of Health, Education, and Welfare, Cumulated Index Medicus, Washington, D.C.: U.S. Government Printing Office (1970-1973).
[2] Ahevne, P., Ryan, G.A., and Walsh, R.J., 1972 Reference Data on the Profile of Medical Practice, Chicago: Center for Health Services Research and Development, American Medical Association (1972).
[3] Bureau of Economic Research and Statistics, "1971 Survey of Dental Practice," Journal of the American Dental Association, Vol. 85 (1972) 154-158.
[4] Bruce, R.A., and Yrdall, S.R., "Computer-Aided Diagnosis of Cardiovascular Disorders," Journal of Chronic Diseases, Vol. 19 (1966) 473-484.
[5] Schwartz, L., and Chayes, C.M., Facial Pain and Mandibular Dysfunction, Philadelphia: W. B. Saunders (1968).
[6] TMJ Research Center, Conference on Function and Dysfunction of the Temporomandibular Joint Complex, Chicago: University of Illinois (1969).
[7] Mitchell, D.F., The Dental Clinics of North America, Symposium on Oral Medicine, Philadelphia: W. B. Saunders (1968).
[8] Mitchell, D.F., Standish, S.M., and Fast, T.B., Oral Diagnosis/Oral Medicine, 2nd Edition, Philadelphia: Lea & Febiger (1971).
[9] Ledley, R.S., "Practical Problems in the Use of Computers in Medical Diagnosis," Proceedings of the IEEE, Vol. 57, No. 11 (1969) 1900-1918.
[10] Lincoln, T.L., and Parker, R.D., "Medical Diagnosis Using Bayes Theorem," Health Services Research, Vol. 2, No. 1 (1967) 34-35.
[11] Bunch, W.H., and Andrew, G.M., "Use of Decision Theory in Treatment Selection," Clinical Orthopaedics and Related Research, No. 80 (1971) 39-52.
[12] Boyle, J.A., Greig, W.R., Franklin, D.A., Harden, R.McG., Buchanan, W.W., and McGirr, E.M., "Construction of a Model for Computer-Assisted Diagnosis: Application to the Problem of Nontoxic Goiter," Quarterly Journal of Medicine, Vol. 35 (1965) 565-588.
[13] Lodwick, G.S., Haun, C.L., Smith, W.E., Keller, R.F., and Robertson, E.B., "Computer Diagnosis of Primary Bone Tumors: A Preliminary Report," Radiology, Vol. 80 (1963) 273-275.
[14] Overall, J.E., and Williams, C.M., "Conditional Probability Program for Diagnosis of Thyroid Function," Journal of the American Medical Association, Vol. 183, No. 5 (1963) 307-313.
[15] Toronto, A.F., Veasy, L.G., and Warner, H.R., "Evaluation of a Computer Program for Diagnosis of Congenital Heart Disease," Progress in Cardiovascular Diseases, Vol. 5, No. 4 (1963) 362-377.
[16] Wilson, W.J., Templeton, A.W., Turner, W.H., and Lodwick, G.S., "The Computer Analysis and Diagnosis of Gastric Ulcers," Radiology, Vol. 85 (1965) 1064-1073.
[17] Burbank, F., "A Computer Diagnostic System for the Diagnosis of Prolonged Undifferentiated Liver Disease," American Journal of Medicine, Vol. 46 (1969) 401-413.
[18] Collen, M.F., Rubin, L., Neyman, J., Dantzig, G.B., and Siegelaub, A.B., "Automated Multiphasic Screening and Diagnosis," American Journal of Public Health, Vol. 54 (1964) 641-750.
[19] Lipkin, M., Engle, R.L., David, B.J., Zworykin, V.K., Ebald, R., Sendrow, M., and Berkley, C., "Digital Computers as an Aid to Differential Diagnosis," Archives of Internal Medicine, Vol. 108 (1961) 56-72.
[20] Overall, J.E., and Williams, C.M., "Comparison of Alternative Computer Models for Thyroid Diagnosis," San Diego Symposium on Biomedical Engineering, Vol. 3 (1963).
[21] Betague, N.E., and Gorry, A., "Automated Judgemental Decision-Making for a Serious Medical Problem," Management Science, Vol. 17, No. 1 (1971) B421-B434.
[22] Ledley, R.S., "Computer Aids to Clinical Treatment Evaluation," Operations Research, Vol. 15 (1967) 694-705.
[23] Meisel, W.S., Computer-Oriented Approaches to Pattern Recognition, New York: Academic Press (1972).
[24] Nilsson, N.J., Learning Machines, New York: McGraw-Hill (1965).
[25] Howard, R.A., Dynamic Probabilistic Systems, Vol. II: Semi-Markov and Decision Processes, New York: Wiley (1971).
[26] Rosen, J.B., "Pattern Separation by Convex Programming," Journal of Mathematics and Application, Vol. 10 (1965) 123-134.
[27] Nelson, G.E., and Levy, D.M., "Selection of Pattern Features by Mathematical Programming Algorithms," IEEE Transactions on Systems Science and Cybernetics, Vol. SSC-6 (1970) 20-25.
[28] Balas, E., "An Additive Algorithm for Solving Linear Programs with Zero-One Variables," Operations Research, Vol. 13 (1965) 517-547.
[29] Freund, J.E., Mathematical Statistics, Englewood Cliffs, N.J.: Prentice-Hall (1962).

BIOGRAPHICAL SKETCH

Michael Steven Leonard was born February 2, 1947, in Salisbury, North Carolina. In June, 1965, he was graduated cum laude from Cocoa High School in Rockledge, Florida. He received the degree of Bachelor of Industrial Engineering with High Honors from the University of Florida in June, 1970. In September, 1970, he began graduate work in engineering at the University of Florida. He received the degree of Master of Engineering in March, 1972. In June, 1972, he was designated a Distinguished Military Graduate of the Air Force Reserve Officer Training Corps. From September, 1970, until the present, his graduate training has been supported by a National Science Foundation traineeship.

Michael Leonard is married to the former Mary Elizabeth Stewart of Cocoa, Florida.
He holds the reserve commission of Second Lieutenant in the United States Air Force. He is a member of Lambda Chi Alpha fraternity; Alpha Pi Mu, Sigma Tau, and Tau Beta Pi honorary fraternities; and the Operations Research Society of America.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Kerry E. Kilpatrick, Chairman
Assistant Professor of Industrial and Systems Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Thomas B. Fast
Professor and Chairman of the Division of Oral Diagnosis

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Professor and Chairman, Department of Basic Dental Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Richard S. Mackenzie
Professor and Director, Office of Dental Education

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Associate Professor of Industrial and Systems Engineering

This dissertation was submitted to the Dean of the College of Engineering and to the Graduate Council, and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Dean, Graduate School

UF Libraries: Digital Dissertation Project
Cathy Martyniak, Project Coordinator
Christy Shorey, Project Technician

Internet Distribution Consent Agreement

In reference to the following dissertation:
AUTHOR: Leonard, Michael
TITLE: Analytical models for diagnostic classification and treatment planning for craniofacial pain (record number: 580622)
PUBLICATION DATE: 1973

I, as copyright holder for the aforementioned dissertation, hereby grant specific and limited archive and distribution rights to the Board of Trustees of the University of Florida and its agents. I authorize the University of Florida to digitize and distribute the dissertation described above for nonprofit, educational purposes via the Internet or successive technologies.

This is a non-exclusive grant of permissions for specific off-line and on-line uses for an indefinite term. Off-line uses shall be limited to those specifically allowed by "Fair Use" as prescribed by the terms of United States copyright legislation (cf. Title 17, U.S. Code) as well as to the maintenance and preservation of a digital archive copy. Digitization allows the University of Florida to generate image- and text-based versions as appropriate and to provide and enhance access using search software.

This grant of permissions prohibits use of the digitized versions for commercial use or profit.
Printed or Typed Name of Copyright Holder/Licensee
[Personal information blurred]
Date of Signature

Please print, sign and return to:
Cathleen Martyniak
UF Dissertation Project

with both decision-making processes. These capabilities are discussed in terms of applications of the models in teaching, research, and in the practice of dentistry.

With this basic structure for the diagnostic-classification model, the classified patient data vectors, and the training algorithm presented in Appendix B, an initial test was performed to verify that the space of observed patient data vectors was separable by linear discriminant functions. Application of the modified fixed-increment training algorithm to the set of 480 data vectors verified this requirement, as the algorithm terminated in a set of feasible discriminant-function weights. Using the discriminant functions these constants determine, it is possible to duplicate the pre-established diagnostic classifications for each of the patient data vectors.

This first test of the diagnostic classifier established that a non-parametric classifier could be employed to reproduce the original classifications for each data vector used in model construction. However, this test does not reveal how well the classification model will perform on patient data not employed in developing the discriminant-function weights. The remainder of this section, and Section 3.3, address the question of how the diagnostic classifier performs on 'new' patient data vectors, that is, vectors that have no duplicate in the training sample.

Model training has created a set of weights that, by the definition of the training procedure, correctly classify every patient data vector that lies within the bounds of the training-sample pattern-class convex hulls. Since every data vector is a binary vector, new patient data vectors must fall outside the convex hulls established by the training-sample vectors.
Yet, if new data vectors have a number of data-vector elements that are identical to those of the training-sample vectors with the same diagnostic classification, then this relationship will be reflected in a 'close proximity,' as measured by a Euclidean-distance

Radiographic Examination
277 Fossa apposition
278 Fossa resorption
279 Articular eminence apposition
280 Articular eminence resorption
281 Evidence of fracture
282 Clinical or radiographic evidence of pathoses
Emotional Trauma
283 Anxiety
284 Depression
Bruxism or Clenching
285 Yes
Uneven Centric Stops
286 Yes
History of Lengthy Dental Procedures
287 Yes
History of General Anesthesia
288 Yes
Tinnitus
289 Yes
Extraction of Teeth
290 Less than 6 weeks prior to TMJ pain
291 Leaving a space that permits extrusion
Preauricular Pain
292 Yes
Alteration of Inter-Occlusal or Inter-Arch Space
293 Yes
Paresthesia
294 Yes
Luxation or Subluxation
295 Yes

Rosen [26] has provided a restatement of this assumption in the requirement that the sets of data vectors corresponding to each diagnostic alternative have non-intersecting convex hulls.
In either form, this is a fairly restrictive assumption on the dispersion of patient data vectors (see Section 3.2).

Selecting the 'weights' for each of the discriminant functions is a process known as 'training.' For the linear non-parametric classifier, training generates each discriminant function's $w_{jk}$'s by applying a systematic algorithm to the members of a set of representative patterns with pre-established classifications. Nilsson [24] discusses several algorithms suitable for training the craniofacial-pain diagnostic classifier. In the course of using these algorithms for model development, a new 'modified fixed-increment' training algorithm was constructed (see Appendix B). Employing the new algorithm has resulted in a reduction of approximately 35% in the amount of training time required to derive the weights for the craniofacial-pain classifier.

Symbolically, the craniofacial-pain diagnostic classifier, with its set of trained weights, can be represented in the following format: let

$A_i$ = the 296-dimension data vector describing patient 'i'
$a_{ik}$ = the kth element in the data vector describing patient 'i', whose value is either zero or one, k = 1, 2, ..., 295 (by definition $a_{i,296} = 1$)
$C_j$ = diagnostic alternative 'j', j = 1, 2, ..., 17
$d_{ij}$ = the value of the discriminant function for diagnostic alternative 'j' generated by the data vector of patient 'i'

5. From a mathematical-programming point of view, the symptom-selection algorithm represents one of a limited number of techniques capable of solving a problem with non-linear constraints. The algorithm seeks an optimal assignment of components, where the feasibility of any assignment is determined by the existence of a set of discriminating component multipliers. In this more general context, the structure of the algorithm may be applicable in a variety of problem areas not directly related to the feature-selection problem.
The possibility of employing the algorithm in this general setting should be investigated.

6. In modeling the treatment-planning process for craniofacial-pain patients the concept of boundary-level treatment applications was introduced. Boundary numbers on the effects of repeated treatment applications are likely to occur in data derived from the care of patients with a variety of physiological disorders. Further investigations of this phenomenon may result in more effective methods of predicting which treatments will have boundary-level application numbers, and more efficient statistical techniques to determine values for these numbers.

7. The training algorithm developed in the construction of the craniofacial-pain diagnostic classifier generates a feasible integer solution to a large number of linear constraints. This algorithm is both efficient and easily coded for computer applications. An investigation of the uses of this algorithm in a mathematical-programming setting may reveal applications in solution techniques for more general integer programs.

8. Potential applications have been suggested for the diagnostic-classification and treatment-planning models in teaching, in research, and in practice. The models and their applications have been presented so

With this modification, the algorithm would only consider eliminating these ten high-cost features. Another heuristic approximation to the optimal collection of features might rank the data-vector elements in order of descending cost of utilization. Procedure 2 would then be used to eliminate these components one by one, starting with the item of highest cost, until the procedure signaled an infeasible solution to P1. Certainly, other heuristics might also be developed to exploit the structure of this algorithm.

3.5 Model Applications

The structure of the craniofacial-pain diagnostic-classification model permits model utilization for a variety of purposes.
Since the model is developed in terms of general data-vector and diagnostic-alternative parameters, these model components can be altered to suit the application in question. This section presents a brief discussion of some of the possible applications of the diagnostic classifier.

In a teaching environment, the diagnostic-classification model with its set of discriminant weights can be stored for computer-terminal access. Then, on a set of tutorial example patients, students can compare their diagnoses with those of the diagnostic model. Moreover, the student can interact with the classifier in constructing his own 'sample' patients for the classifier to diagnose. Finally, the student can request the classifier to relate those discriminant-function weights that the model employs in considering the 'significance' (Section 3.2) of any one or group of symptoms.

The effectiveness of new diagnostic tests can be evaluated using the minimum-cost symptom-selection algorithm. This algorithm provides an immediate measure of the 'worth' of new research developments. Given a

4.2 Selection of Optimal Treatments

The craniofacial-pain treatment-planning model is transient in the sense that only two of the model's patient states, well and referred, can represent the patient's status when he exits the health-care system. In a stochastic sense, only the terminal states are recurrent, as they alone possess non-zero long-run probabilities of state occupancy. Hence, the choice of treatment alternatives at each patient state is made with the goal of minimizing the costs accrued by the patient as he passes through the diagnostic-alternative-based patient states into one of the recurrent states.
For notational convenience, in the analytic model the well patient state is denoted as state 'W' and the referred state as state 'R.' In modeling the care system for craniofacial-pain patients there is no justification for providing costs for the transitions from states 'R' and 'W' to themselves; hence, '$c_{RR}$' and '$c_{WW}$' are set equal to zero. Analytically, the treatment-planning model is made monodesmic, i.e., having only one recurring state, by defining $p_{RW} = 1$ and $p_{WR} = 0$. The total number of states, not including states 'W' and 'R,' is denoted by 'S.' With these definitions and the notation introduced in the previous section, a procedure for selecting the set of optimal treatment decisions is developed.

Howard [25] has shown that for a monodesmic, transient Markovian decision model, a set of optimal decisions is defined as those decisions that maximize the expected value '$v_i$' of occupying each system-state 'i.' Since the treatment-planning model for craniofacial-pain patients fits into this category of decision model, a modification of Howard's algorithm is employed to select optimal treatment regimes. The process of select-

case 2: if [condition illegible in source] for a subset B of the p discriminants, $z \in B$, $j \notin B$, let [weight-update equations illegible in source; separate updates are given for discriminant j, for each $z \in B$, and for all $c \notin \{B \cup j\}$]

where $n_B$ = the number of discriminants in the subset B.

The algorithm is terminated when the values of the $W_j$, j = 1, 2, ..., p, have not changed during a complete cycle of the t training patterns, i.e., when $W_j^{(\theta)} = W_j^{(\theta+1)} = \cdots = W_j^{(\theta+t)}$ for all j, where $\theta$ is the last case 2 pattern examined by the algorithm.

This algorithm is guaranteed to terminate in a set of feasible $W_j^*$, j = 1, 2, ..., p, if the training sample is linearly separable and $\alpha$ and $\beta$ have been appropriately selected. If the training sample is linearly separable, the algorithm will converge for any fixed value of $\alpha > 0$, where $\beta$ is selected appropriately large. Hence, the algorithm is normally applied to a training sample with $\alpha = 0$ and $\beta = 1$. If the algorithm converges, these constants can be adjusted and the training algorithm reapplied.

The justification for specifying a non-zero $\alpha$ ($\alpha$ = size of the dead zone) is that as $\alpha$ is increased, the accuracy of the classifier is increased in making classifications of data not used in developing the discriminant-function weights. For example, with the craniofacial-pain diagnostic classifier and the test samples discussed in Section 3.3, the diagnostic model correctly classified approximately 5% more of the test samples' data vectors when the model was trained with $\alpha = 30$, $\beta = 3$ (versus an original training with $\alpha = 0$, $\beta = 1$).

the problem of finding process structure is compounded in the health-care field by a lack of unifying and consistent nomenclature. In the health-care field, scholarly literature and historical precedent can serve as the justification for two or more contradicting sets of terminology for the same anatomical structure or physiological process.
Thus, in re searching the generality of first-order decision-making techniques, the investigator must consider process variability and nomenclature incon sistency before he makes any statement about the applicability of this dissertation's decision-making tools to other health-care environments. 3. A ncn-geanetric discussion of the criteria for pattern space separability was presented to provide a means of characterizing health care disorders for which diagnostic classification by a linear pattern classifier might be feasible. Unfortunately, this dissertation's tech niques are heuristic and do not provide an exact reproduction of the underlying mathematical specifications. Future research in this area could lead to a precise statement of non-gearetrie criteria for linear separability, and thus provide an indirect means for evaluating potential applications of linear non-parametric classifiers. 4. This dissertation's minimum-cost symptan-selection algorithm represents a clear departure from previous research in feature selection. The algorithm's utilization of the convex-hull representation of pattern space separability makes this development unique in the literature of feature selection. However, the algorithm's method of checking the fea sibility of potential feature collections is extremely tedious. A more efficient method to check feature-collection feasibility may be revealed through future investigations in this area. APPENDIX E STABILITY OF TPANSITIC^I-PEORABILITY ESTIMATES This appendix presents the technique employed to determine the sta bility of transition-probability estimates of treatment effectiveness for patients who occupy the same patient state, but who exhibit varying combinations of relevant data-vector elements. 
The analysis shown here is limited to one treatment alternative, treatment 24, and one diagnostic classification, Diagnostic Alternative 13, but a similar investigation was performed for a majority of the other diagnostic classifications and treatment alternatives. Variability of the data-based estimates (Section 4.1.2) of transition probabilities is analyzed in terms of five factors: patient's sex, patient's age, duration of patient's pain, nature of patient's pain (continuous or episodic), and number of replications of the same treatment alternative. Statistically, '2-way' contingency tables [29] measure the effects of these relevant factors. The rows of each table specify the number of transitions out of Diagnostic Alternative 13 following an application of treatment 24, and each table's columns specify a value for the factor being analyzed. A chi-squared statistic $\chi^2$ is employed to test for independence between the number of transitions and the factor in question.

[...] patients will exhibit symptoms that lead to any one of several alternative courses of patient care. Altering the occlusion of the natural teeth is one means of treating craniofacial-pain patients. Although in many cases minor occlusal abnormalities are only contributing factors to a patient's pain, attention by the dentist to occlusion is at least partially successful for a majority of craniofacial-pain patients [8]. However, it is important in early therapy not to alter the occlusion irreversibly. Treatment by means of tooth extraction or endodontics, jaw fixation, prosthetic devices, or by topical treatments may also be suggested by the patient's symptoms. The articular surface of the mandibular condyle has an excellent reparative capacity [6].
Thus, the use of sedatives, antibiotics, and muscle relaxants, along with physical therapy, often leads to patient 'cures' as these treatments ease the patient's pain and increase jaw mobility while natural restoration of the joint is in progress. If, after a reasonable length of time (3 to 6 months), the patient's symptoms are not relieved, the dentist may consider referral to another source of care or therapy such as surgery [7].

Typically, the health-care process for craniofacial-pain patients may be viewed as following the format of Figure 2 [9]. When a patient is admitted into the care system, he undergoes a data-collection process. This involves taking a 'full and pertinent' patient history and a physical examination of the areas of discomfort. The data gathered consist of symptoms, signs, medical and/or dental history, physical examination findings, psychosocial information, and so forth. Once these elements have been elicited, a diagnosis is attempted. If this is not yet possible, the severe symptoms are treated and the patient's health state is monitored.

where

$$A_i = \begin{bmatrix} a^{1}_{i1} & a^{1}_{i2} & \cdots & a^{1}_{in} & 1 \\ a^{2}_{i1} & a^{2}_{i2} & \cdots & a^{2}_{in} & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ a^{m_i}_{i1} & a^{m_i}_{i2} & \cdots & a^{m_i}_{in} & 1 \end{bmatrix}$$

$$W_i = [w_{i1}, w_{i2}, \ldots, w_{in}, w_{i,n+1}]^T$$

$$C = [c_1, c_2, \ldots, c_n, 0]$$

$$X = [x_1, x_2, \ldots, x_n, 1]$$

and

$w_{ij}$ is an unrestricted variable,
$c_j$ is the cost of using feature $j$,
$x_i = 0$ if feature $i$ is not used, and $x_i = 1$ if feature $i$ is used.

Note: The notation $\odot$ is to be read as element-by-element multiplication, i.e., $Q \odot R = S$, $[s_{ij}] = [q_{ij} r_{ij}]$.

3.4.1 Algorithm Development

The algorithm developed to solve problem P1 is an enumerative algorithm similar in structure to that of Balas [28]. Unfortunately, the non-linear nature of problem P1's constraints prohibits full implementation of the more powerful techniques used in implicit enumeration on linear integer problems. The structure of these constraints and their effect on the optimization of P1 will be discussed in a step-by-step development.
The minimum-cost feature-selection algorithm does not solve P1 to the extent of finding the values of the vectors $W_i$, $i = 1, 2, \ldots, p$. This [...]

3.4.2 Statement of the Minimum-Cost Symptom-Selection Algorithm

Step 0: Create the assignment vector (at this point the vector is null as there is no variable assignment in the vector). Set $V^* = -\infty$ and go to Step 4.

Step 1: Start at the right side of the assignment vector and move to the left, stopping at the first variable assigned a zero value. If no variable in the assignment vector has a zero assignment, go to Step 2. Otherwise go to Step 3.

Step 2: Calculate $V$ for the assignment vector. If $V$ is greater than $V^*$, record the values of the variables in the assignment vector as the optimal solution $X^*$ to P1. Otherwise, record (as the optimal solution $X^*$ to P1) the values of the variables in the best current solution $X$. Terminate the algorithm.

Step 3: Change the value of the variable isolated in Step 1 to an assigned value of one, and eliminate from the assignment vector all variable assignments to the right of this new assignment. If the assignment vector includes the assignment $x_\ell = 1$ for every $x_\ell$ in $X$, return to Step 2. Otherwise go to Step 4.

Step 4: Select a variable $x_\ell$ that is not an element of the assignment vector. Assign this variable the value $x_\ell = 0$ in the assignment vector. Use Procedure 2 to check the feasibility of this assignment. If the assignment vector is not feasible, go to Step 6. Otherwise go to Step 5.

Step 5: If the assignment vector with the new assignment $x_\ell = 0$ does not include an assignment for every $x_\ell$ in $X$, return to Step 4. Otherwise go to Step 7.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Kerry E. Kilpatrick, Chairman
Assistant Professor of Industrial and Systems Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Thomas B. Fast
Professor and Chairman of the Division of Oral Diagnosis

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Professor and Chairman, Department of Basic Dental Sciences

[...] applications, or by considering at any stage of analysis the effects of a fixed number of future treatments. Validation of the decisions generated by these models has thus far been limited to checks on the feasibility of the treatment regimens selected. Unfortunately, the finite-horizon models either do not consider the possibility of a patient's prolonged stay in the health-care system, as is the case of the models with a maximum number of possible treatments, or, where only a fixed number of future treatments is considered, they provide no more than a heuristic treatment-selection procedure.

2.4 Uncertain-Duration Treatment Planning

Bunch and Andrew [11] have considered the possibility of prolonged occupation of the same diagnostic state during the course of a patient's progression through the care system. In their Markovian representation of the care system for mid-shaft fractures of the femur, they provide this modeling refinement. As a consequence of this modification, the number of treatment decisions made for each patient is a random variable with no fixed upper bound.
Howard's iterative scheme for policy selection [25] provides the means for choosing the optimal treatment regimen by selecting treatment alternatives that maximize the relative 'value' of occupying each disease state. Although the Bunch and Andrew model did not consider return visits to the same disease state, a more generalized Markovian representation could incorporate that possibility. Nevertheless, the proximity to reality that this category of transient Markovian models provides requires considerable effort, as holding-time distributions, treatment 'costs,' and transition probabilities must be supplied by the analyst for all treatment alternatives at each of the disease states in the care system.

Step 6: If the assignment vector with the assignment $x_\ell = 1$ ($x_\ell$ is the variable selected in Step 4) does not include an assignment for every $x_\ell$ in $X$, return to Step 4. Otherwise go to Step 7.

Step 7: Calculate $V$ for the assignment vector. If $V^*$ is greater than $V$, go to Step 1. Otherwise go to Step 8.

Step 8: Record as the best current solution $X$ the values of the variables in this assignment vector. Set $V^* = V$, and return to Step 1.

Note that in the course of applying this algorithm all solutions are considered and the best current solution is replaced only when another solution has a larger associated value. As the number of possible solutions is finite, the algorithm must terminate, and at this termination the value of the optimal solution and its assignments are known. An application of the minimum-cost symptom-selection algorithm is presented in Appendix C.

3.4.3 Computational Considerations

Returning to the setting of diagnostic classification of craniofacial-pain patients, application of the minimum-cost symptom-selection algorithm would require an enumeration (explicit or implicit) over $2^{295}$ possible solutions in order to find the optimal collection of data-vector elements. As the number of possible solutions is prohibitively large, heuristic modifications to the symptom-selection algorithm are required for this application. One possible modification could employ the fact that only a few of the elements in the patient data vector have large associated 'costs' for their utilization. In particular, the eight elements of radiographic data and the two measures of emotional trauma are significantly more 'costly' to examine than the other items in the data vector.

TABLE 4
CLASSIFICATION VARIABILITY AMONG DENTAL PRACTITIONERS

Diagnostic classification for each patient:

| | Patient 1 | Patient 2 | Patient 3 | Patient 4 | Patient 5+ |
|---|---|---|---|---|---|
| Original Classification | 4 | 13 | 15 | 15 | 9 |
| Practitioner 1 | 1 | 7 | 15 | 15 | 3 |
| Practitioner 2 | 6 | 12 | 15 | 8 | 3 |
| Practitioner 3 | 4 | 15 | 15 | 15 | 13 |
| Practitioner 4 | 4 | 15 | 15 | 14 | * |

\* No classification given
\+ Patient 5 exhibited a minimal amount of input data (only 17 non-zero data-vector entries)

These four dental practitioners exhibited 100.0% agreement of the diagnosis on one of the five patients, and 50.0% agreement on the diagnostic classification of the remaining four patients.

Note that relation (2) is the requirement for pattern separability by linear discriminants. Hence, a vector $\hat{X}$ is a component in a feasible solution to P1 if and only if there exist $\hat{W}_i$, $i = 1, 2, \ldots, p$, such that (2) holds for all $i \ne j$. As discussed in Section 3.1, a pattern space is linearly separable, and hence feasible $\hat{W}_i$ exist, if and only if the individual pattern classes have non-intersecting convex hulls. For the pattern vectors considered in this section, the individual components of each of the patterns in each pattern class are either zero or one. As there is a one-to-one correspondence between the individual patterns in a pattern class and the vertices of the pattern class's convex hull, the convex hull of a pattern class $\hat{A}_i$ can be expressed as all convex combinations of the individual pattern-class vectors $\hat{a}_i^m$, $m = 1, 2, \ldots, m_i$.
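Written out, the convex-hull statement above might take the following form. This is a sketch only: the multipliers $\lambda_m$ and the operator name $\operatorname{conv}$ are notation assumed here for illustration and do not appear in the source.

```latex
% Convex hull of pattern class i as the set of all convex combinations
% of its m_i pattern vectors (lambda_m is assumed notation).
\operatorname{conv}(\hat{A}_i)
  = \Bigl\{ \sum_{m=1}^{m_i} \lambda_m \, \hat{a}_i^m \;:\;
      \lambda_m \ge 0, \ \sum_{m=1}^{m_i} \lambda_m = 1 \Bigr\}

% Separability criterion as stated in the text: the hulls of distinct
% pattern classes do not intersect.
\operatorname{conv}(\hat{A}_i) \cap \operatorname{conv}(\hat{A}_j) = \emptyset
  \quad \text{for all } i \ne j
```

Under this reading, checking the feasibility of a feature collection amounts to checking that no point can be written as a convex combination of the patterns of two different classes at once.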
Consider the following examples of the convex-hull representation of linear separability. Assume $a_X^1 = [1,0]$, $a_X^2 = [1,1]$, $a_Y^1 = [0,0]$, and $a_Y^2 = [0,1]$. Graphically this pattern space can be represented as

[Figure: the four pattern vectors plotted with Feature 1 on the horizontal axis and Feature 2 on the vertical axis, with line X joining $a_X^1$ and $a_X^2$, line Y joining $a_Y^1$ and $a_Y^2$, and a separating line $\theta$ between them.]

where the line X from $a_X^1$ to $a_X^2$ represents the convex hull of pattern-class X and the line Y from $a_Y^1$ to $a_Y^2$ represents the convex hull of pattern-class Y. Since X and Y do not intersect, implying that the space is linearly separable, it is possible to draw an infinite number of lines $\theta$ that serve as discriminating hyperplanes.

LIST OF TABLES

1. Survey of Diagnostic-Classification Models
2. Correlation Between Significant Symptoms and Discriminant-Function Weights
3. Tests of Diagnostic Classifier Accuracy
4. Classification Variability Among Dental Practitioners
5. Mean Transit Times Through the Craniofacial-Pain Care System

Then $\log \bigl[ \prod_{s_i \in S} P[s_i \mid C_j] \bigr] = a W_j$, and decision rule (1) can be restated as

classify a patient who is characterized by the vector $a$ in the $j$th diagnostic alternative if $a W_j > a W_k$ for all $k \ne j$. (2)

Note that decision rule (2) is identical to the decision rule employed in non-parametric pattern classification. This equivalence implies that if (1) holds for every preclassified patient examined, the values $\log P[s_i \mid C_j]$ form a set of feasible discriminant-function weights. If (1) leads to the correct classification of a majority of the patients examined, it is logical to assume that there may be a set of feasible discriminant-function weights. This assumption was examined using the craniofacial-pain patient data. From the data vectors classified in Diagnostic Alternatives 13, 14, and 15, a total of 189 patient visits, the $P[s_i \mid C_j]$ were calculated. Each data vector was then classified with decision rule (1), and 164 of the data vectors (86.7%) were assigned to their pre-established diagnostic alternative.
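The equivalence between decision rules (1) and (2) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the function names, the three-symptom toy data, and the additive smoothing used to keep the logarithm finite when a symptom never occurs in a class are assumptions made for illustration, not the dissertation's procedure or data.

```python
import math


def estimate_weights(samples, n_symptoms, smoothing=1.0):
    """Return, for each class j, the weight vector W_j = [log P[s_i | C_j]].

    `samples` maps a class label to a list of binary data vectors.
    The additive smoothing is an assumption of this sketch; it avoids
    log(0) for symptoms never observed in a class.
    """
    weights = {}
    for label, vectors in samples.items():
        total = len(vectors)
        w = []
        for i in range(n_symptoms):
            count = sum(v[i] for v in vectors)
            p = (count + smoothing) / (total + 2.0 * smoothing)
            w.append(math.log(p))
        weights[label] = w
    return weights


def discriminant(a, w):
    """Linear discriminant a . W_j: the sum of log P[s_i | C_j] over exhibited symptoms."""
    return sum(ai * wi for ai, wi in zip(a, w))


def classify(a, weights):
    """Decision rule (2): choose the class j with the largest a . W_j."""
    return max(weights, key=lambda label: discriminant(a, weights[label]))


# Hypothetical preclassified data vectors for two diagnostic alternatives.
samples = {
    13: [[1, 1, 0], [1, 0, 0], [1, 1, 0]],
    14: [[0, 1, 1], [0, 0, 1], [0, 1, 1]],
}
weights = estimate_weights(samples, n_symptoms=3)

print(classify([1, 1, 0], weights))  # 13
print(classify([0, 0, 1], weights))  # 14
```

Because each data vector is binary, the dot product picks out exactly the log-probabilities of the exhibited symptoms, so maximizing it is the same as maximizing the product of probabilities in rule (1).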
The second criterion provides a subjective measure of the feasibility of using a non-parametric pattern classifier. If symptoms for most of the diagnostic alternatives associated with the disorder of interest can be isolated such that

1. a patient's exhibition of a subset of these symptoms leads the practitioner to a selection of one of the diagnostic alternatives, or
2. a patient's exhibition of a subset of these symptoms leads the practitioner to eliminate from further consideration