
A Knowledge-Based Toxicology Consultant for Diagnosing Multiple Disorders

Permanent Link: http://ufdc.ufl.edu/UFE0021958/00001

Material Information

Title: A Knowledge-Based Toxicology Consultant for Diagnosing Multiple Disorders
Physical Description: 1 online resource (124 p.)
Language: English
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: acquisition, adjusted, artificial, automated, automatic, automatically, based, bottleneck, case, center, centers, clinical, computerized, consultant, consultation, contributor, contributors, control, data, database, databases, decision, diagnose, diagnoses, diagnosing, diagnosis, diagnostic, differential, discovery, disorder, disorders, drug, drugs, effect, effects, expert, exposure, exposures, fault, faults, florida, generate, generated, generation, information, intelligence, intelligent, knowledge, learning, likelihood, machine, medical, medicine, mining, multiple, poison, poisons, primary, ratio, ratios, reasoning, rule, rules, sign, signs, substance, substances, support, symptom, symptoms, system, systems, toxic, toxicology, toxin, toxins
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre: Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Every year, toxic exposures kill twelve hundred Americans. More than half of these deaths result from exposures to multiple substances. In addition to being dangerous, multiple exposures are particularly difficult to diagnose. At this time, no general solution exists for diagnosing multiple disorders because of the non-linear interactions observed in such cases. This dissertation presents the development of a prototype knowledge-based system for diagnosing toxic exposures. The goal of the system is to generate differential diagnoses for unknown exposure cases based on the clinical effects observed in patients. The system is not meant to replace physicians but rather to serve as a medical decision support system. Acting as a consultant, the system provides access to case-based summary data that is normally unavailable. The system is generated automatically by applying data mining techniques to a database supplied by the Florida Poison Information Center. For diagnosis, the system uses pre-test probabilities and likelihood ratios, calculations commonly used throughout the medical profession. To overcome certain shortcomings of likelihood ratios, the equation employed by the system is adjusted to account for every possible outcome. Using the adjusted likelihood ratio enables robust calculations while closely modeling the likelihood ratio that physicians know and trust. Trained and tested on single exposures, the system achieved an accuracy of 81.0% on cases involving at least three clinical effects. Repeating the process with multiple exposures alone failed, at least partly because of insufficient data. However, when trained on various combinations of single, double, and/or multiple exposures, the system achieved an accuracy of 86.9% in diagnosing the primary contributors to multiple-exposure cases.
Although a solution for diagnosing multiple disorders remains elusive, the ability to identify primary contributors is a significant step toward addressing the problem. This system is the first American diagnostic system for the field of clinical toxicology, and its use of adjusted likelihood ratios helps bridge the gap between intelligent systems and the medical field. Furthermore, by generating the system automatically, this research addresses the knowledge acquisition bottleneck that plagues traditional expert systems.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Arroyo, Amauri A.
Local: Co-adviser: Dankel, Douglas D.
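The abstract above states that the system diagnoses from pre-test probabilities and likelihood ratios. This record does not describe the dissertation's adjusted likelihood ratio, so the following is only a minimal Python sketch of the conventional, unadjusted calculation: a finding's positive likelihood ratio is sensitivity / (1 - specificity), and the post-test odds equal the pre-test odds multiplied by the likelihood ratio of each observed finding. The function names are hypothetical. Chaining several ratios assumes the findings are conditionally independent given the diagnosis.

```python
"""Conventional (unadjusted) likelihood-ratio update.

Hypothetical helper names. This is NOT the adjusted likelihood ratio
used in the dissertation, which this record does not specify.
"""
from typing import Iterable


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = P(finding | disorder) / P(finding | no disorder)
           = sensitivity / (1 - specificity)."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    if not 0.0 <= specificity < 1.0:
        # The plain ratio is undefined when specificity == 1
        # (division by zero), so reject that input here.
        raise ValueError("specificity must be in [0, 1)")
    return sensitivity / (1.0 - specificity)


def post_test_probability(pre_test_prob: float,
                          likelihood_ratios: Iterable[float]) -> float:
    """Convert probability to odds, multiply by each LR, convert back."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must be in (0, 1)")
    odds = pre_test_prob / (1.0 - pre_test_prob)
    for lr in likelihood_ratios:
        if lr < 0.0:
            raise ValueError("likelihood ratios must be non-negative")
        # Multiplying ratios assumes conditionally independent findings.
        odds *= lr
    return odds / (1.0 + odds)
```

For example, a pre-test probability of 0.10 (odds of 1:9) combined with a single finding whose likelihood ratio is 9 gives post-test odds of about 1:1, that is, a post-test probability of about 0.5.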

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0021958:00001

Full Text
24e005a2f53f1d8dd7605ca28038e958
67f67d7f78b88f526626d4a44a14f0613f82913c
2194 F20101111_AABDUZ schipper_j_Page_081.txt
98fab345201bedfa427aa659168ac4b7
a7a9da5ca8091fa244575ba73a11ac68dcca00b2
4998 F20101111_AABEZF schipper_j_Page_005thm.jpg
a68b69e3cd4b491fbc58c89a3a68bfbc
52a31612acc14a3b5764b0e9546a8b91b7b81f29
26545 F20101111_AABEYR schipper_j_Page_067.QC.jpg
23d1424ef7a6734034eec1773909dd71
d96ffe5ca90f955321a5508e656d874a31e9c8a9
90925 F20101111_AABEBJ schipper_j_Page_078.jpg
a9c5c5df9676f63edf41ca12016052ca
0ba3dda7e7b11a72ec21e51d072f642fbf7fc235
F20101111_AABDWD schipper_j_Page_117.tif
d39af29c1ed57f335a7d1487868673f3
f1f4d2d57e878f487ad53e2d75430246febdc71b
75069 F20101111_AABEAV schipper_j_Page_061.jpg
26354fadfd3641e5bb5419c70cf78269
1d54bfb16ad6605a263c27fdc34d8262c41bc9b1
5907 F20101111_AABDVP schipper_j_Page_060thm.jpg
0397adfb1f2568fa310688e77e406240
a0dd7da2efe549cde2fd6f9a02663bc34f4a6165
2558 F20101111_AABEZG schipper_j_Page_008thm.jpg
58dbcf437296b67f6ba3219bde4432ee
becd8172cc7268fab6c1ae6581050f7f8f3767ee
27807 F20101111_AABEYS schipper_j_Page_090.QC.jpg
b9fd597591717aa7f212313b7a4968d3
3bc1a614802965df6bb26a8e1263dcf894de17e4
86423 F20101111_AABEBK schipper_j_Page_079.jpg
3dbf15ef6f3dd9a68808b8095ffe6479
fc7f053daf4e9543d4d4f92e5a506508eea0c8f2
F20101111_AABDWE schipper_j_Page_043.tif
2436ba0ea244975bcf97605c0e10bf97
9f4cf94c57e6818d958dc4f3e98489b3506608b1
66762 F20101111_AABEAW schipper_j_Page_062.jpg
fa46c58253ce957da01c58c239ffff37
dcc497cc81d4f928b406e027c9d6581aee48007c
2156 F20101111_AABDVQ schipper_j_Page_080.txt
7e41cfe49541c3e22fe0656320a89836
374fb973b41727745518836dde3bb1c3cbb23491
4123 F20101111_AABEZH schipper_j_Page_009thm.jpg
e724e79dc9bc15b3573e7facb600519e
fb212ea923414651df1a6aa8f7f552e08a608a9d
7552 F20101111_AABEYT schipper_j_Page_118thm.jpg
6af5385b751e970da6a9138207031cac
b6cecf388f74c002fdca19d50d122837decb114c
89820 F20101111_AABEBL schipper_j_Page_080.jpg
c1527b840ebda704cd44280813342628
b4b2e9c6764a865f0c1a4817fcd273557221d565
29446 F20101111_AABDWF schipper_j_Page_106.QC.jpg
ac6a40bb9cd24c473637468ca090021e
41b7b7b5bf145a1bbb7b685f7493291fdd231f4f
75203 F20101111_AABEAX schipper_j_Page_063.jpg
6b08248fb8d7b42e85428b3e43826c81
46728add5a73aa73fd5ed46d2478c844e7b91f3f
84531 F20101111_AABDVR schipper_j_Page_067.jpg
d01effd0e3df8351be9e94fa1b62dea6
19c93f6afef9f253850caf2052c63820b3e2cbef
72756 F20101111_AABECA schipper_j_Page_096.jpg
43a63784876cc5fef2f679f116fe3350
99d2a8096510dcb4367606f3b3a944428c2beed6
24155 F20101111_AABEZI schipper_j_Page_021.QC.jpg
81126555df47ff21e544ef62d270a5e3
57d5119412bb095de2e9ccd551ccf25d409af8a4
29442 F20101111_AABEYU schipper_j_Page_033.QC.jpg
9977faf0ba39ff3ca11b6367c90fcf06
2a7036769592b869d35cd5bb8b4d301a92ec7937
89543 F20101111_AABEBM schipper_j_Page_081.jpg
bb9973801937ce91102e47e0b61a195e
ff97ec6bc1d156ddedafb233142405df76fec7bb
F20101111_AABDWG schipper_j_Page_108.tif
a07d119fa4207115e7c3ac6b4da64527
3e83fdd2d081bb8bb7325d9d2a2d67978f884d96
77307 F20101111_AABEAY schipper_j_Page_064.jpg
746900c713c888e540ece206eba191d1
b492b3c3916282e8beed85b15d4d256b41d328aa
2083 F20101111_AABDVS schipper_j_Page_045.txt
446d6b8a9dc165e803833e6d26d9efbf
53a6de8fa0bfe801db109198267af78d03c0e1a2
77954 F20101111_AABECB schipper_j_Page_097.jpg
09ce2d1aac6831c105dacea05596b2ed
4a62ac48ccf893e45cd79e0994c47d9e1776bebd
F20101111_AABEZJ schipper_j_Page_021thm.jpg
8edbce51265c98b260c5f03cc6e512aa
a3eb17b496000f850c250c1c345de9bbb5075482
6650 F20101111_AABEYV schipper_j_Page_084thm.jpg
50f0968277803b80835d0833a3b318cf
7cb112beaa79f54a7ab5cf6860dfcdae48d65727
80691 F20101111_AABEBN schipper_j_Page_082.jpg
c39c89fcd9c6c988faa6d8138b900002
129a60f057d1015c9ab9c7a6eeadb3b4308f60e9
6083 F20101111_AABDWH schipper_j_Page_112thm.jpg
aa685056df8bf673cf2e03645afc9f00
3207db03dca22b62657fbb7a850ce58f325d29e0
76488 F20101111_AABEAZ schipper_j_Page_065.jpg
fba3b9114756e87fa7014084b85fb99e
6ed0c6e4057ef15920c51bbfc9897debd53daebd
6600 F20101111_AABDVT schipper_j_Page_034thm.jpg
986266f644d77a4d9db75948ff3d1f0f
f7b7d62a331765fef46ed86b6e35478cb1c518db
87508 F20101111_AABECC schipper_j_Page_098.jpg
4bba696ca0b7206ae73c1aca93b4a8eb
014ba5b70ce79705063ed6fb3d0076f3fb3ac57f
6746 F20101111_AABEZK schipper_j_Page_023thm.jpg
2474cc07ca8455bb2b4a114cef74ee13
1417bf672c66147720eb09ce12cf3b5fb6160848
5656 F20101111_AABEYW schipper_j_Page_096thm.jpg
01bbc7fc1088a588d76ae90593db70e3
bae0042bdbaa788b04aa1050f31dff709ecf9fee
89564 F20101111_AABEBO schipper_j_Page_083.jpg
eff5e01bd83f820b1634d111fb26d1e7
38e17b7bd98eabbaa3e00f43677933d2d345d296
1051944 F20101111_AABDWI schipper_j_Page_067.jp2
88fd9dc4daeaead3385926b8db12baec
b9a6e8be852f63dc91a0c5159a3bd76250c4a8d2
1051972 F20101111_AABDVU schipper_j_Page_017.jp2
34973aa3a89f2fecc2a7f092e147ea3f
4254ed337e3619b850195400bff3c1e4ee27569b
6421 F20101111_AABEZL schipper_j_Page_027thm.jpg
517015fbddcfff523739f2e37b349e13
fda55f116c3818e8671c8c831b620713cb02c9cf
F20101111_AABEYX schipper_j_Page_013.QC.jpg
a69ed8bb162035baa62ba764ac4a1275
9fea0413fb441cb62aba8115b18e7044eaa78206
87574 F20101111_AABEBP schipper_j_Page_084.jpg
fd0ed7792f66d79fe1b14a58fe6a597f
0b76c0f9246503b9e7bf8e46399a5e2c4779c64d
F20101111_AABDWJ schipper_j_Page_121.tif
0dbfd740581ac51b4370ea72a7551c41
f696341f5ef92cabdb8f3fff2a4987e3fa0b0b55
63746 F20101111_AABDVV schipper_j_Page_113.pro
813fd12a73b46cf084c0fb0597d73796
54409b300dd8f80da579ccbca7e75396e90ad136
82067 F20101111_AABECD schipper_j_Page_099.jpg
fffd26d5673ed97e40213e09077acfeb
b5e589d79f39f1c2b94b229c49204955a8b258d9
27925 F20101111_AABEZM schipper_j_Page_031.QC.jpg
81d7bb9773b09e34d3b0614d8d6e89ab
868802c03de4a62c6026867ce5557e1074015d68
6289 F20101111_AABEYY schipper_j_Page_051thm.jpg
213f9080d020aaf4fafffd05ba95fdda
7d8236b86894af54091f877f492635cc1b933177
80497 F20101111_AABEBQ schipper_j_Page_085.jpg
75ea55f17b26daefa43be2081a3dc173
8999199ee089197795ff9016dbabea3aa8c8d86a
1051936 F20101111_AABDWK schipper_j_Page_101.jp2
b8bf8dbfe362600d679e0e77f793f6b0
3c227a826d45b5766457e25121c1515680c452cf
6482 F20101111_AABDVW schipper_j_Page_114thm.jpg
cf552a524755ced2258a7fad32093f61
4c7daac9844ace49c1e53c202be8ffbbe2edb4ab
85595 F20101111_AABECE schipper_j_Page_100.jpg
7516a0f568525665b591e1e6f822087d
105445ab087b91fdecd075c090575c8e84777e9f
27712 F20101111_AABEZN schipper_j_Page_034.QC.jpg
2f56667555a142233ce2e8e765023e1c
860efd35a8bba926b6ca97c41be905e44ed809bf
88801 F20101111_AABEBR schipper_j_Page_086.jpg
a5f605a7800b6643d8ec4da1011ac6dd
a1031be740e558ef268756778b2082af3b4aac54
965322 F20101111_AABDWL schipper_j_Page_048.jp2
e4b2302131a2169163d3e7eeaab3a4ee
4a6ff500d2e9ef1a1fd4e97abcde5e9d88beaef5
89193 F20101111_AABECF schipper_j_Page_101.jpg
ac910cdbf7f9ef79b2d00e419c01fb7f
c3434bf827f069b9664642269142cedb0e3331ea
27529 F20101111_AABEZO schipper_j_Page_037.QC.jpg
07483ed4424eb2387f78605f9e8fa91a
99742be69b245a5de91e6c1a1594afe6c2a042bb
26455 F20101111_AABEYZ schipper_j_Page_056.QC.jpg
bae7e587978a2ac3c461b3fe4b9f78cb
599178ca226fb97c40f6c8b9ea3c84908b5ab30c
79165 F20101111_AABEBS schipper_j_Page_087.jpg
fb7de656c8d8e514e6a5eacc5ec20852
d820b4f6ad274df67015b641ba08dc914ab7d62c
F20101111_AABDWM schipper_j_Page_032.tif
650b6cf253850189909df7af92975447
d8a714b435437d97dce64aa7b9fe36ba533f7b61
6569 F20101111_AABDVX schipper_j_Page_101thm.jpg
d99976f2bc93f58ff76cf614c06dfd3a
d75bc9703df1ed1e849339182afea166182293a8
89982 F20101111_AABECG schipper_j_Page_102.jpg
5a45278a2ea4823e4869bafd732981b8
d9704d276ec39681193d5b2f8793368a7473ace9
6301 F20101111_AABDXA schipper_j_Page_054thm.jpg
4d1b502533e1d53130cccf6ef2144fd1
d526476a2649b7c2cccdb52a43ee7b517a5a1dfa
27129 F20101111_AABEZP schipper_j_Page_038.QC.jpg
8c11f4dceda44848249960e72556f7ba
5abecb61d99d4e9224d9dca587b57cf1ad74a46e
F20101111_AABDWN schipper_j_Page_084.tif
b23eeb9146b09c2a5ca98c1d1bc3003e
d73459ca2a52e626532db43d416816e6664f6c56
F20101111_AABDVY schipper_j_Page_064.tif
9f58bd60cd2e8fbe2eb3aef8d3fc785a
106a1d0d40d9c1f9cf9a77ea266838574a62804e
92173 F20101111_AABECH schipper_j_Page_104.jpg
a7b4915ec058346197877aed3497c6b8
6694aea4661de4563c45154800e6d0d2a835b332
19939 F20101111_AABDXB schipper_j_Page_005.QC.jpg
e7b00f9f1f623485d9d6552e42d57594
25c708d421e2e363b41a267bda1b5d49915897ad
89831 F20101111_AABEBT schipper_j_Page_088.jpg
1e29d57290c74cf374b4688c48679b12
9e8ca5f2500858b0070cae49aff501657272bf9f
5600 F20101111_AABEZQ schipper_j_Page_043thm.jpg
7fde48c42cb49261e930065416adc9a3
5e7250af462d296e9aae448008b1a35de0e16cbc
F20101111_AABDWO schipper_j_Page_087.tif
f2fb6e6f75d47fa2ee0726622a4d259f
c0c089068956bd6c6a0b14c76f76474308b93c66
F20101111_AABDVZ schipper_j_Page_036thm.jpg
2ea82e518035eb7dbb1e4e35e8b328a6
92f6bee365039b849c25cacb2af33eddd1977df8
80304 F20101111_AABECI schipper_j_Page_105.jpg
3528907673041f13122e9e42d53b5af3
3de2147abc665a73d9b415c3e388979e1978b130
56244 F20101111_AABDXC schipper_j_Page_057.pro
2993a7b8c89a31ee0d0543649d8e65d7
2765c928680bb0e222f73bb27339e45f520a2e39
99504 F20101111_AABEBU schipper_j_Page_089.jpg
7a14aa3b265748f2f7cb71af0cc113ed
e792a1e552a18edafa9797cd5b65887d13af0c45
F20101111_AABEZR schipper_j_Page_044thm.jpg
4a9b812fecd1fcd3682c2fff1f1227a4
d5f45417f2e095d478170485073b588c33a58abc
6640 F20101111_AABDWP schipper_j_Page_041thm.jpg
14cdaea9cd2f7bfe76a6d3b4469e0408
739fd3eb21c4684dce017eda8a06b85f1c7f0cdd
90360 F20101111_AABECJ schipper_j_Page_107.jpg
165dd7e49f17e11ebc3759cd4ba493a0
98065a0bab1027f62db719a1edf6ef21bd9d78ea
96102 F20101111_AABDXD schipper_j_Page_103.jpg
94beffb6103328e0bef1c8706c94650f
56670102147ab8d9e2f5a3072170cb6d918629c5
90005 F20101111_AABEBV schipper_j_Page_090.jpg
aaa9b3ee69a060ac58d9eefc1fda94af
3a75cb4fb15c009125352b2c37c7baf918d48c38
28107 F20101111_AABEZS schipper_j_Page_050.QC.jpg
f4ac3fa03959422bc580da407bc2dfec
2d311334a7703d557800f844e77b0f305e07bb45
91904 F20101111_AABDWQ schipper_j_Page_023.jpg
aa9961080106b04c5a157f444e440360
061acf1c655240924ade7ac88f0067e89c40e1f3
81915 F20101111_AABECK schipper_j_Page_108.jpg
5661d98dd3b810d61b75b6979debd38d
d5770bc710848040054791b2906da1b5c6d39f74
55144 F20101111_AABDXE schipper_j_Page_081.pro
6d45b1e1503a235c115ac1aead0130be
e379caa8b40c3c53e3b900dd056ba8ae651e5ca6
80957 F20101111_AABEBW schipper_j_Page_092.jpg
70ea49d78a689224479aa76d95a0acb4
caa1f88542a7c1418f1d3e2eb72aca964e412bc1
5720 F20101111_AABEZT schipper_j_Page_063thm.jpg
860e78203f85aa87d428caa6163c16df
5b6fb1d290be7a8f2321c18fa844fd98cb8d9c7d
56681 F20101111_AABDWR schipper_j_Page_047.pro
7c8e8f3897c9ea590b3e5ef4fab0444d
04f9421b391020273c769de63bacef3940a18910
28962 F20101111_AABEDA schipper_j_Page_002.jp2
2ac99c9c505cf3f22c05e5c9cfdf3316
0668f89ec55add3d629da9c579a188b97dad3e5e
98498 F20101111_AABECL schipper_j_Page_109.jpg
885fc029f525d30026ed601ce6c1d985
be57fe70094d52422114633b1d14110d9f78706f
4831 F20101111_AABDXF schipper_j_Page_026thm.jpg
e154667f928087bf1f51e9494ea44403
fd2b2c6dc1e6fe195ff0ccfaf0a7d111fd78cbb9
90235 F20101111_AABEBX schipper_j_Page_093.jpg
3fd469d467b17c7be198caf9a51ac1ee
32fcb8680e3a79b1ed81632883892d2dc8c280de
23509 F20101111_AABEZU schipper_j_Page_064.QC.jpg
980851449820e4136d35f5bdb7938e9b
cbf605b3692f2b2ed481b29c2b14a95eeb342e3c
F20101111_AABDWS schipper_j_Page_080.QC.jpg
b4a7901a480615a0aecf1c0cce1b2736
40401101931bf3b13b9e2b7b3e1b43b47deb91bc
53481 F20101111_AABEDB schipper_j_Page_003.jp2
dc91993e1f9914eec1dc63aab9ceab0d
69cce0586ca44c82e8c2fffc13fbedf92347c414
96978 F20101111_AABECM schipper_j_Page_110.jpg
609d196ce5c9ee13c3a60dc10e598437
054cd2717ce7fc44307ce6a77c7f8acc60f2d0bd
1051921 F20101111_AABDXG schipper_j_Page_021.jp2
82043221b1cc60462fbb71d8a1bdffbf
469cfb9a5aa0392ca927885c83afedce9050cd97
96048 F20101111_AABEBY schipper_j_Page_094.jpg
6d9da128c10eb95bc7c5c6c365bfa3a4
b2dacf847094a9f9c3921ecf8efec55878288266
6263 F20101111_AABEZV schipper_j_Page_064thm.jpg
9475a592081e68a869822af87cdc73c8
f74ba1783bd180796c2fd4ae77059d56f92bbbe1
10574 F20101111_AABDWT schipper_j_Page_004.QC.jpg
fc3e12b7f5248a63590de2d69f7d9fb3
6287e6c67325004f4a44dfd37824f97e1b276d47
423400 F20101111_AABEDC schipper_j_Page_004.jp2
3c77f8f61c5d97c7429c02209ecd601e
1e23618f2c3bcd7f0237f75c72ce8900c5a01e42
88140 F20101111_AABECN schipper_j_Page_111.jpg
aed87321a709295dcdc10b782955ced5
ac8ad401823861c2b55373e82ac06a587bb6366b
1051982 F20101111_AABDXH schipper_j_Page_025.jp2
0cadaea6350259697ea625b78aef1e49
da229d6d107dcc944cb3ad7272baeef3e24b64d0
96572 F20101111_AABEBZ schipper_j_Page_095.jpg
d32f5aa101df19277333e6fdb703c43a
6926bf7911e3222343fe4f8c45768faf8dec412a
28192 F20101111_AABEZW schipper_j_Page_068.QC.jpg
587a5d5e919bf2b4efe477f2bb66bcdf
cb9d8689ceffab6561e2f26f0d794549c54afa10
48978 F20101111_AABDWU schipper_j_Page_085.pro
18bb7a5be9ab55b94c68cfd41fab6d29
faaf0805e2624579ea1754cc0e9b9f9fdfb8b994
1051981 F20101111_AABEDD schipper_j_Page_005.jp2
559648c256288df488907dff86dc0577
cbc32447d806c50b62a1cdb0f66715b78d4cf54b
98823 F20101111_AABECO schipper_j_Page_113.jpg
39147ca84a2cdd533c3ef8953383221d
ab59873aadddef9b0b046004c9b7de9ed5d53ae6
F20101111_AABDXI schipper_j_Page_068.tif
0a9b1ea0613cea3b69a45bb6efb9877b
84a10257e3565db7bd893cafce547f88cbd4c22a
6525 F20101111_AABEZX schipper_j_Page_069thm.jpg
8fc755b4d9e54cbb6c55ddccdfef694d
98e349332108b82e69e7eecb226b14f8102077fc
F20101111_AABDWV schipper_j_Page_039.tif
29871c73d4fc1618c37c76acd4ef3029
bad3c0b4c20693c960ba3993c909b6788468d634
85230 F20101111_AABECP schipper_j_Page_114.jpg
f89ba8b0b268f76fd28bd6fd4ac79435
a02a11697471bebac0161d06c48a4ddad15c3151
29304 F20101111_AABDXJ schipper_j_Page_047.QC.jpg
b059df92b0c60d4f0fa1f6d607ac41fe
277f2aa86b53c8d1d46796ca633082d7edd27bd5
27753 F20101111_AABEZY schipper_j_Page_077.QC.jpg
040d9bca11cc38cab5fb376d99147c7b
5cdee9b9ef29ad9fe5c0d2fcec042bf93c57def3
1051986 F20101111_AABDWW schipper_j_Page_058.jp2
0dee6534f094678a6ad57a65b1e2cef4
280523067a7e4009081808c1a4df2b905e32c989
1051968 F20101111_AABEDE schipper_j_Page_006.jp2
8f5cbdc2bdd28545303e1e73bceeff5d
6fe46822787a0379a1a70cce69f79b799a80dd04
92803 F20101111_AABECQ schipper_j_Page_115.jpg
15f2870a9b7d91a69f20e20f58a5f8e4
eca70ac0f6c97b7496efca83997c5b3595cf1320
1965 F20101111_AABDXK schipper_j_Page_108.txt
8423363ed052983612910e39f7950391
193b65d47a15c04d46420c3d52d022afe857613d
29024 F20101111_AABEZZ schipper_j_Page_078.QC.jpg
1a936de128033e1d3c31fd7dda386a6d
fd99088fc858d1e6bca45955437541de36ed5c04
7329 F20101111_AABDWX schipper_j_Page_089thm.jpg
aeeb5ebe962b61ee3d0925e4c3781f33
ef6bc35131b90a9c135055e03d69f65404ca4fdc
F20101111_AABEDF schipper_j_Page_007.jp2
fc6db4e5b0cc09a2e8e7045566c133d8
f6c86336f5da24b73fda96f6e131aacb56cc11f2
91639 F20101111_AABECR schipper_j_Page_116.jpg
20c7b5f3a7461549e336dfae2c211967
48fcb8a2c058c6b23f71c9b22a427c49582cb17f
90243 F20101111_AABDXL schipper_j_Page_068.jpg
158509b81192af41df63769149da6e64
f1434f6b9f2e444b51f592da12c59ec8339a8ce1
800686 F20101111_AABEDG schipper_j_Page_008.jp2
94eba440e0dbec0fb17c6f56243f207e
694b7ee410266454ed1499514acb26d0ab28004d
6223 F20101111_AABDYA schipper_j_Page_099thm.jpg
333975e798e57b0d678e130bf4f56708
40c49ef99a0f1d6554a60fa7c7a414d884c5d4c4
125413 F20101111_AABECS schipper_j_Page_118.jpg
fb3119f96b3a70c1123466494fa4f283
4e1b1f35eb8bc0f418c2abeaf83b1ae8c873dd95
6620 F20101111_AABDXM schipper_j_Page_080thm.jpg
cb94690ff32f7f80fcfecf918b7528d7
bd7bf0ac51cdd7631bd5ce1bad423006dd1360f7
5450 F20101111_AABDWY schipper_j_Page_066thm.jpg
cf73f899230b8d8a100e160383790a35
c765677b1e5b7d5a56c19437b28a90c652464022
1020840 F20101111_AABEDH schipper_j_Page_011.jp2
9e9f8167107cb988ae72217ca036e9ef
f221cdc0f1c5b15b9daaddc340f5ea5ba752aec8
48507 F20101111_AABDYB schipper_j_Page_021.pro
7e625046f428a16ab54466102bc03e84
2c476aee48966af93565c81c4dda760c2979af57
105623 F20101111_AABECT schipper_j_Page_119.jpg
e064c2d26d9bf4de1e529118d5670732
b1f094d8291edf4cdd59fe3039e9ef3d4b5a3bb5
6564 F20101111_AABDXN schipper_j_Page_046thm.jpg
069fc1d91764c799f20fba98cbf7e28a
7c1870f0af24e26bf0c6c19172fc7a2fc4e2d42d
1053954 F20101111_AABDWZ schipper_j_Page_020.tif
e77fcea9c00df6f92f1c684c0805af41
3e882332ad0c3d62e89f71bc73de168bbb817419
658900 F20101111_AABEDI schipper_j_Page_012.jp2
287e5581654b87dd53a26a3d3000538f
6247c0e5bc2223ab688b610e26041efcb404042a
6928 F20101111_AABDYC schipper_j_Page_070thm.jpg
263df607592bc7e63bc232f3a44d7408
6d4237e14f54f66765b2155549144fb9b6b471f1
107725 F20101111_AABECU schipper_j_Page_120.jpg
674aee5fd4910632519bf35c7d7242d2
4346fe9c9564493d7886ea8ff5547b8fc4515114
F20101111_AABDXO schipper_j_Page_074.tif
ae0bfec47d735b93f9e108e0dd716165
0880ccb24c2b10afecee0fcb636514c44196a24f
1051956 F20101111_AABEDJ schipper_j_Page_013.jp2
e8ac9f535eca7345a1c52774233bd0ab
a53b3d7a98df2a407ff0945bfa70e3d610fae5af
1051940 F20101111_AABDYD schipper_j_Page_080.jp2
c3d0fa72f4f180aaf4643eec2bdf8a7b
6a3fc7da1de43c03c96ea6b5c61d8e722e7481b7
113081 F20101111_AABECV schipper_j_Page_121.jpg
284b92d5381011e1295bcfdd8a452e55
fddfdf70d2d08c2a22835dd7678a285378a0f778
6420 F20101111_AABDXP schipper_j_Page_111thm.jpg
44a0910323be1b19660d60b0c6c46514
aa0353b36d06866c77e95274406fa9792558b2b8
1051943 F20101111_AABEDK schipper_j_Page_014.jp2
40a35c3756a25c8998c0b249e34b4fb9
4df7981e9a865143aa569fa7fa6ab6d7dcd41b0b
55582 F20101111_AABDYE schipper_j_Page_094.pro
3dded9460d12aee3e7b629f80c92e93a
a27128ec6781c50c162424baed8e43b54a9cca02
118332 F20101111_AABECW schipper_j_Page_122.jpg
4f9c2190810ddc9e1913bd3c85cc27e8
b08c095b3b4d3c9363c07907cc5cf158ba467840
97754 F20101111_AABDXQ schipper_j_Page_106.jpg
2c3198ae5e41af9318ced9cf64a35be6
981689cbc3b061960e9ab586717e21a53efc72ba
1051965 F20101111_AABEDL schipper_j_Page_015.jp2
3c85464df114bfed377be1ddec131897
de0282a4e8e69a0f4fc61ebaa0fdb201566fbce4
74588 F20101111_AABDYF schipper_j_Page_066.jpg
8cdf7d59da9309db4d6b3a8a6762dc2d
9aa0f263b8b42ba9c56a1603144c1e22221ef417
69421 F20101111_AABECX schipper_j_Page_123.jpg
c78d5c65f17ad918ea82a5778edd30ca
5e90eb7a3fba7417ac798c12e928ee3a81c8fa24
2191 F20101111_AABDXR schipper_j_Page_086.txt
ac3bb38a43e8ba5df925478a9d4fd6c7
f3a8bd1dd79f3d9bca82111bd507e5cbb6aa370e
F20101111_AABEEA schipper_j_Page_034.jp2
cdda620ee9c297e2eb65e1da9eede818
cc86ac85ff5a499f3815ade6558614e8b7ef65df
1051975 F20101111_AABEDM schipper_j_Page_016.jp2
1b5a5508a732357bd4986b9b5fef4439
8a412ba732b59c7ff08fbd63b7885c27b05ba188
F20101111_AABDYG schipper_j_Page_045.tif
393b1f6225b01acc8e9d51ac16e3e5ec
2a969a27bf439d9b1a5a1136e2be9d952f7e8817
45060 F20101111_AABECY schipper_j_Page_124.jpg
8ac8829e908a44777a97e378b792137a
0e127dfb451f4fe770db3ff7ea18686f8c274044
1051963 F20101111_AABDXS schipper_j_Page_041.jp2
9f736e910401423323b836c6025240a2
9d47f14fba52aa2bc78d00727e74a65399aa1b46
70154 F20101111_AABEEB schipper_j_Page_035.jp2
96d4ad88a65577b25bc982db554069f8
9544f5ad8a420005dcd53fa22910c0130c9f2ba6
F20101111_AABEDN schipper_j_Page_018.jp2
eb5618d6058e12a05d08f7e76652f3b0
5f27040b0c3ca510a5a91a3d8f68f8da1b30ff07
25009 F20101111_AABDYH schipper_j_Page_087.QC.jpg
b7a3aeace5747d0c0afcbdf20a767648
bcc9c2a2b9ba4948be8f9c4e6b04417a9d37e27d
249954 F20101111_AABECZ schipper_j_Page_001.jp2
1195df5b81898d147063364c597d6ab5
adfd8127882f59c36b75465694a00c0b3a8b639d
F20101111_AABDXT schipper_j_Page_009.jp2
f197c4244b66e0f4b919cf6e352370e6
ac0b44cbe2e8335ec7b3a7ec216bcb2ad24d20f3
85165 F20101111_AABEEC schipper_j_Page_036.jp2
f666b09f955b3552e9c4d82d68d69f93
5a152de8ea324d05c1c44b5643c547a0429d945b
1051959 F20101111_AABEDO schipper_j_Page_019.jp2
0eb47d54276d640edb5d50cbe702418d
2f5d9a5f8530f515b8dea0ae80047a0290337726
27537 F20101111_AABDYI schipper_j_Page_111.QC.jpg
0c3cb1b0a93a7a17b0fcae067295009a
f689a5955c63acca89b8884bcbd27a318c142c27
F20101111_AABDXU schipper_j_Page_057.tif
81dad5f4c77ccded52c22c5bd44f6f87
af0b6e9fa4f6874960c6fffbcddf948837b113da
F20101111_AABEED schipper_j_Page_037.jp2
1bec4f4c2a5a7ffdbb80dc07491866ea
f740e9a367a4b97176269690a2acebbcf270822a
61054 F20101111_AABEDP schipper_j_Page_020.jp2
8af6d98e2fbf68c7033354a720daf327
3d9c5f232e85a07afbc8ab5fbd9db53c31bd45d0
93879 F20101111_AABDYJ schipper_j_Page_091.jpg
099263cd2f217912515b458bbe8e75dd
3f74833cc5298b5567da479e04015535c4d8526d
2224 F20101111_AABDXV schipper_j_Page_063.txt
96dbfd0f8908ddaff6876b27a44b4b88
5ec8b5173a236db41a6cce2d8e28eecf675fc563
F20101111_AABEEE schipper_j_Page_038.jp2
b124bfb514ea71fed05fcad0a9de21f3
0a399df387c34ff0466b9e965dc2f9a1368ca62e
1051934 F20101111_AABEDQ schipper_j_Page_022.jp2
37898b1a649b9dbb874418edf1f5b96f
cf0ca425258408237e9c5bbfae0db4bc74ab84a8
26801 F20101111_AABDYK schipper_j_Page_024.QC.jpg
c8fa36cedd6074e6adbf04e2b6672ce6
f724e9ed9e22544419937693fcabbb438bcb0681
28251 F20101111_AABDXW schipper_j_Page_086.QC.jpg
596aaf72462409da07398958805edf7e
769b05c771d9591c9c5f96a977c729afd64319dc
1051891 F20101111_AABEDR schipper_j_Page_023.jp2
9fde8cfb5fed834f36619c40c9e4eceb
ef9f6889a1536a833fa21b2a9490d4a1adbffb2b
86215 F20101111_AABDYL schipper_j_Page_027.jpg
b2175130f9580df16cb5d3345f25e193
a9ac7a6fb23e66426994564a998608beb7d968ee
22635 F20101111_AABDXX schipper_j_Page_096.QC.jpg
f39b9cb979359ad27ed328bf74348ae9
8426bfc3cbf641ece9974e00dc52b0b6c300f528
1031014 F20101111_AABEEF schipper_j_Page_039.jp2
bb75a1e17af1715a1b68db6d5973ec67
4f8d321ac600508c0f4636170ea6d01243286692
34345 F20101111_AABDZA schipper_j_Page_004.jpg
c5dcdf46ac233cadfe086171563cc800
1718860b87c966e59efa44e5c88b1c2e77761448
F20101111_AABEDS schipper_j_Page_024.jp2
b7171847331c70274c1e391311eb5db1
a18179572f39b0b420e982907e141d7662ecbe6c
6799 F20101111_AABDYM schipper_j_Page_088thm.jpg
3d0a9731efecdf320354589a8b2584b8
14f99f91eaff546c378f0744e758dbd9a0fad92f
47983 F20101111_AABDXY schipper_j_Page_061.pro
60cf98dea6e9ad4d580e1b6ddf985581
ba3861b8f184deaf9b391bdca42aabe83e499b41
1051964 F20101111_AABEEG schipper_j_Page_040.jp2
33ca19865325737d0b893806d58387a6
6c2c58660591c35793e63ae65c856d44ae66b50e
90978 F20101111_AABDZB schipper_j_Page_005.jpg
493504b072f269faf8f8c1ce65dd8fbb
684f988ba7dc7533bd5b3dff6068bdb7904b79c8
917975 F20101111_AABEDT schipper_j_Page_026.jp2
0dfce7bff8fc606cafd1173b81332fb7
5766462d5a4c697ebe112772a67802b075a23bb1
F20101111_AABDYN schipper_j_Page_110.tif
7389413b89c8096593564f91e88b1ef7
025cba721bd070ecc3aae6f03a6e9fa86fc73554
1051984 F20101111_AABEEH schipper_j_Page_042.jp2
8091b9fc0ebe3b38b2ba421dda04df35
98849fd4703b85b9d9fcbbbf71f27fb3b6ab2853
95170 F20101111_AABDZC schipper_j_Page_006.jpg
1053f08f6894cb032f72070cbcc67f78
7b70f4800bb4099b46083df283e6471e5037ec6f
1051876 F20101111_AABEDU schipper_j_Page_027.jp2
26ff0cd6737604a7c0908d7c47c72fbc
ca7b6217852d7cb8018b3bda419e689967ed80e9
6833 F20101111_AABDYO schipper_j_Page_073thm.jpg
2b7230e44635efa601514c4dc88f3c21
2a3dca603e8d96f0b6a56fe5f644a6209fa01064
23930 F20101111_AABDXZ schipper_j_Page_039.QC.jpg
0f88f7bebd61667ed044e11bfe0dc05c
5a53122ccaa32ef29cb28cb9eedf70a4c3fd93f3
824457 F20101111_AABEEI schipper_j_Page_043.jp2
e81ffbf148e887774a2b1295759d029f
7dd6d055c9a67998dbc666658e60b3dca05104a6
84243 F20101111_AABDZD schipper_j_Page_007.jpg
c61ebd41b1cb6e8fdb21abf3b6a74c21
5c9ae5bc33619f029a9ee26f810b222df36d9bb7
55285 F20101111_AABEDV schipper_j_Page_028.jp2
1ea908d0e4205bd240a8debb4a8088f0
e47bcdda32f6fde40aade120aed958788efcb924
28579 F20101111_AABDYP schipper_j_Page_073.QC.jpg
5936c7ba3f3fb01faa1598565597920d
d8d0c20616a5f32fe47d1c575631e9e1be758714
986519 F20101111_AABEEJ schipper_j_Page_044.jp2
359ba789a2479180d446272991c1c9bf
b64b6bd8f755e8ed324f7265c6b8ff82bb97ca1c
34188 F20101111_AABDZE schipper_j_Page_008.jpg
71aba4ad4e398e46e80ff1ddad592c52
2549bcd9f7f33a2ff5368d90abc1fb263b67ec45
1021692 F20101111_AABEDW schipper_j_Page_029.jp2
5f3a9909b42c704143151f8c142fa454
10e2b3ff04b0e06f65fd0c3c67ed9545b1b95676
1629 F20101111_AABDYQ schipper_j_Page_035.txt
c224228603cf73112fd442ff0a063d48
276a2babd0d0288b00751355b5868dfebc6d6079
1051966 F20101111_AABEEK schipper_j_Page_045.jp2
3fc8b66b9386ad6716707328e022e786
749e911618919348ecfbb8c1de03bf48e04c55a6
59464 F20101111_AABDZF schipper_j_Page_009.jpg
f94e163148f8e8294741d5ed8ef41723
8ec2c78bf251a2adb9ef191d61f00a64234b59f7
911900 F20101111_AABEDX schipper_j_Page_030.jp2
97ee16bc3850c5bbb14801717cdd795b
b1f9c1461477924df930d469e4c53e32b98d78e0
50271 F20101111_AABDYR schipper_j_Page_117.jpg
07cdc9d18c82dabcb90a47b6cc4f9987
59947652049c010f74cbfa50f2aa5f4994ae2982
1007433 F20101111_AABEFA schipper_j_Page_063.jp2
41a368dcf81684c130ce64b79c1c1c24
0752470c85200225883dec7fa67de4ee93498684
F20101111_AABEEL schipper_j_Page_046.jp2
3dca9654abadcb42170a12bb48b7816c
72e00932037b9cd1a70326335c168fb6ab05f730
39719 F20101111_AABDZG schipper_j_Page_010.jpg
be24edb0dc17bfdf77594288b9981df1
8d4bc82cdf52f8aff45ce3d773aac35183ea196a
F20101111_AABEDY schipper_j_Page_032.jp2
565147bb90efa3815adf66edc1b7a560
a850558dd418188d925d0992f7ced64468c9380a
49780 F20101111_AABDYS schipper_j_Page_028.jpg
beb2f1859b998190e718d390d8bf27be
227b90a34f3a0e66f4f87ed049fe08937ca683be
1002660 F20101111_AABEFB schipper_j_Page_064.jp2
80aa151a5d2b8569fffa22adf6ea7f8d
747ba8c0cbee3091a37122e5ad96968fd202f570
1051932 F20101111_AABEEM schipper_j_Page_047.jp2
bcdcb2e480a04cbacb963e7cd4241937
1d9b71ac8aa8d3b2987bcdee4e983a142eb83dd4
77167 F20101111_AABDZH schipper_j_Page_011.jpg
e230765f71dd8077b0a5bc4f2efbf905
37ea009b0f8efa3e3ad02793333d38eeb19991ab
1051927 F20101111_AABEDZ schipper_j_Page_033.jp2
d262460137ea5c18a4562ef6e662e959
89040a59eecb7812881c73739c3b81ae684a6965
88538 F20101111_AABDYT schipper_j_Page_050.jpg
06cbb8c3d5b8cd8b23bf56d3c7b8e30c
2a85483839a8f89fe602c07f94b64eb889bd061e
1015257 F20101111_AABEFC schipper_j_Page_065.jp2
0369c923a41683cf0c4add541fc03cda
da5f1a0da975d53b48e1136859448ebb28b22849
1051942 F20101111_AABEEN schipper_j_Page_049.jp2
baee928343c51a9597993edaad033cb4
7822872053678274ad77cc0903abb02255b3795c
49623 F20101111_AABDZI schipper_j_Page_012.jpg
45560b46ce6de320d3c8c3209e27e80d
afda83e016684c50ea92e69ccf450a7df138ddca
145093 F20101111_AABDYU UFE0021958_00001.mets
1a862f28d53c26cfe0314a9b4025baf3
5fcdc9903bbfc7662f318fbeddc09848dd898518
978629 F20101111_AABEFD schipper_j_Page_066.jp2
8dd5b043f7aa90aaba487c927fb9879d
efbc56e0719945ed9b1e41826b9ef6a354247d26
F20101111_AABEEO schipper_j_Page_050.jp2
7f9797ebc7ed3eaf3afa0eabadc110d5
aa9a79d1e4537eb22faf33e268cc08713f466561
88591 F20101111_AABDZJ schipper_j_Page_014.jpg
a9d0746a03b233719f492e575748b90c
278fa7ff4a55697cfac3dbd7a6ba2635fcc5b5e0
1051933 F20101111_AABEFE schipper_j_Page_068.jp2
8e0a1fdfc7bfd8829e5e7bc1e80cd745
a2822597825fd055b6cbd87d13b256fe5e03f319
F20101111_AABEEP schipper_j_Page_051.jp2
3da6c0fcf8ba5c8533c7e5c99d06797d
abc4a0bbb1813a0ca3e181c8ce235684342e1cd6
86595 F20101111_AABDZK schipper_j_Page_015.jpg
d1b9fadbbc32447a79f821a89da3f62f
1d9ec14bd52de8b5d3fbce205e45e8d971f4d8d1
F20101111_AABEFF schipper_j_Page_069.jp2
7ea11bf426633613fbb57a7da109fcfb
1d3bdd5e94277e0e185cc9fd3540bbac95531695
1051958 F20101111_AABEEQ schipper_j_Page_052.jp2
83b1eca59cf08b7d4986e36e236c002a
fe706654b099de52892633c6688877c23d2e5062
91162 F20101111_AABDZL schipper_j_Page_016.jpg
dbc65e47971f9f90e47982a94c307c03
fe4263270b25c96a6058dc66442cffda33a8bc14
22444 F20101111_AABDYX schipper_j_Page_001.jpg
dc650a47c71bef6a23a1c468d9bad3a9
a0393db05a5896f1035f5fd6eb762f698b689262
1051952 F20101111_AABEER schipper_j_Page_053.jp2
ec0c08fb6fbee221c6641ad723756471
e027f83865d6d70bfa81ee7b7726973f24bdadff
90151 F20101111_AABDZM schipper_j_Page_017.jpg
3acf88aa80a9461ab49d5c3a4f23692c
0a10a59704698917c930ba08489fe851f09427ac
3906 F20101111_AABDYY schipper_j_Page_002.jpg
e3cb26358c497a5c31fcb63fa34a198e
56b2d0dd06403f4310429ca939f32893729cb81d
1051955 F20101111_AABEFG schipper_j_Page_070.jp2
17472028e5c098bec7a663fbca8d7b41
7f37978804240b17e1b37eb650cd95053b688978
F20101111_AABEES schipper_j_Page_054.jp2
bb15e8863a75f65d9b802b9b37b8bda5
97a3ed6dc47f232e9cf69123eb8502bc4a7f561f
84929 F20101111_AABDZN schipper_j_Page_018.jpg
14af79d88bb56c66ecfe8c81d568e851
2b8a8c3845a285b75074ea2d9cc23bf2095774e2
6152 F20101111_AABDYZ schipper_j_Page_003.jpg
f24fc158f2f76643c67736279433d506
47a9a1bfbb416427403a8a297cc952f160541d49
1051969 F20101111_AABEFH schipper_j_Page_071.jp2
13645b641e6af8e3f14eb9688af62858
2464600ea65cabe739d93ba93c7d0583a71d3a27
13321ae5759daaf8fc654b4a1a7f62714456c259
2165 F20101111_AABEQL schipper_j_Page_042.txt
394c14e46acac34e4564dd09f5fb3622
3ff206bd7bbf76e5e9fe2e8187617fd0382bd0d0
2080 F20101111_AABEPX schipper_j_Page_027.txt
7a0617fa43a654b6fb1e7e4665624f6a
803a06757a0aa7b0eb43f12b42b0f9aefbca071d
1477 F20101111_AABERA schipper_j_Page_058.txt
480c8bdda2af3bf06b049ce9c73a761b
aae4a50202e99da3096f80453e20aefbb062fdf6
2184 F20101111_AABEQM schipper_j_Page_043.txt
fdeed31634995215130ab9b9f7710aac
99f4d2f4f2c1f48a50256931183b8a1962679cd3
816 F20101111_AABEPY schipper_j_Page_028.txt
d589c39a0dc6ea5962d2073acf9b3500
7a130d87f21d1a5ea37556848c408f43765a7030
952 F20101111_AABERB schipper_j_Page_059.txt
aa9611057904d21e573ed348b94937fc
e2a6db01bf97b3bad6f4bc9613c48c7583a1772f
2029 F20101111_AABEQN schipper_j_Page_044.txt
82dd8715b54bd9cbf577ae9f76117587
77d74dadd6ffa1448a22263e563663c83a8438f1
2033 F20101111_AABEPZ schipper_j_Page_029.txt
ce667a01719db3d4dcd09fe557bf7758
fea3ac28f1cd500a9ef8051f9b0e14d29fd871c2
1988 F20101111_AABERC schipper_j_Page_060.txt
576e4e66062f02f3818045cc4afea6b4
49c7425d245f31a2f76ac57c2d4dc9e19a18418d
2201 F20101111_AABEQO schipper_j_Page_046.txt
9e6fce5cb229e27dbcc9768e2068f762
afba95119779e6ba179987c1e0d2d95667784e64
F20101111_AABERD schipper_j_Page_061.txt
f88a82908990415237be443a7e220ce0
97e60392c0280aeb8737fc60fb762b358c12b379
2226 F20101111_AABEQP schipper_j_Page_047.txt
117ebb4c7af6f5f5a833bc5b875ce6cd
e0ccad88c2acbf8462d354a03c9a88408629e813
F20101111_AABERE schipper_j_Page_062.txt
e7a962a9633965c6d5f4cbaf8bdc026d
0f18bdf73a25dcb751d0e147bbf428f6c4d8d100
1618 F20101111_AABEQQ schipper_j_Page_048.txt
ee0f6ce24eda433c7aa50c9ce0871ff3
e7eda6a4e4627cf99f60b311c9421d84be0de5f7
2652 F20101111_AABERF schipper_j_Page_064.txt
a9251faf4f16b32017ddbafd004b6766
b967b208860346168f481623d80fd989a7c09130
1986 F20101111_AABERG schipper_j_Page_065.txt
1dab11ff1da803ea8a415b1ad902f9d9
9f24dcd4bef38b232fb2dc018374e1091879fb07
2068 F20101111_AABEQR schipper_j_Page_049.txt
557ce06d96b9bf9203a7fef23fcd5118
6c59ba7e740887d79f5fa24b37ad4fe298371f9b
1973 F20101111_AABERH schipper_j_Page_066.txt
0122f50a74b40e3d012761d0ef428b6c
e7a6dad6ee74af37edb3a8b958fd9117b082e611
2131 F20101111_AABEQS schipper_j_Page_050.txt
a5d021b861f0bf0fe2ec729365e9aef6
ea14a222ec2af2aa04e655e3fb273d4e15fc4263
2098 F20101111_AABERI schipper_j_Page_067.txt
5da97debce20bc1758371c34a3a7856b
b4190ec385428e4fbf4a7f27a14a4f0b18f7ceb1
F20101111_AABEQT schipper_j_Page_051.txt
db35777c9041768858a167eb72b201dc
8e5e2f0890108cb93f228a6868e9e485076439bf
F20101111_AABERJ schipper_j_Page_068.txt
7ecfefcfe1118f87c636fd9c592fbf6a
9dd9fcbc5096a97ae8f6e9e0b4f3db478d78dcc5
1690 F20101111_AABEQU schipper_j_Page_052.txt
357fca26d63f3cdbb576663c81020d77
60042beecc939e74a71ba5faf67139f7de871b82
2140 F20101111_AABERK schipper_j_Page_069.txt
e0f401f6f1ee3ef0cd6beaf41b47e2af
edef8437ef809bb2c177fa4d06d1b4a6ddc3abf7
2007 F20101111_AABEQV schipper_j_Page_053.txt
29526b45ef7056300b4156d35b9e126f
c6afb78749c8fb1581ebe403ef9525ed0c9ca210
2197 F20101111_AABERL schipper_j_Page_070.txt
3d91de5ad21c2369c77742952c1eff2e
8ea4a6f8ad63ae64fbc4a067bad01789ba6311e9
2088 F20101111_AABEQW schipper_j_Page_054.txt
08df0a46bd92718e45aa382850f0c434
130ee9807af48ed1f1d8d4390486dff5eb2dd14d
2152 F20101111_AABESA schipper_j_Page_089.txt
9c21dfd8e9ca2912c4f1847ff4871ce3
0bd76fa3639ccf4e726def708a56cd153045f090
2115 F20101111_AABERM schipper_j_Page_071.txt
1f2c8f4ab73eae283349363a924a621b
d4e19127b9f1955af160f35080632ce3c08983a3
1727 F20101111_AABEQX schipper_j_Page_055.txt
130a5cbc0b87b656f87871cfa7e8ef6f
dc51abf3425aaf2d1c53f969bd2e05fe399330ba
2151 F20101111_AABESB schipper_j_Page_090.txt
81849f142d90eff9fbc145650044fb6c
a9a79bb08f17cddce747ab230c949248df6de7c4
F20101111_AABERN schipper_j_Page_073.txt
30042539e0536a3ea0aea278265aa091
738f0e138e1790ecd5d5ef2e847a9a0865b4d36f
2118 F20101111_AABEQY schipper_j_Page_056.txt
2c2327bbc61107b58cbaa65e87987fdb
0f91af5fdeb7340173536e35e8913105ca05ff4e
2220 F20101111_AABESC schipper_j_Page_091.txt
b6e438f9c005406c8daefba4fd4f9bad
0d826f7172fbafd4a0bd75351e952fbbd7d627ef
2212 F20101111_AABERO schipper_j_Page_074.txt
11fff73cb731b79faaa754d16f21e01d
28750087cd4b07b6fb44beb0251928dee2264421
2200 F20101111_AABEQZ schipper_j_Page_057.txt
beca81a76231b382bf6de3008318acad
ced31ce56e6bacc3b6e1ddf8aa7cca6e406b0b3a
2193 F20101111_AABESD schipper_j_Page_092.txt
afe340140a590d663ca35b28a33043c1
4d7e2b27f25fb5b8a5ab6850f2222b7d07430f96
2222 F20101111_AABERP schipper_j_Page_075.txt
f89131f120cde1659e0593cc9ee52a4a
2d348830f81af75f72b0a8042c63f453360fa151
1596 F20101111_AABESE schipper_j_Page_096.txt
5ee3812f6a09870b22be3723eeb53538
e53f1972e78e960cc47b71efc047c246e5931467
1936 F20101111_AABERQ schipper_j_Page_076.txt
3983498b38f069a7d3a1f51bc4d12074
88fa2622455f1b21a884bdc2119341526c58e547
1869 F20101111_AABESF schipper_j_Page_097.txt
ed66f9bb1b941d9e25584e1843eff5a8
73a4bb7b07ab9ca68c984f1dc81ba4e02efe44bd
F20101111_AABERR schipper_j_Page_077.txt
b17baba40c16fdf22ff62df383802ec8
cdd3f8d4156a92d20bfd526d0b0ce461ee4755b2
2277 F20101111_AABESG schipper_j_Page_098.txt
22811df61ebf9d026d90e43545648196
82a2185eb591c5bdc53e208f86e73536c45e81b1
2150 F20101111_AABESH schipper_j_Page_099.txt
e6f0c006f30bfa113d83ac73fe53fcd5
0cf17f1d0bd5011e0c83f66f096e04ec16b3c76f
F20101111_AABERS schipper_j_Page_078.txt
30044b98c3a5dec3897b08a08ff44b3d
b99be6529ab85fa1119293263190f5b96f71ec4c
2172 F20101111_AABESI schipper_j_Page_101.txt
5148a59512dfca43359130fc579009e0
1f5327ecbbf488b6b35e8049e43696646b714b38
2057 F20101111_AABERT schipper_j_Page_079.txt
c99f2f0d9242a839b8a6f9c81877a4c5
a33662b04dc5508ac0d06ee120ed5f55d19e9734
2293 F20101111_AABESJ schipper_j_Page_102.txt
cf2442906dfaf467c0af3a1d2aad4380
368c3278f70f25ec788d1d603d681aaa59231e27
2130 F20101111_AABERU schipper_j_Page_082.txt
5205f5641b74ed114ebbe315ed1ae9e8
f0258af634f43edc260f07a2e738b69201d048a8
2312 F20101111_AABESK schipper_j_Page_103.txt
18c5c18baf981df415f9254cd415c9a7
afdac2af7725d4951c2eae28d3871604fffbef06
2256 F20101111_AABERV schipper_j_Page_083.txt
87be9ffa52c0f884ab879d717569119b
00e647097f551c59e564d53f9938be1c7d77b6fe
2409 F20101111_AABESL schipper_j_Page_104.txt
a2147b8510412cb8929e4aea9112987b
b3158e6276a0ece2d7aed8b1201cda02eb5cd746
F20101111_AABERW schipper_j_Page_084.txt
8346213594c1c43ad5677292b7c7f452
7231c9f9b7868b2d75a73c9819ff514a9bc5325a
2082 F20101111_AABESM schipper_j_Page_105.txt
4910729c71a7a2d5c802125e4a163df3
e2d6424da8978fec509c09ccbea277115ca8fe04
2021 F20101111_AABERX schipper_j_Page_085.txt
7a3b2eb54707d341c1a4eab44d1916e7
8263f0d11ef1737e7b7d04551446da287f10feb2
2741 F20101111_AABETA schipper_j_Page_121.txt
880b050f17e6db4808ed0d48c14033d8
fc1eb1ac02bc5218552043426cf09a2f621f155e
2659 F20101111_AABESN schipper_j_Page_106.txt
8866e845fb7155619aa5c4e65d64bcb0
b1f4faad47b07e2d8d7347ffe4ffa70e68a96f8f
2305 F20101111_AABERY schipper_j_Page_087.txt
de603e499e923b8d1da62cb14aa5e886
126a1c73ea9580d167c8742b92b17f68457b64f7
2882 F20101111_AABETB schipper_j_Page_122.txt
5f0b7680834a4ce438f4ca3babd971d3
ee4ddfdfa1eae2a97d7b0f51cf882f7d1505f469
2176 F20101111_AABESO schipper_j_Page_107.txt
39e5856244ccec82907fd82726c7fd25
ee56f31450254c1f8d799ac9da805b8920317eba
2180 F20101111_AABERZ schipper_j_Page_088.txt
e6d96591620a61e014f8e989cec90b43
2a4d70fe315c8579e40a94e637ec1520f66d1211
1595 F20101111_AABETC schipper_j_Page_123.txt
ec9160a59f0b6ecd4e041d0921ddeb13
ffc9691ba9267dc66d7332ac1a78d529e1db1621
2826 F20101111_AABESP schipper_j_Page_109.txt
b86d9f5917111c6df7a01b25d41ce9da
354aedbdc2d51230328c9f460f9dc45958f24e8b
1716 F20101111_AABETD schipper_j_Page_001thm.jpg
d82b152c629bec63b17a4c5f3d37db01
34524551a58b1ec0966560d84dbd910f392e6b1e
2780 F20101111_AABESQ schipper_j_Page_110.txt
574f39ebd8d8cc646ee8633432219ef0
6cd5e1ab7eec56b9376c3ec69784eeb4f0758af1
494998 F20101111_AABETE schipper_j.pdf
0dcdcbaf2b9f15ddce9348ed17067a19
b4dcd83a3f2e49b29d71b4b23fe0c660e625b67a
2094 F20101111_AABESR schipper_j_Page_111.txt
8feabfa26761223095fe042d011fe8ab
6242207acfa7677e7c41140f7bed955eadb555f1
29069 F20101111_AABETF schipper_j_Page_070.QC.jpg
1bc57517dab5b68e3a96875016f888f4
8f27d49743afc319808e24837cb7e3664701d5cd
1966 F20101111_AABESS schipper_j_Page_112.txt
7bd492305c99878feaa5acbe3e6ca1d5
9725d18ca0ed50b203be66673225b5eb9e908827
F20101111_AABETG schipper_j_Page_074thm.jpg
c6756f4404298eec010d1cb5025b9eca
b949de240cb9ff6e60978be0e314cff23d7a0ee6
6790 F20101111_AABETH schipper_j_Page_022thm.jpg
18db385ad1e50857b05c5f2534361921
8c179f2c96ce70a28c566d0b5bd7e6e916ccdd2f
2879 F20101111_AABEST schipper_j_Page_113.txt
259543f219011a8612f413fa516d1c24
bc39a57c988d9ac62bac4de698d879236379b178
27775 F20101111_AABETI schipper_j_Page_104.QC.jpg
c88bb540f368735398b5f89f97f43581
bd0352f78d17df243de1f60594246eabb6802800
2034 F20101111_AABESU schipper_j_Page_114.txt
fe216b1ec5a64f35b51e9d20d8a24942
da84a62ba15b2209e66626220b9110c79e4fd790
6030 F20101111_AABETJ schipper_j_Page_048thm.jpg
71455afe503345af44af485791f40145
3e861202d317b6442c360e0e6082aab2581cd98a
2250 F20101111_AABESV schipper_j_Page_115.txt
1ee861d239043d129e3f63a43d14010c
04bef735604e7b0a230e44317015c6f3785c909d
27746 F20101111_AABETK schipper_j_Page_081.QC.jpg
f3058d029e347349ff8dc2dd37115752
97ff08947f89af222c4de7d716a00f29f11187f1
F20101111_AABESW schipper_j_Page_116.txt
dbba639c17d9a06ecd161d2323bb2e5e
4fc8bd6b93004e9ebde765ce9336c6244f92a993
6411 F20101111_AABETL schipper_j_Page_019thm.jpg
9e68bb73817e74ff452830d527612440
2faca94b86ddf06995049fd5fc43e91c21cec42a
1211 F20101111_AABESX schipper_j_Page_117.txt
0823e861269b147ebd06c9948e3083e6
f324ecb643b3e7b17744e368bc559e7a6b8689ce
30910 F20101111_AABEUA schipper_j_Page_121.QC.jpg
3b203b7e8a8639606bf8da1ef44a41d0
81e0388f21cbfb1035b6fdb1ef7d4924c06b1b2c
16153 F20101111_AABETM schipper_j_Page_028.QC.jpg
44cd390e43993f4486d6380d675e64af
59e8741772587075eb084f4a853d493d87a15ac8
2602 F20101111_AABESY schipper_j_Page_119.txt
a8bf61fa1b246c2d9ccd0c2003d41121
4e71c07e3a6bc569143ac8d17ec1d33762636739
5189 F20101111_AABEUB schipper_j_Page_062thm.jpg
7e9cdebd91eb850fe8e8426d96e8923a
335009de7d6c2e07ac587e8dff3d2a16be25f661
23764 F20101111_AABETN schipper_j_Page_063.QC.jpg
25179210994ae3096e238a3fd86e776c
af59b731a064bc0aaa0d3a80866d95c7fc57b716
F20101111_AABESZ schipper_j_Page_120.txt
a613df9c55663aa0593dec053ba33516
d398db3e721374c398479ab8437cec7f5b0660ff
28631 F20101111_AABEUC schipper_j_Page_116.QC.jpg
758a73fb05e67d9af6ae5dba7fad5298
7f016b6b6c92747c9342b4109879785e7d2443cc
24590 F20101111_AABETO schipper_j_Page_044.QC.jpg
409f34c6dde61f0c5e8d72d1f8e73fef
ed846f93dbc6f4e5a30c2ba427e0cfbfd8257dbd
7231 F20101111_AABEUD schipper_j_Page_120thm.jpg
e06c678dd5908cb0e6f353f0692a7181
585ded5400e81dcf878589c434da8d8afbdb28e3
6002 F20101111_AABETP schipper_j_Page_067thm.jpg
7113fc57dd1250024f2af113ecde7758
e7d65e2b3a545ce46c981676462349e7678ed64e
6463 F20101111_AABEUE schipper_j_Page_098thm.jpg
204f33739ba166e3cdce087623f94c26
f72f2b0765baac80f01f40902ef6a35583dcad92
27602 F20101111_AABETQ schipper_j_Page_101.QC.jpg
2922c8331799e7fec2fb538238efaea8
22b2d22e0a555dede7639ca57dfa69491ab2fd98
25648 F20101111_AABEUF schipper_j_Page_052.QC.jpg
68f744e042409a82734e92a45542c51e
0bf9776476404569af68db6ab715c0d0279f7b3b
6814 F20101111_AABETR schipper_j_Page_115thm.jpg
2f5a19062d9a590e6542abf66ed8be7b
a84bb38a32b836eda0578e3ff9cb998d41c7d2ef
31114 F20101111_AABFAA schipper_j_Page_089.QC.jpg
9f8476c10f25e0284b26a71a682b58da
79617679bfd6198cc8d3f83caae12390c43906c6
28324 F20101111_AABEUG schipper_j_Page_109.QC.jpg
e02693bf4a150e8e6bc8143c8c90a2a1
75a50082a2c857bc61cd599bf85c4eb37639f707
3233 F20101111_AABETS schipper_j_Page_059thm.jpg
45c5c0baf2f3d145b367cdae15449a88
fe284f36beec80f9f2238633ff0b20843ba55d26
F20101111_AABFAB schipper_j_Page_090thm.jpg
d3d5fcd80d12a15db9a136945e733ad9
87c443780233307f88abc858876a1f9a5b01cb81
6576 F20101111_AABEUH schipper_j_Page_119thm.jpg
b7d0236acf263f84d5eee608c00b8cc7
35d9f54a43272a09aa19dc3e474e5caec2ed2b27
28199 F20101111_AABETT schipper_j_Page_016.QC.jpg
7b4d5396ea065773dd77cd4de18b6649
b08bc74796925c67c375859b7c0471850fa6aa6f
5879 F20101111_AABFAC schipper_j_Page_097thm.jpg
9a9aa5747bb76812b5b3c45b694ca6ed
a9986bce487d18e82eea015486035849d8671be1
24637 F20101111_AABEUI schipper_j_Page_085.QC.jpg
d9dd8a3d390bf7f2c641d40508e75140
37c315e1f9f4a8d35ee6f2c949c0c3d44dd54268
6455 F20101111_AABFAD schipper_j_Page_100thm.jpg
e15575b3b17422171865edc4f3c498be
66ee8634ca4ed2e74ddf59a1003d9c20b0713fa0
5543 F20101111_AABEUJ schipper_j_Page_030thm.jpg
abdb8ecdd0b4ef7a3438a8c3b4fb71f7
9d2690a62751159b4a56b956342a6309dbd0e07d
27466 F20101111_AABETU schipper_j_Page_084.QC.jpg
354dc105e6fc19299404baee09428994
0a6ec4a85c5c791fabc2b2e99cd84cb8030fb68a
28096 F20101111_AABFAE schipper_j_Page_102.QC.jpg
140f1c16ebecff66b2320a5a1440bd13
1c5cde10e75ce68aeccd97635b5f12974e226384
7355 F20101111_AABEUK schipper_j_Page_122thm.jpg
e6efeef1df8381cb514c8466bf08e332
24c860110c5eb639cce4b5cfe6568c8860bcb599
6901 F20101111_AABETV schipper_j_Page_038thm.jpg
9d8980c6fbedaca273431bac1d0cc88d
d5597fbb6f430e64f2a1a9aabdd330d3d3c11b24
6659 F20101111_AABFAF schipper_j_Page_102thm.jpg
5ddfccf5f5a48516c4efddc61938f9ed
2b022dfb8615be549a8002857e942dbd2aff3a6f
24846 F20101111_AABEUL schipper_j_Page_092.QC.jpg
d1f95d6d49bd63dd10b10285de11f462
2cfe2685531553aa1e21c645d6ef51e06c2aad8e
5605 F20101111_AABETW schipper_j_Page_076thm.jpg
37045fce0bfc4979bcf63da66d654c59
2277d7e80b1b9e490d15bf6cb87d436f864d7e4e
28106 F20101111_AABFAG schipper_j_Page_107.QC.jpg
d54e0b663c24797eb4b202b8386ffe58
06dd000661baaa6c327f7b3e3b6fb4c09e9d5245
6542 F20101111_AABEVA schipper_j_Page_071thm.jpg
1dd42914460c465b4317ddb52f60e478
cf85619fd234c7f3ca4747077dcb64073b2ca334
15487 F20101111_AABEUM schipper_j_Page_117.QC.jpg
2f869174fa2733b6fa1f67056cf94808
21c639865ed6d2cdaf2050f1af82b2bc042fb308
26558 F20101111_AABETX schipper_j_Page_079.QC.jpg
c26520d750bed57bcca36727453e642b
bb6c47b05c332675a01ae4ee84ddd45edf5cde5f
6896 F20101111_AABFAH schipper_j_Page_109thm.jpg
99d8240d0f0b0e8ee3a0cc18b4bad82a
639b9c39ef84dd7c65d2ad9d366bcbbb7a547482
28610 F20101111_AABEVB schipper_j_Page_022.QC.jpg
c61fe9eebffd93f62bbdfd14c8387a1d
a7f3a92956a495d7dfa86bc5d8d3d5fc77eb4c45
6209 F20101111_AABEUN schipper_j_Page_039thm.jpg
49d93a4dd27bc192e92dc9f2655d35db
48f852c5743d4a540ade2077fb41a1e2b9be23ca
6709 F20101111_AABETY schipper_j_Page_081thm.jpg
1b94b73036d65ae5545fd738bd674e5c
5771f8950a29a32bc9372607af9e5c06dab2a0f6
28244 F20101111_AABFAI schipper_j_Page_110.QC.jpg
a4e73bebc8e21edd0590667fe4ff5a15
035dcb9ea9b9683f038026a207aaa1b98932d717
4548 F20101111_AABEVC schipper_j_Page_020thm.jpg
ebe578ec94de47fcf2c9b59d2bfd7041
3c995f4cca667ffefca87e80a0bf4a056132eb85
26339 F20101111_AABEUO schipper_j_Page_112.QC.jpg
256d6bc56bc045e4acc52d86965b4a5d
f7d27dc5cf23459f6aa4ce8c43eba8f90aced430
31794 F20101111_AABETZ schipper_j_Page_122.QC.jpg
577030ba10cc45435a02767fa90686a8
5ef98c045cc95fe5d6943b990b2bc33b579fe79d
33434 F20101111_AABFAJ schipper_j_Page_118.QC.jpg
78d86ca889d3b6bbb7661d7f8550482c
77637a7a2646394ae5cee2b0c13e51caeac99d7d
26641 F20101111_AABEVD schipper_j_Page_100.QC.jpg
a1fcbb214d0e038a6ec69962fe4d3365
615cd5eee91effbafcd6635107d41ca6c3e80b95
6288 F20101111_AABEUP schipper_j_Page_013thm.jpg
ffe97f664fc44ddbfc483fe80fd419c0
eeb8dca34c5929086cded6de03fc04642e8d8e25
18760 F20101111_AABFAK schipper_j_Page_123.QC.jpg
74e897f6cb4598e72d3cfe401c3ce892
ecd58db6b9bfcf632b30745e5acb07ab9366f1b2
27462 F20101111_AABEVE schipper_j_Page_046.QC.jpg
adaf3a5942a5251c2a367a948930f504
9888df8ccf876370e24fdb248f78f99adbded9fa
25743 F20101111_AABEUQ schipper_j_Page_082.QC.jpg
1134253ca07421981357648a17677da5
a0b9750aa8975d7775b24224abfe6ba85ace53b4
3483 F20101111_AABFAL schipper_j_Page_124thm.jpg
0908a5af6fec511b74c8e81d7e6c4f79
b6410d559bcdd82a0c085398e90594601e9a86d6
25605 F20101111_AABEVF schipper_j_Page_099.QC.jpg
023df40ba411959305917bc4d2ad5f94
c4186289abdb4d3b64a5a73578ab9fb51432d7b9
4436 F20101111_AABEUR schipper_j_Page_028thm.jpg
4f3aa5a7ce12e8b76c2d530b6467ae8d
fe3e6e42f7aa4fe898836e34da566431c314f920
6033 F20101111_AABEVG schipper_j_Page_065thm.jpg
18efdac757c500b97075ccee51aa16e7
1a4f0beff6afd4ce33f42c7559c8f71fdcf17afb
6606 F20101111_AABEUS schipper_j_Page_078thm.jpg
b3aa33ceee59b53343ea95d2a5728540
56b35238a2075d2c98e2f5131b3e2d2a1f078dd6
24290 F20101111_AABEVH schipper_j_Page_061.QC.jpg
76c7f533fac7b053c6eca63712d2bccc
215ad26dd51414c5589f975923b72e6faafbbee7
5722 F20101111_AABEUT schipper_j_Page_075thm.jpg
b72c0d310acc8f72311ecfe9cc44374d
7ccc26f22ca7895997be12391ec4818ab0685fe3
20744 F20101111_AABEUU schipper_j_Page_062.QC.jpg
f67a7b263673e9364b084c304910ff96
5094e20704b1f65dcf4fa3c1e54a1ab67a8bc2c8
6433 F20101111_AABEVI schipper_j_Page_018thm.jpg
2dae7f11dabc5db6713adc50ef9ef5b9
01cc7741998009b9acca119e75dc7006213e01d5







A KNOWLEDGE-BASED TOXICOLOGY CONSULTANT FOR DIAGNOSING MULTIPLE
DISORDERS

By

JOEL DANIEL SCHIPPER


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2008

© 2008 Joel Daniel Schipper

For Alice, my beautiful wife.
May we grow ever nearer as the years go by.

ACKNOWLEDGMENTS

I am indebted and grateful to Dr. Jay L. Schauben, Dausear "Dar" McRae, and the Florida Poison Information Center in Jacksonville for their willingness to provide data, technical support, and consultation. Without them, my research would not have been possible. I thank Dr. A. Antonio Arroyo and Dr. Douglas D. Dankel II for their guidance and encouragement throughout my doctoral studies. I also thank my parents and my brothers, Tom and James, for their support and prayers throughout my academic career. I am grateful to Alice, my beautiful wife, for standing by me in all I do with constant, unwavering love. Above all, I give glory to God, my creator, without whom I am nothing.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

ABSTRACT

CHAPTER

1 INTRODUCTION

Applicability of Knowledge-Based Systems to Toxicology
System Overview
Database Resources
Conclusion

2 OVERVIEW OF KNOWLEDGE-BASED SYSTEMS AND DATA MINING

Knowledge-Based Systems
Utility and Structure
Reasoning from Examples
Data Mining and Knowledge Discovery in Databases
Defining Data Mining and Knowledge Discovery
Seven Steps of Data Mining
Mining Data: What and How?
Conclusion

3 DESIGN APPROACHES TO KNOWLEDGE-BASED SYSTEMS

Rule-Based Systems
Forward Chaining
Backward Chaining
Inference Networks
Decision Trees
Certainty Factors
Case-Based Reasoning
Nearest Neighbor Approaches
Bayes' Rule
Other Approaches to Knowledge-Based Systems
Fuzzy Logic
Dempster-Shafer
Rough Sets
Genetic Algorithms
Artificial Neural Networks
Modern Approaches for Diagnosing Multiple Disorders
Bayesian Belief Networks
Set Covering
Conclusion

4 MEDICAL MATHEMATICS AND RELEVANT KNOWLEDGE-BASED SYSTEMS

Medical Mathematics
Probabilistic Measurements
Diagnostic Scores
Literature Review of Knowledge-Based Systems
Historical Medical Expert Systems
Expert Systems in Toxicology
Knowledge-Based Systems for the Diagnosis of Multiple Disorders
Conclusion

5 DIAGNOSING SINGLE EXPOSURE CASES

Source Data
System Design Principles
System Development
System Operation and User Interface
System Testing and Results
System Performance
Conclusion

6 DIAGNOSING MULTIPLE EXPOSURE CASES

Motivation for Diagnosing Multiple Exposures
System Approach
Diagnosing Multiple Exposures using Solely Multiple Exposure Cases
Diagnosing Multiple Exposures with Single Exposure Cases
Conclusions
Future Work

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

2-1 Seven steps of data mining
2-2 Types of patterns that can be mined
3-1 Treatments required for each of Doug's cats
3-2 Cat characteristics for identification
3-3 System parameters for cat identification
3-4 Houses most similar to Alice's house
3-5 Characteristics of various sports balls
4-1 Contingency table
4-2 Experimental HIV testing extended contingency table
4-3 HIV testing (1% chance of HIV) extended contingency table
4-4 HIV testing (0.1% chance of HIV) extended contingency table
4-5 Diagnostic scores for acute appendicitis
4-6 Final diagnosis score significance
5-1 Accuracy by substance in 10% (MC = 25)
5-2 Accuracy by major and minor categories in 10% (MC = 25)
5-3 Accuracy by major category in 10% (MC = 25)
5-4 Accuracy by substance in 10% (MCE = 10)
5-5 Accuracy by major and minor categories in 10% (MCE = 10)
5-6 Accuracy by major category in 10% (MCE = 10)
5-7 Accuracy by substance in 10 (MCE = 10)
5-8 Accuracy by major and minor categories in 10 (MCE = 10)
5-9 Accuracy by major category in 10 (MCE = 10)
5-10 Accuracy in 10% with MC = 10 and MCE = 0
5-11 Accuracy in 10% with MC = 10, MCE = 0, and 3+ CEs
6-1 Accuracy (varying MC) of system trained & tested on multiple exposures
6-2 Accuracy (varying A) of system trained & tested on multiple exposures
6-3 Accuracy comparison of various systems for multiple exposure diagnosis
6-4 Accuracy diagnosing primary contributors using single exposures
6-5 Accuracy diagnosing secondary contributors using single exposures
6-6 Comparison of system accuracies when diagnosing single exposure cases

LIST OF FIGURES

Figure

2-1 Expert system block diagram
3-1 Rule-based system block diagram
3-2 Rules for identifying Doug's cats
3-3 Inference network for Doug's cats based on rules in Figure 3-2
3-4 Exhaustive inference network for Doug's cats
3-5 Decision tree for identifying Doug's cats
3-6 Vectors for sports balls plotted in 2-dimensional solution space
3-7 Fuzzy logic graph for human heights
3-8 Typical artificial neural network with two hidden layers
3-9 Example Bayesian belief network
3-10 Set covering graph of relationships between disorders and symptoms
5-1 User interface
5-2 Results table
5-3 Accuracy ratios by substance
5-4 Accuracy ratios by major and minor categories
5-5 Accuracy ratios by major category

LIST OF ABBREVIATIONS

A A small positive parameter in the adjusted likelihood ratio equation

AAPCC American Association of Poison Control Centers

CE Clinical Effect

CF Certainty Factor

FAR False-Alarm Rate or False-Positive Rate

FN False Negative

FNR False-Negative Rate

FP False Positive

FPIC Florida Poison Information Center

FPR False-Positive Rate or False-Alarm Rate

LR Likelihood Ratio

MC Minimum Exposure Cases

MCE Minimum CE Occurrences

NPDS National Poison Data System

NPV Negative Predictive Value

PCC Poison Control Center

PDA Personal Digital Assistant

PPV Positive Predictive Value

TESS Toxic Exposure Surveillance System

TN True Negative

TNR True-Negative Rate or Specificity

TP True Positive

TPR True-Positive Rate or Sensitivity









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

A KNOWLEDGE-BASED TOXICOLOGY CONSULTANT FOR DIAGNOSING MULTIPLE
DISORDERS

By

Joel Daniel Schipper

May 2008

Chair: A. Antonio Arroyo
Cochair: Douglas D. Dankel II
Major: Electrical and Computer Engineering

Every year, toxic exposures kill twelve hundred Americans. More than half of these deaths

are the result of exposures to multiple substances. In addition to being dangerous, multiple

exposures are particularly difficult to diagnose. At this time, no general solution exists for the

diagnosis of multiple disorders due to the non-linear interactions observed in such cases.

This dissertation presents the development of a prototype knowledge-based system for

diagnosing toxic exposures. The goal of the system is to generate differential diagnoses for

unknown exposure cases based on the clinical effects observed in patients. The system is not

meant to replace physicians, but, rather, to serve as a medical decision support system. Acting as

a consultant, the system provides access to case-based summary data that is normally

unavailable.

The system is automatically generated by applying data mining techniques to a database

supplied by the Florida Poison Information Center. For diagnosis, the system uses pre-test

probabilities and likelihood ratios, calculations commonly used throughout the medical

profession. To overcome certain shortcomings of likelihood ratios, the equation employed by

the system is adjusted to account for every possible outcome. Using the adjusted likelihood ratio

enables robust calculations while closely modeling the likelihood ratio that physicians know and

trust.

Trained and tested on single exposures, the system achieved an accuracy of 81.0% on cases

involving at least three clinical effects. Repeating the process for multiple exposures alone

resulted in a failure, at least partially due to insufficient data. However, when trained on various

combinations of single, double, and/or multiple exposures, the system achieved an accuracy of

86.9% when diagnosing the primary contributors for multiple exposure cases.

Although a solution for diagnosing multiple disorders remains elusive, the ability to

identify primary contributors is a significant contribution to addressing the problem. This system

is the first American diagnostic system for the field of clinical toxicology, and its use of adjusted

likelihood ratios serves as a method to bridge the gap between intelligent systems and the

medical field. Furthermore, by automatically generating the system, this research addresses the

knowledge acquisition bottleneck that plagues traditional expert systems.









CHAPTER 1
INTRODUCTION

Toxicology is the study of poisons and their effects on living organisms. One of the most

prominent uses of toxicology for the benefit of mankind is the development of poison control

centers. Thousands of people call poison control centers daily for free consultation and

information regarding chemicals and drugs. In 2004, the American Association of Poison

Control Centers (AAPCC) consisted of 62 poison control centers serving all 50 of the United

States and handling more than 2.4 million reported human poison exposure cases (Watson et al.,

2004). The AAPCC has compiled a database containing the details of over 38.7 million human

poison exposure cases from the calls received and documented by its 62 poison control centers

(Watson et al., 2004). A medical database of this magnitude represents a great opportunity for

data mining and knowledge-based systems research. By tapping into the vast amount of data

contained in the AAPCC database, a knowledge-based system could use the information to help

diagnose and treat poisoned patients quickly and effectively.

Applicability of Knowledge-Based Systems to Toxicology

Knowledge-based systems should not be applied in every situation. In many cases,

conventional algorithms offer a more appropriate and effective solution to the problem.

However, the field of medicine inherently contains many traits that make it an ideal domain for

knowledge-based systems. On a daily basis, physicians must make decisions based on

experience using incomplete data. Knowledge-based systems also excel at solving problems

from uncertain data using heuristics. Additionally, the field of medicine is continually changing

as more knowledge is acquired and new technology becomes available. Likewise, a strength of

knowledge-based systems is adaptability in dynamic domains. Beyond the general obstacles

common to all fields of medicine, the field of toxicology itself faces three specific challenges for

which knowledge-based systems are well suited.

The first challenge is making pertinent information available to physicians, emergency

medical services, and the public involved at the time of a poisoning. Toxicology is a narrow

specialization within the medical profession consisting of a small number of experts, called

toxicologists. To make the expertise of toxicologists available to the medical field at large, the

AAPCC offers direct consultation with toxicologists to physicians at hospitals around the

country. Physicians may call poison control centers for information on how to treat a drug

overdose or identify an unknown drug that a patient has ingested. In spite of the efforts of the

AAPCC, the limited number of toxicologists makes expertise in toxicology a scarce commodity.

Knowledge-based systems offer a solution to this scarcity. Creating a readily available system

that can aid physicians in diagnosis when experts are unavailable could be an invaluable asset in

saving lives.

A second challenge in toxicology is dealing with cases involving multiple substances. In

many cases, consultations are a simple matter for the toxicologist, consisting mainly of matching

signs and symptoms that are known to be directly associated with the mechanisms and behaviors

of one class of drug. Cases that toxicologists find difficult tend to consist of multiple unknown

drugs interacting to produce signs and symptoms that cannot be matched with any single

substance. If all substances had linear interactions, determining multiple unknown drugs by their

signs and symptoms would amount to identifying the drug combinations that, when summed

together, produce the observed results. Unfortunately, many drug interactions are non-linear.

Some drug combinations cause a dramatic increase in symptom severity, some mask symptoms

normally observed with one of the drugs, and some can cause symptoms that normally would not

appear with any of the drugs individually. In 2004, although only 8.6% of the exposures reported

were multiple substance exposures, "50.6% of fatal cases involved 2 or more drugs or products"

(Watson et al., 2004, p. 593). Being able to address multiple exposures is an important concern

for saving lives. A knowledge-based system can aid in addressing multiple exposures by

effectively making the relevant information in the AAPCC database available to the toxicologist.

The goal of a knowledge-based system is not to replace the toxicologist, but to act as a powerful

consulting tool providing case-based summary data for the toxicologist. Human beings have

senses and intuition that are important for diagnosis, which computers cannot replicate.

However, by offering speculative advice, the system may facilitate accurate and timely

diagnoses.

A third toxicological challenge is ensuring the rapid diagnosis and treatment of exposures.

When dealing with poisons and drug overdoses, time is of the essence. In 2004, 1183 people

died of toxic exposures (Watson et al., 2004), many because they did not receive the correct

treatment in a timely manner. Every minute spent waiting to speak with an expert or consult a

clinical manual could make the difference between life and death for a patient. A

knowledge-based system is a rapid aid in diagnosing toxic exposures. Because the system is

computerized, it offers physicians a directed search with a faster response time than written

literature.

System Overview

The goal of this research is to create a general-purpose knowledge-based system that can

automatically learn relationships in diagnostic domains. This particular application of the system

uses the Florida Poison Information Center (FPIC) database as its foundation. By mining the

FPIC database, the system extracts associations between the signs and symptoms observed in a

patient and the final diagnosis. Automatically extracting these relationships enables system

designers to bypass much of the knowledge acquisition bottleneck by removing the need to

interview experts, a requirement for traditional knowledge-based system design. Furthermore,

by applying a generalized process, knowledge engineers need not acquire a comprehensive

understanding of every domain for which they create a knowledge-based system.

As applied to toxicology, the system uses the simple, standard medical mathematics

of pre-test probabilities and likelihood ratios to calculate and communicate the relationships

discovered in the database. Currently, the system is a proof-of-concept prototype, and only

primary contributors with a significant number of exposure occurrences are included. As the

system grows to include more substances and substance combinations, however, the system's

simple, mathematical representation will become essential for scalability. Additionally,

medical mathematics is not only more understandable to users in the medical field, but also

communicates information that is more relevant for medical diagnosis than other traditional

measurements, such as accuracy (Cios & Moore, 2002; Lavrac, 1999).
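To make this medical mathematics concrete, the following Python sketch applies the standard odds form of Bayes' rule: the pre-test probability is converted to odds, multiplied by a likelihood ratio, and converted back to a post-test probability. It assumes the conventional positive likelihood ratio, LR+ = TPR / FPR = sensitivity / (1 - specificity); the input values are hypothetical, and the adjusted likelihood ratio used by the system is not reproduced here.

```python
def likelihood_ratio_positive(sensitivity: float, specificity: float) -> float:
    """Conventional LR+ = P(finding | disorder) / P(finding | no disorder) = TPR / FPR."""
    return sensitivity / (1.0 - specificity)


def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Odds form of Bayes' rule: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Hypothetical values for illustration only.
lr = likelihood_ratio_positive(sensitivity=0.8, specificity=0.9)
p = post_test_probability(pre_test_prob=0.10, lr=lr)
print(f"LR+ = {lr:.1f}, post-test probability = {p:.3f}")  # LR+ = 8.0, post-test probability = 0.471
```

With a pre-test probability of 10% and LR+ = 8, the post-test probability rises to 8/17, or about 47%. The plain ratio is undefined when the specificity is exactly 1 (a false-positive rate of zero), which illustrates why a more robust variant of the likelihood ratio can be useful in practice.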

Despite the simplicity inherent in its design, the system seeks to diagnose

complex cases involving multiple unknown substances. In the past, very little research has been

performed in the area of diagnosing multiple disorders, and this system seeks to further the fields

of machine intelligence and knowledge engineering by offering a simple and practical approach

for addressing the problem. Fundamentally, the system treats multiple disorders in much the

same way as single disorders. Each multiple exposure is treated as a class separate from the

individual substances involved, and the operations performed on single exposures are performed

identically on each multiple exposure class to create associations. At this time, the available data

are insufficient to fully test the diagnosis of multiple exposures; however, the system demonstrates

significant potential in accurately identifying the primary contributors in multiple exposure cases.
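A minimal sketch of treating each multiple exposure as its own class is shown below, assuming each case record simply lists the substances involved and the clinical effects observed (the record layout and the example cases are hypothetical, not the FPIC schema). Using a Python frozenset as the class label makes a combination such as ethanol plus a benzodiazepine a single diagnosis regardless of the order in which the substances were recorded, and the same counting step is applied to single and multiple exposures alike.

```python
from collections import Counter, defaultdict

# Hypothetical case records: (substances involved, clinical effects observed).
cases = [
    (["acetaminophen"], ["nausea", "vomiting"]),
    (["ethanol", "benzodiazepine"], ["drowsiness", "respiratory depression"]),
    (["benzodiazepine", "ethanol"], ["drowsiness"]),
    (["ethanol"], ["drowsiness", "vomiting"]),
]


def class_label(substances):
    """A single substance or a combination becomes one order-independent class."""
    return frozenset(substances)


# The same operation is applied to single and multiple exposures: count cases
# per class and count how often each clinical effect occurs within that class.
cases_per_class = Counter()
effects_per_class = defaultdict(Counter)
for substances, effects in cases:
    label = class_label(substances)
    cases_per_class[label] += 1
    effects_per_class[label].update(set(effects))

combo = class_label(["ethanol", "benzodiazepine"])
print(cases_per_class[combo])                   # 2
print(effects_per_class[combo]["drowsiness"])   # 2
```

Because the combination is its own class, its counts accumulate separately from those for ethanol alone.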









Ultimately, the system's goal is to serve as a consultant to all physicians that may

encounter toxic exposure cases. For a toxicologist, the system may serve as an idea generator by

offering plausible drug combinations that perhaps the toxicologist failed to consider. For other

physicians, the system may act as a solution finder or simply be used to confirm an uncertain

diagnosis. As the system develops, expanding to encompass the entire FPIC database, the

system may begin to discover relationships previously undocumented in the field of toxicology.

Further development may lead to the real-time monitoring of cases as they are entered into the

database so the system can signal a warning for epidemics or perceived threats, such as

substances associated with terrorism.

Database Resources

The Florida Poison Information Center (FPIC) consists of three of the poison control

centers in the AAPCC. Since 1996, the FPIC has compiled a database logging every call it

receives. When a caller goes to the hospital, the FPIC makes a follow-up call to gather all the

medical information available on the case. In 2004 alone, the FPIC received over 120 thousand

calls and made more than 43 thousand follow-up calls related to human exposures (Florida

Poison Information Center Network, 2005). The FPIC database also contains over 65 thousand

records of multiple exposure cases. Entries in the database are regulated by AAPCC Toxic

Exposure Surveillance System (TESS) standards that ensure the collection of a specific set of

information about each case. Because the entries follow TESS requirements, most of the data in the

database consists of discrete values that are easy to process with a computer program. Furthermore,

the national standardization by the AAPCC increases portability of a system designed for the

FPIC to other poison control centers around the country.

For this research, the FPIC has generously granted access to all relevant information

recorded in its database from 2002 to 2006. Initially, concerns were expressed regarding the

accuracy of the data. In some cases, patients may lie about the drugs they took. In others, nurses

may relay either inaccurate or incomplete information to the FPIC. Although these errors affect

system accuracy, the system's performance shows that, in general, the discrepancies can be

treated as random errors whose contributions will become negligible as the database grows.

Another problem is that two drugs taken together in varying proportions can yield different

symptoms. The observed symptoms might vary depending on the amount of interaction

occurring between the two drugs or which drug is affecting the body more strongly at the time.

However, the findings of the system indicate that most multiple exposures are dominated by the

signs and symptoms associated with a primary contributor. As a result, the system is capable of

diagnosing the primary contributor, which is a significant contribution to addressing the problem

of multiple disorder diagnosis.

Conclusion

This chapter began by introducing the field of toxicology and the American Association of

Poison Control Centers. It then explained the applicability of knowledge-based systems to the

medical field, particularly the field of toxicology. Finally, it presented a broad overview of the

system followed by a discussion about the database used in the development of the system.

Chapter 2 presents a general overview of knowledge-based systems and data mining, while

Chapter 3 elaborates on Chapter 2 by describing in greater detail the traditional approaches used

in designing knowledge-based systems. Chapter 4 discusses medical mathematics and gives a

literature review of relevant systems that have been created. Chapter 5 presents the system

design in detail along with the results for diagnosing single exposure cases. Finally, Chapter 6

presents the results for diagnosing multiple exposure cases followed by some concluding

remarks.









CHAPTER 2
OVERVIEW OF KNOWLEDGE-BASED SYSTEMS AND DATA MINING

In the past two decades, the availability of information has skyrocketed. Continual

advances in computer technology have made the collection and storage of massive amounts of

data a reality, while the advent of the Internet has enabled the data to be shared and accessed by

many users throughout the world. Today, the volume of data generated and stored is so

enormous that it has become impossible for the human mind to locate and process most of the

available information. Furthermore, as we learn more about the complexity of our world,

researchers and practitioners alike are forced to specialize to the point where only a few people

are truly knowledgeable in any particular field. If humans are to continue in the quest to

understand and subdue the world, it has become imperative that we create systems and

algorithms capable of filtering out useless data while identifying, processing, and applying

relevant information.

Knowledge-Based Systems

Utility and Structure

Knowledge-based systems, also known as expert systems, are computerized systems that

use information to provide relevant advice and problem solutions within a specific domain.

Knowledge-based systems enable expert knowledge to be accessed 24 hours a day, even when an

expert is unavailable. They also provide a means to preserve information that otherwise might

be lost when an expert retires.

Figure 2-1 shows the basic structure of an expert system consisting of an inference engine,

a knowledge base, and a fact base. The inference engine is a program that manipulates the

knowledge base and fact base using a general problem solving technique. The knowledge base is

the fixed set of information or data that is necessary to solve problems within a particular









domain. The fact base contains problem-specific data, such as user inputs and information

derived from the knowledge base by the inference engine (Gonzalez & Dankel, 1993).


Figure 2-1. Expert system block diagram

Unlike conventional algorithms that embed domain knowledge within the program,

inference engines are problem independent. Such independence provides versatility, enabling

the inference engine to be applied to any number of domains by simply changing the knowledge

base. The same diagnostic inference engine could be effectively applied to the medical field as

well as automobile repair or troubleshooting a manufacturing process. The beauty of this

independence is that it allows the programmer to focus on the domain knowledge, often

expressed as facts and rules, without having to debug faulty algorithmic code.
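The separation between inference engine and knowledge base can be illustrated with a small forward-chaining engine, sketched below in Python under the assumption that a rule is just a set of premise facts and one conclusion fact. Forward chaining is used only as one example of a general problem-solving technique, and the two knowledge bases are hypothetical toy rules; the point is that the function forward_chain does not change when the domain does.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (premises, conclusion): if every premise is a known fact, add the conclusion.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(knowledge_base: List[Rule], facts: Set[str]) -> Set[str]:
    """Domain-independent inference engine: fire rules until no new facts appear."""
    fact_base = set(facts)  # problem-specific data: user inputs plus derived facts
    changed = True
    while changed:
        changed = False
        for premises, conclusion in knowledge_base:
            if premises <= fact_base and conclusion not in fact_base:
                fact_base.add(conclusion)
                changed = True
    return fact_base


# Two hypothetical knowledge bases; the engine above is reused unchanged for both.
medical_kb: List[Rule] = [
    (frozenset({"fever", "cough"}), "suspect respiratory infection"),
]
auto_kb: List[Rule] = [
    (frozenset({"engine cranks", "no start"}), "check fuel supply"),
    (frozenset({"check fuel supply", "fuel gauge empty"}), "add fuel"),
]

print(forward_chain(medical_kb, {"fever", "cough"}))
print(forward_chain(auto_kb, {"engine cranks", "no start", "fuel gauge empty"}))
```

Switching from the medical domain to automobile repair requires only a different rule list; no line of the engine is edited.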









Reasoning from Examples

For a knowledge-based system to produce accurate results, it must obtain its conclusions

via some logical process. In logic, there are three fundamental ways of reaching a conclusion:

deductive reasoning, inductive reasoning, and abductive reasoning. Deductive reasoning is

reasoning from general to specific. For example:

Premise: All oceans have waves.
Premise: The Pacific is an ocean.
Conclusion: Therefore, the Pacific has waves.

Deductive reasoning is a sound form of argument, meaning that if its premises are true, its

conclusion is guaranteed to be true as well. Inductive reasoning is inferring from specific to

more general statements. For example:

Premise: The Pacific is an ocean.
Premise: The Pacific has waves.
Conclusion: Therefore, oceans have waves.

Inductive reasoning is an unsound form of reasoning, meaning that even if the premises are true,

the conclusion is not guaranteed to be true. Abductive reasoning is drawing a hypothesis based

on observed characteristics. For example:

Fact: Oceans have waves.
Observation: This body of water has waves.
Hypothesis: This body of water is an ocean.

Like inductive reasoning, abductive reasoning is unsound. In fact, abductive reasoning can be

viewed as a form of inductive reasoning because it is reasoning from specific observations to

draw generalized hypotheses that are plausible but not guaranteed (Gonzalez & Dankel, 1993).

An ideal knowledge-based system should offer the correct solution to every problem

within its domain. To guarantee the validity of every solution, the system would have to contain

all first principles within its domain and employ a sound reasoning technique, such as deductive

reasoning. Although some systems attempt to reason from first principles, in general, attempting









to program a system in such a manner is not practical or even possible. Many fields are not

understood well enough to compile a list of foundational rules and, even if they were, the

compilation and programming of such rules would prove an extremely arduous task for any

domain of significance.

Because we cannot create an ideal knowledge-based system, many systems take the

practical approach of reasoning using examples. Rather than directly programming the system

from first principles, the system is given examples from which it generates its own governing

principles. These principles can be expressed as rules, statistics, case matching, or another

representative form. Chapter 3 discusses many of these approaches. Like inductive reasoning,

the system makes inferences from the specific, i.e., an example, to the general, i.e., a governing

principle. At first, such an approach seems troublesome because the system's reasoning is

unsound and, therefore, inherently can make mistakes. It is important to note, however, that

scientists discovered every scientific principle that we accept as fact in the same manner: by

observing that many examples, or experiments, all followed the same law of nature.

Additionally, Kononenko et al. (1998) have shown that, in many domains, systems that

automatically generate their own diagnostic rules are capable of performing with a higher degree

of accuracy than physicians, when given identical information. Furthermore, knowledge-based

systems that generate their own governing principles for problem solving take less time to create

because the programmer need not spend time determining the governing principles by hand.

Instead, the system itself can determine its own rules by processing a database of examples.

Data Mining and Knowledge Discovery in Databases

Defining Data Mining and Knowledge Discovery

In recent years, the development of mass storage devices has enabled the creation of

extremely large and complex databases. The amount of available data has increased so greatly









that it would be impossibly tedious for the human mind to process all of the information. As a

result, the rising demand for systems capable of meeting this new need has given birth to the

field of data mining. Data mining is the process of extracting information from a database (Han

& Kamber, 2001). Data mining is also referred to as knowledge discovery in databases, where

"discovery...is the generation of novel, interesting, plausible, and intelligible knowledge about

the objects of study" (Valdes-Perez, 1999, p. 336). The knowledge discovered through data

mining comes in two different forms: novel information and established information.

Novel information is previously undiscovered knowledge that is a new concept within a

domain. To demonstrate how a computer system can uncover valuable information, let us

consider the medical field. Most facts within the medical field were established by researchers

who performed studies and processed data numerically to determine relationships between

various observations. In these studies, it is the numerical and statistical values that give credence

to the study. If a physician claims that smoking increases one's chance of lung cancer based on a

general trend the physician has observed in his patients, the physician will be asked for the

numbers to support his statement. It is not until the physician performs a study using a

sufficiently large number of cases and produces values that support his statement that his

observation will be taken seriously. Since computers perform numerical analysis exceedingly

well, it makes sense to create data mining systems that will automatically search for and output

the statistically significant relationships they encounter. Researchers can then examine and

determine the validity of the discoveries made by the computer system.
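As a sketch of how a system might automatically search for significant relationships, the Python code below scans pairs of observations and flags those whose 2x2 Pearson chi-square statistic exceeds 3.841, the critical value for one degree of freedom at the 0.05 level. The chi-square test is a stand-in chosen for illustration, not a method taken from this dissertation, and the patient records are fabricated to mirror the smoking example above.

```python
from itertools import product

# Hypothetical records: the set of observations noted for each patient.
patients = (
    [{"smoker", "lung cancer"}] * 8
    + [{"smoker"}] * 2
    + [{"lung cancer"}] * 2
    + [set()] * 8
)

CHI2_CRITICAL_1DF = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def significant_pairs(records, x_items, y_items):
    """Yield (x, y, chi2) for each pair whose association exceeds the critical value.

    A large chi-square signals association in either direction, so every
    flagged pair still needs to be examined by a researcher.
    """
    for x, y in product(x_items, y_items):
        a = sum(1 for r in records if x in r and y in r)
        b = sum(1 for r in records if x in r and y not in r)
        c = sum(1 for r in records if x not in r and y in r)
        d = len(records) - a - b - c
        chi2 = chi_square_2x2(a, b, c, d)
        if chi2 > CHI2_CRITICAL_1DF:
            yield x, y, chi2


for x, y, chi2 in significant_pairs(patients, ["smoker"], ["lung cancer"]):
    print(f"{x} -> {y}: chi-square = {chi2:.1f}")  # smoker -> lung cancer: chi-square = 7.2
```

In a real scan over many candidate pairs, some pairs will exceed the threshold by chance alone, which is one more reason the flagged relationships are handed to researchers for validation rather than accepted automatically.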

The second form of knowledge that data mining systems can discover is established

information. Established information is information that is already known and available within a

domain. At first, rediscovery of such information may seem like a confirmation of knowledge at









best or redundant reiteration at worst. However, the ability to automatically discover established

principles through data processing is in fact a powerful tool in the field of knowledge-based

systems. Traditionally, knowledge engineers create knowledge-based systems incrementally

through an interview process with experts in the domain of interest. In each interview, the

knowledge engineer tries to glean rules and heuristics from the expert so that these can be

implemented in the system. Generating an expert system through this process takes years of

effort and forces the knowledge engineer to train himself through immersion in

the domain to create the system properly. It has always been desirable to shorten the creation

time of these systems without compromising their accuracy. The solution to these problems lies

in the power of data mining to automatically extract rules and knowledge from a database

without the necessity of a human to accumulate and program these rules directly.

Seven Steps of Data Mining

According to Han and Kamber (2001), there are seven steps in data mining (Table 2-1).

Although data mining is listed as only one of the steps in knowledge discovery in

databases, in practice it is the indispensable step and usually requires the most

computation and intelligence. As a result, the whole process of knowledge discovery in

databases has commonly become known as data mining.

Table 2-1. Seven steps of data mining
Step name                Description
Data cleaning            Removing noisy, inconsistent data from the database
Data integration         Combining data from multiple sources
Data selection           Choosing the data relevant to the task
Data transformation      Changing the selected data into a usable format for data mining
Data mining              Extracting relationships and patterns from the data
Pattern evaluation       Determining if the knowledge discovered has value
Knowledge presentation   Presenting the results to the user via tables, graphs, charts, etc.
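The Python sketch below walks two small, hypothetical case sources through the seven steps in the order of Table 2-1. Each step is deliberately trivial (for example, cleaning only removes records with no diagnosis, and mining only counts substance and effect pairs); the field names, the example records, and the support threshold are assumptions for illustration.

```python
from collections import Counter

# Two hypothetical data sources that describe cases with different field names.
source_a = [
    {"substance": "Ethanol", "effects": "drowsiness;vomiting", "age": 34},
    {"substance": None, "effects": "nausea", "age": 20},  # no diagnosis recorded
]
source_b = [
    {"agent": "ethanol", "clinical_effects": "Drowsiness", "zip": "32610"},
    {"agent": "acetaminophen", "clinical_effects": "nausea;vomiting", "zip": "32601"},
]

# 1. Data cleaning: drop records that have no diagnosis (the class label).
source_a = [r for r in source_a if r["substance"]]
source_b = [r for r in source_b if r["agent"]]

# 2. Data integration: rename source_b's fields to match source_a, then combine.
rename = {"agent": "substance", "clinical_effects": "effects"}
merged = source_a + [{rename.get(k, k): v for k, v in r.items()} for r in source_b]

# 3. Data selection: keep only the fields relevant to the task (drops age and zip).
selected = [{"substance": r["substance"], "effects": r["effects"]} for r in merged]

# 4. Data transformation: lowercase the values and split effects into sets.
transformed = [
    (r["substance"].lower(), {e.strip().lower() for e in r["effects"].split(";")})
    for r in selected
]

# 5. Data mining: count how often each (substance, effect) pair occurs.
pair_counts = Counter((s, e) for s, effects in transformed for e in effects)

# 6. Pattern evaluation: keep only pairs seen at least MIN_SUPPORT times.
MIN_SUPPORT = 2  # hypothetical threshold
patterns = {pair: n for pair, n in pair_counts.items() if n >= MIN_SUPPORT}

# 7. Knowledge presentation: print a plain-text table.
print(f"{'Substance':<15}{'Effect':<15}Count")
for (substance, effect), n in sorted(patterns.items()):
    print(f"{substance:<15}{effect:<15}{n}")
```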









Although Table 2-1 lists seven steps in data mining, these steps are not rigidly enforced in

data mining system design. Depending on the form of the data, not every step on the table is

required for every data mining problem. For example, data integration is not required if only one

data source is used. Some steps may be performed in a different order. For example, data

cleaning may be handled by allowing a robust algorithm implemented in the data mining step to

eliminate noise. Although many of these steps may be indispensable with a particular data set, in

general the data mining and pattern evaluation steps are the core components of a data mining

system.

Mining Data: What and How?

Table 2-2 summarizes the six types of patterns that can be mined according to Han and

Kamber (2001). The research presented here focuses primarily on classification in the medical

field of toxicology. In general terms, classification attempts to take a database of cases

belonging to known classes and create models that are used to identify cases with unknown

classes based on the information supplied about the case. Specifically, given a database

containing the signs and symptoms observed in a patient paired with the appropriate diagnosis of

the substances affecting the patient, a system can learn to identify different substances based on

the associated signs and symptoms.
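As a concrete illustration of this classification task, the sketch below trains a Bernoulli naive Bayes classifier on a few hypothetical cases, each pairing a substance class with its observed signs and symptoms, and then ranks the substances for an unknown case. Naive Bayes is a generic technique chosen here for brevity; it is not the method developed in this dissertation, and the training cases are invented.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training cases: (diagnosed substance class, observed signs and symptoms).
training = [
    ("opioid", {"drowsiness", "miosis", "respiratory depression"}),
    ("opioid", {"drowsiness", "miosis"}),
    ("anticholinergic", {"mydriasis", "tachycardia", "dry skin"}),
    ("anticholinergic", {"mydriasis", "agitation", "tachycardia"}),
]

class_counts = Counter(label for label, _ in training)
symptom_counts = defaultdict(Counter)
vocabulary = set()
for label, symptoms in training:
    symptom_counts[label].update(symptoms)
    vocabulary |= symptoms


def log_score(label: str, observed: set) -> float:
    """log P(class) + sum over the vocabulary of log P(symptom present/absent | class)."""
    n = class_counts[label]
    score = math.log(n / len(training))
    for s in vocabulary:
        # Laplace smoothing keeps an unseen symptom from zeroing the probability.
        p = (symptom_counts[label][s] + 1) / (n + 2)
        score += math.log(p if s in observed else 1.0 - p)
    return score


def rank(observed: set) -> list:
    """Return class labels ordered from most to least probable."""
    return sorted(class_counts, key=lambda c: log_score(c, observed), reverse=True)


print(rank({"miosis", "drowsiness"}))  # ['opioid', 'anticholinergic']
```

The model is learned entirely from the labeled examples, which is the sense in which classification lets a system identify substances from their associated signs and symptoms.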

Many different methods can be implemented to perform data mining. Some overlap with

the methods of knowledge-based systems, discussed in more detail in Chapter 3. A good

summary of the most common methods can be found in Lavrac (1999) or Han and Kamber

(2001). Zhou (2003) discusses three philosophical approaches to data mining which focus

primarily on the efficiency, effectiveness, or validity of the system design.









Table 2-2. Types of patterns that can be mined
Pattern                            Description
Characterization & Discrimination  Summarizes different classes within the data so they can be compared and contrasted with other classes
Associational analysis             Searches for rules that reveal relationships between different classes in the data
Classification & Prediction        Identifies models where, given certain inputs, the system can output the most probable class or number associated with the inputs
Cluster analysis                   Treats every parameter as a value and groups the most similar cases into clusters that will be treated as a single class
Outlier analysis                   Identifies cases that are sufficiently deviant from all other cases so they can be examined further
Evolution analysis                 Searches for tendencies of class parameters to change over time in a characteristic manner

Conclusion

This chapter has given a general overview of knowledge-based systems and data mining.

The section on knowledge-based systems discussed the general structure and usefulness of

knowledge-based systems followed by the importance of reasoning from examples. The section

on data mining presented the concepts of novel information and established information as well

as the seven steps to data mining and the types of patterns that can be discovered. The next

chapter discusses many of the different approaches to knowledge-based system design.

Although these approaches are presented within the context of knowledge-based systems, many

are used jointly in the field of data mining.









CHAPTER 3
DESIGN APPROACHES TO KNOWLEDGE-BASED SYSTEMS

Since the inception of knowledge-based systems in the 1970s, researchers have developed

many varied approaches for their design and implementation. This chapter presents a brief

overview of the most common design schemes, with an emphasis on those most similar to the

system presented in Chapters 5 and 6. Although the schemes are presented within the context of

knowledge-based systems, most are used jointly in the field of data mining. The chapter begins

with the foundational topics of rule-based systems, case-based reasoning, nearest neighbor

classification, and Bayes' rule, followed by a discussion of less central topics, including fuzzy logic,

Dempster-Shafer theory, rough sets, genetic algorithms, and artificial neural networks. The chapter

concludes by discussing the modern approaches most relevant to solving problems involving

multiple disorders, namely Bayesian belief networks and set covering theory.

Rule-Based Systems

In designing knowledge-based systems, the use of rules is an obvious choice. Not only do

humans naturally use rules when they reason and solve classification problems, but rules

inherently are heuristic in nature, enabling them to handle uncertainty. As discussed in

Chapter 2, rule-based systems consist of an inference engine, a knowledge base, and a fact base

(Figure 3-1). The inference engine is the general problem solving technique utilized by the

system, such as the forward and backward chaining approaches discussed below. The

knowledge base consists of a domain specific list of if-then statements, known as rules, used to

gather information and solve problems. The "if" portion of a rule is known as the premise and

the "then" portion of the rule is known as the conclusion. The fact base is problem specific and

consists of knowledge obtained from the user and sensors along with all knowledge derived by

applying the rules. Greater detail on rule-based systems can be found in Gonzalez and Dankel

(1993).


Figure 3-1. Rule-based system block diagram

To better understand rule-based systems, let us consider an example using Doug and his cats. Let us assume that Doug owns four cats named Princess, Panther, Ivan, and Jimmy, each of which requires special care every day. Doug wants to go on vacation, so he hires a pet-sitter and creates a list of the treatments for each cat (Table 3-1).

Table 3-1. Treatments required for each of Doug's cats
Cat's name | Treatment
Princess | Requires at least 30 minutes of petting per day
Panther | Given 50% more food
Ivan | Must not be allowed outside at all costs
Jimmy | Must receive antibiotics once a day

Doug soon becomes aware, however, that the pet-sitter does not know the names of the cats. To ensure that each cat receives the necessary treatment, he decides to create a rule-based system to help the pet-sitter identify the cats. He begins by writing down the distinguishing characteristics of each cat, including the cat's major color, fur length, and whether the cat's fur is a solid color or not (Table 3-2).

Table 3-2. Cat characteristics for identification
Cat's name | Major color | Solid color? | Fur length
Princess | Tan | No | Medium
Panther | Black | Yes | Short
Ivan | Tan | No | Short
Jimmy | Black | No | Short

Doug then defines the parameters used by his system, as well as their allowable values (Table 3-3). He quickly realizes that defining fur length as short, medium, or long is a subjective measurement. To reduce uncertainty, he creates a new parameter called "FurMeasurement" that allows the pet-sitter to input a length of fur in inches. From this measurement, the fur length is determined.

Table 3-3. System parameters for cat identification
System parameter | Allowable values
MajorColor | black, tan
SolidColor | yes, no
FurMeasurement | length of fur in inches
FurLength | short, medium, long
Cat | Princess, Panther, Ivan, Jimmy

Finally, Doug creates seven rules that identify each cat based on the characteristics observed by the pet-sitter. Collectively, these rules form the knowledge base (Figure 3-2). The fact base contains any facts entered by the pet-sitter, such as a statement that the unknown cat has FurMeasurement = 1". Facts derived from the rule set are also added to the fact base; for example, given FurMeasurement = 1", rule R1 derives that the cat must have FurLength = short.

Rules:
R1 If FurMeasurement ≤ 1"
Then FurLength = short

R2 If FurMeasurement > 1" AND FurMeasurement ≤ 2"
Then FurLength = medium

R3 If FurMeasurement > 2"
Then FurLength = long

R4 If FurLength = medium
Then Cat = Princess

R5 If SolidColor = yes
Then Cat = Panther

R6 If MajorColor = tan AND FurLength = short
Then Cat = Ivan

R7 If MajorColor = black AND SolidColor = no
Then Cat = Jimmy
Figure 3-2. Rules for identifying Doug's cats

The following subsections discuss the basic inference engine algorithms used in rule-based systems. Throughout these sections, the example of Doug and his cats is referenced frequently.

Forward Chaining

Forward chaining is the process of reasoning from inputs to conclusions. The first step in a forward chaining system is to receive user and sensor inputs and store them in the fact base. Next, the system searches the rule set and identifies those rules whose premises are satisfied by the facts contained in the fact base. The process of identifying these rules is called pattern matching. If more than one rule is satisfied, the system identifies the rule with the highest priority and executes it, a step known as rule firing. The results obtained from the fired rule are added to the fact base. This cycle of pattern matching, prioritizing, and rule firing continues until a solution is reached or no further progress can be made. If no solution is attained, complex systems may request additional information from the user that might enable the system to reach a conclusion. Alternatively, the system might apply uncertainty management to offer the most fitting solutions based on the facts it has received.

Using the example of Doug and his cats, the user might input that a cat has MajorColor = tan, FurMeasurement = 1.5", and SolidColor = no. The forward chaining system adds these facts to the fact base and then searches the premise of every rule for a match. It discovers a match on R2 and a partial match on R6. Because R2 is the only rule satisfied, it fires, adding the fact FurLength = medium to the fact base. Once again, the system searches through the premises and finds matches on R2 and R4 as well as a partial match on R6. The system must now prioritize the rules. Since the rules closer to solutions appear further down the list, the system gives priority to rules with higher rule numbers. Note that this also prevents the system from entering an infinite loop by evaluating R2 over and over again. R4 is selected as higher priority than R2, so R4 fires, yielding the result Cat = Princess. Since Cat is the solution variable, the system stops and informs the user that the cat being observed must be Princess.
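The cycle just described can be sketched in Python. This is a minimal illustration, not the author's implementation: the rule encoding (lists of variable, comparison, value conditions), the function name `forward_chain`, and the choice to skip rules whose conclusion variable is already known (a simple form of refraction that guards against re-firing) are all assumptions. Conflict resolution follows the example: among the satisfied rules, the highest rule number fires.

```python
import operator

# Figure 3-2 encoded as (name, premises, conclusion); every premise must hold.
RULES = [
    ("R1", [("FurMeasurement", operator.le, 1.0)], ("FurLength", "short")),
    ("R2", [("FurMeasurement", operator.gt, 1.0),
            ("FurMeasurement", operator.le, 2.0)], ("FurLength", "medium")),
    ("R3", [("FurMeasurement", operator.gt, 2.0)], ("FurLength", "long")),
    ("R4", [("FurLength", operator.eq, "medium")], ("Cat", "Princess")),
    ("R5", [("SolidColor", operator.eq, "yes")], ("Cat", "Panther")),
    ("R6", [("MajorColor", operator.eq, "tan"),
            ("FurLength", operator.eq, "short")], ("Cat", "Ivan")),
    ("R7", [("MajorColor", operator.eq, "black"),
            ("SolidColor", operator.eq, "no")], ("Cat", "Jimmy")),
]


def satisfied(premises, facts):
    """True when every premise variable is known and passes its comparison."""
    return all(var in facts and op(facts[var], value) for var, op, value in premises)


def forward_chain(inputs, goal="Cat"):
    """Return (solution, names of fired rules) for the given input facts."""
    facts = dict(inputs)  # the fact base starts with the user's inputs
    fired = []
    while goal not in facts:
        # Pattern matching: satisfied rules that would add a new fact.
        matches = [r for r in RULES
                   if satisfied(r[1], facts) and r[2][0] not in facts]
        if not matches:
            return None, fired  # no solution can be reached
        # Conflict resolution: the highest rule number has priority.
        name, _, (var, value) = max(matches, key=lambda r: int(r[0][1:]))
        facts[var] = value  # firing adds the conclusion to the fact base
        fired.append(name)
    return facts[goal], fired


print(forward_chain({"MajorColor": "tan", "FurMeasurement": 1.5, "SolidColor": "no"}))
# prints ('Princess', ['R2', 'R4'])
```

Because of the refraction check, R2 drops out of the second cycle on its own in this sketch; the rule-number priority matters when several new conclusions compete, for example when SolidColor = yes and a fur measurement are supplied together.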

In complex rule-based systems, forward chaining can be extremely inefficient because of the exhaustive search performed during pattern matching. To alleviate this bottleneck, the Rete algorithm was developed. The Rete algorithm builds predetermined networks, known as the pattern network and the join network, that limit the amount of matching that must take place in every cycle of the pattern matching process. The Rete algorithm and the construction of its pattern and join networks are discussed in detail by Gonzalez and Dankel (1993).

Forward chaining systems are used primarily for problems that involve a small number of inputs compared to the number of possible solutions. Synthesis problems, including design, configuration, planning, and scheduling problems, are good candidates for forward chaining applications. These types of problems are often open-ended, meaning that many solutions or configurations can satisfy all the given constraints. Since the solutions cannot be known until they are generated, it would be impossible to work from the problem conclusions back to the inputs. There are, however, many problems with a finite number of solutions, and in these cases it may be advantageous to begin with the conclusions and work toward the inputs.

Backward Chaining

Backward chaining is the process of reasoning from conclusions to inputs. Backward chaining systems assume an answer and then attempt to prove or disprove that assumption. To begin this process, the system selects a rule whose conclusion yields a solution. The system then attempts to satisfy the rule by obtaining values for the variables in the rule's premises. For each premise, the system first checks the fact base for a value, then searches for a rule that can generate the value necessary to satisfy the premise, and finally asks the user when all else fails. If the fact base contains a value that contradicts the premise, the system discards the assumed solution as invalid and assumes a new one by moving on to a different rule. If the fact base contains a value matching one of the rule's premises, the system continues to assume that the rule is correct and attempts to prove the next premise, until all the premises are satisfied. If no value is found in the fact base for a premise, but a rule is discovered that can derive its value, the system attempts to prove the premises of that rule through the same process. If, however, no rule capable of satisfying the premise can be found, the system asks the user as a last resort. The user then enters a value, which the system adds to the fact base. If the value entered by the user matches the necessary premise value, the system continues trying to prove the rule. If it contradicts the premise, the system moves on to a new rule that can generate a solution. This process continues until the system has either found a solution or exhausted all rules capable of yielding one.

The backward chaining process is much easier to understand with an example, so let us return to Doug and his cats. When the pet-sitter enters Doug's house, Ivan comes over to greet him. To identify Ivan, the pet-sitter consults the system that Doug designed for him. The backward chaining system knows that it has reached a solution when the variable Cat has a value. It begins by searching the list for a rule whose conclusion assigns a value to Cat. Rules R1, R2, and R3 do not have the variable Cat in the conclusion, so the system begins with R4. To prove that R4 is true, the system must satisfy the premise FurLength = medium. It searches the fact base and finds nothing. Next, it searches for a rule whose conclusion can generate FurLength = medium and discovers that R2 satisfies this requirement. The system now attempts to prove R2 by looking at its first premise, FurMeasurement > 1". The system again checks the fact base and finds no matching values, so it searches for a rule that generates a corresponding value. Finding none, it asks the user to input the length of the cat's fur in inches. The user inputs 0.5", the length of Ivan's fur. This value is saved in the fact base, but since 0.5" is not greater than 1", R2 fails and the system returns to R4. The system discovers that there are no more rules that can satisfy the premise FurLength = medium, so it discards R4 as false and proceeds to R5. R5's premise requires SolidColor = yes. The system searches the fact base and finds no values corresponding to SolidColor. It then searches for rules that can generate the value SolidColor = yes. Again it finds none. As a last resort, the system asks the user whether the cat is one solid color, and the user enters "no." Since SolidColor now has a contradicting value, R5 fails and the system moves to R6. The first premise of R6 is MajorColor = tan. The system again checks the fact base and then searches for rules that can satisfy the premise, but finds none. It asks the user for the cat's fur color. The user enters "tan." Since this satisfies the first premise, the system attempts to prove the second premise, FurLength = short. The system finds no facts in the fact base corresponding to FurLength; however, it finds that R1 can generate the desired value. The system then attempts to satisfy R1 by looking at its premise, FurMeasurement ≤ 1". It checks its fact base and discovers that the fact base contains the value FurMeasurement = 0.5". Since this value satisfies FurMeasurement ≤ 1", both premises of R6 are satisfied and the system concludes that the cat is Ivan.
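The walkthrough above can be reproduced with the following sketch. It is an illustrative Python rendering under stated assumptions, not the system's actual code: rules use the same hypothetical variable, comparison, value encoding, the hypothetical `answers` dictionary stands in for the user's replies, and a premise is sent to the user only when no rule could produce a value that satisfies it.

```python
import operator

# Figure 3-2 encoded as (name, premises, conclusion); every premise must hold.
RULES = [
    ("R1", [("FurMeasurement", operator.le, 1.0)], ("FurLength", "short")),
    ("R2", [("FurMeasurement", operator.gt, 1.0),
            ("FurMeasurement", operator.le, 2.0)], ("FurLength", "medium")),
    ("R3", [("FurMeasurement", operator.gt, 2.0)], ("FurLength", "long")),
    ("R4", [("FurLength", operator.eq, "medium")], ("Cat", "Princess")),
    ("R5", [("SolidColor", operator.eq, "yes")], ("Cat", "Panther")),
    ("R6", [("MajorColor", operator.eq, "tan"),
            ("FurLength", operator.eq, "short")], ("Cat", "Ivan")),
    ("R7", [("MajorColor", operator.eq, "black"),
            ("SolidColor", operator.eq, "no")], ("Cat", "Jimmy")),
]


def backward_chain(answers, goal="Cat"):
    """Return (solution, questions asked); `answers` simulates the user."""
    facts, questions = {}, []

    def holds(var, op, value):
        if var in facts:  # 1. check the fact base
            return op(facts[var], value)
        producers = [r for r in RULES if r[2][0] == var and op(r[2][1], value)]
        if producers:  # 2. try to prove a rule that concludes a suitable value
            for _, premises, (cvar, cval) in producers:
                if all(holds(*p) for p in premises):
                    facts[cvar] = cval
                    return True
            return False  # every candidate rule failed
        questions.append(var)  # 3. last resort: ask the user
        facts[var] = answers[var]
        return op(facts[var], value)

    # Assume each solution in turn, in rule order, until one is proven.
    for _, premises, (var, value) in RULES:
        if var == goal and all(holds(*p) for p in premises):
            facts[var] = value
            return value, questions
    return None, questions


print(backward_chain({"FurMeasurement": 0.5, "SolidColor": "no", "MajorColor": "tan"}))
# prints ('Ivan', ['FurMeasurement', 'SolidColor', 'MajorColor'])
```

The questions are asked in the same order as in the narrative (fur measurement, solid color, major color), and the second FurLength premise is resolved through R1 from the stored measurement without asking again.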

Backward chaining systems can only be used for problems that involve a finite number of conclusions. Diagnostic problems, in which the inputs outnumber the solutions, are good candidates for backward chaining applications. Diagnostic systems range from determining automobile malfunctions to identifying diseases in the medical field (Gonzalez & Dankel, 1993).

Inference Networks

Inference networks are among the simplest rule-based systems and can only be used when the relationships among the rules are known in advance. Figure 3-3 shows an inference network that was translated directly from the rules in Figure 3-2. Note that an arc is drawn where lines intersect. The arc represents the AND operator; although not included in this example, the absence of an arc implies an OR operator. Because the relationships among the rules are known ahead of time, inference networks only need to execute the rules directly connected to facts and rules that have been satisfied. This makes inference networks significantly more efficient than the exhaustive search used by the pattern matching systems discussed above. The drawback is that inference networks are often impractical or infeasible for complex systems with a large number of interacting rules.

Although Figure 3-3 is correct for the rules that Doug generated in the example above, it should be noted that the inference network does not contain all of the information available for each cat shown in Table 3-2. Should a second solid-colored cat be added to the knowledge base, the inference network as drawn would require modification. To allow for expansion of the knowledge base, it may be better to include all of the available information (Figure 3-4). Unfortunately, this results in a loss of efficiency, since the system would then require all three of Panther's characteristics to identify him.


[Diagram: inference network linking Major Color (Black, Tan), Solid Color (Yes, No), and Fur Measurement (through rules R1 to R3) to Fur Length (Short, Medium, Long) and to the conclusions Jimmy, Ivan, Panther, and Princess, following rules R1 to R7]

Figure 3-3. Inference network for Doug's cats based on rules in Figure 3-2


[Diagram: the same evidence nodes as in Figure 3-3, with each cat linked to all three of its characteristics from Table 3-2]

Figure 3-4. Exhaustive inference network for Doug's cats
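The efficiency argument for inference networks can be illustrated with a small event-driven sketch. In this hypothetical Python encoding, the network is precomputed as an index from each variable to the rules whose premises mention it, so a new fact only triggers evaluation of the rules wired to it. This illustrates the idea behind Figure 3-3; it is not a Rete implementation and not code from the source.

```python
import operator
from collections import defaultdict, deque

# Figure 3-2 encoded as name -> (premises, conclusion).
RULES = {
    "R1": ([("FurMeasurement", operator.le, 1.0)], ("FurLength", "short")),
    "R2": ([("FurMeasurement", operator.gt, 1.0),
            ("FurMeasurement", operator.le, 2.0)], ("FurLength", "medium")),
    "R3": ([("FurMeasurement", operator.gt, 2.0)], ("FurLength", "long")),
    "R4": ([("FurLength", operator.eq, "medium")], ("Cat", "Princess")),
    "R5": ([("SolidColor", operator.eq, "yes")], ("Cat", "Panther")),
    "R6": ([("MajorColor", operator.eq, "tan"),
            ("FurLength", operator.eq, "short")], ("Cat", "Ivan")),
    "R7": ([("MajorColor", operator.eq, "black"),
            ("SolidColor", operator.eq, "no")], ("Cat", "Jimmy")),
}

# The "network": each variable is wired to the rules whose premises use it.
NETWORK = defaultdict(list)
for name, (premises, _) in RULES.items():
    for var in dict.fromkeys(v for v, _, _ in premises):
        NETWORK[var].append(name)


def propagate(inputs, goal="Cat"):
    """Push facts through the network; return (solution, rules evaluated)."""
    facts, evaluated = {}, []
    pending = deque(inputs.items())
    while pending:
        var, value = pending.popleft()
        if var in facts:
            continue
        facts[var] = value
        for name in NETWORK.get(var, []):  # only rules connected to this fact
            evaluated.append(name)
            premises, conclusion = RULES[name]
            if all(v in facts and op(facts[v], x) for v, op, x in premises):
                pending.append(conclusion)  # a satisfied rule emits a new fact
    return facts.get(goal), evaluated


print(propagate({"FurMeasurement": 0.5, "MajorColor": "tan"}))
```

For this input, rule R5 is never evaluated because SolidColor was never supplied, which is the saving the text attributes to inference networks.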

Decision Trees

A decision tree is a specialized form of inference network that arranges inputs in a hierarchical fashion. Rather than using every available input from the beginning, a decision tree addresses only one input at a time. Once that input has been assigned a value, the next input in the tree is addressed, until enough information has been gathered to offer a solution. Figure 3-5 displays an example decision tree for Doug and his cats. As shown, the system first asks the user for the cat's major color. After receiving an input, it asks whether the cat is a solid color or not. Given a black cat, the system can offer a solution after two questions. Tan cats, however, require three questions to identify. Note that if the user informs the system that the cat is tan and solid colored, a null set is reached, causing the system to output an error message or backtrack in an attempt to find a solution. It is also important to understand that the decision point for solid color need not be located at the same depth on every branch of the tree.




Major Color
   Black -> Solid Color
      Yes -> Panther
      No -> Jimmy
   Tan -> Solid Color
      Yes -> ∅ (null set)
      No -> Fur Length
         Short -> Ivan
         Medium -> Princess
         Long -> ∅ (null set)

Figure 3-5. Decision tree for identifying Doug's cats
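Figure 3-5 can be encoded directly as nested data. The sketch below is a hypothetical Python encoding, not taken from the source: an internal node is an (attribute, branches) pair, `None` marks a null leaf, and `classify` asks one question per level, recording the questions so that the two-question versus three-question difference between black and tan cats is visible.

```python
# Figure 3-5: internal nodes are (attribute, branches); None is a null leaf.
TREE = ("MajorColor", {
    "black": ("SolidColor", {"yes": "Panther", "no": "Jimmy"}),
    "tan": ("SolidColor", {
        "yes": None,
        "no": ("FurLength", {"short": "Ivan", "medium": "Princess", "long": None}),
    }),
})


def classify(tree, answers):
    """Walk the tree one input at a time; return (cat or None, questions asked)."""
    node, questions = tree, []
    while isinstance(node, tuple):
        attribute, branches = node
        questions.append(attribute)
        node = branches[answers[attribute]]  # follow the branch for this answer
    return node, questions


print(classify(TREE, {"MajorColor": "black", "SolidColor": "no"}))
print(classify(TREE, {"MajorColor": "tan", "SolidColor": "no", "FurLength": "short"}))
```

A `None` result corresponds to the null set in the figure; a fuller system would report an error or backtrack at that point.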

The ordering of the decision tree in Figure 3-5 was assigned arbitrarily; however, there are mathematical approaches based on information theory that seek to minimize the number of branches necessary to solve a problem. The most widely recognized approach is the ID3 algorithm, which was created by J. Ross Quinlan and is discussed in detail in Gonzalez and Dankel (1993). More recently, C4.5, a descendant of ID3 also created by Quinlan, has become popular (Quinlan, 1996). Other variations of decision trees enable the system to handle uncertainty. One such approach, implemented by Althoff et al. (1998), adds an extra branch to encompass uncertainty. For example, the node requesting a major color would include not only the responses black and tan, but also a branch representing that the color could not be determined. Certainty factors, discussed in the following section, are another method for handling uncertainty in rule-based systems.
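As an illustration of the information-theoretic idea behind ID3, the sketch below applies the standard information-gain criterion to Table 3-2, scoring each attribute by how much it reduces the entropy of the cat's identity. It is a sketch of the general criterion, not code from Gonzalez and Dankel; the data layout and function names are assumptions.

```python
from collections import Counter
from math import log2

# Table 3-2: (cat, MajorColor, SolidColor, FurLength)
CATS = [
    ("Princess", "tan", "no", "medium"),
    ("Panther", "black", "yes", "short"),
    ("Ivan", "tan", "no", "short"),
    ("Jimmy", "black", "no", "short"),
]
COLUMNS = {"MajorColor": 1, "SolidColor": 2, "FurLength": 3}


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the cat's identity minus the weighted entropy after splitting."""
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[0] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[0] for row in rows]) - remainder


gains = {attr: information_gain(CATS, col) for attr, col in COLUMNS.items()}
root = max(gains, key=gains.get)
print({attr: round(g, 3) for attr, g in gains.items()}, root)
```

Major color splits the four cats into two equal pairs and gains a full bit, while solid color and fur length each isolate only one cat (about 0.811 bits), so ID3 would place major color at the root. Applied recursively, the same criterion would ask about fur length rather than solid color on the tan branch, saving one question compared with Figure 3-5.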

Certainty Factors

The use of certainty factors is one of the oldest and most established methods of handling uncertainty in rule-based systems. Certainty factors were originally created for use in MYCIN, an expert system for the treatment of infectious blood diseases (Buchanan & Shortliffe, 1984a). MYCIN was the first medical expert system and is discussed in more detail in Chapter 4. Certainty factor values range from -1 to 1, where -1 represents a statement being false, 1 represents a statement being true, and 0 represents complete uncertainty as to whether the statement is true or false. Each rule is assigned a certainty factor (CF) that represents confidence that a statement is true or false. The general form of a rule with a certainty factor is:

If <evidence>,
Then <hypothesis> (CF),

where <evidence> is the premise containing the observed or derived facts and <hypothesis> is the conclusion that results from satisfying the premise. CF represents the confidence that the hypothesis is correct, given the evidence.

Certainty factors can be assigned in a number of ways. Some may be assigned subjectively by asking an expert to assign a value of confidence to a rule based on past experience. Others may be determined by using probability to calculate a measure of belief and a measure of disbelief, then mathematically combining the results to yield a certainty factor. Regardless of how they are determined, the math for combining and propagating certainty factors is as follows (Buchanan & Shortliffe, 1984b):

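$$CF_{revised} = CF_{old} + CF_{new}(1 - CF_{old}), \quad \text{if } CF_{old} \text{ and } CF_{new} > 0, \tag{3-1}$$
$$CF_{revised} = -\left(-CF_{old} - CF_{new}(1 + CF_{old})\right), \quad \text{if } CF_{old} \text{ and } CF_{new} < 0, \tag{3-2}$$
$$CF_{revised} = \frac{CF_{old} + CF_{new}}{1 - \min(|CF_{old}|, |CF_{new}|)}, \quad \text{if exactly one of } CF_{old}, CF_{new} \text{ is negative}. \tag{3-3}$$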

These equations assume that a rule's premises are known with absolute certainty. Unfortunately, such an assumption is often false, because premises also have associated certainty factors. To handle situations in which the evidence itself may be uncertain, the following rules of combination are used (Gonzalez & Dankel, 1993):

1. A rule with a single uncertain premise yields a CF that is the product of the conclusion's
CF and the premise's CF.

2. A rule with a conjunction of uncertain premises yields a CF that is the product of the
conclusion's CF and the minimum CF of all the premises.

3. A rule with a disjunction of uncertain premises yields a CF that is the product of the
conclusion's CF and the maximum CF of all the premises.

To better understand the use of these equations and rules, let us consider an example from driver training. In driver training, students are told that they should scan ahead for dangerous driving situations. When they identify a possibly dangerous situation, they should predict what might occur, then decide what to do and execute their planned course of action. Let us imagine that Steve is an android that has been created to function as a normal human being in society. One day, he is riding his motorcycle along a narrow side street where a car is parked on the left-hand side. In general, he would want to stay farther to the right of the street to minimize the danger of an unseen child running out from behind the car or a person in the car opening a door into his path. As Steve nears the car, however, he notices a young boy on the right-hand side of the street. He must now decide whether the parked car or the child is more likely to create a dangerous driving situation. Surveying the situation, Steve notices that the child's mother is present but appears preoccupied with her gardening. The child is playing with a ball and seems completely oblivious to the approaching motorcycle. Steve's computer brain begins processing this information by assigning certainty factors to his observations. He is confident that the child is playing with a ball and that the mother is present, so he assigns these premises a CF = 0.9. He is only half certain that the mother is inattentive, so he assigns this premise a CF = 0.5. He is fairly certain that the child has not noticed the vehicle, so he assigns that premise a CF = 0.7. Steve recalls the rules of thumb that he learned from similar driving experiences:

Rule 1: If the child is playing with a ball,
then the child will run out into the street. (CF = 0.6)

Rule 2: If the child's parent is present AND not attentive,
then the child will run out into the street. (CF = -0.3)

Rule 3: If the child has not noticed the vehicle,
then the child will run out into the street. (CF = 0.7)

Steve initializes his CF to zero and starts with Rule 1. Based on the first rule of combination, he combines the certainty factors of the premise (CF = 0.9) and the conclusion (CF = 0.6) by multiplying them to obtain CF = 0.54. He then applies Equation 3-1 to calculate his new certainty factor:

$$CF_{revised} = 0 + 0.54(1 - 0) = 0.54. \tag{3-4}$$

Next, Steve moves to Rule 2. Using the second rule of combination, he selects the lower of the two CFs for the premises (CF = 0.5) and multiplies it by the CF for the conclusion (CF = -0.3) to obtain CF = -0.15. He then combines the current CF of 0.54 with the new CF using Equation 3-3:

$$CF_{revised} = \frac{0.54 + (-0.15)}{1 - \min(|0.54|, |-0.15|)} = \frac{0.39}{0.85} \approx 0.46. \tag{3-5}$$

Finally, Steve addresses Rule 3. Using the first rule of combination, he combines the certainty factors of the premise (CF = 0.7) and the conclusion (CF = 0.7) by multiplying them to get CF = 0.49. Using Equation 3-1, he combines the current CF of 0.46 with the new CF to obtain his answer:

$$CF_{final} = 0.46 + 0.49(1 - 0.46) \approx 0.72. \tag{3-6}$$

So, Steve has a certainty factor of 0.72 that the child will run out into the street in front of him. Since this is a fairly high value, Steve chooses to move away from the right-hand side of the street, where the child is, and closer to the car on the left-hand side.
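Steve's calculation can be replayed with a few lines of Python. The sketch below implements Equations 3-1 through 3-3 and the three rules of combination for uncertain premises; the function names are hypothetical, and treating a zero CF like a positive one in Equation 3-1 is an assumption that matches how Steve starts from CF = 0.

```python
def combine(cf_old, cf_new):
    """Update a running certainty factor with new evidence (Equations 3-1 to 3-3)."""
    if cf_old >= 0 and cf_new >= 0:
        return cf_old + cf_new * (1 - cf_old)  # Equation 3-1
    if cf_old < 0 and cf_new < 0:
        return cf_old + cf_new * (1 + cf_old)  # Equation 3-2, simplified
    return (cf_old + cf_new) / (1 - min(abs(cf_old), abs(cf_new)))  # Equation 3-3


def rule_cf(conclusion_cf, premise_cfs, connective="and"):
    """Rules of combination: a single premise or AND uses the minimum, OR the maximum."""
    evidence = max(premise_cfs) if connective == "or" else min(premise_cfs)
    return conclusion_cf * evidence


cf = 0.0
cf = combine(cf, rule_cf(0.6, [0.9]))        # Rule 1: child playing with a ball
cf = combine(cf, rule_cf(-0.3, [0.9, 0.5]))  # Rule 2: parent present AND inattentive
after_rule_2 = cf
cf = combine(cf, rule_cf(0.7, [0.7]))        # Rule 3: child has not noticed
print(round(after_rule_2, 2), round(cf, 2))  # prints 0.46 0.72
```

Carrying the unrounded intermediate value (0.39 / 0.85) through Rule 3 still gives 0.72 to two decimal places, so rounding at each step, as in Equations 3-5 and 3-6, does not change the conclusion here.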

Case-Based Reasoning

Case-based reasoning is the method of using documented examples and their solutions to solve problems. Unlike traditional methods, in which the system designer must generate a cohesive set of rules that yield the correct answer, case-based reasoning systems generally maintain a database of examples, or cases, that are used to solve a problem. The basic structure of a case-based reasoning system consists of a library or database of historical cases, a way to retrieve similar cases from the library, and a way to modify the solutions if a retrieved case is not identical to the problem. When a problem is introduced to a case-based reasoning system, the system searches its database for cases with similar attributes. When a sufficient number of similar cases are discovered, their solutions are combined and modified to better match the problem being solved before the final solution is presented to the user. Greater detail on case-based reasoning can be found in Kolodner (1993).

To better understand case-based reasoning, let us examine an example adapted from Gonzalez and Dankel (1993). Alice is considering selling her home in Wonderland; however, she is unsure how much her house is worth. Searching the Internet, Alice discovers an online case-based reasoning system designed to estimate the current market value of houses by region. The system accomplishes this by keeping a record of all the recent home sales in each region, as well as basic information on each house, including the square footage, number of bedrooms, number of bathrooms, and whether the house has a pool. Alice tells the system that her house has 1,500 square feet, 3 bedrooms, 3 bathrooms, and a pool. The system proceeds to look up all the property sales in Wonderland with size and characteristics comparable to Alice's house. At the end of the search, the five most similar houses are selected (Table 3-4). The system selects the first house as most similar to Alice's house, giving it a starting value of $150,000. The system now seeks to adapt this value to include the value of Alice's pool. The primary difference between the second and fourth houses is that the second house has a pool and the fourth house does not. The system takes the difference between the prices of these two houses and determines that a pool is worth approximately $45,000. The same operation is performed for the third and fifth houses, yielding an approximate value of $65,000 for the pool. Averaging these two values, the system determines that Alice's pool is worth approximately $55,000. Adding the pool's value to the starting value, the system estimates the value of Alice's house to be $205,000.

Table 3-4. Houses most similar to Alice's house
House ID | Price | Square footage | Bedrooms | Bathrooms | Pool?
1 | $150,000 | 1,470 | 3 | 3 | No
2 | $225,000 | 1,540 | 4 | 3 | Yes
3 | $180,000 | 1,480 | 3 | 2 | Yes
4 | $180,000 | 1,520 | 4 | 3 | No
5 | $115,000 | 1,460 | 3 | 2 | No
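The retrieve-and-adapt steps of Alice's example can be sketched in Python as follows. The similarity measure is an assumption, since the example does not say how similarity is scored: square-footage differences are counted per 100 square feet, bedroom and bathroom differences count one point each, and the pool is ignored during retrieval because the adaptation step accounts for it. Pool pairs are found by matching bedroom and bathroom counts, which reproduces the pairs 2/4 and 3/5 chosen in the text.

```python
from statistics import mean

# Table 3-4: (house ID, price, square footage, bedrooms, bathrooms, pool?)
CASES = [
    (1, 150_000, 1470, 3, 3, False),
    (2, 225_000, 1540, 4, 3, True),
    (3, 180_000, 1480, 3, 2, True),
    (4, 180_000, 1520, 4, 3, False),
    (5, 115_000, 1460, 3, 2, False),
]


def distance(case, query):
    """Hypothetical dissimilarity score; the pool is left to the adaptation step."""
    _, _, sqft, beds, baths, _ = case
    return (abs(sqft - query["sqft"]) / 100
            + abs(beds - query["beds"]) + abs(baths - query["baths"]))


def estimate_price(query, cases):
    """Return (base house ID, estimated pool value, adapted price)."""
    # Retrieve: the most similar past sale supplies the starting value.
    base = min(cases, key=lambda c: distance(c, query))
    # Adapt: price a pool from pairs with equal bedroom and bathroom counts
    # in which only one house has a pool, then average the differences.
    pool_value = mean(a[1] - b[1] for a in cases if a[5]
                      for b in cases if not b[5] and a[3:5] == b[3:5])
    adjustment = (int(query["pool"]) - int(base[5])) * pool_value
    return base[0], pool_value, base[1] + adjustment


alice = {"sqft": 1500, "beds": 3, "baths": 3, "pool": True}
print(estimate_price(alice, CASES))
```

With these assumptions the sketch selects house 1 as the base case, values the pool at $55,000, and arrives at the same $205,000 estimate as the text; a different similarity measure could select a different base case.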

There are some distinct advantages to case-based reasoning. First, case-based reasoning bypasses the bottleneck of gathering knowledge from experts and converting it into rules. Second, case-based reasoning can be used in fields where examples abound but the fundamental principles are not well understood. As long as the system can access a sufficient number of cases, it can still function. The drawback is that a case-based approach cannot be implemented without a well-documented set of available cases. Much research has been performed in the area of case-based reasoning in recent years; for a summary of this research, refer to Nilsson and Sollenborn (2004). For an example of case-based reasoning applied to the medical field, refer to papers on CASEY, a system for diagnosing heart failure (Koton, 1988). Also, see Althoff et al. (1998) for a case-based reasoning system applied directly to the field of toxicology.

Nearest Neighbor Approaches

Another common approach for solving classification problems is the use of nearest neighbor methods. Nearest neighbor methods require a number of training samples whose characteristics have been parametrized to create numerical vectors. Each vector can be thought of as the point in n-dimensional solution space occupied by the sample, where n is the number of parameters in the vector. Once all the training samples are situated in the solution space, a clustering algorithm is used to label regions within the space as corresponding to specific classes of objects. When an unknown object is introduced to the system, the system parametrizes the object by creating a vector of the same form as the training samples. These vector coordinates are used to calculate the distance between the object's vector and each of the classification regions. Finally, the unknown object is classified based on the label of the nearest region (Han & Kamber, 2001).

Let us examine a simple nearest neighbor system designed to identify sports balls at a recreation center. Each ball is classified based on size and color. The system could use the diameter of the ball in centimeters as its size value; however, our system simplifies the problem by identifying balls as small, medium, or large, with corresponding values of 1, 2, and 3, respectively. Since some colors might be misidentified by certain people, it is important to assign similar colors consecutive numbers. For this reason, the system numbers the colors in the order of the visible light spectrum, where red = 1, orange = 2, yellow = 3, green = 4, blue = 5, indigo = 6, and violet = 7. Table 3-5 shows six types of balls that the recreation center stocks, along with their sizes, colors, and the corresponding values. (For readers unfamiliar with indoor soccer balls, simply imagine a tennis ball the size of a standard soccer ball.) The final column of the table shows the vectors containing the size and color values. Each vector situates a ball as a point in the domain's 2-dimensional solution space (Figure 3-6).

Table 3-5. Characteristics of various sports balls
Type | Size | Size value | Color | Color value | Vector
Tennis ball | Small | 1 | Green | 4 | (1, 4)
Racket ball | Small | 1 | Blue | 5 | (1, 5)
Water polo ball | Medium | 2 | Yellow | 3 | (2, 3)
Indoor soccer ball | Medium | 2 | Green | 4 | (2, 4)
Four square ball | Large | 3 | Red | 1 | (3, 1)
Basketball | Large | 3 | Orange | 2 | (3, 2)


[Plot: the six ball types placed at their (size, color) vectors from Table 3-5; horizontal axis Size (0 to 4), vertical axis color value (0 to 7)]

Figure 3-6. Vectors for sports balls plotted in 2-dimensional solution space

If one of the workers at the recreation center were trying to identify an unknown ball that he describes as a "large, green ball," he could input the description into the system. For this example, let us assume that the system uses Euclidean distance to determine the nearest neighbor. The Euclidean distance from point p to point q is defined as:

$$d(p, q) = \sqrt{(x_{p1} - x_{q1})^2 + (x_{p2} - x_{q2})^2 + \cdots + (x_{pn} - x_{qn})^2}, \tag{3-7}$$

where $p = (x_{p1}, x_{p2}, \ldots, x_{pn})$ and $q = (x_{q1}, x_{q2}, \ldots, x_{qn})$ are n-dimensional vectors that define the objects. To identify the "large, green ball," the ball's description must be parametrized. Looking at the definitions above, we can see that large is defined as 3 and green as 4, giving the ball a vector of (3, 4). Next, the distance between the unknown ball and each ball type within the system must be measured as follows:

$$d(p, q) = \sqrt{(x_{p,Size} - x_{q,Size})^2 + (x_{p,Color} - x_{q,Color})^2}, \tag{3-8}$$

where p represents the unknown ball and q represents a ball recorded in the system. Calculating the distance to each ball in the system yields:

$$d(p, \text{TennisBall}) = \sqrt{(3-1)^2 + (4-4)^2} = 2, \tag{3-9}$$
$$d(p, \text{RacketBall}) = \sqrt{(3-1)^2 + (4-5)^2} \approx 2.236, \tag{3-10}$$
$$d(p, \text{WaterPoloBall}) = \sqrt{(3-2)^2 + (4-3)^2} \approx 1.414, \tag{3-11}$$
$$d(p, \text{IndoorSoccerBall}) = \sqrt{(3-2)^2 + (4-4)^2} = 1, \tag{3-12}$$
$$d(p, \text{FourSquareBall}) = \sqrt{(3-3)^2 + (4-1)^2} = 3, \tag{3-13}$$
$$d(p, \text{BasketBall}) = \sqrt{(3-3)^2 + (4-2)^2} = 2. \tag{3-14}$$

From these results, the ball with the smallest Euclidean distance is selected as the nearest neighbor. Namely, the unknown "large, green ball" is identified as an indoor soccer ball.
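The same calculation can be written as a brief Python sketch. The vectors come from Table 3-5, and `math.dist` (available since Python 3.8) computes the Euclidean distance of Equation 3-7; as in the simplified example, no clustering step is included, and the names `BALLS` and `nearest_neighbor` are illustrative.

```python
import math

# Table 3-5: (size value, color value) for each ball type.
BALLS = {
    "Tennis ball": (1, 4),
    "Racket ball": (1, 5),
    "Water polo ball": (2, 3),
    "Indoor soccer ball": (2, 4),
    "Four square ball": (3, 1),
    "Basketball": (3, 2),
}


def nearest_neighbor(query):
    """Return the ball type whose vector is closest to `query` (Equation 3-7)."""
    return min(BALLS, key=lambda name: math.dist(query, BALLS[name]))


unknown = (3, 4)  # the "large, green ball"
for name, vector in BALLS.items():
    print(f"{name}: {math.dist(unknown, vector):.3f}")
print(nearest_neighbor(unknown))  # prints Indoor soccer ball
```

The printed distances match Equations 3-9 through 3-14; replacing the single vector per ball with the centroid of many samples would turn this into the clustered variant described next.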

The example above is a simplistic system. It is extremely limited in size and represents only one approach to solving classification problems with the nearest neighbor method. In reality, there is no limit, other than computational power, to the number of parameters that can be included in a vector. Characteristic parameters are not limited to linear numerical values; they can also include non-linear values, binary values, nominal variables with multiple states, and many other representations. Euclidean distance is only one of many methods for determining the nearest neighbor and may not be appropriate for some parameters. Although our system did not use a clustering algorithm, most real-world systems do. One way to incorporate a clustering algorithm into the sports ball recognition system would involve taking many samples of each ball type and entering them into the system. Perhaps only some tennis balls are green, while others are yellow or orange. Some systems might take a sampling of all of these types of tennis balls and then calculate the centroid as the point to which distance should be measured. More complex systems might define multiple points or a region to represent the tennis ball. For more general information on nearest neighbor methods and clustering algorithms, consult Han and Kamber (2001). Bradley et al. (1998) discuss the scaling of clustering algorithms to handle large databases.

Bayes' Rule

Bayes' rule is perhaps the most widely known and implemented technique for uncertainty management. Given certain knowledge, it enables the user to identify and select the most likely solution through the use of probability theory. Bayes' rule is defined as:


$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}, \tag{3-15}$$

where p(x) and p(y) are the probabilities of events x and y occurring, respectively. The probability of event x occurring, given that event y has occurred, is represented by p(x | y). Likewise, the probability of event y occurring, given that event x has occurred, is represented by p(y | x) (Duda et al., 2001).

To put this equation in perspective, let us look at an example. The University of Florida campus includes a lake called Lake Alice. The lake is known to have alligators, so many people go to Lake Alice in the hope of seeing an alligator in the wild. Many birds also inhabit the area around Lake Alice, including ducks. Unlike humans, who may have a hard time locating an alligator, ducks are more aware of their surroundings and tend to steer clear of areas where an alligator is present. By gathering data from many visits to Lake Alice, it has been determined that the probability of seeing a duck, p(duck), is 0.8, and the probability of seeing an alligator, p(gator), is 0.4. Since ducks avoid alligators, the probability of seeing a duck given that an alligator is present, p(duck | gator), is only 0.2. With this knowledge, we visit the lake in an attempt to locate an alligator. Looking around, we notice that there are ducks present, so we use Bayes' rule to calculate the probability that an alligator is present:

$$p(\text{gator} \mid \text{duck}) = \frac{p(\text{duck} \mid \text{gator})\, p(\text{gator})}{p(\text{duck})} = \frac{0.2 \times 0.4}{0.8} = 0.1. \tag{3-16}$$

We find that the probability of an alligator being present is only one in ten, so we probably

should come back to look for alligators on a different day.
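The Lake Alice calculation of Equation 3-16 can be written as a small Python function. This is a minimal sketch that assumes the three probabilities are already known; the function name `bayes` and its parameter names are hypothetical.

```python
def bayes(p_x_given_y, p_y, p_x):
    """Return p(y | x) = p(x | y) * p(y) / p(x), as in Equation 3-15."""
    if p_x <= 0:
        raise ValueError("p(x) must be positive")
    return p_x_given_y * p_y / p_x

# Lake Alice example (Equation 3-16): y = gator, x = duck.
p_gator_given_duck = bayes(p_x_given_y=0.2, p_y=0.4, p_x=0.8)
print(round(p_gator_given_duck, 2))  # 0.1
```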

Bayes' rule has a distinct advantage over many other methods in that it has the support of well-established mathematical theory. Bayes' rule is limited, however, in that applying it to many observations at once typically requires assuming that all observations are mutually independent. Unfortunately, this assumption rarely holds in the real world. One

proposed solution to this problem is the use of Bayesian belief networks, which are discussed

below in the section entitled "Modern Approaches for Diagnosing Multiple Disorders." In spite

of its limitations, Bayes' rule can be applied effectively in many situations and can be expanded

to include the probability of many events. For further information on Bayes' rule, including its

derivation, refer to Duda et al. (2001).









Other Approaches to Knowledge-Based Systems


Fuzzy Logic

In 1965, Lotfi Zadeh wrote a paper introducing "fuzzy sets" to the world. This paper was

the birth of fuzzy logic. Fuzzy logic is a generalization of Boolean logic that allows partial

membership within different sets or categories. Boolean variables can only be absolutely true,

represented by a 1, or absolutely false, represented by a 0. Fuzzy logic, however, allows

variables to be partially true and partially false, represented by any value between 0 and 1.

In a normal Boolean representation, a person can be tall or not tall. The problem with this

representation is that there must be a clean cutoff for where tall begins and ends. If 6' were set

as the cutoff for being tall, someone with a height of 5'11.9" would be considered not tall. Such

differentiation does not fully represent the world in which we live because our world is not

discrete. To compound the problem, human beings often think and speak using general,

imprecise language where the characteristics of things, such as tallness, are subjective in nature.

Fuzzy logic is an attempt to capture the meaning of the imprecise, or fuzzy, statements inherent

to human thinking and represent them in a manner that enables a system to solve problems.

Figure 3-7 shows a fuzzy logic graph with four sets: midget, short, tall, and giant. The

graph shows that the set(s) to which a person belongs have memberships ranging from 0 to 1

depending on the height of the individual. Based on the graph, a person with a height of 5' is

considered fully short with a value of 1 and belongs to no other sets because they all have a

membership value of 0 at 5'. Likewise, a person with a height of 6' is tall, a person shorter than

4' is a midget, and a person taller than 7' is a giant with no membership in any other sets. What

happens between these heights demonstrates the difference between fuzzy logic and Boolean

algebra. If Joann is 5'6" tall, as shown on the chart, she has a 0.5 membership in both short and

tall sets. Likewise, Ryan, who is 6'9", has a membership of 0.25 in tall and 0.75 in giant.









Although Figure 3-7 is drawn with linear slopes, any variety of functions may be used.

Furthermore, it is not necessary for memberships to add up to 1. If a medium height membership

were created with its peak at 5'6", Joann would then have a membership of 1 in the medium set

in addition to her 0.5 membership in both short and tall sets.



[Plot: membership in the sets Midget, Short, Tall, and Giant versus Height (feet), with marks at 4', 5', 6', and 7']


Figure 3-7. Fuzzy logic graph for human heights
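The membership curves in Figure 3-7 can be approximated with piecewise-linear functions. The breakpoints below are inferred from the description in the text rather than read from the original plot: full Midget membership at or below 4', Short peaking at 5', Tall peaking at 6', and full Giant membership at or above 7'. The function names are hypothetical. Under these assumptions the sketch reproduces the memberships given for Joann (5'6") and Ryan (6'9").

```python
def falling(x, full, zero):
    """1 at or below `full`, 0 at or above `zero`, linear in between."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

def rising(x, zero, full):
    """0 at or below `zero`, 1 at or above `full`, linear in between."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)

def triangle(x, left, peak, right):
    """0 outside (left, right), 1 at `peak`, linear on both slopes."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def height_memberships(feet):
    """Memberships in the four sets of Figure 3-7 (breakpoints assumed)."""
    return {
        "midget": falling(feet, 4.0, 5.0),
        "short": triangle(feet, 4.0, 5.0, 6.0),
        "tall": triangle(feet, 5.0, 6.0, 7.0),
        "giant": rising(feet, 6.0, 7.0),
    }

print(height_memberships(5.5))   # Joann, 5'6": short 0.5, tall 0.5
print(height_memberships(6.75))  # Ryan, 6'9": tall 0.25, giant 0.75
```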

Fuzzy logic uses a modified version of Boolean operators to perform its operations. The

fundamental operators are as follows:

1. Complement: NOT(A) = 1 - A,
2. Union: A OR B = Max(A,B),
3. Intersection: A AND B = Min(A,B).

Other operators have been created to mimic human language, such as using $A^2$ to represent "very" and $\sqrt{A}$ to represent "more or less" functions.
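The three fundamental operators and the two linguistic modifiers can be sketched in Python as follows. The function names are hypothetical; the formulas are the ones listed above, and the example applies them to Ryan's memberships from the text (0.25 in tall, 0.75 in giant).

```python
import math

def f_not(a):
    return 1.0 - a          # complement

def f_or(a, b):
    return max(a, b)        # union

def f_and(a, b):
    return min(a, b)        # intersection

def very(a):
    return a ** 2           # "very" sharpens a membership

def more_or_less(a):
    return math.sqrt(a)     # "more or less" relaxes a membership

# Ryan (6'9"): membership 0.25 in tall and 0.75 in giant.
tall, giant = 0.25, 0.75
print(f_or(tall, giant), f_and(tall, giant), f_not(tall))  # 0.75 0.25 0.75
print(very(giant), more_or_less(tall))                      # 0.5625 0.5
```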

Due to its imprecise nature and lack of mathematical proofs, fuzzy logic has many

opponents in the technical world. In spite of this, it has been successfully implemented in a

variety of fields including data mining and knowledge-based systems. Recent research in

applying fuzzy logic to data mining includes a system by Delgado et al. (2000) to mine medical

databases, Au and Chan's (2003) system for mining rules from a large banking database, and

Wang's (2003) system for generalized data mining. Liu and Yan (1997) have also created a









system that combines fuzzy networks and case-based reasoning to solve diagnostic problems.

For more general information on fuzzy logic, consult Gonzalez and Dankel (1993).

Dempster-Shafer

The Dempster-Shafer theory was introduced by Arthur Dempster in 1967 and later extended by Glenn Shafer in 1976. Although rarely used in practice due to its high computational requirements, the Dempster-Shafer theory is one of the classic approaches to handling uncertainty in knowledge-based systems. Dempster-Shafer is unique in that it assigns confidence values to sets of hypotheses, rather than solely to individual facts, and is capable of representing our "certainty about certainty" (Gonzalez & Dankel, 1993, p. 253). For more information regarding the Dempster-Shafer method, refer to Gonzalez and Dankel (1993) or the original paper by Arthur Dempster (1967).
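To make confidence values on sets concrete, the following Python sketch implements Dempster's rule of combination, the standard way the theory merges two independent bodies of evidence. It is an illustration only: the three candidate toxins A, B, and C and the mass values are hypothetical, and the function names are not taken from any library. Belief (the mass committed to subsets of a hypothesis) and plausibility (the mass that does not contradict it) bound the support for a hypothesis from below and above, which is one way to read "certainty about certainty."

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps frozensets of hypotheses to masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(mass for s, mass in m.items() if s <= hypothesis)

def plausibility(m, hypothesis):
    """Total mass that does not contradict the hypothesis."""
    return sum(mass for s, mass in m.items() if s & hypothesis)

# Hypothetical evidence about three candidate toxins A, B, and C.
frame = frozenset("ABC")
m1 = {frozenset("A"): 0.6, frame: 0.4}   # first source favors A
m2 = {frozenset("AB"): 0.7, frame: 0.3}  # second source favors A or B
m = combine(m1, m2)
a = frozenset("A")
print(round(belief(m, a), 2), round(plausibility(m, a), 2))  # 0.6 1.0
```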

Rough Sets

Rough set theory was proposed for the field of data mining by Zdzislaw Pawlak in 1982.

Rough sets are formed by examining the data available to the system and identifying any

extraneous feature points that are not necessary for differentiating between cases. These

extraneous features are then removed and the remaining features form a construct called a reduct

that is used for classification or identification of unknowns (Kusiak et al., 2000). Although using reducts is computationally efficient, users of expert systems can be "reluctant to make decisions based on the minimum number of features, rather they would like to see the same decision reached by alternative sets of features" (Kusiak et al., 2001, p. 225). For this reason, the use of rough sets can be abhorrent to doctors who feel that the more information used in making a decision, the better.
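The idea of a reduct can be illustrated with a brute-force Python sketch. It assumes a small, consistent decision table (no two cases share all feature values yet differ in decision). In that case a reduct is a minimal set of features that still separates every pair of cases with different decisions. The clinical effects, exposures, and function names are hypothetical, and practical rough set tools use more efficient methods than trying every subset.

```python
from itertools import combinations

def is_consistent(cases, attributes):
    """True if cases that agree on `attributes` never disagree on the decision."""
    seen = {}
    for features, decision in cases:
        key = tuple(features[a] for a in attributes)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

def reducts(cases, attributes):
    """All minimal attribute subsets that keep the decision table consistent."""
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if is_consistent(cases, subset):
                found.append(subset)
    return found

# Hypothetical decision table: clinical effects (1 = present) and the exposure.
cases = [
    ({"fever": 1, "nausea": 1, "seizure": 0}, "X"),
    ({"fever": 1, "nausea": 0, "seizure": 0}, "Y"),
    ({"fever": 0, "nausea": 1, "seizure": 1}, "Z"),
    ({"fever": 0, "nausea": 0, "seizure": 1}, "Z"),
]
print(reducts(cases, ["fever", "nausea", "seizure"]))
# [('fever', 'nausea'), ('nausea', 'seizure')]
```

The example table has two reducts, which echoes the concern quoted above: the same decisions can be reached by alternative sets of features.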

Kusiak et al. (2000) offer an excellent short summary of rough sets as they relate to their research in diagnosing and treating lung abnormalities called solitary pulmonary nodules. Kusiak et al. (2001) present an algorithm based on rough sets for extracting rules relating to









heart arrhythmia. Tsumoto (2000) also presents a rough set approach to diagnosing diseases. To

increase system accuracy, Tsumoto's system creates both positive rules to "rule in" and negative

rules to "rule out" possible diseases.

Genetic Algorithms

John Holland introduced the idea of genetic algorithms in 1975. The philosophy behind genetic algorithms is to model natural selection. In short, natural selection states that

the organisms with the most advantageous genes for survival tend to pass their genetics on to the

next generation of organisms, while those with inferior genes tend to die before they reproduce.

Through this process, the offspring of a species gradually become fitter and more capable of

survival.

In the same way that the DNA of a species is divided into chromosomes, genetic

algorithms are made of building blocks of code, called primitives. These primitives are the

smallest functional units of code and cannot be separated. By randomly assembling algorithms

from primitives, the first generation of algorithms is created. Each of these algorithms is then

evaluated by a fitness function to quantify its performance, or fitness. The fitness of each

algorithm is used to determine the probability that the algorithm is selected to contribute to the

next generation of algorithms. The better the algorithm's fitness, the more likely it is to be

selected. There are three ways that an algorithm can contribute to the next generation:

reproduction, crossover, and mutation. For each of these methods, a certain percentage of the

current generation of algorithms is randomly selected. The algorithms selected for reproduction

are copied directly to the next generation without modification. The algorithms selected for

crossover are paired with a second algorithm. Both algorithms are broken at a random location

in their code. The segments are then exchanged between the algorithms so that each algorithm

has a piece of the other's code and these newly modified algorithms become a part of the next









generation. Algorithms chosen for mutation have a random segment of code deleted from their

programming and replaced by another randomly generated set of code. Once all of this has

occurred, the algorithms selected for reproduction, crossover, and mutation are all compiled to

become a new generation of algorithms. Like their predecessors, the new generation will be

evaluated by a fitness function and then some are selected to reproduce, crossover, or mutate to

create the next generation of algorithms (Nilsson, 1998).
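The generation cycle described above can be sketched in Python. For brevity, this sketch evolves fixed-length bit strings rather than programs assembled from code primitives, and the fitness function (the number of 1 bits), the crossover and mutation probabilities, and the population settings are hypothetical. Selection is fitness-proportionate, so fitter individuals are more likely to contribute to the next generation; the best individual seen so far is tracked separately.

```python
import random

def fitness(bits):
    """Hypothetical fitness function: the count of 1 bits."""
    return sum(bits)

def select(population, rng):
    """Fitness-proportionate (roulette-wheel) selection of one individual."""
    weights = [fitness(ind) + 1e-9 for ind in population]  # keep every weight positive
    return rng.choices(population, weights=weights, k=1)[0]

def next_generation(population, rng, p_crossover=0.6, p_mutation=0.1):
    """Build a new generation by reproduction, crossover, and mutation."""
    offspring = []
    while len(offspring) < len(population):
        r = rng.random()
        parent = select(population, rng)
        if r < p_crossover:
            # Crossover: cut both parents at one random point and swap the tails.
            other = select(population, rng)
            cut = rng.randrange(1, len(parent))
            offspring.append(parent[:cut] + other[cut:])
            if len(offspring) < len(population):
                offspring.append(other[:cut] + parent[cut:])
        elif r < p_crossover + p_mutation:
            # Mutation: replace a random segment with newly generated random bits.
            start = rng.randrange(len(parent))
            end = rng.randrange(start + 1, len(parent) + 1)
            segment = [rng.randint(0, 1) for _ in range(end - start)]
            offspring.append(parent[:start] + segment + parent[end:])
        else:
            # Reproduction: copy the individual unchanged.
            offspring.append(list(parent))
    return offspring

def evolve(length=20, size=30, generations=50, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        population = next_generation(population, rng)
        best = max(population + [best], key=fitness)  # best individual seen so far
    return best, population

best, population = evolve()
print(fitness(best), "of", len(best), "bits set")
```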

The process of creating a useful algorithm using the genetic algorithm approach takes

thousands to millions of iterations to complete and may never fully optimize the algorithm's

code. Due to their randomness, however, genetic algorithms have great utility in optimization

problems because they are much less likely to converge on local maxima or minima. Vinterbo

and Ohno-Machado (2000) have also applied genetic algorithms to the problem of diagnosing

multiple disorders. For more general information on genetic algorithms, refer to Nilsson (1998).

Artificial Neural Networks

Like genetic algorithms, artificial neural networks were inspired by nature. Artificial

neural networks are composed of units that roughly approximate the firing of a neuron in a

biological organism. In a biological organism, a neuron sits inactive until it is stimulated beyond

its activation threshold. When this occurs, the neuron fires, sending its signal on to the neurons connected to it. The firing of a neuron is an all-or-nothing response; there is no variability in the signal it sends. In the same way, a unit in an artificial neural network can be thought of as a threshold element that turns on when its weighted inputs exceed a certain threshold and outputs zero at all other times. In practice, the units employ a differentiable function, such as a sigmoid, to

approximate a step response. Differentiability is important in the training of a neural network,

which is discussed below.









The most common artificial neural network is known as a feedforward multilayer

perceptron. This type of neural network consists of an input layer, an output layer, and any

number of hidden layers. Figure 3-8 shows an example neural network with two hidden layers.

In creating a neural network, the designer must determine the inputs to the system as well as how

many outputs are necessary to solve the problem. The designer must also select the number of

hidden layers and the number of units to be included per layer. As seen in the figure, inputs are

directly connected to every unit in the first hidden layer. Likewise, every unit in the first hidden

layer is connected to every unit in the second layer and so on, all the way through the output

layer. Each connection has an associated weight that acts as a multiplier. By adjusting the

weights, the influence an input has on a specific unit can be controlled. The weights of a neural

network are usually initialized randomly with small values. A bias unit, which always outputs 1, is

also included at every layer. Adjusting the weight between the bias unit and another unit shifts

the activation threshold of that unit (Nechyba, 2003).

[Diagram: inputs x1, x2, ... and a bias unit connected to two layers of hidden units, which feed the output layer]
Figure 3-8. Typical artificial neural network with two hidden layers. Figure used with
permission from Nechyba (2003, p. 7).
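A forward pass through a network like the one in Figure 3-8 can be sketched as follows. Each unit computes a weighted sum of the previous layer's outputs plus the bias unit's output (always 1) and passes it through a sigmoid. The layer sizes, the initialization range, and the function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Differentiable approximation of a step (threshold) response."""
    return 1.0 / (1.0 + math.exp(-z))

def init_layers(sizes, seed=0):
    """Small random weights; each unit gets one extra weight for the bias unit."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def forward(inputs, layers):
    """Propagate inputs through fully connected layers of sigmoid units."""
    activations = list(inputs)
    for weights in layers:
        extended = activations + [1.0]  # the bias unit always outputs 1
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, extended))) for row in weights
        ]
    return activations

# Two inputs, two hidden layers of three units each, one output (hypothetical sizes).
layers = init_layers([2, 3, 3, 1])
print(forward([0.5, -1.0], layers))
```

Because the last weight in each row multiplies the constant bias output, changing it shifts where that unit's sigmoid turns on, which is the threshold adjustment described above.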









Training a neural network to solve problems involves adjusting the weights for the

connections between each unit. The most well known algorithm for adjusting weights is the

backpropagation algorithm, published in 1986 by Rumelhart and McClelland. By introducing an

input with a known solution to the neural network, the output of the system can be compared with the desired output. The backpropagation algorithm then uses the difference between the two to adjust the weights. By repeating this process many times with a variety of samples, the weights gradually converge to a local minimum of the error in an attempt to maximize the number of samples the system can correctly identify.
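The following is a minimal sketch of backpropagation for a network with a single hidden layer (simpler than Figure 3-8) and one sigmoid output unit, trained one sample at a time to reduce squared error. The learning rate, epoch count, hidden-layer size, and the logical-OR training set are hypothetical choices for illustration. Like any gradient descent method, it can settle into a local minimum rather than the best possible weights.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    """Online backpropagation for one hidden layer and one sigmoid output unit.

    Minimizes E = (t - y)**2 / 2 for each sample and returns a predict
    function plus the mean squared error before training and after each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Small random initial weights; the last weight in each row is the bias weight.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def run(x):
        xb = list(x) + [1.0]  # inputs plus bias unit
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden] + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
        return xb, hb, y

    def mse():
        return sum((t - run(x)[2]) ** 2 for x, t in samples) / len(samples)

    history = [mse()]
    for _ in range(epochs):
        for x, t in samples:
            xb, hb, y = run(x)
            # Error signal at the output unit (dE/d net input).
            delta_out = (y - t) * y * (1.0 - y)
            # Propagate the error back through the current output weights.
            delta_hidden = [delta_out * w_out[j] * hb[j] * (1.0 - hb[j]) for j in range(n_hidden)]
            for j in range(n_hidden + 1):
                w_out[j] -= lr * delta_out * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] -= lr * delta_hidden[j] * xb[i]
        history.append(mse())
    return (lambda x: run(x)[2]), history

# Logical OR as a tiny, hypothetical training set of (inputs, target) pairs.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
predict, history = train(samples)
print(f"error before: {history[0]:.3f}, after: {history[-1]:.3f}")
```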

Artificial neural networks are non-linear function approximators. Their strength lies in their ability to train themselves from sample cases; however, this is also their weakness. Because of the complexity of the network itself, it is hard to understand and explain the internal workings of a trained neural network. Abidi and Manickam (2002) have created a hybrid system using case-based reasoning and neural networks for data mining in medical systems. For a general discussion of neural networks, refer to Nilsson (1998).

Modern Approaches for Diagnosing Multiple Disorders

Over the years, researchers have implemented a variety of methods, including those

discussed above, in an attempt to diagnose problems involving multiple disorders. In most cases,

linearity and statistical independence cannot be assumed in problems of this nature. For this

reason, the challenge of efficiently and effectively diagnosing multiple disorders remains an

important area of research today. In recent years, two problem-solving methods appear to have come to the forefront. The first involves the modification of Bayesian methods to account for dependencies. The second involves the use of set covering theory to approach the problem.









Bayesian Belief Networks

As discussed above, applying Bayes' rule to multiple observations typically requires assuming that those observations are statistically independent. Over the years, many variations using Bayesian methods have been developed to

account for dependencies in a data set. In this section, we discuss perhaps the best documented

of these approaches, namely Bayesian belief networks.

Bayesian belief networks allow dependencies to be included in a system's probability

calculations. Figure 3-9 displays a graphical representation of a belief network. As shown, belief networks consist of two parts: a directed acyclic graph and conditional probability tables (Han & Kamber, 2001). The graph portion consists of nodes, which represent random events, and arcs, which portray statistical dependencies between nodes. The example in Figure 3-9 contains six random events, including DarkClouds, Humidity, and Rain. Arcs are drawn between DarkClouds and Rain as well as Humidity and Rain to show that the presence of DarkClouds and/or Humidity influences the likelihood of Rain. A conditional probability table is drawn to the right of the graph. Belief networks have one table for every node in the graph; however, only the table for the Rain node is given in this example. The table shows the various

probabilities for the occurrence of Rain given the presence or absence of Rain's parents,

DarkClouds and Humidity. Represented mathematically, the first column of the table states that:

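$$P(\text{Rain} = \text{True} \mid \text{DarkClouds} = \text{True}, \text{Humidity} = \text{True}) = 0.9, \tag{3-17}$$
$$P(\text{Rain} = \text{False} \mid \text{DarkClouds} = \text{True}, \text{Humidity} = \text{True}) = 0.1. \tag{3-18}$$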

There are many ways to train a Bayesian belief network. If the structure of the network is

known and the events are observable, conditional probability tables can be calculated using

standard probability and statistics calculations. If the structure is known but not all of the events

are observable, a gradient descent method can be used to determine a local optimum of










probabilities. For more general information on the structure and generation of Bayesian belief

networks, refer to Han and Kamber (2001).



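(A) [Graph with nodes DarkClouds, Humidity, Rain, Erosion, LandSlides, and RoadClosures; arcs lead from DarkClouds and from Humidity to Rain]

(B)

| | DC, H | DC, ¬H | ¬DC, H | ¬DC, ¬H |
|---|---|---|---|---|
| R | 0.9 | 0.7 | 0.4 | 0.1 |
| ¬R | 0.1 | 0.3 | 0.6 | 0.9 |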


Figure 3-9. Example Bayesian belief network. (A) Directed acyclic graph of dependencies.
(B) Conditional probability table for Rain, where R = Rain, DC = DarkClouds, and
H = Humidity.
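Inference in a belief network can be illustrated with the Rain table from Figure 3-9. This Python sketch enumerates the joint distribution of DarkClouds, Humidity, and Rain, assuming DarkClouds and Humidity are root nodes so that the joint factors as P(DarkClouds) P(Humidity) P(Rain | DarkClouds, Humidity). The figure gives no probabilities for the parent nodes, so the priors below are hypothetical, and the mapping of the four table columns to parent values assumes the common ordering (True, True), (True, False), (False, True), (False, False). The remaining nodes are omitted.

```python
from itertools import product

# Hypothetical priors for the two parent nodes (not given in Figure 3-9).
P_DARK_CLOUDS = 0.3
P_HUMIDITY = 0.6

# P(Rain = True | DarkClouds, Humidity), keyed by (DarkClouds, Humidity).
# Column order of the figure's table is assumed to be TT, TF, FT, FF.
P_RAIN = {
    (True, True): 0.9,
    (True, False): 0.7,
    (False, True): 0.4,
    (False, False): 0.1,
}

def p(p_true, value):
    """Probability that a binary variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true

def joint(dark_clouds, humidity, rain):
    """P(DarkClouds, Humidity, Rain), factored along the arcs of the graph."""
    return (p(P_DARK_CLOUDS, dark_clouds)
            * p(P_HUMIDITY, humidity)
            * p(P_RAIN[(dark_clouds, humidity)], rain))

def prob_rain():
    """Marginal P(Rain = True), summing out both parents."""
    return sum(joint(d, h, True) for d, h in product((True, False), repeat=2))

def prob_dark_clouds_given_rain():
    """Posterior P(DarkClouds = True | Rain = True) by enumeration."""
    return sum(joint(True, h, True) for h in (True, False)) / prob_rain()

print(round(prob_rain(), 3), round(prob_dark_clouds_given_rain(), 3))  # 0.442 0.557
```

Enumeration is exact but grows exponentially with the number of variables, which is why larger networks rely on more efficient inference algorithms.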

The application of modified Bayesian methods is one of the most promising areas of

research in diagnosing multiple disorders. Bayesian belief networks have been used by

van der Gaag and Wessels (1994) in an attempt to efficiently diagnose multiple disorders. The

HEPAR II system, by Onisko et al. (2000, 2001), also uses belief networks to diagnose multiple

disorders in the field of hepatology. Other Bayesian variations exist, including Peng and

Reggia's (1989) use of a "comfort measure" that attempts to adapt Bayes' rule to the diagnosis of

multiple disorders. Additional research by Peng and Reggia (1986, 1987) includes the creation

of a hybrid system, combining both Bayesian classification techniques and the set covering

model. The next section provides an introduction to set covering.









Set Covering

Another approach to diagnosing multiple disorders is the use of set covering theory. Given

a case with a set of observed symptoms, set covering seeks to construct a solution set of

disorders that can best account for the symptoms. In generating solution sets, it is not

uncommon for several plausible problem solutions to exist. In such cases, the principle of

Occam's Razor is generally applied, meaning that the simplest explanation is usually the best.

For this reason, set covering is also known as the parsimonious covering theory (Peng & Reggia,

1986).

To understand set covering, we must begin by formally defining three universal sets:

1. $D = \{d_1, d_2, \ldots, d_n\}$, where D is the set containing every possible disorder, d,
2. $S = \{s_1, s_2, \ldots, s_m\}$, where S is the set containing every possible symptom, s,
3. $R = \{r_1, r_2, \ldots, r_k\}$, where R is the set containing every possible relationship, r.

The relationships within set R are tuples consisting of a disorder and a corresponding symptom such that $r_k = (d_i, s_j)$, where $s_j$ is a symptom that may be caused by disorder $d_i$. It is important to note that disorder $d_i$ does not always result in symptom $s_j$. Likewise, symptom $s_j$ may be caused by a disorder other than $d_i$ (Peng & Reggia, 1986).

Generate-and-test is the simplest algorithm for solving cases involving set covering. To

implement the algorithm correctly, three more sets must be defined:

4. $F_O \subseteq S$, where $F_O$ contains the observed symptoms for a particular case,
5. $H \subseteq D$, where H is the hypothesized disorder set that may be responsible for $F_O$,
6. $F_H \subseteq S$, where $F_H$ contains the symptoms associated with H by set R.

When a set of observed symptoms is presented to the system for diagnosis, each of these symptoms is stored in $F_O$. The system then generates hypothetical disorder sets H to determine the disorders that could be causing the observed symptoms. When a hypothetical set H is generated, $F_H$ is populated by all of the symptoms that can be caused by the proposed disorders in H. $F_H$ is then compared with $F_O$ to determine if $F_H$ "covers," or contains, every symptom in set $F_O$. If $F_H$ covers $F_O$, H is a plausible solution. As stated above, set covering systems generally follow the principle of Occam's Razor. For this reason, generate-and-test algorithms usually begin with sets H consisting of single disorders. If no suitable solution is found, the system then considers pairs of disorders, followed by larger sets of disorders as necessary (Baumeister et al., 2001).
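The generate-and-test procedure just described can be sketched in Python. The knowledge base below is hypothetical (it is not the data behind Figure 3-10), as is the function name. Hypotheses are generated in order of increasing size, so the first size that yields any cover gives the minimal solutions; exhaustive enumeration like this grows exponentially with the number of disorders.

```python
from itertools import combinations

def generate_and_test(relations, observed):
    """Return every smallest disorder set H whose symptoms F_H cover F_O.

    `relations` maps each disorder to the set of symptoms it may cause (the set R);
    `observed` is the set of observed symptoms F_O.
    """
    disorders = sorted(relations)
    for size in range(1, len(disorders) + 1):
        covers = []
        for hypothesis in combinations(disorders, size):
            f_h = set().union(*(relations[d] for d in hypothesis))
            if observed <= f_h:  # F_H covers F_O
                covers.append(hypothesis)
        if covers:
            return covers
    return []  # no combination of known disorders explains F_O

# Hypothetical knowledge base: disorder -> symptoms it may cause.
relations = {
    "d1": {"s1", "s2"},
    "d2": {"s1", "s2", "s3"},
    "d3": {"s4", "s5"},
}
print(generate_and_test(relations, {"s1", "s3", "s4"}))  # [('d2', 'd3')]
```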

Figure 3-10 presents a graphical representation of the relationships, R, between disorders,

D, and symptoms, S. Let us assume that all five symptoms are presented to the system as observed findings, $F_O$. Using a simple generate-and-test algorithm, the system checks every individual disorder, d, in an attempt to find a single-disorder solution to the problem. As seen from the graph, no single disorder can satisfy the observed symptoms. Then, the system checks for multiple disorders capable of covering the symptoms contained in $F_O$. It should be apparent that there exist at least three solution sets to the problem, $(d_2, d_3)$, $(d_1, d_3, d_4)$, and $(d_1, d_4, d_5)$. Using the heuristic of minimality, the system would select $(d_2, d_3)$ as the solution. At times, however, minimality will not yield the best solution to a problem. For example, if we were aware that disorder $d_2$ was an extremely rare disease and highly unlikely to occur, either of the other two possible solutions might provide a better answer. Furthermore, in comparing $(d_1, d_3, d_4)$ to $(d_1, d_4, d_5)$, we can see that the former results in more redundancy due to the presence of $d_3$ in the solution. If redundancy is considered a negative aspect in a solution, the system should ultimately select $(d_1, d_4, d_5)$ as the solution set. From this example, it can be seen

that the principle of parsimony and Occam's Razor can be applied in many different ways and,

although favoring the smallest solution set is often a useful heuristic, there are many instances

where this approach does not yield the best result. For more information on the nature of









parsimony, refer to Peng and Reggia (1986). Atzmueller et al. (2004a) also address this topic from

a different perspective.



[Graph: a row of disorders, D, connected through relationships, R, to a row of symptoms, S]


Figure 3-10. Set covering graph of relationships between disorders and symptoms

Set covering theory has been applied to numerous systems for diagnosing multiple

disorders. The paper by Reggia et al. (1983) offers a solid introduction to set covering theory

and its applications to knowledge-based systems. Peng and Reggia (1986, 1987) expanded this

work by creating a hybrid system to take advantage of both set covering and Bayesian

classification techniques. In more recent years, a paper by Baumeister et al. (2001) presents a set

covering system that incrementally refines itself along with an excellent overview of set covering

theory. In the past 5 years, Atzmueller et al. (2003a, 2003b, 2004a, 2004b) have presented a

significant amount of research including the expansion of set covering to make use of diagnostic

scores and case-based reasoning.

Conclusion

In this chapter, we discussed the methods used in designing and implementing both

knowledge-based systems and data mining systems. We began with an extensive discussion of









rule-based systems and certainty factors. From there we moved to other foundational topics,

including case-based reasoning, nearest neighbor classification, and Bayes' rule. Next we

discussed the less mainstream topics of fuzzy logic, Dempster-Shafer, and rough sets along with other approaches less relevant to our research, such as genetic algorithms and artificial neural networks. The chapter concluded by discussing Bayesian belief networks and set covering theory, which are two of the most relevant modern approaches to diagnosing multiple disorders with knowledge-based systems.

The following chapter begins with a discussion of the mathematics used throughout the

medical field. It continues with a discussion of important knowledge-based systems in the

medical field. Finally, it concludes with a literature review of systems that have been designed

for the purpose of diagnosing medical disorders.









CHAPTER 4
MEDICAL MATHEMATICS AND RELEVANT KNOWLEDGE-BASED SYSTEMS

The previous chapter discussed a variety of established approaches for knowledge-based

systems and data mining. This chapter gives an overview of many systems that have been

developed using those techniques. A strong emphasis is given to historical medical expert

systems, diagnostic systems in toxicology, and modern systems for diagnosing multiple

disorders, as these are most relevant to this research. The information presented in this chapter is

cursory at best, and the reader is encouraged to study the references for a proper understanding

of any systems of interest. Before addressing the systems, however, the chapter begins with a

discussion of the mathematics employed in the field of medicine.

Medical Mathematics

The medical field presents a unique set of challenges for knowledge engineering.

Distinctions ranging from ethical and legal issues to fundamentally different mathematical

understandings set the domain of medicine apart from all other domains. Cios and Moore (2002)

thoroughly discuss the considerations that must be observed in the medical field. For our

purposes, however, let us focus on the standard mathematical approaches used for

decision-making in medicine.

Probabilistic Measurements

Many knowledge-based systems use precision as a conclusive measurement of

performance. Precision is the percentage of true positives (TP) compared to the total number of

cases classified as positive events:

$$\text{precision} = \frac{TP}{TP + FP} \times 100\%, \tag{4-1}$$

where FP represents false positives. According to Cios and Moore (2002), "This measurement is

very popular in machine learning and pattern recognition communities, but is not acceptable in









medicine because it hides essential details of the achieved results" (p. 4). To better understand

the performance of a diagnostic test, the medical profession defines a number of other

measurements. Let us begin by examining a contingency table (Table 4-1). Contingency tables

contain four variables: true positives (TP), true negatives (TN), false positives (FP), and false

negatives (FN). A true positive occurs when a test correctly diagnoses a patient as having a

disorder. A true negative occurs when a test correctly diagnoses a patient as not having a

disorder. A false positive occurs when a test incorrectly diagnoses a patient as having a disorder.

A false negative occurs when a test incorrectly diagnoses a patient as not having a disorder.

Table 4-1. Contingency table
| Test results | Disorder present | Disorder absent | Total |
|---|---|---|---|
| Positive | TP | FP | TP + FP |
| Negative | FN | TN | TN + FN |
| Total | TP + FN | TN + FP | |

Another measurement of performance, frequently used in conjunction with precision, is

accuracy. Accuracy is the number of correctly classified cases compared to the total number of

cases presented to a system:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%. \tag{4-2}$$

Even when used in combination, however, precision and accuracy do not fully capture the

information necessary for medical diagnosis. Perhaps the most common measurements in the

medical field are sensitivity and specificity, which are defined as:

$$\text{sensitivity} = \frac{TP}{TP + FN}, \tag{4-3}$$
$$\text{specificity} = \frac{TN}{TN + FP}. \tag{4-4}$$

Sensitivity is also known as the true-positive rate (TPR). It represents the probability that the

test detects the disorder, given that the patient has the disorder. Specificity is also known as the









true-negative rate (TNR). It represents the probability of the test detecting no disorder, given

that the patient truly does not have the disorder. These measurements are important in diagnosis

because tests are never absolutely accurate. There are always instances where a patient with a

disorder displays fewer symptoms than a patient without the disorder. For this reason, diagnostic

tests in medicine are tuned to "rule in" or "rule out" a diagnosis. If a physician is attempting to

"rule in" a disorder, he should select a test with a high specificity so that a positive test result

strongly confirms his suspicion. Likewise, a physician attempting to "rule out" a disorder

should use a test with a high sensitivity.

Two measurements often used in conjunction with sensitivity and specificity are the

false-negative rate (FNR) and the false-positive rate (FPR), also known as the false-alarm rate

(FAR):

$$FNR = \frac{FN}{FN + TP} = 1 - \text{sensitivity}, \tag{4-5}$$
$$FAR = \frac{FP}{FP + TN} = 1 - \text{specificity}. \tag{4-6}$$

The false-negative rate and false-alarm rate are probabilities associated with a test inaccurately

diagnosing a patient. The false-negative rate represents the probability of a test failing to detect

a disorder that is present, whereas the false-alarm rate represents the probability of a test falsely

indicating that a patient has a disorder.

Alternates to sensitivity and specificity are the positive predictive value (PPV) and

negative predictive value (NPV):

$$PPV = \frac{TP}{TP + FP}, \tag{4-7}$$
$$NPV = \frac{TN}{TN + FN}. \tag{4-8}$$









The positive predictive value is the probability that, given positive test results, the patient does indeed have the disorder. In a similar manner, the negative predictive value is the probability

that, given negative test results, the patient truly does not have the disorder. There is an

important difference between the measurements of sensitivity and specificity and the

measurements of positive and negative predictive values. While sensitivity and specificity are

independent of the population being tested, positive and negative predictive values are affected by

the prevalence of a disease within a population.
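Equations 4-1 through 4-8 can be computed together from the four cells of a contingency table, as in this Python sketch. The function name is hypothetical, and the sketch assumes no denominator is zero. Writing the measures side by side also shows that precision (Equation 4-1) and PPV (Equation 4-7) use the same ratio. The example counts are those of Table 4-2.

```python
def contingency_metrics(tp, fp, fn, tn):
    """Measures from Equations 4-1 to 4-8, as fractions (multiply by 100 for %)."""
    return {
        "precision": tp / (tp + fp),                  # Eq. 4-1
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. 4-2
        "sensitivity": tp / (tp + fn),                # Eq. 4-3, true-positive rate
        "specificity": tn / (tn + fp),                # Eq. 4-4, true-negative rate
        "fnr": fn / (fn + tp),                        # Eq. 4-5
        "far": fp / (fp + tn),                        # Eq. 4-6
        "ppv": tp / (tp + fp),                        # Eq. 4-7, same ratio as precision
        "npv": tn / (tn + fn),                        # Eq. 4-8
    }

# Counts from Table 4-2.
for name, value in contingency_metrics(tp=9990, fp=10, fn=10, tn=9990).items():
    print(f"{name}: {value:.1%}")
```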

To contrast sensitivity and specificity with positive and negative predictive values, let us

consider an example from the Medical University of South Carolina Doctoring Curriculum

(2000). Imagine that a new test for detecting HIV is discovered. To determine its usefulness, an

experiment with 10,000 HIV infected blood samples and 10,000 non-infected blood samples is

performed. The testing results in all correct answers except for 10 false positives and 10 false

negatives, yielding a sensitivity, specificity, PPV, and NPV of 99.9% (Table 4-2). Additionally,

the pre-test probability indicates that there is a 50% chance for a randomly selected blood sample

to contain HIV.

Table 4-2. Experimental HIV testing extended contingency table
| Test results | Disorder present | Disorder absent | Total | Predictive value |
|---|---|---|---|---|
| Positive | 9,990 (TP) | 10 (FP) | 10,000 | PPV = 9,990/10,000 = 99.9% |
| Negative | 10 (FN) | 9,990 (TN) | 10,000 | NPV = 9,990/10,000 = 99.9% |
| Total | 10,000 | 10,000 | 20,000 | |
| | Sensitivity = 9,990/10,000 = 99.9% | Specificity = 9,990/10,000 = 99.9% | Pre-test probability = 10,000/20,000 = 50% | |









Now let us apply the test to a population of one million people where 1% of the population

is infected with HIV. Since sensitivity and specificity are a function of the ability of a test to

identify HIV carriers, their values do not change (Table 4-3). In contrast, the PPV decreases by

8.9 percentage points and the NPV increases slightly. The significance of the decrease in PPV is that if the

physician informed patients that they had HIV based solely on this test, 990 individuals would be

falsely informed that they were infected. Neither sensitivity nor specificity gives the physician

an indicator of this change from the previous contingency table.

Table 4-3. HIV testing (1% chance of HIV) extended contingency table
| Test results | Disorder present | Disorder absent | Total | Predictive value |
|---|---|---|---|---|
| Positive | 9,990 (TP) | 990 (FP) | 10,980 | PPV = 9,990/10,980 = 91.0% |
| Negative | 10 (FN) | 989,010 (TN) | 989,020 | NPV = 989,010/989,020 = 99.999% |
| Total | 10,000 | 990,000 | 1,000,000 | |
| | Sensitivity = 9,990/10,000 = 99.9% | Specificity = 989,010/990,000 = 99.9% | Pre-test probability = 10,000/1,000,000 = 1% | |

Table 4-4. HIV testing (0.1% chance of HIV) extended contingency table
| Test results | Disorder present | Disorder absent | Total | Predictive value |
|---|---|---|---|---|
| Positive | 999 (TP) | 999 (FP) | 1,998 | PPV = 999/1,998 = 50.0% |
| Negative | 1 (FN) | 998,001 (TN) | 998,002 | NPV = 998,001/998,002 = 99.999% |
| Total | 1,000 | 999,000 | 1,000,000 | |
| | Sensitivity = 999/1,000 = 99.9% | Specificity = 998,001/999,000 = 99.9% | Pre-test probability = 1,000/1,000,000 = 0.1% | |

Let us look at one more example of HIV testing. If the test is applied to a pool of blood donors who have already been screened for HIV risk factors, we would expect the percentage of HIV-infected individuals to be closer to 0.1%. Again, the contingency table is shown for a population of one million people (Table 4-4). The calculations show that the PPV drops to 50%, while sensitivity and specificity remain constant. Together, these three contingency tables demonstrate two points. Sensitivity and specificity should normally remain constant across populations. The PPV and NPV, however, depend on the prevalence of a disease within a population.
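
The arithmetic behind Tables 4-2 through 4-4 can be checked with a short Python sketch. The `ContingencyTable` class is a hypothetical helper introduced only for illustration. It recomputes each statistic from the four cell counts. As the prevalence (and therefore the counts) changes, sensitivity and specificity stay fixed while the PPV falls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Hypothetical helper: the four cells of an extended contingency table."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.fn + self.tn)

    @property
    def pretest_probability(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.fn) / total


# Cell counts copied from Tables 4-2, 4-3, and 4-4.
tables = {
    "Table 4-2": ContingencyTable(tp=9_990, fp=10, fn=10, tn=9_990),
    "Table 4-3": ContingencyTable(tp=9_990, fp=990, fn=10, tn=989_010),
    "Table 4-4": ContingencyTable(tp=999, fp=999, fn=1, tn=998_001),
}

for name, t in tables.items():
    print(f"{name}: prevalence={t.pretest_probability:.1%} "
          f"sensitivity={t.sensitivity:.1%} specificity={t.specificity:.1%} "
          f"PPV={t.ppv:.1%} NPV={t.npv:.4%}")
```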

The final mathematical expression commonly used in the medical field is the likelihood ratio (LR). The likelihood ratio compares two probabilities: that a specific test result occurs in a patient with the disorder, and that the same result occurs in a patient without the disorder. There are two types of likelihood ratios, LR+ and LR-, which can be calculated as follows:

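$$\mathrm{LR}^{+} = \frac{TP/(TP+FN)}{FP/(FP+TN)} = \frac{\text{sensitivity}}{1-\text{specificity}} \tag{4-9}$$

$$\mathrm{LR}^{-} = \frac{FN/(TP+FN)}{TN/(FP+TN)} = \frac{1-\text{sensitivity}}{\text{specificity}} \tag{4-10}$$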

where LR+ compares how often a positive test result occurs in patients with the disorder versus patients without it. LR- makes the same comparison for a negative test result. Thus, good diagnostic tests should have a high LR+ and a low LR-. One major advantage of likelihood ratios is that they can be easily combined through multiplication. For this reason, the system discussed in Chapters 5 and 6 utilizes LR+ to diagnose toxic exposures.
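
Equations 4-9 and 4-10 and the multiplication property can be sketched in Python as follows. The function names are hypothetical. The sketch uses the standard odds form of Bayes' theorem (post-test odds = pre-test odds × LR). Multiplying several likelihood ratios together, as `post_test_probability` does, is only valid if the findings are conditionally independent given the disorder. That assumption is stated here; it is not taken from the text.

```python
import math


def lr_positive(sensitivity: float, specificity: float) -> float:
    """Equation 4-9: LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def lr_negative(sensitivity: float, specificity: float) -> float:
    """Equation 4-10: LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


def post_test_probability(pretest: float, likelihood_ratios: list[float]) -> float:
    """Convert a pre-test probability to odds, multiply by each LR, and convert back.

    Multiplying the LRs assumes the findings are conditionally independent
    given the disorder.
    """
    odds = pretest / (1.0 - pretest)
    odds *= math.prod(likelihood_ratios)
    return odds / (1.0 + odds)


lr = lr_positive(0.999, 0.999)  # the HIV test from Tables 4-2 to 4-4
for prevalence in (0.5, 0.01, 0.001):
    print(prevalence, round(post_test_probability(prevalence, [lr]), 4))
```

Consider the 99.9% sensitivity and specificity of the HIV example. The sketch turns pre-test probabilities of 50%, 1%, and 0.1% into post-test probabilities equal to the PPVs in Tables 4-2, 4-3, and 4-4. The likelihood ratio is the same in all three cases; only the post-test probability changes with prevalence.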

In this section, we have discussed the most common statistical expressions in medical literature. For more detailed information, refer to Owens and Sox (2001). The next section discusses a different process frequently used in medical diagnosis.

Diagnostic Scores

The use of diagnostic scores is a simple approach to risk analysis in the medical field. In this method, signs and symptoms are assigned a point value, or score, based on their correlation with a specific disorder. In forming a diagnosis, the physician gathers a list of all the signs and symptoms observed in the patient and then looks up the score for each observation on a chart. Adding the scores together yields the final diagnostic score. This score is compared to another chart to determine the patient's risk for a specific disorder.

Table 4-5 and Table 4-6 are examples of the charts that a physician might use when implementing diagnostic scores for appendicitis, based on the research by Ohmann et al. (1999). Suppose the only variables a patient satisfies are rigidity and an age of less than 50 years. The final diagnostic score then sums to 2.5 (1.0 + 1.5). Table 4-6 shows that appendicitis occurs in 3% of patients with this score. From these observations, the physician can be fairly certain that the patient does not have appendicitis. However, if the patient satisfied the requirements for every variable in Table 4-5, the final diagnostic score would be 16.0, indicating a 68% risk of appendicitis. Depending on the final diagnostic score, the physician may recommend different tests or treatments for the patient.

Table 4-5. Diagnostic scores for acute appendicitis (Ohmann et al., 1999)
Variable                                      Points
Tenderness, right lower quadrant              4.5
Rebound tenderness                            2.5
No micturition difficulties                   2.0
Steady pain                                   2.0
Leukocyte count > 10.0 × 10^9/L               1.5
Age < 50 years                                1.5
Relocation of pain to right lower quadrant    1.0
Rigidity                                      1.0

Table 4-6. Final diagnosis score significance (Ohmann et al., 1999)
Diagnostic score      Frequency of appendicitis
< 4.0 points          3%
4.0-5.5 points        5%
6.0-7.5 points        11%
8.0-9.5 points        24%
10.0-11.5 points      32%
12.0-13.5 points      55%
≥ 14.0 points         68%
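
The lookup procedure for Tables 4-5 and 4-6 is simple enough to express directly. In this Python sketch, the finding keys are hypothetical short names for the variables in Table 4-5. Each band in Table 4-6 is represented by its lower bound, which assumes the top band begins at 14.0 points.

```python
# Points from Table 4-5 (Ohmann et al., 1999); the keys are hypothetical short names.
APPENDICITIS_POINTS = {
    "rlq_tenderness": 4.5,
    "rebound_tenderness": 2.5,
    "no_micturition_difficulties": 2.0,
    "steady_pain": 2.0,
    "leukocytes_over_10e9_per_l": 1.5,
    "age_under_50": 1.5,
    "pain_relocated_to_rlq": 1.0,
    "rigidity": 1.0,
}

# Table 4-6 as (lower bound of score band, frequency of appendicitis), highest first.
RISK_BANDS = [
    (14.0, 0.68),
    (12.0, 0.55),
    (10.0, 0.32),
    (8.0, 0.24),
    (6.0, 0.11),
    (4.0, 0.05),
    (0.0, 0.03),
]


def diagnostic_score(findings: set[str]) -> float:
    """Sum the points of every finding observed in the patient."""
    return sum(APPENDICITIS_POINTS[f] for f in findings)


def appendicitis_risk(score: float) -> float:
    """Return the frequency for the first band whose lower bound the score reaches."""
    for lower_bound, frequency in RISK_BANDS:
        if score >= lower_bound:
            return frequency
    raise ValueError("score must be non-negative")


score = diagnostic_score({"rigidity", "age_under_50"})
print(score, appendicitis_risk(score))
```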

Literature Review of Knowledge-Based Systems

Although the field of medicine possesses its own established mathematical methods, surprisingly few systems have taken advantage of them. The majority of systems rely on established engineering approaches or devise their own representation schemes. This section begins by discussing two significant historical systems in the field of medicine. It then presents a selection of systems applied directly to the field of toxicology and concludes by addressing systems specifically designed for the diagnosis of multiple disorders.

Historical Medical Expert Systems

Research in the field of medical expert systems began in the early 1970s with the development of MYCIN. Created at Stanford University by Buchanan and Shortliffe (1984a), MYCIN was designed to diagnose infectious blood diseases and recommend the appropriate antibiotics for treatment. MYCIN utilized a knowledge base of approximately 500 rules, and its inference engine was constructed as an inference network. The system queried the user with simple yes or no questions until it had gathered enough information to identify the bacteria responsible for the symptoms. To handle uncertainty, MYCIN employed certainty factors, discussed in Chapter 3. Research has shown that, using these methods, MYCIN was able to outperform faculty at the Stanford medical school in diagnosing diseases within its domain (Yu et al., 1984). Interestingly, later research seems to indicate that the certainty factors were superfluous and that MYCIN could perform equally effectively without them (Buchanan & Shortliffe, 1984b). Although a foundational pillar in the field of expert systems, MYCIN was never used in clinical practice. Ethical and legal issues, along with a mistrust of computer systems within the field of medicine, were major contributors to preventing its commercialization. For an exhaustive discussion about MYCIN, see Buchanan & Shortliffe (1984a).

A second foundational system in the field of medical expert systems is INTERNIST. INTERNIST was developed by Pople (1977) at the University of Pittsburgh during the same era as MYCIN. The goal in developing INTERNIST was to create a system capable of handling general internal medicine, as opposed to the specialized domains traditionally occupied by expert systems (Pople, 1985a). INTERNIST was developed and refined over the course of a decade through interviews with Jack Myers, MD, and became one of the largest and broadest expert systems ever created. Over the course of its development, the system came to contain information on more than 3550 symptoms (Miller et al., 1982) and could diagnose more than 750 diseases (Pople, 1985b). INTERNIST's inference engine employed a ranking program to perform diagnosis. To make the domain size manageable, it also used heuristically guided partitioning rules. By breaking problems down into smaller subsets, the system was better able to handle the broad domain of internal medicine. CADUCEUS, an eventual successor to INTERNIST, implemented a problem decomposition method in an attempt to better handle multiple disorders. Regrettably, CADUCEUS suffered from other limitations because it required prior knowledge of the domain's structure. Pople (1985b) presents a thorough account of the INTERNIST system's progression from its origins through the development of CADUCEUS. Other references of note include Pople (1977) and Miller et al. (1982). For an excellent example of INTERNIST's interface and interaction with the user, see Pople (1985a).

Expert Systems in Toxicology

There exist surprisingly few knowledge-based systems in the field of clinical toxicology. In fact, according to Darmoni (1995), "Toxline and Toxlit [showed] that less than ten computer-aided decision support systems [had] been developed in clinical toxicology" (p. 234). Of these systems, two in particular stand out from the rest: a French system called SETH and a Bulgarian system called MEDICOTOX-CONSILIUM. In recent years, two more systems of interest have been developed: the Inreca system for use in Russia and a Polish veterinary system. A summary of each of these four systems is given below.

SETH was developed in France by Darmoni et al. (1994, 1995) for use in the Rouen University Hospital. The system uses 70 signs and symptoms for diagnosis and contains over 1000 drugs from over 75 toxicological classes (Darmoni, 1994). SETH was implemented on a commercial off-the-shelf, object-oriented expert system shell called KBMS. Its inference engine is a rule-based, forward-chaining system that uses the Rete algorithm for pattern matching. SETH also makes use of set theory for diagnosing cases involving multiple drugs. In 1992, the system began experimental use at the Rouen University Hospital, where it was used in the diagnosis of over 2000 drug intoxication cases (Darmoni, 1995). Although its creators caution that the system was not designed for use by experts, the ratings given by residents at the hospital indicate that they were pleased with the system. For more information on SETH, refer to Darmoni et al. (1994, 1995).

MEDICOTOX-CONSILIUM was developed for use in Bulgarian hospitals as a diagnostic system for first-aid clinical toxicology. It was first implemented in 1988 at a single hospital and was eventually distributed to 11 more hospitals around the country. The system is described as a classical system that uses frame structures, rules, and expert-provided scores for diagnosis. Within the frame structure, poisons are divided into 10 classes with 310 groups containing a total of 2500 different kinds of poisons (Monov et al., 1992). The system contains 1000 rules and facts. These use 47 syndrome and 134 symptom definitions to identify poisons and to inform the user of the appropriate therapy from among 86 treatments and 55 antidotes (Monov et al., 1992). Rather than simply producing a diagnosis, MEDICOTOX-CONSILIUM focuses on user interaction and seeks to leave the final decision to the user. It also offers three modes to maximize its utility in different circumstances. The first, the clinical orientation mode, is useful for diagnosing urgent cases where immediate action is required. The second, the diagnostic research mode, can be used to reason carefully through less urgent cases. The final mode, the expert-reference mode, enables the user to look up information on any of the drugs and toxins contained within the system. For more information on MEDICOTOX-CONSILIUM, refer to Monov et al. (1992).

Another toxicology system was developed by Althoff et al. (1998) for use by the Russian Toxicology Information and Advisory Center in Moscow. The system builds on earlier research from the Inreca (Induction and Reasoning from Cases) European project. The Inreca approach is a case-based reasoning approach designed to use historical cases to diagnose disorders. This particular system uses the database of the Toxicology Information and Advisory Center of the Russian Federation Ministry of Health and Medical Industry to supply the cases for diagnosing poison exposures. Russia was explicitly chosen for its abundance of data because "every year Russia has more intoxication cases than any other country in Europe" (Althoff et al., 1998, p. 27). A distinctive aspect of this case-based reasoning system is that it does not interpret the cases at run time. Instead, it compiles the data into an Inreca-Tree in advance to improve performance. The Inreca-Tree is essentially a specialized decision tree that includes a branch at every decision node to account for the possibility of unknown measurements. For more information about the Inreca system for diagnosing poison cases, refer to Althoff et al. (1998).
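
The idea of a decision tree with an extra branch for unknown measurements can be illustrated with a toy Python sketch. This is not the Inreca-Tree construction algorithm described by Althoff et al. (1998). The class names, the `UNKNOWN` label, and the placeholder diagnoses are all hypothetical. The sketch only shows how a case with a missing value can still reach a leaf.

```python
from __future__ import annotations

from dataclasses import dataclass, field

UNKNOWN = "<unknown>"  # hypothetical label for the extra branch at every node


@dataclass
class Leaf:
    diagnosis: str


@dataclass
class Node:
    attribute: str
    children: dict[str, Node | Leaf] = field(default_factory=dict)


def classify(tree: Node | Leaf, case: dict[str, str]) -> str:
    """Walk the tree; a missing measurement follows the node's UNKNOWN branch."""
    while isinstance(tree, Node):
        value = case.get(tree.attribute, UNKNOWN)
        # Values never seen when the tree was built are treated like missing ones
        # (an assumption of this sketch).
        tree = tree.children[value] if value in tree.children else tree.children[UNKNOWN]
    return tree.diagnosis


# Toy tree with placeholder labels; not real toxicological knowledge.
toy_tree = Node("pupils", {
    "dilated": Leaf("toxin A"),
    "constricted": Leaf("toxin B"),
    UNKNOWN: Leaf("toxin A or B"),
})
print(classify(toy_tree, {"pupils": "constricted"}))
print(classify(toy_tree, {}))
```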

The final toxicology system of interest is being developed at Warsaw Agriculture University in Poland by Kluza (2004) for veterinary medicine. The system uses case-based reasoning to offer remote consultations to veterinarians working in the field. As presented by Kluza in 2004, the project is still in its launch phase. However, because it is designed for veterinary medicine, the system faces some unique challenges of interest. First, because it is designed for animals, Kluza's system must include sex and age in its diagnosis and must also account for differences between species and breeds. Second, animals cannot verbally communicate with veterinarians. Every diagnosis must therefore be performed without the patient's own account of the exposure, information that is often available in toxicology cases involving humans. For more information, see Kluza (2004).

The four systems presented in this section were selected to give the reader a general understanding of the techniques that have been used in clinical toxicology. SETH and MEDICOTOX-CONSILIUM are two of the most prominent systems in the field. The Inreca approach is an excellent example of the simplicity and robustness that are necessary for advancement in the fields of knowledge-based systems and data mining. Finally, the Polish veterinary system represents a current topic of research in the field and poses some unique challenges for consideration. Overall, very few knowledge-based systems exist for clinical toxicology. For a fairly exhaustive list of the systems used throughout the field, consult Darmoni et al. (1994).

Knowledge-Based Systems for the Diagnosis of Multiple Disorders

The previous section discussed expert systems in the field of clinical toxicology. Most of these systems must, by necessity, address the challenge of diagnosing multiple disorders to some degree, but doing so was not the primary focus of the research. This section presents an overview of research performed explicitly for the purpose of diagnosing multiple disorders. As in clinical toxicology, relatively little knowledge-based system research has been performed in the area of multiple disorders. This section presents research on four major types of systems used in multiple disorder diagnosis: Bayesian approaches, case-based reasoning, set covering, and diagnostic scores. Although we divide these systems into four types for the sake of discussion, many systems combine aspects of several approaches.

To begin, let us discuss systems that use Bayesian approaches. Bayes' rule is a probability-based equation that can be used to identify the most likely disorder. The problem is that the simplest way of applying Bayes' rule to several symptoms at once requires the symptoms used in diagnosis to be conditionally independent given the disorder. Much research has focused on generalizing Bayes' rule to account for dependencies within a domain.
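
To make the independence problem concrete, the following Python sketch ranks disorders with the simplest (naive Bayes) application of Bayes' rule. In this form, the joint probability of the observed symptoms is the product of their individual conditional probabilities. The function name and all probabilities are hypothetical. Approaches such as belief networks exist precisely because real symptoms often violate this product assumption. Every likelihood must be nonzero, since the sketch takes logarithms.

```python
import math


def naive_bayes_posteriors(
    priors: dict[str, float],
    likelihoods: dict[str, dict[str, float]],
    observed: list[str],
) -> dict[str, float]:
    """Posterior P(d | observed) for each disorder d, assuming the observed
    symptoms are conditionally independent given the disorder."""
    log_scores = {
        d: math.log(prior) + sum(math.log(likelihoods[d][s]) for s in observed)
        for d, prior in priors.items()
    }
    # Normalize in log space for numerical stability.
    best = max(log_scores.values())
    weights = {d: math.exp(v - best) for d, v in log_scores.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}


# Hypothetical numbers for two disorders and two symptoms.
priors = {"A": 0.5, "B": 0.5}
likelihoods = {"A": {"s1": 0.8, "s2": 0.6}, "B": {"s1": 0.2, "s2": 0.3}}
print(naive_bayes_posteriors(priors, likelihoods, ["s1", "s2"]))
```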

Research by Ben-Bassat et al. (1983) presents a Bayesian pattern recognition algorithm used in the IMEDAS emergency diagnosis system to overcome the limitations of Bayes' rule. Ben-Bassat et al. (1980) also discuss some of the other early approaches for "handling a violation of the conditional independence assumption in classical Bayesian diagnosis models" (p. 153). One of the most noteworthy accomplishments of Bayesian research was the development of Bayesian belief networks, discussed in Chapter 3. One system, created by van der Gaag and Wessels (1994), uses belief networks to diagnose multiple disorders. This system's distinctive feature is a clustering algorithm that focuses strategically on small sets within the domain to improve efficiency.

In more recent years, Onisko et al. (2000) developed a system called HEPAR II, which uses belief networks to diagnose multiple disorders in the field of hepatology. Onisko et al. (2001) further developed the system by creating a method for building belief networks from a small data set. To accomplish this, they implemented what they refer to as "Noisy-OR gates." These gates reduce the number of conditional probabilities that must be estimated from the data and thereby increase the accuracy of the system.
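
A noisy-OR gate shows why fewer parameters are needed. For an effect with n possible causes, it needs one link probability per cause (plus an optional leak term). A full table would instead need a conditional probability for each of the 2^n combinations of causes. The Python sketch below uses a hypothetical dictionary-based interface and toy numbers. Onisko et al. (2001) may parameterize and learn the gate differently.

```python
import math


def noisy_or(link_probs: dict[str, float], present: set[str], leak: float = 0.0) -> float:
    """P(effect present | the causes in `present`) under a leaky noisy-OR gate.

    Each present cause c independently produces the effect with probability
    link_probs[c]; `leak` stands for causes that are not modeled. The effect is
    absent only if every present cause and the leak all fail to produce it.
    """
    prob_all_fail = (1.0 - leak) * math.prod(1.0 - link_probs[c] for c in present)
    return 1.0 - prob_all_fail


links = {"cause_1": 0.8, "cause_2": 0.5}  # hypothetical link probabilities
print(noisy_or(links, {"cause_1", "cause_2"}))
print(noisy_or(links, set(), leak=0.1))
```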

A second approach to diagnosing multiple disorders is case-based reasoning. The advantage of case-based reasoning is that systems can essentially create themselves from historical cases. By contrast, most complex models, including Bayesian networks, generally require knowledge acquisition from experts (Atzmueller et al., 2004b). It is important to note that case-based reasoning is an approach to system development rather than a method for reconciling uncertainty and probabilistic dependencies. For this reason, many case-based reasoning systems make use of other methods. The ADAPtER system combines case-based reasoning with abductive model-based reasoning to diagnose multiple car engine faults (Portinale & Torasso, 1995). SONOCONSULT makes use of inductive methods to augment its case-based reasoning and recognize multiple disorders in the field of sonography (Baumeister et al., 2002). Atzmueller et al. (2003a) continued the research on SONOCONSULT by exploring the use of decomposition methods within a case-based system. Finally, Atzmueller et al. (2004b) present three approaches to case-based reasoning for the diagnosis of multiple disorders. In compositional case adaptation, a group of cases rather than a single case is recalled for diagnosis. In the partition class approach, domains are divided into independent subsets for diagnosis. The third approach is set covering, which is discussed in Chapter 3.

Set covering is a method that seeks combinations of disorders that can account for the observed symptoms. The simplicity and elegance of the approach make it one of the most promising areas of research relating to multiple disorder diagnosis. In the 1980s, Reggia and Peng published a large body of foundational research on set covering. The paper by Reggia et al. (1983) is one of the clearest and most referenced papers in the area of set covering for multiple disorders. The system they propose, however, only holds "for the extreme case that might be called complete decomposability" (Wu, 1991, p. 240). In later research, Peng and Reggia (1986, 1987) expand on what they refer to as "parsimonious covering theory" by adding Bayesian calculations to allow for multimembership classification. Parsimonious covering theory is essentially set covering in which the simplest solution is considered the best solution. In their 1989 paper presenting further enhancements to the system, Peng and Reggia implement the use of "comfort measures." Comfort measures ensure that the system maintains a certain level of quality in the solutions it returns to the user. In the early 1990s, Wu extended the field of set covering by developing algorithms to increase efficiency. Wu's research primarily centers on decomposing a problem into smaller sub-problems using a clustering algorithm (Wu, 1990, 1991). In other research, genetic algorithms were applied to generate a set covering system for multi-disorder diagnosis (Vinterbo & Ohno-Machado, 2000). Separately, simple systems were given the ability to refine themselves incrementally, adding complexity as more samples became available (Baumeister et al., 2001).
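
One common reading of parsimony is minimum cardinality: prefer the smallest set of disorders that explains every observed symptom. The brute-force Python sketch below enumerates such covers, using hypothetical disorder and symptom names. The exhaustive search is exponential in the number of disorders, which is why the research above focuses on decomposition, clustering, and probabilistic ranking rather than enumeration. Other parsimony criteria, such as irredundant covers, would return more candidates.

```python
from itertools import combinations


def minimum_covers(causes: dict[str, set[str]], observed: set[str]) -> list[frozenset[str]]:
    """All smallest sets of disorders whose manifestations include every observed symptom."""
    if not observed:
        return [frozenset()]
    disorders = sorted(causes)
    for size in range(1, len(disorders) + 1):
        covers = [
            frozenset(combo)
            for combo in combinations(disorders, size)
            if observed <= set().union(*(causes[d] for d in combo))
        ]
        if covers:
            return covers
    return []  # some observed symptom is not explained by any known disorder


# Hypothetical disorders and the symptoms each can cause.
causes = {"d1": {"s1", "s2"}, "d2": {"s2", "s3"}, "d3": {"s3"}, "d4": {"s1"}}
print(minimum_covers(causes, {"s1", "s2", "s3"}))
```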

The final approach to be discussed is the use of diagnostic scores for the diagnosis of multiple disorders. In particular, this discussion details the research performed by Atzmueller et al. (2003b, 2004a), as it bears the closest resemblance to the system presented in Chapters 5 and 6. Atzmueller et al. (2003b) have implemented a case-based system for the diagnosis of multiple disorders in the field of sonography. The system is semi-automatic: it generates its rules automatically but still requires an expert to oversee its development and adjust parameters as necessary to ensure that it functions properly. Atzmueller et al. (2003b) believe that "understandability and interpretability...is of prime importance," and so their system attempts to apply "the same representation the human expert favors" by using diagnostic scores (p. 23). Diagnostic scores, discussed earlier in this chapter, are a simple approach to risk analysis used in the medical field.

Using a case base, the system creates scoring rules, r, of the form:

$$r = f \xrightarrow{s} d \tag{4-11}$$

where f represents a finding, such as the observation of a sign or symptom; d represents the diagnosis related to that finding; and s represents a qualitative measure of uncertainty, with $s \in \{S_{-3}, S_{-2}, S_{-1}, 0, S_1, S_2, S_3\}$. Scores $s \in \{S_1, S_2, S_3\}$ represent a positive correlation, where $S_3$ strongly supports diagnosis d and $S_1$ weakly supports diagnosis d. Likewise, scores $s \in \{S_{-1}, S_{-2}, S_{-3}\}$ represent a negative correlation, where $S_{-3}$ strongly opposes diagnosis d and $S_{-1}$ weakly opposes diagnosis d. When s = 0, no significant correlation is found, and the rule is later pruned from the rule set. As defined by Atzmueller et al. (2003b), four scores from the same category yield the next higher score, such that:

$$S_1 + S_1 + S_1 + S_1 = S_2 \tag{4-12}$$
$$S_2 + S_2 + S_2 + S_2 = S_3 \tag{4-13}$$
$$S_{-1} + S_{-1} + S_{-1} + S_{-1} = S_{-2} \tag{4-14}$$
$$S_{-2} + S_{-2} + S_{-2} + S_{-2} = S_{-3} \tag{4-15}$$

Also, any two scores of equal magnitude and opposite sign cancel, such that:

$$S_1 + S_{-1} = 0 \tag{4-16}$$
$$S_2 + S_{-2} = 0 \tag{4-17}$$
$$S_3 + S_{-3} = 0 \tag{4-18}$$

A diagnosis d is considered "probable" if the aggregate score is equal to or greater than $S_3$. Substituting a score of 1 for $S_1$, 4 for $S_2$, and 16 for $S_3$ makes the system presented here comparable to the diagnostic scoring system presented earlier in the chapter. In that system, a final diagnostic score of 16 (the maximum attainable in Table 4-5) falls in the highest-risk category of Table 4-6.
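
Under the numeric substitution mentioned above (1, 4, and 16 for $S_1$, $S_2$, and $S_3$, mirrored for the negative categories), Equations 4-12 through 4-18 reduce to ordinary integer addition. The "probable" test then becomes a threshold of 16. The Python sketch below illustrates that reading with hypothetical rules. It is only an approximation, since Atzmueller et al. (2003b) define the scores as qualitative categories rather than numbers.

```python
# Numeric stand-ins using the substitution S1 = 1, S2 = 4, S3 = 16, with the
# negative categories mirrored. Under this mapping, four S1 scores sum to S2,
# four S2 scores sum to S3, and opposite scores cancel.
SCORE_VALUE = {"S-3": -16, "S-2": -4, "S-1": -1, "0": 0, "S1": 1, "S2": 4, "S3": 16}
PROBABLE_THRESHOLD = SCORE_VALUE["S3"]


def aggregate_scores(rules: list[tuple[str, str, str]], findings: set[str]) -> dict[str, int]:
    """Sum the scores of every rule (f, s, d), i.e. f --s--> d, whose finding f was observed."""
    totals: dict[str, int] = {}
    for finding, score, diagnosis in rules:
        if finding in findings:
            totals[diagnosis] = totals.get(diagnosis, 0) + SCORE_VALUE[score]
    return totals


def probable_diagnoses(rules: list[tuple[str, str, str]], findings: set[str]) -> list[str]:
    """Diagnoses whose aggregate score reaches S3, highest score first."""
    totals = aggregate_scores(rules, findings)
    probable = [d for d, total in totals.items() if total >= PROBABLE_THRESHOLD]
    return sorted(probable, key=lambda d: totals[d], reverse=True)


# Hypothetical rules: four S2 findings for d1, mixed evidence for d2.
rules = [
    ("f1", "S2", "d1"), ("f2", "S2", "d1"), ("f3", "S2", "d1"), ("f4", "S2", "d1"),
    ("f1", "S1", "d2"), ("f2", "S3", "d2"), ("f3", "S-3", "d2"),
]
print(aggregate_scores(rules, {"f1", "f2", "f3", "f4"}))
print(probable_diagnoses(rules, {"f1", "f2", "f3", "f4"}))
```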

To determine the score value that a rule should receive, Atzmueller et al. (2003b) use a quasi-probabilistic score. This score is calculated by an equation that combines the statistical dependence of a finding with its precision and specificity. The resulting value ranges from -1.0 to 1.0 and is mapped to a corresponding s value.

Atzmueller et al. (2003b) are concerned with the balance between accuracy and complexity. For this reason, their system uses diagnostic profiles, $P_d$, defined as:

$$P_d = (F_d, \mathit{freq}_{F_d}) \tag{4-19}$$

where $F_d$ represents the findings most frequently associated with diagnosis d and $\mathit{freq}_{F_d}$ contains the frequencies of those findings. The frequencies in the diagnostic profile are used to prune less important rules and so improve efficiency. Further efficiency is gained through other pruning criteria, as well as by partitioning the domain using background knowledge provided by an expert in the field. In later work, Atzmueller et al. (2004a) proposed a quality measure as a means of determining the appropriate balance between accuracy and simplicity.
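
A diagnostic profile and the pruning it supports can be sketched as follows. The fixed `min_frequency` threshold and the function names are hypothetical simplifications. Atzmueller et al. (2003b) also combine profile frequencies with other pruning criteria and expert-supplied partitions, which this sketch omits.

```python
from collections import Counter


def diagnostic_profile(
    cases: list[tuple[set[str], str]], diagnosis: str, min_frequency: float
) -> dict[str, float]:
    """Profile P_d: findings seen in cases with `diagnosis`, mapped to their
    relative frequency, keeping only those at or above `min_frequency`."""
    relevant = [findings for findings, d in cases if d == diagnosis]
    if not relevant:
        return {}
    counts = Counter(f for findings in relevant for f in set(findings))
    n = len(relevant)
    return {f: c / n for f, c in counts.items() if c / n >= min_frequency}


def prune_rules(
    rules: list[tuple[str, str, str]], profiles: dict[str, dict[str, float]]
) -> list[tuple[str, str, str]]:
    """Keep only rules (f, s, d) whose finding f belongs to the profile of d."""
    return [(f, s, d) for f, s, d in rules if f in profiles.get(d, {})]


cases = [({"a", "b"}, "d1"), ({"a"}, "d1"), ({"a", "c"}, "d1"), ({"b"}, "d2")]
profile = diagnostic_profile(cases, "d1", min_frequency=0.5)
print(profile)
print(prune_rules([("a", "S2", "d1"), ("b", "S1", "d1")], {"d1": profile}))
```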

Conclusion

This chapter began by presenting the common mathematical calculations used in the medical field. From there, the discussion moved from historical medical expert systems to systems designed specifically for the field of toxicology. The chapter concluded by discussing four approaches to diagnosing multiple disorders. Considerable emphasis was placed on the system created by Atzmueller et al. (2003b) due to its similarity to the system presented in the following chapters. The next chapter details the development of a system for toxic exposure diagnosis and its performance when diagnosing single exposure cases.

CHAPTER 5
DIAGNOSING SINGLE EXPOSURE CASES

The previous chapter presented diagnostic systems for toxicology as well as modern research toward the diagnosis of multiple disorders. This chapter describes the first stage of exploratory research performed using data from the Florida Poison Information Center (FPIC) to create a system for diagnosing multiple exposure cases. The system presented in this chapter is capable of generating a differential diagnosis for exposures to a single toxin. The chapter begins by describing the source data and continues by discussing system design principles and development. It then presents the system's operation with respect to the user interface. It concludes with a discussion of system testing, research results, and, finally, system performance.

Source Data

Since 1996, the FPIC has collected data on every call received and has made follow-up calls to obtain additional information about cases referred to hospitals. The collected data is stored in a relational database, which consists of tables in which each entry has a key that enables relationships to be drawn between tables. In 2004 alone, the FPIC received over 120 thousand calls and made more than 43 thousand follow-up calls related to human exposures (Florida Poison Information Center Network, 2005). The FPIC database also contains over 65 thousand records of multiple exposures. For this research, the FPIC provided access to all the cases recorded in its Jacksonville database from 2002 through 2005. The information supplied contains more than 160 thousand toxic exposure cases, with nearly 14 thousand cases involving multiple toxins. To improve data quality, the database records were cleaned so that only cases with clinical effects that were followed to a known outcome remained. The cleaned database contained 30,152 single exposure cases and 7,096 multiple exposure cases. For this portion of the research, however, the system was trained only on single exposure cases.
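
The cleaning step described above can be sketched as a simple filter. The `ExposureCase` record and its field names are hypothetical simplifications, not the actual TESS field names. The sketch applies the two conditions named in the text (clinical effects present and a known outcome) and then splits the result into single and multiple exposure cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureCase:
    """Hypothetical, simplified view of one exposure record (not the TESS schema)."""
    case_id: int
    substances: tuple[str, ...]
    clinical_effects: frozenset[str]
    outcome_known: bool  # True if the case was followed to a known outcome


def clean_cases(cases: list[ExposureCase]) -> tuple[list[ExposureCase], list[ExposureCase]]:
    """Keep cases with clinical effects and a known outcome, then split them
    into single exposure and multiple exposure cases."""
    kept = [c for c in cases if c.clinical_effects and c.outcome_known]
    single = [c for c in kept if len(c.substances) == 1]
    multiple = [c for c in kept if len(c.substances) > 1]
    return single, multiple


cases = [
    ExposureCase(1, ("substance X",), frozenset({"vomiting"}), True),
    ExposureCase(2, ("substance X", "substance Y"), frozenset({"drowsiness"}), True),
    ExposureCase(3, ("substance Y",), frozenset(), True),                 # no clinical effects
    ExposureCase(4, ("substance Z",), frozenset({"tachycardia"}), False),  # outcome unknown
]
single, multiple = clean_cases(cases)
print([c.case_id for c in single], [c.case_id for c in multiple])
```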

The database supplied by the FPIC conforms to the Toxic Exposure Surveillance System (TESS) standard. TESS is the older of two national standards defined by the American Association of Poison Control Centers (AAPCC) to regulate the fields contained within the database of each poison control center (PCC). The newer standard, known as the National Poison Data System (NPDS), was not fully developed at the time of this research. As a result, the system presented here uses TESS-standardized data fields. However, using TESS rather than NPDS standards does not affect the system's general design principles, because both TESS and NPDS use the same paradigm and record the same set of data. Because both are national standards, either will enable the system to be expanded to a national level and implemented at various PCCs throughout the country. Both require that the majority of entries in the database have discrete values, which are easy to process with a computer program. Most importantly, both contain the observed signs and symptoms, jointly called clinical effects, and the final diagnosis of patients referred to hospitals for treatment.

Although the FPIC database is a valuable resource, it may contain errors. Patients may lie about the substances they consume, or physicians and nurses may not fully recount all the important details of a case when reporting to the PCC. Fortunately, these errors can reasonably be viewed as random errors. As the system's case base grows, the contribution of incorrect information to the system's calculations should become negligible.

System Design Principles

From the outset, a major objective of this research was to bypass the knowledge acquisition bottleneck by generating a knowledge-based system capable of producing meaningful and useful results without the need for an active, overseeing expert. In developing this system to diagnose unknown exposures, certain guiding principles were followed to produce the desired system characteristics, which include simplicity, understandability, automatic system generation, and incremental updates. Each of these characteristics is discussed briefly in the following paragraphs.

The characteristic of simplicity is of the utmost importance. Holsheimer et al. (1995) have shown that success in extracting information from databases does not require complex algorithms. In fact, "simpler, even trivial, processes are better than complicated ones if they are enough for the job of discovery" (Valdes-Perez, 1999, p. 336). Simplicity inherently gives systems several advantages. Generally, systems with simple representations and algorithms are more efficient and require less processing power. Simple, linear calculations grant the system scalability, which is extremely important given the size and continual growth of the FPIC database. Systems designed with simpler architectures are also often more portable to other platforms. Portability would help other PCCs around the country. It would also let the system approach be used to solve diagnostic problems in other domains. Finally, simplicity of design gives the system inherent understandability. The system and its processes should be easier for other knowledge engineers to comprehend and implement. In addition, the system's solutions should be explained in terminology that physicians will understand.

The understandability of system results was another chief concern during development. If physicians understand the method by which the system obtains its answers, they are more likely to trust the system and use it within the scope of its intended purpose. According to Atzmueller et al. (2003b), "understandability and interpretability of...learned models is of prime importance" and "ideally, the learning method constructs knowledge in the same representation the human expert favors" (p. 23). For this reason, the final system design makes use of likelihood ratios. Likelihood ratios are commonly used throughout the medical field and are discussed in Chapter 4 along with other medical mathematics. After processing, the system presents its results to the user as a differential diagnosis. A differential diagnosis is a list of various disorders that can produce similar clinical effects. It is used to determine the most likely cause of a patient's condition and is a method commonly practiced in the medical field. By using these familiar approaches, physicians should find the system to be relevant, understandable, and easy to operate. Furthermore, the system's mathematical methods are similar to those of medical case studies that seek to identify patterns of clinical syndromes. It is believed that this will help the system gain acceptance in the medical field.

Automatic system generation is another desirable trait. Atzmueller et al. (2003b) state that "pure automatic learning methods are usually not good enough to reach a quality comparable to manually built knowledge bases" (p. 23). In spite of this deficiency, automatic methods offer certain advantages that should not be overlooked. Automatically trained systems fully bypass the knowledge acquisition bottleneck of obtaining information from an expert. An expert's time is valuable, and the more processing a system can do without expert input, the more rapidly it can be developed and implemented. Additionally, automatically generated system designs can be broadly applicable to solving problems. This makes such a system significantly more portable than one whose expert input specializes it for a given field. The system presented in this chapter was generated by an engineer with no expertise in toxicology and no guidance from toxicologists regarding specific diagnostic approaches. Bypassing the knowledge acquisition bottleneck increases the value of the system as a whole. So does using a generally applicable solution built on calculations familiar to the medical field.

The final desired attribute of the system is the ability to perform incremental updates. As

the FPIC database grows in size, more valuable information will become available for aiding in









diagnosis. Although the system could recompile all the data from 1996 to the present with every

update, such an operation would be inefficient and could require significant processing time.

Rather than beginning anew each update, the system can maintain key information about current

values and incorporate the information from the latest cases into its calculations. Currently,

incremental updates have not been implemented because the system is not directly linked to the

central database. However, the use of likelihood ratios makes the implementation of incremental

updates a straightforward procedure. To calculate likelihood ratios, a count of true positives,

true negatives, false positives, and false negatives must be determined for each clinical effect.

By saving a table of these four values for every pair of substance and clinical effect, the likelihood ratio can

be recalculated at any time. Updating the system then becomes a simple matter of querying the new data for a

count of each of the four values, adding the results to the old table, and recalculating the

likelihood ratios. Graefe et al. (1998) present examples of other information that can be used in

incremental updates. Han and Kamber (2001) also briefly discuss incremental and parallel data

mining as ways of combining gathered information.
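The update procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: each single-exposure case is assumed to be reduced to a (substance, set of clinical effects) pair, and the lists of substances and clinical effects are assumed to be fixed in advance (a substance first appearing in a new batch would also need its counts over the old cases). The names `Counts`, `count_cases`, and `merge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one (substance, clinical effect) pair."""
    tp: int = 0  # cases with this substance that show the effect
    fn: int = 0  # cases with this substance that lack the effect
    fp: int = 0  # cases with another substance that show the effect
    tn: int = 0  # cases with another substance that lack the effect


def count_cases(cases, substances, effects):
    """Tally TP/FN/FP/TN for every (substance, effect) pair in one batch.

    Each case is a hypothetical (substance, set_of_effects) tuple.
    """
    table = {(s, e): Counts() for s in substances for e in effects}
    for case_substance, observed in cases:
        for s in substances:
            for e in effects:
                c = table[(s, e)]
                if s == case_substance:
                    if e in observed:
                        c.tp += 1
                    else:
                        c.fn += 1
                elif e in observed:
                    c.fp += 1
                else:
                    c.tn += 1
    return table


def merge(old, new):
    """Incremental update: add a new batch's counts to the stored table."""
    merged = {}
    for key in old.keys() | new.keys():
        o, n = old.get(key, Counts()), new.get(key, Counts())
        merged[key] = Counts(o.tp + n.tp, o.fn + n.fn, o.fp + n.fp, o.tn + n.tn)
    return merged


def likelihood_ratio(c):
    """Equation 5-2 recomputed from the stored counts."""
    return (c.tp / (c.tp + c.fn)) / (c.fp / (c.fp + c.tn))


substances, effects = ["A", "B"], ["nausea", "rash"]
old_batch = [("A", {"nausea"}), ("B", {"rash"})]
new_batch = [("A", {"nausea", "rash"}), ("B", {"nausea"})]

table = merge(count_cases(old_batch, substances, effects),
              count_cases(new_batch, substances, effects))
print(likelihood_ratio(table[("A", "nausea")]))  # 2.0
```

Because the four counts are simple sums over cases, merging two batches gives exactly the table that recounting all cases would produce, which is what makes the update cheap.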

System Development

The goal of the research presented in this chapter is to create a system using data mining

and knowledge engineering techniques on a database obtained from the FPIC to aid in the

diagnosis of exposures to a single unknown toxin. The system must receive a physician's input

in the form of signs and symptoms observed in a patient, process the data, and return a list of the

substances that are most likely to induce these clinical effects. The aim of the system is not to

produce infallible results for every case. Rather, the system attempts to give the physician easy

access to a refined and organized version of the knowledge stored in the FPIC's vast database.

The system offers direction by presenting a differential diagnosis of drugs and other toxic

substances that should be considered. Ultimately, the physician makes the final decision

regarding the treatment the patient should receive.

In generating the system, data mining techniques are used to clean the records and extract

the appropriate information from the FPIC database. First, informational calls are removed so

that only exposure cases remain. Then, the exposure cases are filtered so that only cases with

clinical effects that were followed to a known outcome remain. Although this reduces the size of

the dataset to 30,152 single exposure cases and 7,096 multiple exposure cases, the filtering

process ensures that only significant representative cases with the best documentation are used to

train the system. Each exposure case has clinical effects associated with it. The clinical effects

observed in a patient are rated as either "related," "unknown if related," or "not related" to the

substance involved in the exposure. For the purposes of system training, clinical effects that are

"not related" are removed from the database while those that are "unknown if related" are used

for training in the same way as "related" clinical effects.
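The cleaning steps above can be sketched as a simple filter. The `Record` layout below is a hypothetical stand-in for a call record, not the FPIC or TESS schema; only the filtering rules (drop informational calls, keep cases with clinical effects followed to a known outcome, drop "not related" effects) come from the text.

```python
from dataclasses import dataclass, field

# Ratings kept for training; "not related" effects are dropped.
KEEP_RATINGS = {"related", "unknown if related"}


@dataclass
class Record:
    """Hypothetical shape of one call record (illustration only)."""
    case_id: int
    call_type: str              # "exposure" or "information"
    followed_to_outcome: bool   # followed to a known medical outcome
    substances: list
    effects: dict = field(default_factory=dict)  # effect -> relatedness rating


def clean(records):
    """Return (case_id, substances, effects) for usable exposure cases."""
    cleaned = []
    for r in records:
        if r.call_type != "exposure":  # remove informational calls
            continue
        if not r.effects or not r.followed_to_outcome:
            continue  # keep only cases with effects followed to an outcome
        kept = {e for e, rating in r.effects.items() if rating in KEEP_RATINGS}
        cleaned.append((r.case_id, r.substances, kept))
    return cleaned


def split_by_exposure_count(cleaned):
    """Separate single-exposure cases from multiple-exposure cases."""
    single = [c for c in cleaned if len(c[1]) == 1]
    multiple = [c for c in cleaned if len(c[1]) > 1]
    return single, multiple
```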

After extracting and cleaning the cases in the database, a table of prior probabilities, also

known as pre-test probabilities, is calculated for each toxin. A prior probability represents the

likelihood of a particular substance being involved, given that a toxic exposure has occurred.

Prior probability, P, is calculated as:

$$P = \frac{\text{Cases}}{\text{Total}} \qquad (5\text{-}1)$$

where Cases is the number of cases involving a particular substance and Total is the total number

of exposure cases in the database.
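Equation 5-1 amounts to a frequency count. A minimal sketch, assuming the input is simply the list of substances recorded for the single-exposure cases (the substance names below are used only as labels):

```python
from collections import Counter


def prior_probabilities(case_substances):
    """Equation 5-1: P = Cases / Total for each substance.

    `case_substances` holds the substance involved in each exposure case.
    """
    total = len(case_substances)
    return {s: n / total for s, n in Counter(case_substances).items()}


priors = prior_probabilities(["ACETAMINOPHEN", "ACETAMINOPHEN", "LITHIUM", "LAXATIVE"])
print(priors["ACETAMINOPHEN"])  # 0.5
```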

In addition to prior probabilities, a table of likelihood ratios is calculated. When

calculating likelihood ratios, the system treats each clinical effect as a diagnostic test that is

useful in detecting the presence of a toxic substance. A likelihood ratio compares the probability

of observing a clinical effect in patients exposed to a particular toxin with the probability of

observing the same clinical effect in patients exposed to any other toxin. The likelihood ratio, LR, is calculated as:

$$LR = \frac{TP/(TP+FN)}{FP/(FP+TN)} \qquad (5\text{-}2)$$

where TP represents true positives, TN represents true negatives, FP represents false positives,

and FN represents false negatives. An exhaustive table of likelihood ratios relating every

individual clinical effect to every possible substance exposure is the primary resource utilized by

the system in creating a differential diagnosis. An advantage of likelihood ratios over many

other medical measurements (e.g., sensitivity, specificity, and positive and negative predictive values)

is that likelihood ratios can be easily combined through multiplication. Additionally, by

including the prior probability, likelihood ratios can account for disorder prevalence.

Furthermore, likelihood ratios are easily calculated and characterize many cases with a single

number, making the system scalable to large databases and ensuring a rapid response time.
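Equation 5-2 and the multiplicative combination can be sketched as follows. Two assumptions are labeled here: the counts are hypothetical, and the prior is folded in by multiplying the prior probability with the product of likelihood ratios, which is one reading of "including the prior probability" (the textbook Bayesian form multiplies prior odds, p/(1 - p), instead). Multiplying likelihood ratios also treats the clinical effects as conditionally independent.

```python
import math


def likelihood_ratio(tp, fn, fp, tn):
    """Equation 5-2: sensitivity divided by the false-positive rate."""
    sensitivity = tp / (tp + fn)          # P(effect | this substance)
    false_positive_rate = fp / (fp + tn)  # P(effect | any other substance)
    return sensitivity / false_positive_rate


def combined_score(prior, effect_lrs):
    """Multiply per-effect likelihood ratios together with the prior.

    Assumes conditional independence of the clinical effects.
    """
    return prior * math.prod(effect_lrs)


# Hypothetical counts: 30 of 100 cases of the substance show the effect,
# versus 50 of 900 cases involving other substances.
lr = likelihood_ratio(tp=30, fn=70, fp=50, tn=850)
print(round(lr, 2))                               # 5.4
print(round(combined_score(0.02, [lr, 3.0]), 3))  # 0.324
```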

Although the likelihood ratio has its advantages, it shares the drawbacks of every

mathematical ratio: the possibility of evaluating to zero or causing a divide-by-zero error.

A likelihood ratio of zero only occurs when TP = 0 and may not seem like a problem until we

understand how the system calculates combined likelihood ratios. Every clinical effect is treated

as a test for detecting the presence of a toxic substance. If there are multiple clinical effects,

their likelihood ratios are multiplied together to obtain a combined likelihood ratio. If any of the

clinical effects has a likelihood ratio of zero, then the substance's combined likelihood ratio also

evaluates to zero, regardless of the evidence presented by other clinical effects. The problem is

that the absence of cases associating a substance with a clinical effect does not mean that the

substance absolutely cannot cause that clinical effect. Furthermore, even if the substance truly

cannot cause the clinical effect, patients may have unassociated clinical effects caused by other

ailments.
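The effect of a single zero on the combined likelihood ratio is easy to demonstrate. The likelihood ratio values below are made up for illustration:

```python
import math


def combined_lr(lrs):
    """Multiply per-effect likelihood ratios (conditional independence assumed)."""
    return math.prod(lrs)


# Two effects strongly support the substance; a third effect was never
# recorded with it, so TP = 0 and its likelihood ratio is 0.
strong_evidence = [12.0, 8.5]
with_unseen_effect = strong_evidence + [0.0]

print(combined_lr(strong_evidence))     # 102.0
print(combined_lr(with_unseen_effect))  # 0.0
```

One unexplained clinical effect is enough to erase all of the supporting evidence, which is exactly the failure described above.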

The divide-by-zero error is an obvious problem for any computer system. Looking at

Equation 5-2, we can see that if TP + FN = 0 or FP = 0, the calculation fails. (Note that

although FP + TN = 0 causes an error, addressing FP = 0 also prevents that error from

occurring.) The sum of true positives and false negatives (TP + FN) is the total number of cases

where a particular substance is involved. Due to the structure of the database and its queries, a

substance with no recorded cases in the database is ignored and not included as a valid diagnosis

in the system. As a result, TP + FN never equals zero. The second divide-by-zero error,

FP = 0, occurs whenever a clinical effect only appears in the database with an association to one

particular substance. As calculated, the likelihood ratio concludes that since no other substance

causes the clinical effect, that substance absolutely must be the cause, so it divides by zero to

obtain an infinite likelihood. In reality, however, no single substance is the only possible cause

for any clinical effect in the system. The problem is lack of sufficient data. The divide-by-zero

error was encountered during development because the database contains only one instance of

fetal death. Although fetal death can be caused by any number of substances, the system

attempted to conclude that only acetaminophen could cause the death of a fetus.
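The FP = 0 failure can be reproduced directly with Equation 5-2. The counts are hypothetical and only mimic an effect, such as the single fetal-death record, that appears in one case involving one substance:

```python
def likelihood_ratio(tp, fn, fp, tn):
    """Equation 5-2, with no protection against zero denominators."""
    return (tp / (tp + fn)) / (fp / (fp + tn))


# Hypothetical counts: the effect occurs in a single case, which involved
# this substance, so no other substance has a case with it (FP = 0).
try:
    likelihood_ratio(tp=1, fn=999, fp=0, tn=37000)
except ZeroDivisionError:
    print("FP = 0: the likelihood ratio is undefined (division by zero)")
```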

The preliminary system used a simple-minded approach to solving the multiplication by

zero and divide-by-zero problems. Multiplication by zero was handled by replacing all

zero-valued likelihood ratios with a value of one. Although this prevents the system from

gaining any knowledge about a substance from a clinical effect not associated with the

substance, it prevents that clinical effect from destroying the knowledge gained from other

clinical effects. The divide-by-zero error was solved by examining the data set and manually

modifying the offending clinical effect records. The likelihood ratios calculated using the

method described in this paragraph are referred to as "non-adjusted" likelihood ratios from this

point forward.
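The non-adjusted calculation can be sketched as a thin wrapper around Equation 5-2. This is an illustration of the zero-to-one substitution only; the FP = 0 case is deliberately left unhandled because, as described above, the preliminary system avoided it by editing the offending records by hand rather than in code.

```python
def non_adjusted_lr(tp, fn, fp, tn):
    """Likelihood ratio with zero values replaced by one (neutral evidence).

    FP = 0 still raises ZeroDivisionError; those records were corrected
    manually in the preliminary system.
    """
    lr = (tp / (tp + fn)) / (fp / (fp + tn))
    return 1.0 if lr == 0 else lr


print(non_adjusted_lr(tp=0, fn=50, fp=10, tn=90))  # 1.0 instead of 0.0
```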

Although using non-adjusted likelihood ratios expedited the research process, it introduced

significant drawbacks. First, replacing likelihood ratios of zero with the value of one ignores the

information that could be gained from the calculation. Likelihood ratios can take values between zero and one,

indicating a negative association with a substance, and a likelihood ratio of zero indicates an infinitely

negative association. Rather than discarding the negative association completely, the zero

value might be tempered by using some fractional likelihood ratio. Second, manually removing

problematic cases from the database violates the important design principle of automatic system

generation. To solve these problems, a generalized equation was developed to replace the

likelihood ratio:

$$LR_{\Delta} = \frac{(TP+\Delta)/(TP+\Delta+FN+\Delta)}{(FP+\Delta)/(FP+\Delta+TN+\Delta)} \qquad (5\text{-}3)$$

where TP represents true positives, TN represents true negatives, FP represents false positives,

FN represents false negatives, and Δ is a small, positive constant. As discussed in Chapter 4, TP,

TN, FP, and FN represent the four possible outcomes of a diagnostic test. By adding Δ to each

outcome, the equation states that any of these outcomes is a possibility, even if no supporting

cases exist in the database. The end result is a stable equation that closely approximates the

likelihood ratio, avoids the difficulties of multiplying by zero, prevents the divide-by-zero error,

and converges to the same value as the likelihood ratio as the number of cases increases. Values

of Δ of 1.0, 0.1, 0.01, and 0.001 were calculated and compared.

Ultimately, a Δ of 0.01 was selected because it appeared to improve diagnosis significantly while still

yielding a suitable substitute for the likelihood ratio. Equation 5-3 with Δ = 0.01 is referred to
as the "adjusted" likelihood ratio from this point forward.
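Equation 5-3 translates directly into code. In this sketch the `delta` parameter stands for the small constant added to each of the four outcomes, with 0.01 as the default value selected above; the counts in the usage lines are hypothetical.

```python
def adjusted_lr(tp, fn, fp, tn, delta=0.01):
    """Equation 5-3: add delta to each of the four test outcomes."""
    sensitivity = (tp + delta) / (tp + delta + fn + delta)
    false_positive_rate = (fp + delta) / (fp + delta + tn + delta)
    return sensitivity / false_positive_rate


print(adjusted_lr(tp=0, fn=100, fp=50, tn=850) > 0)  # True: no hard zero
print(adjusted_lr(tp=1, fn=999, fp=0, tn=37000))     # large but finite
# With many cases the adjustment becomes negligible (the plain LR here is 5.4).
print(round(adjusted_lr(tp=30000, fn=70000, fp=50000, tn=850000), 4))  # 5.4
```

A zero count now yields a small fractional value instead of zero, and FP = 0 yields a large finite value instead of a division error, while well-populated counts give essentially the unadjusted ratio.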

The system described is a hybrid system containing elements of data mining, case-based

reasoning, rule-based systems, and uncertainty management. Data mining techniques are used to

clean and extract relevant information from the database. Case-based reasoning methodology

makes use of the example cases obtained by data mining to develop a system that runs on

composite observations. The system calculations are essentially a set of simple rules running in

parallel with likelihood ratios implemented to handle uncertainty. From the results of these

rules, a ranked list is generated to indicate the most likely substances that account for the given

signs and symptoms. Moreover, uncertainty management is employed by the use of adjusted

likelihood ratios to make system calculations robust in the face of database anomalies.

System Operation and User Interface

As discussed in the previous section, the system utilizes two tables of calculations to create

a differential diagnosis. The first table contains the prior probabilities for every substance. The

second table consists of likelihood ratios relating every individual clinical effect to every

possible substance exposure. When supplied with a set of clinical effects, the system calculates a

combined likelihood ratio, including prior probability, for every potential single exposure

diagnosis. The results are then sorted and presented as a differential diagnosis to the user.
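The operation just described can be sketched as a scoring-and-sorting loop over the two tables. The table layout (`priors` keyed by substance, `lr_table` keyed by substance and clinical effect) is an assumption for illustration, as is combining the prior by multiplication; the substances and values are toy data.

```python
import math


def differential_diagnosis(selected_effects, priors, lr_table, top_n=10):
    """Rank substances by prior probability times the product of the
    likelihood ratios of the selected clinical effects.

    `lr_table` maps (substance, effect) to a precomputed likelihood ratio.
    """
    scores = {
        substance: prior * math.prod(lr_table[(substance, e)] for e in selected_effects)
        for substance, prior in priors.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


priors = {"A": 0.2, "B": 0.5, "C": 0.3}
lr_table = {
    ("A", "diarrhea"): 10.0, ("A", "dehydration"): 2.0,
    ("B", "diarrhea"): 0.5, ("B", "dehydration"): 1.0,
    ("C", "diarrhea"): 1.0, ("C", "dehydration"): 3.0,
}
for substance, score in differential_diagnosis(["diarrhea", "dehydration"], priors, lr_table):
    print(substance, round(score, 2))
# A 4.0
# C 0.9
# B 0.25
```

Substance B has the highest prior but ranks last, because the observed effects argue against it.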

The user interface reveals more about the functionality of the system (Figure 5-1). Clinical

effects are grouped into nine categories defined by TESS: cardiovascular, dermal,

gastrointestinal, heme/hepatic, neurological, ocular, renal/GU, respiratory, and miscellaneous.

Each group of clinical effects can be viewed by selecting the appropriate tab from the top of the

interface. In the figure, the gastrointestinal disorders tab is selected to show the various TESS

defined clinical effects associated with this category. Three disorders are selected: abdominal

pain, dehydration, and diarrhea. More disorders may be selected from other category tabs as

well.



[Screenshot: category tabs (Cardiovascular, Dermal, Gastrointestinal, Heme/Hepatic, Neurological, Ocular, Renal/GU, Respiratory, Miscellaneous); Gastrointestinal tab open with Abdominal Pain, Dehydration, and Diarrhea checked; controls on the right: Calculate By (Substance, Major + Minor Categories, Major Category), Adjusted, Minimum Exposure Cases, Minimum CE Occurrences, Diagnose, Clear Fields]


Figure 5-1. User interface

The controls for various system parameters are on the right-hand side of the user interface.

The "Calculate By" selection box enables the user to select substance, major and minor

categories, or major category. Thus far, we have discussed the research only in terms of

diagnosing exposures to a single toxic substance; however, each substance belongs to a minor

category, which in turn belongs to a major category. In the same manner that the system is

trained to diagnose individual substances, it can be trained to diagnose based on major and minor

categories or even solely based on major category. Giving physicians a general idea of the drug

categories they should consider may prove every bit as valuable as attempting to directly

diagnose a substance.

Below the "Calculate By" selection box is a check box that determines whether likelihood ratios

are calculated as adjusted or non-adjusted. As discussed in the previous section,

non-adjusted calculations replace all likelihood ratios of zero with a one and require the system

designer to manually remove anomalies that might cause divide-by-zero errors. The adjusted

likelihood ratio, shown in Equation 5-3, makes a slight modification to the traditional likelihood

ratio to create a more robust equation that prevents system failures caused by multiplication

or division by zero. As shown in Figure 5-1, the box is checked so that the system calculates the

adjusted likelihood ratio.

Below the "Adjusted" check box are numerical values for "Minimum Exposure Cases"

(MC) and "Minimum CE Occurrences" (MCE). These numbers serve as data filters that are used

to eliminate diagnoses and clinical effects that have too few cases to be representative. The MC box

enables the user to set the minimum required number of cases for a diagnosis. If a diagnosis

does not have at least as many cases in the database as the number in the box, the diagnosis does

not appear on the results table. The MCE box enables the user to set the minimum number of

times a clinical effect (CE) must appear in the database. If a clinical effect does not appear in the

database at least as many times as the number entered, the clinical effect is ignored when

calculating the likelihood ratio even if the clinical effect is checked by the user.
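The two filters can be sketched as simple threshold checks applied before scoring. The function name and the count dictionaries are hypothetical, and all counts except the single fetal-death occurrence mentioned earlier are made up:

```python
def apply_filters(case_counts, effect_counts, selected_effects, mc=25, mce=0):
    """Apply the MC and MCE filters before scoring.

    case_counts:   substance -> number of cases in the database
    effect_counts: clinical effect -> number of occurrences in the database
    """
    # MC: drop diagnoses with fewer than `mc` supporting cases.
    eligible = {s for s, n in case_counts.items() if n >= mc}
    # MCE: ignore checked effects that occur fewer than `mce` times.
    used_effects = [e for e in selected_effects if effect_counts.get(e, 0) >= mce]
    return eligible, used_effects


eligible, used = apply_filters(
    case_counts={"LITHIUM": 140, "LAXATIVE": 12},
    effect_counts={"Diarrhea": 2300, "Fetal Death": 1},
    selected_effects=["Diarrhea", "Fetal Death"],
    mc=25, mce=10,
)
print(eligible, used)  # {'LITHIUM'} ['Diarrhea']
```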

The last two features of the user interface are the "Clear Fields" and "Diagnose" buttons.

The "Clear Fields" button removes all check marks from every clinical effect regardless of the

selected tab. This enables the user to be sure that the check marks on other tabs have been

cleared without having to manually flip through each tab individually. The "Diagnose" button

runs the system program, displaying a differential diagnosis table to the user. Clicking on the

"Diagnose" button with the settings in Figure 5-1 displays a table similar to the one shown in

Figure 5-2. The table contains the calculated likelihood ratio (LR) on the left and the associated

diagnosis on the right. The results in the figure indicate bacterial food poisoning, with a

likelihood ratio of 148.9, is by far the most likely cause of abdominal pain, dehydration, and

diarrhea. The second most likely cause is an unknown mushroom, with a likelihood ratio of 3.32. It should be

noted that, although likelihood ratios are helpful for indicating the strength of support for various

diagnoses, rank on the list is more important. Physicians should consider many of the substances

in the top ten before making their final diagnosis.



LR               SubDesc
148.949829478    BACTERIAL FOOD POISONING: UNKNOWN TYPE
3.32482311621    UNKNOWN MUSHROOM
0.31018510633    MULTI-BOTANICAL WITHOUT MA HUAN OR CITRUS AURANTIUM
0.29315333996    CARDIAC GLYCOSIDE
0.18296274892    LITHIUM
0.13801692951    ORGANOPHOSPHATE
0.11248568761    SUSPECTED FOOD POISONING-UNKNOWN TYPE-PATIENT SYMPTOMATIC
0.06391201604    ACETAMINOPHEN WITH HYDROCODONE
0.06183397575    MULTI VITAMIN-TABLET: CHILD WITH IRON (NO FLUORIDE)
0.05855603720    LAXATIVE
(first ten of 265 records shown)
Figure 5-2. Results table

System Testing and Results

For testing, the system's prior probabilities and likelihood ratios were trained on

approximately 90% of the cases in the database. After training, the system attempted to diagnose

the remaining 10% of the cases using only the associated clinical effects. The correct diagnosis

for each case was then compared to the system's differential diagnosis and the rank of the correct

diagnosis was saved to a summary table. The system then retrained on a new set of data and was

tested against a different 10% of the database. The training and testing datasets were determined

by the last digit of each case identification number, ensuring a unique test set every cycle. The

process was repeated ten times, completely testing the system against every case contained in the


database. Throughout the process a large amount of data was gathered, the results of which are

presented in the following paragraphs.
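The ten-cycle procedure is a form of ten-fold cross-validation in which the fold is chosen by the last digit of the case identification number. A minimal sketch, assuming integer case IDs (the function name is hypothetical):

```python
def last_digit_folds(case_ids):
    """Yield (digit, train_ids, test_ids) for the ten cycles.

    Cases whose ID ends in `digit` form the test set (about 10%); all other
    cases are used for training. Every case is tested exactly once.
    """
    for digit in range(10):
        test = [cid for cid in case_ids if cid % 10 == digit]
        train = [cid for cid in case_ids if cid % 10 != digit]
        yield digit, train, test


for digit, train, test in last_digit_folds(list(range(1000, 1100))):
    print(digit, len(train), len(test))  # each cycle: 90 training, 10 test
```

Using the last digit makes the split deterministic and repeatable, and the ten test sets are disjoint and together cover the whole database.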

The ten-cycle testing process was used to compare the effectiveness of adjusted versus

non-adjusted likelihood ratios. Both likelihood ratio calculations were tested at all three

diagnostic levels: diagnosing by substance, diagnosing by major and minor categories, and

diagnosing by major category alone. Additionally, the settings for the MC and MCE filters were

varied to produce multiple points of comparison. While maintaining a constant MCE value of

10, MC was tested at 10, 25, and 100. Likewise, while maintaining a constant MC value of 25,

MCE was tested at 0, 10, and 50. Furthermore, four levels of medical outcomes were tested

against the system: all exposures with a minor severity or worse, moderate severity or worse,

major severity or worse, and exposures where the outcome was death. These tests yielded

sixty result sets (three diagnostic levels × five filter settings × four severity levels) for each of

the adjusted and non-adjusted likelihood ratios.

After generating these results, the accuracy of the sixty adjusted sets was compared to the

accuracy of the sixty non-adjusted sets. Accuracies were calculated in three ways: the

percentage of exposures appearing as the top diagnosis, the percentage of exposures appearing in

the top ten diagnoses, and the percentage of exposures appearing in the top 10% of the trained

diagnoses. Comparing adjusted accuracies with non-adjusted accuracies, it was determined that

adjusted likelihood ratios appear to be a good approximation of non-adjusted likelihood ratios,

with adjusted calculations yielding a higher accuracy 90% of the time. Of the 180 accuracy

calculations, there were eighteen exceptions where non-adjusted calculations outperformed

adjusted calculations. Ten of these exceptions involved the outcome of death. There are a few

explanations for this anomaly. First, there are very few death cases recorded in the database,

making it more likely that random variation might favor one system approach over another.

Second, death cases often display clinical effects that are not normally associated with a

particular toxic exposure, because organ systems begin to shut down and

extreme failures cause cascading effects. In such cases, it becomes impossible to

reliably compare two diagnostic systems. The accuracies of the remaining eight exceptions were

within 0.5 percentage points of the corresponding adjusted performances. This nominal gain is more than

compensated for by the 127 instances where adjusted calculations outperformed non-adjusted

calculations on test cases not limited to the outcome of death. Additionally, a system based on

adjusted calculations is much easier to generate automatically than one based on non-adjusted

calculations because it does not require any manual intervention by the system designer. Having

established that the adjusted likelihood ratio is a valid substitute for the traditional likelihood

ratio, the remainder of the research results is discussed in terms of adjusted calculations.
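The three accuracy measures can be computed from the rank of the correct diagnosis in each test case. This is a sketch under two labeled assumptions: a case whose correct diagnosis is absent from the list (for example, removed by the MC filter) counts as a miss, and "top 10%" is taken as 10% of the trained diagnoses rounded up, since the rounding rule is not stated.

```python
def accuracies(ranks, n_diagnoses):
    """Return (top-1, top-ten, top-10%) accuracy.

    `ranks` holds the 1-based rank of the correct diagnosis for each test
    case, or None if it did not appear in the differential diagnosis.
    """
    top_pct = (n_diagnoses + 9) // 10  # 10% of trained diagnoses, rounded up (assumed)

    def share(limit):
        hits = sum(1 for r in ranks if r is not None and r <= limit)
        return hits / len(ranks)

    return share(1), share(10), share(top_pct)


print(accuracies([1, 3, 12, None, 25], n_diagnoses=200))  # (0.2, 0.4, 0.6)
```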

The next step in system development was to determine the best values for the MCE and

MC filters. Beginning with a constant MC value of 25, MCE was varied and tested for values of

0, 2, 5, 10, and 50. For each of these values, the adjusted system was also tested at the three

diagnosis levels of substance, major and minor categories, and major category alone. Each of the

three diagnosis levels yields a system with a significantly different number of trained diagnoses.

To enable comparisons between the three diagnosis levels, the percentage of exposures

appearing in the top 10% of the trained diagnoses was used as the accuracy measurement.

Table 5-1 shows the accuracy of the system when diagnosing by substance, Table 5-2 when

diagnosing by major and minor categories, and Table 5-3 when diagnosing by major category

alone. Looking at Table 5-1 under minor severity, it can be seen that varying MCE has no effect

on the accuracy of the system. Under major severity, the accuracy decreases from 77.8% to

77.6%, a negligible change. Likewise, Table 5-2 and Table 5-3 show

that varying MCE causes little to no change for minor, moderate, and major severities. Once

again, the exception is the severity where the outcome is death, which is most likely due to a

small sample size. For example, the 5.1-percentage-point increase in accuracy observed in exposures with an

outcome of death being diagnosed by major and minor categories amounts to only four

additional cases being diagnosed in the top 10%. Prior to these tests, it was believed that using

too low an MCE cutoff might create falsely high or low likelihood ratios for some substances,

decreasing diagnosis accuracy. However, based on these results, it is reasonable to conclude that

filtering by MCE yields negligible changes in system accuracy. Using an adjusted likelihood

ratio with Δ = 0.01 already mitigates the potential problem; thus, the filter can be removed from

the system.

Table 5-1. Accuracy by substance in top 10% (MC = 25)
              Minimum CE Occurrences (MCE)
Severity      0        2        5        10       50
Minor         64.7%    64.7%    64.7%    64.7%    64.7%
Moderate      74.4%    74.4%    74.4%    74.4%    74.2%
Major         77.8%    77.8%    77.7%    77.7%    77.6%
Death         62.2%    62.2%    62.2%    62.2%    58.1%

Table 5-2. Accuracy by major and minor categories in top 10% (MC = 25)
              Minimum CE Occurrences (MCE)
Severity      0        2        5        10       50
Minor         64.1%    64.1%    64.1%    64.1%    63.9%
Moderate      72.7%    72.7%    72.7%    72.6%    72.4%
Major         75.4%    75.4%    75.4%    75.1%    75.4%
Death         58.2%    58.2%    58.2%    58.2%    63.3%

Table 5-3. Accuracy by major category in top 10% (MC = 25)
              Minimum CE Occurrences (MCE)
Severity      0        2        5        10       50
Minor         63.9%    63.9%    63.8%    63.8%    63.8%
Moderate      70.0%    70.0%    69.9%    69.9%    69.8%
Major         70.5%    70.5%    70.4%    70.0%    70.5%
Death         55.7%    55.7%    54.4%    54.4%    54.4%

The second filter to be examined was the MC filter. Using adjusted calculations with a

constant MCE value of 10, MC was tested for values of 0, 2, 5, 10, 25, 50, and 100. Again, to

enable comparisons between the three diagnosis levels of substance, major and minor categories,

and major category alone, the percentage of exposures appearing in the top 10% of the trained

diagnoses was used as the accuracy measurement. Additionally, since varying MC directly

affects the number of trained diagnoses in the system, it was hoped that the 10% accuracy

measurement would enable comparisons between systems generated by different MC filter

values. Table 5-4 shows the accuracy of the system when diagnosing by substance, Table 5-5

when diagnosing by major and minor categories, and Table 5-6 when diagnosing by major

category alone. Looking at the accuracies for minor, moderate, and major severities in both

Table 5-4 and Table 5-5, it is readily apparent that accuracy generally decreases as MC

increases. Table 5-6 shows the same tendency for MC steps from 10 to 25 and from 25 to 100, but

appears to plateau for MC values from 0 to 10 and from 25 to 50. At first it might appear that using a

lower MC yields a more accurate system and that, therefore, the MC filter should be removed.

However, such a conclusion fails to account for the purpose of the MC filter. As MC decreases,

more possible diagnoses with fewer supporting cases are added to the system. As more diagnoses

are added to the system, the accuracy calculation based on the top 10% includes substances that

are ranked lower on the differential diagnosis. It turns out that the number of diagnoses that are

added to the top 10% outweighs the number of new exposure cases being tested against the

system. As a result, the lower the MC value, the more accurate the system appears. The plateaus

observed in Table 5-6 are also accounted for by this explanation because the top 10% of trained diagnoses

contains the same number of diagnoses for MC values of 0, 2, 5, and 10, as well as for MC values of 25 and 50.

Table 5-4. Accuracy by substance in top 10% (MCE = 10)
              Minimum Exposure Cases (MC)
Severity      0        2        5        10       25       50       100
Minor         74.4%    72.9%    69.6%    67.4%    64.7%    62.9%    58.9%
Moderate      80.3%    79.7%    78.0%    76.6%    74.4%    71.6%    64.8%
Major         80.6%    81.7%    81.3%    79.8%    77.7%    75.1%    68.6%
Death         62.0%    65.8%    63.3%    62.8%    62.2%    61.2%    46.8%

Table 5-5. Accuracy by major and minor categories in top 10% (MCE = 10)
              Minimum Exposure Cases (MC)
Severity      0        2        5        10       25       50       100
Minor         71.7%    70.1%    68.9%    67.5%    64.1%    63.0%    58.4%
Moderate      79.1%    78.0%    76.9%    75.6%    72.6%    71.8%    66.7%
Major         81.1%    80.3%    79.7%    78.5%    75.1%    74.2%    69.2%
Death         67.1%    68.4%    68.4%    67.1%    58.2%    59.2%    49.3%

Table 5-6. Accuracy by major category in top 10% (MCE = 10)
              Minimum Exposure Cases (MC)
Severity      0        2        5        10       25       50       100
Minor         68.5%    68.5%    68.5%    68.6%    63.8%    64.1%    60.3%
Moderate      73.4%    73.4%    73.4%    73.5%    69.9%    70.2%    66.4%
Major         73.9%    73.9%    74.0%    74.0%    70.0%    70.3%    65.2%
Death         58.2%    58.2%    58.2%    58.2%    54.4%    56.4%    51.3%

Since comparing MC values using an accuracy based on the top 10% of trained diagnoses

failed to yield the desired results, a second accuracy measurement was calculated using the

correct diagnoses appearing in the top ten slots of the differential diagnosis. From a user

standpoint, this accuracy measurement is more appropriate because the list size that a user can

process without being overwhelmed does not depend on the number of trained substances.

Table 5-7 shows the accuracy of the system when diagnosing by substance, Table 5-8 when

diagnosing by major and minor categories, and Table 5-9 when diagnosing by major category

alone. Looking at the minor, moderate, and major severity rows in Table 5-7, Table 5-8, and

Table 5-9, it can be seen that as MC increases, accuracy also increases. The data tell us little

about selecting a value for MC because they indicate what is expected of any system: as more

cases are used to define each substance, system accuracy should increase. Another contributor to

the increase in accuracy is that fewer substances are trained as MC increases. With fewer

substances, the top ten substances become a larger portion of the available diagnoses. Even
random guessing would experience an increase in accuracy under these circumstances.
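The effect on random guessing can be quantified. Assuming the correct diagnosis is among the trained diagnoses and a guesser picks ten distinct diagnoses uniformly at random, the chance of a top-ten hit is min(10, N)/N. The values of N below are illustrative only:

```python
def random_top10_accuracy(n_diagnoses, k=10):
    """Expected chance that the correct diagnosis lands in a random top-k list."""
    return min(k, n_diagnoses) / n_diagnoses


for n in (600, 200, 50):
    print(n, round(random_top10_accuracy(n), 3))
# 600 0.017
# 200 0.05
# 50 0.2
```

Shrinking the pool of trained diagnoses raises this baseline substantially, so part of the accuracy gain in Tables 5-7 to 5-9 would appear even without any diagnostic reasoning.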

Table 5-7. Accuracy by substance in top ten (MCE = 10)
              Minimum Exposure Cases (MC)
Severity      0        2        5        10       25       50       100
Minor         41.2%    41.4%    42.1%    43.2%    47.2%    54.4%    71.0%
Moderate      50.0%    50.5%    51.4%    52.9%    57.1%    63.2%    76.4%
Major         53.4%    54.6%    56.0%    58.2%    62.4%    68.1%    77.7%
Death         35.4%    39.2%    41.8%    44.9%    45.9%    50.7%    57.4%

Table 5-8. Accuracy by major and minor categories in top ten (MCE = 10)
              Minimum Exposure Cases (MC)
Severity      0        2        5        10       25       50       100
Minor         63.0%    63.1%    63.2%    63.4%    64.1%    65.3%    71.0%
Moderate      71.4%    71.6%    71.7%    72.0%    72.6%    74.1%    78.1%
Major         73.6%    74.2%    74.3%    74.7%    75.1%    76.4%    79.0%
Death         58.2%    58.2%    58.2%    58.2%    58.2%    60.5%    61.3%

Table 5-9. Accuracy by major category in top ten (MCE = 10)
              Minimum Exposure Cases (MC)
Severity      0        2        5        10       25       50       100
Minor         79.5%    79.5%    79.5%    79.6%    79.8%    80.1%    82.4%
Moderate      83.8%    83.8%    83.9%    83.9%    84.0%    84.5%    86.3%
Major         83.9%    84.1%    84.3%    84.3%    84.5%    84.7%    85.4%
Death         69.6%    70.9%    72.2%    72.2%    72.2%    71.8%    71.8%

In an attempt to normalize accuracies, the ratio of the data in Table 5-7, Table 5-8, and

Table 5-9 to the accuracy of diagnosing by random guessing was calculated. However, it

was found that the ratio suffered from problems similar to those of the accuracies calculated in Table 5-4,

Table 5-5, and Table 5-6. Lowering MC increases the number of trained diagnoses in the

system, which lowers the accuracy of random guessing. As a result, the ratio falsely indicated that a lower

MC cutoff would yield better results. A second attempt at normalizing the accuracies calculated

the ratio of the data in Table 5-7, Table 5-8, and Table 5-9 to the accuracy of a system that selected its top

ten choices based on prior probabilities alone. Figure 5-3 shows a graph of the ratio for minor,

moderate, and major severities when diagnosing by substance. Likewise, Figure 5-4 displays the

ratio for diagnosing by major and minor categories and Figure 5-5 the ratio for diagnosing by

major category alone. The graphs indicate that as MC increases, ensuring more representative

likelihood calculations, the system tends to perform better. The increase appears to be almost

linear, with perhaps a slight tendency towards diminishing returns as MC increases. There is no

evidence of any breakpoints that would yield a superior MC cutoff. These results indicate that

the adjusted likelihood ratio is performing well and that the exact value used for MC is

unimportant. However, a reasonable MC value of at least ten should be chosen to ensure that
outliers do not excessively influence diagnosis.
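The normalization against a prior-only system can be sketched as follows. The baseline ignores clinical effects entirely and always proposes the same k substances with the highest prior probabilities; the ratio divides the system's top-k accuracy by the baseline's. Function names and toy data are hypothetical, and k = 2 is used only to keep the example small (the chapter uses the top ten).

```python
def prior_only_top_k(priors, k=10):
    """Baseline: the k substances with the highest prior probabilities."""
    ranked = sorted(priors, key=priors.get, reverse=True)
    return set(ranked[:k])


def top_k_accuracy(test_diagnoses, predicted):
    """Share of test cases whose correct diagnosis is in the predicted set."""
    return sum(d in predicted for d in test_diagnoses) / len(test_diagnoses)


def accuracy_ratio(system_accuracy, baseline_accuracy):
    """System accuracy divided by prior-only accuracy (> 1 means a gain)."""
    return system_accuracy / baseline_accuracy


priors = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
baseline = top_k_accuracy(["A", "C", "B", "D"], prior_only_top_k(priors, k=2))
print(baseline)                        # 0.5
print(accuracy_ratio(0.75, baseline))  # 1.5
```

Unlike random guessing, this baseline already exploits how common each substance is, so a ratio above one credits the clinical effects rather than the prevalence data.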

[Line graph: accuracy ratio (y-axis, about 1.5 to 4.0) versus Minimum Exposure Cases (MC, 0 to 100) for Minor, Moderate, and Major severities]

Figure 5-3. Accuracy ratios by substance

[Line graph: accuracy ratio (y-axis, about 1.0 to 2.2) versus Minimum Exposure Cases (MC, 0 to 100) for Minor, Moderate, and Major severities]

Figure 5-4. Accuracy ratios by major and minor categories

[Line graph: accuracy ratio (y-axis, about 1.0 to 1.5) versus Minimum Exposure Cases (MC, 0 to 100) for Minor, Moderate, and Major severities]

Figure 5-5. Accuracy ratios by major category

Comparing Figure 5-3, Figure 5-4, and Figure 5-5, it can be seen that the slopes and the

ratios are higher for diagnosis by substance than for diagnosis by major and minor categories, which

in turn are higher than for diagnosis by major category alone. The reason is that the number of diagnoses

trained for diagnosing by substance (around 200 to 600) is significantly larger than for diagnosing by

major and minor categories (around 100 to 200), which is larger than for diagnosing by major

category alone (around 50 to 60). With more possible diagnoses, it becomes more difficult to place the

correct diagnosis in the top ten without the evidence supplied by the clinical effects. Thus, the system's performance ratio

improves as more substances are added. Additionally, the curves indicate that the system scales

well to a large number of diagnoses since the ratios and slopes increase as the available

diagnoses increase. Another notable characteristic of the curves is that they indicate that the

system performs better the more severe the case. The primary reason is that more severe cases

generally have more associated clinical effects. With more clinical effects, the system has more

information to properly differentiate between various diagnoses, yielding a higher accuracy. The

good news is that the most important cases are the most severe cases, and this is precisely where

the system performs best.









Table 5-10 displays a representative summary of system performance with MC = 10 and MCE = 0. To enable comparison between the various forms of diagnosis, accuracy is calculated as the percentage of exposures whose correct diagnosis appears in the top 10% of trained diagnoses. Table 5-10 reiterates that the system performs better the more severe the case. Once again, death is the exception, owing to limitations in the data and to organ-system failures that lead to cascading clinical effects. Moreover, the difficulties associated with cases involving death make it futile to discuss trends for that severity. For major and moderate severities, diagnosing by substance performs best, followed by major and minor categories and finally by major category alone. The reverse is true for minor severity cases, where diagnosing by major category alone performs best. Though not observed in every test run, this accuracy inversion is not uncommon and is most likely due to the small number of clinical effects in minor severity cases. With minimal clinical effects, it is easier to identify the general major category of a toxin than to identify the specific toxin involved.

Table 5-10. Accuracy in top 10% with MC = 10 and MCE = 0
                        Diagnosis by:
Severity    Substance    Major & Minor categories    Major category
Minor       67.4%        67.5%                       68.6%
Moderate    76.6%        75.7%                       73.5%
Major       79.8%        78.9%                       74.8%
Death       62.8%        67.1%                       59.5%

The accuracy calculations in Table 5-10 peak at 79.8%, for major severity cases diagnosed by substance. These calculations include a large number of cases involving only a single clinical effect, which would be difficult for even the most experienced expert to diagnose without additional information. To better demonstrate system functionality, the accuracies from Table 5-10 are recalculated in Table 5-11 using only cases with at least three recorded clinical effects. System accuracy improves considerably, particularly for minor severity cases, where accuracies rise from the upper 60% range into the mid-70% range. Additionally, the accuracy of diagnosing major severity cases by substance and by major and minor categories rises above 80%. Further improvements could likely be achieved by removing uninformative categories, such as the "unknown drug" diagnosis, and by consolidating nearly redundant substances, such as "aspirin: pediatric formulation," "aspirin: unknown if adult or pediatric formulation," and "aspirin: adult formulation." However, one purpose of this research is to bypass the need for expert input when generating a system, and making such improvements would require knowledge of the toxicology domain.

Table 5-11. Accuracy in top 10% with MC = 10, MCE = 0, and 3+ clinical effects
                        Diagnosis by:
Severity    Substance    Major & Minor categories    Major category
Minor       75.1%        74.6%                       74.0%
Moderate    78.4%        77.2%                       75.0%
Major       81.0%        80.5%                       75.9%
Death       69.4%        71.4%                       66.7%

Two toxicologists at the FPIC who experimented with the system's user interface gave further, albeit subjective, support for the system's viability. The toxicologists found the top diagnosis reasonable for every input they gave the system, and they also found several other appropriate diagnoses listed in the top ten. Considering the purpose of the system as an automatically generated toxicology consultant, and the intentional simplicity of its design, the resulting accuracies and the positive reactions of toxicology experts indicate that the research performed to create this system was a success.

System Performance

An important aspect of system usability is the amount of processing time required to train the system and the response time of the user interface to diagnostic queries. System calculations were intentionally kept simple to enable scalability, rapid system generation, and a low response time. For research purposes, the system was developed using Microsoft Access 2002 on a Compaq Presario 2100 laptop with a 2.4 GHz processor and 320 MB of RAM. Training the system on four years of data took just under three minutes. Under worst-case conditions, a diagnosis takes approximately three seconds when the program is first queried; once the program is loaded into RAM, however, the diagnosis runtime is cut in half. Porting the program to the dedicated SQL server used by the FPIC would likely offer further speed improvements.

As the number of cases in the system increases, system training time could increase significantly, though diagnosis time should be minimally affected because of the system's architecture: diagnosis relies on the probabilities computed during training rather than on the raw cases. The section of this chapter entitled "System Design Principles" discusses a method for incremental updates that enables the system to retain its training from previous years and simply adjust its calculations based on new data. Based on the performance measurements taken, it is reasonable to expect that the system could be trained rapidly by a central database server. In the future, training results could be downloaded to applications on handheld personal digital assistants without significant loss of usability.

Conclusion

This chapter has presented the research and development of a system capable of generating a differential diagnosis for exposures to a single toxin. First, we discussed the source data and the guiding system design principles of simplicity, understandability, automatic system generation, and incremental updates. Next, the theory behind system development was explained. Finally, the system operation, user interface, system results, and system performance were presented. The system presented here serves as a foundation for the multiple exposure research presented in the next chapter.

CHAPTER 6
DIAGNOSING MULTIPLE EXPOSURE CASES

The previous chapter presented the development of a system for diagnosing exposures to a single toxin. The resulting system serves as a foundation for the multiple exposure research discussed in this chapter. Although system development did not proceed as expected, the results reveal intriguing insights into the diagnosis of multiple exposures in the field of toxicology. The chapter begins with the motivation for developing the system, continues by briefly describing the system approach, discusses the results of diagnosing multiple disorders using various sets of training data, presents the research conclusions, and closes with a discussion of future work.

Motivation for Diagnosing Multiple Exposures

Although many established methods for designing knowledge-based systems exist, as discussed in Chapter 3, none have fully solved the problem of diagnosing multiple disorders. The difficulty is that multiple disorder cases can display non-linear interactions, a problem observed in the field of toxicology. When simultaneously present in the body, toxins can interact antagonistically or synergistically, masking or otherwise altering the signs and symptoms that would normally appear for each individual exposure. Little documentation exists for the majority of toxic exposure combinations that can occur, and only a limited number of systems in the field of toxicology attempt to account for multiple exposures in some way, as covered in Chapter 4. None of these systems fully solves the problem, nor are they readily available for use by American toxicologists.

Beyond the motivation of developing technology that addresses an unsolved diagnostic problem, a more important concern is at stake: saving lives. The Toxic Exposure Surveillance System (TESS) report states that in 2004 "50.6% of fatal cases involved 2 or more drugs or products" (Watson et al., 2004, p. 593). This statistic shows that timely and accurate identification of exposures involving multiple substances is extremely important. Both for advancing the field of information engineering and for preserving life, the problem addressed by the research in this chapter is relevant and important.

System Approach

Chapter 5 presents the development of a system for diagnosing exposures to a single toxin. That system serves as a foundation for the diagnosis of multiple exposure cases discussed in this chapter. Like the single exposure system, the goal of the multiple exposure system is to serve as a consultant by producing differential diagnoses based on the clinical effects supplied by the user. Unless otherwise noted, the training and testing procedures for the system described in this chapter conform to the following characteristics:

* The clinical effects and substance identifiers are based on TESS standards.

* Prior probabilities and adjusted likelihood ratios with A = 0.01 are used to determine the
differential diagnoses (see Equations 5-1 and 5-3 for details).

* The system is tested at three diagnostic levels: diagnosing by substance, diagnosing by
major and minor categories, and diagnosing by major category alone.

* Three levels of medical outcomes are tested against the system: exposures with a minor
severity or worse, moderate severity or worse, and major severity or worse. (Note that
testing solely based on exposures resulting in death is not included due to the inaccuracies
discussed in Chapter 5.)

* A minimum exposure cases (MC) value of 10 and minimum clinical effect occurrences
(MCE) value of 0 serve as the cutoffs for testing the system.

* Accuracies are calculated as the percentage of test exposures identified correctly in the top
10% of the trained diagnoses.
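
The accuracy measure in the last item, combined with the "or worse" severity cutoffs above, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's Microsoft Access implementation: the `CaseResult` record, the numeric severity ranking, and rounding the top-10% cutoff down to a whole number of diagnoses are all hypothetical choices made for the example. The optional `min_effects` argument mirrors the restriction to cases with at least three clinical effects used for Table 5-11.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical ordinal scale: a larger number means a worse medical outcome.
SEVERITY_RANK = {"minor": 1, "moderate": 2, "major": 3, "death": 4}


@dataclass
class CaseResult:
    correct_diagnosis: str
    ranked_diagnoses: Sequence[str]   # system output, most likely first
    severity: str                     # recorded medical outcome
    num_clinical_effects: int


def top_percent_accuracy(results: List[CaseResult], n_trained: int,
                         severity_cutoff: str = "minor",
                         top_percent: int = 10,
                         min_effects: int = 0) -> float:
    """Percentage of eligible test cases whose correct diagnosis is ranked
    within the top `top_percent` percent of the `n_trained` diagnoses."""
    # Assumption: the cutoff is rounded down but never below one diagnosis.
    k = max(1, n_trained * top_percent // 100)        # 431 diagnoses -> 43
    cutoff = SEVERITY_RANK[severity_cutoff]
    eligible = [r for r in results
                if SEVERITY_RANK[r.severity] >= cutoff     # "X or worse"
                and r.num_clinical_effects >= min_effects]
    if not eligible:
        raise ValueError("no test cases satisfy the cutoffs")
    hits = sum(r.correct_diagnosis in r.ranked_diagnoses[:k] for r in eligible)
    return 100.0 * hits / len(eligible)


# Toy usage: 20 trained diagnoses, so the top 10% is the top two.
results = [
    CaseResult("aspirin", ["ibuprofen", "aspirin", "ethanol"], "major", 3),
    CaseResult("ethanol", ["aspirin", "ibuprofen", "ethanol"], "minor", 1),
]
print(top_percent_accuracy(results, n_trained=20))                           # 50.0
print(top_percent_accuracy(results, n_trained=20, severity_cutoff="major"))  # 100.0
```

Raising the severity cutoff shrinks the pool of test cases rather than changing the ranking itself, which is why the "minor or worse" row always includes the cases counted in the stricter rows.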

For this research, the Florida Poison Information Center (FPIC) provided access to all the cases recorded in its Jacksonville database from 2002 through 2006. With the addition of a fifth year, the cleaned database used for system generation now contains 37,617 single exposure cases and 8,901 multiple exposure cases.

Diagnosing Multiple Exposures using Solely Multiple Exposure Cases

During the initial phase of testing, all multiple exposure cases are extracted from the database. TESS standards require that each substance involved in a toxic exposure be assigned a sequence number that ranks the substance by its relative contribution to the observed clinical effects. To simplify the initial attempts to diagnose multiple disorders, only the primary and secondary contributors in each multiple exposure case are considered. TESS standards also require that substances be recorded by a product-specific code as well as a generic substance code, and this requirement creates a problem. When determining the number of substances involved in an exposure, the FPIC database uses the product-specific code. As a result, two products marketed by different companies are listed as separate substances, even if their active ingredient is the same. When cleaning the data, if the generic substance codes for the top three contributing substances are identical, the case is removed from the dataset. If the first two generic substance codes are identical but the third is different, the third substance is treated as the secondary contributor for the case. Finally, the multiple exposure cases are filtered so that only cases resulting in minor effects, moderate effects, major effects, or death are used to train and test the system. The cleaned dataset contains 8,901 multiple exposure cases.
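
The rule for choosing the primary and secondary contributors during cleaning can be expressed as a short function. The sketch below is a hypothetical reconstruction from the description above. It assumes that each case supplies its generic substance codes already ordered by TESS sequence number, and it returns `None` for cases that should be dropped. The function name and the single-letter codes are illustrative only.

```python
from typing import Optional, Sequence, Tuple


def primary_secondary(generic_codes: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return the (primary, secondary) label for one multiple exposure, or
    None if the case should be removed during cleaning.

    `generic_codes` holds the generic substance code of each product in the
    case, ordered by TESS sequence number (primary contributor first).
    Only the top three contributors are examined, as described in the text.
    """
    top = list(generic_codes[:3])
    if len(top) < 2:
        return None                      # not a multiple exposure
    primary = top[0]
    if top[1] != primary:
        return primary, top[1]           # usual case
    if len(top) == 3 and top[2] != primary:
        return primary, top[2]           # first two share a generic code
    return None                          # examined products share one code


print(primary_secondary(["A", "B", "C"]))   # ('A', 'B')
print(primary_secondary(["A", "A", "C"]))   # ('A', 'C')
print(primary_secondary(["A", "A", "A"]))   # None
```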

When generating the multiple exposure system, each pair of primary and secondary contributors is trained individually as a single diagnosis. Prior probabilities, adjusted likelihood ratios, and both MC and MCE filters are calculated and implemented in the same manner as discussed in Chapter 5. Testing also follows the same process of training the system on approximately 90% of the cases and then attempting to diagnose the remaining 10%. By repeating the process ten times, the system is tested against every case in the database. Finally, the results are combined, and accuracy is calculated as the percentage of test exposures identified correctly in the top 10% of the trained diagnoses.
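
The ten-cycle train/test procedure can be sketched as a generator of splits. The text states only that about 90% of the cases train the system on each cycle and that every case is tested once. The random shuffle, the seed, and the function name `ten_fold_splits` are assumptions made for illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ten_fold_splits(cases: Sequence[T], folds: int = 10,
                    seed: int = 0) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) lists; each case lands in exactly one test fold,
    so about 90% of the cases train the system on every cycle."""
    order = list(range(len(cases)))
    random.Random(seed).shuffle(order)          # assumed random assignment
    for f in range(folds):
        test_idx = set(order[f::folds])          # every tenth shuffled index
        train = [c for i, c in enumerate(cases) if i not in test_idx]
        test = [cases[i] for i in sorted(test_idx)]
        yield train, test


# Usage: pool the diagnoses from all ten cycles before computing accuracy.
cases = list(range(95))
tested = []
for train, test in ten_fold_splits(cases):
    # ...train on `train`, then diagnose each case in `test`...
    tested.extend(test)
print(sorted(tested) == cases)   # True: every case is tested exactly once
```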

The original results from training and testing the system on multiple exposure cases are displayed in the first column of Table 6-1. With accuracies ranging from 28.3% to 50.1%, the system's poor performance is obvious. To explore the failure further, the system was tested with MC values of 15, 20, and 25. These tests show a similar lack of accuracy (Table 6-1). Reading each row of the table from left to right, performance gradually decays as MC increases. As discussed in Chapter 5, this is expected because the MC cutoff lowers the number of diagnoses included in the top 10%. The most interesting characteristic of the data in Table 6-1 is that accuracy decreases as severity increases. This observation is contrary to the results observed in the single exposure system, where accuracy normally increases with severity because more severe cases contain more clinical effects, making diagnosis easier for the system.

Table 6-1. Accuracy (varying MC) of system trained & tested on multiple exposures
                                        Minimum Exposure Cases (MC)
Diagnosed by                Severity    10       15       20       25
Substance                   Minor       33.5%    30.4%    29.0%    27.6%
                            Moderate    30.0%    26.9%    25.3%    22.9%
                            Major       28.3%    23.3%    21.8%    18.5%
Major & Minor categories    Minor       47.3%    43.6%    39.5%    38.2%
                            Moderate    45.9%    42.1%    37.6%    36.5%
                            Major       37.6%    34.5%    30.9%    30.6%
Major category              Minor       50.1%    46.8%    45.7%    43.4%
                            Moderate    47.2%    44.1%    43.0%    40.4%
                            Major       43.0%    39.6%    38.2%    36.5%
Average                                 40.3%    36.8%    34.5%    32.7%

There are a number of plausible explanations for why accuracy might decrease with severity, but two are particularly compelling. The first is that the decrease is caused by non-linear interactions between multiple toxins. As the severity of an exposure increases, there is greater opportunity for a combination of toxins to produce effects not normally associated with any of the toxins individually. This could lower the accuracy of the system because the clinical effects would behave more erratically and might not correspond to the majority of cases. The second explanation is that the decrease is simply caused by a lack of quality data. As the severity cutoff becomes more stringent, fewer cases are tested against the system, leading to poorer sampling and quite possibly lower average accuracies. A lack of quality data could account both for the low overall accuracy and for the decrease in accuracy as severity increases.

Another factor that might contribute to the system's poor accuracy is the A parameter in the adjusted likelihood ratio equation (Equation 5-3). The A parameter is meant primarily to safeguard against multiply-by-zero and divide-by-zero errors; however, with a small training set, A might adversely influence the diagnostic results. Table 6-2 compares the original system accuracy, obtained with A = 0.01, to accuracies calculated with A = 0.1 and A = 0.001. Increasing A to 0.1 decreased average accuracy by 1.6%, while decreasing A to 0.001 increased average accuracy by only 0.1%. These results imply that A = 0.01 performs satisfactorily relative to other values that might be selected.
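
Equation 5-3 appears in Chapter 5. Because its exact form is not restated here, the following Python sketch only illustrates the general idea under stated assumptions. It adds an adjustment constant to each cell of a 2 × 2 table of counts before forming a positive likelihood ratio. It then combines the ratios for the observed clinical effects with the prior odds in log space. The constant is called `adj` here and stands in for A. The way the author's equation places A may differ. Everything else in the sketch is hypothetical: the function names, the choice to score only observed effects, and the toy counts.

```python
import math
from typing import Dict, List, Sequence, Tuple


def adjusted_lr(dx_eff: int, dx_no: int, other_eff: int, other_no: int,
                adj: float = 0.01) -> float:
    """Positive likelihood ratio with `adj` added to every cell of a 2x2
    table: (this diagnosis vs. all others) x (effect present vs. absent).
    ASSUMPTION: `adj` stands in for the A parameter; Equation 5-3 may
    place the adjustment differently."""
    sensitivity = (dx_eff + adj) / (dx_eff + dx_no + 2 * adj)
    false_positive_rate = (other_eff + adj) / (other_eff + other_no + 2 * adj)
    return sensitivity / false_positive_rate


def rank_diagnoses(effects: Sequence[str],
                   effect_counts: Dict[str, Dict[str, int]],
                   case_counts: Dict[str, int],
                   adj: float = 0.01) -> List[Tuple[str, float]]:
    """Rank diagnoses by log posterior odds: log prior odds plus the sum of
    log adjusted likelihood ratios of the observed clinical effects.
    effect_counts[dx][effect] = training cases of dx showing the effect;
    case_counts[dx] = training cases of dx (at least two diagnoses)."""
    n_all = sum(case_counts.values())
    effect_totals: Dict[str, int] = {}
    for per_dx in effect_counts.values():
        for eff, n in per_dx.items():
            effect_totals[eff] = effect_totals.get(eff, 0) + n
    scores = []
    for dx, n_dx in case_counts.items():
        prior = n_dx / n_all                        # pre-test probability
        log_odds = math.log(prior / (1 - prior))
        for eff in effects:
            dx_eff = effect_counts.get(dx, {}).get(eff, 0)
            other_eff = effect_totals.get(eff, 0) - dx_eff
            lr = adjusted_lr(dx_eff, n_dx - dx_eff,
                             other_eff, (n_all - n_dx) - other_eff, adj)
            log_odds += math.log(lr)                # logs avoid underflow
        scores.append((dx, log_odds))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy data: 90 training cases of X and 10 of Y; only Y ever showed seizures.
case_counts = {"X": 90, "Y": 10}
effect_counts = {"X": {"vomiting": 9}, "Y": {"vomiting": 9, "seizure": 8}}
print([dx for dx, _ in rank_diagnoses([], effect_counts, case_counts)])
# ['X', 'Y']  (the prior alone favors the common diagnosis)
print([dx for dx, _ in rank_diagnoses(["seizure"], effect_counts, case_counts)])
# ['Y', 'X']  (the seizure evidence overrides the prior)
print(round(adjusted_lr(500, 500, 500, 1500), 4))  # 2.0, the unadjusted 0.5 / 0.25
```

With zero counts (X never showed a seizure), the adjusted ratio remains finite and positive, which is the safeguard described above. With large counts, the adjustment becomes negligible. A larger `adj` pulls every ratio toward 1, which is one way a large A could blunt the clinical-effect evidence on a small training set.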

Table 6-2. Accuracy (varying A) of system trained & tested on multiple exposures
Diagnosed by                Severity    A = 0.1    A = 0.01    A = 0.001
Substance                   Minor       32.3%      33.5%       33.5%
                            Moderate    29.0%      30.0%       29.8%
                            Major       26.4%      28.3%       28.0%
Major & Minor categories    Minor       46.5%      47.3%       47.4%
                            Moderate    44.6%      45.9%       46.1%
                            Major       35.0%      37.6%       38.3%
Major category              Minor       49.4%      50.1%       50.2%
                            Moderate    46.1%      47.2%       47.2%
                            Major       39.5%      43.0%       43.4%
Average                                 38.8%      40.3%       40.4%

In an attempt to improve accuracy and better understand the system's poor performance, a number of system variations were tested. The resulting accuracies are presented in Table 6-3, where the column labeled "original accuracies" represents the original system. The first column of accuracies displays the results for a system that assumes all trained diagnoses are equally likely, that is, one that ignores prior probabilities. As expected, this system performs worse than the original. However, the results of this test reveal a few important insights. Unlike in the original, the accuracies for diagnosis by substance and by major and minor categories increase as severity increases. This observation suggests that the system is indeed processing clinical effects correctly. Thus, the accuracies that decrease with increasing severity in the original testing do not appear to be caused by non-linear interactions between multiple substances. Rather, the results imply that the prior probabilities are dominating the original diagnoses, and the most likely cause of this problem is a lack of quality data. The fact that diagnosis by major category alone still displays decreasing accuracy with increasing severity is also consistent with this explanation. Major categories cover a broad variety of substances, making it difficult to train a general model that properly fits a major category as a whole, and the problem is compounded when attempting to identify two different major categories in the same diagnosis.
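
The way a dominant prior can override the clinical-effect evidence can be illustrated with a toy ranking. The diagnoses and numbers below are hypothetical. The score is log prior odds plus a precomputed sum of log likelihood ratios. This scoring form is an assumption, chosen to be consistent with the use of pre-test probabilities and likelihood ratios. With a uniform prior, the log-odds term is the same for every diagnosis. The ranking is then determined by the clinical-effect evidence alone, as in the "no prior probability" column.

```python
import math
from typing import Dict, List


def rank(log_lr_sums: Dict[str, float], priors: Dict[str, float],
         use_prior: bool = True) -> List[str]:
    """Order diagnoses by log prior odds + summed log likelihood ratios.
    With use_prior=False every diagnosis gets the same prior (1/N), so the
    prior term is a constant and only the clinical-effect evidence matters."""
    uniform = 1.0 / len(priors)

    def score(dx: str) -> float:
        p = priors[dx] if use_prior else uniform
        return math.log(p / (1 - p)) + log_lr_sums[dx]

    return sorted(priors, key=score, reverse=True)


# Hypothetical numbers: a very common pair with weak supporting evidence.
priors = {"common pair": 0.30, "mid pair": 0.05, "rare pair": 0.01}
log_lr_sums = {"common pair": 0.5, "mid pair": 1.0, "rare pair": 3.0}
print(rank(log_lr_sums, priors))                   # ['common pair', 'rare pair', 'mid pair']
print(rank(log_lr_sums, priors, use_prior=False))  # ['rare pair', 'mid pair', 'common pair']
```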

Table 6-3. Accuracy comparison of various systems for multiple exposure diagnosis
                            Exposure    No prior       Original      Double       Order       Primary
Diagnosed by                severity    probability    accuracies    exposures    reversed    correct
Substance                   Minor       16.5%          33.5%         35.3%        42.4%       64.8%
                            Moderate    17.5%          30.0%         30.9%        40.3%       63.0%
                            Major       23.9%          28.3%         28.5%        39.8%       63.7%
Major & Minor categories    Minor       21.7%          47.3%         47.1%        54.0%       82.7%
                            Moderate    23.0%          45.9%         45.1%        53.7%       82.9%
                            Major       23.5%          37.6%         42.3%        49.4%       81.2%
Major category              Minor       24.2%          50.1%         50.8%        56.0%       81.3%
                            Moderate    23.7%          47.2%         47.1%        54.5%       81.5%
                            Major       22.7%          43.0%         42.0%        53.3%       80.9%
Average                                 21.9%          40.3%         41.0%        49.3%       75.8%

Another issue that could contribute to the low accuracy of the system is that multiple exposure cases can involve more than two substances. Since the system only considers the primary and secondary contributors, any additional substances could affect the clinical effects in a manner not predicted by cases involving only two substances. To improve the quality of the training data, a system was created based solely on cases involving exactly two substances. Its accuracy is reported in Table 6-3 under the column titled "double exposures." Although this approach improves data quality, it also reduces the number of training cases from 8,901 to 5,149, a reduction of over 40%. The result is a nominal increase in average accuracy of only 0.7%.

Further attempts to improve accuracy resulted in two more variations of the system. The original system requires the correct identification of both the primary and secondary contributors for a diagnosis to be considered successful. The first variation relaxes this constraint by allowing the order of the primary and secondary contributors to be reversed. Thus, diagnosing a test case with primary contributor A and secondary contributor B as having primary contributor B and secondary contributor A is considered accurate. As seen in Table 6-3 under the column labeled "order reversed," the relaxed criterion increases accuracy by an average of 8.9%. Unfortunately, the resulting system is still not viable, achieving a maximum accuracy of only 56.0%. The second variation counts any diagnosis as a correct match if its primary contributor matches the primary contributor of the test case, regardless of the secondary contributor. As shown in Table 6-3 under the column labeled "primary correct," this increases accuracy drastically, yielding a maximum of 82.9%. These results are overly optimistic, however, because the most common substances in multiple exposures are the primary contributors for many different substance combinations. As a result, several different diagnoses could be considered "correct" for any single test case. Additionally, diagnosing multiple exposures by substance reaches a maximum accuracy of only 64.8%. In spite of these shortcomings, the final test suggests that the primary contributor might be the dominating force in most multiple exposure cases. For that reason, the research presented in the following section focuses on diagnosing the primary contributor.
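
The three success criteria compared in Table 6-3 (exact pair, order reversed, primary correct) can be written as one matching function applied to the top-ranked diagnoses. The names and the toy ranking below are hypothetical. The example shows why the primary-correct criterion is optimistic: several distinct trained pairs share the same primary contributor, so more than one entry near the top of the list can count as a hit.

```python
from typing import Sequence, Tuple

Pair = Tuple[str, str]  # (primary contributor, secondary contributor)


def is_match(predicted: Pair, actual: Pair, criterion: str = "exact") -> bool:
    if criterion == "exact":             # original system
        return predicted == actual
    if criterion == "order_reversed":    # A+B also accepts B+A
        return predicted in (actual, (actual[1], actual[0]))
    if criterion == "primary_correct":   # secondary contributor ignored
        return predicted[0] == actual[0]
    raise ValueError(f"unknown criterion: {criterion!r}")


def hit_in_top(ranked: Sequence[Pair], actual: Pair, k: int,
               criterion: str = "exact") -> bool:
    """True if any of the k highest-ranked pairs matches under `criterion`."""
    return any(is_match(p, actual, criterion) for p in ranked[:k])


ranked = [("B", "A"), ("A", "C"), ("A", "D")]   # hypothetical system output
actual = ("A", "B")
for crit in ("exact", "order_reversed", "primary_correct"):
    print(crit, hit_in_top(ranked, actual, k=2, criterion=crit))
# exact False
# order_reversed True
# primary_correct True
```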

Diagnosing Multiple Exposures with Single Exposure Cases

The findings in the previous section suggest that the clinical effects observed in most multiple exposure cases are dominated by the signs and symptoms associated with the primary contributor. To test this hypothesis, a system trained entirely on single exposures was evaluated on whether it could accurately diagnose the primary contributor in multiple exposure cases. The first column of Table 6-4 shows the accuracy of the system when diagnosing the primary contributor for every multiple exposure case. The next column shows the results when the test cases are limited to double exposures. With accuracies reaching as high as 84.9%, the results confirm that the clinical effects observed in most multiple exposure cases are indeed dominated by those associated with the primary contributor. Furthermore, the evidence indicates that the poor performance of the system trained solely on multiple exposure cases was not due to non-linear interactions between multiple toxins. As discussed in the previous section, the remaining explanation for the system failure is a lack of sufficient data.

Table 6-4. Accuracy diagnosing primary contributors using single exposures¹
                                        Singles       Singles       Combined      Combined
                                        diagnosing    diagnosing    diagnosing    diagnosing
Diagnosed by                Severity    multiples     doubles       multiples     doubles
Substance                   Minor       75.4%         75.2%         79.1%         77.5%
                            Moderate    77.2%         77.3%         81.1%         79.3%
                            Major       78.7%         81.8%         83.5%         83.1%
Major & Minor categories    Minor       77.8%         76.4%         80.4%         78.2%
                            Moderate    81.3%         80.4%         83.3%         81.6%
                            Major       84.9%         84.9%         86.9%         86.2%
Major category              Minor       74.4%         74.9%         77.7%         76.9%
                            Moderate    75.5%         76.2%         78.7%         78.5%
                            Major       75.8%         78.3%         79.9%         80.5%
Average                                 77.9%         78.4%         81.2%         80.2%

¹ To enable maximum comparability, minor restrictions were imposed to ensure that all test runs within the same diagnosis level contained exactly the same number of trained substances on every test cycle. Specifically, there were exactly 431 possible diagnoses for diagnosing by substance, 129 possible diagnoses for diagnosing by major and minor categories, and 60 possible diagnoses for diagnosing by major category alone.

To test whether lack of data caused the poor performance of the system trained solely on multiple exposures, a system was trained on a combination of multiple exposures and single exposures to diagnose the primary contributor in multiple exposure cases. For training purposes, each multiple exposure was treated as a single exposure case with the primary contributor as the correct diagnosis. All single exposure cases were used for training along with approximately 90% of the multiple exposure cases, and the remaining 10% of the multiple exposures were tested against the system to see whether it could identify the primary contributor. Training and testing were repeated ten times so that the system was tested against every available multiple exposure case. In a similar manner, a system trained on a combination of double exposures and single exposures was tested to see whether it could identify the primary contributor in double exposure cases. The results of these two tests are displayed in the last two columns of Table 6-4. On average, accuracy increased by 3.3% when diagnosing multiple exposures and by 1.8% when diagnosing only double exposures. These results indicate that the multiple exposure cases contain valuable information, enough to raise average accuracy above 80%. Moreover, they are consistent with the explanation that the system failure when training on multiple exposures alone was due to a lack of sufficient data. Interestingly, with combined training, the system performed slightly better when diagnosing multiple exposures than when diagnosing double exposures, even though multiple exposures generally should contain more extraneous clinical effects. The explanation is that training with multiple exposures included approximately 8,011 multiple exposure cases per diagnosis cycle, whereas training with double exposures included approximately 4,634 double exposure cases per cycle. Presumably, given as many double exposures as multiple exposures, the double exposure system would perform better. A similar observation can be made of the data presented in Table 6-5.
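
The combined training setup can be sketched as follows. Each multiple exposure is relabeled with one of its contributors: index 0 selects the primary contributor, as in the combined columns of Table 6-4, and index 1 selects the secondary contributor, as in the combined columns of Table 6-5. All single exposures are kept in every training set, and the relabeled multiple exposures rotate through ten test folds. Several details are hypothetical and assumed only for illustration: the data shapes, the fold assignment by index, and the function name.

```python
from typing import Iterator, List, Sequence, Tuple

Effects = Tuple[str, ...]
Single = Tuple[Effects, str]                   # (clinical effects, substance)
Multiple = Tuple[Effects, Tuple[str, str]]     # (effects, (primary, secondary))


def combined_folds(singles: Sequence[Single], multiples: Sequence[Multiple],
                   contributor: int = 0, folds: int = 10
                   ) -> Iterator[Tuple[List[Single], List[Single]]]:
    """Yield (train, test) sets for each cycle.

    Every single exposure is always in the training set. Each multiple
    exposure is relabeled with one contributor (0 = primary, 1 = secondary)
    and the relabeled cases rotate through `folds` test folds, so each is
    tested exactly once while about 90% of them train the system."""
    relabeled = [(effects, pair[contributor]) for effects, pair in multiples]
    for f in range(folds):
        test = [c for i, c in enumerate(relabeled) if i % folds == f]
        train = list(singles) + [c for i, c in enumerate(relabeled)
                                 if i % folds != f]
        yield train, test


singles = [(("vomiting",), "aspirin")] * 5
multiples = [(("somnolence",), ("ethanol", "aspirin"))] * 20
train, test = next(combined_folds(singles, multiples))
print(len(train), len(test))   # 23 2
print(test[0][1])              # ethanol (the primary contributor)
```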

Table 6-5. Accuracy diagnosing secondary contributors using single exposures²
                                        Singles       Singles       Combined      Combined
                                        diagnosing    diagnosing    diagnosing    diagnosing
Diagnosed by                Severity    multiples     doubles       multiples     doubles
Substance                   Minor       69.6%         68.6%         77.6%         75.7%
                            Moderate    70.5%         69.5%         79.7%         77.6%
                            Major       69.5%         70.0%         81.6%         77.0%
Major & Minor categories    Minor       67.8%         63.2%         78.3%         76.8%
                            Moderate    73.0%         69.5%         82.4%         80.9%
                            Major       77.6%         76.2%         86.2%         83.9%
Major category              Minor       62.1%         57.1%         71.4%         69.0%
                            Moderate    64.4%         59.7%         72.9%         70.2%
                            Major       67.4%         63.0%         74.3%         69.6%
Average                                 69.1%         66.3%         78.2%         75.6%

² To enable maximum comparability, minor restrictions were imposed to ensure that all test runs within the same diagnosis level contained exactly the same number of trained substances on every test cycle. Specifically, there were exactly 431 possible diagnoses for diagnosing by substance, 129 possible diagnoses for diagnosing by major and minor categories, and 60 possible diagnoses for diagnosing by major category alone.

The first two columns in Table 6-5 display the accuracies of a system trained solely on single exposure cases and tested against the secondary contributor for both multiple and double exposure cases. Average accuracies of 69.1% and 66.3% are not stellar, but they are high enough to raise a question: if the clinical effects in multiple exposure cases are dominated by the primary contributor, why is the accuracy in diagnosing the secondary contributor so high? Recall that during data cleaning, all multiple exposure cases involving only products with the same generic substance code are removed from the dataset. This cleaning is performed only at the substance level, so many remaining multiple exposure cases likely consist of primary and secondary substances that share the same major and minor categories. Belonging to the same category makes it much more likely that the two substances exhibit similar clinical effects. Examining the data, 21.0% of the primary and secondary contributor pairs in all multiple exposure cases belonged to the same major category, and 11.6% belonged to the same minor category as well. Likewise, 21.9% of the primary and secondary contributor pairs in double exposure cases belonged to the same major category, and 11.1% belonged to the same minor category. Because the secondary contributor in these cases can often be identified from clinical effects produced by the primary contributor, the accuracies are overly optimistic.

The last two columns in Table 6-5 show the accuracies of a system trained on a combination of single exposures and the secondary contributors for either multiple or double exposures. Adding the secondary contributors improves average accuracy by 9.1% for multiple exposure diagnosis and by 9.3% for double exposure diagnosis. Such a large jump suggests that, although the clinical effects are dominated by the primary contributor, secondary contributors do produce enough clinical effects that the system can be trained to recognize at least the most common multiple exposure combinations. Although some of the accuracy can be attributed to prior probabilities, the results give hope that further research might enable reasonably accurate identification of secondary contributors.

The final step needed to fully explore the impact of combining multiple exposure cases with single exposure cases was to train a system with the combined data and use it to diagnose only single exposure cases (Table 6-6). The first column shows the accuracy of a system trained on single exposures alone. The second and third columns display the accuracies for systems trained on single exposures along with the primary contributors for either multiple or double exposures. The last two columns contain the accuracies of systems trained on single exposures along with the secondary contributors for either multiple or double exposures. Interestingly, training with the primary contributors increased average accuracy from 74.6% to 74.9% when including multiple exposures and to 75.1% when including double exposures. Although small, this increase lends further support to the conclusion that the clinical effects in multiple exposure cases are dominated by the primary contributor. Conversely, training with the secondary contributors decreased average accuracy from 74.6% to 74.2% when including multiple exposures and to 74.4% when including double exposures. A lower accuracy is to be expected, since training on the secondary contributor associates clinical effects caused by the primary contributor with the secondary contributor instead. The small size of these changes can be partially explained by the multiple and double exposures that involve closely related substances from the same major and minor categories, as discussed above. Additionally, on average 33,855.3 single exposure cases were used to train the system on each cycle, so the added 8,901 multiple exposure cases or 5,149 double exposure cases account for only approximately 20.8% and 13.2% of the training cases, respectively.
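
The training-set proportions quoted above can be checked directly from the case counts. The per-cycle figure of 33,855.3 single exposure cases is exactly 90% of the 37,617 single exposure cases in the cleaned database. This is consistent with holding out about 10% of the single exposures for testing on each cycle.

```latex
\begin{aligned}
0.9 \times 37{,}617 &= 33{,}855.3 \\
\frac{8{,}901}{33{,}855.3 + 8{,}901} &= \frac{8{,}901}{42{,}756.3} \approx 20.8\% \\
\frac{5{,}149}{33{,}855.3 + 5{,}149} &= \frac{5{,}149}{39{,}004.3} \approx 13.2\%
\end{aligned}
```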

Table 6-6. Comparison of system accuracies when diagnosing single exposure cases³
                                        Single       Singles &    Singles &    Singles &     Singles &
                                        exposures    Multiples    Doubles      Multiples     Doubles
Diagnosed by                Severity    alone        (primary)    (primary)    (secondary)   (secondary)
Substance                   Minor       68.3%        68.2%        68.4%        68.1%         68.2%
                            Moderate    77.5%        78.2%        78.0%        77.4%         77.4%
                            Major       80.7%        81.4%        81.4%        80.6%         80.8%
Major & Minor categories    Minor       69.0%        68.9%        69.0%        68.6%         68.8%
                            Moderate    77.6%        77.7%        78.0%        77.2%         77.5%
                            Major       79.8%        80.6%        81.0%        80.6%         80.3%
Major category              Minor       68.8%        68.4%        68.9%        67.6%         67.9%
                            Moderate    73.9%        74.3%        74.3%        73.4%         73.3%
                            Major       76.2%        75.9%        76.8%        74.7%         75.0%
Average                                 74.6%        74.9%        75.1%        74.2%         74.4%

Conclusions

This dissertation presents research performed to create a prototype knowledge-based

system for diagnosing toxic exposures. A major goal of the research is to bypass the knowledge

acquisition bottleneck of traditional knowledge-based systems by using data mining to

automatically generate the system. Because system generation assumes no knowledge about the

field of toxicology, lower accuracy percentages are to be expected; however, future research can

build off this foundation and intelligently modify substance groupings to improve performance.

Another important aspect of the system is the use of adjusted likelihood ratios. Likelihood ratios

are mathematical calculations that are commonly known and used throughout the medical field.

In this research, traditional likelihood ratios are adjusted by adding a fractional possibility to

every potential outcome. The result is a robust equation that mitigates multiply-by-zero and

divide-by-zero errors while rapidly converging to the same value as non-adjusted likelihood

ratios. Ultimately, the system is intended to serve as a diagnostic consultant by providing

3 To enable maximum comparability, minor restrictions were instated to ensure that all test runs within the same
diagnosis level contained exactly the same number of trained substances on every test cycle. Explicitly, the average
number of possible diagnoses for each testing cycle was 414.4 when diagnosing by substance, 125.6 when
diagnosing by major and minor categories, and 58.1 when diagnosing by major category alone.









differential diagnoses for toxic exposure cases based on observed clinical effects. The system

enables physicians to tap into the knowledge stored in poison control center databases, giving

decision support information in a simple, understandable format.
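
This section does not restate the exact adjustment, so the Python sketch below only illustrates the general idea under a stated assumption: a small pseudo-count `k` (0.5 here, a hypothetical value) is added to each cell of a 2x2 table of substance versus clinical effect before computing the positive likelihood ratio, LR+ = sensitivity / (1 - specificity). The function name and the value of `k` are illustrative, not the dissertation's own.

```python
def likelihood_ratio(a: float, b: float, c: float, d: float, k: float = 0.0) -> float:
    """Positive likelihood ratio of one clinical effect for one substance.

    a: cases of the substance that show the effect
    b: cases of other substances that show the effect
    c: cases of the substance that lack the effect
    d: cases of other substances that lack the effect
    k: hypothetical pseudo-count added to every cell (k = 0 gives the unadjusted ratio)
    """
    a, b, c, d = a + k, b + k, c + k, d + k
    sensitivity = a / (a + c)            # P(effect | substance)
    false_positive_rate = b / (b + d)    # P(effect | other substance) = 1 - specificity
    return sensitivity / false_positive_rate


# Unadjusted, an effect never recorded for the substance (a = 0) yields 0,
# which would zero out any product of ratios (the multiply-by-zero problem).
print(likelihood_ratio(0, 5, 20, 475))

# Unadjusted, an effect never recorded for other substances (b = 0) divides by zero.
try:
    likelihood_ratio(12, 0, 8, 480)
except ZeroDivisionError:
    print("unadjusted ratio undefined")

# Adjusted, both cases give finite, positive values ...
print(likelihood_ratio(0, 5, 20, 475, k=0.5), likelihood_ratio(12, 0, 8, 480, k=0.5))
# ... and with large counts the adjustment barely changes the result.
print(likelihood_ratio(1200, 300, 800, 48000), likelihood_ratio(1200, 300, 800, 48000, k=0.5))
```

Because `k` is fixed while the counts grow, its influence shrinks with more data. This matches the convergence behavior described above, though the dissertation's actual fraction and table layout may differ.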

Chapter 5 presented the development of the system and its subsequent testing on single exposure cases. The research explored the effects of two filters for refining diagnoses: one based on a minimum number of exposure cases and one based on a minimum number of clinical effects. System accuracy reached as high as 79.8% and rose above 80% when test cases were required to involve more than one clinical effect. Furthermore, the user interface and system operation received a positive response from two toxicologists, and the diagnostic process was found to be simple and fast enough to make implementation on personal digital assistants (PDAs) feasible.
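
As a rough illustration of the two filters, the Python sketch below drops candidate diagnoses with fewer than a minimum number of training cases and test cases with fewer than a minimum number of clinical effects. The `ExposureCase` record, the function names, the substance labels, and the thresholds are all hypothetical; the dissertation's own data structures are not shown in this section.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureCase:
    substance: str               # recorded diagnosis (hypothetical field)
    effects: frozenset[str]      # observed clinical effects (hypothetical field)


def eligible_substances(training: list[ExposureCase], min_cases: int) -> set[str]:
    """Keep only substances backed by at least `min_cases` training cases."""
    counts = Counter(case.substance for case in training)
    return {substance for substance, n in counts.items() if n >= min_cases}


def eligible_test_cases(cases: list[ExposureCase], min_effects: int) -> list[ExposureCase]:
    """Keep only test cases that report at least `min_effects` clinical effects."""
    return [case for case in cases if len(case.effects) >= min_effects]


training_cases = [ExposureCase("ibuprofen", frozenset({"vomiting"}))] * 30 + [
    ExposureCase("spider bite", frozenset({"pain", "erythema"}))] * 3
test_cases = [ExposureCase("ibuprofen", frozenset({"vomiting"})),
              ExposureCase("ibuprofen", frozenset({"vomiting", "drowsiness"}))]

print(eligible_substances(training_cases, min_cases=25))        # {'ibuprofen'}
print(len(eligible_test_cases(test_cases, min_effects=2)))      # 1
```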

Chapter 6 continued the research by applying the system approach to multiple exposure cases. Although initial tests yielded poor performance, further examination determined that the low accuracy was primarily due to a lack of multiple exposure training cases. Further testing revealed that the clinical effects observed in multiple exposures tend to be dominated by a single substance, called the primary contributor. Systems generated from a combined training set of single exposures and the primary contributors from multiple exposure cases achieved accuracies as high as 86.9% when diagnosing primary contributors. More specifically, in 86.9% of the cases the correct diagnosis ranked within the top 13 of 129 possible major and minor category combinations.
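
The figure "86.9% of the cases in the top 13 of 129" is a top-k hit rate; 13 is roughly 10% of the 129 candidates (13/129 ≈ 0.101). The minimal Python sketch below computes this measure, assuming each test case yields a ranked list of candidate diagnoses. The function name and the toy data are illustrative only.

```python
def top_k_accuracy(ranked_lists: list[list[str]], truths: list[str], k: int) -> float:
    """Fraction of cases whose true diagnosis appears among the first k candidates."""
    if len(ranked_lists) != len(truths):
        raise ValueError("one ranked list is needed per case")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_lists, truths))
    return hits / len(truths)


# Toy ranked differentials for three cases whose primary contributor is an opioid.
ranked = [["opioid", "benzodiazepine", "ethanol"],
          ["ethanol", "opioid", "benzodiazepine"],
          ["benzodiazepine", "ethanol", "opioid"]]
truth = ["opioid", "opioid", "opioid"]

print(top_k_accuracy(ranked, truth, k=1))  # hit in 1 of 3 cases
print(top_k_accuracy(ranked, truth, k=2))  # hit in 2 of 3 cases
```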

The research performed on this system offers a number of contributions to both the field of knowledge-based systems and the field of medicine. First, because it is automatically generated, the system bypasses the knowledge acquisition bottleneck of traditional knowledge-based systems. Second, the system implements an approach to the unsolved problem of diagnosing multiple exposures. Although insufficient data hindered the diagnosis of more than one substance at a time, the system demonstrates effective diagnostic capabilities in identifying the primary contributor in multiple exposure cases. Being able to diagnose the disorder causing the most detrimental clinical effects is valuable in itself: once the primary contributor is treated, it becomes easier to identify the other contributors in a multiple exposure case. Furthermore, it is hoped that collecting more data will improve the accuracy of diagnosing multiple exposures simultaneously. A third contribution is the application of intelligent systems to the field of toxicology. At the time of writing, no American diagnostic systems exist for the field of clinical toxicology. Although systems have been implemented in France, Bulgaria, and Russia, they use different methods and are not readily available to assist American physicians. Finally, the adjusted likelihood ratio serves as a method to bridge the gap between intelligent systems and the medical field. Too often, intelligent systems fail because they use methods that are unknown to, and distrusted by, the medical community. The adjusted likelihood ratio uses mathematics commonly accepted in medicine, with a slight modification that creates a robust calculation without losing the essence of the original equation.
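
In medicine, likelihood ratios are usually combined with a pre-test probability through the odds form of Bayes' rule: post-test odds = pre-test odds × LR. The Python sketch below ranks candidate diagnoses this way under the common simplifying assumption that clinical effects are conditionally independent, so their ratios multiply. This section does not state whether the dissertation ranks candidates exactly like this. The function names, substances, priors, and ratios are invented for illustration.

```python
def post_test_probability(pre_test: float, likelihood_ratios: list[float]) -> float:
    """Update a pre-test probability (0 < p < 1) with one ratio per observed effect."""
    odds = pre_test / (1.0 - pre_test)   # probability -> odds
    for lr in likelihood_ratios:
        odds *= lr                       # assumes conditionally independent effects
    return odds / (1.0 + odds)           # odds -> probability


def differential(priors: dict[str, float],
                 ratios: dict[str, dict[str, float]],
                 observed: list[str]) -> list[tuple[str, float]]:
    """Rank candidate diagnoses by post-test probability, most likely first.

    Effects with no stored ratio for a candidate are treated as neutral (LR = 1).
    """
    scores = {
        dx: post_test_probability(p, [ratios.get(dx, {}).get(e, 1.0) for e in observed])
        for dx, p in priors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy numbers for illustration only.
priors = {"opioid": 0.05, "acetaminophen": 0.10}
ratios = {"opioid": {"drowsiness": 4.0, "miosis": 10.0},
          "acetaminophen": {"vomiting": 2.0}}
for dx, p in differential(priors, ratios, ["drowsiness", "miosis"]):
    print(f"{dx}: {p:.2f}")
```

In this toy case, two strongly associated effects lift a candidate with a lower pre-test probability above one with a higher pre-test probability. This is the kind of reordering a differential diagnosis is meant to surface.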

Future Work

The system presented in this dissertation is a prototype. Although the results show great promise, much remains to be done before a final system can be deployed in the real world. Recently, poison control centers (PCCs) around the United States have converted their databases from the Toxic Exposure Surveillance System (TESS) standard to a new system known as the National Poison Data System (NPDS). To enable long-term growth and development of the knowledge-based consultant, the system must also be converted to the NPDS standard. Additionally, more data must be acquired by petitioning the Florida Poison Information Centers (FPICs) in Tampa and Miami and by submitting a proposal to the national repository. With more data in hand, the system can be thoroughly tested for diagnosing secondary contributors, both individually and in combination with primary contributors.

From the outset, a major objective of the research was to bypass the knowledge acquisition bottleneck by generating a knowledge-based system capable of producing meaningful and useful results without the need for an active, overseeing expert. This design principle inherently prevented the designer from making any changes that required even a basic knowledge of toxicology. Now that the prototype is complete, several changes can be made to improve the system. First, useless substance diagnoses, such as the "unknown drug" diagnosis, should be removed. Second, redundant substances, such as "aspirin: pediatric formulation," "aspirin: unknown if adult or pediatric formulation," and "aspirin: adult formulation," should be consolidated into a single diagnosis. Third, the category divisions should be examined by a toxicologist to create groupings based primarily on clinical effects. For example, most opioids tend to exhibit similar clinical effects, whereas the effects associated with spider bites vary greatly depending on the species of spider. Intelligently restructuring diagnosis groupings could greatly increase the accuracy and utility of the knowledge-based consultant.
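
The first two refinements could start as a simple relabeling pass over the training data. The Python sketch below makes two hypothetical assumptions: formulation variants share a name prefix before a colon, as in the aspirin example, and useless labels can be listed explicitly. A real mapping would need a toxicologist's review, since not every colon-separated label is necessarily a redundant formulation.

```python
from typing import Optional

# Hypothetical list of labels that carry no diagnostic value.
USELESS_LABELS = {"unknown drug"}


def consolidate(label: str) -> Optional[str]:
    """Map a raw substance label to a consolidated diagnosis, or None to drop it."""
    normalized = label.strip().lower()
    if normalized in USELESS_LABELS:
        return None
    # Assumption: "<substance>: <formulation>" variants collapse to "<substance>".
    return normalized.split(":", 1)[0].strip()


raw = ["aspirin: pediatric formulation",
       "aspirin: unknown if adult or pediatric formulation",
       "aspirin: adult formulation",
       "unknown drug"]
print({consolidate(label) for label in raw} - {None})  # {'aspirin'}
```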

After refining the system, the next step is to field-test it in a PCC, and further improvements can then be implemented based on the results. One possible concern is that, although the system may perform well on toxic exposure cases as a whole, it may be more beneficial for the system to specialize in more difficult and deadly problems. In other words, it may be better to sacrifice accuracy on simple, routine exposures to increase the accuracy of the system on exposures that are dangerous and difficult to diagnose.

Once the system is fully tested, it can be expanded in various directions. The general system approach could be applied to other domains, particularly those in the medical field. The consultant could be implemented as a program on a PDA that physicians can carry with them at all times. Beyond simply diagnosing disorders, the system could be expanded by adding recommended treatments for each type of exposure; once a physician makes a diagnosis, the program could serve as a reference for the treatment of the patient. Finally, the system could be converted into a program for knowledge discovery within toxicology. When training on cases in the database, the system identifies relationships between specific exposures and their clinical effects. While many of these relationships are already known, it is possible that the system is also discovering relationships that were previously undocumented. This is particularly true when characterizing multiple exposure cases, many of which have little documentation. Examining the relationships within a trained system could lead to new discoveries in the field of toxicology.⁴
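
One way to start examining a trained system, sketched below in Python, is to list for each substance the clinical effects with the highest likelihood ratios. The sketch keeps only pairs seen in enough cases to be credible. The data layout (a mapping from substance to per-effect ratio and case count), the function name, the thresholds, and the toy numbers are all hypothetical. Any candidate relationship surfaced this way would still need review by a toxicologist before being treated as a discovery.

```python
def strongest_associations(table: dict[str, dict[str, tuple[float, int]]],
                           min_lr: float, min_cases: int) -> list[tuple[str, str, float]]:
    """Return (substance, effect, LR) triples above both thresholds, strongest first.

    table[substance][effect] = (likelihood ratio, number of cases showing the effect)
    """
    found = [(substance, effect, lr)
             for substance, effects in table.items()
             for effect, (lr, n) in effects.items()
             if lr >= min_lr and n >= min_cases]
    return sorted(found, key=lambda triple: triple[2], reverse=True)


# Toy values for illustration only; "substance X + substance Y" stands in for
# a multiple exposure combination.
trained = {
    "opioid": {"miosis": (12.0, 240), "rash": (1.1, 15)},
    "substance X + substance Y": {"seizure": (6.5, 40), "fever": (9.0, 3)},
}
for substance, effect, lr in strongest_associations(trained, min_lr=5.0, min_cases=10):
    print(f"{substance} -> {effect}: LR {lr:.1f}")
```

The case-count threshold matters most for multiple exposures: a high ratio backed by only a handful of cases (the "fever" entry above) is more likely noise than a new relationship.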


























⁴ For examples of systems created to discover unknown relationships within a field, refer to Breault et al. (2002) and Brossette et al. (1998).









LIST OF REFERENCES


Abidi, S. & Manickam, S. (2002). Leveraging XML-based electronic medical records to extract
experiential clinical knowledge: An automated approach to generate cases for medical
case-based reasoning systems. International Journal of Medical Informatics, 68, 187-203.

Althoff, K., Bergmann, R., Wess, S., Manago, M., Auriol, E., Larichev, O., Bolotov, A.,
Zhuravlev, Y. & Gurov, S. (1998). Case-based reasoning for medical decision support
tasks: The Inreca approach. Artificial Intelligence in Medicine, 12, 25-41.

Atzmueller, M., Baumeister, J. & Puppe, F. (2003a). Evaluation of two strategies for case-based
diagnosis handling multiple faults. In M. Nick & K. Althoff, Eds., Proceedings of the 2nd
German Workshop on Experience Management, GWEM 2003, Luzern, Switzerland.
http://CEUR-WS.org/Vol67/, February 2006. CEUR Workshop Proceedings.

Atzmueller, M., Baumeister, J. & Puppe, F. (2003b). Inductive Learning of Simple Diagnostic
Scores. In P. Perner, R. Brause & H. Holzhutter, Eds., Medical Data Analysis: 4th
International Symposium, ISMDA 2003, Berlin, Germany, pp. 23-30. Berlin: Springer-Verlag.

Atzmueller, M., Baumeister, J. & Puppe, F. (2004a). Quality measures for semi-automatic
learning of simple diagnostic rule bases. In D. Seipel, M. Hanus, U. Geske & O.
Bartenstein, Eds., 15th International Conference on Applications of Declarative
Programming and Knowledge Management, INAP 2004 and 18th Workshop on Logic
Programming, WLP 2004, Potsdam, Germany, pp. 65-78. Berlin: Springer-Verlag.

Atzmueller, M., Baumeister, J., Puppe, F., Shi, W. & Barnden, J. (2004b). Case-based
approaches for diagnosing multiple disorders. In V. Barr & Z. Markov, Eds., Proceedings
of the 17th International Florida Artificial Intelligence Research Society Conference,
FLAIRS 2004, Miami Beach, Florida, pp. 154-159. Menlo Park, California: AAAI Press.

Au, W. & Chan, K. (2003). Mining fuzzy association rules in a bank-account database. IEEE
Transactions on Fuzzy Systems, 11, 238-248.

Baumeister, J., Seipel, D. & Puppe, F. (2001). Incremental development of diagnostic
set-covering models with therapy effects. In G. Kern-Isberner, T. Lukasiewicz & E. Weydert,
Eds., Proceedings of the KI-2001: Workshop on Uncertainty in Artificial Intelligence,
Vienna, Austria.

Baumeister, J., Atzmueller, M. & Puppe, F. (2002). Inductive Learning for Case-Based
Diagnosis with Multiple Faults. In S. Craw & A. Preece, Eds., Advances in Case-Based
Reasoning. Proceedings of the 6th European Conference on Advances in Case-Based
Reasoning, Aberdeen, Scotland, pp. 28-42. Berlin: Springer-Verlag.

Ben-Bassat, M., Carlson, R., Puri, V., Davenport, M., Schriver, J., Latif, M., Smith, R., Portigal,
L., Lipnick, E. & Weil, M. (1980). Pattern-based interactive diagnosis of multiple
disorders: The MEDAS system. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2, 148-160.









Ben-Bassat, M., Campell, D., MacNeil, A. & Weil, M. (1983). Evaluating multimembership
classifiers: A methodology and application to the MEDAS diagnostic system. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 5, 225-229.

Bradley, P., Fayyad, U. & Reina, C. (1998). Scaling clustering algorithms to large databases. In
R. Agrawal, P. Stolorz & G. Piatetsky, Eds., Proceedings of the 4th International
Conference on Knowledge Discovery and Data Mining, KDD-98, New York, pp. 9-15.
Menlo Park, California: AAAI Press.

Breault, J., Goodall, C. & Fos, P. (2002). Data mining a diabetic warehouse. Artificial
Intelligence in Medicine, 26, 37-54.

Brossette, S., Sprague, A., Hardin, J., Waites, K., Jones, W. & Moser, S. (1998). Association
rules and data mining in hospital infection control and public health surveillance. Journal
of the American Medical Informatics Association, 5, 373-381.

Buchanan, B. & Shortliffe, E. (1984a). Rule-Based Expert Systems. Reading, Massachusetts:
Addison-Wesley Publishing Company.

Buchanan, B. & Shortliffe, E. (1984b). Uncertainty and evidential support. In B. Buchanan & E.
Shortliffe, Eds., Rule-Based Expert Systems: The MYCIN Experiments of the Stanford
Heuristic Programming Project, pp. 209-232. Reading, Massachusetts: Addison-Wesley
Publishing Company.

Cios, K. & Moore, G. (2002). Uniqueness of medical data mining. Artificial Intelligence in
Medicine, 26, 1-24.

Darmoni, S., Massari, P., Droy, J., Mahe, N., Blanc, T., Moirot, E. & Leroy, J. (1994). SETH:
An expert system for the management on acute drug poisoning in adults. Computer
Methods and Programs in Biomedicine, 43, 171-176.

Darmoni, S., Massari, P., Droy, J., Blanc, T. & Leroy, J. (1995). Functional evaluation of Seth:
An expert system in clinical toxicology. In P. Barahona, M. Stefanelli & J. Wyatt, Eds.,
Artificial Intelligence in Medicine: 5th Conference on Artificial Intelligence in Medicine
Europe, AIME '95 Proceedings, Pavia, Italy, pp. 231-238. Berlin: Springer-Verlag.

Delgado, M., Sanchez, D., Martin-Bautista, M. & Vila, M. (2000). Mining association rules with
improved semantics in medical databases. Artificial Intelligence in Medicine, 21, 241-245.

Dempster, A. (1967). Upper and lower probabilities induced by a multi-valued mapping. Annals
of Mathematical Statistics, 38, 325-339.

Duda, R., Hart, P. & Stork, D. (2001). Pattern Classification, 2nd Ed. New York: John Wiley &
Sons, Inc.









Florida Poison Information Center Network. (2005). FPIN statewide annual reports, calendar
year (Jan-Dec) 2004: General call summary report.
http://fpicn.jax.ufl.edu/Data/Reports/Callsstate_2004.pdf, December 2005. Florida Poison
Information Center Jacksonville.

Gonzalez, A. & Dankel, D. (1993). The Engineering of Knowledge-Based Systems: Theory and
Practice. Englewood Cliffs, New Jersey: Prentice Hall.

Graefe, G., Fayyad, U. & Chaudhuri, S. (1998). On the efficient gathering of sufficient statistics
for classification from large SQL databases. In R. Agrawal, P. Stolorz & G. Piatetsky,
Eds., Proceedings of the 4th International Conference on Knowledge Discovery and Data
Mining, KDD-98, New York, pp. 204-208. Menlo Park, California: AAAI Press.

Han, J. & Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco: Morgan
Kaufmann Publishers.

Holland, J. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor, Michigan:
University of Michigan Press.

Holsheimer, M., Kersten, M., Mannila, H. & Toivonen, H. (1995). A perspective on databases
and data mining. In U. Fayyad & R. Uthurusamy, Eds., Proceedings of the 1st International
Conference on Knowledge Discovery and Data Mining, KDD-95, Montreal, Canada, pp.
150-155. Menlo Park, California: AAAI Press.

Kluza, A. (2004). Veterinary toxicology information system. TASK Quarterly, 2, 297-301.

Kolodner, J. (1993). Case-Based Reasoning. San Mateo, California: Morgan Kaufmann
Publishers.

Kononenko, I., Bratko, I. & Kukar, M. (1998). Application of machine learning to medical
diagnosis. In R. Michalski, I. Bratko & M. Kubat, Eds., Machine Learning and Data
Mining: Methods and Applications, pp. 389-428. New York: John Wiley & Sons, Inc.

Koton, P. (1988). Reasoning about evidence in causal explanations. In Proceedings of the 7th
National Conference on Artificial Intelligence, AAAI-88, St. Paul, Minnesota, pp. 256-261.
Los Altos, California: Morgan Kaufmann Publishers.

Kusiak, A., Kern, J., Kernstine, K. & Tseng, B. (2000). Autonomous decision-making: A data
mining approach. IEEE Transactions on Information Technology in Biomedicine, 4, 274-
284.

Kusiak, A., Law, I. & Dick, M. (2001). The G-algorithm for extraction of robust decision rules:
Children's postoperative intra-atrial arrhythmia case study. IEEE Transactions on
Information Technology in Biomedicine, 5, 225-235.

Lavrac, N. (1999). Selected techniques for data mining in medicine. Artificial Intelligence in
Medicine, 16, 3-23.









Liu, Z. & Yan, F. (1997). Fuzzy neural network in case-based diagnostic system. IEEE
Transactions on Fuzzy Systems, 5, 209-222.

Medical University of South Carolina. (2000). Sensitivity and Specificity.
http://www.musc.edu/dc/icrebm/sensitivity.html, February 2006. Medical University of
South Carolina.

Miller, R., Pople, H. & Myers, J. (1982). INTERNIST-I, an experimental computer-based
diagnostic consultant for general internal medicine. The New England Journal of Medicine,
307, 468-476.

Monov, A., Iordanova, I., Zagorchev, P., Vassilev, V., Nissimov, M., Kojuharov, R., Tconev, R.
& Damianov, V. (1992). MEDICOTOX CONSILIUM An expert system in clinical
toxicology. In K. Lun, P. Degoulet, T. Piemme & O. Rienhoff, Eds., Proceedings of the 7th
World Congress on Medical Informatics, MEDINFO 92, Geneva Palexpo, Switzerland, pp.
610-614. Amsterdam: Elsevier Science Publishers.

Nechyba, M. (2003). Introduction to feedforward neural networks.
http://www.mil.ufl.edu/courses/eel5840/classes/introneuralnetworks.pdf, February 2006.
Machine Intelligence Laboratory, University of Florida.

Nilsson, M. & Sollenborn, M. (2004). Advancements and trends in medical case-based
reasoning: An overview of systems and system development. In V. Barr & Z. Markov,
Eds., Proceedings of the 17th International Florida Artificial Intelligence Research Society
Conference, FLAIRS 2004, Miami Beach, Florida, pp. 178-183. Menlo Park, California:
AAAI Press.

Nilsson, N. (1998). Artificial Intelligence: A New Synthesis. San Francisco, California: Morgan
Kaufmann Publishers.

Ohmann, C., Franke, C. & Yang, Q. (1999). Clinical benefit of a diagnostic score for
appendicitis: Results of a prospective interventional study. Archives of Surgery, 134, 993-
996.

Onisko, A., Druzdzel, M. & Wasyluk, H. (2000). Extension of the HEPAR II model to multiple-
disorder diagnosis. In M. Klopotek, M. Michalewicz & S. Wierzchon, Eds., Intelligent
Information Systems, pp. 303-313. Heidelberg: Physica-Verlag.

Onisko, A., Druzdzel, M. & Wasyluk, H. (2001). Learning Bayesian network parameters from
small sets: Applications of Noisy-OR gates. International Journal of Approximate
Reasoning, 27, 165-182.

Owens, D. & Sox, H. (2001). Medical decision-making: Probabilistic medical reasoning. In E.
Shortliffe, L. Perreault, G. Wiederhold & L. Fagan, Eds., Medical Informatics: Computer
Applications in Health Care and Biomedicine, pp. 76-131. New York: Springer-Verlag.

Pawlak, Z. (1982). Rough sets. International Journal of Computer & Information Sciences, 11,
341-356.









Peng, Y. & Reggia, J. (1986). Plausibility of diagnostic hypotheses: The nature of simplicity. In
Proceedings of the 5th National Conference on Artificial Intelligence, AAAI-86,
Philadelphia, Pennsylvania, pp. 140-145. Los Altos, California: Morgan Kaufmann
Publishers.

Peng, Y. & Reggia, J. (1987). A probabilistic causal model for diagnostic problem solving Part
I: Integrating symbolic causal inference with numeric probabilistic inference. IEEE
Transactions on Systems, Man, and Cybernetics, 2, 146-162.

Peng, Y. & Reggia, J. (1989). A comfort measure for diagnostic problem solving. Information
Sciences, 47, 149-184.

Pople, H. (1977). The formation of composite hypotheses in diagnostic problem solving: An
exercise in synthetic reasoning. In Proceedings of the 5th International Joint Conference on
Artificial Intelligence, IJCAI-77, Cambridge, Massachusetts, pp. 1030-1037. Pittsburgh,
Pennsylvania: Carnegie-Mellon University.

Pople, H. (1985a). CADUCEUS: An experimental expert system for medical diagnosis. In P.
Winston & K. Prendergast, Eds., The AI Business: The Commercial Uses of Artificial
Intelligence, pp. 67-80. Cambridge, Massachusetts: The MIT Press.

Pople, H. (1985b). Evolution of an Expert System: From Internist to Caduceus. In I. de Lotto &
M. Stefanelli, Eds., Proceedings of the International Conference on Artificial Intelligence
in Medicine, Pavia, Italy, pp. 179-208. Amsterdam: Elsevier Science Publishers.

Portinale, L. & Torasso, P. (1995). ADAPtER: An integrated diagnostic system combining case-
based and abductive reasoning. In M. Veloso & A. Aamodt, Eds., Proceedings of the 1st
International Conference on Case-Based Reasoning Research and Development, ICCBR-
95, Sesimbra, Portugal, pp. 277-288. Berlin: Springer-Verlag.

Quinlan, J. R. (1996). Bagging, boosting, and C4.5. In Proceedings of the 13th National
Conference on Artificial Intelligence, AAAI-96, Portland, Oregon, pp. 725-730. Menlo
Park, California: AAAI Press.

Reggia, J., Nau, D. & Wang, P. (1983). Diagnostic expert systems based on a set covering
model. International Journal of Man-Machine Studies, 19, 437-460.

Rumelhart, D. & McClelland, J. (1986). Parallel Distributed Processing: Explorations in the
Microstructure of Cognition. Cambridge, MA: MIT Press.

Tsumoto, S. (2000). Automated discovery of positive and negative knowledge in clinical
databases. IEEE Engineering in Medicine and Biology, 19, 56-62.

Valdes-Perez, R. (1999). Principles of human-computer collaboration for knowledge discovery
in science. Artificial Intelligence, 107, 335-346.

van der Gaag, L. & Wessels, M. (1994). Efficient multiple-disorder diagnosis by strategic
focusing. Technical report. UU-CS-1994-23.









Vinterbo, S. & Ohno-Machado, L. (2000). A genetic algorithm approach to multi-disorder
diagnosis. Artificial Intelligence in Medicine, 18, 117-132.

Wang, L. (2003). The WM Method completed: A flexible fuzzy system approach to data mining.
IEEE Transactions on Fuzzy Systems, 11, 768-782.

Watson, W., Litovitz, T., Rodgers, G., Klein-Schwartz, W., Reid, N., Youniss, J., Flanagan, A. &
Wruk, K. (2004). 2004 annual report of the American Association of Poison Control
Centers Toxic Exposure Surveillance System. The American Journal of Emergency
Medicine, 23, 589-666.

Wu, T. (1990). Efficient diagnosis of multiple disorders based on a symptom clustering
approach. In Proceedings of the 8th National Conference on Artificial Intelligence, AAAI-
90, Boston, Massachusetts, pp. 357-364. Menlo Park, California: AAAI Press.

Wu, T. (1991). A problem decomposition method for efficient diagnosis and interpretation of
multiple disorders. Computer Methods and Programs in Biomedicine, 35, 239-250.

Yu, V., Fagan, L., Bennett, S., Clancey, W., Scott, A., Hannigan, J., Buchanan, B. & Cohn, S.
(1984). An Evaluation of MYCIN's advice. In B. Buchanan & E. Shortliffe, Eds., Rule-
Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming
Project, pp. 589-596. Reading, Massachusetts: Addison-Wesley Publishing Company.

Zadeh, L. (1965). Fuzzy sets. Information and Control, 8, 338-353.

Zhou, Z. (2003). Three perspectives of data mining. Artificial Intelligence, 143, 139-146.









BIOGRAPHICAL SKETCH

Joel Daniel Schipper was born in 1979 to W. Thomas and Harriet Anne Schipper. He

grew up in the suburbs of Los Angeles with his two older brothers, Tom and James. Although an

excellent student, he much preferred spending his time on the athletic field to studying. Joel

attended Loyola Marymount University as a Presidential Scholar and graduated summa cum

laude with a Bachelor of Science in Electrical Engineering. He continued his studies as an

Alumni Fellow at the University of Florida where he received a Master of Science in Electrical

Engineering. During his time at the University of Florida, he met and married Alice Eileen

Brown. He is currently pursuing his doctorate by writing this dissertation, though he would

much rather be outside playing.

Upon completion of his doctoral degree, Joel will join the faculty of Bradley University as

an Assistant Professor of Electrical and Computer Engineering.





+ AMDG +





A KNOWLEDGE-BASED TOXICOLOGY CONSULTANT FOR DIAGNOSING MULTIPLE DISORDERS

By

JOEL DANIEL SCHIPPER

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2008

© 2008 Joel Daniel Schipper

For Alice, my beautiful wife. May we grow ever nearer as the years go by.

ACKNOWLEDGMENTS

I am indebted and grateful to Dr. Jay L. Schauben, Dausear Dar McRae, and the Florida Poison Information Center in Jacksonville for their willingness to provide data, technical support, and consultation. Without them my research would not have been possible. I thank Dr. A. Antonio Arroyo and Dr. Douglas D. Dankel II for their guidance and encouragement throughout my doctoral studies. I also thank my parents and brothers, Tom and James, for their support and prayers throughout my academic career. I am grateful to Alice, my beautiful wife, for standing by me in all I do with constant, unwavering love. Above all, I give glory to God, my creator, without whom I am nothing.

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS  4
LIST OF TABLES  7
LIST OF FIGURES  9
LIST OF ABBREVIATIONS  10
ABSTRACT  11

CHAPTER

1 INTRODUCTION  13
   Applicability of Knowledge-Based Systems to Toxicology  13
   System Overview  15
   Database Resources  17
   Conclusion  18

2 OVERVIEW OF KNOWLEDGE-BASED SYSTEMS AND DATA MINING  19
   Knowledge-Based Systems  19
   Utility and Structure  19
   Reasoning from Examples  21
   Data Mining and Knowledge Discovery in Databases  22
   Defining Data Mining and Knowledge Discovery  22
   Seven Steps of Data Mining  24
   Mining Data: What and How?  25
   Conclusion  26

3 DESIGN APPROACHES TO KNOWLEDGE-BASED SYSTEMS  27
   Rule-Based Systems  27
   Forward Chaining  30
   Backward Chaining  32
   Inference Networks  34
   Decision Trees  35
   Certainty Factors  37
   Case-Based Reasoning  40
   Nearest Neighbor Approaches  42
   Bayes Rule  45
   Other Approaches to Knowledge-Based Systems  47
   Fuzzy Logic  47
   Dempster-Shafer  49
   Rough Sets  49
   Genetic Algorithms  50
   Artificial Neural Networks  51
   Modern Approaches for Diagnosing Multiple Disorders  53
   Bayesian Belief Networks  54
   Set Covering  56
   Conclusion  58

4 MEDICAL MATHEMATICS AND RELEVANT KNOWLEDGE-BASED SYSTEMS  60
   Medical Mathematics  60
   Probabilistic Measurements  60
   Diagnostic Scores  66
   Literature Review of Knowledge-Based Systems  67
   Historical Medical Expert Systems  67
   Expert Systems in Toxicology  69
   Knowledge-Based Systems for the Diagnosis of Multiple Disorders  71
   Conclusion  76

5 DIAGNOSING SINGLE EXPOSURE CASES  77
   Source Data  77
   System Design Principles  78
   System Development  81
   System Operation and User Interface  86
   System Testing and Results  89
   System Performance  99
   Conclusion  100

6 DIAGNOSING MULTIPLE EXPOSURE CASES  101
   Motivation for Diagnosing Multiple Exposures  101
   System Approach  102
   Diagnosing Multiple Exposures using Solely Multiple Exposure Cases  103
   Diagnosing Multiple Exposures with Single Exposure Cases  108
   Conclusions  113
   Future Work  115

LIST OF REFERENCES  118
BIOGRAPHICAL SKETCH  124

LIST OF TABLES

2-1 Seven steps of data mining
2-2 Types of patterns that can be mined
3-1 Treatments required for each of Doug's cats
3-2 Cat characteristics for identification
3-3 System parameters for cat identification
3-4 Houses most similar to Alice's house
3-5 Characteristics of various sports balls
4-1 Contingency table
4-2 Experimental HIV testing extended contingency table
4-3 HIV testing (1% chance of HIV) extended contingency table
4-4 HIV testing (0.1% chance of HIV) extended contingency table
4-5 Diagnostic scores for acute appendicitis
4-6 Final diagnosis score significance
5-1 Accuracy by substance in 10% (MC = 25)
5-2 Accuracy by major and minor categories in 10% (MC = 25)
5-3 Accuracy by major category in 10% (MC = 25)
5-4 Accuracy by substance in 10% (MCE = 10)
5-5 Accuracy by major and minor categories in 10% (MCE = 10)
5-6 Accuracy by major category in 10% (MCE = 10)
5-7 Accuracy by substance in 10 (MCE = 10)
5-8 Accuracy by major and minor categories in 10 (MCE = 10)
5-9 Accuracy by major category in 10 (MCE = 10)
5-10 Accuracy in 10% with MC = 10 and MCE = 0
5-11 Accuracy in 10% with MC = 10, MCE = 0, and 3+ CEs
6-1 Accuracy (varying MC) of system trained & tested on multiple exposures
6-2 Accuracy (varying ) of system trained & tested on multiple exposures
6-3 Accuracy comparison of various systems for multiple exposure diagnosis
6-4 Accuracy diagnosing primary contributors using single exposures
6-5 Accuracy diagnosing secondary contributors using single exposures
6-6 Comparison of system accuracies when diagnosing single exposure cases

LIST OF FIGURES

2-1 Expert system block diagram
3-1 Rule-based system block diagram
3-2 Rules for identifying Doug's cats
3-3 Inference network for Doug's cats based on rules in Figure 3-2
3-4 Exhaustive inference network for Doug's cats
3-5 Decision tree for identifying Doug's cats
3-6 Vectors for sports balls plotted in 2-dimensional solution space
3-7 Fuzzy logic graph for human heights
3-8 Typical artificial neural network with two hidden layers
3-9 Example Bayesian belief network
3-10 Set covering graph of relationships between disorders and symptoms
5-1 User interface
5-2 Results table
5-3 Accuracy ratios by substance
5-4 Accuracy ratios by major and minor categories
5-5 Accuracy ratios by major category

LIST OF ABBREVIATIONS

       A small positive parameter in the adjusted likelihood ratio equation
AAPCC  American Association of Poison Control Centers
CE     Clinical Effect
CF     Certainty Factor
FAR    False-Alarm Rate or False-Positive Rate
FN     False Negative
FNR    False-Negative Rate
FP     False Positive
FPIC   Florida Poison Information Center
FPR    False-Positive Rate or False-Alarm Rate
LR     Likelihood Ratio
MC     Minimum Exposure Cases
MCE    Minimum CE Occurrences
NPDS   National Poison Data System
NPV    Negative Predictive Value
PCC    Poison Control Center
PDA    Personal Digital Assistant
PPV    Positive Predictive Value
TESS   Toxic Exposure Surveillance System
TN     True Negative
TNR    True-Negative Rate or Specificity
TP     True Positive
TPR    True-Positive Rate or Sensitivity

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A KNOWLEDGE-BASED TOXICOLOGY CONSULTANT FOR DIAGNOSING MULTIPLE DISORDERS

By

Joel Daniel Schipper

May 2008

Chair: A. Antonio Arroyo
Cochair: Douglas D. Dankel II
Major: Electrical and Computer Engineering

Every year, toxic exposures kill approximately twelve hundred Americans. More than half of these deaths result from exposures to multiple substances. In addition to being dangerous, multiple exposures are particularly difficult to diagnose. At this time, no general solution exists for the diagnosis of multiple disorders because of the non-linear interactions observed in such cases.

This dissertation presents the development of a prototype knowledge-based system for diagnosing toxic exposures. The goal of the system is to generate differential diagnoses for unknown exposure cases based on the clinical effects observed in patients. The system is not meant to replace physicians but, rather, to serve as a medical decision support system. Acting as a consultant, the system provides access to case-based summary data that is normally unavailable.

The system is automatically generated by applying data mining techniques to a database supplied by the Florida Poison Information Center. For diagnosis, the system uses pre-test probabilities and likelihood ratios, calculations commonly used throughout the medical profession. To overcome certain shortcomings of likelihood ratios, the equation employed by the system is adjusted to account for every possible outcome. Using the adjusted likelihood ratio enables robust calculations while closely modeling the likelihood ratio that physicians know and trust.

Trained and tested on single exposures, the system achieved an accuracy of 81.0% on cases involving at least three clinical effects. Repeating the process using multiple exposures alone was unsuccessful, at least partially because of insufficient data. However, after training on various combinations of single, double, and/or multiple exposures, the system achieved an accuracy of 86.9% in diagnosing the primary contributors for multiple exposure cases. Although a solution for diagnosing multiple disorders remains elusive, the ability to identify primary contributors is a significant contribution to addressing the problem.

This system is the first American diagnostic system for the field of clinical toxicology, and its use of adjusted likelihood ratios serves as a method to bridge the gap between intelligent systems and the medical field. Furthermore, by automatically generating the system, this research addresses the knowledge acquisition bottleneck that plagues traditional expert systems.

CHAPTER 1
INTRODUCTION

Toxicology is the study of poisons and their effects on living organisms. One of the most prominent uses of toxicology for the benefit of mankind is the development of poison control centers. Thousands of people call poison control centers daily for free consultation and information regarding chemicals and drugs. In 2004, the American Association of Poison Control Centers (AAPCC) consisted of 62 poison control centers serving all 50 states and handling more than 2.4 million reported human poison exposure cases (Watson et al., 2004). The AAPCC has compiled a database containing the details of over 38.7 million human poison exposure cases from the calls received and documented by its 62 poison control centers (Watson et al., 2004).

A medical database of this magnitude represents a great opportunity for data mining and knowledge-based systems research. By tapping into the vast amount of data contained in the AAPCC database, a knowledge-based system could use the information to help diagnose and treat poisoned patients quickly and effectively.

Applicability of Knowledge-Based Systems to Toxicology

Knowledge-based systems should not be applied in every situation. In many cases, conventional algorithms offer a more appropriate and effective solution to the problem. However, the field of medicine inherently contains many traits that make it an ideal domain for knowledge-based systems. On a daily basis, physicians must make decisions based on experience using incomplete data, and knowledge-based systems likewise excel at solving problems from uncertain data using heuristics. Additionally, the field of medicine is continually changing as more knowledge is acquired and new technology becomes available, and one strength of knowledge-based systems is their adaptability in dynamic domains. Beyond the general obstacles common to all fields of medicine, the field of toxicology itself faces three specific challenges for which knowledge-based systems are well suited.

The first challenge is making pertinent information available to the physicians, emergency medical services, and members of the public involved at the time of a poisoning. Toxicology is a narrow specialization within the medical profession consisting of a small number of experts, called toxicologists. To make the expertise of toxicologists available to the medical field at large, the AAPCC offers physicians at hospitals around the country direct consultation with toxicologists. Physicians may call poison control centers for information on how to treat a drug overdose or how to identify an unknown drug that a patient has ingested. In spite of the efforts of the AAPCC, the limited number of toxicologists makes expertise in toxicology a scarce commodity. Knowledge-based systems offer a solution to this scarcity. A readily available system that can aid physicians in diagnosis when experts are unavailable could be an invaluable asset in saving lives.

A second challenge in toxicology is dealing with cases involving multiple substances. In many cases, consultations are a simple matter for the toxicologist, consisting mainly of matching signs and symptoms that are known to be directly associated with the mechanisms and behaviors of one class of drug. Cases that toxicologists find difficult tend to involve multiple unknown drugs interacting to produce signs and symptoms that cannot be matched with any single substance. If all substances interacted linearly, determining multiple unknown drugs from their signs and symptoms would amount to identifying the drug combinations that, when summed together, produce the observed results. Unfortunately, many drug interactions are non-linear. Some drug combinations cause a dramatic increase in symptom severity, some mask symptoms normally observed with one of the drugs, and some cause symptoms that normally would not appear with any of the drugs individually. In 2004, although only 8.6% of the exposures reported were multiple substance exposures, .6% of fatal cases involved two or more drugs or products (Watson et al., 2004, p. 593). Being able to address multiple exposures is therefore an important concern for saving lives.

A knowledge-based system can aid in addressing multiple exposures by making the relevant information in the AAPCC database available to the toxicologist. The goal of a knowledge-based system is not to replace the toxicologist, but to act as a powerful consulting tool that provides case-based summary data. Human beings have senses and intuition that are important for diagnosis and that computers cannot replicate. However, by offering speculative advice, the system may facilitate accurate and timely diagnoses.

A third toxicological challenge is ensuring the rapid diagnosis and treatment of exposures. When dealing with poisons and drug overdoses, time is of the essence. In 2004, 1183 people died of toxic exposures (Watson et al., 2004), many because they did not receive the correct treatment in a timely manner. Every minute spent waiting to speak with an expert or to consult a clinical manual could make the difference between life and death for a patient. A knowledge-based system is a rapid aid in diagnosing toxic exposures. Because the system is computerized, it offers physicians a directed search with a faster response time than written literature.

System Overview

The goal of this research is to create a general-purpose knowledge-based system that can automatically learn relationships in diagnostic domains. This particular application of the system uses the Florida Poison Information Center (FPIC) database as its foundation. By mining the FPIC database, the system extracts associations between the signs and symptoms observed in a patient and the final diagnosis. Automatically extracting these relationships enables system designers to bypass much of the knowledge acquisition bottleneck by removing the need to interview experts, a requirement of traditional knowledge-based system design. Furthermore, by applying a generalized process, knowledge engineers need not acquire a comprehensive understanding of every domain for which they create a knowledge-based system.

Applied to toxicology, the system uses the simple, standard medical mathematics of pre-test probabilities and likelihood ratios to calculate and communicate the relationships discovered in the database. Currently, the system is a proof-of-concept prototype, and only primary contributors with a significant number of exposure occurrences are included. As the system grows to include more substances and substance combinations, however, its simple mathematical representation will become essential for scalability. Additionally, medical mathematics is not only more understandable to users in the medical field, but also communicates information that is more relevant for medical diagnosis than other traditional measurements, such as accuracy (Cios & Moore, 2002; Lavrac, 1999).

In spite of the simplicity necessarily inherent in the system, the system seeks to diagnose complex cases involving multiple unknown substances. In the past, very little research has been performed in the area of diagnosing multiple disorders, and this system seeks to further the fields of machine intelligence and knowledge engineering by offering a simple and practical approach to the problem. Fundamentally, the system treats multiple disorders in much the same way as single disorders. Each multiple exposure is treated as a case separate from the individual substances involved, and identical operations are performed on each multiple exposure case to create associations.
At this time, the data available is insufficient to fully test the diagnosis of multiple exposures; however, the system demonstrates significant potential for accurately identifying the primary contributors in multiple exposure cases.
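
The mining-and-ranking idea described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the toy records are invented, `min_cases` is a hypothetical stand-in for the MC threshold, the likelihood ratios of several clinical effects are multiplied together as if the effects were conditionally independent, and a small `eps` floor (also hypothetical) keeps each ratio finite. The adjusted likelihood ratio developed later in the dissertation is not reproduced here. Each record is keyed by a `frozenset` of substances, so a multiple exposure becomes its own diagnostic class, separate from its individual substances.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (substances involved, clinical effects observed).
# Keying each record by a frozenset makes a multiple exposure such as
# {"diphenhydramine", "ethanol"} its own diagnostic class.
CASES = [
    (frozenset({"acetaminophen"}), {"nausea", "vomiting"}),
    (frozenset({"acetaminophen"}), {"nausea"}),
    (frozenset({"acetaminophen"}), {"vomiting", "abdominal pain"}),
    (frozenset({"diphenhydramine"}), {"drowsiness", "tachycardia"}),
    (frozenset({"diphenhydramine"}), {"tachycardia", "agitation"}),
    (frozenset({"diphenhydramine", "ethanol"}), {"drowsiness", "vomiting"}),
    (frozenset({"diphenhydramine", "ethanol"}), {"drowsiness", "tachycardia"}),
    (frozenset({"ibuprofen"}), {"abdominal pain"}),
]


def mine(cases, min_cases=2):
    """Count each class and how often each clinical effect occurs with it.

    Classes seen in fewer than `min_cases` records (a hypothetical stand-in
    for the MC threshold) are not offered as diagnoses, but their records
    still count toward the "all other cases" side of each ratio.
    """
    class_counts = Counter(substances for substances, _ in cases)
    effect_counts = defaultdict(Counter)
    total_effects = Counter()
    for substances, effects in cases:
        effect_counts[substances].update(effects)
        total_effects.update(effects)
    kept = {cls: n for cls, n in class_counts.items() if n >= min_cases}
    return kept, effect_counts, total_effects, len(cases)


def likelihood_ratio(cls, effect, model, eps=0.01):
    """Positive LR of one effect: P(effect | cls) / P(effect | not cls)."""
    class_counts, effect_counts, total_effects, n = model
    n_cls = class_counts[cls]
    p_with = effect_counts[cls][effect] / n_cls
    p_without = (total_effects[effect] - effect_counts[cls][effect]) / (n - n_cls)
    # Hypothetical floor so the ratio stays finite and nonzero; this is NOT
    # the dissertation's adjusted likelihood ratio.
    return max(p_with, eps) / max(p_without, eps)


def differential(observed, model, top=3):
    """Rank classes by post-test probability given the observed effects."""
    class_counts, _, _, n = model
    ranking = []
    for cls, n_cls in class_counts.items():
        pre_test = n_cls / n                    # pre-test probability (prevalence)
        odds = pre_test / (1 - pre_test)        # convert probability to odds
        for effect in observed:                 # assumes conditional independence
            odds *= likelihood_ratio(cls, effect, model)
        post_test = odds / (1 + odds)           # convert odds back to probability
        ranking.append((" + ".join(sorted(cls)), post_test))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking[:top]


model = mine(CASES, min_cases=2)
for label, post_test in differential({"drowsiness", "vomiting"}, model):
    print(f"{label}: {post_test:.3f}")
# diphenhydramine + ethanol: 0.750
# acetaminophen: 0.032
# diphenhydramine: 0.010
```

On this toy data the diphenhydramine-and-ethanol combination ranks first for a patient with drowsiness and vomiting, ibuprofen is never offered because it falls below `min_cases`, and each post-test probability is computed one-versus-rest, so the values need not sum to one.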

Ultimately, the system's goal is to serve as a consultant to all physicians who may encounter toxic exposure cases. For a toxicologist, the system may serve as an idea generator by offering plausible drug combinations that the toxicologist may have failed to consider. For other physicians, the system may act as a solution finder or simply be used to confirm an uncertain diagnosis. As the system develops and expands to encompass the entire FPIC database, it may begin to discover relationships previously undocumented in the field of toxicology. Further development may lead to real-time monitoring of cases as they are entered into the database, so that the system can signal a warning for epidemics or perceived threats, such as substances associated with terrorism.

Database Resources

The Florida Poison Information Center (FPIC) consists of three of the poison control centers in the AAPCC. Since 1996, the FPIC has compiled a database logging every call it receives. When a caller goes to the hospital, the FPIC makes a follow-up call to gather all the medical information available on the case. In 2004 alone, the FPIC received over 120 thousand calls and made more than 43 thousand follow-up calls related to human exposures (Florida Poison Information Center Network, 2005). The FPIC database also contains over 65 thousand records of multiple exposure cases.

Entries in the database are regulated by AAPCC Toxic Exposure Surveillance System (TESS) standards, which ensure the collection of a specific set of information about each case. Because they follow TESS requirements, the majority of entries in the database have discrete values that are easy to process with a computer program. Furthermore, the national standardization by the AAPCC increases the portability of a system designed for the FPIC to other poison control centers around the country.

For this research, the FPIC has generously granted access to all relevant information recorded in its database from 2002 to 2006. Initially, concerns were expressed regarding the accuracy of the data. In some cases, patients may lie about the drugs they took. In others, nurses may relay either inaccurate or incomplete information to the FPIC. Although these errors affect system accuracy, the system's performance indicates that, in general, the discrepancies can be treated as random errors whose contributions will become negligible as the database grows.

Another problem is that two drugs taken together in varying proportions can yield different symptoms. The observed symptoms might vary depending on the amount of interaction occurring between the two drugs or on which drug is affecting the body more strongly at the time. However, the findings of the system indicate that most multiple exposures are dominated by the signs and symptoms associated with a primary contributor. As a result, the system is capable of diagnosing the primary contributor, which is a significant contribution to addressing the problem of multiple disorder diagnosis.

Conclusion

This chapter began by introducing the field of toxicology and the American Association of Poison Control Centers. It then explained the applicability of knowledge-based systems to the medical field, particularly the field of toxicology. Finally, it presented a broad overview of the system followed by a discussion of the database used in the development of the system.

Chapter 2 presents a general overview of knowledge-based systems and data mining, while Chapter 3 elaborates on Chapter 2 by describing in greater detail the traditional approaches used in designing knowledge-based systems. Chapter 4 discusses medical mathematics and gives a literature review of relevant systems that have been created. Chapter 5 presents the system design in detail along with the results for diagnosing single exposure cases. Finally, Chapter 6 presents the results for diagnosing multiple exposure cases followed by some concluding remarks.

CHAPTER 2
OVERVIEW OF KNOWLEDGE-BASED SYSTEMS AND DATA MINING

In the past two decades, the availability of information has skyrocketed. Continual advances in computer technology have made the collection and storage of massive amounts of data a reality, while the advent of the Internet has enabled the data to be shared and accessed by many users throughout the world. Today, the volume of data generated and stored is so enormous that it has become impossible for the human mind to locate and process most of the available information. Furthermore, as we learn more about the complexity of our world, researchers and practitioners alike are forced to specialize to the point where only a few people are truly knowledgeable in any particular field. If humans are to continue in the quest to understand and subdue the world, it has become imperative that we create systems and algorithms capable of filtering out useless data while identifying, processing, and applying relevant information.

Knowledge-Based Systems

Utility and Structure

Knowledge-based systems, also known as expert systems, are computerized systems that use information to provide relevant advice and problem solutions within a specific domain. Knowledge-based systems enable expert knowledge to be accessed 24 hours a day, even when an expert is unavailable. They also provide a means to preserve information that otherwise might be lost when an expert retires.

Figure 2-1 shows the basic structure of an expert system, consisting of an inference engine, a knowledge base, and a fact base. The inference engine is a program that manipulates the knowledge base and fact base using a general problem-solving technique. The knowledge base is the fixed set of information or data that is necessary to solve problems within a particular domain. The fact base contains problem-specific data, such as user inputs and information derived from the knowledge base by the inference engine (Gonzalez & Dankel, 1993).

Figure 2-1. Expert system block diagram (a user works through a user interface with an expert system comprising an inference engine, a knowledge base, and a fact base)

Unlike conventional algorithms that embed domain knowledge within the program, inference engines are problem independent. Such independence provides versatility, enabling the inference engine to be applied to any number of domains simply by changing the knowledge base. The same diagnostic inference engine could be applied effectively to the medical field as well as to automobile repair or to troubleshooting a manufacturing process. The beauty of this independence is that it allows the programmer to focus on the domain knowledge, often expressed as facts and rules, without having to debug faulty algorithmic code.
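
The separation between a domain-independent inference engine and a swappable knowledge base can be illustrated with a minimal sketch. The version below assumes forward chaining over if-then rules as the general problem-solving technique (one common choice, picked here only for illustration); the function name `forward_chain`, the rule format, and both toy knowledge bases are hypothetical, and the toxicology rules are not clinical guidance.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A rule pairs a set of conditions with the conclusion it licenses.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(knowledge_base: Iterable[Rule], facts: Iterable[str]) -> Set[str]:
    """Domain-independent inference engine (forward chaining).

    Fires every rule whose conditions are all present in the fact base,
    adds its conclusion, and repeats until a full pass derives nothing new.
    """
    rules: List[Rule] = list(knowledge_base)
    fact_base: Set[str] = set(facts)          # problem-specific data
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in fact_base and conditions <= fact_base:
                fact_base.add(conclusion)
                changed = True
    return fact_base


# Two hypothetical knowledge bases; only the rules differ, not the engine.
# The toxicology rules are a toy illustration, not clinical guidance.
TOY_TOX_KB: List[Rule] = [
    (frozenset({"opioid toxidrome suspected"}), "flag for toxicologist review"),
    (frozenset({"miosis", "respiratory depression", "decreased consciousness"}),
     "opioid toxidrome suspected"),
]
CAR_KB: List[Rule] = [
    (frozenset({"engine cranks", "engine does not start"}), "suspect fuel or ignition"),
    (frozenset({"suspect fuel or ignition", "fuel gauge reads empty"}), "check fuel supply"),
]

print(sorted(forward_chain(TOY_TOX_KB, {"miosis", "respiratory depression", "decreased consciousness"})))
print(sorted(forward_chain(CAR_KB, {"engine cranks", "engine does not start", "fuel gauge reads empty"})))
```

Only the knowledge base changes between the two calls; the engine itself is untouched. The first toxicology rule is listed before the rule that produces its condition, so the engine needs a second pass to fire it, which the loop handles without any domain-specific code.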

Reasoning from Examples

For a knowledge-based system to produce accurate results, it must obtain its conclusions via some logical process. In logic, there are three fundamental ways of reaching a conclusion: deductive reasoning, inductive reasoning, and abductive reasoning.

Deductive reasoning is reasoning from the general to the specific. For example:

Premise: All oceans have waves.
Premise: The Pacific is an ocean.
Conclusion: Therefore, the Pacific has waves.

Deductive reasoning is a sound form of argument, meaning that if its premises are true, its conclusion is guaranteed to be true as well.

Inductive reasoning is inferring from specific statements to more general ones. For example:

Premise: The Pacific is an ocean.
Premise: The Pacific has waves.
Conclusion: Therefore, oceans have waves.

Inductive reasoning is an unsound form of reasoning, meaning that even if the premises are true, the conclusion is not guaranteed to be true.

Abductive reasoning is drawing a hypothesis based on observed characteristics. For example:

Fact: Oceans have waves.
Observation: This body of water has waves.
Hypothesis: This body of water is an ocean.

Like inductive reasoning, abductive reasoning is unsound. In fact, abductive reasoning can be viewed as a form of inductive reasoning because it reasons from specific observations to generalized hypotheses that are plausible but not guaranteed (Gonzalez & Dankel, 1993).

An ideal knowledge-based system should offer the correct solution to every problem within its domain. To guarantee the validity of every solution, the system would have to contain all first principles within its domain and employ a sound reasoning technique, such as deductive reasoning. Although some systems attempt to reason from first principles, in general, programming a system in such a manner is not practical or even possible. Many fields are not understood well enough to compile a list of foundational rules and, even if they were, compiling and programming such rules would prove an extremely arduous task for any domain of significance.

Because we cannot create an ideal knowledge-based system, many systems take the practical approach of reasoning from examples. Rather than being programmed directly from first principles, the system is given examples from which it generates its own governing principles. These principles can be expressed as rules, statistics, case matching, or another representative form; Chapter 3 discusses many of these approaches. As in inductive reasoning, the system makes inferences from the specific, i.e., an example, to the general, i.e., a governing principle.

At first, such an approach seems troublesome because the system's reasoning is unsound and, therefore, inherently can make mistakes. It is important to note, however, that many of the scientific principles we accept as fact were discovered in the same manner: by observing that many examples, or experiments, all followed the same law of nature. Additionally, Kononenko et al. (1998) have shown that, in many domains, systems that automatically generate their own diagnostic rules are capable of performing with a higher degree of accuracy than physicians when given identical information. Furthermore, knowledge-based systems that generate their own governing principles for problem solving take less time to create because the programmer need not spend time determining the governing principles by hand. Instead, the system itself can determine its own rules by processing a database of examples.

Data Mining and Knowledge Discovery in Databases

Defining Data Mining and Knowledge Discovery

In recent years, the development of mass storage devices has enabled the creation of extremely large and complex databases.
The amount of available data has increased so greatly that it would be impossibly tedious for the human mind to process all of the information. As a result, the rising demand for systems capable of meeting this new need has given birth to the field of data mining. Data mining is the process of extracting information from a database (Han & Kamber, 2001). Data mining is also referred to as knowledge discovery in databases, where "discovery is the generation of novel, interesting, plausible, and intelligible knowledge about the objects of study" (Valdes-Perez, 1999, p. 336).

The knowledge discovered through data mining comes in two different forms: novel information and established information. Novel information is previously undiscovered knowledge that constitutes a new concept within a domain. To demonstrate how a computer system can uncover valuable information, let us consider the medical field. Most facts within the medical field were established by researchers who performed studies and processed data numerically to determine relationships between various observations. In these studies, it is the numerical and statistical values that give credence to the study. If a physician claims that smoking increases one's chance of lung cancer based on a general trend the physician has observed in his patients, the physician will be asked for the numbers to support his statement. Only when the physician performs a study with enough cases to produce statistically significant values supporting his statement will his observation be taken seriously. Since computers perform numerical analysis exceedingly well, it makes sense to create data mining systems that automatically search for and output the statistically significant relationships they encounter. Researchers can then examine and determine the validity of the discoveries made by the computer system.

The second form of knowledge that data mining systems can discover is established information.
Established information is information that is already known and available within a domain. At first, rediscovering such information may seem like a confirmation of knowledge at best or a redundant reiteration at worst. However, the ability to automatically discover established principles through data processing is in fact a powerful tool in the field of knowledge-based systems. Traditionally, knowledge engineers create knowledge-based systems incrementally through an interview process with experts in the domain of interest. In each interview, the knowledge engineer tries to glean rules and heuristics from the expert so that these can be implemented in the system. Generating an expert system through this process takes years of man-hours and forces the knowledge engineer to train himself through immersion in the domain to create the system properly. It has always been desirable to shorten the creation time of these systems without compromising their accuracy. The solution to these problems lies in the power of data mining to automatically extract rules and knowledge from a database without requiring a human to accumulate and program these rules directly.

Seven Steps of Data Mining

According to Han and Kamber (2001), there are seven steps in data mining (Table 2-1). Strictly speaking, data mining is only one of the seven steps of knowledge discovery in databases. In practice, however, the data mining step is indispensable and usually requires the most computation and intelligence. As a result, the whole process of knowledge discovery in databases has commonly become known as data mining.

Table 2-1. Seven steps of data mining
Step name               Description
Data cleaning           Removing noisy, inconsistent data from the database
Data integration        Combining data from multiple sources
Data selection          Choosing the data relevant to the task
Data transformation     Changing the selected data into a usable format for data mining
Data mining             Extracting relationships and patterns from the data
Pattern evaluation      Determining if the knowledge discovered has value
Knowledge presentation  Presenting the results to the user via tables, graphs, charts, etc.
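
To make the flow of Table 2-1 concrete, the following minimal sketch chains the seven steps over two tiny, hypothetical data sources. It is an illustration under stated assumptions, not the system described in this dissertation: the field names, the records, the `min_support` threshold, and every function name are invented for the example, and each step is deliberately trivial so that the structure of the pipeline stays visible.

```python
from collections import Counter, defaultdict

# Two hypothetical data sources that share a case_id (e.g., initial calls and
# follow-up records). Field names are illustrative, not the FPIC/TESS schema.
CALLS = [
    {"case_id": 1, "substance": "Acetaminophen ", "age": 34},
    {"case_id": 2, "substance": "ibuprofen", "age": None},  # incomplete record
    {"case_id": 3, "substance": "acetaminophen", "age": 19},
    {"case_id": 4, "substance": "ibuprofen", "age": 52},
]
FOLLOW_UPS = [
    {"case_id": 1, "effects": "nausea;vomiting"},
    {"case_id": 2, "effects": "abdominal pain"},
    {"case_id": 3, "effects": "nausea"},
    {"case_id": 4, "effects": "abdominal pain;nausea"},
]


def clean(rows):
    """1. Data cleaning: drop incomplete rows and normalize substance names."""
    return [dict(r, substance=r["substance"].strip().lower())
            for r in rows if r["age"] is not None]


def integrate(calls, follow_ups):
    """2. Data integration: join the two sources on case_id."""
    effects = {f["case_id"]: f["effects"] for f in follow_ups}
    return [dict(r, effects=effects[r["case_id"]])
            for r in calls if r["case_id"] in effects]


def select(rows):
    """3. Data selection: keep only the fields relevant to the task."""
    return [{"substance": r["substance"], "effects": r["effects"]} for r in rows]


def transform(rows):
    """4. Data transformation: turn effect strings into sets."""
    return [(r["substance"], set(r["effects"].split(";"))) for r in rows]


def mine_patterns(cases):
    """5. Data mining: count how often each effect occurs with each substance."""
    counts = defaultdict(Counter)
    for substance, effects in cases:
        counts[substance].update(effects)
    return counts


def evaluate(counts, min_support=2):
    """6. Pattern evaluation: keep associations seen at least min_support times."""
    return {s: {e: n for e, n in c.items() if n >= min_support}
            for s, c in counts.items()}


def present(patterns):
    """7. Knowledge presentation: render the surviving patterns as text rows."""
    return "\n".join(f"{s:<15}{e:<15}{n}"
                     for s, found in sorted(patterns.items())
                     for e, n in sorted(found.items()))


patterns = evaluate(mine_patterns(transform(select(integrate(clean(CALLS), FOLLOW_UPS)))))
print(present(patterns))
```

Running it prints a single surviving association (acetaminophen with nausea, seen twice): the incomplete record is discarded during cleaning, and every other effect falls below the support threshold during pattern evaluation.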

Although Table 2-1 lists seven steps in data mining, these steps are not rigidly enforced in data mining system design. Depending on the form of the data, not every step in the table is required for every data mining problem. For example, data integration is not required if only one data source is used. Some steps may also be performed in a different order or folded into other steps; for example, data cleaning may be handled by a robust algorithm in the data mining step that eliminates noise. Although many of these steps may be indispensable for a particular data set, in general the data mining and pattern evaluation steps are the core components of a data mining system.

Mining Data: What and How?

Table 2-2 summarizes the six types of patterns that can be mined according to Han and Kamber (2001). The research presented here focuses primarily on classification in the medical field of toxicology. In general terms, classification takes a database of cases belonging to known classes and creates models that identify the classes of new cases from the information supplied about them. Specifically, given a database containing the signs and symptoms observed in patients paired with the diagnosis of the substances affecting each patient, a system can learn to identify different substances based on the associated signs and symptoms.

Many different methods can be implemented to perform data mining. Some overlap with the methods of knowledge-based systems, discussed in more detail in Chapter 3. A good summary of the most common methods can be found in Lavrac (1999) or Han and Kamber (2001). Zhou (2003) discusses three philosophical approaches to data mining, which focus primarily on the efficiency, effectiveness, or validity of the system design.

Table 2-2. Types of patterns that can be mined
Patterns | Description
Characterization & Discrimination | Summarizes different classes within the data so they can be compared and contrasted with other classes
Associational analysis | Searches for rules that reveal relationships between different classes in the data
Classification & Prediction | Identifies models where, given certain inputs, the system can output the most probable class or number associated with the inputs
Cluster analysis | Treats every parameter as a value and groups the most similar cases into clusters that will be treated as a single class
Outlier analysis | Identifies cases that are sufficiently deviant from all other cases so they can be examined further
Evolution analysis | Searches for tendencies of class parameters to change over time in a characteristic manner

Conclusion

This chapter has given a general overview of knowledge-based systems and data mining. The section on knowledge-based systems discussed the general structure and usefulness of knowledge-based systems, followed by the importance of reasoning from examples. The section on data mining presented the concepts of novel information and established information as well as the seven steps of data mining and the types of patterns that can be discovered. The next chapter discusses many of the different approaches to knowledge-based system design. Although these approaches are presented within the context of knowledge-based systems, many are also used in the field of data mining.

CHAPTER 3 DESIGN APPROACHES TO KNOWLEDGE-BASED SYSTEMS

Since the inception of knowledge-based systems in the 1970s, researchers have developed many varied approaches for their design and implementation. This chapter presents a brief overview of the most common design schemes, with an emphasis on those most similar to the system presented in Chapters 5 and 6. Although the schemes are presented within the context of knowledge-based systems, most are also used in the field of data mining. The chapter begins with the foundational topics of rule-based systems, case-based reasoning, nearest neighbor classification, and Bayes' rule, followed by a discussion of less central topics, including fuzzy logic, Dempster-Shafer theory, rough sets, genetic algorithms, and artificial neural networks. The chapter concludes by discussing the modern approaches most relevant to solving problems involving multiple disorders, namely Bayesian belief networks and set covering theory.

Rule-Based Systems

In designing knowledge-based systems, the use of rules is an obvious choice. Not only do humans naturally use rules when they reason and solve classification problems, but rules are inherently heuristic in nature, enabling them to handle uncertainty. As discussed in Chapter 2, rule-based systems consist of an inference engine, a knowledge base, and a fact base (Figure 3-1). The inference engine is the general problem-solving technique used by the system, such as the forward and backward chaining approaches discussed below. The knowledge base consists of a domain-specific list of if-then statements, known as rules, used to gather information and solve problems. The if portion of a rule is known as the premise and the then portion of the rule is known as the conclusion. The fact base is problem specific and consists of knowledge obtained from the user and sensors along with all knowledge derived from
implemented rules. Greater detail on rule-based systems can be found in Gonzalez and Dankel (1993).

[Figure 3-1. Rule-based system block diagram. Blocks: User, User Interface, Expert System, Inference Engine, Knowledge Base, Fact Base.]

To better understand rule-based systems, let us consider an example using Doug and his cats. Let us assume that Doug owns four cats named Princess, Panther, Ivan, and Jimmy, each requiring special care that it must receive daily. Doug wants to go on vacation, so he hires a pet-sitter and creates a list of the treatments for each cat (Table 3-1).

Table 3-1. Treatments required for each of Doug's cats
Cat's name | Treatment
Princess | Requires at least 30 minutes of petting per day
Panther | Given 50% more food
Ivan | Must not be allowed outside at all costs
Jimmy | Must receive antibiotics once a day

Doug soon becomes aware, however, that the pet-sitter does not know the names of the cats. To ensure that each cat receives the necessary treatment, he decides to create a rule-based system to help the pet-sitter identify the cats. He begins by writing down the distinguishing characteristics of each cat, including the cat's major color, fur length, and whether the cat's fur is a solid color (Table 3-2).

Table 3-2. Cat characteristics for identification
Cat's name | Major color | Solid color? | Fur length
Princess | Tan | No | Medium
Panther | Black | Yes | Short
Ivan | Tan | No | Short
Jimmy | Black | No | Short

Doug then defines the parameters used by his system as well as their allowable values (Table 3-3). He quickly realizes that defining fur length as short, medium, or long is a subjective measurement. To reduce uncertainty, he creates a new parameter called FurMeasurement that allows the pet-sitter to input a length of fur in inches. From this measurement, the fur length is determined.

Table 3-3. System parameters for cat identification
System parameter | Allowable values
MajorColor | black, tan
SolidColor | yes, no
FurMeasurement | Length of fur in inches
FurLength | short, medium, long
Cat | Princess, Panther, Ivan, Jimmy

Finally, Doug creates seven system rules that identify each cat based on the characteristics observed by the pet-sitter. As a whole, these rules are known as the knowledge base (Figure 3-2). The fact base contains any facts entered by the pet-sitter, such as stating that the unknown cat has FurMeasurement = 1. Additional facts derived from the rule set are also included in the fact base, such as rule R1 deriving that the cat must have FurLength = short if FurMeasurement = 1.

[Figure 3-2. Rules for identifying Doug's cats]

The following subsections discuss the basic inference engine algorithms used in rule-based systems. Throughout these sections, the example of Doug and his cats is referenced frequently.

Forward Chaining

Forward chaining is the process of reasoning from inputs to conclusions. The first step in a forward chaining system is to receive user and sensor inputs and store them in the fact base. Next, the system searches the rule set and identifies those rules whose premises are satisfied by the facts contained in the fact base. The process of identifying these rules is called pattern matching. If more than one rule is satisfied, the system identifies the rule with the highest priority and executes it, which is known as rule firing. The results obtained from the fired rule are added to the fact base. This process of pattern matching, prioritizing, and rule firing continues until a solution is reached or no solution can be reached. If no solution is attained, complex systems may request information from the user that might enable the system to reach a conclusion. Alternatively, the system might apply uncertainty management to offer the most fitting solutions based on the facts it has received.
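The match, prioritize, and fire cycle described above can be sketched in Python. Figure 3-2 itself is not reproduced in this text, so the encoding of rules R1 to R7 below is an assumption reconstructed from the worked examples and from Figure 3-3 (for instance, R1 is taken as FurMeasurement ≤ 1, R3 as FurMeasurement > 3, and R7 as MajorColor = black AND SolidColor = no); the names `RULES`, `matches`, and `forward_chain` are likewise hypothetical. Priority follows the convention used in the worked example that follows: among the satisfied rules, the highest-numbered one fires.

```python
# Hypothetical encoding of Doug's rules R1-R7, reconstructed from the text and
# Figure 3-3. Each rule is (name, [(parameter, test), ...], (parameter, value)).
RULES = [
    ("R1", [("FurMeasurement", lambda v: v <= 1)], ("FurLength", "short")),
    ("R2", [("FurMeasurement", lambda v: v > 1),
            ("FurMeasurement", lambda v: v < 2)], ("FurLength", "medium")),
    ("R3", [("FurMeasurement", lambda v: v > 3)], ("FurLength", "long")),
    ("R4", [("FurLength", lambda v: v == "medium")], ("Cat", "Princess")),
    ("R5", [("SolidColor", lambda v: v == "yes")], ("Cat", "Panther")),
    ("R6", [("MajorColor", lambda v: v == "tan"),
            ("FurLength", lambda v: v == "short")], ("Cat", "Ivan")),
    ("R7", [("MajorColor", lambda v: v == "black"),
            ("SolidColor", lambda v: v == "no")], ("Cat", "Jimmy")),
]


def matches(rule, facts):
    """Pattern matching: every premise must be present in, and satisfied by, the fact base."""
    _, premises, _ = rule
    return all(param in facts and test(facts[param]) for param, test in premises)


def forward_chain(inputs, rules=RULES, goal="Cat"):
    """Match, prioritize, and fire rules until the goal parameter has a value."""
    facts = dict(inputs)   # the fact base starts with the user's inputs
    fired = set()          # refraction: each rule fires at most once
    while goal not in facts:
        # Conflict set: satisfied rules that have not fired yet, in rule-number order.
        conflict_set = [r for r in rules if r[0] not in fired and matches(r, facts)]
        if not conflict_set:
            return None, facts                      # no solution can be reached
        name, _, (param, value) = conflict_set[-1]  # highest rule number wins
        facts[param] = value                        # rule firing adds the conclusion
        fired.add(name)
    return facts[goal], facts


cat, facts = forward_chain({"MajorColor": "tan", "FurMeasurement": 1.5, "SolidColor": "no"})
print(cat)  # Princess
```

The refraction set is a design choice of this sketch rather than part of the text: rule ordering alone avoids re-firing R2 in the worked example, but with an input such as FurMeasurement = 0.5 and no color information, R1 would remain the highest-numbered satisfied rule and fire forever.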

Using the example of Doug and his cats, the user might input that a cat has MajorColor = tan, FurMeasurement = 1.5, and SolidColor = no. The forward chaining system adds these facts to the fact base and then searches the premise of every rule for a match. It discovers a match on R2 and a partial match on R6. As the only satisfied rule, R2 fires, adding the fact FurLength = medium to the fact base. Once again, the system searches through the premises and finds matches on R2 and R4 as well as a partial match on R6. The system must now prioritize the rules. Since the rules closer to solutions are further down the list, the system gives priority to rules with higher rule numbers. Note that this also prevents the system from entering an infinite loop by firing R2 over and over again. R4 is selected as higher priority than R2, so R4 fires, yielding the result that Cat = Princess. Since the variable Cat is the solution variable, the system stops and informs the user that the cat being observed must be Princess.

In complex rule-based systems, forward chaining is extremely inefficient due to the exhaustive search performed during pattern matching. To alleviate this bottleneck, the Rete algorithm was developed. The Rete algorithm creates predetermined networks, known as the pattern network and the join network, to limit the amount of matching that must take place in every cycle of the pattern matching process. The Rete algorithm and the formation of pattern and join networks are discussed in detail by Gonzalez and Dankel (1993).

Forward chaining systems are used primarily for problems that involve a small number of inputs compared to the number of possible solutions. Synthesis problems, including design, configuration, planning, and scheduling problems, are good candidates for forward chaining applications. These types of problems are often open ended, where many solutions or configurations can satisfy all the given constraints.
Since the solutions cannot be known until
they are generated, it would be impossible to work from the problem conclusions to the inputs. There are, however, many problems with a finite number of solutions and, in these cases, it may be advantageous to begin with the conclusions and work towards the inputs.

Backward Chaining

Backward chaining is the process of reasoning from conclusions to inputs. Backward chaining systems assume an answer and then attempt to prove or disprove the truth of that assumption. To begin this process, the system selects a rule whose conclusion yields a solution. The system then attempts to satisfy the rule by obtaining values for the variables in its premises. For each premise, the system first checks the fact base for the value, then searches for a rule that can generate the necessary value to satisfy the premise, and finally asks the user when all else fails. If the fact base contains a value that contradicts the premise, the system disregards the solution as invalid and assumes a new solution by moving on to a different rule. When examining a rule, if the fact base contains a value matching one of the rule's premises, the system continues to assume that the rule is correct and attempts to prove the next premise until all the premises are satisfied. If no value is found in the fact base for a premise, but a rule is discovered that can derive its value, the system attempts to prove the premises of that rule through the same process. If, however, no rule capable of satisfying the premise can be found, the system asks the user as a last resort. The user can then enter a value, which the system adds to the fact base. If the value entered by the user corresponds to the necessary premise value, the system continues trying to prove the rule. If it contradicts the premise, the system moves to a new rule that can generate a solution. This process continues until the system has either found a solution or exhausted all rules capable of yielding a solution.
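The goal-driven search described above can be sketched as a recursive Python function. As with forward chaining, the encoding of R1 to R7 is an assumption reconstructed from the worked examples and Figure 3-3, and the names `RULES`, `ASKABLE`, `prove`, and `backward_chain` are hypothetical. A dictionary of canned answers stands in for the pet-sitter, and only the directly observable parameters may be asked for, because in the walk-through that follows the system never asks for a derived value such as FurLength.

```python
# Hypothetical reconstruction of R1-R7: (name, [(parameter, test), ...], (parameter, value)).
RULES = [
    ("R1", [("FurMeasurement", lambda v: v <= 1)], ("FurLength", "short")),
    ("R2", [("FurMeasurement", lambda v: v > 1),
            ("FurMeasurement", lambda v: v < 2)], ("FurLength", "medium")),
    ("R3", [("FurMeasurement", lambda v: v > 3)], ("FurLength", "long")),
    ("R4", [("FurLength", lambda v: v == "medium")], ("Cat", "Princess")),
    ("R5", [("SolidColor", lambda v: v == "yes")], ("Cat", "Panther")),
    ("R6", [("MajorColor", lambda v: v == "tan"),
            ("FurLength", lambda v: v == "short")], ("Cat", "Ivan")),
    ("R7", [("MajorColor", lambda v: v == "black"),
            ("SolidColor", lambda v: v == "no")], ("Cat", "Jimmy")),
]

# Parameters the pet-sitter can observe directly; derived ones are never asked.
ASKABLE = {"MajorColor", "SolidColor", "FurMeasurement"}


def backward_chain(answers, rules=RULES, goal="Cat"):
    """Try each rule that concludes the goal; `answers` simulates the user."""
    facts = {}   # the fact base
    asked = []   # parameters the user was asked for, in order

    def prove(param, test):
        # 1. Check the fact base first.
        if param in facts:
            return test(facts[param])
        # 2. Try rules whose conclusion would satisfy this premise.
        candidates = [r for r in rules if r[2][0] == param and test(r[2][1])]
        for _, premises, (concl_param, value) in candidates:
            if all(prove(p, t) for p, t in premises):
                facts[concl_param] = value
                return True
        # 3. Last resort: ask the user, only if no rule could supply the value.
        if not candidates and param in ASKABLE:
            asked.append(param)
            facts[param] = answers[param]
            return test(facts[param])
        return False

    for _, premises, (concl_param, value) in rules:
        if concl_param == goal and all(prove(p, t) for p, t in premises):
            facts[concl_param] = value
            return value, asked
    return None, asked


print(backward_chain({"FurMeasurement": 0.5, "SolidColor": "no", "MajorColor": "tan"}))
# ('Ivan', ['FurMeasurement', 'SolidColor', 'MajorColor'])
```

The order of the questions in the output (fur measurement while trying R4, solid color while trying R5, major color while trying R6) mirrors the sequence in the walk-through of Ivan that follows.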
The backward chaining process is much easier to understand with an example, so let us return to Doug and his cats. When the pet-sitter enters Doug's house, Ivan comes over to greet
him. To identify Ivan, the pet-sitter consults the system that Doug designed for him. The backward chaining system knows that when the variable Cat has a value, it has reached a solution. It begins by searching the list for a rule whose conclusion assigns a value to Cat. Rules R1, R2, and R3 do not have the variable Cat in the conclusion, so the system begins with R4. To prove that R4 is true, the system must satisfy the premise that FurLength = medium. It searches the fact base and finds nothing. Next, it searches for a rule whose conclusion can generate FurLength = medium and discovers that R2 satisfies this requirement. The system now attempts to prove R2 by looking at its first premise, FurMeasurement > 1. The system again checks the fact base and finds no matching values, so it searches for a rule that generates a corresponding value. Finding none, it asks the user to input the length of the cat's fur in inches. The user inputs 0.5, the length of Ivan's fur. This value is saved in the fact base, but since 0.5 is not greater than 1, R2 fails and the system returns to R4. The system discovers that there are no more rules that can satisfy the premise FurLength = medium, so it discards R4 as false and proceeds to R5. R5's premise requires SolidColor = yes. The system searches the fact base and finds no values corresponding to SolidColor. It then searches for rules that can generate the value SolidColor = yes. Again it finds none. As a last resort, the system asks the user if the cat is one solid color, and the user enters no. Since SolidColor now has a contradicting value, R5 fails and the system moves to R6. The first premise of R6 is MajorColor = tan. The system again checks the fact base and then searches for rules that can satisfy the premise, but finds none. It asks the user for the cat's fur color. The user enters tan. Since this satisfies the first premise, the system attempts to prove the second premise, FurLength = short.
The system finds no facts in the fact base corresponding to FurLength; however, it finds that R1 can generate the desired solution. The system then attempts to satisfy R1 by looking at its premise
FurMeasurement ≤ 1. It checks its fact base and discovers that the fact base contains the value FurMeasurement = 0.5. Since this value satisfies FurMeasurement ≤ 1, R1 fires and FurLength = short is added to the fact base. Both premises of R6 are now satisfied, and the system concludes that the cat is Ivan.

Backward chaining systems can only be used for problems that involve a finite number of conclusions. Diagnostic problems, where the inputs outnumber the solutions, are good candidates for backward chaining applications. Diagnostic systems can range from determining automobile malfunctions to identifying diseases in the medical field (Gonzalez & Dankel, 1993).

Inference Networks

Inference networks are some of the simplest rule-based systems and can only be used when the relationship between each rule is known in advance. Figure 3-3 shows an inference network that was directly translated from the rules in Figure 3-2. Note that an arc is drawn across the lines where they converge. The arc represents the AND operator and, although not included in this example, the absence of an arc implies an OR operator. Because the relationship between each rule is known ahead of time, inference networks only need to execute the rules directly connected to facts and rules that have been satisfied. This makes inference networks significantly more efficient than the exhaustive search used by the pattern matching systems discussed above. The drawback is that inference networks are often impractical or infeasible for complex systems with a large number of interacting rules.

Although Figure 3-3 is correct for the rules that Doug generated in the example above, the inference network does not contain all of the information available for each cat shown in Table 3-2. Should a second solid-colored cat be added to the knowledge base, the inference network as drawn would require modification. To allow for expansion of the knowledge base, it may be better to include all of the information available (Figure 3-4).
Unfortunately, this results in a loss of efficiency, since the system would require all three of Panther's characteristics to identify him.

[Figure 3-3. Inference network for Doug's cats based on rules in Figure 3-2. Nodes: Major Color (= Black, = Tan), Solid Color (= Yes, = No), Fur Measurement (> 3", > 1" & < 2", < 1"), Fur Length (= Long, = Medium, = Short), and the cats Jimmy, Ivan, Panther, and Princess.]

[Figure 3-4. Exhaustive inference network for Doug's cats, drawn over the same nodes as Figure 3-3.]

Decision Trees

A decision tree is a specialized form of inference network that arranges inputs in a hierarchical fashion. Rather than using every input available from the beginning, decision trees address only one input at a time. Once that input has been assigned a value, the next input
in the tree is addressed until enough information has been gathered to offer a solution. Figure 3-5 displays an example decision tree for Doug and his cats. As shown, the system first asks the user for the cat's major color. After receiving an input, it asks whether the cat is a solid color or not. Given a black cat, the system can offer a solution after two questions. Tan cats, however, require three questions to identify. Note that if the user informs the system that the cat is tan and solid colored, a null set is reached, causing the system to output an error message or backtrack in an attempt to find a solution. It is also important to understand that the decision point for solid color need not be located at the same depth on every branch of the tree.

[Figure 3-5. Decision tree for identifying Doug's cats. Nodes: Major Color (Black, Tan), Solid Color (Yes, No) on each color branch, and Fur Length (Short, Medium, Long); leaves: Panther, Jimmy, Ivan, Princess, and two null sets (Ø).]

The ordering of the decision tree in Figure 3-5 was assigned arbitrarily; however, there exist mathematical approaches based on information theory that seek to minimize the number of branches necessary to solve a problem. The most widely recognized approach is the ID3 algorithm, which was created by J. Ross Quinlan and is discussed in detail in
Gonzalez and Dankel (1993). More recently, C4.5, a descendant of ID3 also created by Quinlan, has become popular (Quinlan, 1996). Other variations of decision trees enable the system to handle uncertainty. One such approach, implemented by Althoff et al. (1998), adds an extra branch to encompass uncertainty. For example, the node requesting a major color would include not only the responses black and tan, but also a branch representing that the color could not be determined. Certainty factors, discussed in the following section, are another method for handling uncertainty in rule-based systems.

Certainty Factors

The use of certainty factors is one of the oldest and most established methods of handling uncertainty in rule-based systems. Certainty factors were originally created for use in MYCIN, an expert system for the treatment of infectious blood diseases (Buchanan & Shortliffe, 1984a). MYCIN was the first medical expert system and is discussed in more detail in Chapter 4. Certainty factor values range from -1 to 1, where -1 represents a statement being false, 1 represents a statement being true, and 0 represents complete uncertainty whether the statement is false or true. Each rule is assigned a certainty factor (CF) that represents confidence that a statement is true or false. The general form of a rule with a certainty factor is:

If E, Then H (CF),

where E is the premise containing the observed or derived facts and H is the conclusion that results from satisfying the premise. CF represents the confidence that the hypothesis H is correct, given the evidence E. Certainty factors can be assigned in a number of ways. Some may be assigned subjectively by asking an expert to assign a value of confidence to a rule based on past experience. Others may be determined by using probability to calculate a measure of belief and
disbelief, then mathematically combining the results to yield a certainty factor. Regardless of how they are determined, the math for combining and propagating certainty factors is as follows (Buchanan & Shortliffe, 1984b):

$$CF_{revised} = CF_{old} + CF_{new}\,(1 - CF_{old}) \quad \text{if } CF_{old} \ge 0 \text{ and } CF_{new} \ge 0, \tag{3-1}$$

$$CF_{revised} = CF_{old} + CF_{new}\,(1 + CF_{old}) \quad \text{if } CF_{old} \le 0 \text{ and } CF_{new} \le 0, \tag{3-2}$$

$$CF_{revised} = \frac{CF_{old} + CF_{new}}{1 - \min(|CF_{old}|, |CF_{new}|)} \quad \text{if exactly one of } CF_{old}, CF_{new} \text{ is negative.} \tag{3-3}$$

These equations assume that a rule's premises are known with absolute certainty. Unfortunately, such an assumption is often false because premises also have associated certainty factors. To handle situations where the evidence itself may be uncertain, the following rules of combination are used (Gonzalez & Dankel, 1993):

1. A rule with a single uncertain premise yields a CF that is the product of the conclusion's CF and the premise's CF.
2. A rule with a conjunction of uncertain premises yields a CF that is the product of the conclusion's CF and the minimum CF of all the premises.
3. A rule with a disjunction of uncertain premises yields a CF that is the product of the conclusion's CF and the maximum CF of all the premises.

To better understand the use of these equations and rules, let us consider an example from driver training. In driver training, students are told that they should scan ahead for dangerous driving situations. When they identify a possibly dangerous situation, they should predict what might occur. At this point, they should decide what to do and execute their planned course of action. Let us imagine that Steve is an android that has been created to function as a normal human being in society. One day, he is riding his motorcycle along a narrow side street where a car is parked on the left-hand side. In general, he would want to stay farther to the right of the street to minimize the danger of an unseen child running out from behind the car or a person in the car opening a door into his driving path.
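Equations 3-1 to 3-3 and the three rules of combination can be sketched as a few Python functions before following Steve's reasoning. The function names and the (rule CF, premise CFs, operator) tuple layout are illustrative assumptions; the check at the end uses the CF values from Steve's example below. Because Equation 3-3 is undefined when one CF is +1 and the other is -1, the sketch raises an error in that case rather than choosing a value.

```python
def combine(cf_old, cf_new):
    """Combine the running CF with a new CF (Equations 3-1 to 3-3)."""
    if cf_old >= 0 and cf_new >= 0:
        return cf_old + cf_new * (1 - cf_old)               # Eq. 3-1
    if cf_old <= 0 and cf_new <= 0:
        return cf_old + cf_new * (1 + cf_old)               # Eq. 3-2
    denominator = 1 - min(abs(cf_old), abs(cf_new))         # Eq. 3-3 (signs differ)
    if denominator == 0:
        raise ValueError("CFs of +1 and -1 cannot be combined")
    return (cf_old + cf_new) / denominator


def rule_cf(conclusion_cf, premise_cfs, operator="AND"):
    """Rules of combination: scale the conclusion's CF by the premise CF."""
    if operator == "AND":
        premise = min(premise_cfs)   # conjunction: minimum premise CF
    elif operator == "OR":
        premise = max(premise_cfs)   # disjunction: maximum premise CF
    else:
        raise ValueError(f"unknown operator: {operator}")
    return conclusion_cf * premise   # a single premise is the one-element case


def evaluate_rules(rules):
    """Start at zero and fold each rule's CF into the running total."""
    cf = 0.0
    for conclusion_cf, premise_cfs, operator in rules:
        cf = combine(cf, rule_cf(conclusion_cf, premise_cfs, operator))
    return cf


steve = [
    (0.6, [0.9], "AND"),         # Rule 1: child is playing with a ball
    (-0.3, [0.9, 0.5], "AND"),   # Rule 2: parent present AND not attentive
    (0.7, [0.7], "AND"),         # Rule 3: child has not noticed the vehicle
]
print(round(evaluate_rules(steve), 2))  # 0.72
```

Without the intermediate rounding to 0.46 used in the hand calculation, the final value is exactly 0.724, which still rounds to 0.72.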
As Steve nears the car, however, he notices a young
boy on the right-hand side of the street. He must now decide whether the parked car or the child is more likely to introduce a dangerous driving situation. Surveying the situation, Steve notices that the child's mother is present, but appears preoccupied with her gardening. The child is playing with a ball and seems completely oblivious to the approach of the motorcycle.

Steve's computer brain begins processing this information by assigning certainty factors to his observations. He is confident that the child is playing with a ball and that the mother is present, so he assigns these premises a CF = 0.9. He is only half certain that the mother is inattentive, so he assigns this a CF = 0.5. He is fairly certain that the child has not noticed the vehicle, so he assigns that a CF = 0.7. Steve recalls the rules of thumb that he learned from similar driving experiences:

Rule 1: If the child is playing with a ball, then the child will run out into the street. (CF = 0.6)
Rule 2: If the child's parent is present AND not attentive, then the child will run out into the street. (CF = -0.3)
Rule 3: If the child has not noticed the vehicle, then the child will run out into the street. (CF = 0.7)

Steve initializes his CF to zero and starts at Rule 1. Based on the first rule of combination, he combines the certainty factors of the premise (CF = 0.9) and the conclusion (CF = 0.6) by multiplying them to obtain CF = 0.54. He then applies Equation 3-1 to calculate his new certainty factor:

$$CF_{revised} = 0 + 0.54\,(1 - 0) = 0.54 \tag{3-4}$$

Next, Steve moves to Rule 2. Using the second rule of combination, he selects the lower of the two CFs for the premises (CF = 0.5) and multiplies it by the CF for the conclusion (CF = -0.3) to obtain CF = -0.15. He then combines the current CF of 0.54 with the new CF using Equation 3-3:
$$CF_{revised} = \frac{0.54 + (-0.15)}{1 - \min(|0.54|, |-0.15|)} = \frac{0.39}{0.85} \approx 0.46 \tag{3-5}$$

Finally, Steve addresses Rule 3. Using the first rule of combination, he multiplies the certainty factors of the premise (CF = 0.7) and the conclusion (CF = 0.7) to get CF = 0.49. Using Equation 3-1, he combines the current CF of 0.46 with the new CF to obtain his answer:

$$CF_{final} = 0.46 + 0.49\,(1 - 0.46) \approx 0.72 \tag{3-6}$$

So, Steve has a certainty factor of 0.72 that the child will run out into the street in front of him. Since this is a fairly high value, Steve chooses to move away from the right-hand side of the street where the child is and closer to the car on the left-hand side.

Case-Based Reasoning

Case-based reasoning is the method of using documented examples and their solutions to solve problems. Unlike traditional methods where the system designer must generate a cohesive set of rules that yield the correct answer, case-based reasoning systems generally maintain a database of examples, or cases, that are used to solve a problem. The basic structure of a case-based reasoning system consists of a library or database of historical cases, a way to retrieve similar cases from the library, and a way to modify the solutions if the retrieved case is not identical to the problem. When a problem is introduced to a case-based reasoning system, the system searches its database for cases with similar attributes. When a sufficient number of similar cases are discovered, their solutions are combined and modified to better match the problem being solved before the final solution is presented to the user. Greater detail on case-based reasoning can be found in Kolodner (1993).

To better understand case-based reasoning, let us examine an example adapted from Gonzalez and Dankel (1993). Alice is considering selling her home in Wonderland; however,
she is unsure how much her house is worth. Searching the Internet, Alice discovers an online case-based reasoning system designed to calculate the current market value of houses based on region. The system accomplishes this by keeping a record of all the recent home sales by region as well as basic information on each house, including the square footage, number of bedrooms, number of bathrooms, and whether the house has a pool. Alice tells the system that her house has 1500 square feet, 3 bedrooms, 3 bathrooms, and a pool. The system proceeds to look up all the property sales in Wonderland with comparable size and characteristics to Alice's house. At the end of the search, the five most similar houses are selected (Table 3-4).

The system selects the first house as most similar to Alice's house, giving it a starting value of $150,000. The system now seeks to adapt the house value to include the value of Alice's pool. The primary difference between the second and fourth houses is that the second house has a pool and the fourth house does not. The system takes the difference between the values of the two houses and determines that a pool is worth approximately $45,000. The same operation is performed for the third and fifth houses, yielding an approximate value of $65,000 for the pool. Averaging these two values, the system determines that Alice's pool is worth approximately $55,000. Adding the pool's value to the starting value, the system approximates the value of Alice's house to be $205,000.

Table 3-4. Houses most similar to Alice's house
House ID | Price | Square footage | # of bedrooms | # of bathrooms | Pool?
1 | $150,000 | 1470 | 3 | 3 | No
2 | $225,000 | 1540 | 4 | 3 | Yes
3 | $180,000 | 1480 | 3 | 2 | Yes
4 | $180,000 | 1520 | 4 | 3 | No
5 | $115,000 | 1460 | 3 | 2 | No

There are some distinct advantages to case-based reasoning. First, case-based reasoning bypasses the bottleneck of gathering information from experts and converting it into rules.
Second, case-based reasoning can be used in fields where examples abound, but the fundamental principles are not well understood. As long as there are sufficient cases that the system can access, the system can still function. The drawback is that without a well-documented set of available cases, a case-based approach cannot be implemented. Much research has been performed in the area of case-based reasoning in recent years. For a summary of this research, refer to Nilsson and Sollenborn (2004). For an example of case-based reasoning applied to the medical field, refer to papers on CASEY, a system for diagnosing heart failure (Koton, 1988). Also, see Althoff et al. (1998) for a case-based reasoning system directly applied to the field of toxicology.

Nearest Neighbor Approaches

Another common approach for solving classification problems is the use of nearest neighbor methods. Nearest neighbor methods require a number of training samples with characteristics that have been parametrized to create a numerical vector. Each vector can be thought of as the point in n-dimensional solution space that is occupied by the sample, where n is the number of parameters in the vector. Once all the training samples are situated in the solution space, a clustering algorithm is used to label classification regions within the space as corresponding to a specific class of objects. When an unknown object is introduced to the system, the system parametrizes the object by creating a vector of the same form as the training samples. These vector coordinates are used to calculate the distance between the object's vector and each of the classification regions. Finally, the unknown object is classified based on the label of the nearest region (Han & Kamber, 2001).

Let us examine a simple nearest neighbor system designed to identify sports balls at a recreation center. Each ball is classified based on size and color.
The system could use the diameter of the ball in centimeters as its size value; however, our system will simplify the
problem by identifying balls as small, medium, or large with the corresponding values of 1, 2, and 3, respectively. Since some colors might be misidentified by certain people, it is important to assign consecutive numbers to similar colors. For this reason, the system numbers the colors in the order of the visible light spectrum, where red = 1, orange = 2, yellow = 3, green = 4, blue = 5, indigo = 6, and violet = 7. Table 3-5 shows six types of balls that the recreation center stocks along with their sizes, colors, and the corresponding values. (For readers unfamiliar with indoor soccer balls, simply imagine a tennis ball the size of a standard soccer ball.) The final column of the table shows the vectors containing the size and color values. Each vector situates a ball as a point in the domain's 2-dimensional solution space (Figure 3-6).

Table 3-5. Characteristics of various sports balls
Type | Size | Size value | Color | Color value | Vector
Tennis ball | Small | 1 | Green | 4 | (1,4)
Racket ball | Small | 1 | Blue | 5 | (1,5)
Water polo ball | Medium | 2 | Yellow | 3 | (2,3)
Indoor soccer ball | Medium | 2 | Green | 4 | (2,4)
Four square ball | Large | 3 | Red | 1 | (3,1)
Basketball | Large | 3 | Orange | 2 | (3,2)

[Figure 3-6. Vectors for sports balls plotted in 2-dimensional solution space (axes: Size and Color; points: Tennis Ball, Racket Ball, Water Polo Ball, Indoor Soccer Ball, Four Square Ball, Basketball).]
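The encoding in Table 3-5 and the distance comparison worked through below can be sketched in Python. The sketch relies on `math.dist` (Python 3.8 and later), which computes the Euclidean distance of Equation 3-7; the dictionary names and the function `nearest_ball` are illustrative assumptions. Like the example, it compares the unknown ball against individual points rather than clustered regions.

```python
import math

# Ordinal encodings from the text: size 1-3, colors in spectrum order 1-7.
SIZE = {"small": 1, "medium": 2, "large": 3}
COLOR = {"red": 1, "orange": 2, "yellow": 3, "green": 4,
         "blue": 5, "indigo": 6, "violet": 7}

# (size value, color value) vectors from Table 3-5.
BALLS = {
    "tennis ball": (1, 4),
    "racket ball": (1, 5),
    "water polo ball": (2, 3),
    "indoor soccer ball": (2, 4),
    "four square ball": (3, 1),
    "basketball": (3, 2),
}


def nearest_ball(size, color):
    """Parametrize the description, then return the closest ball and all distances."""
    p = (SIZE[size], COLOR[color])
    distances = {name: math.dist(p, q) for name, q in BALLS.items()}
    return min(distances, key=distances.get), distances


best, distances = nearest_ball("large", "green")
print(best)  # indoor soccer ball
for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name:<20}{d:.3f}")
```

The printed distances correspond to Equations 3-9 through 3-14; swapping `math.dist` for another metric is the only change needed to try a different notion of "nearest".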

If one of the workers at the recreation center were trying to identify an unknown ball that he describes as a large, green ball, he could input the description into the system. For this example, let us assume that the system uses Euclidean distance to determine the nearest neighbor match. The Euclidean distance from point p to point q is defined as:

$$d(p, q) = \sqrt{(x_{p1} - x_{q1})^2 + (x_{p2} - x_{q2})^2 + \cdots + (x_{pn} - x_{qn})^2} \tag{3-7}$$

where $p = (x_{p1}, x_{p2}, \ldots, x_{pn})$ and $q = (x_{q1}, x_{q2}, \ldots, x_{qn})$ are the n-dimensional vectors that define the objects. To identify the large, green ball, the ball's description must first be parametrized. From the definitions above, large is defined as 3 and green as 4, giving the ball a vector of (3, 4). Next, the distance between the unknown ball and each ball type within the system is measured as follows:

$$d(p, q) = \sqrt{(x_{p,\mathrm{Size}} - x_{q,\mathrm{Size}})^2 + (x_{p,\mathrm{Color}} - x_{q,\mathrm{Color}})^2} \tag{3-8}$$

where p represents the unknown ball and q represents a ball recorded in the system. Calculating the distance to each ball in the system yields:

$$d(p, \mathrm{TennisBall}) = \sqrt{(3-1)^2 + (4-4)^2} = 2 \tag{3-9}$$
$$d(p, \mathrm{RacketBall}) = \sqrt{(3-1)^2 + (4-5)^2} \approx 2.236 \tag{3-10}$$
$$d(p, \mathrm{WaterPoloBall}) = \sqrt{(3-2)^2 + (4-3)^2} \approx 1.414 \tag{3-11}$$
$$d(p, \mathrm{IndoorSoccerBall}) = \sqrt{(3-2)^2 + (4-4)^2} = 1 \tag{3-12}$$
$$d(p, \mathrm{FourSquareBall}) = \sqrt{(3-3)^2 + (4-1)^2} = 3 \tag{3-13}$$
$$d(p, \mathrm{BasketBall}) = \sqrt{(3-3)^2 + (4-2)^2} = 2 \tag{3-14}$$

From these results, the ball with the smallest Euclidean distance is selected as the nearest neighbor: the unknown large, green ball is identified as an indoor soccer ball. The example above is a simplistic system. It is extremely limited in size and represents only one approach to solving classification problems using the nearest neighbor method. In
reality, there is no limit, other than computational power, to the number of parameters that can be included in a vector. Characteristic parameters are not limited to linear numerical values; they can also include non-linear values, binary values, nominal variables with multiple states, and many other representations. Euclidean distance is only one of many methods for determining the nearest neighbor and may not be appropriate for some parameters. Although our system did not use a clustering algorithm, most real-world systems do. One way to incorporate a clustering algorithm into the sports ball recognition system would be to take many samples of each ball type and enter them into the system. Perhaps only some tennis balls are green, while others are yellow or orange. Some systems might take a sampling of all of these types of tennis balls and then calculate their centroid as the point to which distance is measured. More complex systems might define multiple points or a region to represent the tennis ball. For more general information on nearest neighbor methods and clustering algorithms, consult Han and Kamber (2001). Bradley et al. (1998) discuss the scaling of clustering algorithms to handle large databases.

Bayes' Rule

Bayes' rule is perhaps the most widely known and implemented technique for uncertainty management. Given certain knowledge, it enables the user to identify and select the most likely solution through the use of probability theory. Bayes' rule is defined as:

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)} \tag{3-15}$$

where p(x) and p(y) are the probabilities of events x and y occurring, respectively. The probability of event x occurring, given that event y has occurred, is represented by p(x | y). Likewise, the probability of event y occurring, given that event x has occurred, is represented as p(y | x) (Duda et al., 2001).

To put this equation in perspective, let us look at an example. The University of Florida campus includes a lake called Lake Alice, which is known to have alligators. For this reason, many people go to Lake Alice in the hope of seeing an alligator in the wild. Many birds, including ducks, also inhabit the area around Lake Alice. Unlike humans, who may have a hard time locating an alligator, ducks are more aware of their surroundings and tend to steer clear of areas where an alligator is present. By gathering data from many visits to Lake Alice, it has been determined that the probability of seeing a duck, p(duck), is 0.8, and the probability of seeing an alligator, p(gator), is 0.4. Since ducks avoid alligators, the probability of seeing a duck given that an alligator is present, p(duck | gator), is only 0.2. With this knowledge, we visit the lake in an attempt to locate an alligator. Looking around, we notice that there are ducks present, so we use Bayes' rule to calculate the probability that an alligator is present:

$$p(\mathrm{gator} \mid \mathrm{duck}) = \frac{p(\mathrm{duck} \mid \mathrm{gator})\, p(\mathrm{gator})}{p(\mathrm{duck})} = \frac{0.2 \times 0.4}{0.8} = 0.1 \tag{3-16}$$

We find that the probability of an alligator being present is only one in ten, so we should probably come back to look for alligators on a different day. Bayes' rule has a distinct advantage over many other methods in that it is supported by well-established mathematical theory. Bayes' rule is limited, however, in that when it is applied to several observations at once, those observations are typically assumed to be conditionally independent given the hypothesis. Unfortunately, this assumption rarely holds in the real world. One proposed solution to this problem is the use of Bayesian belief networks, which are discussed below in the section entitled Modern Approaches for Diagnosing Multiple Disorders. In spite of its limitations, Bayes' rule can be applied effectively in many situations and can be expanded to include the probabilities of many events. For further information on Bayes' rule, including its derivation, refer to Duda et al. (2001).
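
The Lake Alice calculation is a single application of Bayes' rule, which can be sketched as follows. The function name `bayes_posterior` is hypothetical; the probabilities are the ones stated in the example.

```python
def bayes_posterior(p_e_given_h, p_h, p_e):
    """Bayes' rule: p(h | e) = p(e | h) * p(h) / p(e)."""
    if not 0 < p_e <= 1:
        raise ValueError("p(e) must lie in (0, 1]")
    return p_e_given_h * p_h / p_e


p_duck = 0.8              # p(duck)
p_gator = 0.4             # p(gator)
p_duck_given_gator = 0.2  # p(duck | gator)

p_gator_given_duck = bayes_posterior(p_duck_given_gator, p_gator, p_duck)
print(f"p(gator | duck) = {p_gator_given_duck:.2f}")  # p(gator | duck) = 0.10
```

Here the hypothesis h is "an alligator is present" and the evidence e is "ducks are visible", so the call mirrors Equation 3-16 term by term.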

Other Approaches to Knowledge-Based Systems

Fuzzy Logic

In 1965, Lotfi Zadeh wrote a paper introducing fuzzy sets to the world. This paper marked the birth of fuzzy logic. Fuzzy logic is a generalization of Boolean algebra that allows partial membership within different sets or categories. Boolean variables can only be absolutely true, represented by 1, or absolutely false, represented by 0. Fuzzy logic, however, allows variables to be partially true and partially false, represented by any value between 0 and 1. In a normal Boolean representation, a person is either tall or not tall. The problem with this representation is that there must be a sharp cutoff for where tall begins and ends. If 6' were set as the cutoff for being tall, someone with a height of 5.9' would be considered not tall. Such a distinction does not fully represent the world in which we live, because our world is not discrete. To compound the problem, human beings often think and speak using general, imprecise language in which characteristics such as tallness are subjective. Fuzzy logic is an attempt to capture the meaning of the imprecise, or fuzzy, statements inherent to human thinking and to represent them in a manner that enables a system to solve problems. Figure 3-7 shows a fuzzy logic graph with four sets: midget, short, tall, and giant. The graph shows that a person's membership in each set ranges from 0 to 1, depending on the individual's height. Based on the graph, a person with a height of 5' is considered fully short, with a value of 1, and belongs to no other set, because all other sets have a membership value of 0 at 5'. Likewise, a person with a height of 6' is fully tall, a person shorter than 4' is a midget, and a person taller than 7' is a giant, with no membership in any other set. What happens between these heights demonstrates the difference between fuzzy logic and Boolean algebra.
If Joann is 5'6" tall, as shown on the chart, she has a membership of 0.5 in both the short and tall sets. Likewise, Ryan, who is 6'9", has a membership of 0.25 in tall and 0.75 in giant.
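
The memberships quoted for Joann and Ryan can be reproduced with piecewise-linear membership functions. Because Figure 3-7 survives only as a caption, the shapes below are an assumption inferred from the text: midget is fully true at or below 4', short and tall are triangles peaking at 5' and 6', giant is fully true at or above 7', and each set falls to zero one foot from its peak. Heights are in decimal feet, and all function names are hypothetical. The sketch also includes Zadeh's standard complement, union, and intersection operators (1 - A, max, and min).

```python
def triangle(x, left, peak, right):
    """Linear rise from `left` to `peak`, then linear fall to zero at `right`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def height_memberships(h):
    """Membership of height h (decimal feet) in each set; the shapes are assumed."""
    return {
        "midget": 1.0 if h <= 4 else max(0.0, 5 - h),  # falls from 1 at 4' to 0 at 5'
        "short": triangle(h, 4, 5, 6),
        "tall": triangle(h, 5, 6, 7),
        "giant": 1.0 if h >= 7 else max(0.0, h - 6),   # rises from 0 at 6' to 1 at 7'
    }


# Zadeh's basic operators on membership values in [0, 1].
def fuzzy_not(a):
    return 1.0 - a


def fuzzy_or(a, b):
    return max(a, b)


def fuzzy_and(a, b):
    return min(a, b)


joann = height_memberships(5 + 6 / 12)  # 5'6"
ryan = height_memberships(6 + 9 / 12)   # 6'9"
print(joann)  # short and tall are both 0.5
print(ryan)   # tall 0.25, giant 0.75
```

Swapping `triangle` for a smoother curve changes only the membership values, not the operators.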

Although Figure 3-7 is drawn with linear slopes, any variety of functions may be used. Furthermore, memberships need not add up to 1. If a medium height set were created with its peak at 5'6", Joann would have a membership of 1 in the medium set in addition to her 0.5 membership in both the short and tall sets.

[Figure 3-7: membership in the midget, short, tall, and giant sets plotted against height, with Joann (5'6") and Ryan (6'9") marked.]
Figure 3-7. Fuzzy logic graph for human heights

Fuzzy logic uses modified versions of the Boolean operators. The fundamental operators are as follows:
1. Complement: NOT(A) = 1 - A,
2. Union: A OR B = max(A, B),
3. Intersection: A AND B = min(A, B).
Other operators have been created to mimic human language, such as using $A^2$ to represent "very" and $\sqrt{A}$ to represent "more or less." Due to its imprecise nature and lack of mathematical proofs, fuzzy logic has many opponents in the technical world. In spite of this, it has been successfully implemented in a variety of fields, including data mining and knowledge-based systems. Recent research in applying fuzzy logic to data mining includes a system by Delgado et al. (2000) to mine medical databases, Au and Chan's (2003) system for mining rules from a large banking database, and Wang's (2003) system for generalized data mining. Liu and Yan (1997) have also created a
system that combines fuzzy networks and case-based reasoning to solve diagnostic problems. For more general information on fuzzy logic, consult Gonzalez and Dankel (1993).

Dempster-Shafer

The Dempster-Shafer theory was introduced by Arthur Dempster in 1967 and extended by Glenn Shafer in 1976. Although rarely used in practice due to its high computational requirements, the Dempster-Shafer theory is one of the classic approaches to handling uncertainty in knowledge-based systems. Dempster-Shafer theory is unique in that it assigns confidence values to sets, rather than solely to individual facts, and is capable of representing "our certainty about certainty" (Gonzalez & Dankel, 1993, p. 253). For more information regarding the Dempster-Shafer method, refer to Gonzalez and Dankel (1993) or the original paper by Arthur Dempster (1967).

Rough Sets

Rough set theory was proposed by Zdzislaw Pawlak in 1982 and has since been applied to the field of data mining. Rough sets are formed by examining the data available to the system and identifying any extraneous features that are not necessary for differentiating between cases. These extraneous features are then removed, and the remaining features form a construct called a reduct, which is used for classification or identification of unknowns (Kusiak et al., 2000). Although using reducts is computationally effective, "users of expert systems can be reluctant to make decisions based on the minimum number of features, rather they would like to see the same decision reached by alternative sets of features" (Kusiak et al., 2001, p. 225). For this reason, rough sets can be unappealing to physicians who feel that the more information used in making a decision, the better. Kusiak et al. (2000) offer an excellent short summary of rough sets as they relate to their research in diagnosing and treating lung abnormalities called solitary pulmonary nodules. Kusiak et al. (2001) present an algorithm based on rough sets for extracting rules relating to
heart arrhythmia. Tsumoto (2000) also presents a rough set approach to diagnosing diseases. To increase system accuracy, Tsumoto's system creates both positive rules to rule in and negative rules to rule out possible diseases.

Genetic Algorithms

John Holland introduced the idea of genetic algorithms in 1975. The philosophy behind genetic algorithms is to model natural selection. In short, natural selection states that the organisms with the genes most advantageous for survival tend to pass their genetics on to the next generation, while those with inferior genes tend to die before they reproduce. Through this process, the offspring of a species gradually become fitter and more capable of survival. In the same way that the DNA of a species is divided into chromosomes, genetic algorithms are made of building blocks of code called primitives. These primitives are the smallest functional units of code and cannot be separated. By randomly assembling algorithms from primitives, the first generation of algorithms is created. Each of these algorithms is then evaluated by a fitness function to quantify its performance, or fitness. The fitness of each algorithm determines the probability that the algorithm is selected to contribute to the next generation; the better an algorithm's fitness, the more likely it is to be selected. There are three ways that an algorithm can contribute to the next generation: reproduction, crossover, and mutation. For each of these methods, a certain percentage of the current generation of algorithms is randomly selected. The algorithms selected for reproduction are copied directly to the next generation without modification. Each algorithm selected for crossover is paired with a second algorithm, and both algorithms are broken at a random location in their code.
The segments are then exchanged between the algorithms so that each algorithm has a piece of the other's code, and these newly modified algorithms become part of the next
generation. Algorithms chosen for mutation have a random segment of code deleted and replaced by another randomly generated segment of code. Once all of this has occurred, the algorithms produced by reproduction, crossover, and mutation together form a new generation of algorithms. Like their predecessors, the new generation is evaluated by a fitness function, and some of its members are then selected to reproduce, cross over, or mutate to create the next generation (Nilsson, 1998). Creating a useful algorithm with the genetic algorithm approach can take thousands to millions of iterations and may never fully optimize the algorithm's code. Because of their randomness, however, genetic algorithms are very useful in optimization problems, since they are less likely than purely local search methods to become trapped in local maxima or minima. Vinterbo and Ohno-Machado (2000) have also applied genetic algorithms to the problem of diagnosing multiple disorders. For more general information on genetic algorithms, refer to Nilsson (1998).

Artificial Neural Networks

Like genetic algorithms, artificial neural networks were inspired by nature. Artificial neural networks are composed of units that roughly approximate the firing of a neuron in a biological organism. In a biological organism, a neuron sits inactive until it is stimulated beyond its activation threshold. When this occurs, the neuron fires, passing its signal on to other neurons. The firing of a neuron is an all-or-nothing response; there is no variability in the signal it sends. In the same way, a unit in an artificial neural network can be thought of as having an output that turns on when its inputs exceed a certain threshold and that is zero at all other times. In practice, the units employ a differentiable function, such as a sigmoid, to approximate this step response. Differentiability is important in the training of a neural network, which is discussed below.
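
A single unit of this kind can be sketched as a weighted sum compared against a threshold. The example below is a hypothetical illustration contrasting the all-or-nothing step activation with the sigmoid that approximates it; the inputs, weights, and threshold are arbitrary demonstration values. The sigmoid's derivative, s(z)(1 - s(z)), is included because gradient-based training methods such as backpropagation rely on it.

```python
import math


def step(z):
    """All-or-nothing activation: output 1 when z exceeds zero, else 0."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    """Smooth, differentiable approximation of the step function."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid, s(z) * (1 - s(z)), used by gradient-based training."""
    s = sigmoid(z)
    return s * (1.0 - s)


def unit_output(inputs, weights, threshold, activation):
    """Weighted sum of the inputs, shifted by the threshold, passed to an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return activation(z)


# Hypothetical values: 0.8 * 1.0 + 0.4 * 0.5 = 1.0, which exceeds the threshold 0.9.
inputs, weights, threshold = [1.0, 0.5], [0.8, 0.4], 0.9
print(unit_output(inputs, weights, threshold, step))     # 1.0
print(unit_output(inputs, weights, threshold, sigmoid))  # slightly above 0.5
```

The step unit jumps from 0 to 1 at the threshold and has no useful derivative there, while the sigmoid output changes gradually, which is what makes weight adjustment by gradients possible.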

The most common artificial neural network is known as a feedforward multilayer perceptron. This type of neural network consists of an input layer, an output layer, and any number of hidden layers. Figure 3-8 shows an example neural network with two hidden layers. In creating a neural network, the designer must determine the inputs to the system as well as how many outputs are necessary to solve the problem. The designer must also select the number of hidden layers and the number of units to include per layer. As seen in the figure, the inputs are connected directly to every unit in the first hidden layer. Likewise, every unit in the first hidden layer is connected to every unit in the second layer, and so on, all the way through the output layer. Each connection has an associated weight that acts as a multiplier. By adjusting the weights, the influence an input has on a specific unit can be controlled. The weights of a neural network are usually initialized randomly with small values. A bias unit that always outputs 1 is also included at every layer. Adjusting the weight between the bias unit and another unit shifts the activation threshold of that unit (Nechyba, 2003).

Figure 3-8. Typical artificial neural network with two hidden layers. Figure used with permission from Nechyba (2003, p. 7).
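
A forward pass through such a network can be sketched as below. This is a minimal illustration under stated assumptions rather than Nechyba's formulation: every layer is fully connected, every unit uses a sigmoid, weights are drawn uniformly from [-0.1, 0.1], the bias unit is modeled as an extra input fixed at 1, and the layer sizes (3 inputs, two hidden layers of 4 units, 2 outputs) are arbitrary. Training is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_inputs, n_units, rng):
    """One row of small random weights per unit; the last weight is for the bias unit."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]


def forward(layers, inputs):
    """Propagate inputs through fully connected layers of sigmoid units."""
    activations = list(inputs)
    for layer in layers:
        extended = activations + [1.0]  # the bias unit always outputs 1
        activations = [sigmoid(sum(w * a for w, a in zip(row, extended)))
                       for row in layer]
    return activations


rng = random.Random(0)   # fixed seed so the example is repeatable
sizes = [3, 4, 4, 2]     # inputs, hidden layer 1, hidden layer 2, outputs
network = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
print(forward(network, [0.2, 0.7, 1.0]))
```

Because the initial weights are small, every weighted sum stays close to zero and the untrained outputs sit near 0.5; training would move the weights away from these starting values.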

Training a neural network to solve problems involves adjusting the weights of the connections between units. The most well-known algorithm for adjusting weights is the backpropagation algorithm, published in 1986 by Rumelhart and McClelland. When an input with a known solution is presented to the neural network, the output of the system can be compared with the desired output. The backpropagation algorithm then uses this difference to adjust the weights accordingly. By repeating this process many times with a variety of samples, the weights gradually converge to a local minimum of the error, in an attempt to maximize the number of samples the system can correctly identify. Artificial neural networks are non-linear function approximators. Their strength lies in their ability to train themselves from sample cases; however, this is also their weakness. Because of the complexity of the network itself, it is hard to understand and explain the internal workings of a trained neural network. Abidi and Manickam (2002) have created a hybrid system using case-based reasoning and neural networks to data mine medical systems. For a general discussion on neural networks, refer to Nilsson (1998).

Modern Approaches for Diagnosing Multiple Disorders

Over the years, researchers have implemented a variety of methods, including those discussed above, in an attempt to diagnose problems involving multiple disorders. In most cases, linearity and statistical independence cannot be assumed in problems of this nature. For this reason, efficiently and effectively diagnosing multiple disorders remains an important area of research today. In recent years, two problem-solving methods appear to have come to the forefront. The first involves the modification of Bayesian methods to account for dependencies. The second involves the use of set covering theory to approach the problem.

Bayesian Belief Networks

As discussed above, the simple form of Bayes' rule treats multiple observations as statistically independent. Over the years, many variations of Bayesian methods have been developed to account for dependencies in a data set. In this section, we discuss perhaps the best documented of these approaches, namely Bayesian belief networks. Bayesian belief networks allow dependencies to be included in a system's probability calculations. Figure 3-9 displays a graphical representation of a belief network. As shown, a belief network consists of two parts: a directed acyclic graph and a set of conditional probability tables (Han & Kamber, 2001). The graph portion consists of nodes, which represent random events, and arcs, which portray statistical dependencies between nodes. The example in Figure 3-9 contains six random events, including DarkClouds, Humidity, and Rain. Arcs are drawn from DarkClouds to Rain and from Humidity to Rain to show that the presence of DarkClouds and/or Humidity influences the likelihood of Rain. A conditional probability table is drawn to the right of the graph. Belief networks have one table for every node in the graph; however, only the table for the Rain node is given in this example. The table shows the probabilities of Rain occurring given the presence or absence of Rain's parents, DarkClouds and Humidity. Represented mathematically, the first column of the table states that:

$$P(\mathrm{Rain} = \mathrm{True} \mid \mathrm{DarkClouds} = \mathrm{True}, \mathrm{Humidity} = \mathrm{True}) = 0.9 \tag{3-17}$$
$$P(\mathrm{Rain} = \mathrm{False} \mid \mathrm{DarkClouds} = \mathrm{True}, \mathrm{Humidity} = \mathrm{True}) = 0.1 \tag{3-18}$$

There are many ways to train a Bayesian belief network. If the structure of the network is known and the events are observable, the conditional probability tables can be calculated using standard probability and statistics. If the structure is known but not all of the events are observable, a gradient descent method can be used to determine a local optimum of
probabilities. For more general information on the structure and generation of Bayesian belief networks, refer to Han and Kamber (2001).

[Figure 3-9: (A) graph with nodes DarkClouds, Humidity, Rain, Erosion, LandSlides, and RoadClosures; (B) conditional probability table for Rain, with P(R) values 0.9, 0.7, 0.4, and 0.1 and P(not R) values 0.1, 0.3, 0.6, and 0.9 across the four combinations of DC and H.]
Figure 3-9. Example Bayesian belief network. (A) Directed acyclic graph of dependencies. (B) Conditional probability table for Rain, where R = Rain, DC = DarkClouds, and H = Humidity.

The application of modified Bayesian methods is one of the most promising areas of research in diagnosing multiple disorders. Bayesian belief networks have been used by van der Gaag and Wessels (1994) in an attempt to efficiently diagnose multiple disorders. The HEPAR II system, by Onisko et al. (2000, 2001), also uses belief networks to diagnose multiple disorders in the field of hepatology. Other Bayesian variations exist, including Peng and Reggia's (1989) use of a comfort measure that attempts to adapt Bayes' rule to the diagnosis of multiple disorders. Additional research by Peng and Reggia (1986, 1987) includes the creation of a hybrid system combining Bayesian classification techniques and the set covering model. The next section provides an introduction to set covering.

Set Covering

Another approach to diagnosing multiple disorders is the use of set covering theory. Given a case with a set of observed symptoms, set covering seeks to construct a solution set of disorders that can best account for the symptoms. In generating solution sets, it is not uncommon for several plausible solutions to exist. In such cases, the principle of Occam's razor is generally applied, meaning that the simplest explanation is usually the best. For this reason, set covering is also known as parsimonious covering theory (Peng & Reggia, 1986). To understand set covering, we must begin by formally defining three universal sets:
1. $D = \{d_1, d_2, \ldots, d_m\}$, where D is the set containing every possible disorder, d,
2. $S = \{s_1, s_2, \ldots, s_n\}$, where S is the set containing every possible symptom, s,
3. $R = \{r_1, r_2, \ldots, r_p\}$, where R is the set containing every possible relationship, r.
The relationships within set R are tuples consisting of a disorder and a corresponding symptom, such that $r_k = (d_i, s_j)$, where $s_j$ is a symptom that may be caused by disorder $d_i$. It is important to note that disorder $d_i$ does not always result in symptom $s_j$. Likewise, symptom $s_j$ may be caused by a disorder other than $d_i$ (Peng & Reggia, 1986). Generate-and-test is the simplest algorithm for solving cases involving set covering. To implement the algorithm, three more sets must be defined:
4. $F_O \subseteq S$, where $F_O$ contains the observed symptoms for a particular case,
5. $H \subseteq D$, where H is a hypothesized set of disorders that may be responsible for $F_O$,
6. $F_H \subseteq S$, where $F_H$ contains the symptoms associated with H by set R.
When a set of observed symptoms is presented to the system for diagnosis, each of these symptoms is stored in $F_O$. The system then generates hypothesis sets H to determine the disorders that could be causing the observed symptoms. When a hypothesis set H is generated, $F_H$ is populated with all of the symptoms that can be caused by the proposed disorders in H. $F_H$ is
then compared with $F_O$ to determine whether $F_H$ covers, or contains, every symptom in $F_O$. If $F_H$ covers $F_O$, H is a plausible solution. As stated above, set covering systems generally follow the principle of Occam's razor. For this reason, generate-and-test algorithms usually begin with hypotheses H consisting of single disorders. If no suitable solution is found, the system then considers pairs of disorders, followed by larger combinations as necessary (Baumeister et al., 2001). Figure 3-10 presents a graphical representation of the relationships, R, between disorders, D, and symptoms, S. Let us assume that all five symptoms are presented to the system as observed findings, $F_O$. Using a simple generate-and-test algorithm, the system checks every individual disorder, d, in an attempt to find a single-disorder solution to the problem. As seen from the graph, no single disorder can account for all of the observed symptoms. The system then checks for multiple disorders capable of covering the symptoms contained in $F_O$. It should be apparent that there exist at least three solution sets to the problem: $(d_2, d_3)$, $(d_1, d_3, d_4)$, and $(d_1, d_4, d_5)$. Using the heuristic of minimality, the system would select $(d_2, d_3)$ as the solution. At times, however, minimality will not yield the best solution to a problem. For example, if we were aware that disorder $d_2$ was an extremely rare disease and highly unlikely to occur, either of the other two possible solutions might provide a better answer. Furthermore, in comparing $(d_1, d_3, d_4)$ to $(d_1, d_4, d_5)$, we can see that the former results in more redundancy due to the presence of $d_3$ in the solution. If redundancy is considered a negative aspect of a solution, the system should ultimately select $(d_1, d_4, d_5)$ as the solution set.
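
The generate-and-test procedure can be sketched in Python. Because Figure 3-10 survives only as a caption, the relation set `R` below is hypothetical: it is chosen so that, as in the text, no single disorder covers all five symptoms, (d2, d3) is the only two-disorder cover, and (d1, d3, d4) and (d1, d4, d5) are also covers. The function names are illustrative, and the sketch applies only the minimality heuristic, returning every cover of the smallest size it finds.

```python
from itertools import combinations

# Hypothetical relation set R: each disorder maps to the symptoms it may cause.
R = {
    "d1": {"s1", "s2"},
    "d2": {"s1", "s2", "s3"},
    "d3": {"s4", "s5"},
    "d4": {"s3", "s4"},
    "d5": {"s5"},
}


def manifestations(hypothesis):
    """F_H: the union of the symptoms that the disorders in H may cause."""
    covered = set()
    for d in hypothesis:
        covered |= R[d]
    return covered


def generate_and_test(observed, max_size=None):
    """Return every smallest hypothesis H whose F_H covers the observed set F_O."""
    disorders = sorted(R)
    max_size = max_size or len(disorders)
    for size in range(1, max_size + 1):  # single disorders first (Occam's razor)
        covers = [h for h in combinations(disorders, size)
                  if observed <= manifestations(h)]
        if covers:
            return covers
    return []


F_O = {"s1", "s2", "s3", "s4", "s5"}
print(generate_and_test(F_O))  # [('d2', 'd3')]
```

Preferring a larger but more probable or less redundant cover, as the text discusses, would need an additional scoring step over the candidate covers.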
From this example, it can be seen that the principle of parsimony, or Occam's razor, can be applied in many different ways and that, although favoring the smallest solution set is often a useful heuristic, there are many instances where this approach does not yield the best result. For more information on the nature of
parsimony, refer to Peng and Reggia (1986). Atzmueller et al. (2004a) also address this topic from a different perspective.

Figure 3-10. Set covering graph of relationships between disorders and symptoms

Set covering theory has been applied to numerous systems for diagnosing multiple disorders. The paper by Reggia et al. (1983) offers a solid introduction to set covering theory and its applications to knowledge-based systems. Peng and Reggia (1986, 1987) expanded this work by creating a hybrid system to take advantage of both set covering and Bayesian classification techniques. More recently, Baumeister et al. (2001) present a set covering system that incrementally refines itself, along with an excellent overview of set covering theory. In the past five years, Atzmueller et al. (2003a, 2003b, 2004a, 2004b) have presented a significant amount of research, including the expansion of set covering to make use of diagnostic scores and case-based reasoning.

Conclusion

In this chapter, we discussed the methods used in designing and implementing both knowledge-based systems and data mining systems. We began with an extensive discussion of
rule-based systems and certainty factors. From there, we moved to other foundational topics, including case-based reasoning, nearest neighbor classification, and Bayes' rule. Next, we discussed the less mainstream topics of fuzzy logic, Dempster-Shafer theory, and rough sets, along with other approaches less relevant to our research, such as genetic algorithms and artificial neural networks. The chapter concluded by discussing Bayesian belief networks and set covering, which are two of the most relevant modern approaches to diagnosing multiple disorders with knowledge-based systems. The following chapter begins with a discussion of the mathematics used throughout the medical field. It continues with a discussion of important knowledge-based systems in the medical field. Finally, it concludes with a literature review of systems that have been designed for the purpose of diagnosing medical disorders.

CHAPTER 4
MEDICAL MATHEMATICS AND RELEVANT KNOWLEDGE-BASED SYSTEMS

The previous chapter discussed a variety of established approaches for knowledge-based systems and data mining. This chapter gives an overview of many systems that have been developed using those techniques. Strong emphasis is given to historical medical expert systems, diagnostic systems in toxicology, and modern systems for diagnosing multiple disorders, as these are most relevant to this research. The information presented in this chapter is cursory at best, and the reader is encouraged to study the references for a proper understanding of any systems of interest. Before addressing the systems, however, the chapter begins with a discussion of the mathematics employed in the field of medicine.

Medical Mathematics

The medical field presents a unique set of challenges for knowledge engineering. Distinctions ranging from ethical and legal issues to fundamentally different mathematical understandings set the domain of medicine apart from other domains. Cios and Moore (2002) thoroughly discuss the considerations that must be observed in the medical field. For our purposes, however, let us focus on the standard mathematical approaches used for decision-making in medicine.

Probabilistic Measurements

Many knowledge-based systems use precision as the definitive measurement of performance. Precision is the percentage of true positives (TP) out of the total number of cases classified as positive:

$$\mathrm{precision} = \frac{TP}{TP + FP} \times 100\% \tag{4-1}$$

where FP represents false positives. According to Cios and Moore (2002), "This measurement is very popular in machine learning and pattern recognition communities, but is not acceptable in medicine because it hides essential details of the achieved results" (p. 4). To better understand the performance of a diagnostic test, the medical profession defines a number of other measurements. Let us begin by examining a contingency table (Table 4-1). Contingency tables contain four variables: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). A true positive occurs when a test correctly diagnoses a patient as having a disorder. A true negative occurs when a test correctly diagnoses a patient as not having a disorder. A false positive occurs when a test incorrectly diagnoses a patient as having a disorder. A false negative occurs when a test incorrectly diagnoses a patient as not having a disorder.

Table 4-1. Contingency table

| Test results | Disorder present | Disorder absent | Total |
|---|---|---|---|
| Positive | TP | FP | TP + FP |
| Negative | FN | TN | TN + FN |
| Total | TP + FN | TN + FP | |

Another measurement of performance, frequently used in conjunction with precision, is accuracy. Accuracy is the number of correctly classified cases compared to the total number of cases presented to a system:

$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{4-2}$$

Even when used in combination, however, precision and accuracy do not fully capture the information necessary for medical diagnosis. Perhaps the most common measurements in the medical field are sensitivity and specificity, which are defined as:

$$\mathrm{sensitivity} = \frac{TP}{TP + FN} \tag{4-3}$$
$$\mathrm{specificity} = \frac{TN}{TN + FP} \tag{4-4}$$

Sensitivity is also known as the true-positive rate (TPR). It represents the probability that the test detects the disorder, given that the patient has the disorder. Specificity is also known as the
true-negative rate (TNR). It represents the probability of the test detecting no disorder, given that the patient truly does not have the disorder. These measurements are important in diagnosis because tests are never perfectly accurate. There are always instances where a patient with a disorder displays fewer symptoms than a patient without the disorder. For this reason, diagnostic tests in medicine are tuned to rule in or rule out a diagnosis. If a physician is attempting to rule in a disorder, he should select a test with a high specificity so that a positive test result strongly confirms his suspicion. Likewise, a physician attempting to rule out a disorder should use a test with a high sensitivity, so that a negative result makes the disorder unlikely. Two measurements often used in conjunction with sensitivity and specificity are the false-negative rate (FNR) and the false-positive rate (FPR), also known as the false-alarm rate (FAR):

$$FNR = \frac{FN}{TP + FN} = 1 - \mathrm{sensitivity} \tag{4-5}$$
$$FAR = \frac{FP}{TN + FP} = 1 - \mathrm{specificity} \tag{4-6}$$

The false-negative rate and false-alarm rate are probabilities associated with a test inaccurately diagnosing a patient. The false-negative rate represents the probability of a test failing to detect a disorder that is present, whereas the false-alarm rate represents the probability of a test falsely indicating that a patient has a disorder. Alternatives to sensitivity and specificity are the positive predictive value (PPV) and negative predictive value (NPV):

$$PPV = \frac{TP}{TP + FP} \tag{4-7}$$
$$NPV = \frac{TN}{TN + FN} \tag{4-8}$$
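
The measures above can all be computed from the four cells of a contingency table. The Python sketch below is illustrative; the class name `Contingency` and the counts in the example are hypothetical. It returns fractions rather than percentages and makes visible that precision (4-1) and PPV (4-7) share the same ratio.

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """The four cells of a 2x2 contingency table (Table 4-1)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def metrics(self) -> dict:
        """Return each measure as a fraction (multiply by 100 for percentages)."""
        tp, fp, fn, tn = self.tp, self.fp, self.fn, self.tn
        sensitivity = tp / (tp + fn)  # (4-3), true-positive rate
        specificity = tn / (tn + fp)  # (4-4), true-negative rate
        return {
            "precision": tp / (tp + fp),                  # (4-1), same ratio as PPV
            "accuracy": (tp + tn) / (tp + tn + fp + fn),  # (4-2)
            "sensitivity": sensitivity,
            "specificity": specificity,
            "FNR": 1 - sensitivity,                       # (4-5), equals fn / (tp + fn)
            "FAR": 1 - specificity,                       # (4-6), equals fp / (tn + fp)
            "PPV": tp / (tp + fp),                        # (4-7)
            "NPV": tn / (tn + fn),                        # (4-8)
        }


# Hypothetical counts for illustration only.
table = Contingency(tp=40, fp=10, fn=5, tn=45)
for name, value in table.metrics().items():
    print(f"{name:>11}: {value:.3f}")
```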

The positive predictive value is the likelihood that, given positive test results, the patient does indeed have the disorder. In a similar manner, the negative predictive value is the likelihood that, given negative test results, the patient truly does not have the disorder. There is an important difference between the measurements of sensitivity and specificity and the measurements of positive and negative predictive values. While sensitivity and specificity are independent of the population being tested, positive and negative predictive values are affected by the prevalence of a disease within a population.

To contrast sensitivity and specificity with positive and negative predictive values, let us consider an example from the Medical University of South Carolina Doctoring Curriculum (2000). Imagine that a new test for detecting HIV is discovered. To determine its usefulness, an experiment with 10,000 HIV-infected blood samples and 10,000 non-infected blood samples is performed. The testing results in all correct answers except for 10 false positives and 10 false negatives, yielding a sensitivity, specificity, PPV, and NPV of 99.9% (Table 4-2). Additionally, the pre-test probability indicates that there is a 50% chance for a randomly selected blood sample to contain HIV.

Table 4-2. Experimental HIV testing extended contingency table
Test results | Disorder present | Disorder absent | Total | Predictive value
Positive | 9,990 (TP) | 10 (FP) | 10,000 | PPV = 9,990/10,000 = 99.9%
Negative | 10 (FN) | 9,990 (TN) | 10,000 | NPV = 9,990/10,000 = 99.9%
Total | 10,000 | 10,000 | 20,000 |
Sensitivity = 9,990/10,000 = 99.9%; Specificity = 9,990/10,000 = 99.9%; Pre-test probability = 10,000/20,000 = 50%

Now let us apply the test to a population of one million people where 1% of the population is infected with HIV. Since sensitivity and specificity are a function of the ability of a test to identify HIV carriers, their values do not change (Table 4-3). In contrast, the PPV decreases by 8.9 percentage points (from 99.9% to 91.0%) and the NPV increases slightly. The significance of the decrease in PPV is that if the physician informed patients that they had HIV based solely on this test, 990 individuals would be falsely informed that they were infected. Neither sensitivity nor specificity gives the physician any indication of this change from the previous contingency table.

Table 4-3. HIV testing (1% chance of HIV) extended contingency table
Test results | Disorder present | Disorder absent | Total | Predictive value
Positive | 9,990 (TP) | 990 (FP) | 10,980 | PPV = 9,990/10,980 = 91.0%
Negative | 10 (FN) | 989,010 (TN) | 989,020 | NPV = 989,010/989,020 = 99.999%
Total | 10,000 | 990,000 | 1,000,000 |
Sensitivity = 9,990/10,000 = 99.9%; Specificity = 989,010/990,000 = 99.9%; Pre-test probability = 10,000/1,000,000 = 1%

Table 4-4. HIV testing (0.1% chance of HIV) extended contingency table
Test results | Disorder present | Disorder absent | Total | Predictive value
Positive | 999 (TP) | 999 (FP) | 1,998 | PPV = 999/1,998 = 50.0%
Negative | 1 (FN) | 998,001 (TN) | 998,002 | NPV = 998,001/998,002 = 99.999%
Total | 1,000 | 999,000 | 1,000,000 |
Sensitivity = 999/1,000 = 99.9%; Specificity = 998,001/999,000 = 99.9%; Pre-test probability = 1,000/1,000,000 = 0.1%

Let us look at one more example in relation to HIV testing. Applying the test to a pool of blood donors who have already been screened for HIV risk factors, we would expect the
percentage of HIV-infected individuals to be closer to 0.1%. Again, the contingency table is shown for a population of one million people (Table 4-4). The calculations show that the PPV drops to 50%, while sensitivity and specificity remain constant. The results from these three contingency tables demonstrate that while sensitivity and specificity normally remain constant, the PPV and NPV depend upon the prevalence of a disease within a population.

The final mathematical expression commonly used in the medical field is the likelihood ratio (LR). The likelihood ratio compares the probability that a specific test result is given to a patient with the disorder with the probability of the same test result being given to a patient without the disorder. There are two types of likelihood ratios, LR+ and LR-, which can be calculated as follows:

$$LR^{+} = \frac{TP/(TP + FN)}{FP/(FP + TN)} = \frac{\text{sensitivity}}{FAR} \tag{4-9}$$

$$LR^{-} = \frac{FN/(TP + FN)}{TN/(FP + TN)} = \frac{FNR}{\text{specificity}} \tag{4-10}$$

where LR+ is the ratio of the probability of a positive test result for a patient with the disorder to that for a patient without the disorder, and LR- is the corresponding ratio for a negative test result. Thus, good diagnostic tests should have a high LR+ and a low LR-. One major advantage of likelihood ratios is that they can be easily combined through multiplication. For this reason, the system discussed in Chapters 5 and 6 utilizes LR+ to diagnose toxic exposures.

In this section, we have discussed the most common statistical expressions in medical literature. For more detailed information, refer to Owens and Sox (2001). The next section discusses a different process frequently used in medical diagnosis.
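
The multiplicative property of likelihood ratios can be made concrete with the HIV example. The minimal Python sketch below, whose function names are hypothetical, rebuilds the expected contingency table for a given prevalence and compares the PPV read from that table with the post-test probability obtained by converting the pre-test probability to odds, multiplying by LR+, and converting back to a probability. Passing several likelihood ratios to post_test_probability assumes the findings are conditionally independent given the disorder, an assumption the sketch does not check.

```python
def expected_table(sensitivity: float, specificity: float,
                   prevalence: float, population: int) -> dict[str, float]:
    """Expected TP/FN/FP/TN counts when a test is applied to a population."""
    diseased = prevalence * population
    healthy = population - diseased
    return {
        "tp": sensitivity * diseased,
        "fn": (1 - sensitivity) * diseased,
        "fp": (1 - specificity) * healthy,
        "tn": specificity * healthy,
    }


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    # Equation 4-9: LR+ = sensitivity / FAR = sensitivity / (1 - specificity)
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test: float, *likelihood_ratios: float) -> float:
    """Pre-test probability -> odds, multiply by each LR, odds -> probability.

    Chaining several LRs assumes conditional independence of the findings.
    """
    odds = pre_test / (1 - pre_test)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


sens, spec = 0.999, 0.999  # the HIV test of Tables 4-2 through 4-4
lr_pos = positive_likelihood_ratio(sens, spec)  # roughly 999
for prevalence in (0.5, 0.01, 0.001):
    t = expected_table(sens, spec, prevalence, 1_000_000)
    ppv = t["tp"] / (t["tp"] + t["fp"])
    via_lr = post_test_probability(prevalence, lr_pos)
    print(f"prevalence {prevalence:>6}: PPV {ppv:.3f}, via LR+ {via_lr:.3f}")
```

Both routes give the PPVs of Tables 4-2, 4-3, and 4-4 (0.999, 0.910, and 0.500), because post-test odds equal pre-test odds multiplied by LR+. The likelihood ratio itself does not change with prevalence; only the pre-test probability does.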

Diagnostic Scores

The use of diagnostic scores is a simple approach to risk analysis in the medical field. In this method, signs and symptoms are assigned a point value, or score, based on their correlation to a specific disorder. In forming a diagnosis, the physician gathers a list of all the signs and symptoms observed in the patient. He then looks up the score for each observation on a chart. Adding the scores together yields the final diagnostic score, which is compared to another chart to determine the risk of the patient for a specific disorder.

Table 4-5 and Table 4-6 are examples of the charts that a physician might use when implementing diagnostic scores for appendicitis, based on the research by Ohmann et al. (1999). As can be seen, if a patient's only variables are rigidity and an age of less than 50 years, the final diagnostic score sums to 2.5. In Table 4-6, we find that the risk of the patient having appendicitis is 3%. From these observations, the physician can be fairly certain that the patient does not have appendicitis. However, if the patient satisfied the requirements for every variable in Table 4-5, the patient's final diagnostic score would be 16.0, indicating a 68% risk of appendicitis. Depending on the final diagnostic score, the physician may recommend different tests or treatments for the patient.

Table 4-5. Diagnostic scores for acute appendicitis (Ohmann et al., 1999)
Variable | Points
Tenderness, right lower quadrant | 4.5
Rebound tenderness | 2.5
No micturition difficulties | 2.0
Steady pain | 2.0
Leukocyte count ≥ 10.0 × 10^9/L | 1.5
Age < 50 years | 1.5
Relocation of pain to right lower quadrant | 1.0
Rigidity | 1.0
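
The scoring procedure is simple enough to express in a few lines. The Python sketch below transcribes the point values of Table 4-5 and the risk bands of Table 4-6. The finding identifiers are hypothetical labels, and reading each band as including its lower bound (so that a score of exactly 14.0 falls in the 68% band) is an assumption about how the published bands are meant to be interpreted.

```python
# Points per finding, transcribed from Table 4-5 (Ohmann et al., 1999).
# The dictionary keys are hypothetical identifiers for the table's variables.
APPENDICITIS_POINTS = {
    "tenderness_rlq": 4.5,
    "rebound_tenderness": 2.5,
    "no_micturition_difficulties": 2.0,
    "steady_pain": 2.0,
    "leukocytes_at_least_10e9_per_l": 1.5,
    "age_under_50": 1.5,
    "pain_relocated_to_rlq": 1.0,
    "rigidity": 1.0,
}

# (lower bound of score band, observed frequency of appendicitis), from
# Table 4-6, ordered from the highest band down. Lower bounds are inclusive
# (an assumption about how the bands are read).
RISK_BANDS = [
    (14.0, 0.68),
    (12.0, 0.55),
    (10.0, 0.32),
    (8.0, 0.24),
    (6.0, 0.11),
    (4.0, 0.05),
    (0.0, 0.03),
]


def diagnostic_score(findings: set[str]) -> float:
    """Sum the points of every finding observed in the patient."""
    return sum(APPENDICITIS_POINTS[f] for f in findings)


def appendicitis_risk(score: float) -> float:
    """Frequency for the highest band whose lower bound the score reaches."""
    for lower_bound, frequency in RISK_BANDS:
        if score >= lower_bound:
            return frequency
    raise ValueError("diagnostic scores are never negative")


low_risk = {"rigidity", "age_under_50"}
score = diagnostic_score(low_risk)
print(score, appendicitis_risk(score))

all_findings = set(APPENDICITIS_POINTS)
score = diagnostic_score(all_findings)
print(score, appendicitis_risk(score))
```

The two patients described above reproduce the text's figures: rigidity plus an age under 50 gives 2.5 points and a 3% frequency, while satisfying every variable gives 16.0 points and a 68% frequency.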

Table 4-6. Final diagnosis score significance (Ohmann et al., 1999)
Diagnostic score | Frequency
< 4.0 points | 3%
4.0–5.5 points | 5%
6.0–7.5 points | 11%
8.0–9.5 points | 24%
10.0–11.5 points | 32%
12.0–13.5 points | 55%
≥ 14.0 points | 68%

Literature Review of Knowledge-Based Systems

Although the field of medicine possesses its own established mathematical methods, surprisingly few systems have taken advantage of them. The majority of systems rely on established engineering approaches or devise their own representation schemes. This section begins by discussing two significant historical systems in the field of medicine. It then presents a selection of systems directly applied to the field of toxicology. The chapter concludes by addressing the systems specifically designed for the diagnosis of multiple disorders.

Historical Medical Expert Systems

Research in the field of medical expert systems began in the early 1970s with the development of MYCIN. Created at Stanford University by Buchanan and Shortliffe (1984a), MYCIN was designed to diagnose infectious blood diseases and recommend the appropriate antibiotics for treatment. MYCIN utilized a knowledge base consisting of approximately 500 rules, and its inference engine was constructed as an inference network. The system would query the user with simple yes-or-no questions until enough information was gathered to identify the bacteria responsible for the symptoms. To handle uncertainty, MYCIN employed certainty factors, discussed in Chapter 3. Research has shown that, using these methods, MYCIN was able to outperform faculty at the Stanford medical school in diagnosing diseases within its domain (Yu et al., 1984). Interestingly, later research seems to indicate that the use of certainty factors was superfluous and that MYCIN could perform equally
effectively without them (Buchanan & Shortliffe, 1984b). Although a foundational pillar in the field of expert systems, MYCIN was never used in clinical practice. Ethical and legal issues, along with a mistrust of computer systems within the field of medicine, were major factors preventing its commercialization. For an exhaustive discussion of MYCIN, see Buchanan & Shortliffe (1984a).

A second foundational system in the field of medical expert systems is INTERNIST. INTERNIST was developed by Pople (1977) at the University of Pittsburgh during the same era as MYCIN. The goal in developing INTERNIST was to create a system capable of handling general internal medicine, as opposed to the specialized domains traditionally occupied by expert systems (Pople, 1985a). INTERNIST was developed and refined over the course of a decade through interviews with Jack Myers, MD, and became one of the largest and broadest expert systems ever created. Before completion, the system contained information on more than 3550 symptoms (Miller et al., 1982) and could diagnose more than 750 diseases (Pople, 1985b). INTERNIST's inference engine employed a ranking program to perform diagnosis. To make the domain size manageable, it also used heuristically guided partitioning rules. By breaking problems down into smaller subsets, the system was able to better handle the broad domain of internal medicine. CADUCEUS, an eventual successor to INTERNIST, implemented a problem decomposition method in an attempt to better handle multiple disorders. Regrettably, CADUCEUS suffered from other limitations due to its requirement for prior knowledge of domain structure. Pople (1985b) presents a thorough account of the progression of the INTERNIST system from its origins through the development of CADUCEUS. Other references of note include Pople (1977) and Miller et al. (1982). For an excellent example of INTERNIST's interface and interaction with the user, see Pople (1985a).

Expert Systems in Toxicology

There exist surprisingly few knowledge-based systems in the field of clinical toxicology. In fact, according to Darmoni, in 1995 "Toxline and Toxlit [showed] that less than ten computer-aided decision support systems [had] been developed in clinical toxicology" (p. 234). Of these systems, two in particular stand out from the rest: a French system called SETH and a Bulgarian system called MEDICOTOX-CONSILIUM. In recent years, two more systems of interest have been developed: the Inreca system for use in Russia and a Polish veterinary system. A summary of each of these four systems is given below.

SETH was developed in France by Darmoni et al. (1994, 1995) for use in the Rouen University Hospital. The system uses 70 signs and symptoms for diagnosis and contains over 1000 drugs from over 75 toxicological classes (Darmoni, 1994). SETH was implemented on a commercial off-the-shelf, object-oriented expert system shell called KBMS. Its inference engine is a rule-based, forward-chaining system that utilizes the Rete algorithm for pattern matching. SETH also makes use of set theory for diagnosing cases involving multiple drugs. In 1992, the system began experimental use at the Rouen University Hospital, where it was used in the diagnosis of over 2000 drug intoxication cases (Darmoni, 1995). Although its creators caution that the system was not designed for use by experts, the ratings given by residents at the hospital indicate that they were pleased with the system. For more information on SETH, refer to Darmoni et al. (1994, 1995).

MEDICOTOX-CONSILIUM was developed for use in the hospitals of Bulgaria as a diagnostic system for first-aid clinical toxicology. It was first implemented in 1988 at a single hospital and eventually distributed to 11 more hospitals around the country. The system is described as a classical system that uses frame structures, rules, and scores provided by experts for diagnosis.
Within the frame structure, poisons are divided into 10 classes with 310 groups
containing a total of 2500 different kinds of poisons (Monov et al., 1992). The system contains 1000 rules and facts that use 47 syndrome and 134 symptom definitions to identify poisons and supply the user with information about the appropriate cure from any of 86 treatments and 55 antidotes (Monov et al., 1992). MEDICOTOX-CONSILIUM is focused on user interaction and, rather than simply producing a diagnosis, seeks to leave the final decision to the user. It also offers three different modes to maximize its utility in different circumstances. The first mode is the clinical orientation mode, which is useful for diagnosing urgent cases where immediate action is required. The second mode is the diagnostic research mode, which can be used to carefully reason through less urgent cases. The final mode is the expert-reference mode, which enables the user to look up information on any of the drugs and toxins contained within the system. For more information on MEDICOTOX-CONSILIUM, refer to Monov et al. (1992).

Another system for toxicology was developed by Althoff et al. (1998) for use by the Russian Toxicology Information and Advisory Center in Moscow. The system is based on previous research called the Inreca (Induction and Reasoning from Cases) European project. The Inreca approach is a case-based reasoning system designed to use historical cases to diagnose disorders. This particular system utilizes the database of the Toxicology Information and Advisory Center of the Russian Federation Ministry of Health and Medical Industry to supply the cases for diagnosing poison exposures. The location of Russia was explicitly chosen for its abundance of data because "every year Russia has more intoxication cases than any other country in Europe" (Althoff et al., 1998, p. 27). A distinctive aspect of this case-based reasoning system is that, rather than interpreting the cases at run time, it compiles the data into an Inreca-Tree in advance to improve performance.
The Inreca-Tree is basically a specialized decision tree that includes a branch at every decision node to account for the possibility of
unknown measurements. For more information about the Inreca system for diagnosing poison cases, refer to Althoff et al. (1998).

The final toxicology system of interest is being developed at Warsaw Agriculture University in Poland by Kluza (2004) for the area of veterinary medicine. The system utilizes case-based reasoning for the purpose of offering remote consultations to veterinarians working in the field. As presented by Kluza in 2004, the project is still in the launching phase; however, since it is being designed for veterinary medicine, the system faces some unique challenges of interest. First, being designed for animals, Kluza's system must not only include gender and age in its diagnosis, but must also account for differences between various species and breeds. Second, because animals cannot verbally communicate with veterinarians, every diagnosis must be performed without certain prior knowledge that is often available in toxicology cases involving humans. For more information, see Kluza (2004).

The four systems presented in this section were selected to give the reader a general understanding of the techniques that have been used in clinical toxicology. SETH and MEDICOTOX-CONSILIUM are two of the most prominent systems in the field. The Inreca approach is an excellent example of the simplicity and robustness that are necessary for advancement in the fields of knowledge-based systems and data mining. Finally, the Polish veterinary system represents a current topic of research in the field and poses some unique challenges for consideration. Very few knowledge-based systems exist for the field of clinical toxicology. For a fairly exhaustive list of the systems being used throughout the field, consult Darmoni et al. (1994).

Knowledge-Based Systems for the Diagnosis of Multiple Disorders

The previous section discussed expert systems in the field of clinical toxicology.
Although most of these systems must, by necessity, address the challenge of diagnosing multiple
disorders to some degree, it was not the primary thrust of the research. This section presents an overview of the research that has been done explicitly for the purpose of diagnosing multiple disorders. As in the field of clinical toxicology, relatively little knowledge-based system research has been performed in the area of multiple disorders. This section presents research on four major types of systems used in multiple disorder diagnosis: Bayesian approaches, case-based reasoning, set covering, and diagnostic scores. Note that, although we divide these systems into four types for the sake of discussion, many systems may contain aspects of multiple approaches.

To begin, let us discuss systems that use Bayesian approaches. Bayes' rule is a probability-based equation that can be used to identify the most likely disorder. The problem is that classical Bayesian diagnosis assumes that the symptoms used in diagnosis are conditionally independent given the disorder. Much research has focused on the generalization of Bayes' rule to account for dependencies within a domain. Research by Ben-Bassat et al. (1983) presents a Bayesian pattern recognition algorithm used in the MEDAS emergency diagnosis system to overcome the limitations of Bayes' rule. Ben-Bassat et al. (1980) also discuss some of the other early approaches for handling "a violation of the conditional independence assumption in classical Bayesian diagnosis models" (p. 153). One of the most noteworthy accomplishments of Bayesian research was the construction of Bayesian belief networks, discussed in Chapter 3. One system, created by van der Gaag and Wessels (1994), uses belief networks to diagnose multiple disorders. The distinctive feature of the system is that it utilizes a clustering algorithm to strategically focus on small sets within the domain as a method of improving efficiency. In more recent years, another system called HEPAR II was developed by Onisko et al. (2000).
HEPAR II uses belief networks to diagnose multiple disorders in the field of hepatology.

Onisko et al. (2001) further developed the system by creating a method for building belief networks from a small data set. To accomplish this, they implemented what they refer to as Noisy-OR gates to increase the accuracy of the system.

A second approach to diagnosing multiple disorders is case-based reasoning. The advantage of case-based reasoning is that systems can essentially create themselves from historical cases, unlike most complex models, including Bayesian networks, which generally require knowledge acquisition from experts (Atzmueller et al., 2004b). It is important to note that case-based reasoning is an approach to system development rather than a method for reconciling uncertainty and probabilistic dependencies. For this reason, many case-based reasoning systems make use of other methods. The ADAPtER system combines case-based reasoning with abductive model-based reasoning to diagnose multiple car engine faults (Portinale & Torasso, 1995). SONOCONSULT makes use of inductive methods to augment its case-based reasoning and recognize multiple disorders in the field of sonography (Baumeister et al., 2002). Atzmueller et al. (2003a) continued the research on SONOCONSULT by exploring the use of decomposition methods within a case-based system. Finally, Atzmueller et al. (2004b) present three approaches to case-based reasoning for the diagnosis of multiple disorders: compositional case adaptation, where a group of cases rather than a single case is recalled for diagnosis; the partition class approach, where domains are divided into independent subsets for diagnosis; and set covering, which is discussed in Chapter 3.

Set covering is a method that seeks to find combinations of disorders that can account for observed symptoms. The simplicity and elegance of the approach make it one of the most promising areas of research relating to multiple disorder diagnosis.
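
To make the idea concrete, the Python sketch below enumerates candidate sets of disorders in order of increasing size and returns every smallest set whose combined manifestations cover all observed symptoms. The knowledge base is a hypothetical toy, and minimum cardinality is only one possible parsimony criterion; the sketch is not the algorithm of any system cited in this section, and its exhaustive search grows exponentially with the number of disorders.

```python
from itertools import combinations

# Hypothetical toy knowledge base: disorder -> manifestations it can cause.
CAUSES = {
    "d1": {"m1", "m2"},
    "d2": {"m2", "m3"},
    "d3": {"m3", "m4"},
    "d4": {"m1", "m4"},
}


def minimal_covers(observed: set[str],
                   causes: dict[str, set[str]]) -> list[set[str]]:
    """Return every smallest set of disorders that explains all of `observed`.

    Candidate sets are tried in order of increasing size, so the first size
    that yields any cover is the minimum-cardinality (most parsimonious) size.
    """
    if not observed:
        return [set()]  # nothing to explain
    disorders = sorted(causes)
    for size in range(1, len(disorders) + 1):
        covers = [
            set(combo)
            for combo in combinations(disorders, size)
            if observed <= set().union(*(causes[d] for d in combo))
        ]
        if covers:
            return covers
    return []  # some manifestation cannot be explained by any disorder


covers = minimal_covers({"m1", "m2", "m3", "m4"}, CAUSES)
print([sorted(c) for c in covers])
```

Observing all four manifestations yields two competing two-disorder explanations, {d1, d3} and {d2, d4}, which illustrates why a set covering approach returns a set of hypotheses rather than a single diagnosis.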

In the 1980s, Reggia and Peng published a large amount of foundational research on set covering. The paper by Reggia et
al. (1983) is one of the clearest and most referenced papers in the area of set covering for multiple disorders. The system they propose, however, "only holds for the extreme case that might be called complete decomposability" (Wu, 1991, p. 240). In later research, Peng and Reggia (1986, 1987) expand on what they refer to as parsimonious covering theory by adding Bayesian calculations to allow for multimembership classification. Parsimonious covering theory is essentially set covering where the simplest solution is considered the best solution. In their 1989 paper, which presents further enhancements to the system, Peng and Reggia implement the use of comfort measures. The purpose of comfort measures is to ensure that the system maintains a certain level of quality in the solutions it returns to the user. In the early 1990s, Wu extended the field of set covering by developing algorithms to increase efficiency. Wu's research primarily centers on decomposing a problem into smaller sub-problems using a clustering algorithm (Wu, 1990, 1991). In other research, genetic algorithms were applied to generate a set covering system for multi-disorder diagnosis (Vinterbo & Ohno-Machado, 2000), and simple systems were given the ability to incrementally refine themselves, adding complexity as more samples become available (Baumeister et al., 2001).

The final approach to be discussed is the use of diagnostic scores for the diagnosis of multiple disorders. In particular, this discussion details the research performed by Atzmueller et al. (2003b, 2004a), as it bears the most resemblance to the system presented in Chapters 5 and 6. Atzmueller et al. (2003b) have implemented a case-based system for the diagnosis of multiple disorders in the field of sonography.
The system is semi-automatic, meaning that the system generates its rules automatically but still requires an expert to oversee its development and adjust parameters as necessary to ensure the system functions properly. Atzmueller et al. (2003b) believe that "understandability and interpretability . . . is of prime importance," and so their system
attempts to apply "the same representation the human expert favors" by using diagnostic scores (p. 23). Diagnostic scores are a simple approach for risk analysis used in the medical field, discussed earlier in this chapter. Using a case base, the system creates scoring rules, $r$, of the form:

$$r = f \xrightarrow{s} d \tag{4-11}$$

where $f$ represents a finding, such as the observation of a sign or symptom, $d$ represents the diagnosis related to that finding, and $s$ represents a qualitative measure of uncertainty with $s \in \{S_{-3}, S_{-2}, S_{-1}, 0, S_1, S_2, S_3\}$. Scores of $s \in \{S_1, S_2, S_3\}$ represent a positive correlation, where $S_3$ strongly supports diagnosis $d$ and $S_1$ weakly supports diagnosis $d$. Likewise, scores of $s \in \{S_{-1}, S_{-2}, S_{-3}\}$ represent a negative correlation, where $S_{-3}$ strongly opposes diagnosis $d$ and $S_{-1}$ weakly opposes diagnosis $d$. When $s = 0$, no significant correlation is found and the rule is later pruned from the rule set. As defined by Atzmueller et al. (2003b), four scores from the same category yield the next higher score, such that:

$$S_1 + S_1 + S_1 + S_1 = S_2 \tag{4-12}$$

$$S_2 + S_2 + S_2 + S_2 = S_3 \tag{4-13}$$

$$S_{-1} + S_{-1} + S_{-1} + S_{-1} = S_{-2} \tag{4-14}$$

$$S_{-2} + S_{-2} + S_{-2} + S_{-2} = S_{-3} \tag{4-15}$$

Also, any two scores of equal and opposite magnitude cancel, such that:

$$S_1 + S_{-1} = 0 \tag{4-16}$$

$$S_2 + S_{-2} = 0 \tag{4-17}$$

$$S_3 + S_{-3} = 0 \tag{4-18}$$

A diagnosis $d$ is considered probable if the aggregate score is equal to or greater than $S_3$. Note that substituting a score of 1 for $S_1$, 4 for $S_2$, and 16 for $S_3$ makes the system presented here comparable to the diagnostic scoring system presented earlier in the chapter, where a final diagnostic score of 16 represents the cutoff point for a diagnosis being considered highly likely.
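
Under the substitution just described (S1 = 1, S2 = 4, S3 = 16, with negative scores mirrored), aggregating scores reduces to integer addition. The following Python sketch applies a set of hypothetical scoring rules to a set of observed findings and reports the diagnoses whose totals reach S3. It illustrates Equations 4-12 through 4-18 under that numeric substitution and is not Atzmueller et al.'s implementation; the rule and finding names are invented for the example.

```python
from collections import defaultdict

# Numeric stand-ins for the symbolic scores, following the substitution in the
# text (S1 = 1, S2 = 4, S3 = 16); negative scores mirror the positive ones.
# Rules scored 0 are pruned, so 0 never appears here.
SCORE_VALUE = {"S-3": -16, "S-2": -4, "S-1": -1, "S1": 1, "S2": 4, "S3": 16}
PROBABLE_THRESHOLD = SCORE_VALUE["S3"]

# Hypothetical scoring rules r = f -(s)-> d, stored as (finding, diagnosis, s).
RULES = [
    ("f1", "d_a", "S2"),
    ("f2", "d_a", "S2"),
    ("f3", "d_a", "S2"),
    ("f4", "d_a", "S2"),
    ("f1", "d_b", "S3"),
    ("f5", "d_b", "S-3"),
    ("f2", "d_c", "S1"),
]


def aggregate_scores(findings: set[str],
                     rules: list[tuple[str, str, str]]) -> dict[str, int]:
    """Sum the numeric score of every rule whose finding was observed."""
    totals: dict[str, int] = defaultdict(int)
    for finding, diagnosis, score in rules:
        if finding in findings:
            totals[diagnosis] += SCORE_VALUE[score]
    return dict(totals)


def probable_diagnoses(findings: set[str],
                       rules: list[tuple[str, str, str]]) -> list[str]:
    """Diagnoses whose aggregate score is equal to or greater than S3."""
    totals = aggregate_scores(findings, rules)
    return sorted(d for d, total in totals.items() if total >= PROBABLE_THRESHOLD)


# Four S2 findings sum to S3 for d_a (Equation 4-13); f1 alone gives d_b an S3.
print(probable_diagnoses({"f1", "f2", "f3", "f4"}, RULES))
# S3 and S-3 cancel for d_b (Equation 4-18), leaving it at 0.
print(aggregate_scores({"f1", "f5"}, RULES))
```

With findings f1 through f4, both d_a and d_b reach the S3 threshold while d_c stays at S1; adding the opposing finding f5 instead cancels d_b's support entirely. Using integers keeps the equivalences of Equations 4-12 through 4-18 exact.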

To determine the score value that a rule should receive, Atzmueller et al. (2003b) use a quasi-probabilistic score. The quasi-probabilistic score is calculated by a mathematical equation that combines the statistical dependence of a finding with its precision and specificity. The resulting value ranges from -1.0 to 1.0 and is mapped to a corresponding $s$ value. Atzmueller et al. (2003b) are concerned with the balance between accuracy and complexity. For this reason, their system utilizes diagnostic profiles, $P_d$, defined as:

$$P_d = (F_d, \mathit{freq}_F) \tag{4-19}$$

where $F_d$ represents the findings most frequently associated with a diagnosis $d$ and $\mathit{freq}_F$ contains the frequencies of those findings. The frequencies in the diagnostic profile are used to prune less important rules for system efficiency. Further efficiency is gained through other pruning criteria as well as by partitioning the domain using background knowledge provided by an expert in the field. In later work, Atzmueller et al. (2004a) proposed a quality measure equation as a means to measure and determine the appropriate balance between accuracy and simplicity.

Conclusion

This chapter began by presenting the common mathematical calculations used in the medical field. From there, the discussion moved from historical medical expert systems to systems designed specifically for the field of toxicology. The chapter concluded by discussing four approaches to diagnosing multiple disorders. Considerable emphasis was placed on the system created by Atzmueller et al. (2003b) due to its similarity to the system presented in the following chapters. The next chapter details the development of a system for toxic exposure diagnosis and its performance when diagnosing single exposure cases.

CHAPTER 5
DIAGNOSING SINGLE EXPOSURE CASES

The previous chapter presented diagnostic systems for toxicology as well as modern research towards the diagnosis of multiple disorders. This chapter describes the first stage of exploratory research performed using data from the Florida Poison Information Center (FPIC) to create a system for diagnosing multiple exposure cases. The system presented in this chapter is capable of generating a differential diagnosis for exposures to a single toxin. The chapter begins by describing the source data, continues by discussing system design principles and development, presents the system's operation with respect to the user interface, and concludes with a discussion of system testing, research results, and, finally, system performance.

Source Data

Since 1996, the FPIC has collected data on every call received and made follow-up calls to obtain additional information about cases referred to hospitals. The collected data is stored in a relational database consisting of tables, where each entry in a table is an object with a key that enables relationships to be drawn between tables. In 2004 alone, the FPIC received over 120 thousand calls and made more than 43 thousand follow-up calls related to human exposures (Florida Poison Information Center Network, 2005). The FPIC database also contains over 65 thousand records of multiple exposures. For this research, the FPIC provided access to all the cases recorded in its Jacksonville database from 2002 through 2005. The information supplied contains more than 160 thousand toxic exposure cases, with nearly 14 thousand cases involving multiple toxins. To improve data quality, the database records were cleaned so that only cases with clinical effects that were followed to a known outcome remained.
The cleaned database contained 30,152 single exposure cases and 7,096 multiple exposure cases; however, the system's training only involved single exposure cases for this portion of the research.
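
The cleaning step and the split into single and multiple exposures can be sketched as two list filters. The record layout below (an ExposureCase with substances, clinical_effects, and outcome fields) and the outcome labels are hypothetical simplifications for illustration; they are not the TESS field definitions used by the FPIC database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExposureCase:
    """Hypothetical, simplified case record (not the actual TESS layout)."""
    case_id: int
    substances: list[str]
    clinical_effects: list[str] = field(default_factory=list)
    outcome: Optional[str] = None  # None: the case was never followed up


# Hypothetical outcome codes that count as "not followed to a known outcome".
UNKNOWN_OUTCOMES = {"unknown", "unable to follow"}


def clean_cases(cases: list[ExposureCase]) -> list[ExposureCase]:
    """Keep only cases with clinical effects that reached a known outcome."""
    return [
        c for c in cases
        if c.clinical_effects
        and c.outcome is not None
        and c.outcome not in UNKNOWN_OUTCOMES
    ]


def split_by_exposure_count(
        cases: list[ExposureCase]) -> tuple[list[ExposureCase], list[ExposureCase]]:
    """Separate single-substance exposures from multiple-substance exposures."""
    single = [c for c in cases if len(c.substances) == 1]
    multiple = [c for c in cases if len(c.substances) > 1]
    return single, multiple


cases = [
    ExposureCase(1, ["acetaminophen"], ["vomiting"], "minor"),
    ExposureCase(2, ["ethanol", "diazepam"], ["drowsiness"], "moderate"),
    ExposureCase(3, ["bleach"], [], "none"),           # no clinical effects
    ExposureCase(4, ["ibuprofen"], ["nausea"], None),  # never followed up
]
single, multiple = split_by_exposure_count(clean_cases(cases))
print([c.case_id for c in single], [c.case_id for c in multiple])
```

Cases 3 and 4 are discarded, leaving one single exposure case and one multiple exposure case; in this portion of the research only the former would be used for training.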

The database supplied by the FPIC conforms to the Toxic Exposure Surveillance System (TESS) standard. TESS is the older of two national standards defined by the American Association of Poison Control Centers (AAPCC) to regulate the fields contained within the database of each poison control center (PCC). The newer standard, known as the National Poison Data System (NPDS), was not fully developed at the time of this research. As a result, the system presented here utilizes TESS-standardized data fields. However, using TESS rather than NPDS standards does not affect the system's general design principles because both TESS and NPDS use the same paradigm and record the same set of data. Both are national standards that will enable the system to be expanded to a national level and implemented at various PCCs throughout the country. Both require that the majority of entries in the database have discrete values, which are easy to process with a computer program. Most importantly, both contain the observed signs and symptoms, jointly called clinical effects, and the final diagnosis of patients referred to hospitals for treatment.

Although the FPIC database is a valuable resource, it may contain errors. Patients may lie about the substances they consume, or physicians and nurses may not fully recount all the important details of a case when reporting to the PCC. Fortunately, these errors can be viewed as random errors. As the case base for the system grows, the incorrect information should become negligible when contributing to system calculations.

System Design Principles

From the outset, a major objective of this research was to bypass the knowledge acquisition bottleneck by generating a knowledge-based system capable of producing meaningful and useful results without the need for an active, overseeing expert.
In developing this system to diagnose unknown exposures, certain guiding principles were followed to produce the desired system characteristics, which include simplicity, understandability, automatic system generation,
and incremental updates. Each of these characteristics is discussed briefly in the following paragraphs.

The characteristic of simplicity is of the utmost importance. Holsheimer et al. (1995) have shown that success in extracting information from databases does not require complex algorithms. In fact, "simpler, even trivial, processes are better than complicated ones if they are enough for the job of discovery" (Valdes-Perez, 1999, p. 336). Simplicity inherently gives systems several advantages. Generally, systems with simple representations and algorithms are more efficient and require less processing power. Simple, linear calculations grant the system scalability, which is extremely important given the size and continual growth of the FPIC database. Systems designed with simpler architectures are often more portable to other systems. Portability is desirable not only for aiding other PCCs around the country, but also so that the system approach can be used to solve diagnostic problems in other domains. Finally, simplicity of design gives the system inherent understandability. Not only should the system and its processes be easier for other knowledge engineers to comprehend and implement, but the solutions yielded by the system should be explained in terminology that physicians will understand.

The understandability of system results was another chief concern during development. If physicians understand the method by which the system obtains its answers, they are more likely to trust the system and use it within the spectrum of its intended purpose. According to Atzmueller et al. (2003b), "understandability and interpretability of learned models is of prime importance and ideally, the learning method constructs knowledge in the same representation the human expert favors" (p. 23). For this reason, the final system design makes use of likelihood ratios. Likelihood ratios are commonly used throughout the medical field and are
discussed in Chapter 4 along with other medical mathematics. After processing, the system presents its results to the user as a differential diagnosis. A differential diagnosis is a list of various disorders that can produce similar clinical effects. It is used to determine the most likely cause of a disorder and is a method commonly practiced in the medical field. By using these familiar approaches, physicians should find the system to be relevant, understandable, and easy to operate. Furthermore, the methods used in the system's mathematics are similar to medical case studies seeking to identify patterns of clinical syndromes. It is believed that this will help the system gain acceptance in the medical field.

Automatic system generation is another desirable trait. Atzmueller et al. (2003b) state that "pure automatic learning methods are usually not good enough to reach a quality comparable to manually built knowledge bases" (p. 23). In spite of this deficiency, automatic methods offer certain advantages that should not be overlooked. Automatically trained systems fully bypass the knowledge acquisition bottleneck of obtaining information from an expert. An expert's time is valuable, and the more processing a system can do without expert input, the more rapidly it can be developed and implemented. Additionally, automatically generated system designs can be broadly applicable, which makes the system significantly more portable than one containing expert input that leads to specialization within a given field. The system presented in this chapter was generated by an engineer with no expertise in the area of toxicology and no guidance from toxicologists regarding specific diagnostic approaches. Bypassing the knowledge acquisition bottleneck and using a generally applicable medical solution increases the value of the system as a whole.

The final desired attribute of the system is the ability to perform incremental updates.
As the FPIC database grows in size, more valuable information will become available for aiding in
diagnosis. Although the system could recompile all the data from 1996 to the present with every update, such an operation would be inefficient and could require significant processing time. Rather than beginning anew with each update, the system can maintain key information about current values and incorporate the information from the latest cases into its calculations. Currently, incremental updates have not been implemented because the system is not directly linked to the central database. However, the use of likelihood ratios makes the implementation of incremental updates a straightforward procedure. To calculate likelihood ratios, a count of true positives, true negatives, false positives, and false negatives must be determined for each clinical effect. By saving a table of these four values for each pairing of a clinical effect with its corresponding substance, the likelihood ratio can be recalculated at any time. Updating the system then becomes a simple matter of querying the new data for a count of each of the four values, adding the results to the old table, and recalculating the likelihood ratios. Graefe et al. (1998) present examples of other information that can be used in incremental updates. Han and Kamber (2001) also briefly discuss incremental and parallel data mining for combining gathered information.

System Development

The goal of the research presented in this chapter is to create a system that aids in the diagnosis of exposures to a single unknown toxin by applying data mining and knowledge engineering techniques to a database obtained from the FPIC. The system must receive a physician's input in the form of signs and symptoms observed in a patient, process the data, and return a list of the substances that are most likely to induce these clinical effects. The aim of the system is not to produce infallible results for every case. Rather, the system attempts to give the physician easy access to a refined and organized version of the knowledge stored in the FPIC's vast database.
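The count-table scheme for incremental updates described above can be sketched in Python. This is a minimal illustration under assumptions, not the system's implementation: each single-exposure case is a hypothetical (substance, set of clinical effects) tuple, the function names are made up for this example, and the substance and clinical-effect vocabularies are assumed not to change between updates. Merging the saved counts with counts from the newest cases gives the same table as recounting every case.

```python
from collections import Counter

OUTCOMES = ("TP", "FN", "FP", "TN")


def count_outcomes(cases, substances, effects):
    """Tally TP, FN, FP, and TN for every (substance, clinical effect) pair.

    Each case is a hypothetical (substance, set_of_effects) tuple describing a
    single-substance exposure; this is not the FPIC/TESS record format.
    """
    table = {(s, e): Counter() for s in substances for e in effects}
    for involved, observed in cases:
        for s in substances:
            for e in effects:
                if s == involved:
                    outcome = "TP" if e in observed else "FN"
                else:
                    outcome = "FP" if e in observed else "TN"
                table[(s, e)][outcome] += 1
    return table


def merge_counts(old_table, new_table):
    """Incremental update: add the counts from the newest cases to the saved table."""
    merged = {key: Counter(counts) for key, counts in old_table.items()}
    for key, counts in new_table.items():
        merged.setdefault(key, Counter()).update(counts)
    return merged


def positive_lr(counts):
    """LR+ = [TP / (TP + FN)] / [FP / (FP + TN)], as in Equation 5-2."""
    sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
    false_positive_rate = counts["FP"] / (counts["FP"] + counts["TN"])
    return sensitivity / false_positive_rate


if __name__ == "__main__":
    substances = ["acetaminophen", "ibuprofen"]
    effects = ["vomiting", "drowsiness"]
    saved_cases = [("acetaminophen", {"vomiting"}),
                   ("ibuprofen", {"drowsiness"}),
                   ("ibuprofen", {"vomiting", "drowsiness"})]
    new_cases = [("acetaminophen", {"vomiting", "drowsiness"}),
                 ("ibuprofen", set())]

    saved = count_outcomes(saved_cases, substances, effects)
    updated = merge_counts(saved, count_outcomes(new_cases, substances, effects))
    # Only the new cases were counted; the LR+ is recomputed from the merged table.
    print(positive_lr(updated[("acetaminophen", "vomiting")]))
```

Only the four integers per pair need to be stored, which is why the update cost depends on the size of the new batch rather than on the whole database.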
The system offers direction by presenting a differential diagnosis of drugs and other toxic
substances that should be considered. Ultimately, the physician makes the final decision regarding the treatment the patient should receive.

In generating the system, data mining techniques are used to clean the records and extract the appropriate information from the FPIC database. First, informational calls are removed so that only exposure cases remain. Then, the exposure cases are filtered so that only cases with clinical effects that were followed to a known outcome remain. Although this reduces the size of the dataset to 30,152 single exposure cases and 7,096 multiple exposure cases, the filtering process ensures that only significant, representative cases with the best documentation are used to train the system.

Each exposure case has clinical effects associated with it. The clinical effects observed in a patient are rated as either "related," "unknown if related," or "not related" to the substance involved in the exposure. For the purposes of system training, clinical effects that are not related are removed from the database, while those that are unknown if related are used for training in the same way as related clinical effects.

After extracting and cleaning the cases in the database, a table of prior probabilities, also known as pre-test probabilities, is calculated for each toxin. A prior probability represents the likelihood of a particular substance being involved given that a toxic exposure has occurred. Prior probability, P, is calculated as:

$$P = \frac{\text{Cases}}{\text{Total}} \qquad (5\text{-}1)$$

where Cases is the number of cases involving a particular substance and Total is the total number of exposure cases in the database.

In addition to prior probabilities, a table of likelihood ratios is calculated. When calculating likelihood ratios, the system treats each clinical effect as a diagnostic test that is useful in detecting the presence of a toxic substance. A likelihood ratio compares how probable an observed clinical effect is in exposures to a particular toxin with how probable the same clinical effect is in exposures to any other toxin. The likelihood ratio, LR+, is calculated as:

$$LR^{+} = \frac{TP/(TP+FN)}{FP/(FP+TN)} \qquad (5\text{-}2)$$

where TP represents true positives, TN represents true negatives, FP represents false positives, and FN represents false negatives. An exhaustive table of likelihood ratios relating every individual clinical effect to every possible substance exposure is the primary resource utilized by the system in creating a differential diagnosis.

An advantage of likelihood ratios over many other medical measurements (e.g., sensitivity, specificity, and positive and negative predictive values) is that likelihood ratios can be easily combined through multiplication, an operation that implicitly treats the clinical effects as independent tests. Additionally, by including the prior probability, likelihood ratios can account for disorder prevalence. Furthermore, likelihood ratios are easily calculated and characterize many cases with a single number, making the system scalable to large databases and ensuring a rapid response time.

Although the likelihood ratio has its advantages, it inherently contains the drawbacks of every mathematical ratio: the possibility of evaluating to zero or causing a divide-by-zero error. A likelihood ratio of zero only occurs when TP = 0, and this may not seem like a problem until we understand how the system calculates combined likelihood ratios. Every clinical effect is treated as a test for detecting the presence of a toxic substance. If there are multiple clinical effects, their likelihood ratios are multiplied together to obtain a combined likelihood ratio. If any of the clinical effects has a likelihood ratio of zero, then the substance's combined likelihood ratio also evaluates to zero, regardless of the evidence presented by other clinical effects.
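The training steps described so far (case filtering, Equation 5-1, Equation 5-2, and multiplication of per-effect ratios) can be sketched as follows. The record fields (call_type, known_outcome, substance, effects), the rating strings, and all counts are hypothetical stand-ins rather than the FPIC schema or data. The final call reproduces the zero-product problem: a single clinical effect with TP = 0 drives the combined ratio to zero.

```python
from math import prod

# Relatedness ratings kept for training; "not related" effects are dropped.
KEPT_RATINGS = {"related", "unknown if related"}


def clean_cases(records):
    """Keep exposure calls followed to a known outcome; drop unrelated effects.

    Field names are hypothetical. Cases left with no clinical effects are
    skipped, which is an assumption of this sketch.
    """
    cleaned = []
    for r in records:
        if r["call_type"] != "exposure" or not r["known_outcome"]:
            continue
        effects = {e for e, rating in r["effects"].items() if rating in KEPT_RATINGS}
        if effects:
            cleaned.append((r["substance"], effects))
    return cleaned


def prior_probabilities(cases):
    """Equation 5-1: P = Cases / Total for every substance."""
    total = len(cases)
    counts = {}
    for substance, _ in cases:
        counts[substance] = counts.get(substance, 0) + 1
    return {s: n / total for s, n in counts.items()}


def positive_lr(tp, fn, fp, tn):
    """Equation 5-2: LR+ = [TP / (TP + FN)] / [FP / (FP + TN)]."""
    return (tp / (tp + fn)) / (fp / (fp + tn))


def combined_lr(per_effect_counts):
    """Multiply the LR+ of each observed clinical effect."""
    return prod(positive_lr(*counts) for counts in per_effect_counts)


if __name__ == "__main__":
    records = [
        {"call_type": "exposure", "known_outcome": True, "substance": "A",
         "effects": {"vomiting": "related", "rash": "not related"}},
        {"call_type": "information", "known_outcome": False, "substance": "B",
         "effects": {}},
        {"call_type": "exposure", "known_outcome": True, "substance": "B",
         "effects": {"drowsiness": "unknown if related"}},
    ]
    cases = clean_cases(records)
    print(cases)                       # [('A', {'vomiting'}), ('B', {'drowsiness'})]
    print(prior_probabilities(cases))  # {'A': 0.5, 'B': 0.5}
    # Hypothetical (TP, FN, FP, TN) counts for the observed clinical effects.
    print(combined_lr([(8, 2, 10, 80), (5, 5, 5, 85)]))                   # about 64.8
    print(combined_lr([(8, 2, 10, 80), (5, 5, 5, 85), (0, 10, 20, 70)]))  # 0.0
```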
The problem is that the absence of cases associating a substance with a clinical effect does not mean that the substance absolutely cannot cause that clinical effect. Furthermore, even if the substance truly
cannot cause the clinical effect, patients may have unassociated clinical effects caused by other ailments.

The divide-by-zero error is an obvious problem for any computer system. Looking at Equation 5-2, we can see that if TP + FN = 0 or FP = 0, the calculation fails. (Note that although FP + TN = 0 also causes an error, addressing FP = 0 prevents that error from occurring as well.) The sum of true positives and false negatives (TP + FN) is the total number of cases where a particular substance is involved. Due to the structure of the database and its queries, a substance with no recorded cases in the database is ignored and not included as a valid diagnosis in the system. As a result, TP + FN never equals zero. The second divide-by-zero error, FP = 0, occurs whenever a clinical effect appears in the database only in association with one particular substance. As calculated, the likelihood ratio concludes that since no other substance causes the clinical effect, that substance absolutely must be the cause, so it divides by zero to obtain an infinite likelihood. In reality, however, no single substance is the only possible cause for any clinical effect in the system. The problem is a lack of sufficient data. The divide-by-zero error was encountered during development because the database contains only one instance of fetal death. Although fetal death can be caused by any number of substances, the system attempted to conclude that only acetaminophen could cause the death of a fetus.

The preliminary system used a simple-minded approach to solving the multiplication-by-zero and divide-by-zero problems. Multiplication by zero was handled by replacing all zero-valued likelihood ratios with a value of one. Although this prevents the system from gaining any knowledge about a substance from a clinical effect not associated with the substance, it prevents that clinical effect from destroying the knowledge gained from other clinical effects.
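A sketch of the preliminary, non-adjusted rule, assuming the same four counts as Equation 5-2 (the function name and counts are hypothetical). A zero ratio is replaced by 1.0 so the clinical effect becomes neutral. Because FP = 0 was handled by editing records by hand rather than by a formula, the sketch raises an error for that case instead of inventing an automatic fix.

```python
def non_adjusted_lr(tp, fn, fp, tn):
    """Preliminary LR+ (Equation 5-2) with zero-valued ratios replaced by 1.0.

    FP == 0 would divide by zero. Such records were removed manually, so this
    hypothetical helper raises an error instead of guessing a value.
    """
    if tp + fn == 0:
        raise ValueError("substance has no cases; it is not a valid diagnosis")
    if fp == 0:
        raise ValueError("FP = 0 gives an infinite LR+; record needs manual review")
    lr = (tp / (tp + fn)) / (fp / (fp + tn))
    return 1.0 if lr == 0.0 else lr


if __name__ == "__main__":
    # A clinical effect never seen with the substance is neutral (1.0)
    # instead of zeroing the combined product.
    print(non_adjusted_lr(0, 10, 20, 70))  # 1.0
    try:
        # A lone, unique record (like the single fetal-death case) has FP = 0.
        non_adjusted_lr(1, 999, 0, 29000)
    except ValueError as err:
        print(err)
```

The neutral value of 1.0 discards the negative evidence entirely, which is the first drawback raised against this approach.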
The divide-by-zero error was solved by examining the data set and manually
modifying the offending clinical effect records. The likelihood ratios calculated using this preliminary method are referred to as non-adjusted likelihood ratios from this point forward.

Although using non-adjusted likelihood ratios expedited the research process, it introduced significant drawbacks. First, replacing likelihood ratios of zero with the value of one ignores the information that could be gained from the calculation. Likelihood ratios can be fractional, indicating a negative correlation to a substance, and multiplying by zero indicates an infinitely negative correlation. Rather than throwing the negative association out completely, the zero value might be tempered by using some fractional likelihood ratio. Second, manually removing problematic cases from the database violates the important design principle of automatic system generation. To solve these problems, a generalized equation was developed to replace the likelihood ratio:

$$LR_{Adj} = \frac{(TP+\varepsilon)/(TP+FN+2\varepsilon)}{(FP+\varepsilon)/(FP+TN+2\varepsilon)} \qquad (5\text{-}3)$$

where TP represents true positives, TN represents true negatives, FP represents false positives, FN represents false negatives, and ε is a small, positive constant. As discussed in Chapter 4, TP, TN, FP, and FN represent the four possible outcomes of a diagnostic test. By adding ε to each outcome, the equation states that any of these outcomes is a possibility, even if no supporting cases exist in the database. The end result is a stable equation that closely approximates the likelihood ratio, avoids the difficulties of multiplying by zero, prevents the divide-by-zero error, and converges to the same value as the likelihood ratio as the number of cases increases. A variety of ε values were calculated and compared, including 1.0, 0.1, 0.01, and 0.001.
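Equation 5-3 translates directly into code. The sketch below (hypothetical function names, made-up counts) checks the three properties claimed for the adjusted ratio: it stays finite when FP = 0, stays small but nonzero when TP = 0, and approaches the ordinary LR+ as the counts grow. The loop runs over the ε values that were compared.

```python
def adjusted_lr(tp, fn, fp, tn, eps=0.01):
    """Equation 5-3: add eps to each of TP, FN, FP, and TN before forming LR+."""
    sensitivity = (tp + eps) / (tp + fn + 2 * eps)
    false_positive_rate = (fp + eps) / (fp + tn + 2 * eps)
    return sensitivity / false_positive_rate


def positive_lr(tp, fn, fp, tn):
    """Unadjusted LR+ from Equation 5-2, for comparison."""
    return (tp / (tp + fn)) / (fp / (fp + tn))


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01, 0.001):
        unique_effect = adjusted_lr(1, 999, 0, 29000, eps)  # FP = 0: finite
        unseen_effect = adjusted_lr(0, 10, 20, 70, eps)     # TP = 0: small, nonzero
        print(eps, round(unique_effect, 1), round(unseen_effect, 5))
    # With many supporting cases the adjustment barely changes the ratio.
    print(positive_lr(800, 200, 1000, 8000), adjusted_lr(800, 200, 1000, 8000))
```

Smaller ε values let a single unique record produce a larger ratio, so ε trades robustness against fidelity to the unadjusted ratio.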

Ultimately, an ε of 0.01 was selected, as it appeared to improve diagnosis significantly while still yielding a suitable substitute for the likelihood ratio. Equation 5-3 with ε = 0.01 is referred to as the adjusted likelihood ratio from this point forward.

The system described is a hybrid system containing elements of data mining, case-based reasoning, rule-based systems, and uncertainty management. Data mining techniques are used to clean and extract relevant information from the database. Case-based reasoning methodology makes use of the example cases obtained by data mining to develop a system that runs on composite observations. The system calculations are essentially a set of simple rules running in parallel, with likelihood ratios implemented to handle uncertainty. From the results of these rules, a ranked list is generated to indicate the most likely substances that account for the given signs and symptoms. Moreover, uncertainty management is employed through the use of adjusted likelihood ratios to make system calculations robust in the face of database anomalies.

System Operation and User Interface

As discussed in the previous section, the system utilizes two tables of calculations to create a differential diagnosis. The first table contains the prior probabilities for every substance. The second table consists of likelihood ratios relating every individual clinical effect to every possible substance exposure. When supplied with a set of clinical effects, the system calculates a combined likelihood ratio, including prior probability, for every potential single exposure diagnosis. The results are then sorted and presented as a differential diagnosis to the user.

The user interface reveals more about the functionality of the system (Figure 5-1). Clinical effects are grouped into nine categories defined by TESS: cardiovascular, dermal, gastrointestinal, heme/hepatic, neurological, ocular, renal/GU, respiratory, and miscellaneous.
Each group of clinical effects can be viewed by selecting the appropriate tab from the top of the
interface. In the figure, the gastrointestinal disorders tab is selected to show the various TESS-defined clinical effects associated with this category. Three clinical effects are selected: abdominal pain, dehydration, and diarrhea. More clinical effects may be selected from other category tabs as well.

Figure 5-1. User interface

The controls for various system parameters are on the right-hand side of the user interface. The "Calculate By" selection box enables the user to select substance, major and minor categories, or major category. Thus far, we have discussed the research only in terms of diagnosing exposures to a single toxic substance; however, each substance belongs to a minor category, which in turn belongs to a major category. In the same manner that the system is trained to diagnose individual substances, it can be trained to diagnose based on major and minor categories or even solely on major category. Giving physicians a general idea of the drug categories they should consider may prove every bit as valuable as attempting to directly diagnose a substance.
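Because each substance maps to a minor and a major category, training at a coarser "Calculate By" level can be viewed as relabeling each case and then running the same counting. A minimal sketch under that reading; the hierarchy, the level names, and the category names are hypothetical examples, not TESS codes.

```python
def relabel(cases, hierarchy, level):
    """Relabel (substance, effects) cases for one of the three diagnosis levels.

    hierarchy maps substance -> (major_category, minor_category) and is a
    hypothetical stand-in for the TESS substance categories.
    """
    relabeled = []
    for substance, effects in cases:
        major, minor = hierarchy[substance]
        if level == "substance":
            label = substance
        elif level == "major and minor":
            label = (major, minor)
        elif level == "major":
            label = major
        else:
            raise ValueError(f"unknown diagnosis level: {level}")
        relabeled.append((label, effects))
    return relabeled


if __name__ == "__main__":
    hierarchy = {
        "ibuprofen": ("analgesics", "NSAIDs"),
        "naproxen": ("analgesics", "NSAIDs"),
        "acetaminophen": ("analgesics", "acetaminophen products"),
        "diazepam": ("sedatives", "benzodiazepines"),
    }
    cases = [("ibuprofen", {"vomiting"}), ("naproxen", {"abdominal pain"}),
             ("acetaminophen", {"nausea"}), ("diazepam", {"drowsiness"})]
    for level in ("substance", "major and minor", "major"):
        labels = {label for label, _ in relabel(cases, hierarchy, level)}
        print(level, len(labels))  # fewer distinct diagnoses at coarser levels
```

Coarser levels pool the cases of several substances under one label, which is one reason each level ends up with a very different number of trained diagnoses.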

Below the "Calculate By" selection box is a check box where the user can choose to calculate likelihood ratios as non-adjusted or adjusted. As discussed in the previous section, non-adjusted calculations replace all likelihood ratios of zero with a one and require the system designer to manually remove anomalies that might cause divide-by-zero errors. The adjusted likelihood ratio, shown in Equation 5-3, makes a slight modification to the traditional likelihood ratio to create a more robust equation that prevents the system from failing due to multiplication or division by zero. As shown in Figure 5-1, the box is checked so that the system calculates the adjusted likelihood ratio.

Below the "Adjusted" check box are numerical values for Minimum Exposure Cases (MC) and Minimum CE Occurrences (MCE). These numbers serve as data filters that eliminate diagnoses and clinical effects with poorly representative sample sizes. The MC box enables the user to set the minimum required number of cases for a diagnosis. If a diagnosis does not have at least as many cases in the database as the number in the box, the diagnosis does not appear on the results table. The MCE box enables the user to set the minimum number of times a clinical effect (CE) must appear in the database. If a clinical effect does not appear in the database at least as many times as the number entered, the clinical effect is ignored when calculating the likelihood ratio, even if the clinical effect is checked by the user.

The last two features of the user interface are the "Clear Fields" and "Diagnose" buttons. The "Clear Fields" button removes all check marks from every clinical effect regardless of the selected tab. This enables the user to be sure that the check marks on other tabs have been cleared without having to manually flip through each tab individually. The "Diagnose" button runs the system program, displaying a differential diagnosis table to the user.
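Putting the two tables and the two filters together, a diagnosis query can be sketched as below. This is an illustration under assumptions, not the system's code: training cases are hypothetical (diagnosis, set of clinical effects) tuples, counts are recomputed per query rather than read from stored tables, and the prior probability is assumed to enter as a multiplicative factor on the product of adjusted likelihood ratios (Equation 5-3 with ε = 0.01). The training data below are invented and do not reproduce Figure 5-2.

```python
from collections import Counter


def adjusted_lr(tp, fn, fp, tn, eps=0.01):
    """Adjusted likelihood ratio (Equation 5-3)."""
    return ((tp + eps) / (tp + fn + 2 * eps)) / ((fp + eps) / (fp + tn + 2 * eps))


def diagnose(cases, observed, min_cases=10, min_ce=0, eps=0.01, top=10):
    """Return the top-ranked (diagnosis, score) pairs for the observed effects.

    min_cases (MC): diagnoses with fewer training cases are not listed.
    min_ce (MCE): observed effects seen fewer times in training are ignored.
    """
    total = len(cases)
    cases_per_dx = Counter(dx for dx, _ in cases)
    effect_totals = Counter(e for _, effects in cases for e in effects)
    usable = [e for e in observed if effect_totals[e] >= min_ce]

    scores = {}
    for dx, n in cases_per_dx.items():
        if n < min_cases:
            continue
        score = n / total  # prior probability, Equation 5-1
        for e in usable:
            tp = sum(1 for d, effects in cases if d == dx and e in effects)
            fn = n - tp
            fp = effect_totals[e] - tp
            tn = total - n - fp
            score *= adjusted_lr(tp, fn, fp, tn, eps)
        scores[dx] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    training = ([("bacterial food poisoning", {"abdominal pain", "diarrhea", "dehydration"})] * 3
                + [("mushrooms", {"abdominal pain", "vomiting"})] * 2
                + [("acetaminophen", {"vomiting"})] * 5)
    query = {"abdominal pain", "diarrhea", "dehydration"}
    for dx, score in diagnose(training, query, min_cases=1):
        print(f"{score:12.4g}  {dx}")
```

Raising min_cases removes thinly supported diagnoses from the list, while raising min_ce only drops evidence; with no usable effects left, the ranking falls back to prior probabilities alone.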
Clicking the "Diagnose" button with the settings in Figure 5-1 displays a table similar to the one shown in
Figure 5-2. The table contains the calculated likelihood ratio (LR) on the left and the associated diagnosis on the right. The results in the figure indicate that bacterial food poisoning, with a likelihood ratio of 148.9, is by far the most likely cause of abdominal pain, dehydration, and diarrhea. The second most likely cause is mushrooms, with a likelihood ratio of 3.32. It should be noted that, although likelihood ratios are helpful for indicating the strength of support for various diagnoses, rank on the list is more important. Physicians should consider many of the substances in the top ten before making their final diagnosis.

Figure 5-2. Results table

System Testing and Results

For testing, the system's prior probabilities and likelihood ratios were trained on approximately 90% of the cases in the database. After training, the system attempted to diagnose the remaining 10% of the cases using only the associated clinical effects. The correct diagnosis for each case was then compared to the system's differential diagnosis, and the rank of the correct diagnosis was saved to a summary table. The system then retrained on a new set of data and was tested against a different 10% of the database. The training and testing datasets were determined by the last digit of each case identification number, ensuring a unique test set every cycle. The process was repeated ten times, completely testing the system against every case contained in the
database. Throughout the process, a large amount of data was gathered, the results of which are presented in the following paragraphs.

The ten-cycle testing process was used to compare the effectiveness of adjusted versus non-adjusted likelihood ratios. Both likelihood ratio calculations were tested at all three diagnostic levels: diagnosing by substance, diagnosing by major and minor categories, and diagnosing by major category alone. Additionally, the settings for the MC and MCE filters were varied to produce multiple points of comparison. While maintaining a constant MCE value of 10, MC was tested at 10, 25, and 100. Likewise, while maintaining a constant MC value of 25, MCE was tested at 0, 10, and 50. Furthermore, four levels of medical outcomes were tested against the system: all exposures with a minor severity or worse, moderate severity or worse, major severity or worse, and a severity level where the outcome was death. These tests yielded sixty resultant sets for both adjusted and non-adjusted likelihood ratios (five distinct MC/MCE settings, since MC = 25 with MCE = 10 appears in both series, at three diagnostic levels and four outcome levels).

After generating these results, the accuracy of the sixty adjusted sets was compared to the accuracy of the sixty non-adjusted sets. Accuracies were calculated in three ways: the percentage of exposures appearing as the top diagnosis, the percentage of exposures appearing in the top ten diagnoses, and the percentage of exposures appearing in the top 10% of the trained diagnoses. Comparing adjusted accuracies with non-adjusted accuracies, it was determined that adjusted likelihood ratios appear to be a good approximation of non-adjusted likelihood ratios, with adjusted calculations yielding a higher accuracy 90% of the time. Of the 180 accuracy calculations (sixty sets, each scored three ways), there were eighteen exceptions where non-adjusted calculations outperformed adjusted calculations. Ten of these exceptions involved the outcome of death. There are a few explanations for this anomaly.
First, there are very few death cases recorded in the database, making it more likely that random variation might favor one system approach over another.
Second, death cases may often display clinical effects that are not normally associated with a particular toxic exposure. The reason is that the systems in the body begin to shut down and extreme failures begin to cause cascading effects. In such cases, it becomes impossible to reliably compare two diagnostic systems. The accuracies of the remaining eight exceptions were within 0.5 percentage points of the corresponding adjusted performances. This nominal gain for the non-adjusted calculations is more than compensated for by the 127 instances where adjusted calculations outperformed non-adjusted calculations on test cases not limited to the outcome of death. Additionally, a system based on adjusted calculations is much easier to generate automatically than one based on non-adjusted calculations because it does not require any manual intervention by the system designer. Having established that the adjusted likelihood ratio is a valid substitute for the traditional likelihood ratio, the remainder of the research results is discussed in terms of adjusted calculations.

The next step in system development was to determine the best values for the MCE and MC filters. Beginning with a constant MC value of 25, MCE was varied and tested for values of 0, 2, 5, 10, and 50. For each of these values, the adjusted system was also tested at the three diagnosis levels of substance, major and minor categories, and major category alone. Each of the three diagnosis levels yields a system with a significantly different number of trained diagnoses. To enable comparisons between the three diagnosis levels, the percentage of exposures appearing in the top 10% of the trained diagnoses was used as the accuracy measurement. Table 5-1 shows the accuracy of the system when diagnosing by substance, Table 5-2 when diagnosing by major and minor categories, and Table 5-3 when diagnosing by major category alone. Looking at Table 5-1 under minor severity, it can be seen that varying MCE has no effect on the accuracy of the system.
Under major severity, the accuracy decreases from 77.8% to 77.6%, a negligible change. Likewise, looking at Table 5-2 and Table 5-3, it becomes obvious
that varying MCE causes little to no change for minor, moderate, and major severities. Once again, the exception is the severity where the outcome is death, which is most likely due to a small sampling size. For example, the 5.1-percentage-point increase in accuracy observed in exposures with an outcome of death being diagnosed by major and minor categories is a difference of only four additional cases being diagnosed in the top 10%. Prior to these tests, it was believed that using too low an MCE cutoff might create falsely high or low likelihood ratios for some substances, decreasing diagnosis accuracy. However, based on these results, it is reasonable to conclude that filtering by MCE yields negligible changes in system accuracy. Using an adjusted likelihood ratio with ε = 0.01 already mitigates the potential problem; thus, the filter can be removed from the system.

Table 5-1. Accuracy by substance in top 10% (MC = 25)

              Minimum CE Occurrences (MCE)
Severity      0       2       5       10      50
Minor         64.7%   64.7%   64.7%   64.7%   64.7%
Moderate      74.4%   74.4%   74.4%   74.4%   74.2%
Major         77.8%   77.8%   77.7%   77.7%   77.6%
Death         62.2%   62.2%   62.2%   62.2%   58.1%

Table 5-2. Accuracy by major and minor categories in top 10% (MC = 25)

              Minimum CE Occurrences (MCE)
Severity      0       2       5       10      50
Minor         64.1%   64.1%   64.1%   64.1%   63.9%
Moderate      72.7%   72.7%   72.7%   72.6%   72.4%
Major         75.4%   75.4%   75.4%   75.1%   75.4%
Death         58.2%   58.2%   58.2%   58.2%   63.3%

Table 5-3. Accuracy by major category in top 10% (MC = 25)

              Minimum CE Occurrences (MCE)
Severity      0       2       5       10      50
Minor         63.9%   63.9%   63.8%   63.8%   63.8%
Moderate      70.0%   70.0%   69.9%   69.9%   69.8%
Major         70.5%   70.5%   70.4%   70.0%   70.5%
Death         55.7%   55.7%   54.4%   54.4%   54.4%

The second filter to be examined was the MC filter. Using adjusted calculations with a constant MCE value of 10, MC was tested for values of 0, 2, 5, 10, 25, 50, and 100. Again, to enable comparisons between the three diagnosis levels of substance, major and minor categories, and major category alone, the percentage of exposures appearing in the top 10% of the trained diagnoses was used as the accuracy measurement. Additionally, since varying MC directly affects the number of trained diagnoses in the system, it was hoped that the 10% accuracy measurement would enable comparisons between systems generated by different MC filter values. Table 5-4 shows the accuracy of the system when diagnosing by substance, Table 5-5 when diagnosing by major and minor categories, and Table 5-6 when diagnosing by major category alone. Looking at the accuracies for minor, moderate, and major severities in both Table 5-4 and Table 5-5, it is readily apparent that accuracy generally decreases as MC increases. Table 5-6 shows the same tendency for MC steps from 10 to 25 and from 25 to 100, but appears to plateau for MC values from 0 to 10 and from 25 to 50.

At first it might appear that using a lower MC yields a more accurate system and, therefore, that the MC filter should be removed. However, such a conclusion fails to account for the purpose of the MC filter. As MC decreases, more possible diagnoses with fewer supporting cases are added to the system. As more diagnoses are added to the system, the accuracy calculation based on the top 10% includes substances that are ranked lower on the differential diagnosis. For example, a system with 600 trained diagnoses counts a correct diagnosis anywhere in the top 60 slots, whereas a system with 200 counts only the top 20. It turns out that the number of diagnoses added to the top 10% outweighs the number of new exposure cases being tested against the system. As a result, the lower the MC value, the more accurate the system appears.
The plateaus observed in Table 5-6 are also accounted for by this explanation, because the top 10% of the trained diagnoses evaluates to the same number of list slots for MCs of 0, 2, 5, and 10, as well as for MCs of 25 and 50.
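The evaluation protocol and the three accuracy measures can be sketched as follows. Assumptions: case identifiers are integers, the top-10% cutoff is rounded up to a whole number of list slots (the rounding rule is not stated), and a correct diagnosis missing from the ranked list is recorded as None; the function names are hypothetical. The example shows why top-10% accuracy favors systems with more trained diagnoses: the same ranks score higher when the list is longer.

```python
def folds_by_last_digit(case_ids):
    """Assign each case to one of ten test folds by the last digit of its ID."""
    folds = {digit: [] for digit in range(10)}
    for case_id in case_ids:
        folds[case_id % 10].append(case_id)
    return folds


def accuracy_summary(ranks, n_diagnoses):
    """Share of test cases whose correct diagnosis is ranked first, in the top
    ten, and in the top 10% of the trained diagnoses.

    ranks holds the 1-based rank of the correct diagnosis for each test case,
    or None when it is not ranked at all.
    """
    top_10_percent = (n_diagnoses + 9) // 10  # ceil(n / 10); rounding is assumed

    def share(limit):
        hits = sum(1 for r in ranks if r is not None and r <= limit)
        return hits / len(ranks)

    return {"top 1": share(1), "top 10": share(10), "top 10%": share(top_10_percent)}


if __name__ == "__main__":
    print(folds_by_last_digit([1001, 1012, 1021, 1030]))
    ranks = [1, 3, 12, None]
    print(accuracy_summary(ranks, 60))   # top 10% = 6 slots
    print(accuracy_summary(ranks, 200))  # top 10% = 20 slots
```

Because the cutoff is a whole number of slots, systems whose diagnosis counts differ only slightly can share the same cutoff, which is consistent with the plateaus in Table 5-6.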

Table 5-4. Accuracy by substance in top 10% (MCE = 10)

              Minimum Exposure Cases (MC)
Severity      0       2       5       10      25      50      100
Minor         74.4%   72.9%   69.6%   67.4%   64.7%   62.9%   58.9%
Moderate      80.3%   79.7%   78.0%   76.6%   74.4%   71.6%   64.8%
Major         80.6%   81.7%   81.3%   79.8%   77.7%   75.1%   68.6%
Death         62.0%   65.8%   63.3%   62.8%   62.2%   61.2%   46.8%

Table 5-5. Accuracy by major and minor categories in top 10% (MCE = 10)

              Minimum Exposure Cases (MC)
Severity      0       2       5       10      25      50      100
Minor         71.7%   70.1%   68.9%   67.5%   64.1%   63.0%   58.4%
Moderate      79.1%   78.0%   76.9%   75.6%   72.6%   71.8%   66.7%
Major         81.1%   80.3%   79.7%   78.5%   75.1%   74.2%   69.2%
Death         67.1%   68.4%   68.4%   67.1%   58.2%   59.2%   49.3%

Table 5-6. Accuracy by major category in top 10% (MCE = 10)

              Minimum Exposure Cases (MC)
Severity      0       2       5       10      25      50      100
Minor         68.5%   68.5%   68.5%   68.6%   63.8%   64.1%   60.3%
Moderate      73.4%   73.4%   73.4%   73.5%   69.9%   70.2%   66.4%
Major         73.9%   73.9%   74.0%   74.0%   70.0%   70.3%   65.2%
Death         58.2%   58.2%   58.2%   58.2%   54.4%   56.4%   51.3%

Since comparing MC values using an accuracy based on the top 10% of trained diagnoses failed to yield the desired results, a second accuracy measurement was calculated using the correct diagnoses appearing in the top ten slots of the differential diagnosis. From a user standpoint, this accuracy measurement is more appropriate because the list size that a user can process without being overwhelmed does not depend on the number of trained substances. Table 5-7 shows the accuracy of the system when diagnosing by substance, Table 5-8 when diagnosing by major and minor categories, and Table 5-9 when diagnosing by major category alone. Looking at the minor, moderate, and major severity rows in Table 5-7, Table 5-8, and Table 5-9, it can be seen that as MC increases, accuracy also increases. The data tell us little about selecting a value for MC because they indicate what is expected of any system: as more cases are used to define each substance, system accuracy should increase. Another contributor to
the increase in accuracy is that fewer substances are trained as MC increases. With fewer substances, the top ten substances become a larger portion of the available diagnoses. Even random guessing would experience an increase in accuracy under these circumstances.

Table 5-7. Accuracy by substance in top ten (MCE = 10)

              Minimum Exposure Cases (MC)
Severity      0       2       5       10      25      50      100
Minor         41.2%   41.4%   42.1%   43.2%   47.2%   54.4%   71.0%
Moderate      50.0%   50.5%   51.4%   52.9%   57.1%   63.2%   76.4%
Major         53.4%   54.6%   56.0%   58.2%   62.4%   68.1%   77.7%
Death         35.4%   39.2%   41.8%   44.9%   45.9%   50.7%   57.4%

Table 5-8. Accuracy by major and minor categories in top ten (MCE = 10)

              Minimum Exposure Cases (MC)
Severity      0       2       5       10      25      50      100
Minor         63.0%   63.1%   63.2%   63.4%   64.1%   65.3%   71.0%
Moderate      71.4%   71.6%   71.7%   72.0%   72.6%   74.1%   78.1%
Major         73.6%   74.2%   74.3%   74.7%   75.1%   76.4%   79.0%
Death         58.2%   58.2%   58.2%   58.2%   58.2%   60.5%   61.3%

Table 5-9. Accuracy by major category in top ten (MCE = 10)

              Minimum Exposure Cases (MC)
Severity      0       2       5       10      25      50      100
Minor         79.5%   79.5%   79.5%   79.6%   79.8%   80.1%   82.4%
Moderate      83.8%   83.8%   83.9%   83.9%   84.0%   84.5%   86.3%
Major         83.9%   84.1%   84.3%   84.3%   84.5%   84.7%   85.4%
Death         69.6%   70.9%   72.2%   72.2%   72.2%   71.8%   71.8%

In an attempt to normalize accuracies, the ratio of the data in Table 5-7, Table 5-8, and Table 5-9 to the accuracy of diagnosing by random guessing was calculated. However, it was found that the ratio suffered from problems similar to those of the accuracies calculated in Table 5-4, Table 5-5, and Table 5-6. Lowering MC increases the number of trained diagnoses in the system, adversely affecting random guessing. As a result, the ratio falsely indicated that a lower MC cutoff would yield better results. A second attempt at normalizing the accuracies calculated the ratio of the data in Table 5-7, Table 5-8, and Table 5-9 to the accuracy of a system that selected its top ten choices based on prior probabilities alone. Figure 5-3 shows a graph of the ratio for minor,
moderate, and major severities when diagnosing by substance. Likewise, Figure 5-4 displays the ratio for diagnosing by major and minor categories and Figure 5-5 the ratio for diagnosing by major category alone. The graphs indicate that as MC increases, ensuring more representative likelihood calculations, the system tends to perform better. The increase appears to be almost linear, with perhaps a slight tendency towards diminishing returns as MC increases. There is no evidence of any breakpoints that would yield a superior MC cutoff. These results indicate that the adjusted likelihood ratio is performing well and that the exact value used for MC is unimportant. However, a reasonable MC value of at least ten should be chosen to ensure that outliers do not excessively influence diagnosis.

Figure 5-3. Accuracy ratios by substance. (Plot of accuracy ratio, MC/Prior, versus Minimum Exposure Cases (MC), with curves for minor, moderate, and major severities.)

Figure 5-4. Accuracy ratios by major and minor categories. (Plot of accuracy ratio, MC/Prior, versus Minimum Exposure Cases (MC), with curves for minor, moderate, and major severities.)

Figure 5-5. Accuracy ratios by major category. (Plot of accuracy ratio, MC/Prior, versus Minimum Exposure Cases (MC), with curves for minor, moderate, and major severities.)

Comparing Figure 5-3, Figure 5-4, and Figure 5-5, it can be seen that the slopes and the ratios are higher for diagnosis by substance than for diagnosis by major and minor categories, which in turn are higher than for diagnosis by major category alone. The reason is that the number of diagnoses trained for diagnosing by substance (around 200 to 600) is significantly greater than for diagnosing by major and minor categories (around 100 to 200), which in turn is greater than for diagnosing by major category alone (around 50 to 60). With more possible diagnoses, placing the correct diagnosis in the top ten without intelligence becomes more difficult. Thus, the system's performance ratio improves as more substances are added. Additionally, the curves indicate that the system scales well to a large number of diagnoses, since the ratios and slopes increase as the number of available diagnoses increases.

Another notable characteristic of the curves is that they indicate that the system performs better the more severe the case. The primary reason is that more severe cases generally have more associated clinical effects. With more clinical effects, the system has more information to properly differentiate between various diagnoses, yielding a higher accuracy. The good news is that the most important cases are the most severe cases, and this is precisely where the system performs best.
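The two normalizations discussed above compare the system's top-ten accuracy with a baseline. A sketch of both baselines, with hypothetical function names and made-up priors: uniform random guessing, whose expected top-ten accuracy is 10/N for N trained diagnoses (capped at 1), and a fixed list of the ten diagnoses with the highest prior probabilities.

```python
def random_topk_accuracy(n_diagnoses, k=10):
    """Expected top-k accuracy of guessing k distinct diagnoses uniformly at random."""
    return min(k, n_diagnoses) / n_diagnoses


def prior_topk_accuracy(priors, true_diagnoses, k=10):
    """Top-k accuracy of always listing the k diagnoses with the highest priors."""
    shortlist = set(sorted(priors, key=priors.get, reverse=True)[:k])
    hits = sum(1 for dx in true_diagnoses if dx in shortlist)
    return hits / len(true_diagnoses)


def accuracy_ratio(system_accuracy, baseline_accuracy):
    """System accuracy divided by baseline accuracy (MC/Prior when prior-only)."""
    return system_accuracy / baseline_accuracy


if __name__ == "__main__":
    # Random guessing improves as a higher MC shrinks the list of trained diagnoses.
    for n in (600, 200, 60):
        print(n, random_topk_accuracy(n))
    priors = {"a": 0.5, "b": 0.3, "c": 0.2}
    baseline = prior_topk_accuracy(priors, ["a", "c", "b", "a"], k=2)
    print(baseline, accuracy_ratio(0.9, baseline))
```

The random baseline depends only on N, so its ratio moves with MC for reasons unrelated to the system; the prior-only baseline is built from the same training data and therefore shares the system's dependence on MC.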

Table 5-10 displays a representative summary of system performance with MC = 10 and MCE = 0. To enable comparison between the various forms of diagnosis, the percentage of exposures appearing in the top 10% of trained diagnoses is used as the accuracy measure. Table 5-10 reiterates the fact that the system performs better the more severe the case. Once again, death is the exception, due to limitations in the data and to the failure of organ systems in the body, which leads to cascading clinical effects. Moreover, the difficulties associated with the cases involving death make it futile to discuss trends for that severity. For major and moderate severities, diagnosing by substance performs best, followed by major and minor categories and finally by major category alone. The converse is true for minor severity cases, where diagnosing by major category alone performs best. Though not universally observed in the test runs, this accuracy inversion is not uncommon and is most likely due to the lack of clinical effects in minor severity cases. With minimal clinical effects, it is easier to classify the general major category of a toxin than to identify the specific toxin involved.

Table 5-10. Accuracy in top 10% with MC = 10 and MCE = 0

Severity | By substance | By major & minor categories | By major category
Minor    | 67.4%        | 67.5%                       | 68.6%
Moderate | 76.6%        | 75.7%                       | 73.5%
Major    | 79.8%        | 78.9%                       | 74.8%
Death    | 62.8%        | 67.1%                       | 59.5%

The accuracy calculations in Table 5-10 reach a high of 79.8%, which occurs when diagnosing major severity cases by substance. These accuracy calculations include a large number of cases involving only a single clinical effect, which would be difficult for even the most experienced expert to diagnose without additional information. To better demonstrate system functionality, the accuracies from Table 5-10 are recalculated in Table 5-11 to include only cases with at least three recorded clinical effects. A large improvement in system accuracy
is observed, particularly in minor severity cases, where accuracies are boosted from the 60% range into the mid-70% range. Additionally, the accuracy of diagnosing major severity cases by substance and by major and minor categories is raised above 80%. Further system improvements could be achieved by removing useless categories, such as the unknown drug diagnosis, and consolidating nearly redundant substances, such as "aspirin: pediatric formulation," "aspirin: unknown if adult or pediatric formulation," and "aspirin: adult formulation." However, one purpose of the research presented here is to bypass the need for expert input when generating a system, and making such improvements would assume knowledge in the domain of toxicology.

Table 5-11. Accuracy in top 10% with MC = 10, MCE = 0, and at least three clinical effects (3+ CEs)

Severity | By substance | By major & minor categories | By major category
Minor    | 75.1%        | 74.6%                       | 74.0%
Moderate | 78.4%        | 77.2%                       | 75.0%
Major    | 81.0%        | 80.5%                       | 75.9%
Death    | 69.4%        | 71.4%                       | 66.7%

Further credence to the system's viability was given, subjectively, by two toxicologists at the FPIC who experimented with the system's user interface. The toxicologists found the top diagnosis to be reasonable for every input given to the system, and found several other appropriate diagnoses listed in the top ten. Considering the purpose of the system, as an automatically generated toxicology consultant, and the intentional simplicity of the system design, the resulting accuracies and the positive reactions by toxicology experts confirm that the research performed to create this system was a success.

System Performance

An important aspect of system usability is the amount of processing time required to train the system and the response time of the user interface to diagnostic queries. System calculations
were intentionally kept simple to enable scalability, rapid system generation, and a low response time. For research purposes, the system was developed using Microsoft Access 2002 on a Compaq Presario 2100 laptop with a 2.4 GHz processor and 320 MB of RAM. Training the system on four years of data took just under three minutes. Running a diagnosis under worst-case conditions takes approximately three seconds the first time the program is queried. Once the program is loaded into RAM, however, the diagnosis runtime is cut in half. Porting the program to the dedicated SQL server used by the FPIC would likely offer further speed improvements. As the number of cases in the system increases, system training time could increase significantly, though the architecture of the system should keep the impact on diagnosis time minimal. The section of this chapter entitled "System Design Principles" discusses a method for incremental updates that enables the system to retain its training from previous years and simply adjust its calculations based on new data. Based on the performance measurements taken, it is reasonable to expect that the system could be trained rapidly by a central database server. In the future, training results could be downloaded to applications on handheld personal digital assistants without significant loss of usability.

Conclusion

This chapter has presented the research and development of a system capable of generating a differential diagnosis for exposures to a single toxin. First, we discussed the source data and the guiding system design principles of simplicity, understandability, automatic system generation, and incremental updates. Next, the theory behind system development was explained. Finally, the system operation, user interface, system results, and system performance were presented. The system presented here serves as a foundation for the multiple exposure research presented in the next chapter.

CHAPTER 6
DIAGNOSING MULTIPLE EXPOSURE CASES

The previous chapter presented the development of a system for diagnosing exposures to a single toxin. The resulting system serves as a foundation for the multiple exposure research discussed in this chapter. Although system development did not proceed as expected, the results reveal intriguing insights into the diagnosis of multiple exposures in the field of toxicology. The chapter begins with the motivation for developing the system, continues by briefly describing the system approach, discusses the results of diagnosing multiple disorders using various sets of training data, presents the research conclusions, and closes with a discussion of future work.

Motivation for Diagnosing Multiple Exposures

Although many established methods for designing knowledge-based systems exist, as discussed in Chapter 3, none has fully solved the problem of diagnosing multiple disorders. The difficulty is that multiple disorder cases can display non-linear interactions, a problem observed in the field of toxicology. When simultaneously present in the body, toxins can interact antagonistically or synergistically, masking or otherwise altering the signs and symptoms that would normally appear for each individual exposure. Little documentation exists for the majority of toxic exposure combinations that can occur, and only a limited number of systems in the field of toxicology attempt to account for multiple exposures in some way, as covered in Chapter 4. None of these systems fully solves the problem, nor are they readily available for use by American toxicologists. Beyond the motivation of developing technology that addresses an unsolved diagnostic problem, the more important concern of saving lives is at stake. The Toxic Exposure Surveillance System (TESS) report states that in 2004 .6% of fatal cases involved 2 or more drugs or products (Watson et al., 2004, p. 593). This statistic makes it plain that timely and
accurate identification of exposures involving multiple substances is extremely important. For the sake of advancing the field of information engineering as well as the preservation of life, the problem addressed by the research in this chapter is both relevant and important.

System Approach

Chapter 5 presents the development of a system for diagnosing exposures to a single toxin. That system serves as a foundation for the diagnosis of multiple exposure cases discussed in this chapter. Like the single exposure system, the goal of the multiple exposure system is to serve as a consultant by producing differential diagnoses based on the clinical effects supplied by the user. Unless otherwise noted, the training and testing procedures for the system described in this chapter conform to the following characteristics:

- The clinical effects and substance identifiers are based on TESS standards.
- Prior probabilities and adjusted likelihood ratios, with the adjustment parameter set to 0.01, are used to determine the differential diagnoses; see Equations 5-1 and 5-3 for details.
- The system is tested at three diagnostic levels: diagnosing by substance, diagnosing by major and minor categories, and diagnosing by major category alone.
- Three levels of medical outcomes are tested against the system: exposures with a minor severity or worse, moderate severity or worse, and major severity or worse. (Note that testing solely based on exposures resulting in death is not included due to the inaccuracies discussed in Chapter 5.)
- A minimum exposure cases (MC) value of 10 and a minimum clinical effect occurrences (MCE) value of 0 serve as the cutoffs for testing the system.
- Accuracies are calculated as the percentage of test exposures identified correctly in the top 10% of the trained diagnoses.

For this research, the Florida Poison Information Center (FPIC) provided access to all the cases recorded in its Jacksonville database from 2002 through 2006.
With the addition of a fifth year, the cleaned database used for system generation now contains 37,617 single exposure cases and 8,901 multiple exposure cases.
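The diagnostic calculation listed above can be sketched in a few functions. Equations 5-1 and 5-3 are not reproduced in this section, so the sketch substitutes an assumed stand-in: each conditional probability is estimated from counts after adding a small constant `eps` (0.01 by default) to both possible outcomes of a clinical effect (present or absent), in the spirit of adding a fractional possibility to every outcome. Diagnoses are ranked by log prior odds plus the log adjusted likelihood ratios of the observed effects; diagnoses below MC are skipped; and a test case counts as correct when its true diagnosis falls in the top 10% of the ranking, rounded up. All names and the data layout are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(cases):
    """Count diagnoses and diagnosis/effect co-occurrences.

    cases: iterable of (diagnosis, set_of_clinical_effects) pairs.
    """
    dx_counts = Counter()
    effect_counts = defaultdict(Counter)  # diagnosis -> effect -> count
    effect_totals = Counter()             # effect -> count over all cases
    for dx, effects in cases:
        dx_counts[dx] += 1
        for e in effects:
            effect_counts[dx][e] += 1
            effect_totals[e] += 1
    return dx_counts, effect_counts, effect_totals


def adjusted_lr(a, n_d, b, n_not_d, eps):
    """P(E|D) / P(E|not D) with eps added to both outcomes (assumed form).

    a: cases of D showing effect E; b: cases of other diagnoses showing E.
    """
    p_e_d = (a + eps) / (n_d + 2 * eps)
    p_e_not_d = (b + eps) / (n_not_d + 2 * eps)
    return p_e_d / p_e_not_d


def rank(model, effects, eps=0.01, mc=10):
    """Return diagnoses ordered from most to least likely."""
    dx_counts, effect_counts, effect_totals = model
    n = sum(dx_counts.values())
    scores = {}
    for dx, n_d in dx_counts.items():
        n_not_d = n - n_d
        if n_d < mc or n_not_d == 0:
            continue  # MC cutoff; a lone diagnosis has no competitors
        score = math.log(n_d / n_not_d)  # log prior odds
        for e in effects:
            a = effect_counts[dx][e]
            b = effect_totals[e] - a
            score += math.log(adjusted_lr(a, n_d, b, n_not_d, eps))
        scores[dx] = score
    return sorted(scores, key=scores.get, reverse=True)


def in_top_percent(ranking, truth, percent=10):
    """True if `truth` is within the top `percent`% of `ranking` (rounded up)."""
    k = max(1, (len(ranking) * percent + 99) // 100)
    return truth in ranking[:k]


def accuracy(model, test_cases, min_effects=0, **kwargs):
    """Share of test cases (with >= min_effects effects) hit in the top 10%."""
    scored = [(dx, eff) for dx, eff in test_cases if len(eff) >= min_effects]
    if not scored:
        return float("nan")
    hits = sum(in_top_percent(rank(model, eff, **kwargs), dx) for dx, eff in scored)
    return hits / len(scored)


cases = [("A", {"vomiting"})] * 12 + [("B", {"drowsiness"})] * 12 + [("C", {"vomiting"})] * 3
model = train(cases)
print(rank(model, {"vomiting"}))  # ['A', 'B']  (C falls below MC = 10)
```

Rounding the top-10% window up agrees with the "top 13 out of 129" figure quoted in the conclusions, and the `min_effects` argument mirrors the 3+ clinical effect recalculation of Table 5-11.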

Diagnosing Multiple Exposures using Solely Multiple Exposure Cases

During the initial phase of testing, all multiple exposure cases are extracted from the database. TESS standards require that each substance involved in a toxic exposure be assigned a sequence number that ranks the substance in accordance with its relative contribution to the observed clinical effects. To simplify the initial attempts to diagnose multiple disorders, only the primary and secondary contributors in each multiple exposure case are considered. TESS standards also require that substances be recorded by a product-specific code as well as a generic substance code. This requirement creates a problem: when determining the number of substances involved in an exposure, the FPIC database uses the product-specific code. As a result, two products marketed by different companies are listed as separate substances, even if their active ingredient is the same. When cleaning the data, if the generic substance codes for the top three contributing substances are identical, the case is removed from the dataset. If the first two generic substance codes are identical but the third is different, the third substance is treated as the secondary contributor for the case. Finally, the multiple exposure cases are filtered so that only cases resulting in minor effects, moderate effects, major effects, or death are used to train and test the system. The cleaned dataset contains 8,901 multiple exposure cases.

When generating the multiple exposure system, each pair of primary and secondary contributors is trained individually as a single diagnosis. Prior probabilities, adjusted likelihood ratios, and both MC and MCE filters are calculated and implemented in the same manner as discussed in Chapter 5. Testing also follows the same process of training the system on approximately 90% of the cases and then attempting to diagnose the remaining 10% of the cases.
By repeating the process ten times, the system is completely tested against every case in the database. Finally, the results are combined and accuracies calculated as the percentage of test exposures identified correctly in the top 10% of the trained diagnoses.
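The cleaning rules and the ten-fold testing procedure described in this section can be sketched as follows. The sketch assumes each case arrives as a list of generic substance codes already ordered by TESS sequence number (primary contributor first), an outcome label, and a set of clinical effects; a case whose leading contributors all share one generic code is dropped, mirroring the rules above. How the FPIC cases were actually assigned to the ten test sets is not stated, so the shuffled, interleaved assignment here is an assumption, as are all names and outcome labels.

```python
import random

# Outcomes kept for training and testing (labels are illustrative).
KEPT_OUTCOMES = {"minor", "moderate", "major", "death"}


def primary_secondary(generic_codes):
    """Return (primary, secondary) generic codes, or None to drop the case.

    generic_codes: generic substance codes ordered by TESS sequence number.
    """
    if len(generic_codes) < 2:
        return None
    first, second = generic_codes[0], generic_codes[1]
    if first != second:
        return first, second
    # The first two products share a generic code: fall back to the third.
    if len(generic_codes) >= 3 and generic_codes[2] != first:
        return first, generic_codes[2]
    return None  # the leading contributors are all the same generic substance


def clean(cases):
    """cases: iterable of (generic_codes, outcome, effects).

    Each surviving case is labeled with its (primary, secondary) pair,
    which is then trained as one diagnosis.
    """
    cleaned = []
    for codes, outcome, effects in cases:
        pair = primary_secondary(codes)
        if pair is not None and outcome in KEPT_OUTCOMES:
            cleaned.append((pair, effects))
    return cleaned


def ten_fold(cases, folds=10, seed=0):
    """Yield (train, test) splits so that every case is tested exactly once."""
    order = list(range(len(cases)))
    random.Random(seed).shuffle(order)
    for k in range(folds):
        test_ids = set(order[k::folds])
        train = [c for i, c in enumerate(cases) if i not in test_ids]
        test = [cases[i] for i in sorted(test_ids)]
        yield train, test
```

Because each fold takes every tenth position of one shuffled order, the ten test sets partition the data, which is what allows the results of the ten runs to be combined into a single accuracy figure.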

The original results from training and testing the system on multiple exposure cases are displayed in the first column of Table 6-1. With an accuracy ranging from 28.3% to 50.1%, the system's poor performance is obvious. To further explore the failure, the system was tested for MC values of 15, 20, and 25. The results of these tests show a similar lack of accuracy (Table 6-1). Reading each row of the table from left to right, we can see that the performance gradually decays as MC increases. As discussed in Chapter 5, such an observation is expected because the MC cutoff lowers the number of diagnoses included in the top 10%. The most interesting characteristic of the data in Table 6-1 is that as the severity increases, the accuracy decreases. This observation is contrary to the results observed in the single exposure system, where accuracy normally increases with severity because more severe cases contain more clinical effects, making diagnosis easier for the system.

Table 6-1. Accuracy (varying MC) of system trained and tested on multiple exposures

Diagnosed by             | Severity | MC = 10 | MC = 15 | MC = 20 | MC = 25
Substance                | Minor    | 33.5%   | 30.4%   | 29.0%   | 27.6%
                         | Moderate | 30.0%   | 26.9%   | 25.3%   | 22.9%
                         | Major    | 28.3%   | 23.3%   | 21.8%   | 18.5%
Major & minor categories | Minor    | 47.3%   | 43.6%   | 39.5%   | 38.2%
                         | Moderate | 45.9%   | 42.1%   | 37.6%   | 36.5%
                         | Major    | 37.6%   | 34.5%   | 30.9%   | 30.6%
Major category           | Minor    | 50.1%   | 46.8%   | 45.7%   | 43.4%
                         | Moderate | 47.2%   | 44.1%   | 43.0%   | 40.4%
                         | Major    | 43.0%   | 39.6%   | 38.2%   | 36.5%
Average                  |          | 40.3%   | 36.8%   | 34.5%   | 32.7%

There are a number of plausible explanations for why accuracy might decrease with severity, but two are particularly compelling. The first explanation is that the decrease in accuracy is caused by the non-linear interactions between multiple toxins. As the severity of an exposure increases, there is greater opportunity for a combination of toxins to produce effects not normally associated with any of the toxins individually. This could lower the accuracy of the
system because the clinical effects would behave more erratically and might not correspond to the majority of cases. The second explanation is that the decrease in accuracy is simply caused by a lack of quality data. As the severity cutoff becomes more stringent, fewer cases are tested against the system, leading to poor sampling and quite possibly lower accuracies on average. A lack of quality data could account both for the low accuracy observed overall and for the decrease in accuracy as the severity increases.

Another factor that might contribute to the system's poor accuracy is the adjustment parameter implemented in the adjusted likelihood ratio equation (see Equation 5-3). The parameter is meant primarily to safeguard against multiply-by-zero and divide-by-zero errors; however, with a small training set, the parameter might adversely influence the diagnostic results. Table 6-2 compares the original system accuracy, obtained with the parameter set to 0.01, to accuracies calculated with values of 0.1 and 0.001. Increasing the parameter to 0.1 causes an average decrease in accuracy of 1.6%, while decreasing it to 0.001 causes an average increase in accuracy of only 0.1%. These results imply that a value of 0.01 yields satisfactory relative performance compared to other values that might be selected.

Table 6-2. Accuracy (varying the adjustment parameter) of system trained and tested on multiple exposures

Diagnosed by             | Severity | 0.1   | 0.01 (original) | 0.001
Substance                | Minor    | 32.3% | 33.5%           | 33.5%
                         | Moderate | 29.0% | 30.0%           | 29.8%
                         | Major    | 26.4% | 28.3%           | 28.0%
Major & minor categories | Minor    | 46.5% | 47.3%           | 47.4%
                         | Moderate | 44.6% | 45.9%           | 46.1%
                         | Major    | 35.0% | 37.6%           | 38.3%
Major category           | Minor    | 49.4% | 50.1%           | 50.2%
                         | Moderate | 46.1% | 47.2%           | 47.2%
                         | Major    | 39.5% | 43.0%           | 43.4%
Average                  |          | 38.8% | 40.3%           | 40.4%
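Why the adjustment parameter matters most when training data are scarce can be seen with a short calculation. The sketch assumes, as a stand-in for Equation 5-3 rather than a copy of it, that the adjustment adds the parameter (called `eps` here) to both outcomes of a clinical effect, and it compares the estimated probability of an effect seen in 10% of the cases for a diagnosis with 10 training cases and for one with 1,000.

```python
def adjusted_rate(count, n, eps):
    """Estimated P(effect | diagnosis) after adding eps to both outcomes.

    Assumed form for illustration; not the dissertation's Equation 5-3.
    """
    return (count + eps) / (n + 2 * eps)


for eps in (0.1, 0.01, 0.001):
    small = adjusted_rate(1, 10, eps)      # effect seen in 1 of 10 cases
    large = adjusted_rate(100, 1000, eps)  # same 10% rate over 1,000 cases
    unseen = adjusted_rate(0, 10, eps)     # effect never seen in 10 cases
    print(f"eps={eps}: small={small:.4f} large={large:.4f} unseen={unseen:.5f}")
```

Under this assumed form, the parameter shifts the small-sample estimate roughly a hundred times more than the large-sample one, and the probability assigned to a never-observed effect shrinks in proportion to the parameter, so a larger value softens the penalty for unseen effects most strongly when few cases are available.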

In an attempt to improve accuracy and better understand the system's poor performance, a number of system variations were tested. The resulting accuracies for these systems are presented in Table 6-3, where the column labeled "original accuracies" represents the original system. The first column of accuracies displays the results for a system that assumes all trained diagnoses are equally likely. As expected, this system performs worse than the original. However, the results of this test do reveal a few important insights. Note that, unlike the original, the accuracies for diagnosis by substance and by major and minor categories increase as severity increases. The significance of this observation is that the system is indeed processing clinical effects correctly. Thus, the decrease in accuracy with increasing severity in the original testing is not due to the non-linear interactions of multiple substances. Rather, the results imply that the prior probability is dominating the original diagnoses. The most likely cause of this problem is a lack of quality data. Additionally, the fact that diagnosis by major category alone still displays decreasing accuracy with increasing severity fits this explanation. Major categories cover a broad variety of substances, making it difficult to train a general model that properly fits the major category as a whole. The problem is compounded when attempting to identify two different major categories in the same diagnosis.

Table 6-3. Accuracy comparison of various systems for multiple exposure diagnosis

Diagnosed by             | Exposure severity | No prior probability | Original accuracies | Double exposures | Order reversed | Primary correct
Substance                | Minor             | 16.5%                | 33.5%               | 35.3%            | 42.4%          | 64.8%
                         | Moderate          | 17.5%                | 30.0%               | 30.9%            | 40.3%          | 63.0%
                         | Major             | 23.9%                | 28.3%               | 28.5%            | 39.8%          | 63.7%
Major & minor categories | Minor             | 21.7%                | 47.3%               | 47.1%            | 54.0%          | 82.7%
                         | Moderate          | 23.0%                | 45.9%               | 45.1%            | 53.7%          | 82.9%
                         | Major             | 23.5%                | 37.6%               | 42.3%            | 49.4%          | 81.2%
Major category           | Minor             | 24.2%                | 50.1%               | 50.8%            | 56.0%          | 81.3%
                         | Moderate          | 23.7%                | 47.2%               | 47.1%            | 54.5%          | 81.5%
                         | Major             | 22.7%                | 43.0%               | 42.0%            | 53.3%          | 80.9%
Average                  |                   | 21.9%                | 40.3%               | 41.0%            | 49.3%          | 75.8%

Another issue that could contribute to the low accuracy of the system is that multiple exposure cases can consist of more than two substances. Since the system only considers the primary and secondary contributors, any additional substances involved could affect the clinical effects in a manner not normally predicted in a case involving only two substances. To improve the quality of the training data, a system was created based solely on cases where exactly two substances are involved. The system accuracy is reported in Table 6-3 under the column titled "double exposures." Although this approach improves data quality, it also reduces the number of training cases from 8,901 to 5,149, a data reduction of over 40%. The end result is a nominal increase in average accuracy of only 0.7%.

Further attempts to improve accuracy resulted in two more variations of the system. The original system requires the correct identification of both primary and secondary contributors for a diagnosis to be considered successful. The first variation relaxes the constraints of the original system by allowing the order of the primary and secondary contributing substances to be reversed. Thus, diagnosing a test case with a primary contributor of A and a secondary contributor of B as having a primary contributor of B and a secondary contributor of A is considered an accurate diagnosis. As seen in Table 6-3 under the column labeled "order reversed," the relaxed diagnosis criteria increase accuracy by an average of 8.9%. Unfortunately, the resulting system is still not viable, having achieved a maximum accuracy of only 56.0%. The second variation on the original system attempted to improve accuracy by counting any diagnosis as a correct match if its primary contributor matched the primary contributor of the test case, regardless of the secondary contributors involved.
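The three success criteria compared in Table 6-3 can be sketched as simple predicates over ranked (primary, secondary) pairs. The function names, criterion labels, and pair representation are hypothetical. The toy ranking also hints at why the primary-only criterion is optimistic: any pair in the top of the ranking that shares the test case's primary contributor counts as a hit.

```python
def is_match(predicted, actual, criterion="exact"):
    """Compare (primary, secondary) pairs under one of three success criteria."""
    if criterion == "exact":           # original system: both in the right order
        return predicted == actual
    if criterion == "order_reversed":  # either order of the two contributors
        return predicted == actual or predicted == actual[::-1]
    if criterion == "primary":         # only the primary contributor must match
        return predicted[0] == actual[0]
    raise ValueError(f"unknown criterion: {criterion}")


def hit_in_top_k(ranked_pairs, actual, k, criterion="exact"):
    """True if any of the top-k diagnosed pairs matches the test case."""
    return any(is_match(p, actual, criterion) for p in ranked_pairs[:k])


ranked = [("B", "A"), ("A", "C"), ("D", "E")]
actual = ("A", "B")
for criterion in ("exact", "order_reversed", "primary"):
    print(criterion, hit_in_top_k(ranked, actual, k=2, criterion=criterion))
# exact False
# order_reversed True
# primary True
```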
As shown in Table 6-3 under the column labeled "primary correct," counting a match on the primary contributor alone increases the system's accuracy drastically, yielding a maximum accuracy of 82.9%. It should be noted that these
results are falsely optimistic because the most common substances involved in multiple exposures are the primary contributors for many different substance combinations. As a result, a number of different possible diagnoses could be considered correct for any single test case. Additionally, diagnosing multiple exposures by substance has a maximum accuracy of 64.8%, which is not an outstanding number. In spite of these shortcomings, the final system test seems to indicate that the primary contributor might be the dominating force in most multiple exposure cases. For that reason, the research presented in the following section focuses on diagnosing the primary contributor.

Diagnosing Multiple Exposures with Single Exposure Cases

The findings in the previous section seem to indicate that the clinical effects observed in most multiple exposure cases are dominated by the signs and symptoms associated with the primary contributor. To test this hypothesis, a system trained entirely on single exposures was tested to see if it could accurately diagnose the primary contributor in multiple exposure cases. The first column of Table 6-4 shows the accuracy of the system when diagnosing the primary contributor for every multiple exposure case. The next column shows the results when the test cases are limited to double exposures. With accuracies reaching as high as 84.9%, the results confirm that the clinical effects observed in most multiple exposure cases are indeed dominated by those associated with the primary contributor. Furthermore, the evidence indicates that the poor performance observed in the system trained solely on multiple exposure cases was not due to non-linear interactions between multiple toxins. As discussed in the previous section, the remaining explanation for the system failure is a lack of sufficient data.

Table 6-4. Accuracy diagnosing primary contributors using single exposures [1]

Diagnosed by             | Severity | Singles diagnosing multiples | Singles diagnosing doubles | Combined diagnosing multiples | Combined diagnosing doubles
Substance                | Minor    | 75.4%                        | 75.2%                      | 79.1%                         | 77.5%
                         | Moderate | 77.2%                        | 77.3%                      | 81.1%                         | 79.3%
                         | Major    | 78.7%                        | 81.8%                      | 83.5%                         | 83.1%
Major & minor categories | Minor    | 77.8%                        | 76.4%                      | 80.4%                         | 78.2%
                         | Moderate | 81.3%                        | 80.4%                      | 83.3%                         | 81.6%
                         | Major    | 84.9%                        | 84.9%                      | 86.9%                         | 86.2%
Major category           | Minor    | 74.4%                        | 74.9%                      | 77.7%                         | 76.9%
                         | Moderate | 75.5%                        | 76.2%                      | 78.7%                         | 78.5%
                         | Major    | 75.8%                        | 78.3%                      | 79.9%                         | 80.5%
Average                  |          | 77.9%                        | 78.4%                      | 81.2%                         | 80.2%

To test whether a lack of data caused the poor performance observed in the system trained solely on multiple exposures, a system was trained using a combination of multiple exposures and single exposures to diagnose the primary contributor in multiple exposure cases. For training purposes, each multiple exposure was treated as a single exposure case with the primary contributor as the correct diagnosis. All single exposure cases were used for training, along with approximately 90% of the multiple exposure cases. The remaining 10% of the multiple exposures were tested against the system to see if it could identify the primary contributor. The training and testing were repeated ten times to thoroughly test the system against every available multiple exposure case. In a similar manner, a system trained on a combination of double exposures and single exposures was tested to see if it could identify the primary contributor in double exposure cases. The results of these two tests are displayed in the last two columns of Table 6-4. On average, the accuracy increased by 3.3% when diagnosing multiple exposures and 1.8% when diagnosing only double exposures.
These results indicate that the multiple exposure cases contain valuable information capable of yielding greater than 80% accuracy. Moreover, these results are consistent with the explanation that the system failure when training on multiple exposures alone was due to a lack of sufficient data. It is also interesting to note that the system performed slightly better when diagnosing multiple exposures, which generally should contain more extraneous clinical effects, than when diagnosing double exposures. The explanation is that training with multiple exposures included the information from approximately 8,011 cases per diagnosis cycle, whereas training with double exposures included approximately 4,634 cases per diagnosis cycle. Presumably, having the same number of double exposures as multiple exposures would result in the double exposures performing better. A similar observation can be made of the data presented in Table 6-5.

[1] To enable maximum comparability, minor restrictions were instated to ensure that all test runs within the same diagnosis level contained exactly the same number of trained substances on every test cycle. Explicitly, there were exactly 431 possible diagnoses for diagnosing by substance, 129 possible diagnoses for diagnosing by major and minor categories, and 60 possible diagnoses for diagnosing by major category alone.

Table 6-5. Accuracy diagnosing secondary contributors using single exposures [2]

Diagnosed by             | Severity | Singles diagnosing multiples | Singles diagnosing doubles | Combined diagnosing multiples | Combined diagnosing doubles
Substance                | Minor    | 69.6%                        | 68.6%                      | 77.6%                         | 75.7%
                         | Moderate | 70.5%                        | 69.5%                      | 79.7%                         | 77.6%
                         | Major    | 69.5%                        | 70.0%                      | 81.6%                         | 77.0%
Major & minor categories | Minor    | 67.8%                        | 63.2%                      | 78.3%                         | 76.8%
                         | Moderate | 73.0%                        | 69.5%                      | 82.4%                         | 80.9%
                         | Major    | 77.6%                        | 76.2%                      | 86.2%                         | 83.9%
Major category           | Minor    | 62.1%                        | 57.1%                      | 71.4%                         | 69.0%
                         | Moderate | 64.4%                        | 59.7%                      | 72.9%                         | 70.2%
                         | Major    | 67.4%                        | 63.0%                      | 74.3%                         | 69.6%
Average                  |          | 69.1%                        | 66.3%                      | 78.2%                         | 75.6%

[2] To enable maximum comparability, minor restrictions were instated to ensure that all test runs within the same diagnosis level contained exactly the same number of trained substances on every test cycle. Explicitly, there were exactly 431 possible diagnoses for diagnosing by substance, 129 possible diagnoses for diagnosing by major and minor categories, and 60 possible diagnoses for diagnosing by major category alone.

The first two columns in Table 6-5 display the accuracies of a system trained solely on single exposure cases and tested against the secondary contributor for both multiple and double disorder cases. With average accuracies of 69.1% and 66.3%, the system performance is not stellar; however, it is high enough to raise a question: if the clinical effects in multiple exposure cases are dominated by the primary contributor, why is the accuracy in diagnosing the secondary contributor so high? Recall that during data cleaning, all multiple exposure cases involving only products with the same generic substance code are removed from the dataset. This cleaning is only performed at the substance level. It is still likely that many multiple exposure cases consist of primary and secondary substances that share the same major and minor categories. Belonging to the same category makes it much more likely that the two substances exhibit similar clinical effects. Examining the data, it was determined that in 21.0% of all multiple exposure cases the primary and secondary contributors belonged to the same major category, and in 11.6% they belonged to the same minor category as well. Likewise, in 21.9% of all double exposure cases the primary and secondary contributors belonged to the same major category, and in 11.1% they belonged to the same minor category as well. Because these cases are more likely to be diagnosed correctly based on the primary contributor, the accuracies are falsely optimistic.

The last two columns in Table 6-5 show the accuracies of a system trained on a combination of single exposures and the secondary contributors for either multiple exposures or double exposures. The addition of the secondary contributors improves the average system accuracy by 9.1% for multiple exposure diagnosis and 9.3% for double exposure diagnosis. Such a significant jump in accuracy attests that, although the clinical effects are dominated by those of the primary contributor, secondary contributors do produce enough clinical effects that the system can be trained to at least recognize the most common multiple exposure combinations.
Although some of the accuracy can be accounted for by prior probabilities, the results give hope that further research might enable reasonably accurate identification of secondary contributors.
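The category-overlap percentages quoted above can be computed with a short routine. The sketch assumes a lookup from each substance code to its (major, minor) category pair, and it counts a pair as sharing the minor category only when the major category also matches, mirroring the phrase "the same minor category as well." The lookup, codes, and category labels are invented for illustration.

```python
def shared_category_rates(pairs, category_of):
    """Return the fraction of (primary, secondary) pairs sharing a major
    category and the fraction sharing both major and minor categories.

    category_of: dict mapping substance code -> (major, minor) category.
    """
    if not pairs:
        raise ValueError("no pairs given")
    same_major = same_both = 0
    for primary, secondary in pairs:
        p_major, p_minor = category_of[primary]
        s_major, s_minor = category_of[secondary]
        if p_major == s_major:
            same_major += 1
            if p_minor == s_minor:
                same_both += 1
    return same_major / len(pairs), same_both / len(pairs)


categories = {  # hypothetical substance codes and categories
    "s1": ("major 1", "minor 1a"),
    "s2": ("major 1", "minor 1a"),
    "s3": ("major 1", "minor 1b"),
    "s4": ("major 2", "minor 2a"),
}
pairs = [("s1", "s2"), ("s1", "s3"), ("s1", "s4"), ("s3", "s4")]
print(shared_category_rates(pairs, categories))  # (0.5, 0.25)
```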

The final step necessary to fully explore the impact of combining multiple exposure cases with single exposure cases was to train a system with the combined data and use it to diagnose only single exposure cases (Table 6-6). The first column shows the accuracy of a system trained on single exposures alone when diagnosing single exposures. The second and third columns display the accuracies for systems trained on single exposures along with the primary contributors for either multiple or double exposures. The last two columns contain the accuracies of systems trained on single exposures along with the secondary contributors for either multiple or double exposures. Interestingly, the systems trained with the primary contributors increased the average system accuracy from 74.6% to 74.9% when including multiple exposures and to 75.1% when including double exposures. Although minor, it is an increase nonetheless, and it lends further support to the conclusion that the clinical effects in multiple exposure cases are dominated by the primary contributor. Furthermore, the average accuracy for systems trained with secondary contributors decreased from 74.6% to 74.2% when including multiple exposures and to 74.4% when including double exposures. A lower accuracy is to be expected, since training on the secondary contributor would associate clinical effects caused by the primary contributor with the secondary contributor instead. The minimal change in accuracy can be partially explained by the multiple and double exposures that involve closely related substances from the same major and minor categories, as discussed above. Additionally, on average 33,855.3 single exposure cases were used to train the system on each cycle. The added 8,901 multiple exposure cases or 5,149 double exposure cases account for only approximately 20.8% and 13.2% of the training cases, respectively.
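The combined training sets and the share figures in the paragraph above can be reproduced with a small sketch. Relabeling keeps only the chosen contributor of each multiple exposure as its diagnosis, as described for the combined systems; the data layout and names are hypothetical. The 33,855.3 average is consistent with 90% of the 37,617 single exposure cases, since 0.9 × 37,617 = 33,855.3.

```python
def combined_training_set(singles, multiples, role="primary"):
    """Append multiple exposures to the single exposures, relabeled by one contributor.

    singles: list of (substance, effects).
    multiples: list of (primary, secondary, effects).
    """
    pick = {"primary": 0, "secondary": 1}[role]
    relabeled = [(case[pick], case[2]) for case in multiples]
    return singles + relabeled


def added_share(n_single, n_added):
    """Fraction of the training cases contributed by the added exposures."""
    return n_added / (n_single + n_added)


singles_per_cycle = 0.9 * 37_617  # about 33,855.3 single exposure cases
print(f"multiples: {added_share(singles_per_cycle, 8_901):.1%}")  # multiples: 20.8%
print(f"doubles:   {added_share(singles_per_cycle, 5_149):.1%}")  # doubles:   13.2%
```

Training on the secondary contributor simply swaps the label taken from each multiple exposure, which is why the effects actually produced by the primary contributor end up credited to the secondary one.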

Table 6-6. Comparison of system accuracies when diagnosing single exposure cases [3]

Diagnosed by             | Severity | Single exposures alone | Singles & multiples (primary) | Singles & doubles (primary) | Singles & multiples (secondary) | Singles & doubles (secondary)
Substance                | Minor    | 68.3%                  | 68.2%                         | 68.4%                       | 68.1%                           | 68.2%
                         | Moderate | 77.5%                  | 78.2%                         | 78.0%                       | 77.4%                           | 77.4%
                         | Major    | 80.7%                  | 81.4%                         | 81.4%                       | 80.6%                           | 80.8%
Major & minor categories | Minor    | 69.0%                  | 68.9%                         | 69.0%                       | 68.6%                           | 68.8%
                         | Moderate | 77.6%                  | 77.7%                         | 78.0%                       | 77.2%                           | 77.5%
                         | Major    | 79.8%                  | 80.6%                         | 81.0%                       | 80.6%                           | 80.3%
Major category           | Minor    | 68.8%                  | 68.4%                         | 68.9%                       | 67.6%                           | 67.9%
                         | Moderate | 73.9%                  | 74.3%                         | 74.3%                       | 73.4%                           | 73.3%
                         | Major    | 76.2%                  | 75.9%                         | 76.8%                       | 74.7%                           | 75.0%
Average                  |          | 74.6%                  | 74.9%                         | 75.1%                       | 74.2%                           | 74.4%

[3] To enable maximum comparability, minor restrictions were instated to ensure that all test runs within the same diagnosis level contained exactly the same number of trained substances on every test cycle. Explicitly, the average number of possible diagnoses for each testing cycle was 414.4 when diagnosing by substance, 125.6 when diagnosing by major and minor categories, and 58.1 when diagnosing by major category alone.

Conclusions

This dissertation presents research performed to create a prototype knowledge-based system for diagnosing toxic exposures. A major goal of the research is to bypass the knowledge acquisition bottleneck of traditional knowledge-based systems by using data mining to automatically generate the system. Because system generation assumes no knowledge about the field of toxicology, lower accuracy percentages are to be expected; however, future research can build on this foundation and intelligently modify substance groupings to improve performance. Another important aspect of the system is the use of adjusted likelihood ratios. Likelihood ratios are mathematical calculations that are commonly known and used throughout the medical field. In this research, traditional likelihood ratios are adjusted by adding a fractional possibility to every potential outcome. The result is a robust equation that mitigates multiply-by-zero and divide-by-zero errors while rapidly converging to the same value as non-adjusted likelihood ratios. Ultimately, the system is intended to serve as a diagnostic consultant by providing differential diagnoses for toxic exposure cases based on observed clinical effects. The system enables physicians to tap into the knowledge stored in poison control center databases, giving decision support information in a simple, understandable format.

Chapter 5 presented the development of the system and its subsequent testing on single exposure cases. The research explored the effects of two different filters for refining diagnosis based on a minimum number of exposure cases and a minimum number of clinical effects. System accuracy reached as high as 79.8% and increased above 80% when test cases were required to involve at least three clinical effects. Furthermore, the user interface and system operation received a positive response from two toxicologists, and the diagnostic process was found to be simple and fast enough to make implementation on personal digital assistants (PDAs) feasible. Chapter 6 continued the research by applying the system approach to multiple exposure cases. Although initial tests yielded poor performance, further examination determined that the low accuracy was primarily due to a lack of multiple exposure training cases. Further testing revealed that the clinical effects observed in multiple exposures tend to be dominated by a single substance called the primary contributor. Systems generated from a combined training set of both single exposures and primary contributors from multiple exposure cases yielded accuracies as high as 86.9% when diagnosing primary contributors. More specifically, 86.9% of the cases were diagnosed in the top 13 out of 129 possible major and minor category combinations. The research performed on this system offers a number of contributions to both the field of knowledge-based systems and the field of medicine. First, being automatically generated, the system bypasses the knowledge acquisition bottleneck of traditional knowledge-based systems. Second,

PAGE 115

115 the system implements an approach to the uns olved problem of diagnosing multiple exposures. Although lack of data inhibited the diagnosis of more than one substance at a time, the system demonstrates effective diagnostic capabilities in identifying the primary contributor in multiple exposure cases. Being able to diagnose the disord er causing the most detrimental clinical effects is certainly valuable. Once the primary contribu tor is treated, it becomes easier to identify the other contributors in a multiple exposure case. Furthermore, there is hope that with the collection of more data the accuracy when simultaneously diagnosing multiple exposures will improve. A third contribution is th e application of intell igent systems to the field of toxicology. At the present time, no American diagnostic system s exist for the field of clinical toxicology. Although systems have been implemented for Fran ce, Bulgaria, and Russia, they use different methods and are not readily availa ble to assist American physicians Finally, the creation of the adjusted likelihood ratio serves as a method to bridge the gap betw een intelligent systems and the medical field. Too often, intelligent systems fa il because they use methods that are unknown and distrusted by the medical community. The adjusted likelihood ratio utilizes mathematics commonly accepted in medicine with a slight mo dification that creates a robust calculation without losing the essence of the original equation. Future Work The system presented in this dissertation is a prototype. Although the results show great promise, there is much to be done before a final system can be implemented in the real world. Recently, poison control centers (PCCs) around the United States have converted their databases from TESS standards to a new system known as the National Poison Data System (NPDS). 
To enable long-term growth and development of the knowledge-based consultant, the system must also be converted to the NPDS standard. Additionally, more data must be acquired by petitioning the FPICs in Tampa and Miami and by submitting a proposal to the national repository. With more data in hand, the system can be thoroughly tested for diagnosing secondary contributors, both individually and in combination with primary contributors.

From the outset, a major objective of the research was to bypass the knowledge acquisition bottleneck by generating a knowledge-based system capable of producing meaningful and useful results without an active, overseeing expert. This design principle prevented the designer from making any change that required even basic knowledge of toxicology. Now that the prototype is complete, several changes can be made to improve the system. First, uninformative substance diagnoses, such as the "unknown drug" diagnosis, should be removed. Second, redundant substances, such as "aspirin: pediatric formulation," "aspirin: unknown if adult or pediatric formulation," and "aspirin: adult formulation," should be consolidated into a single diagnosis. Third, a toxicologist should examine the category divisions and regroup substances based primarily on clinical effects. For example, most opioids tend to exhibit similar clinical effects, whereas the effects of spider bites vary greatly with the species of spider. Intelligently restructuring the diagnosis groupings could greatly increase the accuracy and utility of the knowledge-based consultant.

After refining the system, the next step is to field-test it in a PCC and use the results to guide further improvements. One possible concern is that, although the system may perform well on toxic exposure cases as a whole, it may be more beneficial for it to specialize in more difficult and deadly problems. In other words, it may be better to sacrifice accuracy on simple, routine exposures in order to increase accuracy on exposures that are dangerous and difficult to diagnose.
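The first two refinements proposed above, dropping uninformative diagnoses and merging redundant labels such as the three aspirin formulations, amount to relabeling the training cases before the system is regenerated. The following Python sketch shows one way this could be done. It is illustrative only: the `(substance, effects)` case format, the `CONSOLIDATE` and `DROP` tables, and the function names are hypothetical, and the label strings are copied from the text above rather than from the database, whose exact spelling may differ.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical case format: (substance label, set of observed clinical effects).
Case = Tuple[str, Set[str]]

# Hypothetical consolidation table: redundant raw labels -> one diagnosis.
CONSOLIDATE = {
    "aspirin: pediatric formulation": "aspirin",
    "aspirin: adult formulation": "aspirin",
    "aspirin: unknown if adult or pediatric formulation": "aspirin",
}

# Hypothetical set of diagnoses judged to carry no diagnostic value.
DROP = {"unknown drug"}


def refine_label(raw: str) -> Optional[str]:
    """Map a raw label to its consolidated diagnosis, or None if it is dropped."""
    label = CONSOLIDATE.get(raw, raw)
    return None if label in DROP else label


def refine_cases(cases: Iterable[Case]) -> List[Case]:
    """Relabel training cases, discarding those whose diagnosis is dropped."""
    refined: List[Case] = []
    for substance, effects in cases:
        label = refine_label(substance)
        if label is not None:
            refined.append((label, effects))
    return refined


if __name__ == "__main__":
    # Made-up example cases for illustration only.
    cases = [
        ("aspirin: adult formulation", {"effect A"}),
        ("aspirin: pediatric formulation", {"effect B"}),
        ("unknown drug", {"effect C"}),
        ("substance X", {"effect A", "effect D"}),
    ]
    counts = Counter(label for label, _ in refine_cases(cases))
    print(counts)  # Counter({'aspirin': 2, 'substance X': 1})
```

Keeping the mappings in plain tables rather than inside the generator would let a toxicologist review and edit them without touching the data mining code, preserving the separation between domain knowledge and automatic generation.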
Once the system is fully tested, it can be expanded in various directions. The general approach could be applied to other domains, particularly medical ones. The consultant could be implemented as a program on a PDA that physicians can carry with them at all times. Beyond diagnosing disorders, the system could be expanded to recommend treatments for each type of exposure; once a physician makes a diagnosis, the program could serve as a reference for treating the patient. Finally, the system could be converted into a program for knowledge discovery within toxicology. When training on cases in the database, the system identifies relationships between specific exposures and their clinical effects. While many of these relationships are already known, the system may also be uncovering relationships that were previously undocumented. This is particularly likely for multiple exposure cases, many of which have little documentation. Examining the relationships within a trained system could therefore lead to new discoveries in toxicology.⁴

4 For examples of systems created to discover unknown relationships within a field, refer to Breault et al. (2002) and Brossette et al. (1998).
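The adjusted likelihood ratio summarized in the Conclusions can be illustrated with a short Python sketch. It assumes a 2×2 table of training-case counts for one substance S and one clinical effect E, and it models the adjustment as adding the same fractional count k to each of the four cells; the equation actually used by the system is the one defined in earlier chapters and may differ in detail. The names `EffectCounts`, `likelihood_ratio`, and `post_test_probability`, the value k = 0.5, and all counts are hypothetical. The pre-test to post-test conversion uses the standard odds form of Bayes' rule, post-test odds = pre-test odds × LR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectCounts:
    """Hypothetical 2x2 training counts for substance S and clinical effect E."""
    s_and_e: int      # exposed to S, effect E observed
    s_not_e: int      # exposed to S, effect E absent
    not_s_and_e: int  # not exposed to S, effect E observed
    not_s_not_e: int  # not exposed to S, effect E absent


def likelihood_ratio(c: EffectCounts, k: float = 0.0) -> float:
    """Positive likelihood ratio P(E | S) / P(E | not S).

    k is a fractional count added to every cell (an assumed form of the
    adjustment). With k = 0 this is the ordinary ratio: it raises
    ZeroDivisionError if no non-S case shows E, and returns 0.0 if no
    S case shows E.
    """
    a, b = c.s_and_e + k, c.s_not_e + k
    x, d = c.not_s_and_e + k, c.not_s_not_e + k
    p_e_given_s = a / (a + b)      # sensitivity
    p_e_given_not_s = x / (x + d)  # 1 - specificity
    return p_e_given_s / p_e_given_not_s


def post_test_probability(pre_test: float, lr: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must lie strictly between 0 and 1")
    post_odds = pre_test / (1.0 - pre_test) * lr
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    sparse = EffectCounts(8, 2, 0, 90)   # no non-S case shows E
    try:
        likelihood_ratio(sparse)
    except ZeroDivisionError:
        print("unadjusted LR undefined")
    print(round(likelihood_ratio(sparse, k=0.5), 1))  # 140.6

    never = EffectCounts(0, 10, 5, 85)   # E never seen with S
    print(likelihood_ratio(never))                    # 0.0
    print(round(likelihood_ratio(never, k=0.5), 2))   # 0.75

    # Same proportions (unadjusted LR = 8.0) at two sample sizes.
    small = EffectCounts(8, 2, 1, 9)
    large = EffectCounts(800, 200, 100, 900)
    print(round(likelihood_ratio(small, k=0.5), 2))   # 5.67
    print(round(likelihood_ratio(large, k=0.5), 2))   # 7.97

    print(round(post_test_probability(0.1, 9.0), 2))  # 0.5
```

With k = 0 the sparse table raises ZeroDivisionError and a never-observed effect yields a ratio of zero, which would zero out any product it enters; with k = 0.5 both give finite, non-zero values. For the small and large tables, which share the same proportions and an unadjusted ratio of 8.0, the adjusted value moves from about 5.67 to about 7.97, illustrating convergence as the counts grow. A smaller k would perturb well-populated tables less but smooth sparse ones less as well.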

LIST OF REFERENCES

Abidi, S. & Manickam, S. (2002). Leveraging XML-based electronic medical records to extract experiential clinical knowledge: An automated approach to generate cases for medical case-based reasoning systems. International Journal of Medical Informatics, 68, 187-203.
Althoff, K., Bergmann, R., Wess, S., Manago, M., Auriol, E., Larichev, O., Bolotov, A., Zhuravlev, Y. & Gurov, S. (1998). Case-based reasoning for medical decision support tasks: The Inreca approach. Artificial Intelligence in Medicine, 12, 25-41.
Atzmueller, M., Baumeister, J. & Puppe, F. (2003a). Evaluation of two strategies for case-based diagnosis handling multiple faults. In M. Nick & K. Althoff, Eds., Proceedings of the 2nd German Workshop on Experience Management, GWEM 2003, Luzern, Switzerland. http://CEUR-WS.org/Vol-67/, February 2006. CEUR Workshop Proceedings.
Atzmueller, M., Baumeister, J. & Puppe, F. (2003b). Inductive Learning of Simple Diagnostic Scores. In P. Petra, R. Brause & H. Holzhutter, Eds., Medical Data Analysis: 4th International Symposium, ISMDA 2003, Berlin, Germany, pp. 23-30. Berlin: Springer-Verlag.
Atzmueller, M., Baumeister, J. & Puppe, F. (2004a). Quality measures for semi-automatic learning of simple diagnostic rule bases. In D. Seipel, M. Hanus, U. Geske & O. Bartenstein, Eds., 15th International Conference on Applications of Declarative Programming and Knowledge Management, INAP 2004 and 18th Workshop on Logic Programming, WLP 2004, Potsdam, Germany, pp. 65-78. Berlin: Springer-Verlag.
Atzmueller, M., Baumeister, J., Puppe, F., Shi, W. & Barnden, J. (2004b). Case-based approaches for diagnosing multiple disorders. In V. Barr & Z. Markov, Eds., Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2004, Miami Beach, Florida, pp. 154-159. Menlo Park, California: AAAI Press.
Au, W. & Chan, K. (2003). Mining fuzzy association rules in a bank-account database. IEEE Transactions on Fuzzy Systems, 11, 238-248.
Baumeister, J., Seipel, D. & Puppe, F. (2001). Incremental development of diagnostic set-covering models with therapy effects. In G. Kern-Isberner, T. Lukasiewicz & E. Weydert, Eds., Proceedings of the KI-2001 Workshop on Uncertainty in Artificial Intelligence, Vienna, Austria.
Baumeister, J., Atzmueller, M. & Puppe, F. (2002). Inductive Learning for Case-Based Diagnosis with Multiple Faults. In S. Craw & A. Preece, Eds., Advances in Case-Based Reasoning: Proceedings of the 6th European Conference on Advances in Case-Based Reasoning, Aberdeen, Scotland, pp. 28-42. Berlin: Springer-Verlag.
Ben-Bassat, M., Carlson, R., Puri, V., Davenport, M., Schriver, J., Latif, M., Smith, R., Portigal, L., Lipnick, E. & Weil, M. (1980). Pattern-based interactive diagnosis of multiple disorders: The MEDAS system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 148-160.
Ben-Bassat, M., Campell, D., MacNeil, A. & Weil, M. (1983). Evaluating multimembership classifiers: A methodology and application to the MEDAS diagnostic system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5, 225-229.
Bradley, P., Fayyad, U. & Reina, C. (1998). Scaling clustering algorithms to large databases. In R. Agrawal, P. Stolorz & G. Piatetsky, Eds., Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, KDD-98, New York, pp. 9-15. Menlo Park, California: AAAI Press.
Breault, J., Goodall, C. & Fos, P. (2002). Data mining a diabetic warehouse. Artificial Intelligence in Medicine, 26, 37-54.
Brossette, S., Sprague, A., Hardin, J., Waites, K., Jones, W. & Moser, S. (1998). Association rules and data mining in hospital infection control and public health surveillance. Journal of the American Medical Informatics Association, 5, 373-381.
Buchanan, B. & Shortliffe, E. (1984a). Rule-Based Expert Systems. Reading, Massachusetts: Addison-Wesley Publishing Company.
Buchanan, B. & Shortliffe, E. (1984b). Uncertainty and evidential support. In B. Buchanan & E. Shortliffe, Eds., Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, pp. 209-232. Reading, Massachusetts: Addison-Wesley Publishing Company.
Cios, K. & Moore, G. (2002). Uniqueness of medical data mining. Artificial Intelligence in Medicine, 26, 1-24.
Darmoni, S., Massari, P., Droy, J., Mahe, N., Blanc, T., Moirot, E. & Leroy, J. (1994). SETH: An expert system for the management on acute drug poisoning in adults. Computer Methods and Programs in Biomedicine, 43, 171-176.
Darmoni, S., Massari, P., Droy, J., Blanc, T. & Leroy, J. (1995). Functional evaluation of Seth: An expert system in clinical toxicology. In P. Barahona, M. Stefanelli & J. Wyatt, Eds., Artificial Intelligence in Medicine: 5th Conference on Artificial Intelligence in Medicine Europe, AIME Proceedings, Pavia, Italy, pp. 231-238. Berlin: Springer-Verlag.
Delgado, M., Sanchez, D., Martin-Bautista, M. & Vila, M. (2000). Mining association rules with improved semantics in medical databases. Artificial Intelligence in Medicine, 21, 241-245.
Dempster, A. (1967). Upper and lower probabilities induced by a multi-valued mapping. Annals of Mathematical Statistics, 38, 325-399.
Duda, R., Hart, P. & Stork, D. (2001). Pattern Classification (2nd ed.). New York: John Wiley & Sons, Inc.
Florida Poison Information Center Network. (2005). FPIN statewide annual reports, calendar year (Jan-Dec) 2004: General call summary report. http://fpicn.jax.ufl.edu/Data/Reports/Calls_state_2004.pdf, December 2005. Florida Poison Information Center, Jacksonville.
Gonzalez, A. & Dankel, D. (1993). The Engineering of Knowledge-Based Systems: Theory and Practice. Englewood Cliffs, New Jersey: Prentice Hall.
Graefe, G., Fayyad, U. & Chaudhuri, S. (1998). On the efficient gathering of sufficient statistics for classification from large SQL databases. In R. Agrawal, P. Stolorz & G. Piatetsky, Eds., Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, KDD-98, New York, pp. 204-208. Menlo Park, California: AAAI Press.
Han, J. & Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann Publishers.
Holland, J. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor, Michigan: University of Michigan Press.
Holsheimer, M., Kersten, M., Mannila, H. & Toivonen, H. (1995). A perspective on databases and data mining. In U. Fayyad & R. Uthurusamy, Eds., Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining, KDD-95, Montreal, Canada, pp. 150-155. Menlo Park, California: AAAI Press.
Kluza, A. (2004). Veterinary toxicology information system. TASK Quarterly, 2, 297-301.
Kolodner, J. (1993). Case-Based Reasoning. San Mateo, California: Morgan Kaufmann Publishers.
Kononenko, I., Bratko, I. & Kukar, M. (1998). Application of machine learning to medical diagnosis. In R. Michalski, I. Bratko & M. Kubat, Eds., Machine Learning and Data Mining: Methods and Applications, pp. 389-428. New York: John Wiley & Sons, Inc.
Koton, P. (1988). Reasoning about evidence in causal explanations. In Proceedings of the 7th National Conference on Artificial Intelligence, AAAI-88, St. Paul, Minnesota, pp. 256-261. Los Altos, California: Morgan Kaufmann Publishers.
Kusiak, A., Kern, J., Kernstine, K. & Tseng, B. (2000). Autonomous-decision making: A data mining approach. IEEE Transactions on Information Technology in Biomedicine, 4, 274-284.
Kusiak, A., Law, I. & Dick, M. (2001). The G-algorithm for extraction of robust decision rules: Children's postoperative intra-atrial arrhythmia case study. IEEE Transactions on Information Technology in Biomedicine, 5, 225-235.
Lavrac, N. (1999). Selected techniques for data mining in medicine. Artificial Intelligence in Medicine, 16, 3-23.
Liu, Z. & Yan, F. (1997). Fuzzy neural network in case-based diagnostic system. IEEE Transactions on Fuzzy Systems, 5, 209-222.
Medical University of South Carolina. (2000). Sensitivity and Specificity. http://www.musc.edu/dc/icrebm/sensitivity.html, February 2006. Medical University of South Carolina.
Miller, R., Pople, H. & Myers, J. (1982). INTERNIST-I, an experimental computer-based diagnostic consultant for general internal medicine. The New England Journal of Medicine, 307, 468-476.
Monov, A., Iordanova, I., Zagorchev, P., Vassilev, V., Nissimov, M., Kojuharov, R., Tconev, R. & Damianov, V. (1992). MEDICOTOX CONSILIUM: An expert system in clinical toxicology. In K. Lun, P. Degoulet, T. Piemme & O. Rienhoff, Eds., Proceedings of the 7th World Congress on Medical Informatics, MEDINFO 92, Geneva Palexpo, Switzerland, pp. 610-614. Amsterdam: Elsevier Science Publishers.
Nechyba, M. (2003). Introduction to feedforward neural networks. http://www.mil.ufl.edu/courses/eel5840/classes/intro_neural_networks.pdf, February 2006. Machine Intelligence Laboratory, University of Florida.
Nilsson, M. & Sollenborn, M. (2004). Advancements and trends in medical case-based reasoning: An overview of systems and system development. In V. Barr & Z. Markov, Eds., Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2004, Miami Beach, Florida, pp. 178-183. Menlo Park, California: AAAI Press.
Nilsson, N. (1998). Artificial Intelligence: A New Synthesis. San Francisco, California: Morgan Kaufmann Publishers.
Ohmann, C., Franke, C. & Yang, Q. (1999). Clinical benefit of a diagnostic score for appendicitis: Results of a prospective interventional study. Archives of Surgery, 134, 993-996.
Onisko, A., Druzdzel, M. & Wasyluk, H. (2000). Extension of the HEPAR II model to multiple-disorder diagnosis. In M. Klopotek, M. Michalewicz & S. Wierzchon, Eds., Intelligent Information Systems, pp. 303-313. Heidelberg: Physica-Verlag.
Onisko, A., Druzdzel, M. & Wasyluk, H. (2001). Learning Bayesian network parameters from small sets: Applications of Noisy-OR gates. International Journal of Approximate Reasoning, 27, 165-182.
Owens, D. & Sox, H. (2001). Medical decision-making: Probabilistic medical reasoning. In E. Shortliffe, L. Perreault, G. Wiederhold & L. Fagan, Eds., Medical Informatics: Computer Applications in Health Care and Biomedicine, pp. 76-131. New York: Springer-Verlag.
Pawlak, Z. (1982). Rough sets. International Journal of Computer & Information Sciences, 11, 341-356.
Peng, Y. & Reggia, J. (1986). Plausibility of diagnostic hypotheses: The nature of simplicity. In Proceedings of the 5th National Conference on Artificial Intelligence, AAAI-86, Philadelphia, Pennsylvania, pp. 140-145. Los Altos, California: Morgan Kaufmann Publishers.
Peng, Y. & Reggia, J. (1987). A probabilistic causal model for diagnostic problem solving, Part I: Integrating symbolic causal inference with numeric probabilistic inference. IEEE Transactions on Systems, Man, and Cybernetics, 2, 146-162.
Peng, Y. & Reggia, J. (1989). A comfort measure for diagnostic problem solving. Information Sciences, 47, 149-184.
Pople, H. (1977). The formation of composite hypotheses in diagnostic problem solving: An exercise in synthetic reasoning. In Proceedings of the 5th International Joint Conference on Artificial Intelligence, IJCAI-77, Cambridge, Massachusetts, pp. 1030-1037. Pittsburgh, Pennsylvania: Carnegie-Mellon University.
Pople, H. (1985a). CADUCEUS: An experimental expert system for medical diagnosis. In P. Winston & K. Prendergast, Eds., The AI Business: The Commercial Uses of Artificial Intelligence, pp. 67-80. Cambridge, Massachusetts: The MIT Press.
Pople, H. (1985b). Evolution of an Expert System: From Internist to Caduceus. In I. de Lotto & M. Stefanelli, Eds., Proceedings of the International Conference on Artificial Intelligence in Medicine, Pavia, Italy, pp. 179-208. Amsterdam: Elsevier Science Publishers.
Portinale, L. & Torasso, P. (1995). ADAPtER: An integrated diagnostic system combining case-based and abductive reasoning. In M. Veloso & A. Aamodt, Eds., Proceedings of the 1st International Conference on Case-Based Reasoning Research and Development, ICCBR-95, Sesimbra, Portugal, pp. 277-288. Berlin: Springer-Verlag.
Quinlan, J. R. (1996). Bagging, boosting, and C4.5. In Proceedings of the 13th National Conference on Artificial Intelligence, AAAI-96, Portland, Oregon, pp. 725-730. Menlo Park, California: AAAI Press.
Reggia, J., Nau, D. & Wang, P. (1983). Diagnostic expert systems based on a set covering model. International Journal of Man-Machine Studies, 19, 437-460.
Rumelhart, D. & McClelland, J. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, Massachusetts: MIT Press.
Tsumoto, S. (2000). Automated discovery of positive and negative knowledge in clinical databases. IEEE Engineering in Medicine and Biology, 19, 56-62.
Valdes-Perez, R. (1999). Principles of human-computer collaboration for knowledge discovery in science. Artificial Intelligence, 107, 335-346.
van der Gaag, L. & Wessels, M. (1994). Efficient multiple-disorder diagnosis by strategic focusing. Technical report UU-CS-1994-23.
Vinterbo, S. & Ohno-Machado, L. (2000). A genetic algorithm approach to multi-disorder diagnosis. Artificial Intelligence in Medicine, 18, 117-132.
Wang, L. (2003). The WM Method completed: A flexible fuzzy system approach to data mining. IEEE Transactions on Fuzzy Systems, 11, 768-782.
Watson, W., Litovitz, T., Rodgers, G., Klein-Schwartz, W., Reid, N., Youniss, J., Flanagan, A. & Wruk, K. (2004). 2004 annual report of the American Association of Poison Control Centers Toxic Exposure Surveillance System. The American Journal of Emergency Medicine, 23, 589-666.
Wu, T. (1990). Efficient diagnosis of multiple disorders based on a symptom clustering approach. In Proceedings of the 8th National Conference on Artificial Intelligence, AAAI-90, Boston, Massachusetts, pp. 357-364. Menlo Park, California: AAAI Press.
Wu, T. (1991). A problem decomposition method for efficient diagnosis and interpretation of multiple disorders. Computer Methods and Programs in Biomedicine, 35, 239-250.
Yu, V., Fagan, L., Bennett, S., Clancey, W., Scott, A., Hannigan, J., Buchanan, B. & Cohn, S. (1984). An evaluation of MYCIN's advice. In B. Buchanan & E. Shortliffe, Eds., Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, pp. 589-596. Reading, Massachusetts: Addison-Wesley Publishing Company.
Zadeh, L. (1965). Fuzzy sets. Information and Control, 8, 338-353.
Zhou, Z. (2003). Three perspectives of data mining. Artificial Intelligence, 143, 139-146.

BIOGRAPHICAL SKETCH

Joel Daniel Schipper was born in 1979 to W Thomas and Harriet Anne Schipper. He grew up in the suburbs of Los Angeles with his two older brothers, Tom and James. Although an excellent student, he much preferred spending his time on the athletic field to studying. Joel attended Loyola Marymount University as a Presidential Scholar and graduated summa cum laude with a Bachelor of Science in Electrical Engineering. He continued his studies as an Alumni Fellow at the University of Florida, where he received a Master of Science in Electrical Engineering. During his time at the University of Florida, he met and married Alice Eileen Brown. He is currently pursuing his doctorate by writing this dissertation, though he would much rather be outside playing. Upon completion of his doctoral degree, Joel will join the faculty of Bradley University as an Assistant Professor of Electrical and Computer Engineering.

+ AMDG +